MCU

The microcontroller unit (MCU) is at the core of any embedded solution and there are a wide variety of options in terms of both cost and features.

When selecting an MCU for graphics, it is important to consider the supported display interfaces, the MCU package size, and the achievable graphics performance, which depends on two main factors:

Image composition

  • The availability of graphics accelerators integrated in the MCU.
  • The availability of cache memory in the system.

Memory access and bandwidth

  • The clock frequency and the subsystem bus frequency.
  • The access to the internal flash and RAM memories.

It is also important to consider other aspects of the application, such as motor control or wireless communication, that run alongside the graphics. These can also influence the choice of MCU.

This page will explore the various MCU options and the parameters to consider when selecting an STM32 MCU for a GUI-driven application.

STM32 MCUs for graphics

For a complete overview of the STM32 MCUs listed above, including information about internal memory and peripherals, see the STM32 MCU portfolio.

Frequency

The core frequency has a major impact on the performance of a graphical application in terms of screen refresh, fluidity of screens, and animations.

It determines how much data can be transferred from internal or external memory to the display framebuffer, and how demanding the calculations and animations can be.

The higher the frequency, the more data can be transferred within a given time frame and the more complex the animations can be.

STM32 MCU core frequencies range up to 800 MHz.

Note
The higher the frequency, the greater the power consumption.

Graphic Subsystem Frequency

It is important to differentiate the core CPU frequency from the graphic subsystem frequency. The graphic subsystem frequency covers the frequency of the internal buses and of the graphics accelerator, as well as the access speed of the internal and external memories.

The graphic subsystem frequency also has a major impact on the overall graphic performance.

An example of how to calculate the performance of a graphics subsystem can be found in this article. The article focuses on framebuffers in external RAM, but the same procedure can be applied to internal RAM as well.
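As a rough first-order illustration of why bus width and bus clock matter (this is not the procedure from the referenced article), the sketch below estimates the time needed to move one full frame over a bus. The function name, the 480x272 resolution, the 32-bit bus width, and the clock values are illustrative assumptions, and the model assumes one bus-wide transfer per clock cycle with no wait states.

```python
def frame_transfer_ms(width, height, bytes_per_pixel, bus_width_bits, bus_clock_hz):
    """Estimate the time in milliseconds to move one full frame over a bus.

    Idealised model (assumption): one bus-wide transfer per clock cycle,
    no wait states and no arbitration with other bus masters.
    """
    frame_bytes = width * height * bytes_per_pixel
    bytes_per_second = (bus_width_bits // 8) * bus_clock_hz
    return 1000.0 * frame_bytes / bytes_per_second


# Hypothetical example: 480x272 display, 16-bit colors, 32-bit bus.
slow = frame_transfer_ms(480, 272, 2, 32, 100_000_000)
fast = frame_transfer_ms(480, 272, 2, 32, 200_000_000)
print(f"100 MHz bus: {slow:.4f} ms per frame")
print(f"200 MHz bus: {fast:.4f} ms per frame")
```

In this idealised model, doubling the bus clock halves the transfer time. Real systems are slower because of memory latency and contention between bus masters.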

Embedded Hardware Acceleration Features

Different STM32 MCUs have different built-in hardware acceleration features that help achieve high-performance graphics applications.

NeoChrom GPU

NeoChrom GPU can hardware accelerate some graphical operations such as texture mapping, scaling and vector rendering. It is also known as GPU2D.

NeoChrom GPU also comes in a version called NeoChromVG GPU, which can further accelerate vector rendering.

For a detailed description of NeoChrom GPU and its capabilities, visit the article about TouchGFX on NeoChrom/NeoChromVG.

Chrom-ART

Chrom-ART is an advanced DMA that aids in doing graphical operations. It is also known as DMA2D.

The Chrom-ART accelerator, integrated in many STM32 platforms, can manipulate and transfer images without loading the CPU. It can accelerate most graphic operations, such as color filling, image copying, blending, and pixel format conversions.

The Chrom-ART accelerator can blend two layers, convert their pixel formats to the desired output pixel format, and transfer the result to the destination memory in a single operation.
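To make the combined operation concrete, here is a minimal software sketch of the per-pixel work involved when an ARGB8888 foreground pixel is blended over an RGB565 background pixel and written back as RGB565. It only illustrates the kind of work that is offloaded. The function names are hypothetical, and the rounding and bit-replication choices are assumptions that may differ from what the hardware does.

```python
def rgb565_to_rgb888(pixel):
    """Expand an RGB565 pixel to 8-bit channels by replicating the high bits."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    return (r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2)


def rgb888_to_rgb565(red, green, blue):
    """Pack 8-bit channels into an RGB565 pixel by dropping the low bits."""
    return ((red >> 3) << 11) | ((green >> 2) << 5) | (blue >> 3)


def blend_argb8888_over_rgb565(foreground, background):
    """Blend one ARGB8888 pixel over one RGB565 pixel; return RGB565."""
    alpha = (foreground >> 24) & 0xFF
    fg_rgb = ((foreground >> 16) & 0xFF, (foreground >> 8) & 0xFF, foreground & 0xFF)
    bg_rgb = rgb565_to_rgb888(background)
    # Per-channel alpha blend with rounding (assumed rounding mode).
    mixed = [(f * alpha + b * (255 - alpha) + 127) // 255
             for f, b in zip(fg_rgb, bg_rgb)]
    return rgb888_to_rgb565(*mixed)


# Opaque red over green gives red; fully transparent red leaves green untouched.
print(hex(blend_argb8888_over_rgb565(0xFFFF0000, 0x07E0)))
print(hex(blend_argb8888_over_rgb565(0x00FF0000, 0x07E0)))
```

In software, this sequence runs once per pixel. That is the per-pixel CPU cost removed when the accelerator performs the operation.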

The Chrom-ART accelerator also supports color formats with color look-up tables (CLUT). This can help save memory.
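The memory effect of an indexed format can be estimated with a short calculation. The sketch below compares a direct 16-bit image with an 8-bit indexed image plus its look-up table. The function names, the 256-entry table, and the 4-byte table entry are assumptions chosen for illustration.

```python
def direct_color_bytes(width, height, bytes_per_pixel):
    """Size of an image that stores the full color in every pixel."""
    return width * height * bytes_per_pixel


def indexed_color_bytes(width, height, clut_entries=256, bytes_per_clut_entry=4):
    """Size of an image with one index byte per pixel plus its CLUT.

    Assumes at most 256 table entries so that one byte per pixel is enough.
    """
    if not 1 <= clut_entries <= 256:
        raise ValueError("one index byte per pixel allows at most 256 entries")
    return width * height + clut_entries * bytes_per_clut_entry


# Hypothetical 100x100 image: 16-bit direct color versus 8-bit indexed color.
direct = direct_color_bytes(100, 100, 2)
indexed = indexed_color_bytes(100, 100)
print(direct, indexed)
```

For very small images the table itself can outweigh the saving. Under the same assumptions, a 16x16 image needs 512 bytes in direct 16-bit color but 1,280 bytes in indexed form.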

In an example application running on the STM32F469-EVAL board, the CPU load drops from 82% to 4% when Chrom-ART is enabled:

Bird-Eat-Coin Chrom-ART example

In addition, STM32H7 products add conversion from YCbCr format to RGB format to the Chrom-ART peripheral. Combined with the JPEG hardware codec, this feature can offload the CPU when encoding and decoding JPEG images.

YCbCr to RGB Hardware performance

The Chrom-ART accelerator, with the features listed above, offers a huge advantage for graphical applications. If available in the chosen MCU, TouchGFX handles all Chrom-ART features and redirects all possible drawing operations to the Chrom-ART peripheral instead of the CPU.

The Chrom-ART peripheral is available in the high-performance STM32 families.

Further reading
  • Refer to the AN4943 application note for more information: Chrom-ART Hardware acceleration.

JPEG Hardware Codec

Some STM32 families provide a hardware JPEG codec to encode and decode images and videos.

This feature is important if the UI application needs to play a video file or display JPEG images.

JPEG images generally take up less memory than uncompressed bitmaps. The JPEG hardware codec ensures that the images can be decoded at runtime without overloading the CPU.

Some TouchGFX demos utilize the JPEG hardware codec, offloading the CPU while playing an MJPEG video.

Hardware JPEG codec performance

Further reading
  • Refer to the AN4996 application note for more information: Hardware JPEG codec.

Chrom-GRC

The STM32 Chrom-GRC™ (GFXMMU) is a peripheral in some STM32 microcontrollers that aims to efficiently support the emerging trend towards non-rectangular displays.

The Chrom-GRC™ peripheral enables applications to reduce the amount of RAM needed for storing the framebuffer when addressing non-rectangular displays.

In the case of a round display, the peripheral reduces the memory requirements by 20%.
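The order of magnitude of this saving can be checked geometrically: a circle inscribed in a square covers π/4 of its area, so about 21.5% of a square framebuffer is never visible on a round display. The sketch below counts the pixels whose centres fall inside the circle. The function name and the 390-pixel diameter are hypothetical, and the way the peripheral actually packs each line in memory is not modelled.

```python
def round_display_saving(diameter):
    """Fraction of a square framebuffer that lies outside the inscribed circle."""
    visible = 0
    for y in range(diameter):
        for x in range(diameter):
            # Pixel-centre test in doubled integer coordinates, so the
            # comparison is exact and needs no floating point.
            dx = 2 * x + 1 - diameter
            dy = 2 * y + 1 - diameter
            if dx * dx + dy * dy <= diameter * diameter:
                visible += 1
    return 1.0 - visible / (diameter * diameter)


# Hypothetical round display with a diameter of 390 pixels.
saving = round_display_saving(390)
print(f"Unused part of a square framebuffer: {saving:.1%}")
```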

The Chrom-GRC™ peripheral is not mandatory when controlling non-rectangular screens, but it is recommended.

Chrom-GRC™ is also used when working with an emulated framebuffer.

Memory optimization with Chrom-GRC peripheral

Further reading
  • Refer to the AN5051 application note for more information: Graphic memory optimization.

Internal Flash

A graphical user interface application using bitmap resources needs non-volatile memory to store the data. Executing code from internal flash and reading data from it is in some cases several times faster than with external flash.

As the internal flash is limited in size, it is often used for storing the TouchGFX framework, screen definitions, and UI logic, while the bitmap data is stored in external flash.

The STM32 products used for graphic applications offer from 0 Kbytes up to a few Mbytes of internal flash memory.

External memory may be required when the bitmap data does not fit within internal flash.

Further reading
Refer to External Memories for more details.

TouchGFX flash memory requirement:

  • Framework: 60 Kbytes to 100 Kbytes.
  • Screen definition and GUI logic: 1 Kbyte to 100 Kbytes.

These numbers depend on the framework features used and the size and complexity of the application.
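These ranges can be turned into a quick budget check. The sketch below takes the upper bounds quoted above and estimates how much internal flash is left for bitmap data. The 2-Mbyte device, the use of 1 Kbyte = 1024 bytes, and the function names are assumptions, and any other application code is ignored.

```python
KBYTE = 1024  # assumption: 1 Kbyte = 1024 bytes


def flash_left_for_bitmaps(internal_flash_bytes, framework_bytes, gui_bytes):
    """Internal flash remaining once the framework and GUI code are placed."""
    return max(internal_flash_bytes - framework_bytes - gui_bytes, 0)


def bitmap_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed bitmap."""
    return width * height * bytes_per_pixel


# Hypothetical device with 2 Mbytes of internal flash, worst-case
# framework (100 Kbytes) and GUI (100 Kbytes) footprints.
left = flash_left_for_bitmaps(2048 * KBYTE, 100 * KBYTE, 100 * KBYTE)
full_screen_images = left // bitmap_bytes(480, 320, 2)
print(left, full_screen_images)
```

Under these assumptions only a handful of full-screen 16-bit bitmaps fit, which is why bitmap data is often moved to external flash.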

Internal RAM

Internal RAM can be used for storing the framebuffer(s) when they fit within the available memory. Alternatively, external memory can be added to the setup.

The size of a framebuffer depends on the width, height, and color depth of the display. For example, for a display with HVGA resolution (480x320) and 16-bit colors, the memory needed for one framebuffer is:

Size of 1 framebuffer = 480 x 320 x 2 = 307,200 bytes
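The same calculation can be written as a small helper for trying other resolutions and color depths. This is a minimal sketch: the function name is hypothetical, and any line padding or alignment required by a particular display controller is not modelled.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Return the size in bytes of one framebuffer (no line padding)."""
    if width <= 0 or height <= 0 or bits_per_pixel <= 0:
        raise ValueError("width, height and bits_per_pixel must be positive")
    total_bits = width * height * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


hvga_16bpp = framebuffer_bytes(480, 320, 16)
print(hvga_16bpp)       # one framebuffer
print(2 * hvga_16bpp)   # two framebuffers (double buffering)
```

With double buffering the requirement doubles, which often decides whether the framebuffers fit in internal RAM.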

The STM32 products used for graphic applications offer from a few Kbytes to a few Mbytes of internal RAM.

Further reading
Refer to the External Memories section for more details on framebuffers in external memory.

TouchGFX RAM requirement:

  • Framework: 10 Kbytes to 30 Kbytes
  • Widgets: 1 Kbyte to 15 Kbytes

Memory requirements may vary from application to application.

LCD Controller

The choice of MCU also depends on the display interface that will be used and on the resolution. An 800x480 resolution, for example, can only be achieved with an interface that offers a sufficiently high data transfer speed. RGB-TFT and MIPI-DSI interfaces are often used for higher resolutions, as their bandwidth is in many cases higher than that of SPI or parallel 8080/6800 interfaces. Low-resolution displays often embed a controller and GRAM, and can therefore be connected through simple SPI or 8080/6800 interfaces.

High-resolution displays (WQVGA and above) often do not embed a controller and GRAM, so the controller needs to be on the microcontroller side. STM32 MCUs that embed RGB-TFT and MIPI-DSI interfaces include this controller.
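A back-of-the-envelope calculation shows why the interface limits the usable resolution. The sketch below computes the raw pixel data rate for an 800x480 display and the highest full-frame update rate a given link could sustain. The 16-bit color depth, the 60 Hz refresh rate, and the 50 Mbit/s single-line SPI link are illustrative assumptions, and blanking intervals and protocol overhead are ignored.

```python
def required_bits_per_second(width, height, bits_per_pixel, refresh_hz):
    """Raw pixel data rate needed to send every frame in full."""
    return width * height * bits_per_pixel * refresh_hz


def max_refresh_hz(link_bits_per_second, width, height, bits_per_pixel):
    """Highest full-frame update rate a link can sustain (no overhead)."""
    return link_bits_per_second / (width * height * bits_per_pixel)


# 800x480 at 16 bits per pixel and 60 Hz (assumed values).
need = required_bits_per_second(800, 480, 16, 60)
# Hypothetical single-line SPI link clocked at 50 MHz.
spi_fps = max_refresh_hz(50_000_000, 800, 480, 16)
print(f"Required: {need / 1e6:.2f} Mbit/s")
print(f"SPI at 50 Mbit/s: {spi_fps:.1f} full frames per second")
```

Under these assumptions the serial link falls far short of a 60 Hz full-frame update, which is consistent with using RGB-TFT or MIPI-DSI interfaces at this resolution.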

The picture shows four examples of different display interfaces with/without GRAM and display controller.

Further reading
Refer to the Display section for more information.

Packages & I/O

The number of I/Os needed depends on the chosen display and external memories. Running a parallel display with parallel RAM/flash can require a high number of I/Os, resulting in a larger package.

Memory Interfacing

When the internal flash and RAM in the microcontroller are not sufficient, choosing an MCU with the most suitable external memory interface becomes important. The STM32 products provide different memory controller peripherals to interface with NOR, NAND, SRAM, SDRAM, LPSDR SDRAM, and PSRAM memories.

Flexible Memory Controller & Flexible Static Memory Controller (FMC/FSMC)

The FSMC supports static memories. The FMC adds dynamic RAM (SDRAM) support to the FSMC. With its high external access speed and a data bus of up to 32 bits, the flexible memory controller (FMC) allows higher throughput to and from external RAM and hence better support for higher resolutions. The FMC has an independent chip select for each memory bank. The FMC can control an external flash memory for the data and an external RAM memory for the framebuffer and heap extension for the graphical stack.

Serial Memory Interface

Depending on the STM32 product, the serial memory interface is embedded and allows interfacing with single, dual, quad, octo, and HyperBus™ flash memories alongside QSPI, PSRAM, OPI PSRAM, and Hyper RAM memories. The serial high-speed memory interface can control up to 256 Mbytes in memory-mapped mode and 4 Gbytes in indirect mode.

Compared to parallel interfaces, the serial memory interface permits the connection of a lower-cost external flash memory to small packages and reduces the number of pins used.

However, serial flash memory is usually less efficient than parallel flash memory.

Further reading
Refer to the AN4760 application note for more information: Quad-SPI interface on STM32 microcontrollers.

Cortex®-M Cores

STM32 MCUs come with different Arm® Cortex®-M cores. Below are the most used cores for running graphics on STM32.

Cortex®-M0+

The Cortex®-M0+ is characterized by its simple architecture and low price. It is recommended for smaller static graphic applications, running at lower resolutions.

Cortex®-M4

The Cortex®-M4 offers more functionality than the M0+ and accelerates calculations. It includes a DSP instruction set and a single-precision FPU. These offload the CPU and increase the speed of calculations.

Cortex®-M7

The Cortex®-M7 has a more complex architecture. It also includes a DSP instruction set, and comes with a more efficient FPU with double precision and a level 1 cache memory with up to 16 Kbytes for data and instructions. The cache memory keeps data and instructions close to the calculation unit in order to optimize the fetch time.

Cortex®-M33

The Cortex®-M33 is a core with advanced security features. It includes TrustZone® technology, which allows the MCU to run secure and non-secure applications on the same core. It has a simpler architecture than the Cortex®-M7.

Cortex®-M55

The Cortex®-M55 is designed for AI and DSP applications and includes Helium technology for vector processing. The Cortex®-M55 also includes TrustZone® technology.

Feature overview

| Feature | Cortex-M0+ | Cortex-M4 | Cortex-M7 | Cortex-M33 | Cortex-M55 |
|---|---|---|---|---|---|
| DMIPS/MHz range | 0.95-1.36 | 1.25-1.95 | 2.14-3.23 | 1.54 | 1.69 |
| CoreMark®/MHz | 2.46 | 3.42 | 5.01 | 4.10 | 4.40 |
| Digital Signal Processing (DSP) extension | No | Yes | Yes | Yes | Yes |
| Floating Point Hardware | No | Yes (SP) | Yes (SP, DP) | Yes (SP) | Yes (SP, DP, HP) |
| Built-in caches | No | No | Yes (option 4-64kB), I-Cache, D-Cache | No | Yes (option 4-64kB), I-Cache, D-Cache |
| Bus Protocol | AHB Lite, Fast I/O | AHB Lite, APB | AXI4, AHB Lite, APB, TCM | AHB, AHB Lite, APB | AXI, AHB, AHB Lite, APB, TCM |
| Dual Core Lock-Step Support | No | No | Yes | No | Yes |

For further reference, check the ARM Cortex-M Processor Comparison Table.
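The per-MHz figures above can be scaled by a clock frequency to get a rough idea of relative core performance. The sketch below uses the lower bound of each DMIPS/MHz range and the CoreMark/MHz values from the overview. The function and dictionary names are hypothetical, and real results depend on the compiler, memory configuration, and cache settings.

```python
# Per-MHz figures taken from the feature overview (lower bound of each range).
DMIPS_PER_MHZ_MIN = {
    "Cortex-M0+": 0.95, "Cortex-M4": 1.25, "Cortex-M7": 2.14,
    "Cortex-M33": 1.54, "Cortex-M55": 1.69,
}
COREMARK_PER_MHZ = {
    "Cortex-M0+": 2.46, "Cortex-M4": 3.42, "Cortex-M7": 5.01,
    "Cortex-M33": 4.10, "Cortex-M55": 4.40,
}


def estimate_scores(core, clock_mhz):
    """Return (DMIPS, CoreMark) estimates for a core at a given clock."""
    return DMIPS_PER_MHZ_MIN[core] * clock_mhz, COREMARK_PER_MHZ[core] * clock_mhz


# Example clocks: 480 MHz for a Cortex-M7 and 240 MHz for a Cortex-M4.
m7_dmips, m7_coremark = estimate_scores("Cortex-M7", 480)
m4_dmips, m4_coremark = estimate_scores("Cortex-M4", 240)
print(f"Cortex-M7 @ 480 MHz: {m7_dmips:.1f} DMIPS, {m7_coremark:.1f} CoreMark")
print(f"Cortex-M4 @ 240 MHz: {m4_dmips:.1f} DMIPS, {m4_coremark:.1f} CoreMark")
```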

Level 1 cache

The STM32H7 and STM32F7 families include up to 16 Kbytes of L1 cache for both instructions and data. An L1 cache stores a set of data or instructions near the CPU, so the CPU does not have to keep fetching the same data when it is used repeatedly.

Further reading
Refer to the AN4839 application note for more information: Level 1 Cache.

Dual core

The STM32H7 series includes dual-core lines in which the Arm® Cortex®-M7 and Cortex®-M4 cores can run at up to 480 MHz and 240 MHz respectively, enabling more processing and application partitioning. Dual-core STM32H7 product lines are available with an embedded SMPS for improved dynamic power efficiency.

The second core, the Cortex®-M4, can take over heavy calculations, freeing the Cortex®-M7 core for drawing and graphic operations.

Note
For dual-core MCUs, TouchGFX Generator must be enabled for a specific context. Only a single concurrent context is supported. See the TouchGFX Generator User Guide for more information.

Bus architecture

The majority of STM32 microcontrollers provide a 32-bit multi-AHB bus matrix interconnecting all the masters (CPU, DMAs, etc.) and the slaves (flash memory, RAM, FSMC, AHB and APB peripherals). This ensures seamless and efficient operation even when several high-speed peripherals work simultaneously.

In addition to the multi-AHB interconnect, some STM32 products embed a 64-bit AXI bus to expand bandwidth. This yields the best compromise between performance and power consumption.

Price

The size of the internal flash, the size of the internal RAM, and the number of pins available in the package influence the price of the MCU. By considering the requirements for the interface, resolution, performance, etc., the user can ultimately find suitable MCUs and estimate the price.

Further reading
  • See STM32 32-bit Arm Cortex MCUs for available STM32 microcontrollers.