MCU
The microcontroller unit (MCU) is at the core of any embedded solution and there are a wide variety of options in terms of both cost and features.
When selecting an MCU for graphics, it is important to consider the supported display interfaces, the MCU package size, and the achievable graphics performance, which depends on two main factors:
Image composition
- The availability of graphics accelerators integrated in the MCU.
- The availability of cache memory in the system.
Memory access and bandwidth
- The clock frequency and the subsystem bus frequency.
- The access to the internal flash and RAM memories.
It is also important to consider other aspects of the application, such as motor control or wireless communication, which run alongside the graphics. These factors can influence the choice of MCU.
This page will explore the various MCU options and the parameters to consider when selecting an STM32 MCU for a GUI-driven application.
For a complete overview of the STM32 MCUs listed above, including information about internal memory and peripherals, see the STM32 MCU portfolio.
Further reading
- The complete STM32 MCU portfolio can be found here.
- For a more complete overview of all product lines, including prices etc., the ST MCU Finder is available here.
Frequency
The core frequency has a major impact on the performance of a graphical application in terms of screen refresh, fluidity of screens, and animations.
It determines how much data can be transferred from internal or external memory to the display framebuffer, and how much computation is available for calculations and animations.
The higher the frequency, the more data it is possible to transfer within a given timeframe and the more complex animations can be made.
STM32 MCUs offer core frequencies of up to 800 MHz.
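As a rough illustration of how frequency bounds the amount of data that can be moved per frame, the sketch below computes an idealized upper limit. It assumes one bus word is transferred per clock cycle with no wait states or contention, which real systems do not achieve; the function name, the 200 MHz 32-bit bus and the 60 Hz refresh rate are hypothetical values chosen for the example, not figures for a specific STM32.

```python
def bytes_per_frame_budget(bus_hz: int, bus_width_bits: int, fps: int,
                           efficiency: float = 1.0) -> float:
    """Idealized upper bound on the bytes a bus can move in one frame period.

    efficiency (0..1) is a hypothetical derating factor for wait states,
    arbitration and similar overhead; 1.0 means a perfect bus.
    """
    bytes_per_second = bus_hz * (bus_width_bits // 8) * efficiency
    return bytes_per_second / fps


# Hypothetical example: 200 MHz, 32-bit bus, 60 frames per second.
budget = bytes_per_frame_budget(bus_hz=200_000_000, bus_width_bits=32, fps=60)

# Size of one 480x320 frame at 16 bits per pixel, for comparison.
frame_bytes = 480 * 320 * 2

print(f"Budget per frame: {budget:,.0f} bytes")
print(f"Full-frame transfers per frame period: {budget / frame_bytes:.1f}")
```

Doubling `bus_hz` doubles the budget, which is the relationship the text describes. Lowering `efficiency` gives a more conservative estimate.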
Note
Graphic Subsystem Frequency
It is important to differentiate the core CPU frequency from the graphic subsystem frequency. The graphic subsystem frequency includes the frequency of the internal busses, the frequency of the graphics accelerator as well as the access speed of the internal and external memories.
The graphic subsystem frequency also has a major impact on the overall graphic performance.
An example of how to calculate the performance of a graphics subsystem can be found in this article. The article focuses on framebuffers in external RAM, but the same procedure can be applied to internal RAM as well.
Embedded Hardware Acceleration Features
Different STM32 MCUs have different built-in hardware acceleration features that help in achieving high performing graphics applications.
NeoChrom GPU
NeoChrom GPU can hardware accelerate some graphical operations such as texture mapping, scaling and vector rendering. It is also known as GPU2D.
NeoChrom GPU also comes in a version called NeoChromVG GPU, which can further accelerate vector rendering.
For a detailed description of NeoChrom GPU and its capabilities, visit the article about TouchGFX on NeoChrom/NeoChromVG.
Chrom-ART
Chrom-ART is an advanced DMA that aids in doing graphical operations. It is also known as DMA2D.
The Chrom-ART accelerator, integrated in many STM32 platforms, is able to manipulate and transfer images without CPU load. It has the capability to accelerate the majority of the graphic operations, such as color filling, image copying, blending, and pixel format conversions.
The Chrom-ART accelerator is able to perform blending of two layers and convert the initial pixel formats to the desired output pixel format and transfer the result to the memory destination in only one operation.
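To make the blend-and-convert operation concrete, the following sketch shows the same kind of per-pixel work done in software: two RGB888 pixels are alpha-blended and the result is converted to RGB565. This is only an illustration of the arithmetic involved; the function names are hypothetical, and the rounding used here is an assumption rather than a description of what the hardware does internally.

```python
def blend_channel(fg: int, bg: int, alpha: int) -> int:
    """Alpha-blend one 8-bit channel. alpha: 0 = transparent, 255 = opaque."""
    # The +127 rounds to nearest instead of truncating (an assumption).
    return (fg * alpha + bg * (255 - alpha) + 127) // 255


def blend_rgb888(fg, bg, alpha):
    """Blend a foreground pixel over a background pixel, channel by channel."""
    return tuple(blend_channel(f, b, alpha) for f, b in zip(fg, bg))


def rgb888_to_rgb565(rgb) -> int:
    """Pack an (r, g, b) tuple into 16 bits: 5 bits red, 6 green, 5 blue."""
    r, g, b = rgb
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def blend_to_rgb565(fg, bg, alpha) -> int:
    """Blend two RGB888 pixels and convert the result to RGB565."""
    return rgb888_to_rgb565(blend_rgb888(fg, bg, alpha))


red, blue = (255, 0, 0), (0, 0, 255)
print(hex(blend_to_rgb565(red, blue, 255)))  # fully opaque foreground
print(hex(blend_to_rgb565(red, blue, 0)))    # fully transparent foreground
print(hex(blend_to_rgb565(red, blue, 128)))  # roughly half and half
```

On the CPU this loop runs once per pixel, which is the load that an accelerator removes.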
The Chrom-ART accelerator also supports color formats with color look up tables (CLUT). This can help with saving memory.
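The memory saving from a color look up table can be estimated with simple arithmetic. The sketch below compares a direct 16-bit image with an indexed image that stores one byte per pixel plus the table. The function names, the 256-entry table and the 4 bytes per table entry are assumptions made for the example.

```python
def direct_image_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of an image that stores the full color value for every pixel."""
    return width * height * bytes_per_pixel


def indexed_image_bytes(width: int, height: int,
                        palette_entries: int = 256,
                        bytes_per_entry: int = 4) -> int:
    """Size of an image with one index byte per pixel plus the color table.

    The entry size is an assumption; adjust it to the table format in use.
    """
    if palette_entries > 256:
        raise ValueError("one index byte can address at most 256 entries")
    return width * height + palette_entries * bytes_per_entry


direct = direct_image_bytes(200, 200, bytes_per_pixel=2)  # 16-bit pixels
indexed = indexed_image_bytes(200, 200)
print(direct, indexed, f"{1 - indexed / direct:.0%} smaller")
```

The saving only applies to images that can be represented acceptably with the limited number of colors in the table.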
In an example application running on the STM32F469-EVAL board, the CPU load decreases from 82% to 4% when the Chrom-ART is enabled.
In addition, on STM32H7 products the Chrom-ART peripheral gains the capability to convert from YCbCr format to RGB format. This feature, combined with the JPEG hardware codec, can offload the CPU when encoding and decoding JPEG images.
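For reference, a YCbCr to RGB conversion can be sketched in software as follows. The coefficients are the full-range ones commonly used for JPEG (JFIF) images; whether the hardware uses exactly these coefficients and this rounding is an assumption, and the function names are hypothetical.

```python
def clamp8(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def ycbcr_to_rgb(y: int, cb: int, cr: int):
    """Convert one full-range YCbCr pixel (JFIF convention) to RGB888."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


print(ycbcr_to_rgb(128, 128, 128))  # neutral chroma gives a gray pixel
print(ycbcr_to_rgb(76, 85, 255))    # approximately pure red
```

Done in software, this costs several multiplications per pixel for every decoded frame, which is why moving it to a peripheral helps during video playback.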
The Chrom-ART accelerator, with the features listed above, offers a huge advantage for graphical applications. If available in the chosen MCU, TouchGFX handles all Chrom-ART features and redirects all possible drawing operations to the Chrom-ART peripheral instead of the CPU.
The Chrom-ART peripheral is available in the high-performance STM32 families.
JPEG Hardware Codec
Some STM32 families provide a hardware JPEG codec to encode and decode images and videos.
This feature is important if the UI application needs to play a video file or display JPEG images.
JPEG images generally take up less memory than uncompressed bitmaps. The JPEG hardware codec ensures that the images can be decoded at runtime without overloading the CPU.
Some TouchGFX demos utilize the JPEG hardware codec, offloading the CPU while playing an MJPEG video.
Chrom-GRC
The STM32 Chrom-GRC™ (GFXMMU) is a peripheral in some STM32 microcontrollers that aims to efficiently support the emerging trend towards non-rectangular displays.
The Chrom-GRC™ peripheral enables applications to reduce the amount of RAM needed for storing the framebuffer when addressing non-rectangular displays.
In the case of a round display, the peripheral reduces the memory requirements by 20%.
The Chrom-GRC™ peripheral is not mandatory when controlling non-square screens, but it is recommended.
Chrom-GRC™ is also utilized when using an emulated framebuffer.
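The saving quoted above for a round display can be checked with a geometric estimate: only the pixels inside the circle need to be stored. The sketch below counts them for a hypothetical 390x390 round display. It is a pure geometry calculation with illustrative function names; it does not model how the peripheral lays out memory, and any per-line storage granularity in a real layout would reduce the saving somewhat.

```python
def round_display_pixels(diameter: int) -> int:
    """Count pixels whose center lies inside the inscribed circle."""
    radius = diameter / 2
    count = 0
    for y in range(diameter):
        dy = y + 0.5 - radius          # vertical distance of the pixel center
        for x in range(diameter):
            dx = x + 0.5 - radius      # horizontal distance of the pixel center
            if dx * dx + dy * dy <= radius * radius:
                count += 1
    return count


def ram_saving(diameter: int) -> float:
    """Fraction of a square framebuffer that a round display does not need."""
    return 1 - round_display_pixels(diameter) / (diameter * diameter)


print(f"Estimated saving: {ram_saving(390):.1%}")
```

The result approaches 1 - π/4, roughly 21%, which is in line with the figure of about 20% given in the text.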
Internal Flash
A graphical user interface application using bitmap resources needs non-volatile memory to store the data. Executing from and accessing internal flash can in some cases be several times faster than using external flash.
As the internal flash is limited in size, it is often used for storing the TouchGFX framework, screen definitions and UI logic, while the bitmap data is stored in external flash.
The STM32 products used for graphic applications offer from 0 Kbytes up to a few Mbytes of internal flash memory.
External memory may be required when the amount of bitmap data does not fit within internal flash.
TouchGFX flash memory requirement:
- Framework: 60 Kbytes to 100 Kbytes.
- Screen definition and GUI logic: 1 Kbyte to 100 Kbytes.
These numbers depend on the framework features used and the size and complexity of the application.
Internal RAM
Internal RAM can be used for storing the framebuffer(s) when their size fits within the available memory. Alternatively, external memory can be added to the setup.
The size of a framebuffer depends on the width, height and color depth. For example, for a display with HVGA resolution (480x320) and 16-bit colors, the memory needed for one framebuffer is:
Size of 1 framebuffer = 480 x 320 x 2 = 307,200 bytes
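The same calculation can be written as a small helper so that other resolutions and color depths are easy to compare. The function name is illustrative, and the 800x480 cases are additional examples that are not taken from the calculation above.

```python
def framebuffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size of one framebuffer in bytes, rounded up to a whole byte."""
    return (width * height * bits_per_pixel + 7) // 8


print(framebuffer_bytes(480, 320, 16))  # HVGA, 16-bit colors
print(framebuffer_bytes(800, 480, 16))  # larger display, 16-bit colors
print(framebuffer_bytes(800, 480, 24))  # larger display, 24-bit colors
```

Multiply the result by the number of framebuffers when the application uses more than one.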
The STM32 products used for graphic applications range from a few Kbytes to a few Mbytes of internal RAM.
TouchGFX RAM requirement:
- Framework: 10 Kbytes to 30 Kbytes
- Widgets: 1 Kbyte to 15 Kbytes
Memory requirements may vary from application to application.
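Combining the framebuffer size with the RAM figures above gives a first estimate of whether a design fits in internal RAM. The sketch below uses the upper ends of the quoted ranges as defaults. The function name and the 512 Kbyte RAM size are hypothetical, and the estimate ignores stack, heap and any other RAM used by the rest of the application.

```python
KBYTE = 1024


def ram_budget(internal_ram_bytes: int, width: int, height: int,
               bits_per_pixel: int, framebuffers: int = 1,
               framework_bytes: int = 30 * KBYTE,
               widgets_bytes: int = 15 * KBYTE):
    """Return (required_bytes, fits) for framebuffers plus TouchGFX RAM.

    Defaults are the upper ends of the ranges quoted in the text; other
    application RAM is deliberately left out of this rough estimate.
    """
    framebuffer = width * height * bits_per_pixel // 8
    required = framebuffers * framebuffer + framework_bytes + widgets_bytes
    return required, required <= internal_ram_bytes


# Hypothetical MCU with 512 Kbytes of internal RAM and an HVGA 16-bit display.
print(ram_budget(512 * KBYTE, 480, 320, 16, framebuffers=1))
print(ram_budget(512 * KBYTE, 480, 320, 16, framebuffers=2))
```

In this example a single framebuffer fits while two do not, which is the kind of result that points towards external RAM or a different framebuffer strategy.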
LCD Controller
The choice of the MCU also depends on the display interface that will be used and the resolution. The 800x480 resolution, for example, can only be achieved with an interface that is efficient in terms of data transfer speed. RGB-TFT and MIPI-DSI interfaces are often used for higher resolutions, as their bandwidth is in many cases higher than that of SPI or parallel 8080/6800. Small resolution displays often embed a controller and GRAM and so can be connected through simple SPI or 8080/6800 interfaces.
High resolution displays (WQVGA and above) often don’t embed a controller and GRAM, so the controller needs to be on the microcontroller side. On STM32 MCUs embedding RGB-TFT and MIPI-DSI interfaces, the controller is present.
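The link between resolution and interface speed can be estimated with a short calculation. The sketch below computes the pixel data rate needed to refresh a display and the best-case frame rate over a serial link. The function names, the 60 Hz refresh rate and the 50 MHz single-line serial clock are illustrative assumptions; command overhead and blanking are ignored.

```python
def required_bytes_per_second(width: int, height: int,
                              bits_per_pixel: int, fps: int) -> int:
    """Pixel data rate needed to redraw the whole display fps times a second."""
    return width * height * bits_per_pixel // 8 * fps


def max_fps_over_serial(width: int, height: int, bits_per_pixel: int,
                        clock_hz: int, data_lines: int = 1) -> float:
    """Best-case full-frame rate of a serial link (no protocol overhead)."""
    bits_per_frame = width * height * bits_per_pixel
    return clock_hz * data_lines / bits_per_frame


rate = required_bytes_per_second(800, 480, 16, fps=60)
fps = max_fps_over_serial(800, 480, 16, clock_hz=50_000_000)
print(f"800x480 at 60 Hz needs {rate / 1e6:.1f} Mbytes/s")
print(f"Single-line serial link at 50 MHz: about {fps:.1f} full frames/s")
```

Under these assumptions the serial link reaches only a few full frames per second at 800x480, which illustrates why higher resolutions are usually paired with higher-bandwidth interfaces.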

The picture shows 4 examples of different display interfaces with/without GRAM and display controller.
Packages & I/O
The number of I/Os needed is dependent on the chosen display and external memories. Running a parallel display with parallel RAM/flash can require a high number of I/Os resulting in a larger package.
Memory Interfacing
When the internal flash and RAM in the microcontroller are not sufficient, choosing the right MCU with the most suitable external memory interface becomes important. The STM32 products provide different memory controller peripherals to interface with NOR, NAND, SRAM, SDRAM, LPSDR SDRAM, and PSRAM memories.
Flexible Memory Controller & Flexible Static Memory Controller (FMC/FSMC)
The FSMC supports static memories; the FMC extends it with support for dynamic RAM (SDRAM). The flexible memory controller (FMC), with its high external access speed and up to 32-bit data bus, allows for higher throughput from and to external RAM and hence better support of higher resolutions. The FMC has an independent chip select for each memory bank. The FMC can control an external flash memory for the data and an external RAM memory for the framebuffer and heap extension for the graphical stack.
Serial Memory Interface
Depending on the STM32 product, a serial memory interface is embedded and allows interfacing with single, dual, quad, octo, and HyperBus™ flash memories alongside QSPI, PSRAM, OPI PSRAM, and Hyper RAM memories. The serial high speed memory interface can control up to 256 Mbytes in memory mapped mode and 4 Gbytes in indirect mode.
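The two size limits correspond to different address widths, which can be verified with a short calculation. The sketch assumes that Mbytes and Gbytes are meant as binary multiples (2^20 and 2^30 bytes); the function name is illustrative.

```python
MBYTE = 1024 * 1024
GBYTE = 1024 * MBYTE


def address_bits(size_bytes: int) -> int:
    """Number of address bits needed to reach every byte of a memory."""
    return (size_bytes - 1).bit_length()


print(address_bits(256 * MBYTE))  # memory mapped mode limit
print(address_bits(4 * GBYTE))    # indirect mode limit
```

Under that assumption the limits match a 28-bit and a 32-bit byte address range respectively.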
Compared to parallel interfaces, the serial memory interface permits the connection of a lower cost external flash memory to small packages and reduces the number of used pins.
However, the efficiency is usually lower with serial flash memory compared to parallel flash memory.
Cortex®-M Cores
STM32 MCUs come in different Arm® Cortex®-M architectures. Below are the most used cores for running graphics on STM32.
Cortex®-M0+
The Cortex®-M0+ is characterized by its simple architecture and low price. It is recommended for smaller static graphic applications, running at lower resolutions.
Cortex®-M4
The Cortex®-M4 offers more functionality than the M0+ and accelerates calculations. It includes a DSP instruction set and a single precision FPU. Together these increase the speed of calculations.
Cortex®-M7
The Cortex®-M7 has a more complex architecture. It also includes a DSP instruction set, and comes with a more efficient FPU with double precision and a level 1 cache memory with up to 16 KB for data and instructions. The cache memory makes it possible to keep data and instructions close to the calculation unit in order to optimize the fetch time.
Cortex®-M33
The Cortex®-M33 is a core with advanced security features. It includes TrustZone® technology, which allows the MCU to run secure and non-secure applications on the same core. It has a simpler architecture compared to the Cortex®-M7.
Cortex®-M55
The Cortex®-M55 is designed for AI and DSP applications and includes Helium technology for vector processing. The Cortex®-M55 also includes TrustZone® technology.
Feature overview
Feature | Cortex-M0+ | Cortex-M4 | Cortex-M7 | Cortex-M33 | Cortex-M55 |
---|---|---|---|---|---|
DMIPS/MHz range | 0.95-1.36 | 1.25-1.95 | 2.14-3.23 | 1.54 | 1.69 |
CoreMark®/MHz | 2.46 | 3.42 | 5.01 | 4.10 | 4.40 |
Digital Signal Processing (DSP) extension | No | Yes | Yes | Yes | Yes |
Floating Point Hardware | No | Yes (SP) | Yes (SP, DP) | Yes (SP) | Yes (SP, DP, HP) |
Built-in caches | No | No | Yes (option 4-64kB), I-Cache, D-Cache | No | Yes (option 4-64kB), I-Cache, D-Cache |
Bus Protocol | AHB Lite, Fast I/O | AHB Lite, APB | AXI4, AHB Lite, APB, TCM | AHB, AHB Lite, APB | AXI, AHB, AHB Lite, APB, TCM |
Dual Core Lock-Step Support | No | No | Yes | No | Yes |
For further reference, check the ARM Cortex-M Processor Comparison Table.
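The per-MHz figures in the table can be turned into a rough score at a given clock by multiplying by the core frequency. The sketch below does this with the CoreMark®/MHz row. The function name is illustrative, and the result is only a first-order estimate, since real scores also depend on memory speed, caches and compiler settings.

```python
# CoreMark/MHz values copied from the feature overview table.
COREMARK_PER_MHZ = {
    "Cortex-M0+": 2.46,
    "Cortex-M4": 3.42,
    "Cortex-M7": 5.01,
    "Cortex-M33": 4.10,
    "Cortex-M55": 4.40,
}


def estimated_coremark(core: str, frequency_mhz: float) -> float:
    """First-order CoreMark estimate: per-MHz score times clock frequency."""
    return COREMARK_PER_MHZ[core] * frequency_mhz


# Example frequencies: 480 MHz for the Cortex-M7 and 240 MHz for the Cortex-M4.
print(f"{estimated_coremark('Cortex-M7', 480):.1f}")
print(f"{estimated_coremark('Cortex-M4', 240):.1f}")
```

This kind of comparison is useful for shortlisting cores, not for predicting the frame rate of a specific graphical application.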
Level 1 cache
The STM32H7 and STM32F7 families include up to 16 Kbytes of L1-Cache both for instructions and data. An L1-Cache stores a set of data or instructions near the CPU, so the CPU does not have to keep fetching the same data that is repeatedly used.
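The benefit of keeping repeatedly used data near the CPU can be illustrated with a toy cache model. The sketch below simulates a direct-mapped cache and counts hits and misses while a small buffer is read several times. The direct-mapped organization, the 32-byte line size and the class name are simplifying assumptions for the illustration and do not describe the actual cache design of any STM32.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that only tracks which lines are present."""

    def __init__(self, size_bytes: int = 16 * 1024, line_bytes: int = 32):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.lines = [None] * self.num_lines  # memory line held in each slot
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> None:
        line_number = address // self.line_bytes  # memory line being accessed
        slot = line_number % self.num_lines       # cache slot it maps to
        if self.lines[slot] == line_number:
            self.hits += 1
        else:
            self.misses += 1                      # fetch from slower memory
            self.lines[slot] = line_number


cache = DirectMappedCache()
for _ in range(10):                     # read the same 4 Kbyte buffer 10 times
    for address in range(0, 4096, 4):   # one 32-bit word at a time
        cache.access(address)

print(cache.hits, cache.misses)
```

Only the first access to each line misses; every later access is served from the cache because the buffer fits entirely within it.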
Dual core
The STM32H7 series includes the dual-core line:
The Arm® Cortex®-M7 and Cortex®-M4 cores can run at up to 480 MHz and 240 MHz respectively, enabling more processing and application partitioning. Dual-core STM32H7 product lines are available with an embedded SMPS for improved dynamic power efficiency.
The second core, the Cortex®-M4, can take over heavy calculations, freeing the Cortex®-M7 core for drawing and graphics operations.
Bus architecture
The majority of STM32 microcontrollers provide a 32-bit multi-AHB bus matrix interconnecting all the masters (CPU, DMAs, etc.) and the slaves (flash memory, RAM, FSMC, AHB and APB peripherals). This ensures seamless and efficient operation even when several high-speed peripherals work simultaneously.
In addition to multi-AHB interconnect, some STM32 products embed 64-bit AXI to expand bandwidth. This yields the best compromise between performance and power consumption.
Price
The size of the internal flash, the size of the internal RAM, and the number of pins available in the package influence the price of the MCU. Considering the requirements for the interface, resolution, performance, etc., the user can ultimately find suitable MCUs and estimate the price.