The microcontroller unit (MCU) is at the core of any embedded solution, and there is a wide variety of options in terms of both cost and features.
When selecting an MCU for graphics, one should consider the supported display interfaces, the MCU package and size, and the achievable graphics performance. The graphics performance depends on two main points:
- Hardware accelerators and cache:
  - The availability of graphics accelerators integrated in the MCU.
  - The availability of cache memory in the system.
- Memory access and bandwidth:
  - The clock frequency and the subsystem bus frequency.
  - The access to the internal flash and RAM memories.
It is also important to consider the other tasks the application runs alongside the graphics (motor control, wireless, etc.), as these can influence the choice of MCU.
This page goes through the different MCU options and the parameters to consider when selecting an STM32 MCU for a GUI-driven application.
The core frequency has a major impact on the performance of a graphical application in terms of screen refresh, fluidity of screens and animations.
It impacts the amount of data that can be transferred from an internal or external memory to the display framebuffer, as well as the calculations and animations that are possible.
The higher the frequency, the more data can be transferred within a given timeframe, and the more complex the animations can be.
STM32 products offer core frequencies of up to 480 MHz.
It is important to differentiate the core CPU frequency from the graphic subsystem frequency. The graphic subsystem frequency covers the frequency of the internal buses, the frequency of the graphics accelerator, and the access speed of the internal and external memories.
The graphic subsystem frequency also has a major impact on the overall graphic performance.
The following example assesses the theoretical core and subsystem performance when running from internal RAM on an STM32H7:
- The CPU core runs at 480 MHz.
- The 64-bit AXI bus runs at 240 MHz.
- The LCD-TFT display controller (LTDC) uses the 64-bit AXI bus and performs 8 transfers in 10 cycles.
- The internal RAM adds no significant latency, i.e. 0 wait states.
The bandwidth of the internal RAM when accessed by the LTDC peripheral is then:
- Bandwidth = 240 MHz x 8/10 x 8 bytes = 1536 Mbytes/s (about 1.5 Gbytes/s).
One 800x480 frame at 32bpp color depth occupies 800 x 480 x 4 = 1,536,000 bytes, so with such bandwidth the internal RAM can ensure 1000 frames per second (fps) at this resolution. Typically one would limit the transfer to the display (by adjusting pixel clock, porches, ...) to 60 frames per second, so the bandwidth of the LTDC and internal RAM is not a bottleneck.
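The bandwidth and frame-rate calculation above can be reproduced in a few lines of C. This is a minimal sketch under the same assumptions as the example (0 wait states, 8 transfers per 10 cycles on a 64-bit bus); the helper names `bus_bandwidth_bytes_per_s`, `frame_bytes` and `max_fps` are hypothetical and not part of any STM32 or TouchGFX API.

```c
#include <stdint.h>

/* Theoretical read bandwidth in bytes/s for a bus master that completes
 * `transfers` transfers of `width_bytes` bytes every `cycles` bus cycles.
 * Assumes 0 wait states, as for the internal RAM in the example. */
uint64_t bus_bandwidth_bytes_per_s(uint64_t bus_hz, uint32_t transfers,
                                   uint32_t cycles, uint32_t width_bytes)
{
    return bus_hz * transfers / cycles * width_bytes;
}

/* Bytes occupied by one frame at the given color depth (bits per pixel). */
uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    return (uint64_t)width * height * bpp / 8u;
}

/* Highest frame rate the bandwidth could feed, ignoring all other bus traffic. */
uint64_t max_fps(uint64_t bytes_per_s, uint64_t bytes_per_frame)
{
    return bytes_per_s / bytes_per_frame;
}
```

With the STM32H7 figures, `max_fps(bus_bandwidth_bytes_per_s(240000000, 8, 10, 8), frame_bytes(800, 480, 32))` gives 1000, matching the result above. Real throughput will typically be lower once other bus masters compete for the same memory.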
Different STM32 MCUs have different built-in hardware acceleration features that help achieve high-performance graphics applications.
Chrom-ART, also known as DMA2D, is an advanced DMA dedicated to graphical operations.
The Chrom-ART accelerator, integrated in many STM32 platforms, is able to manipulate and transfer images without CPU load. It has the capability to accelerate the majority of the graphic operations, such as color filling, image copying, blending, and pixel format conversions.
The Chrom-ART accelerator is able to perform blending of two layers and convert the initial pixel formats to the desired output pixel format and transfer the result to the memory destination in only one operation.
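To make the "one operation" point concrete, the C sketch below shows in software what such a blend-and-convert pass computes per pixel: an ARGB8888 foreground layer blended over an RGB565 background layer, with RGB565 output. It only illustrates the arithmetic; it is not the DMA2D register interface or the TouchGFX implementation, the hardware's exact rounding may differ, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand an RGB565 pixel to 8-bit channels, replicating the high bits
 * into the low bits so that full intensity maps to 255. */
static void rgb565_to_rgb888(uint16_t p, uint8_t *r, uint8_t *g, uint8_t *b)
{
    uint8_t r5 = (uint8_t)((p >> 11) & 0x1Fu);
    uint8_t g6 = (uint8_t)((p >> 5) & 0x3Fu);
    uint8_t b5 = (uint8_t)(p & 0x1Fu);
    *r = (uint8_t)((r5 << 3) | (r5 >> 2));
    *g = (uint8_t)((g6 << 2) | (g6 >> 4));
    *b = (uint8_t)((b5 << 3) | (b5 >> 2));
}

/* Pack 8-bit channels into RGB565 by dropping the low bits. */
static uint16_t rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Straight alpha blend of one channel: (a*fg + (255-a)*bg) / 255. */
static uint8_t blend_channel(uint32_t a, uint32_t fg, uint32_t bg)
{
    return (uint8_t)((a * fg + (255u - a) * bg) / 255u);
}

/* Blend an ARGB8888 foreground over an RGB565 background and write RGB565:
 * blending and pixel format conversion done in a single pass. */
void blend_argb8888_over_rgb565(const uint32_t *fg, const uint16_t *bg,
                                uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t a = fg[i] >> 24;          /* foreground alpha, 0..255 */
        uint8_t br, bgc, bb;
        rgb565_to_rgb888(bg[i], &br, &bgc, &bb);
        dst[i] = rgb888_to_rgb565(
            blend_channel(a, (fg[i] >> 16) & 0xFFu, br),
            blend_channel(a, (fg[i] >> 8) & 0xFFu, bgc),
            blend_channel(a, fg[i] & 0xFFu, bb));
    }
}
```

Done on the CPU, this loop runs for every pixel of every blended area, which is the load that Chrom-ART removes.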
The Chrom-ART accelerator also supports color formats with color look-up tables (CLUT). These store an index per pixel instead of a full color value, which saves memory.
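The memory saving from a CLUT format can be sketched as follows, assuming an 8-bit indexed (L8) image with a 256-entry ARGB8888 table. The helper names are hypothetical; the expansion loop shows the lookup that the accelerator performs in hardware when converting to a direct-color format.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand an 8-bit indexed image through a 256-entry ARGB8888 CLUT. */
void clut_expand_l8(const uint8_t *indices, const uint32_t clut[256],
                    uint32_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++)
        out[i] = clut[indices[i]];   /* each pixel is just a table index */
}

/* Storage for an L8 image: one byte per pixel plus the 1 Kbyte table. */
size_t l8_storage_bytes(uint32_t width, uint32_t height)
{
    return (size_t)width * height + 256u * sizeof(uint32_t);
}

/* Storage for the same image stored directly as ARGB8888. */
size_t argb8888_storage_bytes(uint32_t width, uint32_t height)
{
    return (size_t)width * height * 4u;
}
```

For a 100x100 image, the L8 version with its table needs 11,024 bytes, versus 40,000 bytes as ARGB8888. The table overhead is fixed, so the relative saving grows with image size.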
For example, in an application running on the STM32F496-EVAL board, the CPU load decreased from 82% to 4% when Chrom-ART was enabled.
On STM32H7 products, the Chrom-ART peripheral can additionally convert from YCbCr format to RGB format. This feature, combined with the JPEG hardware codec, offloads the CPU when encoding and decoding JPEG images.
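For reference, the sketch below shows the per-pixel YCbCr-to-RGB calculation that this feature takes off the CPU. It uses the standard JFIF (full-range ITU-R BT.601) equations. Whether the hardware uses exactly these coefficients and this rounding is an assumption, and `ycbcr_to_rgb` is a hypothetical name.

```c
#include <stdint.h>

/* Round to nearest and clamp a channel value to 0..255. */
static uint8_t clamp_u8(float v)
{
    if (v < 0.0f)
        return 0u;
    if (v > 255.0f)
        return 255u;
    return (uint8_t)(v + 0.5f);
}

/* JFIF (full-range BT.601) YCbCr -> RGB conversion for one pixel. */
void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                  uint8_t *r, uint8_t *g, uint8_t *b)
{
    float fy = (float)y;
    float fcb = (float)cb - 128.0f;   /* chroma is stored with a +128 offset */
    float fcr = (float)cr - 128.0f;
    *r = clamp_u8(fy + 1.402f * fcr);
    *g = clamp_u8(fy - 0.344136f * fcb - 0.714136f * fcr);
    *b = clamp_u8(fy + 1.772f * fcb);
}
```

A decoded 800x480 video frame needs this conversion 384,000 times, once per pixel, which is why doing it in the accelerator matters for MJPEG playback.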
The Chrom-ART accelerator, with the features listed above, offers a huge advantage for graphical applications. If available in the chosen MCU, TouchGFX handles all Chrom-ART features and redirects all possible drawing operations to the Chrom-ART peripheral instead of the CPU.
The Chrom-ART peripheral is available on high-performance STM32 families.
The STM32H7 and STM32F7 families provide a hardware JPEG codec to encode and decode images and videos.
This feature is important if the UI application needs to play a video file or display JPEG images.
JPEG images generally take up less memory than uncompressed bitmaps. The JPEG hardware codec ensures that the images can be decoded at runtime without overloading the CPU.
Some TouchGFX demos utilize the JPEG hardware codec, offloading the CPU while playing an MJPEG video.
The STM32 Chrom-GRC™ (GFXMMU) is a peripheral in some STM32 microcontrollers that aims to efficiently support the emerging trend towards non-rectangular displays.
The Chrom-GRC™ peripheral enables applications to reduce the amount of RAM needed for storing the framebuffer when addressing non-rectangular displays.
For a round display, the peripheral reduces the memory requirements by about 20%.
The Chrom-GRC™ peripheral is not mandatory when controlling non-rectangular screens, but it is recommended.
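The figure of roughly 20% follows from geometry: a circle inscribed in a square covers π/4, about 78.5%, of the square's area. The C sketch below counts the visible pixels of a D x D round display. It assumes an ideal packing in which only pixels inside the circle are stored; the actual saving with Chrom-GRC™ depends on how the peripheral allocates memory per line. The function names are hypothetical.

```c
#include <stdint.h>

/* Count pixels of a d x d square whose centres lie inside the inscribed
 * circle, i.e. the pixels a round display actually shows. */
uint64_t round_display_pixels(uint32_t d)
{
    double r = d / 2.0;
    uint64_t visible = 0;
    for (uint32_t y = 0; y < d; y++) {
        for (uint32_t x = 0; x < d; x++) {
            double dx = x + 0.5 - r;   /* pixel centre relative to circle centre */
            double dy = y + 0.5 - r;
            if (dx * dx + dy * dy <= r * r)
                visible++;
        }
    }
    return visible;
}

/* Percentage of a square d x d framebuffer saved by storing only visible pixels. */
double round_display_saving_percent(uint32_t d)
{
    double square = (double)d * d;
    return 100.0 * (square - (double)round_display_pixels(d)) / square;
}
```

For D = 480, `round_display_saving_percent` returns a little over 21%, consistent with the approximate 20% figure once allocation granularity is taken into account.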
A graphical user interface application using bitmap resources needs non-volatile memory to store the data. Executing code from, and accessing data in, internal flash is in some cases up to twice as fast as with external flash.
As the internal flash is limited in size, it is often used for storing the TouchGFX framework, screen definitions and UI logic, while the bitmap data is stored in external flash.
STM32 products used for graphics applications offer from a few Kbytes to a few Mbytes of internal flash memory.
External memory may be required when the amount of bitmap data does not fit within internal flash.
TouchGFX flash memory requirements:
- Framework: 60 Kbytes to 100 Kbytes.
- Screen definitions and GUI logic: 1 Kbyte to 100 Kbytes.
These numbers depend on the framework features used and the size and complexity of the application.
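Whether bitmaps have to move to external flash can be estimated by adding up these figures. The sketch below is a hypothetical budgeting helper. It assumes, as described above, that the framework, screen logic and non-graphics code stay in internal flash and only bitmap data is moved out; all inputs are estimates supplied by the user.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flash budget, all sizes in bytes. */
typedef struct {
    uint32_t framework;   /* TouchGFX framework (60-100 Kbytes above) */
    uint32_t screens;     /* screen definitions and GUI logic */
    uint32_t bitmaps;     /* total size of all bitmap resources */
    uint32_t other_app;   /* non-graphics code: motor control, wireless, ... */
} flash_budget_t;

/* True if everything, bitmaps included, fits in internal flash. */
bool fits_internal_flash(const flash_budget_t *b, uint32_t internal_flash)
{
    uint64_t total = (uint64_t)b->framework + b->screens + b->bitmaps + b->other_app;
    return total <= internal_flash;
}

/* Bitmap bytes that must go to external flash when all code stays internal. */
uint32_t bitmaps_needing_external(const flash_budget_t *b, uint32_t internal_flash)
{
    uint64_t code = (uint64_t)b->framework + b->screens + b->other_app;
    uint64_t room = code < internal_flash ? internal_flash - code : 0u;
    return b->bitmaps > room ? (uint32_t)(b->bitmaps - room) : 0u;
}
```

For example, with 100 Kbytes of framework, 50 Kbytes of screen logic, 100 Kbytes of other application code and 2 Mbytes of bitmaps on a device with 1 Mbyte of internal flash, `bitmaps_needing_external` reports 1,304,576 bytes of bitmap data for external flash.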
Internal RAM can be used for storing the framebuffer(s) when they fit within the available memory. Alternatively, external RAM can be added to the setup.
The size of a framebuffer depends on the width, height and color depth. For example, for a display with HVGA resolution (480x320) and 16-bit colors, the memory needed for one framebuffer is:
Size of 1 framebuffer = 480 x 320 x 2 = 307,200 bytes
STM32 products used for graphics applications offer from a few Kbytes to a few Mbytes of internal RAM.
TouchGFX RAM requirements:
- Framework: 10 Kbytes to 30 Kbytes.
- Widgets: 1 Kbyte to 15 Kbytes.
Memory requirements may vary from application to application.
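The framebuffer calculation and the RAM figures above can be combined into a rough feasibility check. This is a minimal sketch with hypothetical function names. It assumes all framebuffers are placed in internal RAM and ignores alignment and any allocation not passed in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Size of one framebuffer: width x height x bits-per-pixel / 8. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    return width * height * bpp / 8u;
}

/* Rough check that the framebuffer(s) plus TouchGFX and application RAM fit
 * in internal RAM. All sizes in bytes; the inputs are estimates. */
bool fits_internal_ram(uint32_t internal_ram, uint32_t fb_bytes,
                       uint32_t fb_count, uint32_t framework,
                       uint32_t widgets, uint32_t other_app)
{
    uint64_t need = (uint64_t)fb_bytes * fb_count + framework + widgets + other_app;
    return need <= internal_ram;
}
```

With 1 Mbyte of internal RAM, two 480x320 16-bit framebuffers (614,400 bytes) plus the upper-bound TouchGFX figures and 100,000 bytes for the rest of the application fit, while two 800x480 32-bit framebuffers (3,072,000 bytes) do not, which is when external RAM becomes necessary.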
The choice of MCU also depends on the display interface that will be used and on the resolution. A resolution of 800x480, for example, can only be achieved with an interface that offers a high data transfer rate. RGB-TFT and MIPI-DSI interfaces are often used for higher resolutions, as their bandwidth is in many cases higher than that of SPI or parallel 8080/6800 interfaces. Small-resolution displays often embed a controller and GRAM, and can therefore be connected through simple SPI or 8080/6800 interfaces.
High-resolution displays (WQVGA and above) often do not embed a controller and GRAM, so the display controller must be on the microcontroller side. STM32 MCUs with RGB-TFT and MIPI DSI interfaces include this controller.
The picture shows 4 examples of different display interfaces with/without GRAM and display controller.
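The link between resolution and interface bandwidth can be estimated from the raw pixel data rate. The sketch below ignores blanking, porches and command overhead, so it underestimates the real requirement. For displays with their own GRAM, it represents the worst case of redrawing the full screen every frame. The function names and the 50 Mbit/s SPI figure used in the example are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Raw pixel data rate in bits/s needed to refresh a full screen `fps`
 * times per second, ignoring blanking and protocol overhead. */
uint64_t display_bits_per_second(uint32_t width, uint32_t height,
                                 uint32_t bpp, uint32_t fps)
{
    return (uint64_t)width * height * bpp * fps;
}

/* Whether a link carrying `link_bits_per_second` can keep up. */
bool interface_can_sustain(uint64_t link_bits_per_second, uint32_t width,
                           uint32_t height, uint32_t bpp, uint32_t fps)
{
    return link_bits_per_second >= display_bits_per_second(width, height, bpp, fps);
}
```

An 800x480 16-bit display refreshed at 60 fps needs 368,640,000 bits/s, far more than an assumed 50 Mbit/s SPI link can carry, whereas a 320x240 16-bit display updated at 30 fps needs 36,864,000 bits/s and fits.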
The number of I/Os needed depends on the chosen display and external memories. Running a parallel display together with parallel RAM/flash can require a high number of I/Os, resulting in a larger package.
When the internal flash and RAM of the microcontroller are not sufficient, choosing an MCU with the most suitable external memory interface becomes important. STM32 products provide different memory controller peripherals to interface with NOR, NAND, SRAM, SDRAM, LPSDR SDRAM and PSRAM memories.
The flexible memory controller (FMC) extends the flexible static memory controller (FSMC): in addition to static RAM, it supports dynamic RAM (SDRAM). With its high external access speed and its 8-, 16- and especially 32-bit data bus, the FMC allows higher throughput to and from external RAM, and hence better support for higher resolutions. The FMC has an independent chip select for each memory bank, so it can control an external flash memory for the data and an external RAM memory for the framebuffer and the heap extension of the graphical stack.
Depending on the STM32 product, a serial memory interface is embedded that can connect single, dual, quad, octo and HyperBus flash memories, as well as QSPI, PSRAM, OPI PSRAM and HyperRAM memories. This high-speed serial memory interface can address up to 256 Mbytes in memory-mapped mode and up to 4 Gbytes in indirect mode.
Compared to parallel interfaces, the serial memory interface allows a lower-cost external flash memory to be connected to MCUs in small packages, and reduces the number of pins used.
For price optimization, the STM32H7 and STM32F7 platforms offer value line products with a limited amount of internal flash. With these products, the graphic resources are stored in external flash.
STM32 MCUs come in different Arm® Cortex®-M architectures. Below are the cores most used for running graphics on STM32.
The Cortex®-M0+ is characterized by its simple architecture and low price. It is recommended for smaller static graphic applications, running at lower resolutions.
The Cortex®-M4 offers more functionality than the M0+ and accelerates calculations. It includes a DSP instruction set and a single-precision FPU, which speed up signal-processing and floating-point calculations.
The Cortex®-M7 has a more complex architecture. It also includes a DSP instruction set, comes with a more efficient FPU with double-precision support, and has a level-1 cache of up to 16 Kbytes each for data and instructions on STM32 products. The cache keeps data and instructions close to the calculation unit in order to reduce the fetch time.
| Feature | Cortex®-M0+ | Cortex®-M4 | Cortex®-M7 |
|---|---|---|---|
| Digital Signal Processing (DSP) extension | No | Yes | Yes |
| Floating point hardware | No | Yes (SP) | Yes (SP + DP) |
| Built-in caches | No | No | Yes (optional, 4-64 KB): I-cache, D-cache |
| Bus protocol | AHB Lite, Fast I/O | AHB Lite, APB | AXI4, AHB Lite, APB, TCM |
| Dual-core lock-step support | No | No | Yes |
The STM32H7 and STM32F7 families include up to 16 Kbytes of L1 cache each for instructions and data. An L1 cache stores a set of data or instructions near the CPU, so the CPU does not have to fetch repeatedly used data or instructions again from slower memory.
The STM32H7 series includes dual-core lines, in which the Arm® Cortex®-M7 and Cortex®-M4 cores run at up to 480 MHz and 240 MHz respectively, enabling more processing power and application partitioning. Dual-core STM32H7 product lines are available with an embedded SMPS for improved dynamic power efficiency.
The Cortex®-M4 core can take over heavy calculations, freeing the M7 core for drawing/graphic operations.
The majority of STM32 microcontrollers provide a 32-bit multi-AHB bus matrix interconnecting all the masters (CPU, DMAs, etc.) and the slaves (flash memory, RAM, FSMC, AHB and APB peripherals). This ensures seamless and efficient operation even when several high-speed peripherals work simultaneously.
In addition to the multi-AHB interconnect, some STM32 products (Cortex®-M7) embed a 64-bit AXI bus to expand bandwidth. This provides a good compromise between performance and power consumption.
The size of the internal flash, the size of the internal RAM and the number of pins in the package all influence the price of the MCU. By weighing the requirements for display interface, resolution, performance, etc., the user can narrow down suitable MCUs and estimate their price.