Languages and Characters

TouchGFX enables internationalized and localized applications.

TouchGFX does this by supporting a wide range of languages and characters and by understanding text layout mechanisms, such as writing direction and contextual shaping.

Languages

TouchGFX supports languages whose characters lie in the Unicode Basic Multilingual Plane, with the restriction that only Left-to-Right and Right-to-Left writing systems are supported. Languages such as Arabic, Chinese, English and many more are therefore supported, in some cases with a few limitations. Urdu and Burmese are examples of languages that are currently not supported.

Characters

The encoding of characters is based on the Unicode standard. Only 16-bit Unicode code points are supported, i.e. code points from 0x0000 to 0xFFFF. Some languages (e.g. Devanagari) may use the Private Use Area range 0xE000-0xE3FF for special characters in a given font.
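As a rough illustration of these ranges, the following standalone C++ sketch checks whether a code point fits the 16-bit range and whether it falls in the 0xE000-0xE3FF range mentioned above. The function names are made up for this example and are not part of the TouchGFX API.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not TouchGFX functions.

// True if the code point fits in 16 bits (0x0000 to 0xFFFF).
constexpr bool fitsIn16Bits(std::uint32_t codePoint)
{
    return codePoint <= 0xFFFFu;
}

// True if the code point lies in the Private Use Area range 0xE000-0xE3FF
// that some fonts may use for special characters.
constexpr bool isInFontPrivateUseRange(std::uint32_t codePoint)
{
    return codePoint >= 0xE000u && codePoint <= 0xE3FFu;
}

static_assert(fitsIn16Bits(0x0041), "Latin capital letter A fits in 16 bits");
static_assert(!fitsIn16Bits(0x1F600), "code points above 0xFFFF do not fit");
```

A check like this could be run over the text resources of an application to flag characters, such as most emoji, that lie outside the supported range.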

Writing Direction

TouchGFX supports Left-to-Right (LTR) and Right-to-Left (RTL) writing systems. There is no built-in support for Top-to-Bottom writing systems.

Note that RTL does not mean that text is written backwards compared to LTR. It means that words are laid out starting from the right and proceeding towards the left, which is the correct setting for Arabic and Hebrew. "TouchGFX" will not be written "XFGhcuoT", but the direction of words (or collections of words) can be controlled using the RTL/LTR setting.

The handling of LTR and RTL writing inside TouchGFX applications respects mixing of the two to some degree. This is known as bidirectional script support. A subset of the official rules for bidirectional writing is supported by TouchGFX. This means that for example "10:45", "3.14159", "STMicroelectronics TouchGFX" and others are recognized and written fully LTR even in an RTL text.

For RTL text, some parts of the text must thus be written LTR. TouchGFX finds these parts by collecting all characters that are not RTL letters. Characters such as colon (:), dot (.), comma (,) and space ( ) also "tie together" two consecutive LTR parts. This ensures that "10:45" is handled as a single LTR entity, whereas in "Mark:" (ending in a colon) the colon is placed to the left, as readers of Arabic and Hebrew would expect, i.e. "<some Arabic message> :Mark", where the colon is on the left side in the RTL text.

Please note that numbers used in the Latin character set (0-9), as well as numbers used in the Arabic character set, are all handled as LTR characters to make sure that numbers show up properly on the display.
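The "tie together" rule can be sketched in a few lines of C++. This is a simplified model written for this page, not the TouchGFX implementation: the names `classify` and `belongsToLtrRun` are hypothetical, only the Hebrew and Arabic blocks are treated as RTL, and it is an assumption that a sequence of several tie characters behaves like a single one.

```cpp
#include <cstddef>

enum class CharClass { Ltr, Rtl, Tie };

// Rough classification covering only the cases discussed in the text.
constexpr CharClass classify(char16_t c)
{
    return (c == u':' || c == u'.' || c == u',' || c == u' ') ? CharClass::Tie
         // Arabic-Indic and Extended Arabic-Indic digits are handled as LTR.
         : ((c >= 0x0660 && c <= 0x0669) || (c >= 0x06F0 && c <= 0x06F9)) ? CharClass::Ltr
         // Hebrew (0x0590-0x05FF) and Arabic (0x0600-0x06FF) blocks.
         : (c >= 0x0590 && c <= 0x06FF) ? CharClass::Rtl
         // Everything else (Latin letters, 0-9, ...) counts as LTR here.
         : CharClass::Ltr;
}

// True if text[index] would be written as part of an LTR entity.
// Precondition: index < length.
constexpr bool belongsToLtrRun(const char16_t* text, std::size_t length, std::size_t index)
{
    if (classify(text[index]) == CharClass::Ltr) return true;
    if (classify(text[index]) == CharClass::Rtl) return false;

    // Tie character: walk to the start of the tie sequence and require an
    // LTR character immediately before it.
    std::size_t left = index;
    while (left > 0 && classify(text[left - 1]) == CharClass::Tie) --left;
    if (left == 0 || classify(text[left - 1]) != CharClass::Ltr) return false;

    // Likewise require an LTR character immediately after the sequence.
    std::size_t right = index + 1;
    while (right < length && classify(text[right]) == CharClass::Tie) ++right;
    return right < length && classify(text[right]) == CharClass::Ltr;
}

// The colon in "10:45" joins two LTR parts; the colon ending "Mark:" does not.
static_assert(belongsToLtrRun(u"10:45", 5, 2), "colon between digits is tied");
static_assert(!belongsToLtrRun(u"Mark:", 5, 4), "trailing colon is not tied");
```

In this model a tie character that is not flanked by LTR parts on both sides follows the surrounding RTL text, which is why the colon in "Mark:" ends up on the left.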

The writing direction is particularly important when a text contains a mix of LTR and RTL entities, and it cannot be determined by examining the characters in the text. If a text contains first a Hebrew word (RTL) and then an English word (LTR), the output on the display depends on the writing direction of the text. If the text is set to be RTL, the output looks something like "English werbeH", because the entire text is RTL and the first word must be written at the far right. If the text is set to be LTR, the output looks something like "werbeH English", because the text starts with the first word at the far left. The RTL versus LTR setting cannot be determined automatically because an English text may contain Hebrew words, just as a Hebrew text may contain English words.
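The effect of the text direction on a mixed text can be modelled with a small C++ sketch. It is a toy model for this example only: the `Word` type and `visualOrder` function are hypothetical, words are assumed to be separated by single spaces, and the Latin string "Hebrew" stands in for an actual Hebrew word so that the reversal is visible.

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum class Direction { Ltr, Rtl };

struct Word {
    std::string text;     // characters in logical (typing) order
    Direction direction;  // direction of the word's script
};

// Returns the words as they would appear on the display, read left to right.
std::string visualOrder(std::vector<Word> words, Direction textDirection)
{
    // An RTL text places its first word at the far right.
    if (textDirection == Direction::Rtl) {
        std::reverse(words.begin(), words.end());
    }

    std::string result;
    for (const Word& word : words) {
        if (!result.empty()) {
            result += ' ';
        }
        std::string shown = word.text;
        // The letters of an RTL word run from right to left on the display.
        if (word.direction == Direction::Rtl) {
            std::reverse(shown.begin(), shown.end());
        }
        result += shown;
    }
    return result;
}
```

Calling `visualOrder` with the words `{"Hebrew", Direction::Rtl}` and `{"English", Direction::Ltr}` gives "English werbeH" for an RTL text and "werbeH English" for an LTR text. The same input yields two different layouts, so the direction has to be supplied with the text.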

Another important point regarding RTL text is the automatic swapping of paired bracket characters: (, ), {, }, [, ], <, >. Each of these is automatically swapped with its opposite character to ensure that the text looks correct. Please note that there is no automatic conversion from Latin numbers to Arabic numbers. If this is desired, the user must do it before displaying the text.
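The swapping of bracket characters amounts to a simple lookup. The following C++ sketch shows the mapping for the eight characters listed above; `mirrorForRtl` is a hypothetical name chosen for this example, and TouchGFX performs the swap internally without the application calling such a function.

```cpp
// Hypothetical helper for illustration only; not a TouchGFX function.
// Returns the opposite bracket for the characters that are swapped in
// RTL text, and the character itself otherwise.
constexpr char16_t mirrorForRtl(char16_t c)
{
    switch (c) {
    case u'(': return u')';
    case u')': return u'(';
    case u'{': return u'}';
    case u'}': return u'{';
    case u'[': return u']';
    case u']': return u'[';
    case u'<': return u'>';
    case u'>': return u'<';
    default:   return c;  // all other characters are left unchanged
    }
}

static_assert(mirrorForRtl(u'(') == u')', "opening parenthesis is swapped");
static_assert(mirrorForRtl(u'A') == u'A', "letters are left unchanged");
```

Applying the mapping twice returns the original character, so a pair of brackets still encloses the same content after the swap.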

Contextual Shaping

Certain scripts will select a different form of one or more characters/glyphs depending on the context of the character. As an example the Arabic alphabet has different contextual forms for the letters in the alphabet, depending on the position of the letter inside the word. TouchGFX supports such contextual shaping of languages by implementing a simplified set of rules for combining characters. Also, some diacritics are placed using custom logic to determine the vertical position - this is particularly true for Arabic, Thai and Devanagari.
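To make the idea of contextual forms concrete, the C++ sketch below selects the form of a single Arabic letter from its neighbours. It is a simplified model, not the rule set TouchGFX implements: the names `Form`, `selectForm` and `behPresentationForm` are hypothetical, and the example covers only the letter BEH (U+0628), whose isolated, final, initial and medial presentation forms are U+FE8F to U+FE92 in Unicode.

```cpp
// The four contextual forms, in the order used by the Unicode
// Arabic Presentation Forms-B block for a dual-joining letter.
enum class Form { Isolated = 0, Final = 1, Initial = 2, Medial = 3 };

// Chooses the form from whether the letter connects to the previous
// letter and to the next letter in the word.
constexpr Form selectForm(bool joinsToPrevious, bool joinsToNext)
{
    return joinsToPrevious ? (joinsToNext ? Form::Medial : Form::Final)
                           : (joinsToNext ? Form::Initial : Form::Isolated);
}

// Presentation form of BEH (U+0628): U+FE8F isolated, U+FE90 final,
// U+FE91 initial, U+FE92 medial.
constexpr char16_t behPresentationForm(Form form)
{
    return static_cast<char16_t>(0xFE8F + static_cast<int>(form));
}

static_assert(behPresentationForm(selectForm(false, false)) == 0xFE8F,
              "a letter on its own uses the isolated form");
```

A full shaper also needs per-letter joining behaviour (some Arabic letters never connect to the following letter) and ligature handling, which this sketch leaves out.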

List of Supported Languages

It is difficult to provide an exhaustive list of all supported languages. In general, standard glyphs without special re-ordering or positioning rules are supported. Support for some languages with special rules, such as Hindi (Devanagari) and Arabic, has also been included in TouchGFX.

Left-to-Right Languages

Simple languages using Latin characters

In general, simple languages using characters and glyphs that do not require special re-ordering or positioning are supported. These languages include, but are not limited to:

  • Bosnian, Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Hungarian, Italian, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Serbian, Slovenian, Slovak, Spanish, Swedish, Turkish, Ukrainian

Simple languages using special character sets

Some languages still follow simple positioning rules, but use a different set of characters and glyphs. These are also supported and include, but are not limited to:

  • Chinese, Greek, Japanese, Russian

Other

  • Thai has limited support. Tone marks are positioned (vertically) using TouchGFX rules.
  • Hindi (Devanagari) has limited support. Some characters may be placed slightly incorrectly, but the text should remain readable.

Right-to-Left Languages

Simple languages using special character sets

  • Hebrew, Indonesian, Kazakh

Languages with different ligatures for different forms (isolated, initial, middle, final)

  • Arabic (Ligatures that replace a sequence of more than four characters are not recognized, so these sequences are not converted to a single ligature. These are: Sallallahou Alayhe Wasallam, Jallajalalouhou and Rial Sign). Some diacritics may be placed slightly incorrectly.
  • Farsi
  • Malay (ݢ "Keheh with dot above" only supported in isolated form)

Unsupported Languages

The following languages are known to be unsupported because they rely on extensive use of ligatures, digraphs and vertical positioning:

  • Urdu, Burmese