Important Notice: Our web hosting provider recently started charging us for additional visits, which was unexpected. In response, we're seeking donations. Depending on the situation, we may explore different monetization options for our Community and Expert Contributors. It's crucial to provide more returns for their expertise and offer more Expert Validated Answers or AI Validated Answers. Learn more about our hosting issue here.

Doesn’t it cause a problem to have only UTF-16 string APIs, instead of UTF-32 char APIs?

April 26, 2017APIs cause char doesn’t problem String utf-16 utf-32

0

Posted

Doesn’t it cause a problem to have only UTF-16 string APIs, instead of UTF-32 char APIs?

1 Answer

0

Posted

Almost all international functions (upper-, lower-, titlecasing, case folding, drawing, measuring, collation, transliteration, grapheme-, word-, linebreaks, etc.) should take string parameters in the API, not single code-points (UTF-32). Single code-point APIs almost always produce the wrong results except for very simple languages, either because you need more context to get the right answer, or because you need to generate a sequence of characters to return the right answer, or both. For example, any Unicode-compliant collation (See Unicode Technical Standard #10: Unicode Collation Algogrithm (UCA)) must be able to handle sequences of more than one code-point, and treat that sequence as a single entity. Trying to collate by handling single code-points at a time, would get the wrong answer. The same will happen for drawing or measuring text a single code-point at a time; because scripts like Arabic are contextual, the width of x plus the width of y is not equal to the width of xy. Onc