
Why are some people opposed to UTF-16?

April 26, 2017. Tags: people, UTF, utf-16


2 Answers


East Asians (Chinese, Japanese, and Koreans) are understandably nervous about UTF-16, which sometimes requires two code units to represent a single character. They are well acquainted with the problems that variable-width codes (such as SJIS) have caused. However, there are some important differences between the mechanisms:

• Overlap
  • In SJIS, there is overlap between the high unit values and the low unit values, and between the low unit values and the single unit values. This causes a number of problems:
    • It causes false matches. For example, searching for an “a” may match against the second unit of a Japanese character.
    • It prevents efficient random access. To know whether you are on a character boundary, you have to search backwards to find a known boundary.
    • It makes the text extremely fragile. If a unit is dropped from a high-low pair, many following characters can be corrupted.
  • In UTF-16, high, low, and single units are all completely disjoint, so the value of a unit alone tells you which role it plays.
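To make the overlap point concrete, here is a minimal Python sketch. It is an illustration only: the helper names (`utf16_units`, `classify`, `is_boundary`) are made up for this example, and the sample characters (the katakana ヂ and the ideograph U+20000) are just convenient choices. The boundary check assumes well-formed UTF-16 text.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints (hypothetical helper)."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))

def classify(unit):
    """Classify one UTF-16 code unit purely by its numeric range."""
    if 0xD800 <= unit <= 0xDBFF:
        return "high"    # first unit of a two-unit character
    if 0xDC00 <= unit <= 0xDFFF:
        return "low"     # second unit of a two-unit character
    return "single"      # a complete character on its own

def is_boundary(units, i):
    """In well-formed text, index i starts a character unless it holds a low unit.
    Only units[i] is inspected; no backward scan is needed."""
    return classify(units[i]) != "low"

# Shift-JIS: the second byte of a two-byte character can equal an ASCII letter,
# so a byte-level search for "a" reports a match inside the character.
sjis = "ヂ".encode("shift_jis")
false_match = b"a" in sjis

# UTF-16: the three ranges never overlap, so the same confusion cannot occur.
units = utf16_units("a\U00020000b")   # U+20000 lies outside the BMP
kinds = [classify(u) for u in units]
```

Here `false_match` comes out true for the Shift-JIS bytes, while `kinds` labels each UTF-16 unit unambiguously as single, high, low, single.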


People familiar with variable-width East Asian character sets such as Shift-JIS (SJIS) are understandably nervous about UTF-16, which sometimes requires two code units to represent a single character. They are well acquainted with the problems that variable-width codes have caused.
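For readers wondering when UTF-16 needs two code units, the arithmetic can be sketched as follows. The function name `to_utf16_units` is hypothetical, and the sketch assumes its input is a valid Unicode scalar value (not itself in the surrogate range).

```python
def to_utf16_units(cp):
    """Encode one code point as a list of UTF-16 code units (illustrative sketch).

    Assumes cp is a valid scalar value, i.e. not in 0xD800..0xDFFF.
    """
    if cp < 0x10000:
        return [cp]                   # BMP character: one code unit
    v = cp - 0x10000                  # 20-bit offset above the BMP
    high = 0xD800 + (v >> 10)         # top 10 bits go into the high unit
    low = 0xDC00 + (v & 0x3FF)        # bottom 10 bits go into the low unit
    return [high, low]

bmp = to_utf16_units(0x65E5)          # 日, inside the BMP
supplementary = to_utf16_units(0x20000)
```

Only code points at or above U+10000 take the two-unit path, which is why most everyday CJK text still uses one unit per character.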