Why are some people opposed to UTF-16?
East Asians (Chinese, Japanese, and Koreans) are understandably nervous about UTF-16, which sometimes requires two code units to represent a single character. They are well acquainted with the problems that variable-width codes (such as SJIS) have caused. However, there are some important differences between the mechanisms:

• Overlap
  • In SJIS, there is overlap between the high unit values and the low unit values, and between the low unit values and the single unit values. This causes a number of problems:
    • It causes false matches. For example, searching for an “a” may match against the second unit of a Japanese character.
    • It prevents efficient random access. To know whether you are on a character boundary, you have to search backwards to find a known boundary.
    • It makes the text extremely fragile. If a unit is dropped from a high-low pair, many following characters can be corrupted.
  • In UTF-16, high, low, and single units are all completely disjoint.
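The difference in overlap can be seen in a small Python sketch. It is an illustration only: the helper names (unit_kind, to_units, char_start) are hypothetical and not part of any standard API, and the sketch assumes well-formed UTF-16 input. It relies on the UTF-16 surrogate ranges (high units D800..DBFF, low units DC00..DFFF) and on Python's built-in "shift_jis" codec.

```python
# Illustrative sketch; the helper names below are hypothetical.

def unit_kind(unit: int) -> str:
    """Classify a 16-bit UTF-16 code unit by its value alone."""
    if 0xD800 <= unit <= 0xDBFF:
        return "high"    # first unit of a two-unit character
    if 0xDC00 <= unit <= 0xDFFF:
        return "low"     # second unit of a two-unit character
    return "single"      # a complete character on its own

def to_units(text: str) -> list:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def char_start(units: list, i: int) -> int:
    """Index of the first unit of the character containing units[i].

    Because the ranges are disjoint, at most one step back is needed.
    """
    if i > 0 and unit_kind(units[i]) == "low" and unit_kind(units[i - 1]) == "high":
        return i - 1
    return i

# SJIS: the bytes 0x83 0x61 encode one katakana character, yet the
# second byte has the same value as ASCII "a", so a byte search matches.
sjis = bytes([0x83, 0x61])
assert len(sjis.decode("shift_jis")) == 1
sjis_false_match = b"a" in sjis           # True: a false match

# UTF-16: "a", U+1F600 (needs two units), "b".
units = to_units("a\U0001F600b")          # [0x0061, 0xD83D, 0xDE00, 0x0062]
kinds = [unit_kind(u) for u in units]     # ['single', 'high', 'low', 'single']
```

Here char_start(units, 2) returns 1, found by looking back a single unit, and a search for the unit 0x0061 can only ever hit a real “a”, since no high or low unit can take that value.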