Important Notice: Our web hosting provider recently started charging us for additional visits, which was unexpected. In response, we're seeking donations. Depending on the situation, we may explore different monetization options for our Community and Expert Contributors. It's crucial to provide more returns for their expertise and offer more Expert Validated Answers or AI Validated Answers. Learn more about our hosting issue here.

What speaks against encoding a distinct character? It would make it easier for software to recognize the digraph, and there would seem to be enough space in the Unicode Standard?

April 26, 2017character digraph distinct easier encoding recognize Software speaks

0

Posted

What speaks against encoding a distinct character? It would make it easier for software to recognize the digraph, and there would seem to be enough space in the Unicode Standard?

1 Answer

0

Posted

While it may seem that there is a lot of available space in the Unicode Standard, there are a number of issues. First, while the upper- and lowercase versions of a single digraph like “xy” only constitute a couple of characters, there are many languages in which digraphs may be treated specially. Second, each addition to the standard requires updates to the data tables and to all implementations and fonts that support the digraph. Third, there is the problem that people will not represent data consistently; some will use the new digraph character and some will not—you can count on that. Fourth, existing data will not magically update itself to make use of the new digraph. Because of these considerations and others, there will be situations in which it will be necessary to represent data using the decomposed form anyway—as for example when passing around normalized data on the Internet. In summary, the addition of a new digraph character has a fairly substantial (and costly) set of cons