Doesn't canonical equivalence mean that no Unicode-conformant process can treat canonically equivalent sequences differently in any way?

April 26, 2017


1 Answer


No. That is too strong a statement about canonical equivalence. Let’s take a look at a simple example: <00C1> A-acute and the sequence <0041 0301> A + combining acute are canonically equivalent sequences. However, that doesn’t mean that “no Unicode-conformant process should treat them differently in any way.”

A Unicode-conformant process could declare that it does not interpret combining marks, in which case, for it, <0041 0301> is a sequence of <0041> plus an uninterpreted character. And trivially, a Unicode-conformant process allocating a buffer for character storage clearly has to treat <00C1> and <0041 0301> differently, since the amount of storage required differs.

What canonical equivalence is supposed to mean is that if a Unicode-conformant process interprets all the code points involved in the canonical equivalence, it should not insist that an interpretive difference between the two sequences constitutes a difference in character meaning. Thus what is non-conformant would be for Pr
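The A-acute example in the answer can be checked with a short Python sketch. It uses the standard library's `unicodedata.normalize`; the helper name `canonically_equivalent` is made up for illustration, and comparing NFD forms is one common way to test canonical equivalence, not something the answer itself prescribes.

```python
import unicodedata

# The two sequences from the answer.
precomposed = "\u00C1"       # <00C1> LATIN CAPITAL LETTER A WITH ACUTE
decomposed = "\u0041\u0301"  # <0041 0301> A followed by COMBINING ACUTE ACCENT


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare the canonical decompositions (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# A raw code point comparison sees two different strings.
print(precomposed == decomposed)

# After normalization they compare equal, so they are canonically equivalent.
print(canonically_equivalent(precomposed, decomposed))

# Storage still differs, which is the buffer allocation point in the answer.
print(len(precomposed), len(decomposed))  # code points
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # bytes
```

The raw comparison and the storage sizes differ between the two sequences while the normalized comparison treats them as the same text, which matches the answer's distinction between handling sequences differently and assigning them different meanings.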