
My script does not sort right because the characters were assigned to Unicode code points in the wrong order. What can I do about that?

April 26, 2017




1 Answer



There is a misunderstanding here: linguistically meaningful sorting is not done by comparing code point values. That approach fails even for English, because every uppercase Latin letter (U+0041 to U+005A) precedes every lowercase one (U+0061 to U+007A), so "Zebra" would sort before "apple". Instead, collation assigns multi-level weights to characters or sequences of characters (typically the base letter first, then accents, then case) and compares those weights level by level, so a lower level only breaks ties left by the higher ones. There are many algorithms and implementations for this. The standard Unicode Collation Algorithm (UCA, Unicode Technical Standard #10) comes with a default weight table covering all assigned characters, plus a tailoring mechanism that describes how that table can be modified to conform to local conventions where necessary.
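To make the idea of multi-level weights concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It is a toy, not the UCA and not its default weight table: the primary level is the base letter with accents and case removed, the secondary level is the accents, and the tertiary level is case. The function name `toy_collation_key`, the word list and the `SWEDISH_LIKE` tailoring (which treats å, ä, ö as separate letters after z) are all hypothetical illustrations. In practice you would use an existing UCA implementation such as ICU rather than hand-rolled weights.

```python
import unicodedata


def toy_collation_key(s, tailoring=None):
    """Build a simplified three-level sort key for the string s.

    Toy illustration of multi-level weights, NOT the Unicode Collation
    Algorithm and NOT its default weight table:
      level 1 (primary):   base letters, ignoring accents and case
      level 2 (secondary): accents (combining marks)
      level 3 (tertiary):  case (lowercase before uppercase)
    `tailoring` is a hypothetical mapping from a lowercase character to a
    replacement primary weight, loosely mimicking how a locale tailoring
    can move specific letters to a different position.
    """
    tailoring = tailoring or {}
    primary, secondary, tertiary = [], [], []
    # NFC first, so "e" + U+0301 and precomposed "é" yield the same key.
    for ch in unicodedata.normalize("NFC", s):
        if ch.lower() in tailoring:
            # Tailored letter: gets its own primary weight, no accent weight.
            primary.append(tailoring[ch.lower()])
            secondary.append("")
        else:
            # Split e.g. "é" into base "e" plus combining U+0301.
            decomposed = unicodedata.normalize("NFD", ch)
            base = "".join(c for c in decomposed if not unicodedata.combining(c))
            marks = "".join(c for c in decomposed if unicodedata.combining(c))
            primary.append(base.lower())
            secondary.append(marks)
        tertiary.append(1 if ch.isupper() else 0)
    # Tuples compare level by level: a later level only breaks ties
    # left by all earlier levels.
    return (primary, secondary, tertiary)


# Hypothetical sample words (escapes keep the accented letters precomposed).
WORDS = ["Zebra", "apple", "\u00c4pfel", "zoo", "r\u00e9sum\u00e9", "resume", "Resume"]

# Hypothetical Swedish-like tailoring: å, ä, ö sort as separate letters after z.
SWEDISH_LIKE = {"\u00e5": "z1", "\u00e4": "z2", "\u00f6": "z3"}

if __name__ == "__main__":
    print(sorted(WORDS))                         # plain code point order
    print(sorted(WORDS, key=toy_collation_key))  # toy multi-level order
    print(sorted(WORDS, key=lambda w: toy_collation_key(w, SWEDISH_LIKE)))
```

Plain `sorted(WORDS)` compares code points, giving Resume, Zebra, apple, resume, résumé, zoo, Äpfel. The toy key gives Äpfel, apple, resume, Resume, résumé, Zebra, zoo: the accent on "résumé" (secondary) outweighs the capital R in "Resume" (tertiary). With the Swedish-like tailoring, "Äpfel" moves after "zoo" without changing any code point assignments, which is the point of tailoring.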