
What would be the worst possible case for normalizing to NFC?


The worst performance for most implementations would be (in the 1M-character example I’ve been using) something like: a base character followed by a list of 999,999 characters with CCC != 0, skewed towards non-BMP characters, sorted by CCC in reverse order. CCC is the Unicode Canonical Combining Class property. For good measure, you could throw in one of the cases where NFC is not the shortest form, as above. The reason is that most implementations sort the combining marks with a bubble sort, which has very good behavior for short sequences and rather bad behavior for long ones. If an implementation really wanted to protect against the worst case, it would take the minor cost of a test for long sequences and switch to a different sorting algorithm for that case.


Experts123