What would be the worst possible case for normalizing to NFC?
The worst performance for most implementations would be (in the 1M character example I’ve been using) something like this: a base character followed by a list of 999,999 characters with CCC != 0, skewed towards non-BMP characters and sorted by CCC in reverse order. CCC is the Unicode Canonical_Combining_Class property. For good measure, you could throw in one of the cases where NFC is not the shortest form, as above. The reason for this is that most implementations sort the combining marks with a simple bubble sort, which performs very well on short sequences but degrades to quadratic time on long ones. If an implementation really wanted to protect against this worst case, it could take the minor cost of a test for long sequences and switch to a different sorting algorithm for that case.
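To make the point concrete, here is a minimal sketch (not any particular library's implementation) of the canonical ordering step that NFC/NFD implementations perform: runs of characters with CCC != 0 are put in ascending CCC order with a stable bubble sort, and a hypothetical threshold (`LONG_RUN_THRESHOLD` below is an illustrative name, not a standard constant) switches long runs to a stable O(n log n) sort to avoid the quadratic worst case described above.

```python
import unicodedata

LONG_RUN_THRESHOLD = 32  # hypothetical cutoff for switching algorithms


def canonical_order(text: str) -> str:
    """Reorder combining marks by CCC, keeping the original order of equal classes."""
    chars = list(text)
    i = 0
    while i < len(chars):
        # Skip starters (CCC == 0); only runs of combining marks get reordered.
        if unicodedata.combining(chars[i]) == 0:
            i += 1
            continue
        # Find the maximal run of characters with CCC != 0.
        j = i
        while j < len(chars) and unicodedata.combining(chars[j]) != 0:
            j += 1
        run = chars[i:j]
        if len(run) > LONG_RUN_THRESHOLD:
            # Long runs: a stable O(n log n) sort sidesteps the quadratic blow-up.
            run.sort(key=unicodedata.combining)
        else:
            # Short runs: the simple stable bubble sort most implementations use.
            for end in range(len(run) - 1, 0, -1):
                for k in range(end):
                    if unicodedata.combining(run[k]) > unicodedata.combining(run[k + 1]):
                        run[k], run[k + 1] = run[k + 1], run[k]
        chars[i:j] = run
        i = j
    return "".join(chars)


# Worst-case-style input: a base character followed by marks sorted by CCC in reverse.
marks = sorted("\u0316\u0300\u0301\u0327", key=unicodedata.combining, reverse=True)
s = "q" + "".join(marks)
# None of these characters decompose, so NFD reduces to canonical ordering here.
print(canonical_order(s) == unicodedata.normalize("NFD", s))  # True
```

The demo compares against NFD rather than NFC only to isolate the ordering step; for this particular input the two forms coincide, since nothing composes with the base character.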