
Do Unicode blocks exactly match the blocks defined in ISO/IEC 10646?

April 26, 2017
Tags: blocks, defined, IEC, ISO, Unicode


1 Answer


For the most part they do, but there are several principled exceptions.

First, the Unicode blocks for Basic Latin and the Latin-1 Supplement are extended to incorporate the control characters, since the Unicode Standard prints out all the code points for the control characters, as well as the graphic characters.

Unicode: 0000..007F; Basic Latin
10646: 0020-007E BASIC LATIN

Unicode: 0080..00FF; Latin-1 Supplement
10646: 00A0-00FF LATIN-1 SUPPLEMENT

There is a similar distinction for the special cases of the Byte Order Mark at U+FEFF and the two noncharacters at the very end of the BMP.

Unicode: FE70..FEFF; Arabic Presentation Forms-B
10646: FE70-FEFE ARABIC PRESENTATION FORMS-B

Unicode: FFF0..FFFF; Specials
10646: FFF0-FFFD SPECIALS

Second, for Hangul syllables, 10646 defines a block that ends at the last encoded Hangul syllable, but the Unicode rules for block definitions require ending a block at an even 16-character boundary:

Unicode: AC00..D7AF; Hangul Syllables
10646: AC00-D7A3 HANGU
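
The range differences quoted above can be tallied with a short Python sketch. This is only an illustration: the table `BLOCKS` and the helpers `extra_code_points` and `round_up_to_16` are hypothetical names introduced here, and the ranges are copied from this answer rather than read from either standard's data files.

```python
# Ranges as quoted in the answer: name -> (Unicode range, ISO/IEC 10646 range).
# Hypothetical table for illustration; not parsed from any official data file.
BLOCKS = {
    "Basic Latin": ((0x0000, 0x007F), (0x0020, 0x007E)),
    "Latin-1 Supplement": ((0x0080, 0x00FF), (0x00A0, 0x00FF)),
    "Arabic Presentation Forms-B": ((0xFE70, 0xFEFF), (0xFE70, 0xFEFE)),
    "Specials": ((0xFFF0, 0xFFFF), (0xFFF0, 0xFFFD)),
    "Hangul Syllables": ((0xAC00, 0xD7AF), (0xAC00, 0xD7A3)),
}


def extra_code_points(unicode_range, iso_range):
    """Code points inside the Unicode block but outside the 10646 block."""
    u_start, u_end = unicode_range
    i_start, i_end = iso_range
    return [cp for cp in range(u_start, u_end + 1)
            if not (i_start <= cp <= i_end)]


def round_up_to_16(last_code_point):
    """Last code point of the 16-code-point column containing the argument."""
    return last_code_point | 0xF


if __name__ == "__main__":
    for name, (uni, iso) in BLOCKS.items():
        extras = extra_code_points(uni, iso)
        print(f"{name}: {len(extras)} extra, "
              f"{extras[0]:04X}..{extras[-1]:04X}")
    # The last encoded Hangul syllable, rounded up to a 16-boundary.
    print(f"{round_up_to_16(0xD7A3):04X}")
```

Rounding the last encoded Hangul syllable, D7A3, up to the end of its column of 16 gives D7AF, which matches the Unicode block end quoted above.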