Do Unicode blocks exactly match the blocks defined in ISO/IEC 10646?
For the most part they do, but there are several principled exceptions.

First, the Unicode blocks for Basic Latin and the Latin-1 Supplement are extended to include the control characters. This is because the Unicode Standard's code charts show all the code points for the control characters as well as for the graphic characters:

Unicode: 0000..007F; Basic Latin
10646:   0020-007E BASIC LATIN

Unicode: 0080..00FF; Latin-1 Supplement
10646:   00A0-00FF LATIN-1 SUPPLEMENT

A similar distinction applies to two special cases. The first is the Byte Order Mark at U+FEFF. The second is the two noncharacters at the very end of the BMP (U+FFFE and U+FFFF):

Unicode: FE70..FEFF; Arabic Presentation Forms-B
10646:   FE70-FEFE ARABIC PRESENTATION FORMS-B

Unicode: FFF0..FFFF; Specials
10646:   FFF0-FFFD SPECIALS

Second, for Hangul syllables, 10646 defines a block that ends at the last encoded Hangul syllable. The Unicode rules for block definitions, however, require a block to end on a 16-code-point boundary, that is, at a code point whose last hexadecimal digit is F:

Unicode: AC00..D7AF; Hangul Syllables
10646:   AC00-D7A3 HANGU