I understand that all Unicode characters are 16 bits, and that the high byte is used to switch between code blocks. Is that correct?
Absolutely not! Unicode characters may be encoded at any code point from U+0000 to U+10FFFF. The size of the code unit used for expressing those code points may be 8 bits (for UTF-8), 16 bits (for UTF-16), or 32 bits (for UTF-32) [See UTF & BOM]. Even when Unicode characters are expressed with 16-bit code units, there is no concept of a high byte switching values between “code pages” expressed in the low byte. For any code point up to U+FFFF, the entire 16-bit value expresses the entire character. Code points above U+FFFF are expressed in UTF-16 as a pair of 16-bit code units (a surrogate pair), and there too neither unit acts as a code-page switch for the other.
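To make the code-unit sizes concrete, here is a minimal Python sketch that counts how many code units a single character needs in each encoding form. The helper name `code_unit_counts` and the three sample characters are illustrative choices, not part of any standard API; only Python's built-in `str.encode` and `ord` are used.

```python
def code_unit_counts(ch: str) -> dict:
    """Count the code units one character needs in each encoding form.

    Hypothetical helper for illustration only.
    """
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8": len(ch.encode("utf-8")),            # 8-bit code units
        "utf16": len(ch.encode("utf-16-be")) // 2,  # 16-bit code units
        "utf32": len(ch.encode("utf-32-be")) // 4,  # 32-bit code units
    }


# Sample characters (arbitrary picks): an ASCII letter, a Gurmukhi letter,
# and a character above U+FFFF.
for ch in ("A", "\u0A15", "\U0001F600"):
    print(code_unit_counts(ch))

# The single UTF-16 code unit for U+0A15 is just the number 0x0A15 as a
# whole; its upper 8 bits are not a separate "page selector".
unit = int.from_bytes("\u0A15".encode("utf-16-be"), "big")
print(hex(unit))
```

With these samples, the Gurmukhi letter U+0A15 takes three UTF-8 code units but only one UTF-16 code unit, while U+1F600 takes two UTF-16 code units and one UTF-32 code unit.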
Related Questions
- Why can I not type characters used in older Gurmukhi (for example, the SGGS) using Unicode?
- How many bits are used to represent Unicode, ASCII, UTF-16, and UTF-8 characters?