I understand that all Unicode characters are 16 bits, and that the high byte is used to switch between code blocks. Is that correct?

April 26, 2017

1 Answer

Absolutely not! Unicode characters are assigned code points anywhere in the range U+0000 to U+10FFFF. Those code points can be expressed with 8-bit code units (UTF-8), 16-bit code units (UTF-16), or 32-bit code units (UTF-32) [See UTF & BOM]. Even when characters are expressed with 16-bit code units, there is no concept of a high byte switching between “code pages” expressed in the low byte. A character from U+0000 to U+FFFF is expressed by a single 16-bit value taken as a whole, and a character above U+FFFF is expressed by a surrogate pair of two 16-bit code units. In neither case does one byte select a block for the other.
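To make the difference between code points and code units concrete, here is a minimal Python sketch. It relies only on Python's built-in `str.encode` codecs; the helper names `code_units` and `surrogate_pair` are hypothetical, chosen for illustration. Note that U+4E2D has the 16-bit value 0x4E2D as one unit, not "block 0x4E, character 0x2D", and that U+1F600 needs two 16-bit units (a surrogate pair) but only one 32-bit unit.

```python
def code_units(ch: str) -> dict:
    """Return one character's code point and its code units in UTF-8, UTF-16 and UTF-32."""
    cp = ord(ch)
    utf8 = list(ch.encode("utf-8"))          # 8-bit code units (1 to 4 per character)
    raw16 = ch.encode("utf-16-be")           # big-endian, no BOM
    utf16 = [int.from_bytes(raw16[i:i + 2], "big") for i in range(0, len(raw16), 2)]
    raw32 = ch.encode("utf-32-be")           # big-endian, no BOM
    utf32 = [int.from_bytes(raw32[i:i + 4], "big") for i in range(0, len(raw32), 4)]
    return {"code_point": f"U+{cp:04X}", "utf8": utf8, "utf16": utf16, "utf32": utf32}


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Compute the UTF-16 surrogate pair for a supplementary code point by hand."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF use a surrogate pair")
    v = cp - 0x10000                         # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)                # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)               # bottom 10 bits -> low surrogate
    return high, low


if __name__ == "__main__":
    for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
        info = code_units(ch)
        print(info["code_point"],
              "UTF-8:", " ".join(f"{b:02X}" for b in info["utf8"]),
              "UTF-16:", " ".join(f"{u:04X}" for u in info["utf16"]),
              "UTF-32:", " ".join(f"{u:08X}" for u in info["utf32"]))
```

The manual `surrogate_pair` calculation is included only to show that a pair is derived arithmetically from the whole code point; in practice the codec does this for you.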