What are code points, code units, supplementary characters, and all this other stuff?

April 26, 2017 | Tags: characters, code points, stuff, supplementary, units

1 Answer

A coded character set is a character set (a collection of characters) in which each character has been assigned a unique number. At the core of the Unicode standard is a coded character set that assigns the letter “A” the number 0041₁₆ and the character “€” (the symbol for the euro currency) the number 20AC₁₆. The Unicode standard always writes these numbers in hexadecimal with the prefix “U+”, so the number for “A” is written as “U+0041”.

Code points are the numbers that can be used in a coded character set. A coded character set defines a range of valid code points but doesn’t necessarily assign characters to all of them. The valid code points for Unicode are U+0000 to U+10FFFF. Of these more than one million code points, Unicode 4.0 assigns characters to 96,382.

Supplementary characters are characters with code points in the range U+10000 to U+10FFFF, that is, characters that could not be represented in the original 16-bit design of Unicode. The set of characters from