What is the definition of UTF-8?
A. UTF-8 is the byte-oriented encoding form of Unicode. For details of its definition, see Section 2.5 “Encoding Forms” and Section 3.9 “Unicode Encoding Forms” in the Unicode Standard. See, in particular, Table 3-6 UTF-8 Bit Distribution and Table 3-7 Well-formed UTF-8 Byte Sequences, which give succinct summaries of the encoding form. Also see sample code that implements conversions between UTF-8 and other encoding forms. Make sure you refer to the latest version of the Unicode Standard, as the Unicode Technical Committee has tightened the definition of UTF-8 over time to more strictly enforce unique sequences and to prohibit encoding of certain invalid characters. UTF-8 is also specified in Internet RFC 3629 and defined in Annex D of ISO/IEC 10646.
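As a rough illustration of the bit distribution and well-formedness rules mentioned above, here is a minimal Python sketch. It is not the sample code referred to in the answer, and the helper names `encode_utf8` and `is_well_formed` are hypothetical, chosen only for this example. The well-formedness check simply relies on Python's built-in strict UTF-8 decoder, which rejects non-shortest ("overlong") forms and encoded surrogates.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode scalar value as UTF-8 (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are not scalar values and must not be encoded.
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp <= 0x7F:
        # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 2 bytes: 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 3 bytes: 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def is_well_formed(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8, using Python's strict decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(encode_utf8(0x20AC).hex())         # U+20AC EURO SIGN -> e282ac
    print(is_well_formed(b"\xC0\x80"))       # non-shortest form of U+0000 -> False
    print(is_well_formed(b"\xED\xA0\x80"))   # encoded surrogate U+D800 -> False
```

The two rejected inputs correspond to the tightening described in the answer: each code point has exactly one valid byte sequence, and surrogate code points may not be encoded at all.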