What is a UTF?

April 26, 2017 | UTF

2 Answers

A Unicode transformation format (UTF) is an algorithmic mapping from every Unicode scalar value to a unique byte sequence. (The SCSU compression method is not a UTF because the same string can map to many different byte sequences, depending on the capabilities of the compressor.) Since every Unicode coded character sequence maps to a unique sequence of bytes in a given UTF, a reverse mapping can be derived. Thus every UTF supports lossless round tripping: mapping any Unicode coded character sequence S to a sequence of bytes and back produces S again. To ensure round tripping, a UTF mapping must also map the 16-bit values that are not valid Unicode values to unique byte sequences. These invalid 16-bit values are FFFE, FFFF, and unpaired surrogates.
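
The round-trip property described above can be tried out with a minimal sketch. It assumes that Python's built-in codecs are acceptable stand-ins for the UTFs; the names `UTFS`, `round_trips`, and `sample` are made up for this illustration and do not come from the answer.

```python
# Python's built-in codecs stand in for the UTFs (an assumption of this sketch).
# Explicit byte orders are used so that no byte order mark is added or consumed.
UTFS = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")


def round_trips(text: str) -> bool:
    """Return True if encoding then decoding gives back `text` in every UTF."""
    return all(text.encode(utf).decode(utf) == text for utf in UTFS)


# U+0041, U+00E9, U+20AC, U+1F600: one scalar value for each UTF-8 length (1 to 4 bytes).
sample = "A\u00e9\u20ac\U0001F600"

for utf in UTFS:
    data = sample.encode(utf)
    # The same string yields one fixed byte sequence per UTF.
    print(f"{utf:10} {len(data):2} bytes  {data.hex()}")

print(round_trips(sample))
```

Each UTF gives a different byte sequence for the same string, but within one UTF the sequence is fixed, which is what makes the reverse mapping possible.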

A Unicode transformation format (UTF) is an algorithmic mapping from every Unicode code point (except surrogate code points) to a unique byte sequence. The ISO/IEC 10646 standard uses the term “UCS transformation format” for UTF; the two terms are synonyms. Each UTF is reversible, so every UTF supports lossless round tripping: mapping any Unicode coded character sequence S to a sequence of bytes and back produces S again. To ensure round tripping, a UTF mapping must also map all code points that are not valid Unicode characters to unique byte sequences. These invalid code points are the 66 noncharacters (including U+FFFE and U+FFFF), as well as unpaired surrogates. The SCSU compression method, even though it is reversible, is not a UTF because the same string can map to many different byte sequences, depending on the particular SCSU compressor.
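
As a rough check of the claim about noncharacters, the sketch below lists the 66 noncharacters and round-trips each one. The ranges used (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes) come from the Unicode Standard, not from the answer itself, and Python's built-in codecs are again only an assumed stand-in for a UTF implementation. The helper names are hypothetical. Unpaired surrogates are left out because Python's strict codecs refuse to encode them.

```python
# Python's built-in codecs stand in for the UTFs (an assumption of this sketch).
UTFS = ("utf-8", "utf-16-le", "utf-32-le")


def noncharacters() -> list:
    """Return the code points of the 66 Unicode noncharacters."""
    cps = list(range(0xFDD0, 0xFDF0))          # U+FDD0..U+FDEF: 32 code points
    for plane in range(17):                    # planes 0 through 16
        cps.append(plane * 0x10000 + 0xFFFE)   # U+nFFFE
        cps.append(plane * 0x10000 + 0xFFFF)   # U+nFFFF
    return cps


def survives_round_trip(cp: int) -> bool:
    """Encode and decode one code point in every UTF and compare with the input."""
    ch = chr(cp)
    return all(ch.encode(utf).decode(utf) == ch for utf in UTFS)


nonchars = noncharacters()
print(len(nonchars))
print(all(survives_round_trip(cp) for cp in nonchars))
```

The point of the sketch is only that noncharacters are ordinary inputs to the mapping: each has its own byte sequence and decodes back to itself.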