How can a captioner tell if they have UTF-16, UTF-8, or ANSI ASCII?

April 26, 2017
Tags: ANSI ASCII, captioner, utf-16, utf-8

1 Answer

At a glance, this UTF-8 scheme cannot be told apart from 8-bit encodings such as ANSI ASCII (e.g. Latin-1), in which every character is 8 bits and every character beyond 127 has the high bit set. So the captioner somehow has to know whether the characters in a document are expressed in UTF-8, ANSI ASCII, or some other scheme. Unfortunately, there is no surefire way to tell, but here's what to look for:

• Check the accented characters. Are they rendered correctly? If they are missing or wrong, you probably have UTF-8 when ANSI ASCII was expected.
• If you see smiley faces and goofy, blotchy characters, you probably have UTF-8 when ANSI ASCII was expected.
• If all you see of an entire document is one or two characters, the odds are you have UTF-16 when UTF-8 or ANSI ASCII was expected. The reason is that UTF-16 expresses every character with two bytes, one of which is often a null (zero) in Western languages, and the null is often considered the end of a document by UTF-8 and ANSI ASCII
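
For anyone who would rather check the bytes than eyeball the rendering, the clues above can be turned into a rough script. This is only a sketch in Python and not part of the original answer: the function name `guess_encoding`, the return labels, and the 25% null-byte threshold are all assumptions made for illustration, and like the visual checks it is a heuristic, not a guarantee.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Return a best guess: 'utf-16', 'utf-8', 'ascii', or '8-bit'.

    Hypothetical helper for illustration; the labels and the
    null-byte threshold below are assumptions, not a standard.
    """
    # A byte order mark (BOM) at the start is the strongest hint, when present.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"

    # UTF-16 text in Western languages has a null in roughly every other byte.
    # Assumed threshold: at least a quarter of the bytes are null.
    nulls = data.count(0)
    if nulls and nulls * 4 >= len(data):
        return "utf-16"

    # No byte above 127: plain ASCII, which reads the same as UTF-8 or ANSI.
    if all(b < 128 for b in data):
        return "ascii"

    # High-bit bytes that form valid UTF-8 sequences are probably UTF-8;
    # otherwise assume some 8-bit scheme such as Latin-1.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "8-bit"


if __name__ == "__main__":
    for sample in ("café".encode("utf-8"),
                   "café".encode("latin-1"),
                   "café".encode("utf-16")):
        print(sample, "->", guess_encoding(sample))
```

The order of the checks matters: the null-byte test runs before the ASCII test because UTF-16 text made of plain English letters contains no byte above 127. Short 8-bit text can also happen to be valid UTF-8, so a "utf-8" result is a likelihood, not a certainty.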