How can a captioner tell if they have UTF-16, UTF-8, or ANSI ASCII?
At the byte level, this UTF-8 scheme is indistinguishable from 8-bit encodings such as ANSI ASCII (e.g. Latin1), in which every character is 8 bits and every character beyond 127 has the high bit set. So the captioner somehow has to know whether the characters in a document are expressed in UTF-8, ANSI ASCII, or some other scheme. Unfortunately, there is no surefire way to tell, but here's what to look for:

• Check the accented characters. Are they rendered correctly? If they are missing or wrong, you probably have UTF-8 when ANSI ASCII was expected.
• If you see smiley faces and goofy, blotchy characters, you probably have UTF-8 when ANSI ASCII was expected.
• If all you see of an entire document is one or two characters, the odds are you have UTF-16 when UTF-8 or ANSI ASCII was expected. The reason is that UTF-16 expresses most characters with two bytes, and for Western languages one of those bytes is usually a null (zero). Software expecting UTF-8 or ANSI ASCII often treats that null as the end of the document.
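
The visual checks in the list above can be roughly automated by looking at the raw bytes of the file. What follows is a minimal Python sketch under stated assumptions, not a definitive detector: the function name guess_encoding and the "more than half" null-byte threshold are invented for illustration, and Latin1 is used as the last-resort answer only because any byte sequence decodes under it.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Return a best guess at the encoding of raw caption-file bytes."""
    # A byte order mark at the start of the file is the strongest hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # Without a mark, look for the nulls that UTF-16 puts in Western text.
    # The "more than half" threshold is an assumption, not a standard.
    if len(data) >= 2:
        pairs = len(data) // 2
        even_nulls = data[0::2].count(0)  # null comes first: big-endian
        odd_nulls = data[1::2].count(0)   # null comes second: little-endian
        if odd_nulls * 2 > pairs:
            return "utf-16-le"
        if even_nulls * 2 > pairs:
            return "utf-16-be"

    # Pure 7-bit text reads the same in ASCII, UTF-8, and Latin1.
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass

    # High-bit bytes that form valid UTF-8 sequences are rarely accidental.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Fallback guess only: Latin1 accepts every possible byte value.
        return "latin-1"
```

For example, guess_encoding(open("captions.txt", "rb").read()) returns a codec name that can be passed to bytes.decode (the file name here is hypothetical). The result is still a guess, so it is worth checking the accented characters by eye afterwards.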