What does Unicode conformance require?
Chapter 3 discusses this in detail. Here’s a very informal version:
1) Unicode code units are 16 bits long; deal with it.
2) Byte order is only an issue in files.
3) If you don’t have a clue, assume big-endian.
4) Loose surrogates don’t mean jack.
5) Neither do U+FFFE and U+FFFF.
6) Leave the unassigned codepoints alone.
7) It’s OK to be ignorant about a character, but not plain wrong.
8) Subsets are strictly up to you.
9) Canonical equivalence matters.
10) Don’t garble what you don’t understand.
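A few of these rules are easy to see in code. The Python sketch below is only an illustration, not anything the standard prescribes: the helper names `decode_utf16`, `has_lone_surrogate`, and `canonically_equal` are made up for this example. It shows rule 3 (no byte-order mark means assume big-endian), rule 4 (an unpaired surrogate carries no meaning, so it is worth detecting), and rule 9 (canonically equivalent strings should compare as equal).

```python
import codecs
import unicodedata


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes (hypothetical helper).

    Honors a leading byte-order mark if there is one; otherwise
    assumes big-endian, as rule 3 suggests.
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")  # no BOM: assume big-endian


def has_lone_surrogate(text: str) -> bool:
    """Return True if text holds a surrogate code point (hypothetical helper).

    A Python str is a sequence of code points, so any value in
    U+D800..U+DFFF that appears in one is necessarily unpaired.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings under canonical equivalence (hypothetical helper).

    Both sides are normalized to NFD before comparing.
    """
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```

For example, `decode_utf16(b"\x00A")` and `decode_utf16(b"\xff\xfeA\x00")` both yield `"A"`, and `canonically_equal("\u00e9", "e\u0301")` is true because the precomposed é and the sequence e plus combining acute accent are canonically equivalent.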