What are the porting issues I need to watch out for with UTF-8?
If you port to UTF-8, all code that does not try to interpret byte values greater than 0x7F will work, because ASCII and UTF-8 are identical up to 0x7F. However, watch for anything that truncates strings or buffers at places other than '\n' or '\0', or at a space or syntax character from the ASCII range. Truncation based on counting characters or bytes is inherently dangerous, because UTF-8 is a multi-byte encoding and the cut can land in the middle of a character. For the same reason, watch out for jumps into the middle of a string. Many kinds of inner-loop code do not need to be aware of the multi-byte nature of UTF-8. For example, a simple copy operation like: while (--len && (*d++ = *s++)); if (!len) *d = 0; copies bytes correctly for either ASCII or UTF-8, subject to the truncation caveat above if the limit len is reached before the end of the source string.
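The truncation warning above can be made concrete with a small sketch. The helper name utf8_safe_cut is hypothetical and not part of any library, and the sketch assumes the input is already well-formed UTF-8. It relies on the fact that every continuation byte in UTF-8 has the bit pattern 10xxxxxx, so a proposed cut point can be moved back until it no longer falls inside a multi-byte sequence.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the largest byte length <= max_bytes at
 * which the string s (len bytes, assumed to be well-formed UTF-8) can be
 * cut without splitting a multi-byte character.
 */
static size_t utf8_safe_cut(const char *s, size_t len, size_t max_bytes)
{
    size_t cut;

    assert(s != NULL);

    /* If the whole string fits, nothing is truncated. */
    if (len <= max_bytes)
        return len;

    cut = max_bytes;

    /*
     * s[cut] is the first byte that would be dropped. If it is a
     * continuation byte (10xxxxxx), the cut falls inside a character,
     * so step back until s[cut] starts a character.
     */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;

    return cut;
}
```

For instance, the word "naïve" occupies 6 bytes in UTF-8 because "ï" is encoded as the two bytes C3 AF. Asking for at most 3 bytes yields a cut at 2, so the result is "na" rather than "na" followed by a stray lead byte. The same boundary test can be used before jumping to an arbitrary byte offset in a string.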