How is a Unicode string represented in ICU4C?
A Unicode string is currently represented as UTF-16. The endianness of UTF-16 is platform dependent. You can guarantee the endianness of UTF-16 by using a converter. UTF-16 strings can be converted to other Unicode forms by using a converter or with the UTF conversion macros.

ICU does not use UCS-2. UCS-2 is a subset of UTF-16: it does not support surrogates, whereas UTF-16 does. This means that UCS-2 can only represent the code points in UTF-16’s Basic Multilingual Plane (BMP). The notion of UCS-2 is deprecated and dead; Unicode 2.0 changed its default encoding to UTF-16 in 1996.

If you need a quick and easy conversion between UTF-16 and UTF-8, UTF-32 or an encoding in wchar_t, take a look at unicode/ustring.h. In that header file you will find the u_strToWCS, u_strFromWCS, u_strToUTF8, u_strFromUTF8, u_strToUTF32 and u_strFromUTF32 functions. These functions are provided for your convenience as an alternative to the ucnv_* API. You can also take a look at the UTF_*, UTF8_*, UTF16_* a
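
To make the UCS-2 versus UTF-16 distinction and the endianness point above concrete, here is a minimal sketch in plain C. It does not use the ICU API: encode_utf16 and utf16_to_be_bytes are hypothetical helper names invented for this illustration, and in real code you would normally rely on ICU's own macros and converters instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ICU function): encode one code point as UTF-16.
 * Writes 1 or 2 code units to out and returns that count, or 0 if cp is not
 * a valid Unicode scalar value. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0; /* out of range, or a lone surrogate value */
    }
    if (cp < 0x10000) {
        /* BMP code point: one code unit, identical to what UCS-2 would store. */
        out[0] = (uint16_t)cp;
        return 1;
    }
    /* Supplementary code point: needs a surrogate pair, which UCS-2 lacks. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));   /* lead (high) surrogate  */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* trail (low) surrogate  */
    return 2;
}

/* Hypothetical helper: write n code units as big-endian bytes (UTF-16BE),
 * independent of the host byte order. bytes must hold at least 2 * n bytes.
 * Returns the number of bytes written. */
size_t utf16_to_be_bytes(const uint16_t *units, size_t n, uint8_t *bytes)
{
    assert(units != NULL && bytes != NULL);
    for (size_t i = 0; i < n; i++) {
        bytes[2 * i]     = (uint8_t)(units[i] >> 8);   /* high byte first */
        bytes[2 * i + 1] = (uint8_t)(units[i] & 0xFF); /* then low byte   */
    }
    return 2 * n;
}
```

With these helpers, U+4E2D (a BMP character) encodes to the single code unit 0x4E2D, while U+1F600 (outside the BMP) encodes to the surrogate pair 0xD83D 0xDE00 and serializes to the bytes D8 3D DE 00 in big-endian order. The second helper only illustrates what "guaranteeing the endianness" means at the byte level; it is not how an ICU converter is implemented.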