How is a Unicode string represented in ICU4C?

April 26, 2017 | Tags: icu4c, represented, String, Unicode

1 Answer

ICU4C currently represents a Unicode string as UTF-16. The endianness of UTF-16 is platform dependent; you can guarantee a specific endianness by using a converter. UTF-16 strings can be converted to other Unicode forms with a converter or with the UTF conversion macros.

ICU does not use UCS-2. UCS-2 is a subset of UTF-16: it does not support surrogates, whereas UTF-16 does, so UCS-2 can only represent the Basic Multilingual Plane (BMP). The notion of UCS-2 is deprecated and dead; Unicode 2.0 changed its default encoding to UTF-16 in 1996.

If you need a quick and easy conversion between UTF-16 and UTF-8, UTF-32, or a wchar_t encoding, take a look at unicode/ustring.h. That header provides the u_strToWCS, u_strFromWCS, u_strToUTF8, u_strFromUTF8, u_strToUTF32 and u_strFromUTF32 functions. They are offered as a convenience so that you do not have to use the ucnv_* API. You can also take a look at the UTF_*, UTF8_*, UTF16_* a
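
To make the UCS-2 versus UTF-16 distinction and the byte-order point concrete, here is a minimal sketch in plain C. It does not use ICU at all: the helper names `encode_utf16` and `write_utf16be` are hypothetical and exist only for this illustration. The first shows why a code point outside the BMP needs a surrogate pair (which UCS-2 cannot express), and the second shows one way to get a fixed byte order regardless of the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an ICU function): encode one Unicode code point
 * as UTF-16 code units. Returns the number of units written (1 or 2). */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    /* Accept only Unicode scalar values: at most U+10FFFF, not a surrogate. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        /* BMP: a single code unit, identical in UCS-2 and UTF-16. */
        out[0] = (uint16_t)cp;
        return 1;
    }

    /* Supplementary planes: a surrogate pair, which UCS-2 cannot represent. */
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* lead (high) surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* trail (low) surrogate */
    return 2;
}

/* Hypothetical helper (not an ICU function): serialize code units as
 * big-endian bytes, so the output is the same on any platform.
 * The caller must supply a buffer of at least 2 * n bytes. */
size_t write_utf16be(const uint16_t *units, size_t n, uint8_t *bytes)
{
    for (size_t i = 0; i < n; i++) {
        bytes[2 * i]     = (uint8_t)(units[i] >> 8);   /* high byte first */
        bytes[2 * i + 1] = (uint8_t)(units[i] & 0xFF); /* then low byte */
    }
    return 2 * n;
}
```

For example, U+00E9 encodes to the single unit 0x00E9, while U+1F600 encodes to the pair 0xD83D 0xDE00, which serializes to the bytes D8 3D DE 00 in big-endian order. In real ICU code you would normally not write this by hand; the `U16_APPEND` macro in unicode/utf16.h and a converter opened for "UTF-16BE" or "UTF-16LE" cover the same ground.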