
How should the UTF-8 mode be activated?

April 26, 2017 | Activated, mode, utf-8



2 Answers


If your application is soft converted and does not use the standard locale-dependent C multibyte routines (mbsrtowcs(), wcsrtombs(), etc.) to convert everything into wchar_t for processing, then it may need some way to find out whether the text data it handles is in an 8-bit encoding (such as ISO 8859-1, where 1 byte = 1 character) or in UTF-8. Once everyone uses only UTF-8, you can simply make it the default, but until then both the classical 8-bit sets and UTF-8 may still have to be supported. The first wave of applications with UTF-8 support used many different command line switches to activate their respective UTF-8 modes, for instance the famous xterm -u8. That turned out to be a very bad idea: having to remember a special command line option or other configuration mechanism for every application is very tedious, which is why command line options are not the proper way of activating a UTF-8 mode. The proper way to activate UTF-8 is the POSIX
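As an illustration of letting an application decide by itself rather than through a per-application switch, here is a minimal sketch of one widely used convention on POSIX systems: inspect the locale environment variables LC_ALL, LC_CTYPE and LANG, in that order of precedence, and check whether the first non-empty one names a UTF-8 codeset. It is written in Python for brevity; the function names `effective_ctype_locale` and `wants_utf8` are hypothetical and not part of any library. A C program would more typically call setlocale(LC_CTYPE, "") and then compare nl_langinfo(CODESET) against "UTF-8".

```python
import os


def effective_ctype_locale(env):
    """Return the first non-empty value among LC_ALL, LC_CTYPE, LANG.

    This mirrors the usual precedence of the locale variables for the
    character-type category; an empty string means none was set.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return ""


def wants_utf8(env=None):
    """Guess whether the environment asks for UTF-8 (hypothetical helper).

    Locale names usually look like language_TERRITORY.codeset@modifier,
    so the codeset is the part after '.' and before an optional '@'.
    """
    if env is None:
        env = os.environ
    locale_name = effective_ctype_locale(env)
    codeset = locale_name.partition(".")[2].partition("@")[0]
    # Accept common spellings such as "UTF-8", "utf8" and "utf-8".
    return codeset.lower().replace("-", "") == "utf8"


if __name__ == "__main__":
    mode = "UTF-8" if wants_utf8() else "8-bit"
    print("Assuming", mode, "text")
```

Because the decision is derived from settings the user has already made once for the whole session, no extra switch has to be remembered for this particular program. Parsing the variable by hand is only an approximation; asking the C library for the codeset is generally more robust where it is available.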


If your application supports both some 8-bit character set (ISO 8859-*, KOI-8, etc.) and UTF-8, then it has to find out in some way whether it is supposed to use the UTF-8 mode or not. Hopefully, in a few years everyone will only be using UTF-8 and you can just make it the default, but until then both the classical 8-bit sets and UTF-8 have to be supported. Current applications use a whole lot of different command line switches to activate their respective UTF-8 modes, for instance:

• xterm command line option “-u8” and X resource “XTerm*utf8: 1”
• gnat/gcc command line option “-gnatW8”
• stty command line option “iutf8”
• mined command line option “-U”
• xemacs elisp package to convert between UTF-8 and the internally used MULE encoding
• vim ‘fileencoding’ option
• less environment variable LESSCHARSET=utf-8

Having to remember a special command line option or other configuration mechanism for every application is very tedious, so some standardization is urgently required here. If you