How Does Unicode Handle South Asian Languages?
As a standard, Unicode is more concerned with characters and scripts than with languages. The characters used by different South Asian languages are therefore organized by script rather than by language. In most cases, the characters of a single script are assigned adjacent code points within one character block. As of Unicode 5.1, there are twelve “Indic script” character blocks in Unicode: Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Limbu, Malayalam, Oriya, Sinhala, Syloti Nagri, Tamil, and Telugu. The characters used by right-to-left languages such as Urdu, Pashto, and Sindhi are found on the Arabic code chart. Tibetan has its own code chart.
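To make the idea of a character block concrete, here is a minimal Python sketch that looks up which block a character belongs to by comparing its code point against a table of ranges. The function name block_of and the table BLOCKS are hypothetical names chosen for this illustration. The ranges are the main blocks for the scripts named above as they appear in the Unicode code charts; supplementary and extended blocks (for example, the additional Arabic blocks) are left out, so treat this as an illustration rather than a complete lookup.

```python
# Main Unicode blocks for the scripts mentioned above (start, end, name).
# Assumption: only the primary block of each script is listed; extended
# and supplementary blocks are deliberately omitted.
BLOCKS = [
    (0x0600, 0x06FF, "Arabic"),
    (0x0900, 0x097F, "Devanagari"),
    (0x0980, 0x09FF, "Bengali"),
    (0x0A00, 0x0A7F, "Gurmukhi"),
    (0x0A80, 0x0AFF, "Gujarati"),
    (0x0B00, 0x0B7F, "Oriya"),
    (0x0B80, 0x0BFF, "Tamil"),
    (0x0C00, 0x0C7F, "Telugu"),
    (0x0C80, 0x0CFF, "Kannada"),
    (0x0D00, 0x0D7F, "Malayalam"),
    (0x0D80, 0x0DFF, "Sinhala"),
    (0x0F00, 0x0FFF, "Tibetan"),
    (0x1900, 0x194F, "Limbu"),
    (0xA800, 0xA82F, "Syloti Nagri"),
]


def block_of(ch):
    """Return the block name for a single character, or None if not listed."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None


if __name__ == "__main__":
    # One sample letter each: Devanagari KA, Bengali KA, Tamil TA,
    # Arabic BEH (used in Urdu), and Tibetan KA.
    for ch in ["\u0915", "\u0995", "\u0BA4", "\u0628", "\u0F40"]:
        print(f"U+{ord(ch):04X} -> {block_of(ch)}")
```

Because the lookup depends only on the code point, a Hindi word and a Marathi word written in Devanagari resolve to the same block, which is the sense in which Unicode organizes characters by script rather than by language.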