FAQ

Q: What is Unicode?

A: Unicode is the universal character encoding, maintained by the Unicode Consortium. This encoding standard provides the basis for processing, storage and interchange of text data in any language in all modern software and information technology protocols.
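
As a rough illustration of what "encoding" means here: each character is assigned a number (a code point), and that number is then serialized to bytes for storage and interchange. The sketch below uses Python's built-in `ord` and `str.encode`. The helper name `describe` and the choice of UTF-8 as the byte encoding are assumptions made for this example, not something the FAQ specifies.

```python
def describe(ch: str) -> tuple[str, str]:
    """Return the code point (as U+XXXX) and the UTF-8 bytes (as hex) of one character.

    `describe` is a hypothetical helper for illustration only.
    """
    code_point = f"U+{ord(ch):04X}"          # the number Unicode assigns to the character
    utf8_hex = ch.encode("utf-8").hex(" ")   # one possible byte serialization (UTF-8)
    return code_point, utf8_hex


# Latin capital A, Latin small e with acute, and a CJK ideograph.
for ch in ["A", "\u00e9", "\u4e2d"]:
    print(ch, *describe(ch))
```

The same code point can be serialized differently by other encoding forms such as UTF-16; UTF-8 is shown only because it is common.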

Q: What is the scope of Unicode?

A: Unicode covers all the characters for all the writing systems of the world, modern and ancient. It also includes technical symbols, punctuation, and many other characters used in writing text. The Unicode Standard is intended to support the needs of all types of users, whether in business or academia, using mainstream or minority scripts.

Q: How many languages are covered by Unicode?

A: It’s hard to say, because Unicode encodes characters for scripts rather than for languages per se. Many scripts, especially the Latin script, are used to write a large number of languages (e.g., both French and English). The easiest answer is that Unicode has encoded scripts for all of the world’s languages (see Supported Scripts for the full list). Unicode also includes many historic scripts used to write long-dead languages, as well as lesser-used regional scripts that may be used as a second (or even third) way to write a particular language. However, encoding a script is not the same as having full support for it on a given computer or phone, and there is still more work to be done there.
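
The script-versus-language distinction can be seen in a small sketch. Python's standard `unicodedata` module does not expose the Unicode Script property, so the example below falls back on a heuristic: it takes the first word of each character's formal name (for example `LATIN` or `CYRILLIC`). The helper names `rough_script` and `scripts_in` are hypothetical, and the heuristic is an assumption that holds for the sample words here but is not a substitute for the real Script property.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Guess a character's script from the first word of its Unicode name.

    Heuristic for illustration only; this is NOT the Unicode Script property.
    """
    return unicodedata.name(ch).split()[0]


def scripts_in(text: str) -> set[str]:
    """Collect the rough script labels of the alphabetic characters in text."""
    return {rough_script(ch) for ch in text if ch.isalpha()}


# A French word and an English word share one script; a Russian word uses another.
for word in ["garçon", "boy", "мальчик"]:
    print(word, scripts_in(word))
```

The French and English words both report only `LATIN`, while the Russian word reports `CYRILLIC`, which is why counting scripts does not directly answer how many languages are covered.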

Q: What still needs to be done to support languages on computers and phones?

A: Supporting a language on a computer or phone starts with encoding its script (characters), but it also requires keyboards and other input tools, fonts, language data (e.g., date formats), and translations of that language data. This has to be done for many hundreds of languages, and the work is far from complete.

See more FAQs on our Technical Site.