Unicode provides a unique number for every character, no matter what the platform, program, or language is.

Characters before Unicode

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before the Unicode standard was developed, there were many different systems, called character encodings, for assigning these numbers. These earlier character encodings were limited and did not cover characters for all the world’s languages. Even for a single language like English, no single encoding covered all the letters, punctuation, and technical symbols in common use. Languages written with very large character sets, such as Japanese, were a challenge to support with these earlier encoding standards.
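To make the idea of "a number for each character" concrete, here is a minimal sketch in Python. It is only an illustration: the sample characters are arbitrary, and the built-in functions ord and chr report Unicode values, which for these particular characters happen to match the older ASCII encoding.

```python
# Map a few characters to the numbers a computer stores for them.
letters = ["A", "z", "?"]
codes = [ord(ch) for ch in letters]
print(codes)  # [65, 122, 63]

# Turn the numbers back into text.
restored = "".join(chr(n) for n in codes)
print(restored)  # Az?
```

The text survives the round trip only because both directions use the same character-to-number mapping.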

Early character encodings also conflicted with one another. That is, two encodings could use the same number for two different characters, or use different numbers for the same character. Any given computer might have to support many different encodings, and whenever data passed between computers or between encodings, the risk of data corruption or errors increased.
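The conflict described above can be reproduced with legacy codecs that still ship with Python. This is a small sketch, not a survey: the three encodings (Latin-1, Windows-1251, and IBM code page 437) and the byte value 0xE9 were picked only for illustration.

```python
# One stored number, three different characters depending on the encoding.
raw = bytes([0xE9])
decoded = {name: raw.decode(name) for name in ("latin-1", "cp1251", "cp437")}
for name, ch in decoded.items():
    print(f"{name}: {ch}")

# One character, two different stored numbers depending on the encoding.
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
encoded = {name: e_acute.encode(name) for name in ("latin-1", "cp437")}
for name, data in encoded.items():
    print(f"{name}: 0x{data[0]:02X}")
```

A file written under one of these encodings and read under another would silently display the wrong characters, which is the kind of corruption the paragraph refers to.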

Character encodings existed for a handful of “large” languages.  But many languages lacked character support altogether.

The emergence of the Unicode Standard and access to tools supporting it are among the most significant recent global software trends.

Each character in a language is assigned a unique code. Check out the complete list (warning: there are close to 150,000 characters and counting!).
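As a rough illustration of what a unique code looks like in practice, the sketch below prints the code for a few characters in the conventional U+ hexadecimal notation. The helper name code_point and the sample characters are assumptions made for this example.

```python
def code_point(ch: str) -> str:
    """Format a single character's code in U+XXXX notation (hypothetical helper)."""
    return f"U+{ord(ch):04X}"

# Sample characters: A, e with acute accent, euro sign, and the CJK character for "day/sun".
samples = ["A", "\u00e9", "\u20ac", "\u65e5"]
for ch in samples:
    print(ch, code_point(ch))

# No two different characters in the sample share a code.
assert len({code_point(ch) for ch in samples}) == len(samples)
```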


Unicode characters — A Global Standard to Support ALL the World’s Languages

The Unicode standard was developed to address these issues. The standard was created on an encoding foundation large enough to support the writing systems used by all the world’s languages. Over the years the Unicode standard encoding has been steadily expanded and now includes writing systems such as Cherokee, Mongolian, and ancient Egyptian hieroglyphs. Beyond simply providing a standardized system of character codes, the Unicode Consortium has expanded the scope of its efforts to include standard “locale” data, such as how a date is formatted in Arabic or Swahili, and code libraries that assist programmers in developing software.
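One way to see this coverage is to look up a character from each of the writing systems named above. The sketch below uses Python's standard unicodedata module; the three characters are arbitrary picks, one per script, and the names printed depend on the Unicode data bundled with the Python version in use.

```python
import unicodedata

# One arbitrarily chosen character per script mentioned in the text.
samples = {
    "Cherokee": "\u13A0",
    "Mongolian": "\u1820",
    "Egyptian hieroglyphs": "\U00013000",
}

for script, ch in samples.items():
    # Each character has both a numeric code and a standardized name.
    print(f"{script}: U+{ord(ch):04X} {unicodedata.name(ch)}")
```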

The variety of languages found on the Web today is thanks to the character support provided by Unicode, which enables computers to support virtually every language in use in the world today and lets users and programmers develop content in their own native languages.

We provide a unique code for every character, in every language, in every program, on every platform.

This includes popular languages like English and Mandarin, but also endangered languages like Navajo.  And we can’t do it alone. Help us preserve the world’s heritage by Adopting a Character or becoming a Member.