About the Unicode Consortium

The Unicode Consortium is the standards body for the internationalization of software and services. Deployed on more than 20 billion devices around the world, Unicode also provides the solution for internationalization and the architecture to support localization.

Quick Facts

Founded in 1988, incorporated in 1991
Public benefit, 501(c)(3) non-profit organization
Open source standards, data, and software development
Orchestrates the contributions of hundreds of professionals, expert volunteers, and language experts
30+ organizational members across corporate, academic, and governmental institutions
Funded by membership dues and donations

Operating Values

Local solutions require global collaboration
Localization respects and empowers users
Interoperability across platforms serves you – and the greater good
Transparency and open source ensure: Reliability — Security — Stability

How Did Unicode Get its Name?

The Unicode Consortium started out as the standards body for character encoding and derives its name from three main goals:

universal (addressing the needs of world languages),
uniform (fixed-width codes for efficient access), and
unique (each bit sequence has only one interpretation as a character code)

Since then, its scope has grown well beyond character encoding. Its work now includes character properties and algorithms (the ‘instructions’ for how characters work), language and locale data for internationalization, and production software libraries that make all of this accessible to programs.

Characters before Unicode

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before the Unicode Standard was developed, there were many different systems, called character encodings, for assigning these numbers. These earlier character encodings were limited and did not cover the characters of all the world’s languages. Even for a single language like English, no single encoding covered all the letters, punctuation, and technical symbols in common use. Languages written with thousands of distinct characters, such as Japanese, were a challenge to support with these earlier encoding standards.
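As a rough illustration of "a number for each character", the Python sketch below lists the numbers behind a short string and checks whether a given legacy encoding can represent a piece of text at all. The helper names code_points and can_encode are made up for this example; ord, str.encode, and the codec names are standard Python.

```python
def code_points(text):
    """Return the number assigned to each character in text."""
    return [ord(ch) for ch in text]


def can_encode(text, encoding):
    """Report whether every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(code_points("Hi"))                 # the numbers stored for 'H' and 'i'
print(can_encode("naïve", "ascii"))      # ASCII has no 'ï'
print(can_encode("日本語", "latin-1"))    # Latin-1 has no Japanese characters
print(can_encode("日本語", "shift_jis"))  # a Japanese-specific encoding does
```

The point of the sketch is only that each older encoding covered a limited repertoire, so the choice of encoding decided which text could be stored.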

Early character encodings also conflicted with one another. That is, two encodings could use the same number for two different characters, or use different numbers for the same character. Any given computer might have to support many different encodings. Whenever data was passed between computers or converted between encodings, the risk of data corruption or errors increased.
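Both kinds of conflict can be reproduced with the legacy codecs that ship with Python. In this minimal sketch, the helpers decode_with and encode_with are hypothetical names chosen for illustration, and the encodings (Latin-1, Windows-1251, code page 437) are just convenient examples of pre-Unicode encodings.

```python
RAW = bytes([0xE9])  # one stored number: 233


def decode_with(raw, encodings):
    """Interpret the same bytes under several encodings."""
    return {name: raw.decode(name) for name in encodings}


def encode_with(char, encodings):
    """Encode the same character under several encodings."""
    return {name: char.encode(name) for name in encodings}


# Same number, different characters.
print(decode_with(RAW, ["latin-1", "cp1251"]))

# Same character, different numbers.
print(encode_with("é", ["latin-1", "cp437"]))

# Data written under one encoding and read under another is silently corrupted.
print("café".encode("latin-1").decode("cp1251"))
```

The last line is the corruption case: no error is raised, and the reader simply sees the wrong character.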

Character encodings existed for a handful of “large” languages. But many languages lacked character support altogether.