About the Unicode Consortium
The Unicode Consortium is the standards body for the internationalization of software and services. The Unicode Standard, deployed on more than 20 billion devices around the world, provides the solution for internationalization and the architecture to support localization.
Quick Facts
- Founded in 1988, incorporated in 1991
- Public benefit, 501(c)(3) non-profit organization
- Open source standards, data, and software development
- Orchestrates the contributions of hundreds of professionals, expert volunteers, and language experts
- 30+ organizational members across corporate, academic, and governmental institutions
- Funded by membership dues and donations
Operating Values
- Local solutions require global collaboration
- Localization respects and empowers users
- Interoperability across platforms serves you – and the greater good
- Transparency and open source ensure reliability, security, and stability
How Did Unicode Get its Name?
The Unicode Consortium started out as the standards body for character encoding and derives its name from three main goals:
- universal (addressing the needs of world languages)
- uniform (fixed-width codes for efficient access), and
- unique (each bit sequence has only one interpretation as character codes)
Since then, its work has expanded far beyond character encoding. It now includes the character properties and algorithms (the ‘instructions’ for how characters behave), language and locale data for internationalization, and production software libraries that make all of this accessible to programs.
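As a small illustration of what those properties and algorithms look like in practice, here is a minimal sketch using Python's standard unicodedata module, which is built from the Unicode Character Database (Python is used here only as one example of a library that exposes this data):

```python
import unicodedata

# Character properties come from the Unicode Character Database:
ch = "é"
print(unicodedata.name(ch))      # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.category(ch))  # Ll (Letter, lowercase)

# A Unicode algorithm at work: NFD normalization decomposes the
# single code point U+00E9 into "e" plus a combining acute accent.
print([hex(ord(c)) for c in unicodedata.normalize("NFD", ch)])
# ['0x65', '0x301']
```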
Characters before Unicode
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before the Unicode standard was developed, there were many different systems, called character encodings, for assigning these numbers. These earlier character encodings were limited and did not cover characters for all the world’s languages. Even for a single language like English, no single encoding covered all the letters, punctuation, and technical symbols in common use. Languages written with thousands of ideographic characters, such as Japanese, were especially difficult to support with these earlier encoding standards.
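To make the idea concrete, here is a brief Python sketch (illustrative only; any language would do) showing characters stored as numbers, and why a 256-value, single-byte encoding falls short:

```python
# Each character is stored as a number; in ASCII, "A" is 65.
print(ord("A"))  # 65
print(chr(65))   # A

# A single-byte encoding has room for only 256 numbers, far too
# few for the thousands of characters a language like Japanese uses.
try:
    "日本語".encode("ascii")
except UnicodeEncodeError as err:
    print(err)
```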
Early character encodings also conflicted with one another: two encodings could use the same number for two different characters, or different numbers for the same character. Any given computer might have to support many different encodings, and whenever data passed between computers or between encodings, there was a risk of corruption or errors.
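Both kinds of conflict are easy to demonstrate (the specific legacy encodings below are chosen only as examples):

```python
# The same number could mean different characters: byte 0xE9
# is "é" in Latin-1 but "И" in the Russian KOI8-R encoding.
raw = b"\xe9"
print(raw.decode("latin-1"))  # é
print(raw.decode("koi8-r"))   # И

# ...and the same character could get different numbers: "é" is
# byte 0xE9 in Latin-1 but byte 0x82 in IBM code page 437.
print("é".encode("latin-1"))  # b'\xe9'
print("é".encode("cp437"))    # b'\x82'
```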
Character encodings existed for a handful of widely used languages, but many languages lacked character support altogether.