The Unicode CLDR Survey Tool is open for submission for version 42. CLDR provides
key building blocks for software to support the world’s languages (dates, times,
numbers, sort-order, etc.). For example, all major browsers and all modern mobile
phones use CLDR for language support. (See
“Who uses CLDR?”)
Via the online Survey Tool, contributors supply data for their
languages: data that is widely used to support much of the world’s software.
This data is also a factor in determining which languages are supported on
mobile phones and computer operating systems.
Version 42 is focusing on:
Unicode 15.0 additions:
emoji, script names, collation data (Chinese & Japanese), …
New Languages: Adding Haryanvi, Bhojpuri, and Rajasthani at the Basic level.
Up-leveling: Xhosa, Hinglish (Hindi-Latin), Nigerian Pidgin, Hausa, Igbo, Yoruba, and Norwegian Nynorsk.
Person Name Formatting: for handling the wide variety in the way that people’s names work in different languages.
People may have a different number of
names, depending on their culture: they might have only one name
(“Zendaya”), two (“Albert Einstein”), or three or more.
People may have multiple words in a
particular name field, e.g. “Mary Beth” as a given name, or “van Berg” as a surname.
Some languages, such as Spanish, have
two surnames (where each can be composed of multiple words).
The ordering of name fields can be
different across languages, as well as the spacing (or lack thereof) and punctuation.
Name formatting needs to be adapted to
different circumstances: shorter or longer forms; formal or informal
contexts; talking about someone versus talking to someone; or a
monogram (JFK).
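To make the variation above concrete, here is a minimal Python sketch of a name formatter. It is not CLDR's data model or any library's API: the `PersonName` class, the `format_name` function, the two-locale `ORDER` and `SEPARATOR` tables, and the rule that the short form drops a middle name are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PersonName:
    """Hypothetical name record; field names are illustrative, not CLDR's."""
    given: str                                         # may be multi-word, e.g. "Mary Beth"
    given2: Optional[str] = None                       # e.g. a middle name
    surnames: List[str] = field(default_factory=list)  # none, one, or two (as in Spanish)


# Hypothetical per-locale settings; real locale data is far richer.
ORDER = {"en": "given_first", "ja": "surname_first"}
SEPARATOR = {"en": " ", "ja": ""}  # some languages put no space between fields


def format_name(name: PersonName, locale: str = "en", length: str = "long") -> str:
    """Format a name; length is "long", "short" (drops given2), or "monogram"."""
    if length == "monogram":
        # One uppercase initial per non-empty field, e.g. John Fitzgerald Kennedy -> "JFK".
        parts = [name.given, name.given2, *name.surnames]
        return "".join(p[0].upper() for p in parts if p)
    given_part = [name.given]
    if length == "long" and name.given2:
        given_part.append(name.given2)
    if ORDER.get(locale, "given_first") == "surname_first":
        parts = [*name.surnames, *given_part]
    else:
        parts = [*given_part, *name.surnames]
    return SEPARATOR.get(locale, " ").join(parts)


print(format_name(PersonName("Zendaya")))
print(format_name(PersonName("Gabriel", surnames=["García", "Márquez"])))
print(format_name(PersonName("太郎", surnames=["山田"]), locale="ja"))
print(format_name(PersonName("John", "Fitzgerald", ["Kennedy"]), length="monogram"))
```

The requirements listed above also include formal versus informal use and talking about versus talking to someone; this sketch leaves those out to stay short.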
Submission of new data opened recently, and is slated to finish on
June 22. The new data then enters a vetting phase, where contributors work out
which of the supplied data for each field is best. That vetting phase is slated
to finish on July 6. A public alpha makes the draft data available around August
17, and the final release targets October 19.
Each new locale starts with a small set of Core data, such as a
list of characters used in the language. Submitters of those locales need to
bring the coverage up to Basic level (very basic dates, times, numbers, …)
during the next submission cycle. In version 41, the following levels were reached:
Modern: Suitable for full UI internationalization
Afrikaans, … Čeština, … Dansk, … Eesti, … Filipino, … Gaeilge, … Hrvatski, Indonesia, … Jawa, Kiswahili, Latviešu, … Magyar, … Nederlands, … O‘zbek, Polski, … Română, Slovenčina, … Tiếng Việt, … Ελληνικά, Беларуская, … ᏣᎳᎩ, Ქართული, Հայերեն, עברית, اردو, … አማርኛ, नेपाली, … অসমীয়া, বাংলা, ਪੰਜਾਬੀ, ગુજરાતી, ଓଡ଼ିଆ, தமிழ், తెలుగు, ಕನ್ನಡ, മലയാളം, සිංහල, ไทย, ລາວ, မြန်မာ, ខ្មែរ, 한국어, … 日本語, …
Moderate: Suitable for full “document content” internationalization, such as formats in a spreadsheet.
Binisaya, … Èdè Yorùbá, Føroyskt, Igbo, IsiZulu,
Kanhgág, Nheẽgatu, Runasimi, Sardu, Shqip, سنڌي, …
Basic: Suitable for locale selection, such as choice of
language in mobile phone settings.
Asturianu, Basa Sunda, Interlingua, Kabuverdianu,
Lea Fakatonga, Rumantsch, Te reo Māori, Wolof, Босански
(Ћирилица), Татар, Тоҷикӣ, Ўзбекча (Кирил), کٲشُر, कॉशुर
(देवनागरी), …, মৈতৈলোন্, ᱥᱟᱱᱛᱟᱲᱤ, 粤语 (简体)
Some entries are locales rather than languages: variants for different countries or scripts, such as Босански (Ћирилица) or 粤语 (简体).
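The coverage levels form an ordered scale, with each level building on the ones below it. CLDR names the tiers described above Modern, Moderate, and Basic, sitting above the starting Core level. The following Python sketch models that ordering; the `Coverage` enum, the `suitable` helper, and the use-case keys are hypothetical names, and the three sample entries are only a small subset taken from the version 41 lists above.

```python
from enum import IntEnum


class Coverage(IntEnum):
    """Coverage tiers, ordered so that higher levels include lower ones."""
    CORE = 0      # minimal starting data, such as the characters used
    BASIC = 1     # locale selection, e.g. a phone's language menu
    MODERATE = 2  # "document content", e.g. spreadsheet formats
    MODERN = 3    # full UI internationalization


# Minimum level for each use case (hypothetical keys).
REQUIRED = {
    "locale_selection": Coverage.BASIC,
    "document_content": Coverage.MODERATE,
    "full_ui": Coverage.MODERN,
}

# A tiny illustrative subset of the version 41 results listed above.
V41_LEVELS = {
    "Afrikaans": Coverage.MODERN,
    "Føroyskt": Coverage.MODERATE,
    "Wolof": Coverage.BASIC,
}


def suitable(language: str, use: str) -> bool:
    """True if the language's level is at least what the use case needs.

    Languages missing from the table are treated as Core only (an assumption).
    """
    return V41_LEVELS.get(language, Coverage.CORE) >= REQUIRED[use]


print(suitable("Wolof", "locale_selection"))  # True
print(suitable("Wolof", "full_ui"))           # False
```

Using an `IntEnum` makes the "at least this level" check a plain comparison, which mirrors how a locale at a higher level also satisfies every use case of the levels below it.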