Unicode CLDR 36 provides an update to the key building blocks for
software supporting the world’s languages. CLDR data is used by all
major software systems for their software internationalization and
localization, adapting software to the conventions of different languages
for common software tasks.

CLDR 36 included a full Survey Tool data collection phase, adding
approximately 32K new translated fields, with significant increases in
moderate and/or modern coverage for the following languages: ceb (Cebuano),
ha (Hausa / Latin script), ig (Igbo), kok (Konkani), qu (Quechua),
to (Tongan), yo (Yoruba). Seed data was added for several new languages:
cic (Chickasaw), mus (Muscogee), osa (Osage, Osage script), an (Aragonese),
su (Sundanese, Latin script), szl (Silesian).

Enhancements in v36 include:

  • Names and search keywords for the new Emoji 13 draft candidates are
    included in this release to support smooth adoption of the upcoming Emoji
    release (scheduled for 2020 Q1 as part of Unicode 13).
  • New measurement units and patterns: dot-per-centimeter,
    dot-per-inch, em, megapixel, pixel, pixel-per-centimeter, pixel-per-inch;
    decade; therm-us; bar, pascal; and a pattern for combining units in a
    multiplicative relationship, such as “newton-meter”.
  • Locale IDs:
    • Extended language matching to provide fallbacks for many
      encompassed languages.
    • Added more languageAliases from the BCP47 language subtag
      registry for deprecated languages.
  • Added a new test directory for localeIdentifiers,
    graphemeClusters (for currently supported Indic languages), and
    transliterations.
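To illustrate what a languageAlias entry does, here is a minimal Python sketch. It assumes a small, hand-picked alias table: the four deprecated subtags below are real entries in the BCP47 language subtag registry, but the `LANGUAGE_ALIASES` dictionary and the `canonicalize_language` helper are hypothetical and far simpler than CLDR's full locale canonicalization. It rewrites a deprecated primary language subtag to its preferred replacement and passes the other subtags through unchanged.

```python
# Hypothetical, hand-picked subset of deprecated-language aliases; the real
# CLDR alias data is much larger and also covers scripts, regions, etc.
LANGUAGE_ALIASES = {
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
    "ji": "yi",  # Yiddish
    "jw": "jv",  # Javanese
}


def canonicalize_language(tag: str) -> str:
    """Replace a deprecated primary language subtag with its preferred value.

    Only the first subtag is examined; script, region, and any other
    subtags are passed through unchanged (a deliberate simplification).
    Underscore separators are normalized to hyphens.
    """
    subtags = tag.replace("_", "-").split("-")
    primary = subtags[0].lower()
    subtags[0] = LANGUAGE_ALIASES.get(primary, primary)
    return "-".join(subtags)


if __name__ == "__main__":
    for tag in ["iw-IL", "in", "en_US", "fr-CA"]:
        print(tag, "->", canonicalize_language(tag))
```

A lookup table keeps the sketch readable; a production implementation would load the alias data from CLDR rather than hard-coding it.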

There are some infrastructure changes to be aware of, including:

  • The cldr repository has moved from Subversion to Git, and
    queries using Trac no longer work. See CLDR Change Requests
    for new information.
  • The data in the cldr repository now preserves votes for
    inherited data, indicated with “↑↑↑”. To generate CLDR data in the
    previous form, without “↑↑↑” and with proper minimization, use the new
    tool GenerateProductionData.
    Note: Release data that has been processed with GenerateProductionData
    is available in a parallel cldr-staging repository, with the same
    release tags.
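The interplay between “↑↑↑” votes and minimization can be pictured with a toy model. The sketch below is not GenerateProductionData and does not reflect its actual code or CLDR's full inheritance rules (aliases, lateral inheritance, and so on). It assumes a hypothetical in-memory layout (a dict of locale to {path: value}, plus a parent map) with made-up sample data, and shows how a “↑↑↑” value resolves through the parent chain and how a production-style minimization drops entries that merely repeat what would be inherited anyway.

```python
INHERITANCE_MARKER = "\u2191\u2191\u2191"  # "↑↑↑": a vote to inherit

# Hypothetical sample data: locale -> {path: value}; paths are simplified keys.
DATA = {
    "root": {"decimal": "."},
    "de": {"decimal": ",", "month1": "Januar", "month2": "Februar"},
    "de_AT": {"decimal": ",", "month1": "Jänner", "month2": INHERITANCE_MARKER},
}
# Hypothetical parent chain; "root" has no parent.
PARENT_OF = {"de_AT": "de", "de": "root"}


def resolve(path, locale, data, parent_of):
    """Look up `path`, walking up the parent chain past missing or ↑↑↑ values."""
    while locale is not None:
        value = data.get(locale, {}).get(path)
        if value is not None and value != INHERITANCE_MARKER:
            return value
        locale = parent_of.get(locale)
    return None


def minimize(locale, data, parent_of):
    """Drop ↑↑↑ entries and entries identical to the inherited value."""
    parent = parent_of.get(locale)
    result = {}
    for path, value in data.get(locale, {}).items():
        if value == INHERITANCE_MARKER:
            continue  # inheritance supplies this value at lookup time
        if value != resolve(path, parent, data, parent_of):
            result[path] = value  # keep only genuinely locale-specific data
    return result


if __name__ == "__main__":
    print(resolve("month2", "de_AT", DATA, PARENT_OF))
    print(minimize("de_AT", DATA, PARENT_OF))
```

In this toy model, the de_AT entry for "decimal" is dropped because it equals the value inherited from de, and "month2" is dropped because it is an explicit inheritance vote; only "month1" survives minimization.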


The Common Locale Data Repository (CLDR) provides key building
blocks for software to support the world’s languages, with the largest and most
extensive standard repository of locale data available. This data is used by a
wide spectrum of companies
for their software internationalization and localization, adapting software to
the conventions of different languages for such common software tasks as:

  • Locale-specific patterns for formatting and parsing: dates,
    times, time zones, numbers and currency values, measurement units,…
  • Translations of names: languages, scripts, countries and
    regions, currencies, eras, months, weekdays, day periods, time zones,
    cities, time units, emoji characters and sequences (and search
    keywords),…
  • Language & script information: characters used; plural cases;
    gender of lists; capitalization; rules for sorting & searching; writing
    direction; transliteration rules; rules for spelling out numbers; rules for
    segmenting text into graphemes, words, and sentences; keyboard layouts;…
  • Country information: language usage, currency information,
    calendar preference, week conventions, …
  • Validity: Definitions, aliases, and validity information for
    Unicode locales, languages, scripts, regions, and extensions,…
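As an example of the plural-case data listed above, CLDR's English cardinal rule assigns the category “one” when i = 1 and v = 0 (i is the integer part of the absolute value, v is the number of visible fraction digits, trailing zeros included) and “other” otherwise. The minimal Python sketch below assumes the number arrives as a plain decimal string, so that a form such as “1.0” keeps its visible fraction digit; the helper names are hypothetical, and only this one English rule is hard-coded rather than parsed from CLDR's plural data.

```python
def plural_operands(number: str) -> tuple[int, int]:
    """Return the CLDR operands (i, v) for a plain decimal string.

    i: integer digits of the absolute value.
    v: count of visible fraction digits, trailing zeros included,
       so "1.0" has v = 1 while "1" has v = 0.
    """
    text = number.lstrip("+-")  # operands use the absolute value
    integer_part, _, fraction_part = text.partition(".")
    return int(integer_part or "0"), len(fraction_part)


def english_cardinal_category(number: str) -> str:
    # English cardinal rule in CLDR: one -> i = 1 and v = 0; otherwise other.
    i, v = plural_operands(number)
    return "one" if i == 1 and v == 0 else "other"


if __name__ == "__main__":
    for n in ["0", "1", "1.0", "2", "-1"]:
        print(n, english_cardinal_category(n))
```

Passing a string rather than a float is the key design choice here: a float cannot distinguish “1” from “1.0”, yet CLDR treats them differently (“1 day” versus “1.0 days”).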
