The beta version of Unicode CLDR version 38 is now available. The data will not
be changed except for showstoppers, but the LDML v38 spec can still be changed.
The final release of v38 is planned for October 28, 2020. If you find any
problems, please file a ticket.

Unicode CLDR provides an update to the key building blocks for software
supporting the world’s languages. CLDR data is used by all major software
systems (including all mobile phones) for internationalization and
localization, adapting software to the conventions of different languages.

CLDR v38 includes:

  • Enhancements to existing locale data:
    support for units of measurement in inflected languages (phase 1),
    annotations (names and search keywords) for about 400 non-emoji Unicode
    symbols, and annotations for Emoji v13.1.
  • Survey Tool upgrades: substantial
    performance improvements, plus structured forum entries to improve
    coordination among translators.

LDML v38 includes:

  • To make the canonicalization of locale identifiers clear and unambiguous,
    substantially restructured the part of the specification that covers it.
    (This was done in concert with fixes to the alias data so that the data
    works better with the specification.)
  • To support inflected units of measurement:
    • minimalPairs adds new elements
      caseMinimalPairs and
    • unit adds a new element gender
    • grammaticalData adds new elements
      deriveCompound, and deriveComponent
    • unitPattern adds a new attribute case
    • grammaticalCase, grammaticalGender, grammaticalDefiniteness add a new attribute
    • compoundUnitPattern1 adds new attributes case and gender
    • compoundUnitPattern adds a new attribute case
  • To allow for overriding dictionary-based segmentation breaks, added the
    Unicode Dictionary Break Exclusion Identifier, with the new key “dx”.
  • For picking the correct units of measurement for locales, defined the userPreferences skeleton more precisely.
  • For accurate plural categories in compact numbers, added the ‘c’ operand to plural rules to provide formatting for languages such as French.
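
As a rough illustration of where a key such as “dx” sits in a locale
identifier, the following minimal Python sketch pulls the key/value pairs out
of the -u- extension of a BCP 47 language tag. The function name and the sample
tags are hypothetical, and the use of script codes as “dx” values here is an
assumption made for illustration; the LDML v38 spec is the authority on the
actual syntax. The parser is deliberately simplified: it ignores attributes and
does not validate subtags.

```python
from typing import Dict, List


def unicode_extension_keywords(tag: str) -> Dict[str, List[str]]:
    """Map each -u- extension key in a language tag to its value subtags.

    Simplified sketch: a 2-character subtag after "u" is treated as a key,
    longer subtags are treated as values of the most recent key, and the
    next single-character subtag (another extension) ends the scan.
    """
    subtags = tag.lower().replace("_", "-").split("-")
    keywords: Dict[str, List[str]] = {}
    if "u" not in subtags:
        return keywords

    key = None
    for sub in subtags[subtags.index("u") + 1:]:
        if len(sub) == 1:
            # Another singleton (for example "t" or "x") ends the -u- extension.
            break
        if len(sub) == 2:
            key = sub
            keywords[key] = []
        elif key is not None:
            keywords[key].append(sub)
        # Subtags seen before the first key are attributes; they are skipped.
    return keywords


# Hypothetical tags, chosen only to exercise the parser.
print(unicode_extension_keywords("th-u-dx-hani-thai"))
print(unicode_extension_keywords("de-u-co-phonebk-dx-zyyy"))
```

Under these assumptions, a segmentation implementation could look up the “dx”
entry in the returned mapping to decide which scripts to exclude from
dictionary-based breaking.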

See additional details in the draft CLDR v38 Release note.

The overall changes to the data items were:

Added      Deleted    Changed    Total
155,131    33,805     45,895     2,175,821

Over 140,000 characters are available for adoption
to help the Unicode Consortium’s work on digitally disadvantaged languages.