Regular expressions are a powerful tool for using patterns to search and modify text, and are vital in many programs, programming languages, and databases.

Since 1999, UTS #18: Unicode Regular Expressions has supplied guidelines and conformance levels for supporting Unicode in regular expressions. The new Version 21 broadens the scope of properties for regular expressions (regex) to allow for properties of strings (such as emoji sequences). For example, a single expression can match all emoji flags except the French flag.
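
As a rough illustration of the flag example, here is a minimal Python sketch. It is an approximation rather than a UTS #18 expression: Python's built-in `re` module has no string-valued properties, so the code treats any pair of regional indicator symbols (U+1F1E6 to U+1F1FF) as a flag, which is broader than the set of recommended emoji flag sequences, and then removes the French flag by plain string comparison. The names `REGIONAL_PAIR`, `FRENCH_FLAG`, and `flags_except_french` are hypothetical and chosen only for this example.

```python
import re

# Assumption: a "flag" is approximated as any two consecutive regional
# indicator symbols (U+1F1E6..U+1F1FF). The real set of recommended emoji
# flag sequences is a subset of these pairs.
REGIONAL_PAIR = re.compile("[\U0001F1E6-\U0001F1FF]{2}")

# The French flag is the regional indicators F + R.
FRENCH_FLAG = "\U0001F1EB\U0001F1F7"


def flags_except_french(text):
    """Return flag-like sequences found in text, minus the French flag.

    This emulates a set difference on strings: (flag sequences) minus
    the single string for the French flag.
    """
    return [m for m in REGIONAL_PAIR.findall(text) if m != FRENCH_FLAG]
```

Because the pairs are consumed left to right before filtering, adjacent flags with no separator between them are still split at the right boundaries. An engine that implements string properties from UTS #18 could express the same idea directly as a set difference between a flag-sequence property and a string literal.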


Among the improvements are:

  • Provides a new Annex D: Resolving Character Classes with Strings, for handling negations of sets of strings.
  • Updates the full property list to include the latest UCD properties, plus Emoji properties and UTS #39 properties.
  • Removes obsolete text passages and makes editorial changes for clarity.
