Character Properties, Case Mappings & Names FAQ

submitted by
Style Pass
2024-12-09 21:30:04

Several Unicode Technical Standards and Unicode Technical Reports also define their own properties, which are listed separately. There is also the large collection of data specifically for Unified ideographs, called the “Unihan” Database, which forms a separate subset of the Unicode character properties. Its structure and contents are significantly different, so it isn't generally included when talking about the “UCD”. [AF]

The Unicode Character Database (UCD) is a collection of plain text files, updated for every release of the Unicode Standard. Those plain text files contain information about the properties of every Unicode character. All files for the most up-to-date version of the UCD are always located at https://www.unicode.org/Public/UCD/latest/ on the Unicode website. This location also includes the Unihan database.
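
Because the UCD files are plain text, they are easy to read with a few lines of code. The following is a minimal sketch in Python, assuming the semicolon-separated, 15-field layout of UnicodeData.txt described in UAX #44. The `UnicodeDataRecord` class and the `parse_unicode_data_line` function are hypothetical names introduced for illustration, and only a few of the fields are kept. The sample line is the entry for U+0041; in practice the lines would be read from a downloaded copy of the file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnicodeDataRecord:
    """A few fields from one UnicodeData.txt line (illustrative, not complete)."""
    code_point: int
    name: str
    general_category: str
    simple_lowercase: Optional[int]


def parse_unicode_data_line(line: str) -> UnicodeDataRecord:
    """Parse one semicolon-separated line, assuming the 15-field layout."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    # Field 13 is the simple lowercase mapping; it is empty when there is none.
    lower = int(fields[13], 16) if fields[13] else None
    return UnicodeDataRecord(
        code_point=int(fields[0], 16),  # field 0: code point in hexadecimal
        name=fields[1],                 # field 1: character name
        general_category=fields[2],     # field 2: General_Category value
        simple_lowercase=lower,
    )


# The UnicodeData.txt entry for U+0041 LATIN CAPITAL LETTER A.
SAMPLE_LINE = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"

if __name__ == "__main__":
    record = parse_unicode_data_line(SAMPLE_LINE)
    print(f"U+{record.code_point:04X}", record.name, record.general_category)
```

Note that UnicodeData.txt lists some large blocks, such as the CJK ideograph ranges, as a pair of “First” and “Last” lines rather than one line per character, so a complete reader would need to handle those ranges as well. Many programming languages also ship property data derived from the UCD (Python's standard `unicodedata` module is one example), which is usually more convenient than parsing the files directly.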

The details can be found in UAX #44, Unicode Character Database. (The Unihan database is documented in UAX #38, The Unicode Han Database (Unihan).) See also the FAQ page.
