<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><base href="https://www.unicode.org/reports/tr23/tr23-15.html">
<title>UTR #23: The Unicode Character Property Model</title>
<link rel="stylesheet" type="text/css" href="https://www.unicode.org/reports/reports-v2.css">
<style type="text/css">
<!--
blockquote.tus { border-style:solid; border-width:.25pt; background-color:#F0F0F0; padding-left:.75em; padding-right:.5em; font-size:90% }
dd { margin-bottom: 0.75em }
-->
</style>
</head>
<body>
<table class="header">
<tr>
<td class="icon" style="width:38px; height:35px">
<a href="https://www.unicode.org/">
<img border="0" src="https://www.unicode.org/webscripts/logo60s2.gif" align="middle"
alt="[Unicode]" width="34" height="33"></a>
</td>
<td class="icon" style="vertical-align:middle">
<a class="bar"> </a>
<a class="bar" href="https://www.unicode.org/reports/"><font size="3">Technical Reports</font></a>
</td>
</tr>
<tr>
<td colspan="2" class="gray"> </td>
</tr>
</table>
<!-- BEGIN OF DOCUMENT TITLE, DATE AND VERSION -->
<div class="body">
<h2 align="center">Unicode® Technical Report #23</h2>
<h1 align="center">The Unicode Character Property Model</h1>
<table class="simple" width="90%">
<tr>
<td>Editors</td>
<td>Ken Whistler (<a href="mailto:ken@unicode.org">ken@unicode.org</a>),
Asmus Freytag (<a href="mailto:asmus@unicode.org">asmus@unicode.org</a>)</td>
</tr>
<tr>
<td>Date</td>
<td>2022-11-09</td>
</tr>
<tr>
<td>This Version</td>
<td><a href="https://www.unicode.org/reports/tr23/tr23-15.html">
https://www.unicode.org/reports/tr23/tr23-15.html</a></td>
</tr>
<tr>
<td>Previous Version</td>
<td><a href="https://www.unicode.org/reports/tr23/tr23-13.html">
https://www.unicode.org/reports/tr23/tr23-13.html</a></td>
</tr>
<tr>
<td>Latest Version</td>
<td><a href="https://www.unicode.org/reports/tr23/">https://www.unicode.org/reports/tr23/</a></td>
</tr>
<tr>
<td>Revision</td>
<td><a href="#Modifications">15</a></td>
</tr>
</table>
<!-- BEGIN OF DOCUMENT FRONT MATTER -->
<h4>Summary</h4>
<p><i>This document presents a conceptual model of character properties
defined in the Unicode Standard. The model also covers properties for enumerated character sequences as well as string functions.</i></p>
<h4>Status</h4>
<!-- NOT YET APPROVED
<p class="changed"><i>This document is a <b><font color="#ff3333"> proposed update
of a previously approved Unicode Technical Report</font></b>. This document
may be updated, replaced, or superseded by other documents at any time.
Publication does not imply endorsement by the Unicode Consortium. This
is not a stable document; it is inappropriate to cite this document as other
than a work in progress.</i></p>
END NOT YET APPROVED -->
<!-- APPROVED -->
<p><i>This document has been reviewed by Unicode members
and other interested
parties, and has been approved for publication by the Unicode Consortium.
This is a stable document and may be used as reference material or cited as
a normative reference by other specifications.</i></p>
<!-- END APPROVED -->
<blockquote>
<p><i><b>A Unicode Technical Report (UTR)</b> contains
informative material. Conformance to the Unicode Standard does not
imply conformance to any UTR. Other specifications, however, are
free to make normative references to a UTR.</i></p>
</blockquote>
<p><i>Please submit corrigenda and other comments with the online reporting
form [<a href="https://www.unicode.org/reporting.html">Feedback</a>].
Related information that is useful in understanding this document is found in the
<a href="#References">References</a>.
For the latest version of the Unicode Standard, see [<a href="https://www.unicode.org/versions/latest/">Unicode</a>].
For a list of current Unicode Technical Reports, see [<a href="https://www.unicode.org/reports/">Reports</a>].
For more information about versions of the Unicode Standard, see [<a href="https://www.unicode.org/versions/">Versions</a>].</i></p>
<h3><i>Contents</i></h3>
<ol class="toc">
<li><a href="#Scope">Scope</a></li>
<li><a href="#Overview">Overview</a>
<ul class="toc">
<li>2.1 <a href="#Origin">Origin of Character Properties</a></li>
<li>2.2 <a href="#Context">Character Behavior in Context</a></li>
<li>2.3 <a href="#Relation">Relation of Character Properties to Algorithms</a></li>
<li>2.4 <a href="#CodePointProperties">Code Point Properties and
Abstract Character Properties</a></li>
<li>2.5 <a href="#StringProperties">Properties Applied to Strings</a></li>
<li>2.6 <a href="#Normative">Normative Properties</a></li>
<li>2.7 <a href="#Informative">Informative Properties</a></li>
<li>2.8 <a href="#Referring">Referring to Properties</a></li>
<li>2.9 <a href="#CharacterDatabase">The Unicode Character Database</a></li>
</ul>
</li>
<li><a href="#Definitions">Definitions</a>
<ul class="toc">
<li>3.1 <a href="#PropertiesDefinitions">Properties and Property Values</a></li>
<li>3.2 <a href="#PropertyValueTypeDefinitions">Types of Property Values</a></li>
<li>3.3 <a href="#PropertyTypeDefinitions">Types of Properties</a></li>
<li>3.4 <a href="#ConformanceStatusDefinitions">Conformance Status of Properties</a></li>
<li>3.5 <a href="#PropertyClassificationDefinitions">Classification of Properties</a></li>
<li>3.6 <a href="#StringDefinitions">Strings</a></li>
<li>3.7 <a href="#PropertyStringsDefinitions">Properties of Strings</a></li>
<li>3.8 <a href="#StringFunctionsDefinitions">String Functions</a></li>
<li>3.9 <a href="#StringFunctionClassificationDefinitions">Classification of
String Functions</a></li>
<li>3.10 <a href="#OtherDefinitions">Other Definitions</a></li>
</ul>
</li>
<li><a href="#Conformance">Conformance-related Considerations</a>
<ul class="toc">
<li>4.1 <a href="#Requirements">Conformance Requirements</a></li>
<li>4.2 <a href="#Algorithms">Algorithms and Character Properties</a></li>
<li>4.3 <a href="#Overriding">Overriding Properties and Higher-level
Protocols</a></li>
</ul>
</li>
<li><a href="#Maintenance">Updating Character Properties and Extending the
Standard</a>
<ul class="toc">
<li>5.1 <a href="#Updating">Updating Properties</a></li>
<li>5.2 <a href="#Guarantees">Stability Guarantees</a></li>
<li>5.3 <a href="#Consistency">Consistency of Properties</a></li>
<li>5.4 <a href="#Provisional">Provisional Properties</a></li>
<li>5.5 <a href="#Unmaintained">Stabilized Properties</a></li>
</ul>
</li>
<li><a href="#SpecialValues">Special Property Values</a>
<ul class="toc">
<li>6.1 <a href="#NA">Not Applicable Value</a> </li>
<li>6.2 <a href="#Default">Default Values</a></li>
<li>6.3 <a href="#Preliminary">Preliminary Property Assignments</a></li>
</ul>
</li>
</ol>
<ul class="toc">
<li><a href="#References">References</a></li>
<li><a href="#Acknowledgements">Acknowledgements</a></li>
<li><a href="#Modifications">Modifications</a></li>
</ul>
<hr>
<h2>1. <a name="Scope" href="#Scope">Scope</a></h2>
<p>This report presents a general overview and
typology of character properties and property values, as well as of
properties of enumerated character sequences and of string functions.
This description of the Unicode character property model is not intended to
supersede the normative information on properties in The Unicode
Standard [<a href="#Unicode">Unicode</a>], nor the existing body
of technical reports and documentation files in the Unicode Character
Database [<a href="#UCDDoc">UCDDoc</a>] that provide detailed descriptions for
particular character properties or properties of enumerated character sequences and
string functions. Instead, it focuses on the overall model behind all of these and
on the aspects they have in common.</p>
<p>This report specifically covers formal <b>character properties</b>, which
are those attributes of characters specified according to the
definitions set forth in this report. Such formal character properties are only a subset
of character properties in the generic sense, and they further subdivide into those properties
defined in the Unicode Standard or Unicode Character Database, and those defined by related
standards. Also included in the scope are formal <strong>properties of enumerated
character sequences</strong> and <strong>string functions</strong>.</p>
<h2>2. <a name="Overview" href="#Overview">Overview</a></h2>
<p>At its most basic, a character
property relates a character to a value. Thus, a property can be considered
a function that maps from code points to specific
property values. These concepts can be readily extended to mapping a specific sequence
of characters to a property value, or to generic string functions that algorithmically
map arbitrary strings or substrings to property values. To keep the discussion simple,
the basic concepts are introduced in the context of properties of individual characters
or code points.</p>
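<p>As a minimal sketch of this view, the following Python fragment treats one property, General_Category, as a function from code points to values, using the standard library's <code>unicodedata</code> module. The wrapper name <code>general_category</code> is illustrative only, and the values returned depend on the version of the UCD bundled with the interpreter.</p>

```python
import unicodedata


def general_category(cp: int) -> str:
    """Map a code point to its General_Category value (short alias)."""
    return unicodedata.category(chr(cp))


# Apply the property function to each code point of a short string.
text = "G7 "
values = [general_category(ord(ch)) for ch in text]
print(values)  # ['Lu', 'Nd', 'Zs']

# The function is defined even for an unassigned code point (U+0378).
print(general_category(0x0378))  # Cn
```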
<h3>2.1 <a name="Origin" href="#Origin">Origin of Character Properties</a></h3>
<p>The Unicode Standard views character semantics as inherent to the
definition of a character, and conformant processes are required to take these
into account when interpreting characters. </p>
<blockquote class="tus">
<p><i>D3 Character semantics:</i> The semantics of a character are
determined by its identity, normative properties, and behavior.</p>
</blockquote>
<blockquote>
<p><b>Note:</b> Quotations from the core specification of the Unicode Standard
are cited in this indented boxed style for clarity. Definition numbers or conformance
clause numbers in those citations are as in the core specification.</p>
</blockquote>
<p>The assignment of character semantics in the Unicode Standard is based on
character behavior. Other character set standards leave it to the
implementer, or to unrelated secondary standards, to assign character
semantics to characters. In contrast, the Unicode Standard supplies a
rich set of character attributes, called properties, for each character
contained in it. Many properties are specified in relation to
processes or algorithms that interpret them, in order to implement the
character behavior. There are character behaviors that are specific to a
particular text process and that have not been formally defined in the
Unicode Standard. Implementations often provide internal definitions of
character properties to achieve the desired behavior. Implementers may find
many of the concepts discussed here applicable to such cases.</p>
<h3>2.2 <a name="Context" href="#Context">Character Behavior in Context</a></h3>
<p>The interpretation of some properties (such as whether a character is a
digit or not) is
largely independent of context, whereas the interpretation of others (such as
directionality) is applicable to a character sequence as a whole, rather than
to the individual characters that compose the sequence.</p>
<p>Other examples that require context include title casing and the
classification of
punctuation or symbols for
script assignments. The line breaking rules
of <i><a href="https://www.unicode.org/reports/tr14/">UAX<span class="changedspan"> </span>#14
Unicode Line Breaking Algorithm</a>
</i>[<a href="#LineBreak">LineBreak</a>]
involve character pairs and triples, and in certain cases, longer sequences.
The glyph(s) defined by a combining character sequence are the result of
contextual analysis in the display shaping engine. Isolated character
properties typically only tell part of the story. Characters that are constituent
elements of an enumerated list of character sequences obviously exist in the context
of such sequences. However, the properties defined for the specific, enumerated lists of
sequences discussed below are different from the kind of algorithmic context discussed
here. In fact, algorithms may be defined to evaluate the contexts surrounding not only
individual characters or code points, but also the context surrounding certain
enumerated character sequences.</p>
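<p>The contrast between context-independent and context-dependent behavior can be observed with Python's <code>unicodedata</code> module and built-in string methods. In this sketch, the digit value of U+0037 does not depend on its neighbors, whereas lowercasing U+03A3 GREEK CAPITAL LETTER SIGMA does: CPython's <code>str.lower()</code> applies the Final_Sigma condition from the Unicode case-mapping data, so the result is U+03C2 at the end of a word and U+03C3 elsewhere. The choice of examples is illustrative and is not taken from this report.</p>

```python
import unicodedata

# Digit value: the same for U+0037 whatever surrounds it.
digit_values = {unicodedata.digit(s[1]) for s in ("a7b", " 7 ", "x7y")}
print(digit_values)  # {7}

# Lowercase mapping of capital sigma depends on its position in the word.
word_final = "\u039f\u0394\u039f\u03a3".lower()[-1]  # sigma at the end of a word
word_initial = "\u03a3\u0391".lower()[0]             # sigma at the start of a word
print(f"U+{ord(word_final):04X} U+{ord(word_initial):04X}")  # U+03C2 U+03C3
```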
<p>In some cases, the expected character behavior depends on external context,
such as the type and nature of the document, the language of the text, or the
cultural expectations of the user. Properties modeling such behaviors
may be specified in separate standards, as is the case for
the <i><a href="https://www.unicode.org/reports/tr10/">UTS #10 Unicode Collation Algorithm</a></i>
[<a href="#UCA">UCA</a>]. Where a reasonably generic set of property values
can be assigned, for example for [<a href="#LineBreak">LineBreak</a>], such properties may
be defined as part of [<a href="#Unicode">Unicode</a>].
Such properties and any algorithms related to them define useful default
behavior, which can be further customized or tailored to meet more specific
requirements.</p>
<h3>2.3 <a name="Relation" href="#Relation">Relation of Character Properties to Algorithms</a></h3>
<p>When modeling character behavior with computer processes, formal character
properties are assigned to achieve the expected results. Such
modeling depends heavily on the algorithms used to produce these results. In some cases, a given character
property is specified in close conjunction with a detailed specification of an
algorithm. In other cases, algorithms are implied but not specified, or there
are several algorithms that can make use of the same general character
property, such as the classification of characters by
General_Category or Indic_Syllabic_Category.
Such general properties may
require occasional implementation-specific adjustments in character property
assignment to make all algorithms work correctly. This can usually be achieved
by overriding specific properties for specific algorithms.
(See also <a href="#Overriding">Section 4.3</a> "Overriding Properties via Higher-level
Protocols".)</p>
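<p>One way to picture such an override is as a small, algorithm-specific table consulted before the UCD value, leaving the shared property data untouched. In the sketch below, the override table, the function name <code>effective_category</code>, and the scenario (a hypothetical word-segmentation routine for Catalan that wants U+00B7 MIDDLE DOT, normally Po, treated like a letter) are all assumptions made for illustration.</p>

```python
import unicodedata

# Hypothetical override table used by one specific algorithm only.
CATALAN_WORD_OVERRIDES = {"\u00b7": "Lo"}  # MIDDLE DOT: Po treated as Lo


def effective_category(ch, overrides=None):
    """Return the algorithm's override if one exists, else the UCD value."""
    if overrides and ch in overrides:
        return overrides[ch]
    return unicodedata.category(ch)


word = "col\u00b7lecci\u00f3"
default_view = [effective_category(c) for c in word]
catalan_view = [effective_category(c, CATALAN_WORD_OVERRIDES) for c in word]
print(default_view[3], catalan_view[3])  # Po Lo
```

<p>Other processes that call <code>effective_category</code> without the table continue to see the unmodified General_Category value.</p>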
<p>When assigning character properties for use with a given algorithm, it may
be tempting to assign somewhat arbitrary values to some characters, as long as
the algorithm happens to produce the expected results. Proceeding in
this way hides the nature of the character and limits the re-use of character
properties by related processes. Therefore, instead of tweaking the properties
to simply make a particular algorithm easier, the Unicode Standard pays
careful attention to the essential underlying linguistic identity of the
character. However, not all aspects of a character’s identity are relevant in
all circumstances, and some characters can be used in many different ways,
depending on context or circumstance. This means the formal character
properties alone are not sufficient to describe the complete range of
desirable or acceptable character behaviors.</p>
<blockquote>
<p><b>Note:</b> In some cases, the relevant algorithm is not defined in the
Unicode Standard. For example, the algorithm that converts strings of digits into
numerical values is not defined in the Unicode Standard, but
implementations will nevertheless refer to the Numeric_Value property.</p>
</blockquote>
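<p>Because the conversion algorithm is left to implementations, the following is only one possible sketch: a positional base-10 conversion that reads each character's decimal digit value from the UCD via <code>unicodedata.decimal</code>. The function name <code>parse_decimal</code> is hypothetical, and the sketch deliberately does not decide whether digits from different scripts may be mixed, which a real implementation might want to reject.</p>

```python
import unicodedata


def parse_decimal(digits: str) -> int:
    """Convert a string of decimal digits (from any script) to an integer.

    unicodedata.decimal raises ValueError for a character that has no
    decimal digit value, so non-digits are rejected.
    """
    value = 0
    for ch in digits:
        value = value * 10 + unicodedata.decimal(ch)
    return value


print(parse_decimal("123"))                 # 123
print(parse_decimal("\u0661\u0662\u0663"))  # 123 (ARABIC-INDIC DIGITs)
print(parse_decimal("\u0967\u0966"))        # 10 (DEVANAGARI DIGITs)
```

<p>With an empty string the sketch returns 0, which a real parser would probably treat as an error.</p>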
<h3>2.4 <a name="CodePointProperties" href="#CodePointProperties">Code Point Properties and Abstract Character Properties</a></h3>
<p>Code point properties are properties of code points per se: in
a character encoding standard these are independent of any assignment of actual
abstract characters to those code points. In most character encoding standards, these are
trivial, but in the Unicode Standard they are not. </p>
<p>Examples of code point properties include:
</p>
<ul>
<li>Code point XXX is a surrogate code point.</li>
<li>Code point XXX is a private use code point.</li>
<li>Code point XXX is a reserved code point.</li>
<li>Code point XXX is reserved for encoding format control characters.</li>
<li>Code point XXX is earmarked for encoding an RTL script.</li>
<li>Code point XXX is a Pattern_Syntax code point.</li>
<li>Code point XXX is a Pattern_White_Space code point.</li>
<li>Code point XXX is located on Plane 1.</li>
</ul>
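<p>Code point properties of this kind can be computed from the code point value alone. The sketch below expresses three of the examples above (surrogate, private use, and plane number) as Python functions; the function names are hypothetical, while the ranges are those defined in the Unicode Standard. As a consistency check, it compares them with the General_Category values Cs and Co reported by <code>unicodedata</code>.</p>

```python
import unicodedata

PRIVATE_USE_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


def plane(cp: int) -> int:
    """Each plane contains 0x10000 code points."""
    return cp >> 16


def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_private_use(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


# Check agreement with General_Category for a few sample code points.
for cp in (0x0047, 0xD800, 0xE000, 0x1F600, 0x10FFFD):
    cat = unicodedata.category(chr(cp))
    assert is_surrogate(cp) == (cat == "Cs")
    assert is_private_use(cp) == (cat == "Co")
    print(f"U+{cp:04X}: plane {plane(cp)}, category {cat}")
```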
<p>These statements remain true of a code point whether or not a
particular abstract character is assigned to it.
For example, they track the status of a code point:
whether an abstract character is assigned to it or can be assigned to it, and so on.
Essentially, whenever code points are designated or ranges are reserved in
some way, code point properties are assigned.</p>
<p>Character properties are those properties that abstract
characters have independent of any consideration of their encoding.</p>
<p>Examples of character properties, not limited to formal properties, include:</p>
<ul>
<li>G is an alphabetic character.</li>
<li>G is in the Latin script.</li>
<li>G is an uppercase letter.</li>
<li>G is not used in hexadecimal expressions.</li>
<li>G collates after F in the English alphabet.</li>
<li>G was putatively invented by Spurius Carvilius Ruga ca. 230 BCE.</li>
<li>G commonly represents the voiced velar stop in orthographies.</li>
<li>G is not a punctuation character.</li>
<li>G denotes giga in the SI system of nomenclature.</li>
<li>G has no diacritic.</li>
<li>G is a base character.</li>
<li>G is not a combining character.</li>
</ul>
<p>By virtue of encoding the abstract character LATIN CAPITAL LETTER G
at the code point U+0047, this universe of character properties, some known
and obvious, others obscure or even undiscovered, is associated with that code point. </p>
<p>Some of those character properties are generic and systematic
enough to be useful or even necessary in the implementation of general text processing algorithms
— those are the ones that the Unicode Standard formalizes as properties in the
Unicode Character Database. </p>
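<p>For illustration, several of the statements listed above for G correspond to formal properties that Python's <code>unicodedata</code> module exposes, while others (such as its history or its place in the English alphabet) have no counterpart in the UCD. The module does not expose the Script or Hex_Digit properties, so the check on hexadecimal use below substitutes Python's ASCII-only <code>string.hexdigits</code> as a stand-in for ASCII_Hex_Digit, an assumption that is adequate only for this ASCII example.</p>

```python
import string
import unicodedata

g = "G"
print(unicodedata.name(g))                 # LATIN CAPITAL LETTER G
print(unicodedata.category(g))             # Lu: an uppercase letter, not punctuation
print(unicodedata.combining(g))            # 0: not a combining mark
print(repr(unicodedata.decomposition(g)))  # '': no decomposition mapping
print(g in string.hexdigits)               # False: stand-in for ASCII_Hex_Digit
```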
<p>General text processing algorithms and the
programming APIs through which they are accessed must be prepared to deal
with any code point, even one to which no character was assigned at the
time the implementation was created. As a result, they nearly always need to
handle each and every code point properly for any character property, even if they
only associate a property value of 'unknown' or 'inapplicable' with unassigned
or unsupported code points.</p>
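<p>Python's <code>unicodedata.category</code> is one example of an API that already behaves this way: it returns a value for every code point from U+0000 to U+10FFFF, using Cn (Unassigned) for code points with no assigned character. Other functions in the same module, such as <code>unicodedata.name</code>, instead raise <code>ValueError</code> unless the caller supplies a fallback, which puts the choice of an 'unknown' value in the caller's hands. The two counts printed below depend only on the fixed surrogate and private use ranges; counts for other categories vary with the UCD version.</p>

```python
import unicodedata
from collections import Counter

# Query General_Category for every code point in the codespace.
counts = Counter(unicodedata.category(chr(cp)) for cp in range(0x110000))
assert sum(counts.values()) == 0x110000  # one value per code point

print(counts["Cs"], counts["Co"])        # 2048 137468
print(unicodedata.category("\u0378"))    # Cn: U+0378 is unassigned

# name() has no built-in value for unassigned code points; the caller supplies one.
print(unicodedata.name("\u0378", "(unassigned)"))  # (unassigned)
```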
<p>This requirement leads to the use of the unifying concept
of <strong>Encoded Character Property</strong> in the Unicode character property model. An
encoded character property combines the concept of a code point property
associating ranges of code points with default values of a property, with
the concept of a character property associating specific values with the
assigned characters. This unified model correlates well with the reality of
Unicode-based implementations, which must supply some value for each and
every code point. In addition, this unified concept simplifies most of the
definitions that are built on top of it, since it is no longer necessary to
separately account for definitions applying to character properties vs. code
point properties.</p>
<h3>2.5 <a name="StringProperties" href="#StringProperties">Properties Applied to Strings</a></h3>
<p>Character and code point properties are defined such that all assigned characters and
all code points have a defined property value, even if that value is "N/A"
("does not apply"). Assigned characters and code points each form a finite set.
This is generally not true for strings. Because there is no inherent, fixed limit to the
length of a string, the number of possible sequences is in principle not bounded. Some
properties for strings can be described algorithmically, via String Functions, and such
properties can be said to apply to every possible string. Other properties apply only to
a specific set of strings that is listed explicitly. In this latter case, the properties
are referred to as properties of an <strong>enumerated set of strings</strong>. These
concepts are elaborated below in
Section 3.6, <a href="#StringDefinitions">Strings</a>, and
Section 3.7, <a href="#PropertyStringsDefinitions">Properties of Strings</a>.</p>
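<p>The two cases can be contrasted in a short Python sketch. A string function such as "is this string in Normalization Form C?" is computed algorithmically and is therefore defined for any string. A property of an enumerated set, by contrast, is modeled as membership in an explicit list. The three keycap sequences below are only an illustrative sample chosen for this sketch, not the complete list in the Unicode data files, and the function names are hypothetical.</p>

```python
import unicodedata


def is_nfc(s: str) -> bool:
    """String function: defined for every string, computed by an algorithm."""
    return unicodedata.normalize("NFC", s) == s


# Enumerated set: an illustrative sample of emoji keycap sequences
# (# or digit, U+FE0F VARIATION SELECTOR-16, U+20E3 COMBINING ENCLOSING KEYCAP).
KEYCAP_SAMPLE = frozenset({"#\ufe0f\u20e3", "0\ufe0f\u20e3", "1\ufe0f\u20e3"})


def in_keycap_sample(s: str) -> bool:
    """Property of an enumerated set: true only for strings listed explicitly."""
    return s in KEYCAP_SAMPLE


print(is_nfc("\u00e9"), is_nfc("e\u0301"))                              # True False
print(in_keycap_sample("1\ufe0f\u20e3"), in_keycap_sample("1\u20e3"))  # True False
```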
<h3>2.6 <a name="Normative" href="#Normative">Normative Properties</a></h3>
<p>In Chapter 3, <i>Conformance</i>, The Unicode Standard [<a href="#Unicode">Unicode</a>]
defines a <em>Normative Property</em> as "a Unicode character property used in
the specification of the standard" (definition <em>D33</em>) and provides the
following explanation:</p>
<blockquote class="tus">
<p ALIGN="JUSTIFY">Specification that a character property is <i>normative</i>
means that implementations which claim conformance to a particular version
of the Unicode Standard and which make use of that particular property must
follow the specifications of the standard for that property for the
implementation to be conformant. For example, the Bidi_Class property is required for conformance whenever
rendering text that requires bidirectional layout, such as Arabic or Hebrew.</p>
<p ALIGN="JUSTIFY">Whenever a normative process depends on a
property in a specified way, that property is designated as
normative.</p>
<p ALIGN="JUSTIFY">The fact that a given Unicode character property
is normative does <i>not</i> mean that the values of the property will
never change for particular characters. Corrections and extensions
to the standard in the future may require minor changes to normative
values, even though the Unicode Technical Committee strives to
minimize such changes...</p>
<p ALIGN="JUSTIFY">Some of the normative Unicode algorithms depend
critically on particular property values for their behavior.
Normalization, for example, defines an aspect of textual
interoperability that many applications rely on to be absolutely
stable. As a result, some of the normative properties disallow any
kind of overriding by higher-level protocols. Thus the
decomposition of Unicode characters is both normative and <i>not
overridable</i>; no higher-level protocol may override these values,
because to do so would result in non-interoperable results for the
normalization of Unicode text. Other normative properties, such as
case mapping, are <i>overridable</i> by higher-level protocols,
because their intent is to provide a common basis for behavior.
Nevertheless, they may require tailoring for particular local cultural conventions
or particular implementations.</p>
</blockquote>
<p>By making a property normative and non-overridable, the Unicode Standard guarantees that
conformant implementations can rely on other conformant
implementations to interpret the character in the same way. This is most
useful for those properties where the Unicode Standard provides precise rules
for the interpretation of characters based on their properties, such as
the decompositions and their use by the Normalization Forms [<a href="#Normal">Normal</a>].</p>
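<p>As a small illustration of that dependency, the decomposition mapping of U+00E9 LATIN SMALL LETTER E WITH ACUTE, as exposed by Python's <code>unicodedata</code> module, determines the result of normalizing it to NFD, and composing that result again yields the original character. Because decomposition mappings may not be overridden, every conformant implementation is expected to produce the same code point sequences here.</p>

```python
import unicodedata

e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.decomposition(e_acute))  # 0065 0301

nfd = unicodedata.normalize("NFD", e_acute)
print(" ".join(f"{ord(c):04X}" for c in nfd))  # 0065 0301

print(unicodedata.normalize("NFC", nfd) == e_acute)  # True
```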
<blockquote>
<p><b>Note</b>: One trivial but important example of a conformant
implementation is runtime access to information from the Unicode Character Database
[<a href="#UCD">UCD</a>]. For
normative properties exposed by a conformant implementation,
conformance requires the returned values to match the values defined by
the Unicode Consortium.</p>
</blockquote>
<p>For some character properties, such as General_Category, the Unicode
Standard does not define what model of processing the property is intended to
support, nor does it specify the required consequences of a character being
defined as
"Other_Letter" (Lo) as opposed to "Other_Symbol" (So), for example. In the
absence of such a definition, the only effect of conformance that can be
rigorously tested
is whether a conformant implementation of a character property
function returns the correct
value to its caller. However, many implementations use such normative
properties for their own purposes, and guaranteed access to this information
helps interoperability.</p>
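<p>A test of that kind can be sketched by comparing an implementation's property functions against records from UnicodeData.txt. The Python fragment below checks a single record, copied by hand, against the standard <code>unicodedata</code> module; the function name <code>mismatched_fields</code> is hypothetical. A full test would read every record of the data file for the same UCD version as the implementation (reported by <code>unicodedata.unidata_version</code>) and would also need to handle range records, such as those for CJK ideographs, whose name fields are placeholders.</p>

```python
import unicodedata

# One record from UnicodeData.txt; fields are separated by ';'.
SAMPLE_RECORD = "0047;LATIN CAPITAL LETTER G;Lu;0;L;;;;;N;;;;0067;"


def mismatched_fields(record: str) -> list:
    """Return the names of fields whose values differ from unicodedata's."""
    fields = record.split(";")
    ch = chr(int(fields[0], 16))
    checks = {
        "Name": (fields[1], unicodedata.name(ch)),
        "General_Category": (fields[2], unicodedata.category(ch)),
        "Canonical_Combining_Class": (int(fields[3]), unicodedata.combining(ch)),
        "Bidi_Class": (fields[4], unicodedata.bidirectional(ch)),
        "Decomposition_Mapping": (fields[5], unicodedata.decomposition(ch)),
        "Bidi_Mirrored": (fields[9] == "Y", unicodedata.mirrored(ch) == 1),
    }
    return [prop for prop, (expected, actual) in checks.items() if expected != actual]


print(mismatched_fields(SAMPLE_RECORD))  # []
```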
<p> For information on which properties are
normative, see the documentation
file for the Unicode Character Database [<a href="#UCDDoc">UCDDoc</a>].</p>
<p> For more information on overriding normative properties, see
Section 4.3 <a href="#Overriding"><i>Overriding Properties via
Higher-level Protocols</i></a>.</p>
<h3>2.7 <a name="Informative" href="#Informative">Informative Properties</a></h3>
<p>The Unicode Standard [<a href="#Unicode">Unicode</a>]
defines an <em>Informative Property</em> as "a Unicode character property whose
values are provided for information only" (definition <em>D35</em>) and
provides the following explanation:</p>
<blockquote class="tus">
<p align="justify">A conformant implementation is free to use or change informative property values as it
may require, while remaining conformant to the standard. An implementer has the option of establishing a
protocol to convey that particular informative
properties are being used in distinct ways. </p>
<p align="justify">Informative properties capture expert implementation experience. When an informative property is
explicitly specified in the Unicode Character Database, its use is strongly <i>
recommended</i> for implementations to encourage comparable behavior between
implementations. Note that it is possible for an informative property in one
version of the Unicode Standard to become a normative property in a
subsequent version of the standard if its use starts to acquire conformance
implications in some part of the standard. [emphasis added].</p>
</blockquote>
<p>Properties may be informative for two main reasons:</p>
<ol>
<li>The exact nature or applicability of the property may be unclear. In some cases, the precise set of characters to which it
applies may also not be well-determined.</li>
<li>Existing implementations show a range of behaviors for the same
character, many or all of which may be equally useful choices on the part of
their designers.</li>
</ol>
<p>In some cases, properties are too tentative to be published as
informative properties. In that case they may be explicitly designated as <i>
provisional</i>.</p>
<h3>2.8 <a name="Referring" href="#Referring"> Referring to Properties</a></h3>
<p>The Property Aliases [<a href="#Alias">Alias</a>] and Property Value
Aliases [<a href="#ValueAlias">ValueAlias</a>] define
a set of names and abbreviations, called <em>aliases</em>, that are used to refer to properties and
property values. These names can be used for XML formats of data in
the <a href="https://www.unicode.org/ucd/">Unicode
Character Database</a> [<a href="#UCD">UCD</a>], for regular-expression
property tests, and for other programmatic textual descriptions of Unicode data.
The names themselves are not normative, except where they correspond to
normative properties in the UCD. However, other standards may make normative
references to both normative and informative aliases. For more information, see
<a href="https://www.unicode.org/reports/tr18/">UTS
#18: <i>Unicode Regular Expressions</i></a> [<a href="#RegEx">RegEx</a>].</p>
<p>There is one abbreviated name and one long
name for most of the properties. Additional aliases may be added at
any time. The property <em>value</em> names are <i>not</i> unique across properties. For
example, <b>AL</b> means Arabic_Letter for the Bidi_Class property, and <b>AL</b>
means Above_Left for the Canonical_Combining_Class property, and <b>AL</b> means
Alphabetic for the Line_Break property. In addition, some property names may
be the same as some property value names. For example, <b>cc</b> means
Combining_Class property, and <b>cc</b> means the General_Category property
value Control. The combination of property value and property name is,
however, unique. </p>
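<p>Because a value alias alone is ambiguous, a lookup table for value aliases can be keyed by the pair of property alias and value alias. The Python sketch below hand-copies four entries corresponding to the examples above; the table and function names are hypothetical, and a real implementation would load the complete data from Property Value Aliases [<a href="#ValueAlias">ValueAlias</a>].</p>

```python
# Hand-copied excerpt: (property alias, value alias) -> long value name.
VALUE_ALIASES = {
    ("bc", "AL"): "Arabic_Letter",
    ("ccc", "AL"): "Above_Left",
    ("lb", "AL"): "Alphabetic",
    ("gc", "Cc"): "Control",
}


def long_value_name(prop: str, value: str) -> str:
    """Resolve a value alias; the property is needed to disambiguate it."""
    return VALUE_ALIASES[(prop, value)]


# The same value alias "AL" has three different meanings.
meanings = sorted(name for (prop, alias), name in VALUE_ALIASES.items() if alias == "AL")
print(meanings)  # ['Above_Left', 'Alphabetic', 'Arabic_Letter']
```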
<p>The aliases may be translated in appropriate environments, and additional
aliases may be used. The case distinctions, whitespace, and '_' in the
property names are not normative. Unless a specific form is required in a
particular application, all forms are equivalent. For further information, see Section 5.9 <a href="https://www.unicode.org/reports/tr44/#Matching_Rules">Matching Rules</a> in <a href="https://www.unicode.org/reports/tr44/">UAX #44 Unicode Character Database</a> [<a href="#UCDDoc">UCDDoc</a>].</p>
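<p>A minimal sketch of that equivalence in Python removes the distinctions named above (case, whitespace, and '_') before comparing two names. The function names are hypothetical, and the sketch covers only the rules stated here; the matching rule UAX44-LM3 in UAX #44 is authoritative and also ignores hyphens and an initial "is" prefix.</p>

```python
import re


def loose_key(name: str) -> str:
    """Drop whitespace and underscores, then fold case."""
    return re.sub(r"[\s_]+", "", name).casefold()


def same_name(a: str, b: str) -> bool:
    return loose_key(a) == loose_key(b)


print(same_name("General_Category", "general category"))  # True
print(same_name("Line_Break", "LINEBREAK"))               # True
print(same_name("gc", "sc"))                              # False
```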
<p>[<a href="#Unicode">Unicode</a>] Section 3.1 gives
a prescription for referencing properties: </p>
<blockquote class="tus">
<p><b><i> References to Unicode Character Properties</i></b></p>
<p> Properties and property values have defined names and
abbreviations, such as</p>
<blockquote>
<p>Property: General_Category (gc)<br>
Property Value: Uppercase_Letter (Lu)</p>
</blockquote>
<p>To reference a given property and
property value, these aliases are used, as in this example:</p>
<blockquote>
<p>The property value
Uppercase_Letter from the General_Category property, as
specified in Version 14.0.0 of the Unicode Standard.</p>
</blockquote>
<p>Then cite that version of the
standard, using the standard citation format that is provided for each version
of the Unicode Standard.</p>
</blockquote>
<p>Additional <a href="https://www.unicode.org/versions/#References" title="Reference Examples">reference examples</a> are available online.</p>
<h3>2.9 <a name="CharacterDatabase" href="#CharacterDatabase">The Unicode Character Database</a></h3>
<p>The Unicode Character Database [<a href="#UCD">UCD</a>] is the main
repository for machine-readable character properties. It consists of a
number of files containing property data along with a documentation file explaining the organization of the database and the format and meaning of the
property data. The main documentation file, "The Unicode
Character Database" [<a href="#UCDDoc">UCDDoc</a>], explains the overall organization of the current
version of the UCD and tells which files contain which properties.</p>
<p>While the Unicode Consortium strives to minimize
changes to character property data, occasionally the character properties for
already encoded characters must be
updated. When this situation occurs, the relevant data files of the Unicode
Character Database are revised. The revised data files are posted on the Unicode
Web site as an update version of the standard.</p>
<p>Visual documentation of code points, character names, and
reference glyphs, together with excerpts from some of the character
properties and additional annotations, can be found in the Character
Code Charts [<a href="#Charts">Charts</a>].</p>
<h2>3. <a name="Definitions" href="#Definitions">Definitions</a></h2>
<dl>
<dt>The following presents a consistent set of definitions related to
character properties. Where possible, these definitions match the formal
definitions in Chapter 3, <i>Conformance,</i> in [<a href="#Unicode">Unicode</a>].
In those cases, the original number of the definition is given at the
end of each definition in square brackets. As much as possible, the
definition numbers in this document will be retained as new definitions are
added. When referring to these definitions in other
contexts, it is customary to prefix the term 'Unicode' to the defined term
to indicate the context. For example, 'Character Property' becomes
'Unicode Character Property', etc.</dt>
</dl>
<h3>3.1 <a name="PropertiesDefinitions" href="#PropertiesDefinitions">Properties and Property Values</a></h3>
<dl>
<dt>PD1. Property</dt>
<dd>A named attribute of an entity in the Unicode Standard, associated with
a defined set of values. [D19]</dd>
<dt>PD2. Code Point Property</dt>
<dd>A property of code points. [D20]</dd>
<dd>A code point property defines a set of values and a mapping from each
Unicode code point to one of the values of the set.</dd>
<dt>PD3. Abstract Character Property</dt>
<dd>A property of abstract characters. [D21]</dd>
<dt>PD4. Encoded Character Property</dt>
<dd>A property of encoded characters in the Unicode Standard. [D22]<br><br>
An encoded character property defines a set of values and a mapping from each
Unicode code point to one of the values of the set.</dd>
<dd>Encoded character properties typically map a default value to any code point not
assigned to a character.</dd>
</dl>
<p><i>In the rest of this document, as in the Unicode Standard, the term
'character property', or the term 'property' without qualifier includes both
character and code point properties and their combined form, the encoded
character properties.</i></p>
<dl>
<dt><i>PD5. Property Value</i></dt>
<dd>One of the set of values associated with a property. [D23 - but there
limited to 'encoded character property']<br>
<br>
For example, the East Asian Width [<a href="#EAW">EAW</a>]
property has the possible values "Ambiguous", "Fullwidth", "Halfwidth",
"Narrow", "Neutral", and "Wide". See [<a href="#Alias">Alias</a>]
and [<a href="#ValueAlias">ValueAlias</a>] for the lists of labels for
properties and for their values, respectively.</dd>
</dl>
<h3>3.2 <a name="PropertyValueTypeDefinitions" href="#PropertyValueTypeDefinitions">Types of Property Values</a></h3>
<dl>
<dt>PD6. Explicit Property Value</dt>
<dd>A value for an encoded character property which is explicitly
associated with a code point in one of the data files of the Unicode
Character Database. [D24]</dd>
<dt>PD7. Implicit Property Value</dt>
<dd>A value for an encoded character property which is given by a generic rule or by an "otherwise" clause in one of the data files of the Unicode Character Database. [D25]</dd>
<dt>PD8. Default Property Value</dt>
<dd>The value (or, in some cases, a small set of values) of a property
associated with unassigned code points or with encoded characters for which the
property is irrelevant. [D26]</dd>
<dd><b>Note:</b> There may be more than one default value per property, with
different values for different ranges of code points, as in the Bidi_Class property.</dd>
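<dd><p>A minimal sketch of range-dependent defaults follows. The table is an illustrative subset of the default Bidi_Class ranges given as <code>@missing</code> lines in DerivedBidiClass.txt, assumed here for illustration only; it omits other ranges as well as the BN default for noncharacters and default ignorable code points. The names <code>DEFAULT_BIDI_RANGES</code> and <code>default_bidi_class</code> are hypothetical.</p>

```python
# Illustrative subset of the default (@missing) Bidi_Class ranges from
# DerivedBidiClass.txt; NOT the complete list, and BN defaults are omitted.
DEFAULT_BIDI_RANGES = [
    (0x0590, 0x05FF, "R"),   # Hebrew block: Right_To_Left
    (0x0600, 0x07BF, "AL"),  # Arabic, Syriac, Arabic Supplement, Thaana: Arabic_Letter
    (0x20A0, 0x20CF, "ET"),  # Currency Symbols block: European_Terminator
]


def default_bidi_class(cp: int) -> str:
    """Return the (simplified) default Bidi_Class for an unassigned code point."""
    for first, last, value in DEFAULT_BIDI_RANGES:
        if first <= cp <= last:
            return value
    return "L"  # Left_To_Right is the default everywhere else in this sketch


print(default_bidi_class(0x05FF), default_bidi_class(0x20CF), default_bidi_class(0x0378))
```
</dd>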
</dl>
<h3>3.3 <a name="PropertyTypeDefinitions" href="#PropertyTypeDefinitions">Types of Properties</a></h3>
<dl>
<dt>PD9. Enumerated Property</dt>
<dd>A property with a small set of
named values. [D27]</dd>
<dd>Enumerated properties, such as the Line_Break property, have a relatively fixed set
of possible values, although the set of values may need to be extended as characters
are added to the Unicode Standard.</dd>
<dt>PD10. Closed Enumeration</dt>
<dd>An enumerated property for which the set of values is closed and will not be extended for future versions of the Unicode Standard.
[D28]</dd>
<dd><b>Note</b>: Currently, the General_Category and Bidi_Class properties are the only closed
enumerations, other than Boolean properties.</dd>
<dt>PD11. Boolean Property</dt>
<dd>A closed enumerated property whose set of values is limited to 'true'
and 'false'. [D29]</dd>
<dd>The presence or absence of the property is the essential
information.<br>
<br>
A Boolean property is sometimes called a 'single valued' property since
'false' often has the meaning of 'this property does not apply'.</dd>
<dt>PD12. Numeric Property</dt>
<dd>A numeric property is a property
whose value is a number that can take on any integer or real value.
[D30]</dd>
<dd>An example is the Numeric_Value property. There is no
implied limit to the number of possible distinct values for the property,
except the limitations on representing integers or real numbers
in computers.</dd>
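<dd><p>For example, Python's <code>unicodedata.numeric()</code> exposes Numeric_Value as a floating-point number and accepts a fallback that stands in for the default (the UCD uses NaN as the default value). The wrapper below is a hypothetical convenience; note that the UCD itself records these values as rational numbers such as 1/2.</p>

```python
import unicodedata
from typing import Optional


def numeric_value(ch: str) -> Optional[float]:
    """Numeric_Value of one character, or None where the UCD default (NaN) applies."""
    return unicodedata.numeric(ch, None)


# DIGIT SEVEN, VULGAR FRACTION ONE HALF, ROMAN NUMERAL TWELVE, LATIN CAPITAL LETTER A
for ch in ("7", "\u00bd", "\u216b", "A"):
    print(f"U+{ord(ch):04X}", numeric_value(ch))
```
</dd>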
<dt>PD13. String-Valued Property </dt>
<dd>A property whose value is a string. [D31]</dd>
<dd>A string-valued property is one for which the <b>co-domain</b>, or set of values, consists of strings. (See PD32.)</dd>
<dd>The Canonical_Decomposition
property is a string-valued property.</dd>
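<dd><p>The following sketch treats the canonical decomposition mapping as a string-valued property: it maps a character to a string. It assumes Python's <code>unicodedata.decomposition()</code>, which returns the raw decomposition field from UnicodeData.txt (for example <code>0041 0301</code>), where compatibility mappings carry a tag such as <code>&lt;compat&gt;</code>. Hangul syllables, whose decompositions are algorithmic, return an empty field and are not handled. The function name is hypothetical.</p>

```python
import unicodedata


def canonical_decomposition_mapping(ch: str) -> str:
    """Return the single-level canonical decomposition mapping of ch as a string.

    Characters without a canonical mapping (including those with only a
    tagged compatibility mapping) map to themselves, the default value.
    """
    field = unicodedata.decomposition(ch)
    if not field or field.startswith("<"):
        return ch
    return "".join(chr(int(hex_cp, 16)) for hex_cp in field.split())


print(canonical_decomposition_mapping("\u00c1") == "A\u0301")  # True
```
</dd>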
</dl>
<blockquote>
<p><b>Note:</b> Properties classed in [<a href="#UCDDoc">UCDDoc</a>] as type "String-valued"
are string-valued properties. However, some properties classed as "Miscellaneous"
are also string-valued properties.</p>
</blockquote>
<dl>
<dt>PD13a. Identifier Property</dt>
<dd>A string-valued property that represents a member of a namespace, with additional
rules defining identifier well-formedness, uniqueness and comparison.</dd>
<dd>For example, a Unicode character name is part of a namespace that also includes
name aliases and named sequences.
Special rules are defined for comparison of names and for determination of uniqueness,
as well as for which characters are permissible.
See Section 4.8 Name in [<a href="#Unicode">Unicode</a>].</dd>
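<dd><p>Python's <code>unicodedata</code> module offers a small window onto this shared namespace: <code>name()</code> returns a character name, and <code>lookup()</code> resolves character names and, since Python 3.3, also name aliases and named sequences. The helper <code>resolve</code> is hypothetical, and results depend on the UCD version bundled with the interpreter.</p>

```python
import unicodedata
from typing import List


def resolve(name: str) -> List[str]:
    """Resolve a name from the shared namespace to a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in unicodedata.lookup(name)]


print(unicodedata.name("\u00c1"))                    # a character name
print(resolve("LATIN CAPITAL LETTER A WITH ACUTE"))  # one code point
print(resolve("TAMIL CONSONANT K"))                  # a named sequence: two code points
```
</dd>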
<dt>PD14. Catalog Property</dt>
<dd>An enumerated property, typically unrelated to
an algorithm, whose set of values may be extended
in each successive version of the Unicode
Standard. [D32]<br><br>
Examples are the Age, Block, and Script properties.
New values may be added to the set
of enumerated values each time the standard is revised.
Each Unicode version that adds characters adds a new value for Age.
When a new block is added to the standard, a corresponding new value is added
to the Block property. Likewise, when a new script is added, a corresponding
new value of the Script property is also added.
</dd>
<dt>PD15. Miscellaneous Property</dt>
<dd>A property not of the type Boolean, Enumerated, Numeric,
String-valued, Identifier, or Catalog.</dd>
<dd>The Script_Extensions property is a miscellaneous property.</dd>
</dl>
<blockquote>
<p><b>Note:</b> Actually, some properties classed in [<a href="#UCDDoc">UCDDoc</a>] as type "Miscellaneous"
can also be considered string-valued properties. The <i>Jamo_Short_Name</i> property is such an example.
The distinction is that most properties currently designated to be of type
"String-valued" are conceived of as mapping from some Unicode
character to some other Unicode character (or sequence of characters) for the purposes of such
operations as case mapping, case folding, or normalization of strings, whereas
the string values of
Miscellaneous properties tend to be just arbitrary strings.</p>
</blockquote>
<h3>3.4 <a name="ConformanceStatusDefinitions" href="#ConformanceStatusDefinitions">Conformance Status of Properties</a></h3>
<dl>
<dt>PD16. Normative Property</dt>
<dd>A [Unicode character] property used in the specification of the Unicode
Standard. [D33]</dd>
<dd><b>Note</b>: The existence of a normative process that depends on a property in a normative and
testable way is usually sufficient reason to designate that property
as normative. For
example, the interpretation of the <i>bidirectional class</i> is precisely
defined in [<a href="#Bidi">Bidi</a>].</dd>
<dd>If a process does not interpret a given character, it may remain unaware
of its properties. However, it is recommended that processes use carefully chosen default values for characters that they do not handle.</dd>
<dd>See also Section 2.6, <a href="#Normative">Normative Properties</a>.</dd>
<dt>PD17. Overridable Property</dt>
<dd>A normative property whose values may be overridden by conformant higher-level protocols.
[D34]</dd>
<dd>See Section 4.3 <a href="#Overriding"><i>Overriding properties via
Higher-level Protocols</i></a>.</dd>
<dt>PD18. Informative Property</dt>
<dd>A [Unicode character] property whose values are provided for information
only. [D35]
</dd>
<dd><b>Note</b>: Informative properties capture expert implementation
experience, and their use is strongly recommended by the Consortium,
but they impose no requirements on implementations of the Unicode
Standard.</dd>
<dd>See also Section 2.7, <a href="#Informative">Informative Properties</a>.</dd>
<dt>PD19. <a name="ProvisionalProperty"> Provisional Property</a></dt>
<dd>A [Unicode character] property whose values are unapproved and tentative,
and which may be incomplete or otherwise not in a usable state. [D36]<br>
<br>
Provisional properties may be removed from future versions of the standard
without prior notice.</dd>
<dd>See also Section 5.4, <a href="#Provisional">Provisional Properties</a>.</dd>
</dl>
<h3>3.5 <a name="PropertyClassificationDefinitions" href="#PropertyClassificationDefinitions">Classification of Properties</a></h3>
<p><i>The following definitions do not define character or code point properties, but properties of
such properties. In the definitions in this section, the
term 'code point' is used inclusively to mean code point for a code point property and character for
a character property, respectively.</i></p>
<dl>
<dt>PD20. Context-dependent Property</dt>
<dd>A property that applies to a code point in the context of a longer code point sequence. [D37]<br>
<br>
For example, the lowercase mapping of Greek capital sigma depends on the
surrounding characters.
See also PD38, <a href="#ContextDependentStringFunction"><i>Context-dependent
String Function</i></a>.</dd>
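<dd><p>The Greek sigma case can be observed directly in Python, whose <code>str.lower()</code> applies the Final_Sigma context condition from the Unicode casing data (CPython 3.3 and later). The same code point, U+03A3 GREEK CAPITAL LETTER SIGMA, lowercases differently depending on the characters around it.</p>

```python
CAPITAL_SIGMA = "\u03a3"  # GREEK CAPITAL LETTER SIGMA

isolated = CAPITAL_SIGMA.lower()
word_final = ("\u039f\u0394\u039f" + CAPITAL_SIGMA).lower()   # capitals O, DELTA, O, SIGMA
word_medial = ("\u0391" + CAPITAL_SIGMA + "\u0391").lower()  # capitals ALPHA, SIGMA, ALPHA

print(f"U+{ord(isolated):04X}")        # U+03C3 small sigma (no preceding cased letter)
print(f"U+{ord(word_final[-1]):04X}")  # U+03C2 final sigma
print(f"U+{ord(word_medial[1]):04X}")  # U+03C3 small sigma (a cased letter follows)
```
</dd>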
<dt>PD21. Context-independent Property</dt>
<dd>A property that is not context-dependent: it applies to a code point in isolation.
[D38]</dd>
<dt>PD22. Stable Transformation</dt>
<dd>A transformation <i>T</i> on a property <i>P</i> is stable with respect to an
algorithm <i>A</i> if the result of the algorithm on the transformed property
<i>A</i>(<i>T</i>(<i>P</i>)) is the same as the original result <i>A</i>(<i>P</i>) for all code points.
[D39]</dd>
<dt>PD23. Stable Property</dt>
<dd>A property is stable with respect to a particular algorithm or process
as long as any changes in the assignment of property values are restricted in such a manner that the result
of the algorithm on the property continues to be the same as the original result for all
previously assigned code points. [D40]</dd>
<dd>For example, while the absolute values of the canonical combining
classes are <i>not</i> guaranteed to be the same between versions of the Unicode
Standard, their relative values will be maintained. As a result, the
Canonical Combining Class, while not immutable, is a stable
property with respect to the Normalization Forms as defined in [<a href="#Normal">Normal</a>].</dd>
<dd><b>Note:</b> As new characters are assigned to previously unassigned code
points, replacing any default values for these code points with actual
property values must maintain stability.</dd>
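<dd><p>A sketch of PD22 and PD23 in action: the Canonical Ordering Algorithm depends only on the relative order of nonzero canonical combining class values, so a transformation that changes their absolute values but keeps zero as zero and preserves relative order leaves the result unchanged. The transformation <code>transformed_ccc</code> is hypothetical; <code>unicodedata.combining()</code> supplies the actual values. Reordering is implemented here as a stable sort of each run of non-starters, which is equivalent to the pairwise-swap formulation.</p>

```python
import unicodedata
from typing import Callable


def canonical_order(s: str, ccc: Callable[[str], int]) -> str:
    """Stable-sort each maximal run of non-starters (ccc != 0) by ccc."""
    result = []
    run = []
    for ch in s:
        if ccc(ch) == 0:  # a starter ends the current run
            result.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            result.append(ch)
        else:
            run.append(ch)
    result.extend(sorted(run, key=ccc))
    return "".join(result)


def original_ccc(ch: str) -> int:
    return unicodedata.combining(ch)


def transformed_ccc(ch: str) -> int:
    # Hypothetical transformation T: new absolute values, same zero, same order.
    c = unicodedata.combining(ch)
    return 0 if c == 0 else 2 * c + 7


s = "a\u0301\u0323"  # a + COMBINING ACUTE (230) + COMBINING DOT BELOW (220)
print(canonical_order(s, original_ccc) == canonical_order(s, transformed_ccc))  # True
```
</dd>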
<dt>PD24. Fixed Property</dt>
<dd>A property whose values (other than the default value), once associated
with a character or other designated code
point, are fixed and
will not be changed, except to correct obvious or clerical errors. [D41]</dd>
<dd>For a fixed property, any default values can be replaced without
restriction by actual property values, as new characters are assigned to
previously unassigned code points. Examples of fixed properties are Age and Hangul_Syllable_Type.</dd>
<dd><b>Note:</b> Designating a property as fixed does not imply stability
or immutability (see below).
While the age of a character, for example, is established by the version of the Unicode
Standard in which it was added, errors in the published listing of the property value
could be
corrected. For some other
properties, there are explicit stability guarantees that prohibit the
correction even of such errors. See Section
5.2 <i><a href="#Guarantees">Stability Guarantees</a></i>.
</dd>
<dt>PD25. Immutable Property</dt>
<dd>
A fixed property that is also subject to a stability guarantee
preventing <i>any</i> change in the published listing of property values
other than assignment of new values to formerly unassigned code points.
[D42]</dd>
<dd>
An immutable property is trivially stable with respect to <i>all</i>
algorithms. An example of an immutable property is the Unicode character
name. See Section 5.2 <i><a href="#Guarantees">Stability Guarantees</a></i>.
</dd>
<dd><b>Note:</b> Because character names are values of an immutable property, misspellings
and incorrect names will <i>never</i> be corrected. Any errata will be noted in a
comment in the names list, and, where needed, a formal character
name alias (of type "correction") will be
provided.
</dd>
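<dd><p>For instance, U+01A2 retains its original name LATIN CAPITAL LETTER OI, while NameAliases.txt supplies the alias LATIN CAPITAL LETTER GHA with the type "correction". Python's <code>unicodedata</code> (3.3 and later) resolves the alias with <code>lookup()</code> but still reports the immutable name from <code>name()</code>.</p>

```python
import unicodedata

ch = unicodedata.lookup("LATIN CAPITAL LETTER GHA")  # resolves the correction alias
print(f"U+{ord(ch):04X}")    # U+01A2
print(unicodedata.name(ch))  # LATIN CAPITAL LETTER OI (the name itself never changes)
```
</dd>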
<dt>PD26. Stabilized Property</dt>
<dd>A property which is neither extended to new characters, nor maintained
in any other manner, but which is retained in the Unicode Character
Database. [D43]</dd>
<dd>A stabilized property is also a
fixed property.</dd>
<dt>PD27. Deprecated Property</dt>
<dd>A property whose use by implementations is discouraged.
[D44]<br>
<br>
One reason a property may be deprecated is that a different
combination of properties better expresses the intended semantics. </dd>
<dd>Where
sufficiently widespread legacy support exists for the deprecated property,
not all implementations may be able to discontinue the use of the deprecated
property. In such a case, a deprecated property may be extended to new
characters, so as to maintain it in a usable and consistent state.</dd>
<dt>PD28. Simple Property</dt>
<dd>A property whose values are specified directly in the Unicode Character
Database (or elsewhere in the Unicode Standard) and whose values cannot be
derived from other simple properties. [D45]</dd>
<dt>PD29. Derived Property</dt>
<dd>A property whose values are algorithmically derived from some
combination of simple properties. [D46]</dd>
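<dd><p>As a small sketch, the L (Letter) grouping of General_Category values, listed in PropertyValueAliases.txt as the union Lu | Ll | Lt | Lm | Lo, can be computed from the simple General_Category property. The names <code>LETTER_CATEGORIES</code> and <code>is_letter</code> are hypothetical, and this grouping is not the same as the derived Alphabetic property, which draws on additional inputs.</p>

```python
import unicodedata

# The L (Letter) group, expressed as a union of simple General_Category values.
LETTER_CATEGORIES = frozenset({"Lu", "Ll", "Lt", "Lm", "Lo"})


def is_letter(ch: str) -> bool:
    """Derived Boolean: True when General_Category is in the L group."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


print([is_letter(c) for c in ("a", "\u05d0", "1", "\u0301")])  # [True, True, False, False]
```
</dd>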
</dl>
<dl>
<dt>PD30. Property Alias</dt>
<dd>A unique identifier for a particular [Unicode character] property. [D47]</dd>
<dd>The set of property aliases forms a namespace. See Section 2.8 <a href="#Referring">Referring to Properties</a>.</dd>
<dt>PD31. Property Value Alias</dt>
<dd>A unique identifier for a particular enumerated value for a particular
[Unicode character] property. [D48]<br>
<br>
The set of property value aliases for each property forms a separate
namespace, so values of different properties may share the same alias. As a
trivial example, the property value aliases for all Boolean properties are
'true' and 'false'.<br>
<br>
See also Section 2.8 <a href="#Referring">Referring to Properties</a>.</dd>
</dl>
<h3>3.6 <a name="StringDefinitions" href="#StringDefinitions">Strings</a></h3>
<p>This section introduces definitions for strings, which are needed for the
discussion of properties of strings and the role of string functions in the
character property model.</p>
<dl>
<dt>PD32. String</dt>
<dd>An ordered sequence of zero or more code points.</dd>
<dd>At its most general, a string is any <i>coded character sequence</i>,
with the concept extended to encompass the
empty sequence. Character mappings are common
examples of properties for which the values are <i>strings</i> but
not necessarily <i>Unicode strings</i>.</dd>
<dd>All code points in a <i>string</i> are from the same character encoding.</dd>
<dt>PD32a. Empty String</dt>
<dd>A string consisting of exactly zero code points.</dd>
<dd>Note that in principle any empty string is equivalent to
any other empty string, so in many contexts, an instance of an empty string
is simply referred to as <i>the</i> empty string.</dd>
</dl>
<p>The following three string-related definitions are
specified in Chapter 3, Conformance, of the Unicode Standard [<a href="#Unicode">Unicode</a>].</p>
<dl>
<dt>PD32b. Code Unit Sequence</dt>
<dd>An ordered sequence of one or more code units. [D78]</dd>
<dd>A code unit sequence may consist of a single code unit.</dd>
<dt>PD32c. Unicode String</dt>
<dd>A code unit sequence containing code units of a particular Unicode encoding form. [D80]</dd>
<dd>A single Unicode string must contain only code units from a single Unicode
encoding form. It is not permissible to mix forms within a string.</dd>
<dt>PD32d. Coded Character Sequence</dt>
<dd>An ordered sequence of one or more code points. [D12]</dd>
<dd>A coded character sequence is also known as a <strong>coded character representation</strong>.</dd>
<dd>Normally a coded character sequence consists of a sequence of encoded characters, but it may also
include noncharacters or reserved code points.</dd>
</dl>
<p>Those definitions were originally developed to focus on the <i>identity</i> of encoded
characters and of
sequences of encoded characters, in the context of specifying Unicode encoding forms and
other concepts of the Unicode Standard. As such, the formal definitions do not include
zero-length sequences
as part of their definitions. Where these definitions are used in Chapter 3,
the <i>absence</i> of a character
is generally not pertinent to the explication.</p>
<p>In programming contexts, however, strings are almost always defined to
<i>include</i> the empty string as part of the class or type definition.
This is more elegant for implementations of strings and for the design of string-based APIs,
including those supporting the implementation of character properties. This distinction is important
for the discussion of the Unicode character property model. When the concept of
character properties is extended to deal with the properties of Unicode strings,
as well as single characters, implementations need to take the
empty string into account.</p>
<p>In the Unicode character property model, the primary concern is
with properties of characters (or code points), rather than the very limited concept
of properties which might apply directly to code units. To avoid clumsiness of
terminology, instead of using the formal definition, "coded character sequence," the
term <i>Unicode string</i> is simply stipulated, in this context, to also refer to
a coded character sequence, instead of only to a code unit sequence.</p>
<p>Furthermore, in the subsequent
discussion of properties of strings, for simplicity of presentation, any
mention of a <i>Unicode string</i> is also stipulated to extend to include
the <i>empty string</i>.</p>
<h3>3.7
<a name="PropertyStringsDefinitions" href="#PropertyStringsDefinitions">Properties of Strings</a></h3>
<p><i>None of the following definitions are found in the Unicode Standard at
this point; they extend the existing definitions to cover properties for character sequences.</i></p>
<dl>
<dt>PD32e. Enumerated Set of Strings</dt>
<dd>A set of Unicode strings enumerated by an explicit, finite list of members.</dd>
<dd>This definition is specified as a set, rather than as a list, because typically it
is not meaningful to implementations for the <i>same</i> sequences to be included
multiple times.</dd>
<dd>Note that an empty string may explicitly be listed as a member of the set,
as appropriate for certain edge cases.</dd>
<dd>This definition contrasts with sets of strings defined by a rule or
definition, such as <em>Combining Character Sequences</em> [D56].</dd>
<dt>PD32f. Property of Strings</dt>
<dd>A character property whose domain extends to Unicode strings, as opposed to single
code points.</dd>
<dd>The same categorizations of property types, values, and statuses apply as for
encoded character properties.</dd>
<dt>PD32g. Explicit Property of Strings</dt>
<dd>A Property of Strings for which each value is specified explicitly
for each member of a particular Enumerated Set of Strings.</dd>
<dd>An example of an Explicit Property of Strings is <b>RGI_Emoji_Flag_Sequence</b>.
That is a simple Boolean property, but its domain is the set of emoji flag sequences
explicitly listed in the data file emoji-sequences.txt. If a particular sequence is
listed in that file, then the value of the RGI_Emoji_Flag_Sequence property for that
sequence is True. Otherwise, it is False for any other Unicode string, including the
empty string.</dd>
<dd>RGI_Emoji_Flag_Sequence is also an example of a normative Property of Strings:
it is formally defined in a Unicode specification, is maintained in a data file updated
with each release, and has conformance implications for implementations of emoji.</dd>
<dd>Typically, an Explicit Property of Strings will be of type Boolean: either
a given sequence is a member of the set or not. However, in principle, properties of
more complex types could also be defined to apply to members of an Enumerated Set
of Strings.</dd>
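<dd><p>An Explicit Property of Strings reduces to membership in an enumerated set. The excerpt below is hypothetical: four real flag sequences chosen for illustration, not the full contents of emoji-sequences.txt. The helper <code>flag()</code>, which builds a Regional Indicator pair from an ASCII region code, is also an illustrative name.</p>

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag(region: str) -> str:
    """Map a two-letter ASCII region code such as 'US' to Regional Indicator symbols."""
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in region.upper())


# Hypothetical excerpt of an Enumerated Set of Strings (not the full RGI list).
RGI_FLAG_EXCERPT = frozenset(flag(code) for code in ("CH", "DE", "JP", "US"))


def rgi_emoji_flag_sequence(s: str) -> bool:
    """Explicit Property of Strings: True only for members of the enumerated set."""
    return s in RGI_FLAG_EXCERPT


print(rgi_emoji_flag_sequence(flag("US")))  # True: listed
print(rgi_emoji_flag_sequence(flag("ZZ")))  # False: not listed
print(rgi_emoji_flag_sequence(""))          # False: the empty string is not listed
```
</dd>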
<dt>PD32h. Algorithmic Property of Strings</dt>
<dd>A Property of Strings whose values are determined by a String Function applied
to the entire string (offsets 0 and n).</dd>
<dd>An example of an Algorithmic Property of Strings is <b>isLowercase</b>. That
property is defined in Section 3.13, Default Case Algorithms of the Unicode Standard
[<a href="#Unicode">Unicode</a>]. It has type Boolean, and is either True or False
for <i>any</i> Unicode string, but its value is determined by an algorithm that
involves casing the Unicode string and examining the result of that operation.</dd>
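<dd><p>A minimal sketch, assuming the definition in Section 3.13 that isLowercase(X) is true when toLowercase(X) = X, and using Python's untailored <code>str.lower()</code> as a stand-in for toLowercase(). The function name is hypothetical.</p>

```python
def is_lowercase(s: str) -> bool:
    """Sketch of isLowercase(X): true when toLowercase(X) == X.

    str.lower() stands in for the Unicode toLowercase() full case mapping
    (no language-specific tailoring).
    """
    return s.lower() == s


for text in ("abc", "Abc", "123", ""):
    print(repr(text), is_lowercase(text), text.islower())
```

<p>Unlike Python's <code>str.islower()</code>, which requires at least one cased character, this definition is true for the empty string and for strings such as "123" that contain no cased characters.</p>
</dd>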
<dd>Another example of an Algorithmic Property of Strings would be
<b>isEmojiFlagSequence</b>. That property is not formally defined in Unicode
Technical Standard #51, Unicode Emoji [<a href="#UTS51">UTS51</a>], but it is
implied by definition ED-14 in that specification, for <b>emoji flag sequence</b>.
The BNF that defines an emoji flag sequence can be applied algorithmically to any given Unicode string
to determine whether that string meets the formal syntactic
definition. That determination does not require checking against an
explicit, enumerated character sequence set. By contrast, the entire point of
the Explicit Property of Strings RGI_Emoji_Flag_Sequence
is to pick out, from the domain of all syntactically correct emoji
flag sequences, just the precise set listed in emoji-sequences.txt as
recommended for general interchange (RGI). The RGI status is not algorithmically
derivable, and can only be specified by providing an Enumerated
Set of Strings to test against.</dd>
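<dd><p>The algorithmic check can be sketched as follows. It tests only the syntactic shape, exactly two Regional Indicator symbols (U+1F1E6..U+1F1FF), which is our reading of the BNF production; it consults no data file, so it accepts sequences such as "ZZ" that are not RGI. The function names are hypothetical.</p>

```python
REGIONAL_INDICATOR_FIRST = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
REGIONAL_INDICATOR_LAST = 0x1F1FF   # REGIONAL INDICATOR SYMBOL LETTER Z


def is_regional_indicator(ch: str) -> bool:
    return REGIONAL_INDICATOR_FIRST <= ord(ch) <= REGIONAL_INDICATOR_LAST


def is_emoji_flag_sequence(s: str) -> bool:
    """Syntactic check only: regional_indicator regional_indicator."""
    return len(s) == 2 and all(is_regional_indicator(c) for c in s)


print(is_emoji_flag_sequence("\U0001F1FA\U0001F1F8"))  # True: U, S
print(is_emoji_flag_sequence("\U0001F1FF\U0001F1FF"))  # True: Z, Z is well-formed but not RGI
print(is_emoji_flag_sequence("US"))                    # False: ASCII letters
```
</dd>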
</dl>
<h3>3.8 <a name="StringFunctionsDefinitions" href="#StringFunctionsDefinitions">String Functions</a></h3>
<p><i>None of the following definitions is found in the Unicode Standard at
this point; however, they are useful in the context of discussing Unicode
algorithms and their relation to properties.</i></p>
<dl>
<dt>PD33. Offset</dt>
<dd>An offset into a string is a number from 0 to <i>n</i> where <i>n</i>
is the length of the string in code points.
It indicates a position that is logically between code points.
An offset of 0 indicates
the position before the first code point in the string, and an offset of <i>n</i>
indicates the position after the last code point in the string.</dd>
</dl>
<p>Dealing with offsets at the level of code units is the concern of lower-level
implementation processes, which must deal with the details of character encoding forms. For
the purposes of the character property model, strings are simply defined abstractly in
terms of encoded character sequences and code points.</p>
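<p>To illustrate the lower-level concern, the sketch below converts a code point offset, as defined in PD33, into the corresponding UTF-16 code unit offset. The function name is hypothetical; in UTF-16 each supplementary code point (above U+FFFF) occupies two code units.</p>

```python
def utf16_offset(s: str, cp_offset: int) -> int:
    """Convert a code point offset (0..len(s)) into a UTF-16 code unit offset."""
    if not 0 <= cp_offset <= len(s):
        raise IndexError("offset out of range")
    # BMP code points take one UTF-16 code unit; supplementary code points take two.
    return sum(2 if ord(c) > 0xFFFF else 1 for c in s[:cp_offset])


s = "a\U0001F600b"  # three code points: 'a', U+1F600, 'b'
print([utf16_offset(s, i) for i in range(len(s) + 1)])  # [0, 1, 3, 4]
```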
<dl>
<dt>PD34. [Definition removed]</dt>
<dd> </dd>
<dt>PD35. String Function</dt>
<dd>A string function is a function whose input is a string <i>S</i> and
two offsets <i>a</i> and <i>b</i>, with <i>a</i> ≤ <i>b</i>.</dd>
<dt>PD36. Text Boundary Property</dt>
<dd>A string function whose value is defined for
a particular offset.<br>
<br>
Text boundary functions are also called segmentation functions, because they
are commonly used to return segments of text between boundaries. A simple text
boundary function, like IsBreak(S,a,b), minimally returns a Boolean value.
However, other text boundary functions may return additional information. For example, a word-selection boundary function may return whether the
previous segment contained a letter, or a line break function may return
information on the relative priority of the break.</dd>
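<dd><p>A deliberately simplified text boundary function might look like this. The rule (a boundary wherever a letter and a non-letter meet, plus both ends of the string) is hypothetical and far simpler than the word boundary rules of UAX #29; <code>is_break</code> and <code>segments</code> are illustrative names. Note that the function must inspect the code points on both sides of the offset.</p>

```python
def is_break(s: str, a: int, b: int) -> bool:
    """Hypothetical boundary function B(S, x, x): letters vs. non-letters."""
    if a != b:
        raise ValueError("a text boundary function is evaluated at a single offset")
    x = a
    if x == 0 or x == len(s):
        return True  # the start and end of the string are always boundaries
    return s[x - 1].isalpha() != s[x].isalpha()


def segments(s: str) -> list:
    """Return the text between consecutive boundaries."""
    cuts = [x for x in range(len(s) + 1) if is_break(s, x, x)]
    return [s[i:j] for i, j in zip(cuts, cuts[1:])]


print(segments("ab, cd"))  # ['ab', ', ', 'cd']
```
</dd>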
</dl>
<h3>3.9
<a name="StringFunctionClassificationDefinitions" href="#StringFunctionClassificationDefinitions">Classification of String
Functions</a></h3>
<dl>
<dt>PD37. Context-independent String Function</dt>
<dd>Given a string <i>S</i>, and
offsets <i>a</i> and <i>b</i>, a context-independent string
function is any string function <i> F</i> for which <i>F</i>(<i>S,a,b</i>)
is independent of the content of <i>S </i>before <i>a</i> and after <i>b</i>.<br>
<br>
In other words, the input to a context-independent function is fully
defined by the code points between the given offsets.</dd>
<dt>PD38. <a name="ContextDependentStringFunction">Context-dependent String
Function</a></dt>
<dd>A context-dependent string function is a
string function that is not context-independent.<br>
<br>
In other words, the input to a context-dependent string function requires
additional information, such as information about the code points surrounding the code point range defined
by the offsets, as well as the
code points in the range. Any text boundary function of the form <i>B</i>(<i>S,x,x</i>)
is by definition context-dependent.</dd>
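<dd><p>One way to make PD37 operational is to compare <i>F</i>(<i>S,a,b</i>) with <i>F</i> applied to the isolated substring. A context-independent function passes this check for every input, so a single failing input demonstrates that a function is context-dependent (passing a few inputs, however, proves nothing). The functions below are hypothetical examples.</p>

```python
from typing import Callable

StringFunction = Callable[[str, int, int], object]


def upper_range(s: str, a: int, b: int) -> str:
    """Context-independent: reads only the code points in s[a:b]."""
    return s[a:b].upper()


def starts_word(s: str, a: int, b: int) -> bool:
    """Hypothetical boundary function B(S, x, x): a letter preceded by a non-letter."""
    # Slicing yields "" at the string edges, and "".isalpha() is False.
    return a == b and s[a:a + 1].isalpha() and not s[a - 1:a].isalpha()


def same_in_isolation(f: StringFunction, s: str, a: int, b: int) -> bool:
    """Compare F(S, a, b) with F applied to S cut down to the range [a, b)."""
    return f(s, a, b) == f(s[a:b], 0, b - a)


print(same_in_isolation(upper_range, "xabcx", 1, 4))  # True
print(same_in_isolation(starts_word, " ab", 1, 1))    # False: context was needed
```
</dd>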
<dt>PD39. String Transform</dt>
<dd>A string-valued string function.</dd>
<dt>PD40. Idempotent String Function (Folding)</dt>
<dd>A string transform <i>F</i>, with the property that repeated
applications of the same function <i>F</i> produce the same output: <i>F</i>(<i>F</i>(<i>S</i>)) =
<i>F</i>(<i>S</i>)
for all input strings<i> S</i>.</dd>
<dd>Such a string function is also called a
folding.</dd>
<dd>A folding establishes an equivalence relation,
whereby X ≡ Y if and only if F(X) = F(Y). This equivalence relation
partitions the set of all strings into the set of equivalence classes for
the relation. Conversely, any partition of strings can be used to generate a
folding, by choosing one element of each equivalence class to be the "target member"
that the other members of that class map to.
<p>The notation toX(s) may be used for the
folding, and isX(s) for the corresponding Boolean function, defined such that
isX(s) is true if and only if toX(s) = s. For example, toNFC() is the folding that
converts to NFC format, while isNFC() is the test for whether a string is in
that format.</p></dd>
<dd>A well-known example of a
folding function is case folding. For case folding, the equivalence class
consists of all case variations, including uppercase, lowercase, titlecase, and
mixed case. In Unicode case folding, the target member is generally
the lowercase form, although there are exceptions: for example, Cherokee
lowercase letters fold to their uppercase counterparts.</dd>
<dd>Folding functions may be
context-dependent. Normalization is an
example of a context-dependent folding.</dd>
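<dd><p>The sketch below illustrates these notions with Python's standard library: <code>unicodedata.normalize("NFC", s)</code> plays the role of toNFC(), isNFC() is defined from it as described above, and <code>str.casefold()</code> (full default case folding) induces the caseless equivalence X ≡ Y. The function names are hypothetical; Python 3.8 and later also offer <code>unicodedata.is_normalized()</code> as a faster test.</p>

```python
import unicodedata


def to_nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


def is_nfc(s: str) -> bool:
    """isX(s) defined from the folding: true exactly when toX(s) == s."""
    return to_nfc(s) == s


def caseless_equivalent(x: str, y: str) -> bool:
    """X and Y are equivalent exactly when their case foldings are equal."""
    return x.casefold() == y.casefold()


decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT
print(is_nfc(decomposed), is_nfc(to_nfc(decomposed)))    # False True
print(to_nfc(to_nfc(decomposed)) == to_nfc(decomposed))  # True: idempotent
print(caseless_equivalent("Stra\u00dfe", "STRASSE"))     # True
```
</dd>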
<dt>PD41. Code Point Count
Preserving String Function</dt>
<dd>A string function whose result is a string containing the same number of
code <i>points</i> as its input is a count preserving string
function.</dd>
<dt>PD42. Buffer Length
Preserving String Function</dt>
<dd>A string function whose result is a string containing the same number of
code <i>units</i> as its input is a buffer length preserving string function.</dd>
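<dd><p>As an illustration, the sketch below measures both kinds of preservation for two of Python's case mapping functions, taking UTF-8 as the encoding form that defines code units (buffer length preservation always depends on the encoding form chosen). The helper name <code>preserves</code> is hypothetical.</p>

```python
from typing import Callable, Tuple


def preserves(f: Callable[[str], str], s: str) -> Tuple[bool, bool]:
    """Return (code point count preserved, UTF-8 buffer length preserved) for f(s)."""
    out = f(s)
    same_count = len(out) == len(s)  # a Python str is a sequence of code points
    same_buffer = len(out.encode("utf-8")) == len(s.encode("utf-8"))
    return same_count, same_buffer


print(preserves(str.upper, "\u00df"))  # (False, True): sharp s -> 'SS', 2 bytes either way
print(preserves(str.lower, "\u212a"))  # (True, False): KELVIN SIGN -> 'k', 3 bytes -> 1
```
</dd>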
</dl>
<h3>3.10 <a name="OtherDefinitions" href="#OtherDefinitions">Other Definitions</a></h3>
<dl>
<dt>PD43. Higher-level Protocol</dt>
<dd>Any agreement on the interpretation of Unicode characters that extends
beyond the scope of the Unicode Standard. [D16]</dd>
</dl>
<h2>4. <a name="Conformance" href="#Conformance">Conformance-related Considerations</a></h2>
<p>This technical report does not define conformance requirements, but the
following subsections discuss and summarize the conformance requirements
related to character properties stated in the Unicode Standard. Where applicable, the number of the corresponding conformance clause or definition is given in square brackets.</p>
<h3>4.1 <a name="Requirements" href="#Requirements">Conformance Requirements</a></h3>
<p>In Chapter 3, <i>Conformance</i>, the Unicode Standard [<a href="#Unicode">Unicode</a>] states that
<i>"A process shall interpret a coded character sequence according to
the character semantics established by this standard, if that process
does interpret that coded character sequence."</i> [C4] The
semantics of a character are established by taking its coded representation,
character name, and representative glyph in context, and are further defined by
its normative properties and behavior. Neither the character name nor the
representative glyph can be relied upon absolutely; a character may have a
broader range of use than the most literal interpretation of its character
name, and the representative glyph is only indicative of one of a range of
typical glyphs representing the same character.</p>
<h3>4.2 <a name="Algorithms" href="#Algorithms">Algorithms and Character Properties</a></h3>
<p>Unicode algorithms are specified
as an idealized series of steps (rules) performed on an input of character
codes and their associated properties. [<a href="#Unicode">Unicode</a>] states:</p>
<blockquote class="tus">
<ul>
<li>An implementation claiming conformance to a Unicode algorithm
need only guarantee
that it produces the same results as those specified in the logical
description of
the process; it is not required to follow the actual described
procedure in detail. This
allows room for alternative strategies and optimizations in
implementation. See [C18].</li>
</ul>
</blockquote>
<p>As long as the same results are
achieved, the implementation is also not required to use the actual
properties published in the [<a href="#UCD">UCD</a>].
<i>Overriding</i>
a property value therefore does not necessarily imply an actual change in
property assignments, merely that the conformant implementation of an
algorithm now produces the same results as if the property values had been
changed in the description of the ideal algorithm.</p>
<h3>4.3 <a name="Overriding" href="#Overriding">Overriding Properties via
Higher-level Protocols</a></h3>
<p>In discussing
character semantics, the Unicode Standard [<a href="#Unicode">Unicode</a>]
makes this statement about overriding
properties and character behavior:</p>
<blockquote class="tus">
<p>Some normative behavior is default behavior; this behavior can be
overridden by higher-level protocols. However, in the absence of such
protocols, the behavior must be observed so as to follow the character
semantics. See [D3].</p>
</blockquote>
<p>Overrides by a higher-level
protocol can conceptually take many forms, including, but not limited to:</p>
<ul>
<li>providing artificial context for an algorithm
that defines a
context-dependent string function</li>
<li>applying the algorithm on a substring</li>
<li>emulating the effect of
format control characters in markup</li>
<li>reassigning a different
property value to a character during processing or rendering</li>
<li>changing the result of a
string function for particular inputs</li>
</ul>
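<p>The fourth form of override listed above, reassigning a property value during processing, can be sketched as an override table consulted before the UCD value. The override shown (treating U+005F LOW LINE, normally gc=Pc, as a letter) is purely hypothetical, as are the function and table names.</p>

```python
import unicodedata
from typing import Mapping


def general_category_with_overrides(ch: str, overrides: Mapping[str, str]) -> str:
    """Return the protocol's override for ch if any, else the UCD General_Category."""
    return overrides.get(ch, unicodedata.category(ch))


# Hypothetical higher-level protocol: treat LOW LINE as a letter (Lo).
PROTOCOL_OVERRIDES = {"_": "Lo"}

print(general_category_with_overrides("_", {}))                  # Pc (UCD value)
print(general_category_with_overrides("_", PROTOCOL_OVERRIDES))  # Lo (overridden)
```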
<p>Where overrides involve normative
properties, specific restrictions apply, for example:</p>
<blockquote class="tus">
<p>• The character combination properties and the canonical ordering
behavior cannot be overridden by higher-level protocols. See [D3].</p>
</blockquote>
<p>For additional examples of higher-level protocols, as well as restrictions on them, see Section 4.3 in <a href="https://www.unicode.org/reports/tr9/">
UAX #9: <i>Unicode Bidirectional
Algorithm</i></a> [<a href="#Bidi">Bidi</a>].
There are some normative properties that are fully overridable, for example
General Category.</p>
<p>On the other hand, any and all informative properties may be overridden.
However, if doing so changes the result of a Unicode Algorithm, any
implementation wishing to conform to that algorithm
must indicate that overrides have been applied.</p>
<h2>5. <a name="Maintenance" href="#Maintenance">Updating Properties and Extending the Standard</a></h2>
<h3>5.1 <a name="Updating" href="#Updating">Updating Properties</a></h3>
<p>Updates to properties of the Unicode Character Database can be required for three reasons:</p>
<ol>
<li>To cover new characters added to the Unicode Standard</li>
<li>To add new properties</li>
<li>To change the assigned values for a property for some characters</li>
</ol>
<p>While the Unicode Consortium endeavors to keep the values of all character
properties as stable as possible, some circumstances may arise that
require changing them. Changing a character's property assignment may
impact existing
implementations and is therefore done judiciously and with
great care, only when there is no better alternative.</p>
<p>In particular, as Unicode encodes less well-documented scripts, such as
those for minority languages, the exact
character properties and behavior may not be known when the script
is first encoded. The properties for such characters are
expected to be changed as information becomes available.</p>
<p>As
implementation experience grows, it may become necessary to readjust property values. As much as possible, such readjustments are made compatible
with established practice. Occasionally, a character property is
changed to prevent incorrect generalizations of a character's use based on its nominal property values. For example, U+200B
ZERO WIDTH SPACE was originally classified as a space character (General
Category=Zs), but is now classified as a format control character (gc=Cf) to
distinguish this line break control from space characters.</p>
<p>In other cases, there may have been unintentional mistakes in the
original information that require corrections.</p>
<p>The [<a href="#UTC">UTC</a>]
carefully weighs the costs of a change against the benefit of the correction. In
addition, all updates to properties
are subject to the stability guarantees described in the next section.</p>
<h3>5.2 <a name="Guarantees" href="#Guarantees">Stability Guarantees</a></h3>
<p>Unicode guarantees the stability of character assignments;
that is, the <i>identity</i> of a character encoded at a given location will
remain the same. Once a character is encoded, its properties may still
be changed, but <i>not</i> in such a way as to change
the fundamental identity of the character.</p>
<p>For example, the representative glyph for
U+0041 "A" could not be changed to "B"; the general
category for U+0041 "A" could not be changed to Ll <i>(lowercase
letter);</i> and the decomposition mapping for U+00C1 (Á) could not be
changed to &lt;U+0042, U+0301&gt; (B, ´).</p>
<p>In addition, for some properties, one or more of the following aspects are
guaranteed to be invariant:</p>
<ul>
<li> stability of assignment </li>
<li> stability of result when applying the property</li>
<li> stability of set of values for a property</li>
<li> stability of relation to another property</li>
<li> stability of file formats</li>
</ul>
<p>For the most up-to-date
specification of all stability guarantees in effect, see the
Unicode Character Encoding Stability
Policy [<a href="#Stability">Stability</a>]. Note that the status of a property
as normative does not imply a stability guarantee.</p>
<h4>5.2.1 <a name="StabilityofAssignment" href="#StabilityofAssignment">Stability of Assignment</a></h4>
<p>Stability of assignment is the characteristic of an <i>immutable</i> property. For
example, once a character is encoded, its code point and name are
immutable properties. An immutable property
allows software and documents to refer to its values without needing to track
future updates to the Standard. One side effect of an immutable property is
that errors in property values cannot be fixed. For example, mistakes in naming are
annotated in the
Unicode character names list in a note or by using an
alias, but the formal name remains unchanged, even in cases of clear-cut
typographical errors.</p>
<p>Because Code_Point is an immutable property, if a character is ever
found to be unnecessary, or a mistaken duplicate of an existing
character, it will not be removed. Instead, it can be given an additional
property, <i>deprecated</i>, and its use strongly discouraged.
However, the interpretation of all existing documents containing
the character remains the same.</p>
<h4>5.2.2 <a name="StabilityofResult" href="#StabilityofResult">Stability of Result when Applying the Property</a></h4>
<p>Stability of result is the characteristic of a <i>stable</i> property. For
example, once a character is encoded, its canonical combining class and
decomposition (canonical or compatibility) are stable with respect to
normalization. Stability with respect to normalization is defined in such
a way that if a string contains only characters from a given version of the
Unicode Standard (say Unicode 3.2), and it is put into a normalized form
in accordance with that version of Unicode, then it will be in normalized
form when normalized according to any future version of Unicode.</p>
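<p>As an illustration, Python's standard <code>unicodedata</code> module implements normalization for the single Unicode version bundled with the interpreter, reported by <code>unicodedata.unidata_version</code>. The following is a minimal sketch, not a reference implementation; the helper name <code>normalize_and_stamp</code> is hypothetical. It records the version used alongside the normalized text, because the stability guarantee only covers strings whose characters are all assigned in that version.</p>

```python
import unicodedata

def normalize_and_stamp(text: str, form: str = "NFC") -> tuple[str, str]:
    """Normalize text and record the Unicode version used (hypothetical helper)."""
    normalized = unicodedata.normalize(form, text)
    # Within one version, normalization is idempotent: the result is already normalized.
    if not unicodedata.is_normalized(form, normalized):
        raise RuntimeError("normalization was not idempotent")
    return normalized, unicodedata.unidata_version

# "A" followed by U+0301 COMBINING ACUTE ACCENT composes to U+00C1 under NFC.
result, version = normalize_and_stamp("A\u0301")
print(f"U+{ord(result):04X} (Unicode {version})")
```

<p>Within one version, normalization is idempotent; the stability guarantee extends that property across versions, provided the text contains only characters assigned in the recorded version.</p>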
<p>However, unlike character code and
character name, some properties that are guaranteed to be stable may be corrected in
<i>
exceptional</i> circumstances that are clearly defined by the Unicode
Character Encoding Stability Policy [<a href="#Stability">Stability</a>]. In addition to other
requirements, the correction must be of an obvious mistake, such as a
typographical error, and any alternative would have to violate the stability of the
identity of the character in question. Allowing such carefully restricted exceptions obviates the need for
encoding duplicate characters simply to correct clerical or other clear-cut
errors in property assignments.</p>
<h4>5.2.3 <a name="StabilityofSett" href="#StabilityofSett">Stability of Set of Values for a Property</a></h4>
<p>For most properties, additional property values may be created and assigned to
both new and existing characters. For example, additional line breaking classes
will be assigned if characters are discovered to require line breaking
behavior that cannot be expressed with the existing set of classes. For other
properties, the set of values is guaranteed to be fixed, or its range is
limited. For example, the set of values for the General_Category or
Bidirectional_Class is fixed, while combining classes are limited to the values 0 to 254.</p>
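<p>An implementation that depends on this limited range can verify it against the data it ships with. The following minimal sketch uses Python's <code>unicodedata.combining()</code>, which returns the canonical combining class of a character for the interpreter's bundled Unicode version; the function name <code>max_combining_class</code> is hypothetical.</p>

```python
import sys
import unicodedata

def max_combining_class() -> int:
    """Return the largest canonical combining class over all code points."""
    return max(unicodedata.combining(chr(cp)) for cp in range(sys.maxunicode + 1))

highest = max_combining_class()
# The stability policy limits combining classes to 0..254.
assert 0 <= highest <= 254
print(highest)
```

<p>Because the range is guaranteed, a table of combining classes can safely use one byte per entry, even for characters added in later versions.</p>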
<h4>5.2.4 <a name="StabilityofRelation" href="#StabilityofRelation">Stability of Relation to Another Property</a></h4>
<p>In many cases, once a character has a certain value for one property, it is
likely to have a particular value for a related property. These relations
are used by the Unicode Consortium in assigning properties to new characters,
and in evaluating properties for internal consistency. In some cases, such
dependencies are explicitly guaranteed and stable.</p>
<p>For example, all characters other than those of General Category M* have the
combining class 0.</p>
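<p>A guaranteed relation like this can be checked mechanically. This is a minimal sketch using Python's <code>unicodedata</code> module and the Unicode version bundled with the interpreter; the function name <code>violations_of_ccc_rule</code> is hypothetical, and an empty result is what the guarantee predicts for that data rather than something the sketch proves for other versions.</p>

```python
import sys
import unicodedata

def violations_of_ccc_rule() -> list[int]:
    """List code points with a nonzero combining class outside General Category M*."""
    bad = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.combining(ch) != 0 and not unicodedata.category(ch).startswith("M"):
            bad.append(cp)
    return bad

print(violations_of_ccc_rule())  # an empty list is expected if the guarantee holds
```

<p>The converse does not hold: a mark may still have combining class 0, as U+20DD COMBINING ENCLOSING CIRCLE (gc=Me) does.</p>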
<h4>5.2.5 <a name="StabilityofFormat" href="#StabilityofFormat">Stability of File Formats</a></h4>
<p>In principle, the way the property information is presented in the Unicode
Character Database is independent of the way this information is defined.
However, as the Unicode Standard gets updated, it becomes easier for
implementations to track updates if file formats remain unchanged and other
aspects of the way the data are organized can remain stable. For the majority
of properties, such stability is an informal goal of the development process,
but in a few cases, some aspects of the data organization are covered by
formal stability guarantees.</p>
<p>For example, Canonical and Compatibility mappings are always in canonical order,
and the resulting recursive decomposition will also be in canonical
order. Canonical mappings are also always limited either to a single value or to
a pair. The second character in the pair cannot itself have a
canonical mapping.</p>
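<p>The constraints on canonical mappings can likewise be checked with Python's <code>unicodedata.decomposition()</code>, which returns the raw Decomposition_Mapping field of the bundled data, with compatibility mappings prefixed by a tag such as <code>&lt;compat&gt;</code>. A minimal sketch; the function names are hypothetical, and the check covers only the single-value-or-pair rule and the rule for the second character, not canonical ordering.</p>

```python
import sys
import unicodedata

def canonical_mapping(ch: str) -> list[str]:
    """Return the canonical Decomposition_Mapping of ch, or [] if it has none."""
    fields = unicodedata.decomposition(ch).split()
    # Compatibility mappings start with a tag such as "<compat>"; skip them.
    if not fields or fields[0].startswith("<"):
        return []
    return [chr(int(field, 16)) for field in fields]

def check_canonical_mappings() -> list[int]:
    """Return code points whose canonical mapping breaks the stated constraints."""
    bad = []
    for cp in range(sys.maxunicode + 1):
        mapping = canonical_mapping(chr(cp))
        if len(mapping) > 2:
            bad.append(cp)  # mappings are limited to one or two characters
        elif len(mapping) == 2 and canonical_mapping(mapping[1]):
            bad.append(cp)  # the second character may not have a canonical mapping
    return bad

print([f"U+{ord(c):04X}" for c in canonical_mapping("\u00C1")])  # ['U+0041', 'U+0301']
```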
<p>As an alternative to the legacy conventions of semicolon-separated text files, the Unicode Character Database is now also available as a single XML file. See <a href="https://www.unicode.org/reports/tr42/">UAX #42 Unicode Character Database</a> in XML [<a href="#XML">XML</a>].</p>
<h3>5.3 <a name="Consistency" href="#Consistency">Consistency of Properties</a></h3>
<p> In an ideal world, all character properties would be
perfectly self-consistent, and related properties would be consistent with
each other over the entire range of code points. However, the Unicode Standard
is the product of many compromises. It has to strike a balance between
uniformity of treatment for similar characters, and compatibility with existing
practice for characters inherited from legacy encodings. Because of this
balancing act, one can expect a certain number of anomalies in character
properties.</p>
<p>Sometimes it may be advantageous for an implementation to
purposefully override some of the anomalous property values, increasing the
efficiency and uniformity of algorithms—as long as the results they
produce do not conflict with those specified by the normative properties of
this standard. See Chapter 4, <i>Character Properties</i> in [<a href="#Unicode">Unicode</a>] for some
examples.</p>
<p>Property values assigned to new
characters added to the Unicode Standard are generally defined so that related
characters are given consistent values, unless deliberate exceptions are
needed. For some properties, definite links between that property and
one or more other properties are defined. For example, for the Line_Break
property, many line break classes are defined in relation to General Category
values.</p>
<p>There are some properties that are interrelated
or that are derived from a combination of other properties, with or without
a list of explicit exceptions. When properties are assigned to newly
assigned characters, or when properties are adjusted, it is necessary to
take into account all existing relevant properties, any derivational
relations to derived properties, and all property stability guarantees.</p>
<h3>5.4 <a name="Provisional" href="#Provisional">Provisional Properties</a></h3>
<p>Some of the information provided about characters in the Unicode Character
Database constitutes provisional data. Provisional property data may capture
partial or preliminary information. Such data may contain errors or omissions,
or otherwise not be ready for systematic use; however, provisional property
data are included in the data files for distribution partly to encourage
review and improvement of the information. For example, a number of the tags
in the Unihan database provide provisional property values of various sorts
about Han characters.</p>
<h3>5.5 <a name="Unmaintained" href="#Unmaintained">Stabilized Properties</a></h3>
<p> Occasionally, as the
standard matures, and new characters, properties or algorithms are defined, the
information presented in an existing property may be better represented via other
properties, or it may no longer make sense to extend the property to new characters.
Such a property may then no longer be maintained in future versions of the
Unicode Standard. In that case, it will be designated as <i>stabilized</i>. For
backwards compatibility, a stabilized property will remain part of the Unicode
Character Database, but will not be updated or corrected.</p>
<p>An example of a stabilized property is Hyphen.</p>
<h2>6. <a name="SpecialValues" href="#SpecialValues">Special Property Values</a></h2>
<h3>6.1 <a name="NA" href="#NA">Not Applicable Value</a></h3>
<p>Limited properties apply to only a subset of characters. Where these
properties are implemented as a partition of the Unicode code space, the characters to which the property does not apply are given a special value denoting that
the property does not apply. The "not applicable" value may be the explicit
value "NA" or, for some properties, another value such as "XX".</p>
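<p>For example, Hangul_Syllable_Type applies only to Hangul jamo and precomposed Hangul syllables; every other code point has the value NA. The following minimal sketch derives the property from code point ranges. The function name is hypothetical, and the ranges are taken from HangulSyllableType.txt in recent versions of the Unicode Character Database; they should be checked against the data file for the version in use.</p>

```python
def hangul_syllable_type(cp: int) -> str:
    """Return the Hangul_Syllable_Type value for a code point (hypothetical helper)."""
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"    # leading consonant jamo
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"    # vowel jamo
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"    # trailing consonant jamo
    if 0xAC00 <= cp <= 0xD7A3:
        # Every 28th precomposed syllable has no trailing consonant.
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return "NA"       # the property does not apply to this code point

print(hangul_syllable_type(0xAC00), hangul_syllable_type(0xAC01), hangul_syllable_type(0x0041))
# LV LVT NA
```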
<h3>6.2 <a name="Default" href="#Default">Default Values</a></h3>
<p>Implementations often need specific properties for <i>all</i> code points,
including those that are unassigned. To meet this need, the Unicode Standard
assigns default properties to ranges of unassigned code points.</p>
<p>All implementations of the Unicode Standard should endeavor to handle
additions to the character repertoire gracefully. In some cases this may
require that an implementation attempt to 'anticipate' likely property values
for code points for which characters have not yet been defined, but where
surrounding characters exist that make it probable that similar characters
will be assigned to the code point in question.</p>
<p>There are three strategies:</p>
<ol>
<li>Rely on the recommendation from the Unicode Consortium. For example, for
the Bidirectional Class, the Unicode Consortium has published recommended
default values for all code points. For details of these recommendations
for various properties see [<a href="#UCDDoc">UCDDoc</a>].</li>
<li>Treat the unassigned areas of a given character block as if they had
property values common to other characters of the block. A variation of
this scheme bridges small gaps in the allocation inside a block by using
the property values for the characters bracketing the hole.</li>
<li>Give an unassigned <i>code point</i> an implementation-defined default property
that will result in graceful, if not completely correct, behavior if
an encoded character is later assigned at that code point.</li>
</ol>
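<p>Strategies 2 and 3 can be combined, as in the following minimal sketch. The function name <code>fill_small_gaps</code>, the hole-size limit, and the fallback value are all hypothetical choices, and the property data in the usage lines are made up for illustration; a real implementation would start from the recommendations in [<a href="#UCDDoc">UCDDoc</a>] where they exist.</p>

```python
from typing import Optional

def fill_small_gaps(values: dict[int, str], start: int, end: int,
                    max_gap: int = 2, fallback: str = "XX") -> dict[int, str]:
    """Assign anticipated values to unassigned code points in start..end.

    A hole of at most max_gap code points whose bracketing characters share a
    value inherits that value (strategy 2); any other unassigned code point
    gets the implementation-defined fallback (strategy 3).
    """
    filled = dict(values)
    cp = start
    while cp <= end:
        if cp in values:
            cp += 1
            continue
        gap_start = cp
        while cp <= end and cp not in values:
            cp += 1
        gap_end = cp - 1
        before: Optional[str] = values.get(gap_start - 1)
        after: Optional[str] = values.get(gap_end + 1)
        bridge = (before is not None and before == after
                  and gap_end - gap_start + 1 <= max_gap)
        for hole in range(gap_start, gap_end + 1):
            filled[hole] = before if bridge else fallback
    return filled

block = {0x10: "AL", 0x11: "AL", 0x13: "AL", 0x14: "NU"}  # hypothetical data
filled = fill_small_gaps(block, 0x10, 0x17)
print(filled[0x12], filled[0x15])  # AL XX
```

<p>The hole at 0x12 is bridged because both neighbors agree, while the trailing run 0x15..0x17 has no matching character after it and falls back to the default.</p>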
<p>Each of these strategies has advantages and drawbacks, and none can
guarantee that an implementation conformant to a prior
version of the Unicode Standard will support characters added in a later
version in precisely the same way as an implementation
that is conformant to the later version. The most that can be hoped for is
that the earlier implementation will behave more gracefully in such circumstances
than it would without any such strategy.</p>
<p>In principle, default values are temporary: they are superseded by final assignments
once characters are assigned to a given code point.</p>
<p>For noncharacter code points, a character property function would return the same
value as the default value for unassigned characters.</p>
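<p>Python's <code>unicodedata</code> module follows this convention for General_Category: noncharacters and unassigned code points both report <code>Cn</code>. A minimal sketch; the helper <code>is_noncharacter</code> is hypothetical, and U+0378 is assumed to remain unassigned in the bundled data.</p>

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacter code points (hypothetical helper)."""
    # U+FDD0..U+FDEF, plus the last two code points (xxFFFE, xxFFFF) of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE)

# Assumption: U+0378 is an unassigned code point in the Greek and Coptic block.
unassigned = unicodedata.category("\u0378")
for cp in (0xFDD0, 0xFFFE, 0x10FFFF):
    assert is_noncharacter(cp)
    # Noncharacters report the same General_Category as unassigned code points.
    assert unicodedata.category(chr(cp)) == unassigned == "Cn"
```

<p>Not every lookup in that module supplies the recommended defaults: <code>unicodedata.bidirectional()</code>, for instance, returns an empty string for code points with no assigned value, so an implementation relying on the recommended Bidi_Class defaults has to provide them itself.</p>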
<h3>6.3 <a name="Preliminary" href="#Preliminary">Preliminary Property Assignments</a></h3>
<p>Sometimes, a determination and assignment of property values can be made,
but the information on which it was based may be incomplete or preliminary. In
such cases, the property value may be changed when better information becomes
available. Currently, there is no machine-readable way to provide information
about the confidence of a property assignment; however, the text of the
Standard or a Technical Report defining the property may provide general
indications of preliminary status of property assignments where they are
known.</p>
<p>This is distinct from <a href="#ProvisionalProperty">provisional properties</a>,
where the entire property is preliminary.</p>
<h2><a name="References" href="#References">References</a></h2>
<table class="noborder" cellpadding="8">
<tr>
<td class="nb">[<a name="Alias">Alias</a>]</td>
<td class="nb">Property Aliases<br>
<a href="https://www.unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt">https://www.unicode.org/Public/UCD/latest/ucd/PropertyAliases.txt</a></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="Bidi">Bidi</a>]</td>
<td class="nb" vAlign="top">Unicode
Standard Annex #9: <i>The Unicode Bidirectional Algorithm<br>
</i> <a href="https://www.unicode.org/reports/tr9/">https://www.unicode.org/reports/tr9/</a></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="Charts">Charts</a>]</td>
<td class="nb" vAlign="top">The online code charts can be found at <a href="https://www.unicode.org/charts/">https://www.unicode.org/charts/</a><br>
An index to character names with links to the corresponding chart is
found at <a href="https://www.unicode.org/charts/charindex.html">https://www.unicode.org/charts/charindex.html</a></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="EAW">EAW</a>]</td>
<td class="nb" vAlign="top">Unicode Standard Annex #11:<i> East Asian
Width<br>
</i><a href="https://www.unicode.org/reports/tr11/">https://www.unicode.org/reports/tr11/</a></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="FAQ">FAQ</a>]</td>
<td class="nb" vAlign="top">Unicode Frequently Asked Questions<br>
<a href="https://www.unicode.org/faq/">https://www.unicode.org/faq/<br>
</a><i>For answers to common questions on technical issues.</i></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="Glossary">Glossary</a>]</td>
<td class="nb" vAlign="top">Unicode Glossary<a href="https://www.unicode.org/glossary/"><br>
https://www.unicode.org/glossary/<br>
</a><i>For explanations of terminology used in this and other documents.</i></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="LineBreak">LineBreak</a>]</td>
<td class="nb" vAlign="top">Unicode Standard Annex #14:<i> Unicode Line Breaking
Algorithm<br>
</i><a href="https://www.unicode.org/reports/tr14/">https://www.unicode.org/reports/tr14/</a></td>
</tr>
<tr>
<td class="nb">[<a name="Normal">Normal</a>]</td>
<td class="nb">Unicode Standard Annex #15: <i>Unicode Normalization Forms</i><br>
<a href="https://www.unicode.org/reports/tr15/">https://www.unicode.org/reports/tr15/</a></td>
</tr>
<tr>
<td class="nb">[<a name="RegEx">RegEx</a>]</td>
<td class="nb">Unicode Technical Standard #18: <i>Unicode Regular Expressions</i><br>
<a href="https://www.unicode.org/reports/tr18/">https://www.unicode.org/reports/tr18/</a></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="Stability">Stability</a>]</td>
<td class="nb" vAlign="top">Unicode Character Encoding Stability Policy<br>
<a href="https://www.unicode.org/policies/stability_policy.html">
https://www.unicode.org/policies/stability_policy.html</a></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="UCA">UCA</a>]</td>
<td class="nb" vAlign="top">Unicode Technical Standard #10: <i>Unicode Collation Algorithm</i><br>
<a href="https://www.unicode.org/reports/tr10/">https://www.unicode.org/reports/tr10/</a></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="UCD">UCD</a>]</td>
<td class="nb" vAlign="top">About the Unicode Character Database<br>
<a href="https://www.unicode.org/ucd/">https://www.unicode.org/ucd/<br>
</a><i>For an overview of the Unicode Character Database</i></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="UCDDoc">UCDDoc</a>]</td>
<td class="nb" vAlign="top">Unicode Standard Annex #44: <i>Unicode Character Database</i><br>
<a href="https://www.unicode.org/reports/tr44/">
https://www.unicode.org/reports/tr44/</a><br>
<i>For documentation of the contents of the Unicode Character Database and its associated files</i></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="Unicode">Unicode</a>]</td>
<td class="nb" vAlign="top">The Unicode Standard<br>
<i>For the latest version see:</i><br>
<a href="https://www.unicode.org/versions/latest/">
https://www.unicode.org/versions/latest/</a><br>
<i>For Version 15.0 see:</i> The Unicode Consortium. The
Unicode Standard, Version 15.0.0 (Mountain View, CA: The Unicode Consortium, 2022. ISBN 978-1-936213-32-0).<br>
<a href="https://www.unicode.org/versions/Unicode15.0.0/">https://www.unicode.org/versions/Unicode15.0.0/</a></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="Unihan">Unihan</a>]</td>
<td class="nb" vAlign="top">Unicode Standard Annex #38: <i>Unicode Han Database (Unihan)</i><br>
<a href="https://www.unicode.org/reports/tr38/">
https://www.unicode.org/reports/tr38/</a><br>
<i>The database itself is available online at</i><br>
<a href="https://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip">
https://www.unicode.org/Public/UCD/latest/ucd/Unihan.zip</a> (large download)</td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="UTC">UTC</a>]</td>
<td class="nb" vAlign="top">The Unicode Technical Committee<br>
<i>For more
information see</i> <a href="https://www.unicode.org/consortium/utc.html">
https://www.unicode.org/consortium/utc.html</a></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="UTS51">UTS51</a>]</td>
<td class="nb" vAlign="top">Unicode Technical Standard #51: <i>Unicode Emoji</i><br>
<a href="https://www.unicode.org/reports/tr51/">https://www.unicode.org/reports/tr51/</a></td>
</tr>
<tr>
<td class="nb">[<a name="ValueAlias">ValueAlias</a>]</td>
<td class="nb">Property Value Aliases<br>
<a href="https://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt">https://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt</a></td>
</tr>
<tr>
<td class="nb" vAlign="top" width="1">[<a name="XML">XML</a>]</td>
<td class="nb" vAlign="top">Unicode Standard Annex #42: <i>Unicode Character Database in XML</i><br>
<a href="https://www.unicode.org/reports/tr42/">https://www.unicode.org/reports/tr42/</a><br>
<i>The XML version of the database is available online at</i><br>
<a href="https://www.unicode.org/Public/UCD/latest/ucdxml/">https://www.unicode.org/Public/UCD/latest/ucdxml/</a></td>
</tr>
</table>
<h2><a name="Acknowledgements" href="#Acknowledgements">Acknowledgements</a></h2>
<p>Asmus Freytag was the initial author of this report, with additional
content provided by Ken Whistler.</p>
<p>The editors wish to thank
Mark Davis for his extensive
contributions and insightful
comments, and Dr. Julie Allen for extensive copy-editing. Ivan Panchenko
provided a careful copyedit and a list of typos to fix for Revision 15.</p>
<h2><a name="Modifications" href="#Modifications">Modifications</a></h2>
<p>The following summarizes
modifications from the previous version of this document.</p>
<p><b>Revision 15 [AF, KW]</b></p>
<ul>
<li>Reissued.</li>
<li>Minor editing.</li>
</ul>
<p>Previous revisions can be accessed with the “Previous Version” link in the header.</p>
<hr>
<p class="copyright">© 2022 Unicode, Inc. All Rights Reserved.
The Unicode Consortium makes no expressed or implied warranty of any kind,
and assumes no liability for errors or omissions. No liability is assumed
for incidental and consequential damages in connection with or arising out
of the use of the information or programs contained or accompanying this
technical report. The Unicode <a href="https://www.unicode.org/copyright.html">Terms of Use</a> apply.</p>
<p class="copyright">Unicode and the Unicode logo are trademarks of Unicode, Inc., and are
registered in some jurisdictions.</p>
</div>
</body>
</html>