<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><base href="https://www.unicode.org/reports/tr60/tr60-2.html">
<title>UAX #60: Data for non Han Ideographic Scripts</title>
<link rel="stylesheet" type="text/css" href="https://www.unicode.org/reports/reports-v2.css">
</head>
<body>
<table class="header">
<tr>
<td class="icon" style="width:38px; height:35px">
<a href="https://www.unicode.org/">
<img border="0" src="https://www.unicode.org/webscripts/logo60s2.gif" align="middle"
alt="[Unicode]" width="34" height="33"></a>
</td>
<td class="icon" style="vertical-align:middle">
<a class="bar"> </a>
<a class="bar" href="https://www.unicode.org/reports/"><font size="3">Technical Reports</font></a>
</td>
</tr>
<tr>
<td colspan="2" class="gray"> </td>
</tr>
</table>
<div class="body">
<h2 class="uaxtitle"><span class="removed">Proposed</span> Draft Unicode®
Standard Annex #60</h2>
<h1>DATA FOR NON HAN IDEOGRAPHIC SCRIPTS</h1>
<table class="simple" width="90%">
<tbody>
<tr>
<td valign="top" width="20%">Version</td>
<td valign="top" >Unicode 18.0.0</td>
</tr>
<tr>
<td valign="top">Editor</td>
<td valign="top">Michel Suignard</td>
</tr>
<tr>
<td valign="top">Date</td>
<td valign="top" class="changed">2026-02-04</td>
</tr>
<tr>
<td valign="top">This Version</td>
<td valign="top" class="changed">
<a href="https://www.unicode.org/reports/tr60/tr60-2.html">https://www.unicode.org/reports/tr60/tr60-2.html</a></td>
</tr>
<tr>
<td valign="top">Previous Version</td>
<td valign="top" class="changed">
<a href="https://www.unicode.org/reports/tr60/tr60-1.html">https://www.unicode.org/reports/tr60/tr60-1.html</a></td>
</tr>
<tr>
<td valign="top">Latest Version</td>
<td valign="top"><a href="https://www.unicode.org/reports/tr60/">https://www.unicode.org/reports/tr60/</a></td>
</tr>
<tr>
<td valign="top">Latest Proposed Update</td>
<td valign="top">
<a href="https://www.unicode.org/reports/tr60/proposed.html">https://www.unicode.org/reports/tr60/proposed.html</a></td>
</tr>
<tr>
<td valign="top">Revision</td>
<td valign="top"><a href="#Modifications"><span class="changed">2</span></a></td> </tr>
</tbody>
</table>
<h4 style="margin-top: 1em;">Summary</h4>
<p><em>This document describes the Sources and other ancillary data for non
Han Ideographic Scripts, including Jurchen, Nüshu, <span class="changed">Seal, </span>and Tangut.</em></p>
<h4 class="status">Status</h4>
<!-- NOT YET APPROVED -->
<p class="changed"><em>This is a<strong><font color="#ff3333"> draft </font></strong>document which may be updated, replaced, or superseded by other documents at any time. Publication does not imply endorsement by the Unicode Consortium. This is not a stable document; it is inappropriate to cite this document as other than a work in progress.</em></p>
<!-- END NOT YET APPROVED -->
<!-- APPROVED
<p><em>This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium. This is a stable document and may be used as reference material or cited as a normative reference by other specifications.</em></p>
END APPROVED -->
<blockquote>
<p><em><strong>A Unicode Standard Annex (UAX)</strong> forms an integral part of the Unicode Standard, but is published online as a separate document. The Unicode Standard may require conformance to normative content in a Unicode Standard Annex, if so specified in the Conformance chapter of that version of the Unicode Standard. The version number of a UAX document corresponds to the version of the Unicode Standard of which it forms a part.</em></p>
</blockquote>
<p><em>Please submit corrigenda and other comments with the online reporting
form [<a href="https://www.unicode.org/reporting.html">Feedback</a>].
Related information that is useful in understanding this annex is found in Unicode Standard Annex #41,
“<a href="https://www.unicode.org/reports/tr41/tr41-36.html">Common References for Unicode Standard Annexes</a>.”
For the latest version of the Unicode Standard, see [<a href="https://www.unicode.org/versions/latest/">Unicode</a>].
For a list of current Unicode Technical Reports, see [<a href="https://www.unicode.org/reports/">Reports</a>].
For more information about versions of the Unicode Standard, see [<a href="https://www.unicode.org/versions/">Versions</a>].
For any errata which may apply to this annex, see [<a href="https://www.unicode.org/errata/">Errata</a>].</em></p>
<h4>Contents</h4>
<ul class="toc">
<li>1 <a href="#Introduction">Introduction</a></li>
<li>2 <a href="#Mechanics">Mechanics</a>
<ul class="toc">
<li>2.1 <a href="#DatafilesDesign">Data files Design</a></li>
<li>2.2 <a href="#Datafiles">Data files for Jurchen, Nüshu, <span class="changed">Seal</span>, and Tangut</a></li>
</ul>
</li>
<li>3 <a href="#PropertyTypes">Property Types</a>
<ul class="toc">
<li>3.1 <a href="#Sources">Sources</a></li>
<li>3.2 <a href="#RadicalStroke">Radical-Stroke Counts</a></li>
<li>3.3 <a href="#Readings">Readings</a></li>
<li>3.4 <a href="#Numeric" class="changed">Numeric values</a></li>
<li>3.5 <a href="#OtherData">Other Data</a></li>
</ul>
</li>
<li>4 <a href="#Properties">Scripts Properties</a>
<ul class="toc">
<li>4.1 <a href="#Jurchen">Jurchen</a></li>
<li>4.2 <a href="#Nushu">Nüshu</a></li>
<li class="changed">4.3 <a href="#Seal">Seal</a></li>
<li>4.4 <a href="#Tangut">Tangut</a></li>
</ul>
</li>
<li>5 <a href="#History">History</a></li>
<li><a href="#Acknowledgements">Acknowledgements</a></li>
<li><a href="#Modifications">Modifications</a></li>
</ul>
<hr>
<h2>1 <a name="Introduction" href="#Introduction">Introduction</a></h2>
<p>This document is a guide to information including sources and other
ancillary data related to ideographic scripts other than Han. Historically,
a summary and often incomplete version of that information was provided in the data file
preambles related to these scripts. This document formalizes these elements in a
structure similar to what is done for Han characters in UAX #38, <em>Unicode Han
Database (Unihan)</em>. In common with Han ideographs, elements of
these other ideographic scripts are encoded using algorithmic names,
including the name of the script and a multi-digit notation indicating the
hexadecimal value of the code point, therefore providing little information
about the identity of the character. The ancillary data provided by the
related data files define additional information such as the various sources
for the ideograph identity, and other ancillary information, such as
the reading and radical-stroke index. While sources are always provided, the
ancillary information varies between scripts. Similar to the
Unihan database, this information could grow in the future, such as adding sources or
other type<span class="changed">s</span> of data related to specific code points.</p>
<p>The scripts covered by this document include Jurchen, Nüshu, <span class="changed">Seal</span>, and Tangut,
referred to as 'covered scripts' in the rest of this document.
Note that while another East Asian encoded script, Khitan
Small Script, had properties documented in the various encoding proposals,
especially
<a href="https://www.unicode.org/L2/L2016/16113r-n4725r-khitan-small-script.pdf">
https://www.unicode.org/L2/L2016/16113r-n4725r-khitan-small-script.pdf</a>,
they were not surfaced in any Unicode data files. Nothing precludes their
addition in the future, as it would improve the knowledge related to that
script.</p>
<p>This document is a guide to these data files, one per covered script, describing their mechanics, the nature of their contents, and the status of the various properties.
One of the main goals of this document is to provide a single point of
reference for all property information related to the covered scripts.</p>
<h2>2 <a name="Mechanics" href="#Mechanics">Mechanics</a></h2>
<h3>2.1 <a name="DatafilesDesign" href="#DatafilesDesign">Data files Design</a></h3>
<p>The data files consist of a number of fields containing data for each
of the covered scripts' ideographs included in the Unicode Standard. The fields, all of which correspond to properties, have names that consist entirely of ASCII letters and digits with no spaces or other punctuation except for underscore. For historical reasons, they all start with a lowercase <code>k</code>.</p>
<p>All data in these data files is stored in UTF-8 using Normalization Form C (NFC). Note, however, that the “Syntax” descriptions below, used for validation of property values, operate on Normalization Form D (NFD), primarily because that makes the regular expressions simpler.</p>
<h3>2.2 <a name="Datafiles" href="#Datafiles">Data files for Jurchen, Nüshu, <span class="changed">Seal</span>, and Tangut</a></h3>
<p>Included with the [<a href="../tr41/tr41-36.html#UCD">UCD</a>] are <span class="removed">three</span> <span class="changed">four</span> files
called <code>JurchenSources.txt</code>, <code>NushuSources.txt</code>, <code class="changed">
SealSources.txt</code>, and <code>TangutSources.txt</code>.
These files are single text files, in UTF-8, NFC, and using Unix line
endings which contain the values for all properties related to each of the
covered scripts. Properties are described by categories in this document but
are nevertheless included in a single file per script (unlike, for example
the Unihan database which is made of multiple files for the Han script). <span class="changed">For most scripts, the</span><span class="removed">All</span> properties use a 'k' prefix followed by the
four-letter abbreviated version of the script name as described in
<code>PropertyValueAliases.txt</code>. For example, for the <span class="changed">Jurchen</span> script,
the prefix is <span class="changed">'kJURC'</span>, and an example of a property name is <tt class="changed">kJURC_Src</tt>.
<span class="changed">The notable exception is the Tangut script where the original 'TGT' abbreviation has been kept.</span></p>
<blockquote class="reviewnote">Review Note: There is ongoing work to clarify the status of these data sets in terms of Unicode properties. These data files
currently contain data which is only in
scope for the script they are addressing. In that aspect they are
different from typical Unicode properties which encompass the whole
Unicode repertoire. As such, they were not subject to the typical
constraints of Unicode properties, such as stability, consistency, etc.
By moving these data definitions into a UAX, the use of their status as
Normative or Informative creates a stability requirement that may not be
desired. At this moment, only properties that are essential to
identities are qualified as 'normative', all others are qualified as
'provisional'. However, it may be desirable to make all properties of
type 'Radical-Stroke-Counts' informative to be consistent with the
similar Unihan property. This may also require an update in UAX #44 concerning
the description of these properties.</blockquote>
<p>In this file, blank lines may be ignored; lines beginning with <code>#</code>
are comment lines used to provide the header and footer. Each of the remaining lines
is one entry, with three<span class="removed">,</span> tab-separated fields: the Unicode Scalar Value, the property name, and the value for the property for the given Unicode Scalar Value. For most of the properties, if multiple values are possible, the values are separated by spaces. No
ideograph may have more than one instance of a given property associated
with it, and no empty properties are included in these data files.</p>
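<p>The line format described above can be read with a few lines of code. The following Python sketch is illustrative only: it assumes that the first field is written in the usual <code>U+</code> notation followed by hexadecimal digits, and the names <code>Entry</code> and <code>parse_sources</code> are hypothetical helpers, not part of any Unicode tool.</p>

```python
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    code_point: int  # Unicode scalar value of the ideograph
    prop: str        # property name, for example "kNSHU_DubenSrc"
    value: str       # raw property value, possibly space-separated items


def parse_sources(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per data line, skipping blank and comment lines."""
    seen = set()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 tab-separated fields")
        cp_text, prop, value = fields
        # Assumption: the scalar value is written as "U+" plus hex digits.
        if not cp_text.startswith("U+"):
            raise ValueError(f"line {number}: bad code point {cp_text!r}")
        code_point = int(cp_text[2:], 16)
        if not value:
            raise ValueError(f"line {number}: empty property value")
        # An ideograph may carry at most one instance of a given property.
        if (code_point, prop) in seen:
            raise ValueError(f"line {number}: duplicate {prop} for {cp_text}")
        seen.add((code_point, prop))
        yield Entry(code_point, prop, value)
```

<p>The sketch rejects duplicate and empty properties because the format forbids both. Any sample lines fed to it in testing are made up for illustration and are not taken from the published data files.</p>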
<p>There is no formal limit on the lengths of any of the property values.
Any Unicode character may be used in the property values except for control
characters (especially tab, newline, and carriage return). </p>
<p>The data lines are sorted by Unicode Scalar Value and property-type as primary and secondary keys, respectively.</p>
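<p>A consumer that relies on this ordering may want to verify it. The Python sketch below assumes that the secondary key is the property name compared as a plain string, which this document does not state explicitly; <code>is_sorted</code> is a hypothetical helper name.</p>

```python
from typing import Iterable, Tuple


def is_sorted(keys: Iterable[Tuple[int, str]]) -> bool:
    """Check (code point, property name) pairs for file order.

    Tuples compare element by element, so the code point acts as the
    primary key and the property name as the secondary key.
    Assumption: property names are ordered by plain string comparison.
    """
    keys = list(keys)
    return keys == sorted(keys)
```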
<p>The file’s header includes a summary of the properties each of these data files contains.</p>
<h2>3 <a name="PropertyTypes" href="#PropertyTypes">Property Types</a></h2>
<p>The data in these data files serves a multitude of purposes, and the properties are grouped into categories according to the purpose they fulfill.
A general discussion of the various categories is provided here, followed by a detailed description of the individual properties, alphabetically arranged.
Among these categories, because the source information is essential in
determining identity for characters which have algorithmically constructed
names, the status of source related properties is 'normative'; all other
properties have a 'provisional' status.</p>
<!-- Section 3.1 -->
<h3>3.1 <a name="Sources" href="#Sources">Sources</a></h3>
<p>Sources are among the normative parts of these data files and
refer to ideograph collections which identif<span class="removed">ies</span><span class="changed">y</span> encoded characters. These
sources are defined as
<tt>kJURC_Src</tt> for Jurchen, <tt>kNSHU_DubenSrc</tt> for Nüshu,
<tt class="changed">kSEAL_CCZSrc</tt>, <tt class="changed">kSEAL_DYCSrc</tt>, <tt class="changed">kSEAL_QJZSrc</tt>, <tt class="changed">kSEAL_THXSrc</tt>
<span class="changed">for Seal,</span> and <tt class="changed">kTGT_MergedSrc</tt>
for Tangut. These sources were typically documented in
the encoding proposals for these scripts. Detailed descriptions of the syntax used for these sources
are to be found in <a href="#Properties">Section 4</a>, <em>Script Properties</em>, below.</p>
<!-- Section 3.2 -->
<h3>3.2 <a name="RadicalStroke" href="#RadicalStroke">Radical-Stroke Counts</a></h3>
<p>Two of the scripts include radical-stroke counts: Jurchen with <tt>kJURC_RSUnicode</tt>
and Tangut with <tt class="changed">kTGT_RSUnicode</tt>. All the radical-stroke properties used
here are loosely derived
from the radical system introduced by the 18th-century <em>Kangxi
Dictionary</em> and used in the Unihan database for the Han ideographs. Each
Tangut ideograph is assigned one of the 883 Tangut components, and each
Jurchen ideograph is assigned one of the 51 Jurchen
radicals. In all these cases, unlike Han, the component or radical assignment
is never a semantic signifier; it is solely based on the ideograph’s structure
and is mainly meant to facilitate lookup of a specific ideograph in these
large lists.
The same two scripts also include a stroke count, and unlike the Han
equivalent, the count includes the component or radical. It should be noted
that the Nüshu repertoire is ordered by stroke counts (one to sixteen) but
this is not reflected in any property. <span class="changed">Finally, the Seal script specifies a
Radical entry <tt>kSEAL_Rad</tt> which does not include a count and therefore is not
included in this category.</span></p>
<!-- Section 3.3 -->
<h3>3.3 <a name="Readings" href="#Readings">Readings</a></h3>
<p>Two of the scripts include a reading property: Jurchen with
<tt>kJURC_NCReading</tt> and Nüshu with <tt>kNSHU_Reading</tt>. Any attempt at
providing a reading or set of readings for an ideograph is bound to be
fraught with difficulty,
because the readings will vary over time and from place to place, even
within a language. However, because these readings have been documented in
the encoding proposals and related to well-known sources, these are provided
when available in these data files.</p>
<!-- Section 3.4 -->
<h3 class="changed">3.4 <a name="Numeric" href="#Numeric">Numeric</a> values</h3>
<p class="changed">Two of the scripts include a numeric property: Jurchen
with
<tt>kJURC_Numeric</tt> and Tangut with <tt>kTGT_Numeric</tt>. This
only applies to a few characters.</p>
<!-- Section 3.5 -->
<h3>3.5 <a name="OtherData" href="#OtherData">Other Data</a></h3>
<p>This category includes properties that are typically specific to a given
script. Currently <span class="removed"> only one property is</span>
<span class="changed">two properties are </span>defined in this category:
<tt class="removed">kJURC_Numeric</tt> <span class="changed"> <tt>kSEAL_MCJK</tt>
and <tt>kSEAL_Rad</tt></span> used for <span class="removed">Jurchen</span>
<span class="changed">Seal</span>.</p>
<!-- Section 4 -->
<h2>4 <a name="Properties" href="#Properties">Scripts Properties</a></h2>
<p>Below is a listing of all properties for each of the covered scripts. Each of
these
lists is ordered alphabetically, with information on the property contents
and syntax.</p>
<p>For each property we give the following information in the alphabetical listing: its <em>Property</em> tag, its Unicode <em>Status</em>, its <em>Category</em> as defined above, the Unicode version in which it was <em>Introduced</em>, its <em>Delimiter</em>, its <em>Syntax</em>, its <em>Default</em>, and its <em>Description</em>.</p>
<p>The <em>Property</em> name is the tag used in the data files to mark instances of this property.</p>
<p>The Unicode <em>Status</em> is either <em>Normative</em>, <em>Informative</em>, or <em>Provisional</em>, depending on whether it is a normative part of the standard, an informative part of the standard, or neither. We may also include <em>Deprecated</em> as a Unicode Status if the property is no longer to be used.</p>
<p>Properties which allow multiple property values have a <em>Delimiter</em> defined as “space” (<code>U+0020</code> <span class="name">SPACE</span>). Properties which do not have multiple property values have this defined as “N/A.” Some properties do not currently have multiple values in the data but may do so in the future.</p>
<p>For most properties with multiple values, the order of the values is arbitrary and has no particular significance. The most common order in such cases is alphabetical
or numerical.</p>
<p>Validation is done as follows: The entry is split into subentries using the <em>Delimiter</em> (if defined), and each subentry converted to Normalization Form D (NFD). The value is valid if and only if each normalized subentry matches the property’s <em>Syntax</em> regular expression. Note that any given property’s <em>Syntax</em> is not guaranteed to be stable and may change in the future.</p>
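<p>The validation procedure can be sketched in Python as follows. This is a minimal illustration, not the tooling used to produce the data files: <code>is_valid</code> is a hypothetical name, and the use of <code>re.ASCII</code> reflects an assumption that <code>\d</code> in the <em>Syntax</em> expressions means ASCII digits only.</p>

```python
import re
import unicodedata
from typing import Optional


def is_valid(value: str, syntax: str, delimiter: Optional[str] = None) -> bool:
    """Validate a property value against its Syntax regular expression."""
    # Split into subentries only when the property defines a delimiter.
    parts = value.split(delimiter) if delimiter else [value]
    # Assumption: \d in the Syntax expressions means ASCII digits only.
    pattern = re.compile(syntax, re.ASCII)
    # Each subentry is normalized to NFD and must match in full.
    return all(
        pattern.fullmatch(unicodedata.normalize("NFD", part)) is not None
        for part in parts
    )
```

<p>For example, <code>is_valid("20.01", r"[1-9]\d\.\d{2}")</code> accepts a value shaped like a <tt>kNSHU_DubenSrc</tt> entry, while a value with a one-digit page number is rejected. Using <code>fullmatch</code> rather than <code>search</code> ensures that the whole subentry, not just a part of it, conforms to the expression.</p>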
<p>Finally, the <em>Description</em> contains not only a description of what the property contains, but also source information, known limitations, methodology used in deriving the data, and so on.</p>
<h3>4.1 <a name="Jurchen" href="#Jurchen">Jurchen</a></h3>
<p>The properties covered in the table are:
<a href="#kJURC_NCReading">kJURC_NCReading</a>,
<a href="#kJURC_Numeric">kJURC_Numeric</a>,
<a href="#kJURC_RSUnicode">kJURC_RSUnicode</a>, and
<a href="#kJURC_Src">kJURC_Src</a>.</p>
<!-- START MAIN TABLE JURCHEN -->
<!-- kJURC_NCReading -->
<table summary="kJURC_NCReading" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a name="kJURC_NCReading" id="kJURC_NCReading"><strong>kJURC_NCReading</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Readings</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">
[^\t"]+</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Reading given in Nǚzhēnwén Cídiǎn (Jīn). Although the value can
be expressed using any Unicode characters, it is typically a single
string of Latin characters with optional parentheses.<br></td>
</tr>
</table><br>
<!-- kJURC_Numeric -->
<table summary="kJURC_Numeric" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a name="kJURC_Numeric" id="kJURC_Numeric"><strong>kJURC_Numeric</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%"><span class="removed">Other Data</span><span class="changed">Numeric
values</span></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%" style="height: 46px">Syntax</td>
<td width="90%" style="height: 46px">
[1-9]\d{0,4}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%" style="height: 46px">Default</td>
<td width="90%" style="height: 46px">N</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Numeric value of the Jurchen character. It only applies
to a few characters.<br></td>
</tr>
</table><br>
<!-- kJURC_RSUnicode -->
<table summary="kJURC_RSUnicode" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a name="kJURC_RSUnicode" id="kJURC_RSUnicode"><strong>kJURC_RSUnicode</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%"><span class="removed">Sources</span><span class="changed">Radical-Stroke Counts</span></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">
[1-9]\d{0,1}\.[1-9]\d{0,1} </td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">The first number is the radical number, and the second
number is the total stroke count.</td>
</tr>
</table><br>
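<p>As an illustration of the <tt>kJURC_RSUnicode</tt> syntax, the following Python sketch splits a value into its radical number and total stroke count. The upper bound of 51 comes from the number of Jurchen radicals given in Section 3.2; <code>parse_jurchen_rs</code> is a hypothetical helper name, and the values used to exercise it are made up.</p>

```python
import re
from typing import Tuple

# Syntax from the table above, with capture groups added for the two numbers.
RS_SYNTAX = re.compile(r"([1-9]\d{0,1})\.([1-9]\d{0,1})", re.ASCII)
JURCHEN_RADICAL_COUNT = 51  # number of Jurchen radicals, per Section 3.2


def parse_jurchen_rs(value: str) -> Tuple[int, int]:
    """Return (radical number, total stroke count) for a kJURC_RSUnicode value."""
    match = RS_SYNTAX.fullmatch(value)
    if match is None:
        raise ValueError(f"not a radical-stroke value: {value!r}")
    radical, strokes = int(match.group(1)), int(match.group(2))
    if radical > JURCHEN_RADICAL_COUNT:
        raise ValueError(f"radical number out of range: {radical}")
    # Unlike the Han equivalent, the stroke count includes the radical.
    return radical, strokes
```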
<!-- kJURC_Src -->
<table summary="kJURC_Src" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a name="kJURC_Src" id="kJURC_Src"><strong>kJURC_Src</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%"><span class="removed">Description</span><span class="changed">Sources</span></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">
NC:\d{3}\.\d{2}(,\d{3}\.\d{2})?<br>
| SJ-B\:\d{3}[A-Z]\.\d<br>| JJ\:\d{3}<br>| N5131\:X-\d{4} </td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">The Jurchen sources are made of the following categories: <br>
<br>
NC Jīn Qǐzōng 金啓孮, Nǚzhēnwén Cídiǎn 女真文辞典 (Beijing: Wenwu chubanshe, 1984).<br>
The first number is the page number in Nǚzhēnwén Cídiǎn, the second number is the order of the entry on that page.<br>
There are multiple entries for some characters in the NC source,
but this document only references a single entry for each character.<br>
SJ-B Berlin copy of the Sino-Jurchen Vocabulary. <br>
The first number is the folio,
the second number is the position in the folio.<br>
JJ Jin Guangping 金光平 and Jīn Qǐzōng 金啓孮, "Nüzhen Yuyan Wenzi Yanjiu" 女真语言文字研究 (Beijing: Wenwu chubanshe, 1980).<br>
The number indicates the page reference.<br>
N5131-X Sun Bojun, Nie Hongyin, Jing Yongshi,<br>
"A Supplementary Proposal to Encode the Jurchen Characters in UCS" (WG2 N5131).<br>
The sequence number is defined in WG2 N5131.<br>
</td>
</tr>
</table><br>
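<p>The four alternatives of the <tt>kJURC_Src</tt> syntax can be combined into a single expression. The Python sketch below assumes that the spaces and line breaks around the <code>|</code> separators in the table are layout only and not part of the pattern; <code>split_jurchen_source</code> is a hypothetical helper, and the values used to exercise it are invented.</p>

```python
import re
from typing import Tuple

# Alternatives from the Syntax row, joined without the layout whitespace.
JURC_SRC = re.compile(
    r"NC:\d{3}\.\d{2}(,\d{3}\.\d{2})?"
    r"|SJ-B:\d{3}[A-Z]\.\d"
    r"|JJ:\d{3}"
    r"|N5131:X-\d{4}",
    re.ASCII,
)


def split_jurchen_source(value: str) -> Tuple[str, str]:
    """Split a kJURC_Src value into (source tag, locator)."""
    if JURC_SRC.fullmatch(value) is None:
        raise ValueError(f"not a Jurchen source reference: {value!r}")
    # The source tag is everything before the first colon.
    tag, _, locator = value.partition(":")
    return tag, locator
```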
<h3>4.2 <a name="Nushu" href="#Nushu">Nüshu</a></h3>
<p>The properties covered in the table are:
<a href="#kNSHU_DubenSrc">kNSHU_DubenSrc</a> and
<a href="#kNSHU_Reading">kNSHU_Reading</a>.</p>
<!-- START MAIN TABLE NUSHU -->
<!-- kNSHU_DubenSrc -->
<table summary="kNSHU_DubenSrc" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%">
<a name="kNSHU_DubenSrc" id="kNSHU_DubenSrc"><strong>kNSHU_DubenSrc</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Sources</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">10.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%" style="height: 46px">Syntax</td>
<td width="90%">[1-9]\d\.\d{2}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%" style="height: 46px">Default</td>
<td width="90%" >N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">The only source documented in the file is Nǚshū Dúběn (NSDB)
女书读本 ‘Nüshu Reader’:<br>
the first number is the page number in the NSDB,<br>
the second number is the order of the item on that page.<br>
While other sources, such as
Nüshu Yongzi Bijiao [NSYZBJ] 女书用字比较 "A Comparison of Characters Used for
Writing Women's Script", have been mentioned in discussions about the proposal,
they are not documented in the data file.<br>
</td>
</tr>
</table><br>
<!-- kNSHU_Reading -->
<table summary="kNSHU_Reading" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%">
<a name="kNSHU_Reading" id="kNSHU_Reading"><strong>kNSHU_Reading</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Readings</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">10.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">[a-z]+[1-9]\d{0,1}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Reading based on Nüshu Duben [NSDB]. The digits following the
ASCII letters indicate the tone using five-degree contour tone marks.<br></td>
</tr>
</table><br>
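<p>A <tt>kNSHU_Reading</tt> value can be separated into its letters and its tone digits by adding capture groups to the syntax. The Python sketch below is illustrative; <code>split_reading</code> is a hypothetical name and the readings used to exercise it are invented. The tone is kept as a string because the digits describe a contour rather than a single quantity.</p>

```python
import re
from typing import Tuple

# Syntax from the table above, with capture groups for letters and tone.
READING_SYNTAX = re.compile(r"([a-z]+)([1-9]\d{0,1})", re.ASCII)


def split_reading(value: str) -> Tuple[str, str]:
    """Split a kNSHU_Reading value into (letters, tone digits)."""
    match = READING_SYNTAX.fullmatch(value)
    if match is None:
        raise ValueError(f"not a Nüshu reading: {value!r}")
    return match.group(1), match.group(2)
```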
<h3 class="changed">4.3 <a name="Seal" href="#Seal">Seal</a></h3>
<p class="changed">The properties covered in the table are:
<a href="#kSEAL_CCZSrc">kSEAL_CCZSrc</a>,
<a href="#kSEAL_DYCSrc">kSEAL_DYCSrc</a>,
<a href="#kSEAL_MCJK">kSEAL_MCJK</a>,
<a href="#kSEAL_QJZSrc">kSEAL_QJZSrc</a>,
<a href="#kSEAL_Rad">kSEAL_Rad</a>, and
<a href="#kSEAL_THXSrc">kSEAL_THXSrc</a>.</p>
<!-- START MAIN TABLE SEAL -->
<!-- kSEAL_CCZSrc -->
<table summary="kSEAL_CCZSrc" border="1" cellpadding="2" width="100%" class="changed">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%">
<a name="kSEAL_CCZSrc" id="kSEAL_CCZSrc"><strong>kSEAL_CCZSrc</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Sources</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">C-\d{5}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%" >N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Chenchangzhi (CCZ) (陳昌治單行本) source, one version of the
Daxu Ben Shuowen Jiezi. The so-called "newly added characters (新附字)"
specific to Daxu Ben versions are here numbered in the character
sequence as 5-digit numbers.<br>
</td>
</tr>
</table><br>
<!-- kSEAL_DYCSrc -->
<table summary="kSEAL_DYCSrc" border="1" cellpadding="2" width="100%" class="changed">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%">
<a name="kSEAL_DYCSrc" id="kSEAL_DYCSrc"><strong>kSEAL_DYCSrc</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Sources</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">D-\d{5}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Duan Zhu (DYC) source created by Duan Yucai (段玉裁).<br></td>
</tr>
</table><br>
<!-- kSEAL_MCJK -->
<table summary="kSEAL_MCJK" border="1" cellpadding="2" width="100%" class="changed">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%">
<a name="kSEAL_MCJK" id="kSEAL_MCJK"><strong>kSEAL_MCJK</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Other data</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">[0-9A-F]{4,5}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Each Seal character is associated with a single CJK
Unified ideograph which may be used to refer to the Seal character. The
value corresponds to the code point of the CJK Unified ideograph in
hexadecimal notation. That ideograph value may be associated with
multiple Seal characters.<br></td>
</tr>
</table><br>
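<p>Because a <tt>kSEAL_MCJK</tt> value is a hexadecimal code point, and one CJK Unified Ideograph may be shared by several Seal characters, a reverse index is a natural structure for lookups. The Python sketch below shows one way to build it; the function names and the Seal code points used to exercise it are hypothetical.</p>

```python
from collections import defaultdict
from typing import Dict, List


def cjk_for(value: str) -> str:
    """Convert a kSEAL_MCJK value (hex code point) to the CJK character."""
    return chr(int(value, 16))


def group_by_cjk(mcjk: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each CJK ideograph to the sorted Seal code points that refer to it.

    `mcjk` maps a Seal code point to its kSEAL_MCJK value.
    """
    groups = defaultdict(list)
    for seal_cp, value in sorted(mcjk.items()):
        groups[cjk_for(value)].append(seal_cp)
    return dict(groups)
```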
<!-- kSEAL_QJZSrc -->
<table summary="kSEAL_QJZSrc" border="1" cellpadding="2" width="100%" class="changed">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%">
<a name="kSEAL_QJZSrc" id="kSEAL_QJZSrc"><strong>kSEAL_QJZSrc</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Sources</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">K-\d{5}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Qi Junzao (QJZ) (祁嶲藻刻本) source, one version of the Xiaoxu
Ben, a work originally authored by Xu Kai (徐鍇).<br></td>
</tr>
</table><br>
<!-- kSEAL_Rad -->
<table summary="kSEAL_Rad" border="1" cellpadding="2" width="100%" class="changed">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%">
<a name="kSEAL_Rad" id="kSEAL_Rad"><strong>kSEAL_Rad</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Other data</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">\d{1,3}\.[0-9A-F]{5}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">These values provide the radical number associated with
each Seal character (1 to 540) along with the radical code point
expressed in hexadecimal notation, the two values separated by a dot. In
the Seal script, unlike many other scripts, the radicals are not encoded
separately because they are just regular Seal characters. They are
identified as being the first (and sometimes unique) element of the
group constituted of all elements sharing the same radical value.<br></td>
</tr>
</table><br>
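<p>As an illustration only, the kSEAL_Rad syntax and description above could be checked along the following lines. This is a minimal sketch, not part of the data files: the function name parse_kseal_rad and the sample value are hypothetical, and the range check of 1 to 540 is taken from the description rather than from the regular expression.</p>

```python
import re

# Pattern copied from the Syntax row of the kSEAL_Rad table.
# re.ASCII keeps \d restricted to the digits 0-9.
KSEAL_RAD = re.compile(r"(\d{1,3})\.([0-9A-F]{5})", re.ASCII)


def parse_kseal_rad(value):
    """Split a kSEAL_Rad value into (radical number, radical code point).

    Hypothetical helper: the name and error handling are assumptions.
    """
    m = KSEAL_RAD.fullmatch(value)
    if m is None:
        raise ValueError(f"not a kSEAL_Rad value: {value!r}")
    number = int(m.group(1))
    # The description limits radical numbers to 1..540; the regex alone
    # would also accept values such as 000 or 999.
    if number not in range(1, 541):
        raise ValueError(f"radical number out of range: {number}")
    return number, int(m.group(2), 16)


# The sample value below is invented for illustration.
number, code_point = parse_kseal_rad("1.3D000")
```

<p>Because the radical code point is itself a regular Seal character, the second element of the result can be compared directly against the code point of the character carrying the value.</p>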
<!-- kSEAL_THXSrc -->
<table summary="kSEAL_THXSrc" border="1" cellpadding="2" width="100%" class="changed">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%">
<a name="kSEAL_THXSrc" id="kSEAL_THXSrc"><strong>kSEAL_THXSrc</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Sources</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">TH-(\d{5}|X\d{3}|Y\d{3})</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Tenghuaxie (THX) (額勒布藤花榭本) source, one version of the
Daxu Ben Shuowen Jiezi. The so-called "newly added characters (新附字)"
specific to Daxu Ben versions are numbered here in the character
sequence as 3-digit numbers prefixed with 'X'. Because this source is
also used to enumerate all encoded Seal characters, including characters
that are not part of the Daxu Ben, those additional characters are
referenced by 3-digit numbers prefixed with 'Y'.<br></td>
</tr>
</table><br>
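<p>The three kinds of kSEAL_THXSrc identifiers described above can be told apart mechanically. The following is a minimal sketch, assuming the alternation group in the syntax is closed after the third branch; the function name classify_thx, the returned labels, and the sample values are hypothetical.</p>

```python
import re

# Pattern from the Syntax row of the kSEAL_THXSrc table, with the
# alternation group assumed to close after the 'Y' branch.
KSEAL_THX = re.compile(r"TH-(\d{5}|X\d{3}|Y\d{3})", re.ASCII)


def classify_thx(value):
    """Return a label for the kind of THX identifier (labels are invented)."""
    m = KSEAL_THX.fullmatch(value)
    if m is None:
        raise ValueError(f"not a kSEAL_THXSrc value: {value!r}")
    ident = m.group(1)
    if ident.startswith("X"):
        # "Newly added characters" specific to Daxu Ben versions.
        return "newly added"
    if ident.startswith("Y"):
        # Encoded Seal characters that are not part of the Daxu Ben.
        return "additional"
    # Five-digit identifiers in the main character sequence.
    return "main sequence"


# Sample values are invented for illustration.
kinds = [classify_thx(v) for v in ("TH-00001", "TH-X001", "TH-Y001")]
```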
<h3>4.4 <a name="Tangut" href="#Tangut">Tangut</a></h3>
<p>The properties covered in the table are:
<a href="#kTGT_MergedSrc" class="changed">kTGT_MergedSrc</a>,
<a href="#kTGT_Numeric" class="changed">kTGT_Numeric</a>, and
<a href="#kTGT_RSUnicode" class="changed">kTGT_RSUnicode</a>.</p>
<!-- START MAIN TABLE TANGUT -->
<!-- kTGT_MergedSrc -->
<table summary="kTGT_MergedSrc" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a name="kTGT_MergedSrc" id="kTGT_MergedSrc"><strong class="changed">kTGT_MergedSrc</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%"><span class="removed">Description</span><span class="changed">Sources</span></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">9.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">
H2004-[AB]-\d{4}<br>
| H2021-\d{6}<br>
| L(19(86|97)|20(06|12))-\d{4}<br>
| L2008-\d{4}([AB]|-\d{4})?<br>
| N1966-\d{3}-\d{2}[0-9A-Z]{1,2}<br>| N5217-\d{2}<br>
<span class="changed">| N5314-\d{2}</span><br>
| S1968-\d{4}<br>| UTN42-\d{3}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">The Tangut sources fall into the following categories: <br>
<br>
H2004 = Hán Xiǎománg (韓小忙), 西夏文正字研究 (Xīxiàwén Zhèngzì Yánjiū)<br>
[Research into the Correct Forms of Tangut Characters]. 2004.<br>
H2021 = Hán Xiǎománg (韓小忙), 西夏文词典: 世俗文献部分 (Xīxiàwén Cídiǎn: Shìsú
Wénxiàn Bùfēn) <br> [Tangut Word Dictionary: Secular Literature
Part, 9 vols.]. 2021. WG2 N5286 2024-10-14.<br>
L1986 = Lǐ Fànwén (李範文), 同音研究 (Tóngyīn Yánjiū)
[Study of the Homophones]. Yinchuan. 1986.<br>
L1997 = Lǐ Fànwén (李範文), 夏漢字典 (Xià-Hàn Zìdiàn)<br>
[Tangut-Chinese Dictionary]. Beijing. 1997.<br>
L2006 = Lǐ Fànwén (李範文), 《五音切韵》与《文海宝韵》比较研究 (Wǔyīn Qiēyùn yǔ Wénhǎi
Bǎoyùn bǐjiào yánjiū),<br>
In 西夏研究 (Xīxià Yánjiū) [Western Xia Studies] no.2. Beijing. 2006.<br>
L2008 = Lǐ Fànwén (李範文), 夏漢字典 (Xià-Hàn Zìdiàn)
[Tangut-Chinese Dictionary]. Beijing. 2008.<br>
L2012 = Lǐ Fànwén (李範文), 2012 abridged edition of the 2008 Tangut-Chinese Dictionary,
cited in WG2 N4724, page 2, 2014-04-21.<br>
N1966 = Nishida Tatsuo (西田龍雄), 西夏文小字典 (Seikabun Shōjiten) [Little Dictionary of Tangut],<br>
In 西夏語の研究 (Seikago no kenkyū) [A Study of the Hsi-Hsia Language] (1964-1966) vol.2. Tokyo, 1966.<br>
N5217 = Andrew West, Proposal to encode 2 Tangut components and 28
Tangut ideographs,<br> WG2 N5217 = L2/23-149. 2023-10-02. <br>
<span class="changed">N5314 = Andrew West, Proposal to encode one newly-identified Tangut ideograph,<br>
WG2 N5314 = L2/25-165. 2025-05-26.</span><br>
S1968 = Sofronov M. V. (М. В. Софронов), Грамматика тангутского языка (Grammatika tangutskogo jazyka)<br>
[Grammar of the Tangut Language]. Moscow, 1968.<br>
UTN42 = Andrew West and Viacheslav Zaytsev, Tangut Character Additions and Glyph Corrections,<br>
Unicode Technical Note #42. 2019-12-21.<br>
</td>
</tr>
</table><br>
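<p>The kTGT_MergedSrc syntax is an alternation of one pattern per source. As a hedged illustration, the alternatives can be combined into a single regular expression and the source prefix recovered from a value. The names KTGT_MERGED_SRC and source_prefix are hypothetical, the L1986/L1997/L2006/L2012 alternative is written here on one line, and the sample values are invented rather than taken from the data file.</p>

```python
import re

# One entry per alternative in the Syntax row of the kTGT_MergedSrc table.
ALTERNATIVES = [
    r"H2004-[AB]-\d{4}",
    r"H2021-\d{6}",
    r"L(19(86|97)|20(06|12))-\d{4}",
    r"L2008-\d{4}([AB]|-\d{4})?",
    r"N1966-\d{3}-\d{2}[0-9A-Z]{1,2}",
    r"N5217-\d{2}",
    r"N5314-\d{2}",
    r"S1968-\d{4}",
    r"UTN42-\d{3}",
]

# Wrap each alternative in a non-capturing group before joining with '|'.
KTGT_MERGED_SRC = re.compile(
    "|".join(f"(?:{alt})" for alt in ALTERNATIVES), re.ASCII
)


def source_prefix(value):
    """Return the source prefix (e.g. 'L2008') of a valid value."""
    if KTGT_MERGED_SRC.fullmatch(value) is None:
        raise ValueError(f"not a kTGT_MergedSrc value: {value!r}")
    # Every alternative starts with the prefix followed by a hyphen.
    return value.split("-", 1)[0]


# Sample values are invented for illustration.
prefixes = [
    source_prefix(v)
    for v in ("H2004-A-0001", "L2008-0001-0002", "N1966-001-01A", "UTN42-001")
]
```

<p>The prefix returned by this sketch corresponds to one of the keys listed in the description (H2004, H2021, L1986, and so on).</p>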
<!-- kTGT_Numeric -->
<table summary="kTGT_Numeric" border="1" cellpadding="2" width="100%" class="changed">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a name="kTGT_Numeric" id="kTGT_Numeric"><strong>kTGT_Numeric</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Numeric values</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">18.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">
\d+(\.5)?<br></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Numeric value of the Tangut character. It applies only
to a few characters. The main bibliographic source for this
property is: Jiǎ Chánɡyè 贾常业, ed. Xīxiàwén Zìdiǎn 西夏文字典 (Tangut
Dictionary). Lanzhou: 甘肃文化出版社 (Gansu Culture Press), 2019.5,
ISBN 978-7-5490-1785-0, p. 995.</td>
</tr>
</table><br>
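<p>The kTGT_Numeric syntax allows whole numbers and values ending in ".5". One possible way to read such values without floating-point rounding is sketched below; the function name parse_ktgt_numeric is hypothetical, and the choice of Fraction as the result type is an assumption of this example rather than a requirement of the property.</p>

```python
import re
from fractions import Fraction

# Pattern from the Syntax row of the kTGT_Numeric table.
KTGT_NUMERIC = re.compile(r"\d+(\.5)?", re.ASCII)


def parse_ktgt_numeric(value):
    """Convert a kTGT_Numeric value to an exact Fraction."""
    if KTGT_NUMERIC.fullmatch(value) is None:
        raise ValueError(f"not a kTGT_Numeric value: {value!r}")
    # Fraction accepts decimal strings such as "1.5" directly.
    return Fraction(value)


# Sample values are invented for illustration.
half_step = parse_ktgt_numeric("1.5")
whole = parse_ktgt_numeric("100")
```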
<!-- kTGT_RSUnicode -->
<table summary="kTGT_RSUnicode" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a name="kTGT_RSUnicode" id="kTGT_RSUnicode"><strong class="changed">kTGT_RSUnicode</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%"><span class="removed">Sources</span><span class="changed">Radical-Strokes</span></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">9.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">
[1-9]\d{0,2}\.[1-9]\d{0,1} <br></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">The first number is the component number, and the second
number is the total stroke count.</td>
</tr>
</table><br>
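<p>Because a kTGT_RSUnicode value is a pair of numbers, comparing values as plain strings does not give a numeric order. The sketch below, under the assumption that ordering by component number and then by stroke count is what is wanted, turns a value into a sortable key; the name rs_key and the sample values are hypothetical.</p>

```python
import re

# Pattern from the Syntax row of the kTGT_RSUnicode table.
KTGT_RSUNICODE = re.compile(r"([1-9]\d{0,2})\.([1-9]\d{0,1})", re.ASCII)


def rs_key(value):
    """Return (component number, total stroke count) as integers."""
    m = KTGT_RSUNICODE.fullmatch(value)
    if m is None:
        raise ValueError(f"not a kTGT_RSUnicode value: {value!r}")
    return int(m.group(1)), int(m.group(2))


# Sample values are invented for illustration.
values = ["10.7", "2.12", "2.5"]
ordered = sorted(values, key=rs_key)
```

<p>With this key, the value "2.5" sorts before "2.12", and both sort before "10.7", which a plain string comparison would not produce.</p>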
<!-- 5 -->
<h2>5 <a name="History" href="#History">History</a></h2>
<p><span class="changed">For two of the scripts, Nüshu and Tangut, the</span> <span class="removed">The</span>
information presented in this document used to be partially located
in preambles attached to <span class="removed">each of </span> the <span class="changed">related</span> data file<span class="changed">s. These data files were created using data originally present in the encoding proposals and their updates.
The data file for Tangut was incorporated in Unicode 9.0, and the data file
for Nüshu in Unicode 10.0. </span>It was augmented by details found
in the original encoding proposals for the covered scripts. <span class="changed">
Similarly, for the other two scripts, Jurchen and Seal,
which are being encoded in Unicode 18.0,
the information was extracted directly from the encoding proposal material.</span></p>
<h2 class="nonumber"><a name="References" href="#References">References</a></h2>
<p>For references for this annex, see Unicode Standard Annex #41, “<a href="https://www.unicode.org/reports/tr41/tr41-32.html">Common References for Unicode Standard Annexes</a>.”</p>
<h2><a name="Acknowledgements" href="#Acknowledgements">Acknowledgements</a></h2>
<p class="changed">
Andrew West (RIP) was the author of the encoding proposals for two scripts
covered by this annex, Jurchen and Tangut, and he supplied most of the
information presented in this annex for these two scripts. For the other two
scripts<span class="changed">, Nüshu and Seal, the information gathering
was a collective effort by many experts</span>.</p>
<h2><a name="Modifications" href="#Modifications">Modifications</a></h2>
<h3 class="changed">Revision 2</h3>
<ul>
<li class="changed"><strong>Draft</strong> of the first version of UAX#60 for Unicode 18.0.0.</li>
<li class="changed">Changed the prefix for Tangut property names from 'TANG' to 'TGT'.</li>
<li class="changed">Added the new kTGT_Numeric property.</li>
<li class="changed">Added a new source for Tangut (N5314).</li>
<li class="changed">Added coverage for the Seal script.</li>
</ul>
<h3>Revision 1</h3>
<ul>
<li><strong>Proposed draft</strong> of the first version of UAX#60 for Unicode 18.0.0.</li>
</ul>
<p>Previous revisions can be accessed with the “Previous Version” link in the header
when appropriate.</p>
<hr width="50%">
<p class="copyright">© 2024–2026 Unicode, Inc. This publication is protected by copyright, and permission must be obtained from Unicode, Inc. prior to any reproduction, modification, or other use not permitted by the <a href="https://www.unicode.org/copyright.html">Terms of Use</a>. Specifically, you may make copies of this publication and may annotate and translate it solely for personal or internal business purposes and not for public distribution, provided that any such permitted copies and modifications fully reproduce all copyright and other legal notices contained in the original. You may not make copies of or modifications to this publication for public distribution, or incorporate it in whole or in part into any product or publication without the express written permission of Unicode.</p>
<p class="copyright">Use of all Unicode Products, including this publication, is governed by the Unicode <a href="https://www.unicode.org/copyright.html">Terms of Use</a>. The authors, contributors, and publishers have taken care in the preparation of this publication, but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.</p>
<p class="copyright">Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries.</p>
</div>
</body>
</html>