<!doctype HTML
PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><base href="https://www.unicode.org/reports/tr28/tr28-3.html">
<link rel="stylesheet" href="http://www.unicode.org/reports/reports.css" type="text/css">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<style type=text/css><!--
td.n { text-align: Center; vertical-align: top }
td.q { text-align: Center; width: 48px; border: 1px solid #FFFFFF; }
tt.n { font-size: 75% }
-->
</style>
<title>UTR #28: Unicode 3.2</title>
</head>
<body>
<table class="header" width="100%" cellspacing="0" cellpadding="0">
<tr>
<td class="icon"><a href="http://www.unicode.org"><img
align="middle"
alt="[Unicode]" border="0"
src="http://www.unicode.org/webscripts/logo60s2.gif" width="34"
height="33"></a> <a class="bar"
href="http://www.unicode.org/unicode/reports">Technical
Reports</a></td>
</tr>
<tr>
<td class="gray"> </td>
</tr>
</table>
<div class="body">
<h2 align="center">Unicode Standard Annex #28</h2>
<h1 align="center">Unicode 3.2</h1>
<table border="1" cellpadding="2" width="100%">
<tr>
<td height="23" valign="TOP" width="20%">Version</td>
<td valign="TOP" height="23">Unicode 3.2.0</td>
</tr>
<tr>
<td height="23" valign="TOP">Authors</td>
<td valign="TOP" height="23">Members of the Editorial Committee</td>
</tr>
<tr>
<td height="23" valign="TOP">Date</td>
<td valign="TOP" height="23">2002-03-27</td>
</tr>
<tr>
<td height="23" valign="TOP">This Version</td>
<td valign="TOP" height="23">
<a href="http://www.unicode.org/unicode/reports/tr28/tr28-3.html">
http://www.unicode.org/unicode/reports/tr28/tr28-3</a></td>
</tr>
<tr>
<td height="23" valign="TOP">Previous Version</td>
<td valign="TOP" height="23">
N/A</td>
</tr>
<tr>
<td height="23" valign="TOP">Latest Version</td>
<td valign="TOP" height="23"><a href="http://www.unicode.org/unicode/reports/tr28">
http://www.unicode.org/unicode/reports/tr28</a></td>
</tr>
<tr>
<td height="23" valign="TOP">Tracking Number</td>
<td valign="TOP" height="23"><a href="#tracking_number">3</a></td>
</tr>
</table>
<h3><i>Summary</i></h3>
<p><i><em>This document defines Version 3.2 of the Unicode Standard. </em></i></p>
<h3><i>Status</i></h3>
<p><i>This document has been reviewed by Unicode members and other interested
parties, and has been approved by the Unicode Technical Committee as a <b>
Unicode Standard Annex</b>. It is a stable document and may be used as reference
material or cited as a normative reference from another document.</i></p>
<blockquote>
<i><b>A Unicode Standard Annex (UAX)</b> forms an integral part of the Unicode Standard, but is published as a separate document. Note that conformance to a version of the Unicode Standard includes conformance to its Unicode Standard Annexes. The version number of a UAX document corresponds to the version number of the Unicode Standard at the last point that the UAX document was updated.</i></blockquote>
<p><i>A list of current Unicode Technical Reports is found on
<a href="http://www.unicode.org/unicode/reports/">
http://www.unicode.org/unicode/reports/</a>. For more information about versions
of the Unicode Standard, see
<a href="http://www.unicode.org/unicode/standard/versions/">
http://www.unicode.org/unicode/standard/versions/</a>.</i></p>
<p><i>The <a href="#references">References</a> provide related information that
is useful in understanding this document. Please mail corrigenda and other
comments to the author(s).</i></p>
<h3><i>Contents</i></h3>
<ul>
<li><a href="#description">I Description</a></li>
<li><a href="#conformance">II Conformance</a>
<ul>
<li><a href="#3_1_conformance">3.1 Conformance Requirements (revision)</a></li>
<li><a href="#3_6_decomposition">3.6 Decomposition (revision)</a></li>
<li><a href="#3_9_special_character_properties">3.9 Special Character
Properties (revision)</a></li>
<li><a href="#3_11_conjoining_jamo_behavior">3.11 Conjoining Jamo Behavior
(revision)</a></li>
<li><a href="#4_2_combining_classes_normative">4.2 Combining Classes—Normative
(revision)</a></li>
</ul>
</li>
<li><a href="#general_structure_and_guidelines">III General Structure and Guidelines</a>
<ul>
<li><a href="#2_2_unicode_design_principles">2.2 Unicode Design Principles
(addition) </a></li>
<li><a href="#5_15_locating_text_element_boundaries">5.15 Locating Text
Element Boundaries (revision)</a></li>
</ul>
</li>
<li><a href="#block">IV Block Descriptions</a>
<ul>
<li><a href="#6_1_general_punctuation">6.1 General Punctuation (addition)</a></li>
<li><a href="#7_2_greek">7.2 Greek (revision)</a></li>
<li><a href="#8_2_arabic">8.2 Arabic (addition)</a></li>
<li><a href="#9_15_khmer">9.15 Khmer (addition)</a></li>
<li><a href="#9_16_philippine_scripts">9.16 Philippine Scripts (new section)</a></li>
<li><a href="#10_1_han">10.1 Han (addition)</a></li>
<li><a href="#10_3_katakana">10.3 Katakana (addition)</a></li>
<li><a href="#10_4_hangul">10.4 Hangul (addition)</a></li>
<li><a href="#11_4_mongolian">11.4 Mongolian (addition)</a></li>
<li><a href="#12_4_mathematical_operators">12.4 Mathematical Operators (additions)</a></li>
<li><a href="#12_5_technical_symbols">12.5 Technical Symbols (additions)</a></li>
<li><a href="#12_7_miscellaneous_symbols_and_dingbats">12.7 Miscellaneous
Symbols and Dingbats (new subsection, revision and addition)</a></li>
<li><a href="#12_12_standardized_variants_of_mathematical_symbols">12.12
Standardized Variants of Mathematical Symbols (new section)</a></li>
<li><a href="#13_2_layout_controls">13.2 Layout Controls (additions)</a></li>
<li><a href="#13_7_variation_selectors">13.7 Variation Selectors (new
section)</a></li>
<li><a href="#14.1_character_names_list">14.1 Character Names List
(addition)</a></li>
</ul>
</li>
<li><a href="#charts">V Code Charts</a></li>
<li><a href="#errata">VI Errata</a></li>
<li><a href="#database">VII Unicode Character Database Changes</a></li>
<li><a href="#relation">VIII Relation to 10646</a></li>
<li><a href="#references">IX References and Sources</a></li>
<li><a href="#Modifications">X Modifications</a></li>
</ul>
<hr align="LEFT">
<h2 class="bb"><a name="description">I Description</a></h2>
<p>Unicode 3.2 is a minor version of the Unicode Standard. It overrides certain
features of Unicode 3.1, and adds a significant number of coded characters. </p>
<h3>Recommended Citation Format for Unicode 3.2</h3>
<table border="1" cellspacing="0" cellpadding="4">
<tr>
<td width="100%">
<p class="small">The Unicode Consortium. The Unicode Standard, Version 3.2.0
is defined by <i>The Unicode Standard, Version 3.0</i> (Reading, MA,
Addison-Wesley, 2000. ISBN 0-201-61633-5), as amended by the <i>Unicode
Standard Annex #27: Unicode 3.1</i> (<a
href="http://www.unicode.org/unicode/reports/tr27/">http://www.unicode.org/reports/tr27/</a>)
and by the Unicode Standard Annex #28: <i>Unicode 3.2</i> (<a
href="http://www.unicode.org/reports/tr28/">http://www.unicode.org/reports/tr28/</a>).</td>
</tr>
</table>
<h3>Formal Definition of Unicode 3.2</h3>
<p>The Unicode Standard, Version 3.2.0 is defined by the following list.
The version numbering and the role of each component are explained in
<a href="http://www.unicode.org/unicode/standard/versions/">Versions of The
Unicode Standard</a>. The symbols in the change status column are explained in
the <a href="#ChangeStatusKey">key</a> below. A summary of modifications in the
Unicode Character Database for this version can be found in
<a href="http://www.unicode.org/Public/3.2-Update/UnicodeCharacterDatabase-3.2.0.html">
UnicodeCharacterDatabase-3.2.0.html</a>, together with a list of which data
files contain normative vs. informative data. </p>
<blockquote>
<table border="0" cellspacing="0" class="noborder" style="border-collapse: collapse" cellpadding="0">
<tr>
<th align="left" colspan="4" class="noborder">Major Reference</th>
</tr>
<tr>
<th align="left" class="noborder"></th>
<td colspan="2" class="noborder"></td>
<td class="noborder">The Unicode Consortium.
<a href="http://www.unicode.org/unicode/uni2book/u2.html">The Unicode
Standard, Version 3.0</a><br>
Reading, MA, Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5.</td>
</tr>
<tr>
<th align="left" colspan="4" class="noborder">Minor References</th>
</tr>
<tr>
<td class="noborder"></td>
<td colspan="2" class="noborder"></td>
<td class="noborder">UAX #27: Unicode 3.1</td>
</tr>
<tr>
<td class="noborder"></td>
<td colspan="2" class="noborder"></td>
<td class="noborder">UAX #28: Unicode 3.2</td>
</tr>
<tr>
<th align="left" colspan="4" class="noborder">Update Reference</th>
</tr>
<tr>
<td class="noborder"></td>
<td colspan="2" class="noborder"></td>
<td class="noborder"><b>n/a</b></td>
</tr>
<tr>
<th align="left" colspan="4" class="noborder">
<a href="http://www.unicode.org/unicode/reports/">Unicode Standard Annexes</a></th>
</tr>
<tr>
<td class="noborder"></td>
<td colspan="2" class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/unicode/reports/tr9/tr9-10.html">UAX
#9: The Bidirectional Algorithm, V3.2.0</a><br>
<a href="http://www.unicode.org/unicode/reports/tr11/tr11-10.html">UAX
#11: East Asian Width, V3.2.0</a><br>
<a href="http://www.unicode.org/unicode/reports/tr13/tr13-9.html">UAX #13:
Unicode Newline Guidelines, V3.2.0</a><br>
<a href="http://www.unicode.org/unicode/reports/tr14/tr14-12.html">UAX
#14: Line Breaking Properties, V3.2.0</a><br>
<a href="http://www.unicode.org/unicode/reports/tr15/tr15-22.html">UAX
#15: Unicode Normalization Forms, V3.2.0</a><br>
<a href="http://www.unicode.org/unicode/reports/tr19/tr19-9.html">UAX #19:
UTF-32, V3.2.0</a><br>
<a href="http://www.unicode.org/unicode/reports/tr21/tr21-5.html">UAX #21:
Case Mappings, V3.2.0</a></td>
</tr>
<tr>
<th align="left" colspan="4" class="noborder">Unicode Character Database</th>
</tr>
<tr>
<td class="noborder"></td>
<td colspan="2" class="noborder"></td>
<th align="left" class="noborder"><a href="http://www.unicode.org/Public/3.2-Update">
http://www.unicode.org/Public/3.2-Update</a>, or<br>
<a href="ftp://www.unicode.org/Public/3.2-Update/">
ftp://www.unicode.org/Public/3.2-Update/</a></th>
</tr>
<tr>
<td class="noborder"></td>
<td class="noborder"></td>
<th colspan="2" align="left" class="noborder">Documentation</th>
</tr>
<tr>
<td class="noborder"><i>T</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/DerivedProperties-3.2.0.html">
DerivedProperties-3.2.0.html</a></td>
</tr>
<tr>
<td class="noborder"><i>T</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder"><a href="http://www.unicode.org/Public/3.2-Update/Index-3.2.0.txt">
Index-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>T</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/NamesList-3.2.0.html">NamesList-3.2.0.html</a></td>
</tr>
<tr>
<td class="noborder"><i>T</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/PropList-3.2.0.html">
PropList-3.2.0.html</a></td>
</tr>
<tr>
<td class="noborder"><i>T</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder"><a href="http://www.unicode.org/Public/3.2-Update/ReadMe-3.2.0.txt">
ReadMe-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>T</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/UnicodeCharacterDatabase-3.2.0.html">
UnicodeCharacterDatabase-3.2.0.html</a></td>
</tr>
<tr>
<td class="noborder"><i>T</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.html">
UnicodeData-3.2.0.html</a></td>
</tr>
<tr>
<td class="noborder"></td>
<td class="noborder"></td>
<th colspan="2" align="left" class="noborder">Core Data</th>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/ArabicShaping-3.2.0.txt">ArabicShaping-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/BidiMirroring-3.2.0.txt">BidiMirroring-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder"><a href="http://www.unicode.org/Public/3.2-Update/Blocks-3.2.0.txt">Blocks-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/CompositionExclusions-3.2.0.txt">CompositionExclusions-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/EastAsianWidth-3.2.0.txt">
EastAsianWidth-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>T</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder"><a href="http://www.unicode.org/Public/3.2-Update/Jamo-3.2.0.txt">
Jamo-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/LineBreak-3.2.0.txt">
LineBreak-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/NamesList-3.2.0.txt">
NamesList-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>N</i></td>
<td class="noborder"> </td>
<td class="noborder"> </td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/NormalizationCorrections-3.2.0.txt">
NormalizationCorrections-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>N</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/PropertyAliases-3.2.0.txt">
PropertyAliases-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>N</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/PropertyValueAliases-3.2.0.txt">
PropertyValueAliases-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder"><a href="http://www.unicode.org/Public/3.2-Update/PropList-3.2.0.txt">
PropList-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder"><a href="http://www.unicode.org/Public/3.2-Update/Scripts-3.2.0.txt">
Scripts-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/SpecialCasing-3.2.0.txt">
SpecialCasing-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>N</i></td>
<td class="noborder"> </td>
<td class="noborder"> </td>
<td class="noborder"><a href="http://www.unicode.org/Public/3.2-Update/StandardizedVariants-3.2.0.html">
StandardizedVariants-3.2.0.html</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.txt">
UnicodeData-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/Unihan-3.2.0.txt">Unihan-3.2.0.txt</a>
(very large file, see
<a href="http://www.unicode.org/Public/3.2-Update/Unihan-3.2.0.zip">
Unihan-3.2.0.zip</a>)</td>
</tr>
<tr>
<td class="noborder"></td>
<td class="noborder"></td>
<th colspan="2" align="left" class="noborder">Derived Data</th>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/CaseFolding-3.2.0.txt">CaseFolding-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>N</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/DerivedAge-3.2.0.txt">DerivedAge-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/DerivedCoreProperties-3.2.0.txt">
DerivedCoreProperties-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/DerivedNormalizationProps-3.2.0.txt">DerivedNormalizationProps-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"></td>
<td class="noborder"></td>
<th colspan="2" align="left" class="noborder">Extracted Data</th>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedBidiClass-3.2.0.txt">
DerivedBidiClass-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedBinaryProperties-3.2.0.txt">
DerivedBinaryProperties-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedCombiningClass-3.2.0.txt">
DerivedCombiningClass-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedDecompositionType-3.2.0.txt">
DerivedDecompositionType-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedEastAsianWidth-3.2.0.txt">
DerivedEastAsianWidth-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedGeneralCategory-3.2.0.txt">
DerivedGeneralCategory-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedJoiningGroup-3.2.0.txt">
DerivedJoiningGroup-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedJoiningType-3.2.0.txt">
DerivedJoiningType-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedLineBreak-3.2.0.txt">
DerivedLineBreak-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedNumericType-3.2.0.txt">
DerivedNumericType-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"></td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/extracted/DerivedNumericValues-3.2.0.txt">
DerivedNumericValues-3.2.0.txt</a></td>
</tr>
<tr>
<td class="noborder"></td>
<td class="noborder"></td>
<th colspan="2" align="left" class="noborder">Conformance Test Data</th>
</tr>
<tr>
<td class="noborder"><i>D</i></td>
<td class="noborder"></td>
<td class="noborder"> </td>
<td class="noborder">
<a href="http://www.unicode.org/Public/3.2-Update/NormalizationTest-3.2.0.txt">
NormalizationTest-3.2.0.txt</a></td>
</tr>
</table>
<p><b><a name="ChangeStatusKey">Key:</a></b></p>
<table border="1" cellspacing="0" cellpadding="2">
<tr>
<td><i>N</i></td>
<td>New in this release</td>
</tr>
<tr>
<td><i>D</i></td>
<td>Data change (possibly also format/text change)</td>
</tr>
<tr>
<td><i>F</i></td>
<td>Data format change (possibly also text change)</td>
</tr>
<tr>
<td><i>T</i></td>
<td>Text annotation change</td>
</tr>
<tr>
<td><i>-</i></td>
<td>Unchanged</td>
</tr>
</table>
</blockquote>
<p>The list of contributory data files constituting the Unicode Standard,
Version 3.2 can also be found online at
<a href="http://www.unicode.org/standard/versions/enumeratedversions.html">Enumerated Versions</a>.</p>
<h3>New Character Allocations</h3>
<p>The primary feature of Unicode 3.2 is the addition of 1016 new encoded
characters. These additions consist of several Philippine scripts, a large
collection of mathematical symbols, and small sets of other letters and
symbols. </p>
<p>All of the newly encoded characters in Unicode 3.2 are additions to the Basic
Multilingual Plane (BMP). </p>
<p>Complete introductions to the newly encoded scripts and symbols can be found
in <a href="#block">Article IV, Block Descriptions</a>, below. </p>
<h3>Additional Features of Unicode 3.2</h3>
<p>Unicode 3.2 also features amended contributory data files, to bring the data
files up to date against the expanded repertoire of characters. A summary of the
revisions to the data files can be found in <a href="#database">Article VII,
Unicode Character Database Changes</a>. </p>
<p>All outstanding errata and corrigenda to the Unicode Standard are included in
this specification. Major corrigenda having a bearing on conformance to the
standard are listed in <a href="#conformance">Article II, Conformance</a>. Other
minor errata are listed in <a href="#errata">Article VI, Errata</a>. </p>
<p>Most notable among the corrigenda to the Standard is a further tightening of
the definition of UTF-8, to eliminate irregular UTF-8 and to bring the Unicode
specification of UTF-8 more completely into line with other specifications of
UTF-8. </p>
<p>The former UTR #21, Case Mappings, has been upgraded in status to a Unicode
Standard Annex in Unicode 3.2. This means that
<a href="http://www.unicode.org/unicode/reports/tr21/tr21-5.html">UAX #21, Case
Mappings</a> is now formally a part of the Unicode Standard.</p>
<h3>Conventions Used in this Document</h3>
<p>The sections of this document are referred to as “articles” to prevent
confusion with references to sections of <i>The Unicode Standard, Version 3.0</i>.
In addition, the articles in this document are numbered with Roman numerals, to
highlight the distinction. The word “section” always refers to sections of <i>
The Unicode Standard, Version 3.0</i> or to a new section of the standard which
is added by this document. Page numbers also refer to <i>
<a href="http://www.unicode.org/unicode/uni2book/u2.html">The Unicode Standard, Version 3.0</a></i>.</p>
<p>New or replacement text for the standard is indicated with <u>underlined</u>
text, when this new text is a corrigendum of an existing section of the
standard.</p>
<p>Deleted text from the standard is indicated with <strike>struck-through</strike>
text.</p>
<p>In instances where entire new sections or subsections are to be added to the
standard, as for the block descriptions for newly encoded scripts or symbol
sets, new section numbers are provided that interleave reasonably with the
existing sections of the published Unicode 3.0 book. The text of these added
sections is not underlined, since the sections are entirely new.</p>
<p>In this document, unambiguous dates of the current common era, such as 1999,
are unlabeled. In cases of ambiguity, CE is used. Dates before the common era
are labeled with BCE.</p>
<h2 class="bb"><a name="conformance">II Conformance</a></h2>
<h3><a name="3_1_conformance">3.1 Conformance Requirements (revision)</a></h3>
<h3>Elimination of Irregular Sequences </h3>
<p>Earlier definitions of transformation formats such as UTF-8 allowed conformant
processes to interpret certain sequences called <i>irregular</i> sequences.
Irregular sequences are those that would be produced by transforming a
supplementary code point as if it were a sequence of two surrogate code
points.</p>
<p>To tighten the definitions, in Unicode 3.2 such irregular sequences are now
illegal. </p>
<p>Note: Some implementations of UTF-8 might still interpret irregular
sequences; for those, a separate compatibility encoding scheme, to be
distinguished from UTF-8, may be used. See <a href="http://www.unicode.org/reports/tr26/">Unicode Technical Report #26, “Compatibility
Encoding Scheme for UTF-16: 8-Bit (CESU-8).”</a> However, CESU-8 is neither intended
nor recommended as an encoding for open information exchange.</p>
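<p>The difference between an irregular sequence and the well-formed form can be
illustrated with a short Python sketch. The helper names <code>irregular_form</code>
and <code>strictly_decodable</code> are hypothetical and are not part of the
standard. The sketch assumes Python's <code>surrogatepass</code> error handler to
produce the three-byte form of each surrogate code point, which is the output
that a conformant Unicode 3.2 process must no longer emit or interpret.</p>

```python
def irregular_form(ch: str) -> bytes:
    """Encode one supplementary character as two 3-byte surrogate sequences.

    Illustration only: this is the 'irregular' form that Unicode 3.2 makes illegal.
    """
    utf16 = ch.encode("utf-16-be")
    # Split the UTF-16 form into its 16-bit code units (a surrogate pair).
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    # 'surrogatepass' lets the UTF-8 encoder emit surrogates, which it refuses by default.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


def strictly_decodable(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ch = "\U00010000"
well_formed = ch.encode("utf-8")   # four bytes: F0 90 80 80
irregular = irregular_form(ch)     # six bytes:  ED A0 80 ED B0 80
print(well_formed.hex(), strictly_decodable(well_formed))
print(irregular.hex(), strictly_decodable(irregular))
```

<p>A strict decoder accepts only the four-byte form. The six-byte form is what
CESU-8 retains under a separate name.</p>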
<p>Terminology to distinguish <i>ill-formed</i>, <i>illegal</i>, and <i>
irregular</i> code unit sequences is no longer needed. There are no <i>irregular</i>
code unit sequences, and thus all <i>ill-formed</i> code unit sequences are <i>
illegal</i>. It is illegal to emit or interpret any <i>ill-formed</i> code unit
sequence. Unicode 4.0 will revise the terminology and conformance clauses in
light of this. For Unicode 3.2, only the minimal changes required of the text
are noted here.</p>
<p><i><b>Change C12 in Unicode 3.1 to:</b></i></p>
<table class="noborder" style="border-collapse: collapse" cellpadding="0" cellspacing="0">
<tr>
<td valign="top" align="center" class="noborder">C12</td>
<td valign="top" align="left" class="noborder">(a) When a process generates data in a Unicode
Transformation Format, it shall not emit ill-formed code unit sequences.<br>
(b) When a process interprets data in a Unicode Transformation Format, it
shall treat <strike>illegal</strike> <u>ill-formed</u> code unit sequences
as an error condition.<br>
(c) A conformant process shall not interpret <strike>illegal</strike> <u>
ill-formed</u> UTF code unit sequences as characters.<br>
<strike>(d) Irregular UTF code unit sequences shall not be used for encoding
any other information.</strike></td>
</tr>
</table>
<p><i><b>Change the fifth note after C12 in Unicode 3.1 to:</b></i></p>
<ul>
<li>Conformant processes cannot interpret <strike>illegal</strike> <u>
ill-formed</u> code unit sequences. However, the conformance clauses do not,
for example, prevent utility programs from operating on “mangled” text. For
example, a UTF-8 file could have had CRLF sequences introduced at every 80
bytes by a bad mailer program. This could result in some UTF-8 byte sequences
being interrupted by CRLFs, producing ill-formed byte sequences. This mangled
text is no longer UTF-8. It is permissible for a conformant program to repair
such text, recognizing that the mangled text was originally well-formed UTF-8
byte sequences. However, such repair of mangled data is a special case, and
must not be used in circumstances where it would cause security problems.</li>
</ul>
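<p>The mangled-mailer scenario in the note above can be reproduced in a few lines
of Python. This is a minimal sketch under stated assumptions: the function names
<code>mangle</code> and <code>unmangle</code> are hypothetical, the line width of
80 bytes follows the note, and the repair step assumes that the only damage is a
CRLF inserted after every 80 bytes of the original data.</p>

```python
def mangle(data: bytes, width: int = 80) -> bytes:
    """Insert CRLF after every `width` bytes, as a byte-oriented mailer might."""
    return b"\r\n".join(data[i:i + width] for i in range(0, len(data), width))


def unmangle(data: bytes, width: int = 80) -> bytes:
    """Undo mangle(): drop the CRLF that follows each run of `width` bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        out += data[i:i + width]
        i += width
        if data[i:i + 2] == b"\r\n":
            i += 2  # skip the inserted line break
    return bytes(out)


# 79 ASCII bytes followed by a two-byte character: the break falls inside it.
original = ("a" * 79 + "\u00e9" + " tail").encode("utf-8")
mangled = mangle(original)

try:
    mangled.decode("utf-8")
    interrupted = False
except UnicodeDecodeError:
    interrupted = True  # the mangled bytes are ill-formed, hence not UTF-8

repaired = unmangle(mangled)
print(interrupted, repaired == original)
```

<p>A repair of this kind is a deliberate special case. It should not sit in a code
path where accepting ill-formed input could cause security problems.</p>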
<p><i><b>Change Table 3.1B after C12 in Unicode 3.1 by splitting the row
U+1000..U+FFFF to exclude the surrogate code points:</b></i></p>
<div align="center">
<center>
<table cellspacing="0" cellpadding="4" border="1">
<caption><b>Table 3.1B. Legal UTF-8 Byte Sequences</b></caption>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%" bgcolor="#cccccc">
<font color="#ffffff"> Code Points</font></th>
<th style="BACKGROUND-COLOR: #990000" width="10%"><font color="#ffffff">
1st Byte</font></th>
<th style="BACKGROUND-COLOR: #990000" width="10%"><font color="#ffffff">
2nd Byte</font></th>
<th style="BACKGROUND-COLOR: #990000" width="10%"><font color="#ffffff">
3rd Byte</font></th>
<th style="BACKGROUND-COLOR: #990000" width="10%"><font color="#ffffff">
4th Byte</font></th>
</tr>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%"><tt>
<font color="#ffffff">U+0000..U+007F</font></tt></th>
<td width="10%"><tt>00..7F</tt></td>
<td width="10%"><tt> </tt></td>
<td width="10%"><tt> </tt></td>
<td width="10%"><tt> </tt></td>
</tr>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%"><tt>
<font color="#ffffff">U+0080..U+07FF</font></tt></th>
<td width="10%"><tt>C2..DF</tt></td>
<td width="10%"><tt>80..BF </tt></td>
<td width="10%"><tt> </tt></td>
<td width="10%"><tt> </tt></td>
</tr>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%"><tt>
<font color="#ffffff">U+0800..U+0FFF</font></tt></th>
<td width="10%"><tt>E0</tt></td>
<td width="10%"><tt><u>A0</u>..BF</tt></td>
<td width="10%"><tt>80..BF </tt></td>
<td width="10%"><tt> </tt></td>
</tr>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%"><tt>
<font color="#ffffff"><u>U+1000..U+CFFF</u></font></tt></th>
<td width="10%"><tt>E1..EC</tt></td>
<td width="10%"><tt>80..BF</tt></td>
<td width="10%"><tt>80..BF </tt></td>
<td width="10%"><tt> </tt></td>
</tr>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%"><tt>
<font color="#ffffff"><u>U+D000..U+D7FF</u></font></tt></th>
<td width="10%"><tt>ED</tt></td>
<td width="10%"><tt>80..<u>9F</u></tt></td>
<td width="10%"><tt>80..BF </tt></td>
<td width="10%"><tt> </tt></td>
</tr>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%"><tt>
<font color="#ffffff"><u>U+D800..U+DFFF</u></font></tt></th>
<td width="40%" colspan="4"><tt>ill-formed</tt></td>
</tr>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%"><tt>
<font color="#ffffff"><u>U+E000..U+FFFF</u></font></tt></th>
<td width="10%"><tt>EE..EF</tt></td>
<td width="10%"><tt>80..BF</tt></td>
<td width="10%"><tt>80..BF </tt></td>
<td width="10%"><tt> </tt></td>
</tr>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%"><tt>
<font color="#ffffff">U+10000..U+3FFFF</font></tt></th>
<td width="10%"><tt>F0</tt></td>
<td width="10%"><tt><u>90</u>..BF</tt></td>
<td width="10%"><tt>80..BF</tt></td>
<td width="10%"><tt>80..BF</tt></td>
</tr>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%"><tt>
<font color="#ffffff">U+40000..U+FFFFF</font></tt></th>
<td width="10%"><tt>F1..F3</tt></td>
<td width="10%"><tt>80..BF</tt></td>
<td width="10%"><tt>80..BF</tt></td>
<td width="10%"><tt>80..BF</tt></td>
</tr>
<tr>
<th style="BACKGROUND-COLOR: #990000" width="10%"><tt>
<font color="#ffffff">U+100000..U+10FFFF</font></tt></th>
<td width="10%"><tt>F4</tt></td>
<td width="10%"><tt>80..<u>8F</u></tt></td>
<td width="10%"><tt>80..BF </tt></td>
<td width="10%"><tt>80..BF</tt></td>
</tr>
</table>
</center>
</div>
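<p>Table 3.1B can be transcribed directly into a byte-sequence checker. The
following Python sketch is illustrative rather than normative: the names
<code>ROWS</code> and <code>is_well_formed_utf8</code> are hypothetical, and each
tuple holds the first-byte range, the second-byte range, and the number of
trailing bytes for one row of the table. Third and fourth bytes are always in
80..BF.</p>

```python
# (first byte low, first byte high, second-byte range, trailing byte count)
ROWS = [
    (0x00, 0x7F, None, 0),          # U+0000..U+007F
    (0xC2, 0xDF, (0x80, 0xBF), 1),  # U+0080..U+07FF
    (0xE0, 0xE0, (0xA0, 0xBF), 2),  # U+0800..U+0FFF
    (0xE1, 0xEC, (0x80, 0xBF), 2),  # U+1000..U+CFFF
    (0xED, 0xED, (0x80, 0x9F), 2),  # U+D000..U+D7FF (surrogates excluded)
    (0xEE, 0xEF, (0x80, 0xBF), 2),  # U+E000..U+FFFF
    (0xF0, 0xF0, (0x90, 0xBF), 3),  # U+10000..U+3FFFF
    (0xF1, 0xF3, (0x80, 0xBF), 3),  # U+40000..U+FFFFF
    (0xF4, 0xF4, (0x80, 0x8F), 3),  # U+100000..U+10FFFF
]


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if every byte sequence in `data` matches a row of Table 3.1B."""
    i, n = 0, len(data)
    while i < n:
        first = data[i]
        for lo, hi, second, trail in ROWS:
            if lo <= first <= hi:
                break
        else:
            return False  # 80..C1 and F5..FF never start a legal sequence
        if i + trail >= n:
            return False  # sequence is truncated
        for k in range(1, trail + 1):
            lo2, hi2 = second if k == 1 else (0x80, 0xBF)
            if not lo2 <= data[i + k] <= hi2:
                return False
        i += trail + 1
    return True


print(is_well_formed_utf8("\U00010000".encode("utf-8")))   # four-byte form
print(is_well_formed_utf8(b"\xed\xa0\x80\xed\xb0\x80"))    # surrogate pair form
```

<p>Keeping the ranges in a table, rather than in nested conditionals, makes it
easy to compare the code against the rows above one by one.</p>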
<h3><a name="3_6_decomposition">3.6 Decomposition</a> (revision)</h3>
<p>The text of D21 is replaced by the following text:</p>
<p>D21 <i>Compatibility decomposable character</i>: a character whose compatibility
decomposition is not identical to its canonical decomposition. It may also be
known as a <i>compatibility precomposed</i> character or a <i>compatibility
composite</i> character.</p>
<ul>
<li>For example:
<ul>
<li>U+00B5 MICRO SIGN has no canonical decomposition mapping, so its
canonical decomposition is the same as the character itself. It has a
compatibility decomposition to U+03BC GREEK SMALL LETTER MU. Because MICRO
SIGN has a compatibility decomposition that is not equal to its canonical
decomposition, it is a compatibility decomposable character.</li>
<li>U+03D3 GREEK UPSILON WITH ACUTE AND HOOK SYMBOL canonically decomposes
to the sequence &lt;U+03D2 GREEK UPSILON WITH HOOK SYMBOL, U+0301
COMBINING ACUTE ACCENT&gt;. That sequence has a compatibility decomposition of
&lt;U+03A5 GREEK CAPITAL LETTER UPSILON, U+0301 COMBINING ACUTE ACCENT&gt;.
Because GREEK UPSILON WITH ACUTE AND HOOK SYMBOL has a compatibility
decomposition that is not equal to its canonical decomposition, it is a
compatibility decomposable character.</li>
</ul>
</li>
<li>This should not be confused with the term “compatibility character”, which
is discussed in <i>Section 2.2,</i> <i>Unicode Design Principles</i> in <i>The
Unicode Standard, Version 3.0</i>.</li>
<li>Compatibility composites are a subset of compatibility characters included
in the Unicode Standard to represent distinctions in other base standards.
They support transmission and processing of legacy data. Their use is
discouraged other than for legacy data or other special circumstances.</li>
<li>Replacing a compatibility decomposable character by its compatibility decomposition may
lose round-trip convertibility with a base standard.</li>
</ul>
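<p>Definition D21 can be tested for a single character by comparing its two full decompositions. The following Python sketch assumes the standard <tt>unicodedata</tt> module, which reflects a later version of the Unicode Character Database than 3.2; the helper name is hypothetical.</p>

```python
import unicodedata

def is_compatibility_decomposable(ch: str) -> bool:
    """D21: the compatibility decomposition differs from the canonical one."""
    canonical = unicodedata.normalize("NFD", ch)
    compatibility = unicodedata.normalize("NFKD", ch)
    return compatibility != canonical

# The two examples from the text, plus a purely canonical decomposable.
micro_sign = "\u00b5"        # MICRO SIGN
upsilon_hook = "\u03d3"      # GREEK UPSILON WITH ACUTE AND HOOK SYMBOL
a_grave = "\u00e0"           # LATIN SMALL LETTER A WITH GRAVE
```

<p>For a single character, NFD and NFKD yield its canonical and compatibility decompositions respectively, so the comparison follows the wording of D21 directly.</p>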
<p>Add the following new text after D23:</p>
<p>D23a <i>Canonical decomposable character</i>: a character which is not
identical to its canonical decomposition. It may also be known as a <i>canonical
precomposed</i> character or a <i>canonical composite</i> character.</p>
<ul>
<li>For example: U+00E0 LATIN SMALL LETTER A WITH GRAVE is a canonical
decomposable character because its canonical decomposition is to the two
characters U+0061 LATIN SMALL LETTER A and U+0300 COMBINING GRAVE ACCENT.
U+212A KELVIN SIGN is a canonical decomposable character because its canonical
decomposition is to U+004B LATIN CAPITAL LETTER K.</li>
</ul>
<h3><a name="3_9_special_character_properties">3.9 Special Character Properties</a>
(revision)</h3>
<h3>Replacing ZWNBSP with Word Joiner</h3>
<p>The character U+2060 has been added to the standard to allow unambiguous
expression of the word-joining semantics. U+2060 WORD JOINER is now the
preferred character to express the word-joining semantics implied by the ZWNBSP.
The availability of U+2060 makes it unnecessary to use U+FEFF as a zero-width
non-breaking space, allowing U+FEFF to be used solely with the semantic of BOM.
For more information, see the subsection on “Word Joiner” in <i>
<a href="#13_2_layout_controls">Section 13.2, Layout Controls</a></i> in this
document.</p>
<p>Note: Implementers are strongly encouraged to use word joiner whenever
word-joining semantics is intended.</p>
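<p>One way existing data might be migrated is sketched below in Python. This is an illustration only: the function name is hypothetical, and it assumes that every U+FEFF after the first character of the text was intended as a zero-width non-breaking space, which an implementation would need to verify for its own data.</p>

```python
BOM = "\ufeff"          # ZERO WIDTH NO-BREAK SPACE / byte order mark
WORD_JOINER = "\u2060"  # WORD JOINER

def prefer_word_joiner(text: str) -> str:
    """Keep a leading U+FEFF as a BOM; replace any later U+FEFF with U+2060."""
    if text.startswith(BOM):
        return BOM + text[1:].replace(BOM, WORD_JOINER)
    return text.replace(BOM, WORD_JOINER)
```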
<h3>Additions to Properties</h3>
<p>A number of characters which have special character properties have been added in the Unicode Standard, Version 3.2. To reflect this, the following changes
are made to the special character properties listing, on pages 48-50 of <i>The
Unicode Standard, Version 3.0</i>:</p>
<p>In the entry for “Line boundary control”, add:</p>
<p>205F MEDIUM MATHEMATICAL SPACE<br>
2060 WORD JOINER</p>
<p>Change the name of the “Joining” entry to “Cursive joining and ligation
control”.</p>
<p>Add a new entry called “Grapheme joining” after the renamed entry for
“Cursive joining and ligation control” and add to that new entry:</p>
<p>034F COMBINING GRAPHEME JOINER</p>
<p>Add a new entry called “Mathematical expression formatting” after the entry
“Bidirectional ordering” and add to that new entry:</p>
<p>2061 FUNCTION APPLICATION<br>
2062 INVISIBLE TIMES<br>
2063 INVISIBLE SEPARATOR</p>
<p dir="ltr">Change the name of the “Alternate formatting” entry to “Deprecated
alternate formatting”.</p>
<p dir="ltr">Change the name of the “Syriac abbreviation” entry to “Prefixed
format control” and add to that entry:</p>
<p dir="ltr">06DD ARABIC END OF AYAH</p>
<p>Change the name of the “Indic dead-character formation” entry to
“Brahmi-derived script dead-character formation” and add to that entry:</p>
<p>1714 TAGALOG SIGN VIRAMA<br>
1734 HANUNOO SIGN PAMUDPOD</p>
<p>Change the name of the “Mongolian variant selectors” entry to “Mongolian
variation selectors”.</p>
<p>After the “Mongolian variation selectors” entry add a new entry “Generic
variation selectors” and add to that new entry:</p>
<p>FE00 VARIATION SELECTOR-1<br>
FE01 VARIATION SELECTOR-2<br>
FE02 VARIATION SELECTOR-3<br>
FE03 VARIATION SELECTOR-4<br>
FE04 VARIATION SELECTOR-5<br>
FE05 VARIATION SELECTOR-6<br>
FE06 VARIATION SELECTOR-7<br>
FE07 VARIATION SELECTOR-8<br>
FE08 VARIATION SELECTOR-9<br>
FE09 VARIATION SELECTOR-10<br>
FE0A VARIATION SELECTOR-11<br>
FE0B VARIATION SELECTOR-12<br>
FE0C VARIATION SELECTOR-13<br>
FE0D VARIATION SELECTOR-14<br>
FE0E VARIATION SELECTOR-15<br>
FE0F VARIATION SELECTOR-16</p>
<h3>Application of Combining Marks</h3>
<p>Formally speaking, combining marks apply to the preceding grapheme cluster. In most cases, this is the same as
applying to the preceding base character. However, in two circumstances there is
a difference: </p>
<ul>
<li><i>Hangul syllables</i> </li>
<li><i>Enclosing combining marks</i></li>
</ul>
<p><b><i>Hangul Syllables.</i> </b>Where a grapheme cluster contains a Hangul
syllable, the combining mark applies to the entire syllable. For example, in the
following sequence the <i>grave</i> is applied to the entire Hangul syllable,
not just the last jamo:</p>
<ul>
<li>U+1100 HANGUL CHOSEONG KIYEOK </li>
<li>U+1161 HANGUL JUNGSEONG A </li>
<li>U+0300 COMBINING GRAVE ACCENT</li>
</ul>
<p><b><i>Enclosing Combining Marks.</i> </b>These marks enclose the entire
preceding grapheme cluster. For example, in the following sequence the entire
Hangul syllable is circled, not just part of it:</p>
<ul>
<li>U+1100 HANGUL CHOSEONG KIYEOK </li>
<li>U+1161 HANGUL JUNGSEONG A </li>
<li>U+20DD COMBINING ENCLOSING CIRCLE</li>
</ul>
<p>This is also true of grapheme clusters composed of elements linked by a
Grapheme_Link or <i>combining grapheme joiner</i>. For example, the entire conjunct is circled in the following
sequence: </p>
<ul>
<li>U+0915 DEVANAGARI LETTER KA </li>
<li>U+094D DEVANAGARI SIGN VIRAMA </li>
<li>U+0922 DEVANAGARI LETTER DDHA </li>
<li>U+20DD COMBINING ENCLOSING CIRCLE</li>
</ul>
<p>On the other hand, where elements are linked by a Grapheme_Link or combining
grapheme joiner, <i>
non-enclosing</i> combining marks <i>only</i> apply to the last base character.
For example, in the following sequence the <i>nukta</i> applies to the
immediately preceding <i>ddha</i>, not to the entire cluster:</p>
<ul>
<li>U+0915 DEVANAGARI LETTER KA </li>
<li>U+094D DEVANAGARI SIGN VIRAMA </li>
<li>U+0922 DEVANAGARI LETTER DDHA </li>
<li>U+093C DEVANAGARI SIGN NUKTA</li>
</ul>
<p>For more information, see the subsection on “Combining Grapheme Joiner” in
<i><a href="#13_2_layout_controls">Section 13.2, Layout Controls</a></i> in this
document.</p>
<h3><a name="3_11_conjoining_jamo_behavior">3.11 Conjoining Jamo Behavior</a>
(revision)</h3>
<p>The following text replaces the text and tables for this section on pages
52-53 of <i>The Unicode Standard, Version 3.0</i>:</p>
<p>The Unicode Standard contains both a large set of precomposed modern Hangul
syllables and a set of conjoining Hangul jamo, which can be used to encode
archaic syllable blocks as well as modern syllable blocks. This section
describes how to:</p>
<ul>
<li>Determine the syllable boundaries in a sequence of conjoining jamo
characters</li>
<li>Compose jamo characters into precomposed Hangul syllables</li>
<li>Determine the canonical decomposition of precomposed Hangul syllables</li>
<li>Algorithmically determine the names of precomposed Hangul syllables</li>
</ul>
<p>For more information, see the “Hangul Syllables” and “Hangul Jamo”
subsections in <i>Section 10.4, Hangul</i> in <i>The Unicode Standard, Version
3.0</i>. Hangul syllables are a special case of grapheme clusters.</p>
<p>The jamo characters can be classified into three sets of characters: <i>
choseong</i> (leading consonants, or syllable-initial characters), <i>jungseong</i>
(vowels, or syllable-peak characters), and <i>jongseong</i> (trailing
consonants, or syllable-final characters). In the following discussion, these
jamo are abbreviated as <i>L</i> (leading consonant), <i>V</i> (vowel), and <i>T</i>
(trailing consonant); syllable breaks are shown by <i>middle dots</i> “·”;
non-syllable breaks are shown by “×”; combining marks are shown by <i>M</i>; and non-jamo
are shown by <i>X</i>.</p>
<p>In the following discussion, a <i>syllable</i> refers to a sequence of Korean
characters that should be grouped into a single cell for display. This is
different from a <i>precomposed Hangul syllable</i>, which consists of any of
the characters in the range U+AC00..U+D7A3. Note that a syllable may contain a
precomposed Hangul syllable <i>plus</i> other characters.</p>
<h3>Syllable Boundaries</h3>
<p>In rendering, a sequence of jamos is displayed as a series of syllable
blocks. The following rules specify how to divide up an arbitrary sequence of
jamos (including nonstandard sequences) into these syllable blocks. In these
rules, a <i>choseong filler</i> (<i>L<sub>f </sub></i>) is treated as a <i>
choseong</i> character, and a <i>jungseong filler</i> (<i>V</i><i><sub>f </sub>
</i>) is treated as a <i>jungseong</i>.</p>
<p>The precomposed Hangul syllables are of two types: <i>LV</i> or <i>LVT</i>.
In determining the syllable boundaries, the LV behave as if they were a sequence
of jamo L V, and the LVT behave as if they were a sequence of jamo L V T.</p>
<p>Within any sequence of characters, a syllable break never occurs between the
pairs of characters shown in <i>Table 3-5</i>. In all other cases, there is a
syllable break before and after any jamo or precomposed Hangul syllable. Note
that like other characters, any combining mark between two conjoining jamos
prevents the jamos from forming a syllable.</p>
<p align="center"><b>Table 3-5. Hangul Syllable No-Break Rules</b></p>
<div align="center">
<center>
<table border="2" cellpadding="2" cellspacing="0">
<tr>
<td colspan="2"><b>Do Not Break Between</b></td>
<td><b>Examples</b></td>
</tr>
<tr>
<td>L</td>
<td>L, V, or precomposed<br>
Hangul syllable</td>
<td>L × L<br>
L × V<br>
L × LV<br>
L × LVT</td>
</tr>
<tr>
<td>V or LV</td>
<td>V or T </td>
<td>V × V<br>
V × T<br>
LV × V<br>
LV × T</td>
</tr>
<tr>
<td>T or LVT</td>
<td>T</td>
<td>T × T<br>
LVT × T</td>
</tr>
<tr>
<td>Jamo or<br>
precomposed<br>
Hangul syllable</td>
<td>Combining marks</td>
<td>L × M<br>
V × M<br>
T × M<br>
LV × M<br>
LVT × M</td>
</tr>
</table>
</center>
</div>
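<p>The no-break rules of Table 3-5 can be sketched in Python as follows. The sketch works on abstract tokens (<tt>"L"</tt>, <tt>"V"</tt>, <tt>"T"</tt>, <tt>"LV"</tt>, <tt>"LVT"</tt>, <tt>"M"</tt>, <tt>"X"</tt>, and the fillers <tt>"Lf"</tt> and <tt>"Vf"</tt>) rather than on code points. The function names are hypothetical, and placing a break between every pair not listed in the table, including pairs that involve no jamo, is a simplifying assumption of this sketch.</p>

```python
L_LIKE = {"L", "Lf"}            # choseong, including the choseong filler
V_LIKE = {"V", "Vf"}            # jungseong, including the jungseong filler
PRECOMPOSED = {"LV", "LVT"}     # precomposed Hangul syllables
JAMO_OR_SYLLABLE = L_LIKE | V_LIKE | {"T"} | PRECOMPOSED

def no_break(a: str, b: str) -> bool:
    """Return True if Table 3-5 forbids a syllable break between a and b."""
    if a in L_LIKE and (b in L_LIKE or b in V_LIKE or b in PRECOMPOSED):
        return True
    if (a in V_LIKE or a == "LV") and (b in V_LIKE or b == "T"):
        return True
    if (a == "T" or a == "LVT") and b == "T":
        return True
    if a in JAMO_OR_SYLLABLE and b == "M":
        return True
    return False

def syllable_blocks(tokens):
    """Group a token sequence into syllable blocks."""
    blocks = []
    for token in tokens:
        if blocks and no_break(blocks[-1][-1], token):
            blocks[-1].append(token)
        else:
            blocks.append([token])
    return blocks
```

<p>For example, <tt>syllable_blocks(list("LLTTVVTTVVLLVV"))</tt> groups the tokens as LL · TT · VVTT · VV · LLVV.</p>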
<p>Note that even in normalization form NFC, a syllable may contain a
precomposed Hangul syllable in the middle. An example is “L LVT T”. Each
well-formed modern Hangul syllable, however, can be represented in the form L V
T? (that is, one L, one V, and optionally one T), and is a single character in NFC.</p>
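<p>Both statements can be observed with a normalization library. The following Python sketch assumes the standard <tt>unicodedata</tt> module; the variable names are illustrative only.</p>

```python
import unicodedata

# A modern syllable written as L V or L V T becomes one character in NFC.
lv = unicodedata.normalize("NFC", "\u1100\u1161")          # KIYEOK + A
lvt = unicodedata.normalize("NFC", "\u1100\u1161\u11a8")   # KIYEOK + A + final KIYEOK

# "L LVT T": the precomposed syllable stays in the middle of the sequence,
# because neither L + L nor LVT + T composes any further.
mixed = "\u1100\uac01\u11a8"
mixed_nfc = unicodedata.normalize("NFC", mixed)
```

<p>Here <tt>lv</tt> is U+AC00, <tt>lvt</tt> is U+AC01, and <tt>mixed_nfc</tt> is unchanged from <tt>mixed</tt>.</p>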
<p>For information on the behavior of Hangul compatibility jamo in syllables,
see <i>Section 10.4, Hangul</i> in <i>The Unicode Standard, Version 3.0</i>.</p>
<h3>Standard Korean Syllables</h3>
<p>A standard Korean syllable block is composed of a sequence of one or more <i>
L</i> followed by a sequence of one or more <i>V</i> and then a sequence
of zero or more <i>T</i>. A sequence of nonstandard syllable blocks can be
transformed into a sequence of standard Korean syllable blocks by inserting <i>
choseong</i> fillers (<i>L<sub>f </sub></i>) and <i>jungseong</i> fillers (<i>V<sub>f
</sub></i>).</p>
<p>Using regular expression notation, a standard Korean syllable is thus of the
form:</p>
<p>L+ V+ T*</p>
<p>The transformation of a string of text into standard Korean syllables is
performed by determining the syllable breaks as explained in the subsection on
“Syllable Boundaries” earlier in this section, then inserting one or two fillers
as necessary to transform each syllable into a standard Korean syllable. Thus:</p>
<p>L ^V → L V<sub>f</sub> ^V<br>
^L V → ^L L<sub>f</sub> V<br>
^V T → ^V L<sub>f</sub> V<sub>f</sub> T</p>
<p>where ^X indicates a character that is not X, or the absence of a character.</p>
<p><i><b>Examples.</b></i> In <i>Table 3-6</i>, the first row shows syllable
breaks in a standard sequence, the second row shows syllable breaks in a
nonstandard sequence, and the third row shows how the sequence in the second row
could be transformed into standard form by inserting fillers into each syllable.
</p>
<p align="center"><b>Table 3-6. Syllable Break Examples</b></p>
<div align="center">
<center>
<table border="2" cellpadding="2" cellspacing="0">
<tr>
<td align="left">
<p align="left">No. </td>
<td align="left">Sequence</td>
<td align="left"> </td>
<td align="left">Sequence with Syllable Breaks Marked</td>
</tr>
<tr>
<td align="left">
<p align="left">1 </td>
<td align="left">
<p align="left">LVTLVLVLV<sub>f</sub>L<sub>f</sub>VL<sub>f</sub>V<sub>f</sub>T</td>
<td align="left">→ </td>
<td align="left">LVT · LV · LV · LV<sub>f</sub> · L<sub>f</sub>V · L<sub>f</sub>V<sub>f</sub>T</td>
</tr>
<tr>
<td align="left">
<p align="left">2</td>
<td align="left">LLTTVVTTVVLLVV</td>
<td align="left">→</td>
<td align="left">LL · TT · VVTT · VV · LLVV</td>
</tr>
<tr>
<td align="left">
<p align="left">3</td>
<td align="left">LLTTVVTTVVLLVV</td>
<td align="left">→ </td>
<td align="left">LLV<sub>f</sub> · L<sub>f</sub>V<sub>f</sub>TT · L<sub>f</sub>VVTT
· L<sub>f</sub>VV · LLVV</td>
</tr>
</table>
</center>
</div>
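<p>The filler insertion shown in the third row can be sketched in Python as follows. The sketch takes syllable blocks that have already been separated and that contain only the tokens <tt>"L"</tt>, <tt>"V"</tt> and <tt>"T"</tt>; the function name and the <tt>"Lf"</tt> and <tt>"Vf"</tt> token spellings are hypothetical.</p>

```python
def standardize_block(block):
    """Insert fillers so that one syllable block has the form L+ V+ T*."""
    out = list(block)
    if "V" not in out:
        # Put a jungseong filler after the leading consonants (if any).
        leading = 0
        while leading != len(out) and out[leading] == "L":
            leading += 1
        out.insert(leading, "Vf")
    if "L" not in out:
        # Put a choseong filler at the very front.
        out.insert(0, "Lf")
    return out

blocks = ["LL", "TT", "VVTT", "VV", "LLVV"]
standard = ["".join(standardize_block(list(b))) for b in blocks]
```

<p>With the blocks LL · TT · VVTT · VV · LLVV as input, <tt>standard</tt> holds LLVf, LfVfTT, LfVVTT, LfVV and LLVV.</p>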
<h3><a name="4_2_combining_classes_normative">4.2 Combining Classes—Normative</a> (revision)</h3>
<p>Remove the entry for U+06DD ARABIC END OF AYAH from <i>Table 4-3, Combining
Classes</i> on page 80 of <i>The Unicode Standard, Version 3.0</i>.</p>
<h3>Unicode Standard Annex #15, “Unicode Normalization Forms” (revision)</h3>
<p>In Corrigendum #3 the canonical mapping for U+F951 has been corrected. For
more information, see <a href="http://www.unicode.org/unicode/reports/tr15/">Unicode
Standard Annex #15, “Unicode Normalization Forms”</a>.</p>
<h2 class="bb"><a name="general_structure_and_guidelines">III General Structure
and Guidelines</a></h2>
<h3><a name="2_2_unicode_design_principles">2.2 Unicode Design Principles</a>
(addition) </h3>
<p>Add the following text to page 18 of<i> The Unicode Standard, Version 3.0 </i>just before
the subsection on “Convertibility”:</p>
<p><i><b>Decompositions</b></i></p>
<p>Precomposed characters are formally known as decomposables, because they have
decompositions to one or more other characters. There are two types of
decompositions:</p>
<ul>
<li><b>Canonical.</b> The character and its decomposition should be treated as
essentially equivalent.</li>
<li><b>Compatibility. </b>The decomposition may remove some information
(typically formatting information) that is important to preserve in
particular contexts. By definition, compatibility decomposition is a
superset of canonical decomposition.</li>
</ul>
<p>Thus there are three types of characters, based on their decomposition
behavior:</p>
<ul>
<li><b>Nondecomposable. </b>The character has no decomposition: neither
canonical nor compatibility.</li>
<li><b>Canonical Decomposable. </b>The character has a distinct canonical
decomposition.</li>
<li><b>Compatibility Decomposable. </b>The character has a distinct
compatibility decomposition.</li>
</ul>
<p>The following figure illustrates these three types. The solid arrows indicate canonical decompositions, and the dotted arrows indicate compatibility decompositions. If an arrow loops back and points to the character itself, that indicates that there is no decomposition of that type (other than in the trivial sense of a character
“decomposing” to itself).</p>
<p>The figure illustrates two important things to keep in mind:</p>
<ul>
<li>Decompositions may be to single characters <i>or</i> to
sequences of characters. Decompositions to a single character,
also known as <i>singleton decompositions,</i> are seen
for the <i>ohm sign</i> and the <i>halfwidth katakana
ka</i> in the figure. Because of examples like these,
decomposable characters in Unicode do not always consist of
obvious, separate parts; one can only know their status by
examining the data tables for the standard.</li>
<li>There are a very small number of characters that are both
canonical <i>and</i> compatibility decomposable. The
example shown in the figure is for the Greek hooked upsilon
symbol with an acute accent. It has a canonical decomposition
to one sequence and a compatibility decomposition to a different
sequence.</li>
</ul>
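<p>Because the status of a character can only be determined from the data tables, a lookup is needed in practice. The following Python sketch reads the one-step decomposition mapping through the standard <tt>unicodedata</tt> module (which reflects a later version of the data than Unicode 3.2); the function name and the returned labels are hypothetical.</p>

```python
import unicodedata

def decomposition_mapping(ch: str):
    """Return (kind, targets) for the one-step decomposition mapping of ch.

    kind is None, "canonical" or "compatibility"; targets is a list of
    hexadecimal code point strings. A single target is a singleton mapping.
    """
    fields = unicodedata.decomposition(ch).split()
    if not fields:
        return None, []
    if fields[0].startswith("<"):
        # A formatting tag such as <compat> or <narrow> marks a compatibility mapping.
        return "compatibility", fields[1:]
    return "canonical", fields

ohm = decomposition_mapping("\u2126")           # OHM SIGN
halfwidth_ka = decomposition_mapping("\uff76")  # HALFWIDTH KATAKANA LETTER KA
plain_a = decomposition_mapping("a")
```

<p>Note that this reads only the immediate mapping. A character such as U+03D3, whose immediate mapping is canonical, is still compatibility decomposable because its full compatibility decomposition differs from its full canonical decomposition.</p>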
<p>For more precise definitions of some of these terms, see <i>Chapter 3,
Conformance</i> in <i>The Unicode Standard, Version 3.0</i>.</p>
<div align="center">
<center>
<table border="1" cellspacing="0" cellpadding="8" style="page-break-before:always">
<tr>
<th colspan="2" style="text-align: center"><font size="4">Nondecomposables</font>
<p>
<img border="0" src="nondecomp.gif" alt="nondecomposable example" width="289" height="179"></th>
</tr>
<tr>
<th style="text-align: center"><font size="4">Canonical Decomposables</font>
<p>
<img border="0" src="cdecomp.gif" alt="canonical decomposable example" width="289" height="179"></p>
<p>
<img border="0" src="cdecomp2.gif" alt="canonical decomposable example" width="289" height="179"></p>
<p>
<img border="0" src="ckdecomp.gif" alt="canonical decomposable example" width="289" height="179"></th>
<th style="text-align: center"><font size="4">Compatibility Decomposables</font>
<p>
<img border="0" src="kdecomp.gif" alt="compatibility decomposable example" width="289" height="179"></p>
<p>
<img border="0" src="kdecomp2.gif" alt="compatibility decomposable example" width="289" height="179"></p>
<p>
<img border="0" src="ckdecomp.gif" alt="compatibility decomposable example" width="289" height="179"></th>
</tr>
</table>
</center>
</div>
<h3><a name="5_15_locating_text_element_boundaries">5.15 Locating Text Element
Boundaries</a> (revision)</h3>
<p>Add the following text after bullet item 6 on page 125 of<i> The Unicode
Standard, Version 3.0</i>:<i><br>
<br>
</i>The rules are applied in order. That is, there is an implicit “otherwise” at
the front of each rule following the first. It is possible to construct
alternate sets of such rules that are fully equivalent; that is, they have the
same effect.</p>
<p>Note: The rules for default grapheme cluster boundaries, default word boundaries and default sentence
boundaries are in the process of being superseded by a new
<a href="http://www.unicode.org/unicode/reports/tr29/">Unicode Technical
Report #29, Text Boundaries</a>.</p>
<h2 class="bb"><a name="block">IV Block Descriptions</a></h2>
<p>Note: The numbering used here for block descriptions and revised text follows
<i>The Unicode Standard, Version 3.0</i> for ease of cross-reference.</p>
<h3><a name="6_1_general_punctuation">6.1 General Punctuation</a> (addition)</h3>
<p><i><b>Invisible Operators</b></i>. In mathematics some operators or
punctuation are often implied, but not displayed. U+2063 INVISIBLE SEPARATOR or
<i>invisible comma</i> is intended for use in index expressions and other
mathematical notation where two adjacent variables form a list and are not
implicitly multiplied. In mathematical notation, commas are not always
explicitly present, but need to be indicated for symbolic calculation software
to help it disambiguate a sequence from a multiplication. For example, the
double <i><sub>ij</sub></i> subscript in the variable <i>a<sub>ij</sub></i>
means <i>a<sub>i</sub></i><sub>, <i>j </i></sub>— that is, the <i>i</i> and <i>j</i>
are separate indices and not a single variable with the name <i>ij</i> or even
the product of <i>i</i> and <i>j</i>. Accordingly, to represent the implied list
separation in the subscript <i><sub>ij</sub></i> one can insert a nondisplaying
<i>invisible separator</i> between the <i>i</i> and the <i>j</i>. In addition,
use of the invisible comma would hint to a math layout program to typeset a
small space between the variables.</p>
<p>Similarly an expression like <i>mc</i><sup>2</sup> implies that the mass <i>m</i>
multiplies the square of the speed <i>c</i>. To represent the implied
multiplication in <i>mc</i><sup>2</sup>, one inserts a nondisplaying U+2062
INVISIBLE TIMES between the <i>m</i> and the <i>c</i>. A related case is the use
of U+2061 FUNCTION APPLICATION for an implied function dependence as in <i>f</i>(<i>x</i>
+ <i>y</i>). To indicate that this is the function <i>f</i> of the quantity <i>x</i>
+ <i>y</i> and not the expression <i>fx</i> + <i>fy</i>, one can insert the
nondisplaying <i>function application symbol</i> between the <i>f</i> and the
left parenthesis. </p>
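<p>As an illustration, the plain-text strings for these three cases might be built as follows. This Python sketch assumes the standard <tt>unicodedata</tt> module for looking up character names; the variable names are illustrative only.</p>

```python
import unicodedata

FUNCTION_APPLICATION = "\u2061"
INVISIBLE_TIMES = "\u2062"
INVISIBLE_SEPARATOR = "\u2063"

# Subscript "ij" meaning the list i, j rather than a product or a single name.
subscript_ij = "i" + INVISIBLE_SEPARATOR + "j"

# m times c squared, with the implied multiplication made explicit.
mc_squared = "m" + INVISIBLE_TIMES + "c\u00b2"

# f applied to (x + y), not f multiplied by (x + y).
f_of_sum = "f" + FUNCTION_APPLICATION + "(x + y)"

names = [unicodedata.name(c) for c in
         (FUNCTION_APPLICATION, INVISIBLE_TIMES, INVISIBLE_SEPARATOR)]
```

<p>The operators occupy a position in the string but have no visible glyph, so each string displays the same as it would without them.</p>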
<p>Another example is the expression <i>f <sup>ij</sup></i>(cos(<i>ab</i>)),
which means the same as <i>f<sup>ij</sup></i>(cos(<i>a</i>×<i>b</i>)), where ×
represents <i>multiplication</i>, not the <i>cross product</i>. Note that the
spacing between characters may also depend on whether the adjacent variables are
part of a list or are to be concatenated, that is, multiplied.</p>
<p>A more complete discussion of mathematical notation can be found in
<a href="http://www.unicode.org/reports/tr25/">Proposed Draft Unicode Technical Report #25, “Unicode Support
for Mathematics.”</a></p>
<p><i><b>Commercial Minus.</b></i> U+2052 COMMERCIAL MINUS SIGN is used in
commercial or tax related forms or publications in several European countries,
including Germany and Scandinavia. The string “./.” appears to be used as a
fallback representation for this character.</p>
<p>The symbol may also appear as a marginal note in letters, denoting
enclosures. One variation replaces the top dot with a digit indicating the
number of enclosures.</p>
<p>An additional usage of the sign appears in the Finno-Ugric Phonetic Alphabet
(FUPA), where it marks a structurally-related borrowed element of different
pronunciation. In Finland and a number of other European countries, the dingbats
<img src="U-2052.jpg" alt="U+2052" width="21" height="19"> and
<img src="U-2713.jpg" alt="U+2713" width="16" height="19"> are used for “correct” and “incorrect”
respectively in marking a student’s paper. This contrasts with American
practice, for example, where
<img src="U-2713.jpg" alt="U+2713" width="16" height="19"> and
<img src="U-2717.jpg" alt="U+2717" width="12" height="19"> can be used for “correct” and “incorrect”
respectively in the same context.</p>
<h3>CJK Symbols and Punctuation: U+3000–U+303F (update and addition)</h3>
<p>On page 155 of <i>The Unicode Standard, Version 3.0</i> update the first full
paragraph as follows:</p>
<p>This block encodes punctuation marks and symbols <strike>primarily </strike>
used by writing systems that employ Han ideographs. Most of these characters are
found in East Asian standards.</p>
<p>Add a new paragraph on page 155 of <i>The Unicode Standard, Version 3.0</i> to
follow the paragraph on U+3006: </p>
<p>U+3008, U+3009 angle brackets are unambiguously wide. The Unicode Standard
encodes different characters for use in other contexts, such as mathematics.
There are other characters in this block that have the same characteristics,
including double angle brackets, tortoise shell brackets, and white square
brackets.</p>
<h3><a name="7_2_greek">7.2 Greek</a> (revision)</h3>
<h3>Representative Glyphs for Greek Phi</h3>
<p>With Unicode 3.0 and the concurrent second edition of ISO/IEC 10646-1, the
representative glyphs for U+03C6 GREEK SMALL LETTER PHI and U+03D5 GREEK PHI SYMBOL
were swapped. In ordinary Greek text, the character U+03C6 is used exclusively,
although this character has considerable glyphic variation, sometimes
represented with a glyph more like the representative glyph shown for U+03C6
(the “loopy” form) and less often with a glyph more like the representative
glyph shown for U+03D5 (the “straight” form).</p>
<p>For mathematical and technical use, the straight form of the small phi is an
important symbol and needs to be consistently distinguishable from the loopy
form. The straight form phi glyph is used as the representative glyph for the
symbol phi at U+03D5 to satisfy this distinction.</p>
<p>The reversed assignment of representative glyphs in versions of the Unicode
Standard prior to Unicode 3.0 had the problem that the character explicitly
identified as the mathematical symbol did not have the straight form of the
character that is the preferred glyph for that use. Furthermore, it made it
unnecessarily difficult for general purpose fonts supporting ordinary Greek text
to also add support for Greek letters used as mathematical symbols. This
resulted from the fact that many of those fonts already used the loopy form
glyph for U+03C6, as preferred for Greek body text; to support the phi symbol as
well, they would have had to disrupt glyph choices already optimized for Greek
text.</p>
<p>When mapping symbol sets or SGML entities to the Unicode Standard, it is
important to make sure that codes or entities that require the straight form of
the phi symbol be mapped to U+03D5 and not to U+03C6. Mapping to the latter
should be reserved for codes or entities that represent the small phi as used in
ordinary Greek text.</p>
<p>Fonts used primarily for Greek text may use either glyph form for U+03C6, but
fonts that also intend to support technical use of the Greek letters should use
the loopy form to ensure appropriate contrast with the straight form used for
U+03D5. </p>
<h3><a name="8_2_arabic">8.2 Arabic</a> (addition)</h3>
<p><b><i>End of Ayah. </i></b>U+06DD ARABIC END OF AYAH<i> </i>graphically
encloses a sequence of zero or more digits (of General Category Nd) that follow
it in the data stream. The enclosure terminates with any non-digit. For behavior
of a similar prefixed formatting control, see the discussion of the Syriac
Abbreviation Mark in <i>Section 8.3, Syriac</i> in <i>The Unicode Standard,
Version 3.0</i>.</p>
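<p>A renderer therefore has to find the run of digits that follows the control. The following Python sketch shows one way to do so, assuming the standard <tt>unicodedata</tt> module; the function name is hypothetical.</p>

```python
import unicodedata

END_OF_AYAH = "\u06dd"

def enclosed_digits(text: str, start: int) -> str:
    """Return the digits enclosed by the U+06DD found at index start.

    The run consists of zero or more characters of General Category Nd
    and ends at the first non-digit or at the end of the text.
    """
    if text[start] != END_OF_AYAH:
        raise ValueError("no ARABIC END OF AYAH at this position")
    end = start + 1
    while end < len(text) and unicodedata.category(text[end]) == "Nd":
        end += 1
    return text[start + 1:end]
```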
<h3><a name="9_15_khmer">9.15 Khmer</a> (addition)</h3>
<p><b><i>Characters Whose Use is Discouraged.</i> </b>The use of the following characters
is discouraged; they are being
considered for possible deprecation in a future version of the Standard. These
characters should be avoided in the normal representation of Khmer text:</p>
<p>17A3 KHMER INDEPENDENT VOWEL QAQ<br>
17A4 KHMER INDEPENDENT VOWEL QAA<br>
17B4 KHMER VOWEL INHERENT AQ<br>
17B5 KHMER VOWEL INHERENT AA<br>
17D3 KHMER SIGN BATHAMASAT<br>
17D8 KHMER SIGN BEYYAL</p>
<p>For transliteration of Pali/Sanskrit, U+17A2 KHMER LETTER QA is recommended instead of
U+17A3 KHMER INDEPENDENT VOWEL QAQ, and the sequence &lt;U+17A2 KHMER LETTER QA, U+17B6
KHMER VOWEL SIGN AA&gt; is recommended instead of
U+17A4 KHMER INDEPENDENT VOWEL QAA.</p>
<p>The use of U+17D3 KHMER SIGN BATHAMASAT is not recommended for representation of Khmer lunar dates;
a separate proposal for the full representation of Khmer lunar dates is
under development.</p>
<p>U+17D8 KHMER SIGN BEYYAL is not recommended for representing the Khmer word meaning “etc.”.
The word should be spelled out with a sequence of signs and letters instead.</p>
<p><i><b>Combined Vowels</b></i>. The Khmer language uses two dependent
vowel signs whose Unicode representation consists of a sequence of two code
points. These are <i>khmer vowel sign srak om</i>, represented by the sequence
&lt;U+17BB KHMER VOWEL SIGN U, U+17C6 KHMER SIGN NIKAHIT&gt; and <i>khmer vowel sign
srak aam</i>, represented by the sequence &lt;U+17B6 KHMER VOWEL SIGN AA, U+17C6
KHMER SIGN NIKAHIT&gt;. The <i>nikahit</i> represents the final nasalization of the
vowel, shown by the “m” in the transliteration. These dependent vowels are treated as units, for the purposes of enumeration of
the “letters” of Khmer, and most importantly for collation. Having these vowels
represented by a sequence of two Unicode code points may be unexpected for Khmer
implementers. It is important, therefore, to ensure that
these sequences are treated as units when implementing Khmer.</p>
<p><i><b>Subscript Letters.</b></i> The Unicode encoding of the Khmer script
uses an independent (and invisible) <i>coeng</i> sign to indicate that the
following consonant is subscripted, by analogy with the virama model employed
for representing conjuncts in Indian scripts. Subscripted independent vowels are
encoded in the same manner. This approach uses an artificial <i>coeng</i> sign
character which does not exist as a letter or sign in the Khmer script, and
therefore departs from the ordinary way that
Khmer is conceived of and taught to native Khmer speakers. Consequently,
the encoding may not be intuitive to a native user of the Khmer writing system. Ordinarily, the units
such as <i>khmer consonant coeng ka</i> are conceived of as independent and
unitary subscript letters, rather than as a result of conjunct formation.</p>
<p>To aid Khmer script users, a full listing of all the Khmer subscript letters
has been provided in the table, “Additional Khmer Character Names”, together with appropriate names for them which follow
preferred Khmer practice. While the Unicode encoding represents both the
subscripts and the combined vowel letters with a pair of code points, they must
be treated <i>as a unit</i> for most processing purposes. In other words, they
must function as if they had been encoded as a single character. The combined
vowel characters are also included in this list, and should also be treated as a
unit in processing.</p>
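<p>One way to treat these pairs as units is to split text with a pattern that tries the two-code-point sequences first. The following Python sketch is an illustration only: the names are hypothetical, and it assumes that the <i>coeng</i> sign is U+17D2 and that the letters that can be subscripted lie in the range U+1780..U+17B3.</p>

```python
import re

KHMER_UNIT = re.compile(
    "\u17d2[\u1780-\u17b3]"   # coeng + consonant or independent vowel (subscript)
    "|\u17bb\u17c6"           # vowel sign U + nikahit (srak om)
    "|\u17b6\u17c6"           # vowel sign AA + nikahit (srak aam)
    "|.",                     # otherwise, any single code point
    re.DOTALL,
)

def khmer_units(text: str):
    """Split text into units, keeping subscripts and combined vowels whole."""
    return KHMER_UNIT.findall(text)
```

<p>A process such as collation or letter counting could then operate on the returned units rather than on individual code points.</p>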
<p>A full Khmer script chart is also provided, showing <i>
all</i> of the Khmer characters preferred for modern Khmer usage, including the
subscripts and combined vowels. This chart is better for didactic purposes in
representing the Khmer script and its Unicode encoding. By contrast, the main
Unicode code chart does not reflect the modern reading rules for Khmer, and
thereby can give a misleading picture of the structure of the script.</p>
<div align="center">
<center>
<table width="75%">
<caption>
<b>
<font size="3">Khmer Script Chart</font> </b>
</caption>
<tr>
<th style="text-align:center" colspan="10">Consonants</th>
</tr>
<tr>
<td width="10%" style="text-align: center">
<img alt="1780" src="images/U1780.gif" width="52" height="62"><br>
<tt>1780</tt></td>
<td width="10%" style="text-align: center">
<img alt="1781" src="images/U1781.gif" width="52" height="62"><br>
<tt>1781</tt></td>
<td width="10%" style="text-align: center">
<img alt="1782" src="images/U1782.gif" width="52" height="62"><br>
<tt>1782</tt></td>
<td width="10%" style="text-align: center">
<img alt="1783" src="images/U1783.gif" width="52" height="62"><br>
<tt>1783</tt></td>
<td width="10%" style="text-align: center">
<img alt="1784" src="images/U1784.gif" width="52" height="62"><br>
<tt>1784</tt></td>
<td width="10%" style="text-align: center">
<img alt="1785" src="images/U1785.gif" width="52" height="62"><br>
<tt>1785</tt></td>
<td width="10%" style="text-align: center">
<img alt="1786" src="images/U1786.gif" width="52" height="62"><br>
<tt>1786</tt></td>
<td width="10%" style="text-align: center">
<img alt="1787" src="images/U1787.gif" width="52" height="62"><br>
<tt>1787</tt></td>
<td width="10%" style="text-align: center">
<img alt="1788" src="images/U1788.gif" width="52" height="62"><br>
<tt>1788</tt></td>
<td width="10%" style="text-align: center">
<img alt="1789" src="images/U1789.gif" width="52" height="62"><br>
<tt>1789</tt></td>
</tr>
<tr>
<td style="text-align: center"><img alt="178A" src="images/U178A.gif" width="52" height="62"><br>
<tt>178A</tt></td>
<td style="text-align: center"><img alt="178B" src="images/U178B.gif" width="52" height="62"><br>
<tt>178B</tt></td>
<td style="text-align: center"><img alt="178C" src="images/U178C.gif" width="52" height="62"><br>
<tt>178C</tt></td>
<td style="text-align: center"><img alt="178D" src="images/U178D.gif" width="52" height="62"><br>
<tt>178D</tt></td>
<td style="text-align: center"><img alt="178E" src="images/U178E.gif" width="52" height="62"><br>
<tt>178E</tt></td>
<td style="text-align: center"><img alt="178F" src="images/U178F.gif" width="52" height="62"><br>
<tt>178F</tt></td>
<td style="text-align: center"><img alt="1790" src="images/U1790.gif" width="52" height="62"><br>
<tt>1790</tt></td>
<td style="text-align: center"><img alt="1791" src="images/U1791.gif" width="52" height="62"><br>
<tt>1791</tt></td>
<td style="text-align: center"><img alt="1792" src="images/U1792.gif" width="52" height="62"><br>
<tt>1792</tt></td>
<td style="text-align: center"><img alt="1793" src="images/U1793.gif" width="52" height="62"><br>
<tt>1793</tt></td>
</tr>
<tr>
<td style="text-align: center"><img alt="1794" src="images/U1794.gif" width="52" height="62"><br>
<tt>1794</tt></td>
<td style="text-align: center"><img alt="1795" src="images/U1795.gif" width="52" height="62"><br>
<tt>1795</tt></td>
<td style="text-align: center"><img alt="1796" src="images/U1796.gif" width="52" height="62"><br>
<tt>1796</tt></td>
<td style="text-align: center"><img alt="1797" src="images/U1797.gif" width="52" height="62"><br>
<tt>1797</tt></td>
<td style="text-align: center"><img alt="1798" src="images/U1798.gif" width="52" height="62"><br>
<tt>1798</tt></td>
<td style="text-align: center"><img alt="1799" src="images/U1799.gif" width="52" height="62"><br>
<tt>1799</tt></td>
<td style="text-align: center"><img alt="179A" src="images/U179A.gif" width="52" height="62"><br>
<tt>179A</tt></td>
<td style="text-align: center"><img alt="179B" src="images/U179B.gif" width="52" height="62"><br>
<tt>179B</tt></td>
<td style="text-align: center"><img alt="179C" src="images/U179C.gif" width="52" height="62"><br>
<tt>179C</tt></td>
<td style="text-align: center"><img alt="179D" src="images/U179D.gif" width="52" height="62"><br>
<tt>179D</tt></td>
</tr>
<tr>
<td style="text-align: center"><img alt="179E" src="images/U179E.gif" width="52" height="62"><br>
<tt>179E</tt></td>
<td style="text-align: center"><img alt="179F" src="images/U179F.gif" width="52" height="62"><br>
<tt>179F</tt></td>
<td style="text-align: center"><img alt="17A0" src="images/U17A0.gif" width="52" height="62"><br>
<tt>17A0</tt></td>
<td style="text-align: center"><img alt="17A1" src="images/U17A1.gif" width="52" height="62"><br>
<tt>17A1</tt></td>
<td style="text-align: center"><img alt="17A2" src="images/U17A2.gif" width="52" height="62"><br>
<tt>17A2</tt></td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
</tr>
<tr>
<th style="text-align:center" colspan="10">Independent Vowels</th>
</tr>
<tr>
<td style="text-align: center"><img alt="17A5" src="images/U17A5.gif" width="52" height="62"><br>
<tt>17A5</tt></td>
<td style="text-align: center"><img alt="17A6" src="images/U17A6.gif" width="52" height="62"><br>
<tt>17A6</tt></td>
<td style="text-align: center"><img alt="17A7" src="images/U17A7.gif" width="52" height="62"><br>
<tt>17A7</tt></td>
<td style="text-align: center"><img alt="17A9" src="images/U17A9.gif" width="52" height="62"><br>
<tt>17A9</tt></td>
<td style="text-align: center"><img alt="17AA" src="images/U17AA.gif" width="52" height="62"><br>
<tt>17AA</tt></td>
<td style="text-align: center"><img alt="17AB" src="images/U17AB.gif" width="52" height="62"><br>
<tt>17AB</tt></td>
<td style="text-align: center"><img alt="17AC" src="images/U17AC.gif" width="52" height="62"><br>
<tt>17AC</tt></td>
<td style="text-align: center"><img alt="17AD" src="images/U17AD.gif" width="52" height="62"><br>
<tt>17AD</tt></td>
<td style="text-align: center"><img alt="17AE" src="images/U17AE.gif" width="52" height="62"><br>
<tt>17AE</tt></td>
<td style="text-align: center"><img alt="17AF" src="images/U17AF.gif" width="52" height="62"><br>
<tt>17AF</tt></td>
</tr>
<tr>
<td style="text-align: center"><img alt="17B0" src="images/U17B0.gif" width="52" height="62"><br>
<tt>17B0</tt></td>
<td style="text-align: center"><img alt="17B1" src="images/U17B1.gif" width="52" height="62"><br>
<tt>17B1</tt></td>
<td style="text-align: center"><img alt="17B3" src="images/U17B3.gif" width="52" height="62"><br>
<tt>17B3</tt></td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
</tr>
<tr>
<th style="text-align:center" colspan="10">Dependent Vowel Signs</th>
</tr>
<tr>
<td style="text-align: center"><img alt="17B6" src="images/U17B6.gif" width="52" height="62"><br>
<tt>17B6</tt></td>
<td style="text-align: center"><img alt="17B7" src="images/U17B7.gif" width="52" height="62"><br>
<tt>17B7</tt></td>
<td style="text-align: center"><img alt="17B8" src="images/U17B8.gif" width="52" height="62"><br>
<tt>17B8</tt></td>
<td style="text-align: center"><img alt="17B9" src="images/U17B9.gif" width="52" height="62"><br>
<tt>17B9</tt></td>
<td style="text-align: center"><img alt="17BA" src="images/U17BA.gif" width="52" height="62"><br>
<tt>17BA</tt></td>
<td style="text-align: center"><img alt="17BB" src="images/U17BB.gif" width="52" height="62"><br>
<tt>17BB</tt></td>
<td style="text-align: center"><img alt="17BC" src="images/U17BC.gif" width="52" height="62"><br>
<tt>17BC</tt></td>
<td style="text-align: center"><img alt="17BD" src="images/U17BD.gif" width="52" height="62"><br>
<tt>17BD</tt></td>
<td style="text-align: center"><img alt="17BE" src="images/U17BE.gif" width="52" height="62"><br>
<tt>17BE</tt></td>
<td style="text-align: center"><img alt="17BF" src="images/U17BF.gif" width="52" height="62"><br>
<tt>17BF</tt></td>
</tr>
<tr>
<td style="text-align: center"><img alt="17C0" src="images/U17C0.gif" width="52" height="62"><br>
<tt>17C0</tt></td>
<td style="text-align: center"><img alt="17C1" src="images/U17C1.gif" width="52" height="62"><br>
<tt>17C1</tt></td>
<td style="text-align: center"><img alt="17C2" src="images/U17C2.gif" width="52" height="62"><br>
<tt>17C2</tt></td>
<td style="text-align: center"><img alt="17C3" src="images/U17C3.gif" width="52" height="62"><br>
<tt>17C3</tt></td>
<td style="text-align: center"><img alt="17C4" src="images/U17C4.gif" width="52" height="62"><br>
<tt>17C4</tt></td>
<td style="text-align: center"><img alt="17C5" src="images/U17C5.gif" width="52" height="62"><br>
<tt>17C5</tt></td>
<td style="text-align: center">
<img alt="17BB 17C6" src="images/17BB17C6.gif" width="52" height="62"><br>
<tt>17BB<br>
17C6</tt></td>
<td style="text-align: center"><img alt="17C6" src="images/U17C6.gif" width="52" height="62"><br>
<tt>17C6</tt></td>
<td style="text-align: center">
<img alt="17B6 17C6" src="images/17B617C6.gif" width="52" height="62"><br>
<tt>17B6<br>
17C6</tt></td>
<td style="text-align: center"><img alt="17C7" src="images/U17C7.gif" width="52" height="62"><br>
<tt>17C7</tt></td>
</tr>
<tr>
<th style="text-align:center" colspan="10">Subscript Characters</th>
</tr>
<tr>
<td style="text-align: center">
<img alt="17D2 1780" src="images/17D21780.gif" width="58" height="70"><br>
<tt>17D2<br>
1780</tt></td>
<td style="text-align: center">
<img alt="17D2 1781" src="images/17D21781.gif" width="58" height="70"><br>
<tt>17D2<br>
1781</tt></td>
<td style="text-align: center">
<img alt="17D2 1782" src="images/17D21782.gif" width="58" height="70"><br>
<tt>17D2<br>
1782</tt></td>
<td style="text-align: center">
<img alt="17D2 1783" src="images/17D21783.gif" width="58" height="70"><br>
<tt>17D2<br>
1783</tt></td>
<td style="text-align: center">
<img alt="17D2 1784" src="images/17D21784.gif" width="58" height="70"><br>
<tt>17D2<br>
1784</tt></td>
<td style="text-align: center">
<img alt="17D2 1785" src="images/17D21785.gif" width="58" height="70"><br>
<tt>17D2<br>
1785</tt></td>
<td style="text-align: center">
<img alt="17D2 1786" src="images/17D21786.gif" width="58" height="70"><br>
<tt>17D2<br>
1786</tt></td>
<td style="text-align: center">
<img alt="17D2 1787" src="images/17D21787.gif" width="58" height="70"><br>
<tt>17D2<br>
1787</tt></td>
<td style="text-align: center">
<img alt="17D2 1788" src="images/17D21788.gif" width="58" height="70"><br>
<tt>17D2<br>
1788</tt></td>
<td style="text-align: center">
<img alt="17D2 1789" src="images/17D21789.gif" width="58" height="70"><br>
<tt>17D2<br>
1789</tt></td>
</tr>
<tr>
<td style="text-align: center">
<img alt="17D2 178A" src="images/17D2178A.gif" width="58" height="70"><br>
<tt>17D2<br>
178A</tt></td>
<td style="text-align: center">
<img alt="17D2 178B" src="images/17D2178B.gif" width="58" height="70"><br>
<tt>17D2<br>
178B</tt></td>
<td style="text-align: center">
<img alt="17D2 178C" src="images/17D2178C.gif" width="58" height="70"><br>
<tt>17D2<br>
178C</tt></td>
<td style="text-align: center">
<img alt="17D2 178D" src="images/17D2178D.gif" width="58" height="70"><br>
<tt>17D2<br>
178D</tt></td>
<td style="text-align: center">
<img alt="17D2 178E" src="images/17D2178E.gif" width="58" height="70"><br>
<tt>17D2<br>
178E</tt></td>
<td style="text-align: center">
<img alt="17D2 178F" src="images/17D2178F.gif" width="58" height="70"><br>
<tt>17D2<br>
178F</tt></td>
<td style="text-align: center">
<img alt="17D2 1790" src="images/17D21790.gif" width="58" height="70"><br>
<tt>17D2<br>
1790</tt></td>
<td style="text-align: center">
<img alt="17D2 1791" src="images/17D21791.gif" width="58" height="70"><br>
<tt>17D2<br>
1791</tt></td>
<td style="text-align: center">
<img alt="17D2 1792" src="images/17D21792.gif" width="58" height="70"><br>
<tt>17D2<br>
1792</tt></td>
<td style="text-align: center">
<img alt="17D2 1793" src="images/17D21793.gif" width="58" height="70"><br>
<tt>17D2<br>
1793</tt></td>
</tr>
<tr>
<td style="text-align: center">
<img alt="17D2 1794" src="images/17D21794.gif" width="58" height="70"><br>
<tt>17D2<br>
1794</tt></td>
<td style="text-align: center">
<img alt="17D2 1795" src="images/17D21795.gif" width="58" height="70"><br>
<tt>17D2<br>
1795</tt></td>
<td style="text-align: center">
<img alt="17D2 1796" src="images/17D21796.gif" width="58" height="70"><br>
<tt>17D2<br>
1796</tt></td>
<td style="text-align: center">
<img alt="17D2 1797" src="images/17D21797.gif" width="58" height="70"><br>
<tt>17D2<br>
1797</tt></td>
<td style="text-align: center">
<img alt="17D2 1798" src="images/17D21798.gif" width="58" height="70"><br>
<tt>17D2<br>
1798</tt></td>
<td style="text-align: center">
<img alt="17D2 1799" src="images/17D21799.gif" width="58" height="70"><br>
<tt>17D2<br>
1799</tt></td>
<td style="text-align: center">
<img alt="17D2 179A" src="images/17D2179A.gif" width="58" height="70"><br>
<tt>17D2<br>
179A</tt></td>
<td style="text-align: center">
<img alt="17D2 179B" src="images/17D2179B.gif" width="58" height="70"><br>
<tt>17D2<br>
179B</tt></td>
<td style="text-align: center">
<img alt="17D2 179C" src="images/17D2179C.gif" width="58" height="70"><br>
<tt>17D2<br>
179C</tt></td>
<td style="text-align: center">
<img alt="17D2 179D" src="images/17D2179D.gif" width="58" height="70"><br>
<tt>17D2<br>
179D</tt></td>
</tr>
<tr>
<td style="text-align: center">
<img alt="17D2 179E" src="images/17D2179E.gif" width="58" height="70"><br>
<tt>17D2<br>
179E</tt></td>
<td style="text-align: center">
<img alt="17D2 179F" src="images/17D2179F.gif" width="58" height="70"><br>
<tt>17D2<br>
179F</tt></td>
<td style="text-align: center">
<img alt="17D2 17A0" src="images/17D217A0.gif" width="58" height="70"><br>
<tt>17D2<br>
17A0</tt></td>
<td style="text-align: center">
<img alt="17D2 17A2" src="images/17D217A2.gif" width="58" height="70"><br>
<tt>17D2<br>
17A2</tt></td>
<td style="text-align: center">
<img alt="17D2 17A7" src="images/17D217A7.gif" width="58" height="70"><br>
<tt>17D2<br>
17A7</tt></td>
<td style="text-align: center">
<img alt="17D2 17AB" src="images/17D217AB.gif" width="58" height="70"><br>
<tt>17D2<br>
17AB</tt></td>
<td style="text-align: center">
<img alt="17D2 17AF" src="images/17D217AF.gif" width="58" height="70"><br>
<tt>17D2<br>
17AF</tt></td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
</tr>
<tr>
<th style="text-align:center" colspan="10">Various Signs</th>
</tr>
<tr>
<td style="text-align: center"><img alt="17C8" src="images/U17C8.gif" width="52" height="62"><br>
<tt>17C8</tt></td>
<td style="text-align: center"><img alt="17CB" src="images/U17CB.gif" width="52" height="62"><br>
<tt>17CB</tt></td>
<td style="text-align: center"><img alt="17CC" src="images/U17CC.gif" width="52" height="62"><br>
<tt>17CC</tt></td>
<td style="text-align: center"><img alt="17CD" src="images/U17CD.gif" width="52" height="62"><br>
<tt>17CD</tt></td>
<td style="text-align: center"><img alt="17CE" src="images/U17CE.gif" width="52" height="62"><br>
<tt>17CE</tt></td>
<td style="text-align: center"><img alt="17CF" src="images/U17CF.gif" width="52" height="62"><br>
<tt>17CF</tt></td>
<td style="text-align: center"><img alt="17D0" src="images/U17D0.gif" width="52" height="62"><br>
<tt>17D0</tt></td>
<td style="text-align: center"><img alt="17D1" src="images/U17D1.gif" width="52" height="62"><br>
<tt>17D1</tt></td>
<td style="text-align: center"><img alt="17D4" src="images/U17D4.gif" width="52" height="62"><br>
<tt>17D4</tt></td>
<td style="text-align: center"><img alt="17D5" src="images/U17D5.gif" width="52" height="62"><br>
<tt>17D5</tt></td>
</tr>
<tr>
<td style="text-align: center"><img alt="17D6" src="images/U17D6.gif" width="52" height="62"><br>
<tt>17D6</tt></td>
<td style="text-align: center"><img alt="17D7" src="images/U17D7.gif" width="52" height="62"><br>
<tt>17D7</tt></td>
<td style="text-align: center"><img alt="17D9" src="images/U17D9.gif" width="52" height="62"><br>
<tt>17D9</tt></td>
<td style="text-align: center"><img alt="17DA" src="images/U17DA.gif" width="52" height="62"><br>
<tt>17DA</tt></td>
<td style="text-align: center"><img alt="17DC" src="images/U17DC.gif" width="52" height="62"><br>
<tt>17DC</tt></td>
<td style="text-align: center"><img alt="17DB" src="images/U17DB.gif" width="52" height="62"><br>
<tt>17DB</tt></td>
<td style="text-align: center"><img alt="17C9" src="images/U17C9.gif" width="52" height="62"><br>
<tt>17C9</tt></td>
<td style="text-align: center"><img alt="17CA" src="images/U17CA.gif" width="52" height="62"><br>
<tt>17CA</tt></td>
<td style="text-align: center"> </td>
<td style="text-align: center"> </td>
</tr>
<tr>
<th style="text-align:center" colspan="10">Digits</th>
</tr>
<tr>
<td style="text-align: center"><img alt="17E0" src="images/U17E0.gif" width="52" height="62"><br>
<tt>17E0</tt></td>
<td style="text-align: center"><img alt="17E1" src="images/U17E1.gif" width="52" height="62"><br>
<tt>17E1</tt></td>
<td style="text-align: center"><img alt="17E2" src="images/U17E2.gif" width="52" height="62"><br>
<tt>17E2</tt></td>
<td style="text-align: center"><img alt="17E3" src="images/U17E3.gif" width="52" height="62"><br>
<tt>17E3</tt></td>
<td style="text-align: center"><img alt="17E4" src="images/U17E4.gif" width="52" height="62"><br>
<tt>17E4</tt></td>
<td style="text-align: center"><img alt="17E5" src="images/U17E5.gif" width="52" height="62"><br>
<tt>17E5</tt></td>
<td style="text-align: center"><img alt="17E6" src="images/U17E6.gif" width="52" height="62"><br>
<tt>17E6</tt></td>
<td style="text-align: center"><img alt="17E7" src="images/U17E7.gif" width="52" height="62"><br>
<tt>17E7</tt></td>
<td style="text-align: center"><img alt="17E8" src="images/U17E8.gif" width="52" height="62"><br>
<tt>17E8</tt></td>
<td style="text-align: center"><img alt="17E9" src="images/U17E9.gif" width="52" height="62"><br>
<tt>17E9</tt></td>
</tr>
</table>
</center>
</div>
<p> </p>
<center>
<b>Additional Khmer Character Names</b>
<table>
<tr><td align="center" style="vertical-align: middle"><b>Glyph</b></td>
<td align="center" style="vertical-align: middle"><b>Code</b></td>
<td align="center" style="vertical-align: middle"><b>Name</b></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17BB17C6.gif" alt="17BB,17C6" width="52" height="62"></td>
<td style="vertical-align: middle">17BB 17C6</td>
<td style="vertical-align: middle"><i>khmer vowel sign srak om</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17B617C6.gif" alt="17B6,17C6" width="52" height="62"></td>
<td style="vertical-align: middle">17B6 17C6</td>
<td style="vertical-align: middle"><i>khmer vowel sign srak am</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21780.gif" alt="17D2,1780" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1780</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ka</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21781.gif" alt="17D2,1781" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1781</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng kha</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21782.gif" alt="17D2,1782" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1782</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ko</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21783.gif" alt="17D2,1783" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1783</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng kho</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21784.gif" alt="17D2,1784" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1784</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ngo</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21785.gif" alt="17D2,1785" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1785</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ca</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21786.gif" alt="17D2,1786" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1786</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng cha</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21787.gif" alt="17D2,1787" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1787</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng co</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21788.gif" alt="17D2,1788" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1788</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng cho</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21789.gif" alt="17D2,1789" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1789</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng nyo</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2178A.gif" alt="17D2,178A" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 178A</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng da</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2178B.gif" alt="17D2,178B" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 178B</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ttha</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2178C.gif" alt="17D2,178C" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 178C</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng do</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2178D.gif" alt="17D2,178D" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 178D</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ttho</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2178E.gif" alt="17D2,178E" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 178E</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng na</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2178F.gif" alt="17D2,178F" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 178F</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ta</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21790.gif" alt="17D2,1790" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1790</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng tha</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21791.gif" alt="17D2,1791" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1791</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng to</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21792.gif" alt="17D2,1792" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1792</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng tho</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21793.gif" alt="17D2,1793" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1793</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng no</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21794.gif" alt="17D2,1794" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1794</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ba</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21795.gif" alt="17D2,1795" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1795</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng pha</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21796.gif" alt="17D2,1796" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1796</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng po</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21797.gif" alt="17D2,1797" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1797</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng pho</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21798.gif" alt="17D2,1798" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1798</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng mo</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D21799.gif" alt="17D2,1799" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 1799</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng yo</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2179A.gif" alt="17D2,179A" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 179A</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ro</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2179B.gif" alt="17D2,179B" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 179B</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng lo</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2179C.gif" alt="17D2,179C" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 179C</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng vo</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2179D.gif" alt="17D2,179D" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 179D</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng sha</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2179E.gif" alt="17D2,179E" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 179E</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ssa</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D2179F.gif" alt="17D2,179F" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 179F</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng sa</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D217A0.gif" alt="17D2,17A0" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 17A0</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng ha</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D217A2.gif" alt="17D2,17A2" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 17A2</td>
<td style="vertical-align: middle"><i>khmer consonant sign coeng qa</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D217A7.gif" alt="17D2,17A7" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 17A7</td>
<td style="vertical-align: middle"><i>khmer vowel sign coeng qu</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D217AB.gif" alt="17D2,17AB" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 17AB</td>
<td style="vertical-align: middle"><i>khmer vowel sign coeng ry</i></td></tr>
<tr><td style="vertical-align: middle">
<img src="images/17D217AF.gif" alt="17D2,17AF" width="58" height="70"></td>
<td style="vertical-align: middle">17D2 17AF</td>
<td style="vertical-align: middle"><i>khmer vowel sign coeng qe</i></td></tr>
</table>
</center>
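<p>Each entry in the table above names a sequence of two code points rather than
a single encoded character. As a minimal sketch of how such sequences might be
assembled, the following Python fragment uses the standard <tt>unicodedata</tt>
module; the helper name <tt>coeng_sequence</tt> and the variable names are
assumptions made for this illustration only and are not part of the standard.</p>

```python
import unicodedata

COENG = "\u17D2"    # KHMER SIGN COENG
NIKAHIT = "\u17C6"  # KHMER SIGN NIKAHIT


def coeng_sequence(base):
    """Return the two-code-point sequence for the subscript form of `base`.

    Hypothetical helper for illustration: it simply prefixes U+17D2.
    """
    return COENG + base


def code_points(text):
    """List the code points of `text` as four-digit hexadecimal strings."""
    return [f"{ord(ch):04X}" for ch in text]


# "khmer consonant sign coeng ka" from the table: 17D2 1780
coeng_ka = coeng_sequence("\u1780")

# "khmer vowel sign srak om" from the table: 17BB 17C6
srak_om = "\u17BB" + NIKAHIT

print(code_points(coeng_ka), [unicodedata.name(ch) for ch in coeng_ka])
print(code_points(srak_om), [unicodedata.name(ch) for ch in srak_om])
```

<p>The names printed are the formal names of the individual code points; the
names in the table apply to the sequences as a whole.</p>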
<p> </p>
<h3><a name="9_16_philippine_scripts">9.16 Philippine Scripts</a> (new section) </h3>
<h3>Tagalog: U+1700..U+171F<br>
Hanunóo: U+1720..U+173F<br>
Buhid: U+1740..U+175F<br>
Tagbanwa: U+1760..U+177F</h3>
<p>The first of these four scripts, Tagalog, is no longer used, whereas the
other three, Hanunóo, Buhid, and Tagbanwa, are living scripts of the
Philippines. South Indian scripts of the Pallava dynasty made their way to the
Philippines, although the exact route is uncertain. They may have been
transported by way of the Kavi scripts of Western Java between the 10th and 14th
centuries CE. </p>
<p>There are written accounts of the Tagalog script by Spanish missionaries, and
documents in Tagalog dating from the mid-1500s. The first book in this script
was printed in Manila in 1593. While the Tagalog script was used to write
Tagalog, Bisaya, Ilocano, and other languages, it fell out of normal use by the
mid-1700s; the modern Tagalog language is written in the Latin script.</p>
<p>The three living scripts, Hanunóo, Buhid, and Tagbanwa, are related to
Tagalog, but may not be directly descended from it. The Hanunóo and the Buhid
peoples live in Mindoro, while the Tagbanwa live in Palawan. Hanunóo enjoys the
most use; it is widely used to write love poetry, a popular pastime among the
Hanunóo. Tagbanwa is less used.</p>
<h3>Principles of the Scripts</h3>
<p>The Philippine scripts share features with the other Brahmi-derived scripts
to which they are related.</p>
<p><i><b>Consonant Letters.</b></i> Philippine scripts have consonants
containing an inherent <i>-a</i> vowel, which may be modified by the addition of
vowel signs or canceled (killed) by the use of a virama-type mark.</p>
<p><b><i>Independent Vowel Letters.</i></b> Philippine scripts have null
consonants which are used to write syllables that start with a vowel.</p>
<p><i><b>Dependent Vowel Signs.</b></i> The vowel <i>-i</i> is written with a
mark above the associated consonant, and the vowel <i>-u</i> with an identical
mark below. The mark is known in Tagalog as <i>kudlit</i> “diacritic,” <i>tuldik</i>
“accent,” or <i>tildok</i> “dot,” and <i>ulitan</i> “diacritic” in Tagbanwa. The
Philippine scripts employ only the two vowel signs <i>i</i> and <i>u</i>, which
are also used to stand for the vowels <i>e</i> and <i>o</i> respectively.</p>
<p><i><b>Virama.</b></i> Though all languages normally written with the
Philippine scripts have syllables ending in consonants, not all of the scripts
have a mechanism for expressing the canceled <i>-a</i>. As a result, in those
orthographies, the final consonants are unexpressed. Francisco Lopez introduced
a cross-shaped <i>virama</i> in his 1620 catechism in the Ilocano language, but
this innovation does not seem to have found favor with native users, who apparently
considered the script adequate without it (they preferred
<img src="kakapi-1.jpg" alt="image for kakapi" width="52" height="14"> <i>kakapi</i> to
<img src="kakampi-2.jpg" alt="image for kakampi" width="68" height="14"> <i>kakampi</i>). A similar
reform for the Hanunóo script seems to have been better received. The Hanunóo <i>
pamudpod</i> was devised by Antoon Postma, who went to the Philippines from the
Netherlands in the mid-1950s. In traditional orthography,
<img src="si-apu-1.jpg" alt="image for si apu ba upada" width="116" height="17"> <i>si apu ba upada</i>
is, with the <i>pamudpod</i>, rendered more accurately as
<img src="si-aypud-2.jpg" alt="image for si aypud bay upadan" width="205" height="20"> <i>si aypud bay
upadan</i>; the Hanunóo pronunciation is <i>si aypod bay upadan</i>. The Tagalog
<i>virama</i> and Hanunóo <i>pamudpod</i> cancel only the inherent <i>-a</i>. No
conjunct consonants are employed in the Philippine scripts.</p>
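<p>The difference between the two spellings of <i>kakampi</i> can be sketched in
terms of code point sequences. The following Python fragment is only an
illustration: the variable names are assumptions of this example, and the
sequences are one plausible encoding of the two spellings discussed above, using
characters from the Tagalog block.</p>

```python
import unicodedata

KA = "\u1703"      # TAGALOG LETTER KA
PA = "\u1709"      # TAGALOG LETTER PA
MA = "\u170B"      # TAGALOG LETTER MA
SIGN_I = "\u1712"  # TAGALOG VOWEL SIGN I
VIRAMA = "\u1714"  # TAGALOG SIGN VIRAMA

# Traditional orthography: the syllable-final consonant is left unexpressed,
# so the word is written ka-ka-pi.
kakapi = KA + KA + PA + SIGN_I

# Reformed orthography: MA followed by the virama writes a bare "m",
# because the virama cancels the inherent -a of the consonant.
kakampi = KA + KA + MA + VIRAMA + PA + SIGN_I

for label, word in (("kakapi", kakapi), ("kakampi", kakampi)):
    print(label, [f"{ord(ch):04X}" for ch in word])

# The virama is a combining mark with canonical combining class 9.
print(unicodedata.name(VIRAMA), unicodedata.combining(VIRAMA))
```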
<p><i><b>Directionality.</b></i> The Philippine scripts are read from left to
right in horizontal lines running from top to bottom. They may be written or
carved either in that manner, or in vertical lines running from bottom to top,
moving from left to right. In the latter case, the letters are written sideways
so they may be read horizontally. This method of writing is probably due to the
medium and writing implements used. Text is often scratched with a sharp
instrument onto beaten strips of bamboo which are held pointing away from the
body and worked from the proximal to distal ends, in columns from left to right.</p>
<p><i><b>Rendering.</b></i> In Tagalog and Tagbanwa, the vowel signs simply rest
over or under the consonants. In Hanunóo and Buhid, however, special ligatures
are often formed as shown in the following tables.</p>
<center>
<table style="page-break-before:always; border-collapse:collapse" class="noborder" cellpadding="0" cellspacing="0"> <tr>
<td class="noborder">
<p align="center"><b>Hanunóo</b></td>
<td class="noborder">
<p align="center"><b>Buhid</b></td>
</tr>
<tr>
<td class="noborder">
<img border="0" src="phil1.jpg" alt="Table for Hanunoo" width="240" height="448"></td>
<td class="noborder"><img border="0" src="phil2.jpg" alt="Table for Buhid" width="240" height="448"></td>
</tr>
</table>
</center>
<p><i><b>Punctuation.</b></i> Punctuation has been unified for the Philippine
scripts. In the Hanunóo block, U+1735 PHILIPPINE SINGLE PUNCTUATION and U+1736
PHILIPPINE DOUBLE PUNCTUATION are encoded. Tagalog makes use only of the latter;
Hanunóo, Buhid, and Tagbanwa make use of both of them. </p>
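<p>Because the punctuation is unified, a process that classifies text by block
will report the Hanunóo block for these two characters even when they occur in
Tagalog, Buhid, or Tagbanwa text. A minimal sketch in Python, using the block
ranges given at the start of this section; the names <tt>BLOCKS</tt> and
<tt>philippine_block</tt> are hypothetical and chosen for this example only.</p>

```python
# Block ranges as listed at the head of section 9.16.
BLOCKS = {
    "Tagalog": (0x1700, 0x171F),
    "Hanunoo": (0x1720, 0x173F),
    "Buhid": (0x1740, 0x175F),
    "Tagbanwa": (0x1760, 0x177F),
}

SINGLE_PUNCTUATION = 0x1735  # PHILIPPINE SINGLE PUNCTUATION
DOUBLE_PUNCTUATION = 0x1736  # PHILIPPINE DOUBLE PUNCTUATION


def philippine_block(cp):
    """Return the name of the Philippine block containing `cp`, or None."""
    for name, (first, last) in BLOCKS.items():
        if cp in range(first, last + 1):
            return name
    return None


print(philippine_block(SINGLE_PUNCTUATION))  # Hanunoo
print(philippine_block(DOUBLE_PUNCTUATION))  # Hanunoo
print(philippine_block(0x1760))              # Tagbanwa
```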
<h3><a name="10_1_han">10.1 Han</a> (addition)</h3>
<h3>CJK Compatibility Ideographs (addition) </h3>
<p>Unicode 3.2 adds 59 new ideographs to the Compatibility Ideographs block.
These new compatibility ideographs are found from U+FA30 to U+FA6A. They are
included in the Unicode Standard to provide full round-trip compatibility with
the ideographic repertoire of JIS X 0213:2000 and should not be used for any
other purpose.</p>
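<p>The count of 59 follows directly from the range endpoints. As a quick check,
the following Python sketch computes the size of the range and confirms that
every code point in it is an assigned letter in the character database shipped
with the interpreter; the variable names are assumptions of this example.</p>

```python
import unicodedata

FIRST, LAST = 0xFA30, 0xFA6A

# Inclusive range size: 0xFA6A - 0xFA30 + 1
count = LAST - FIRST + 1

# Every code point in the range should have General Category Lo (Letter, other).
all_letters = all(
    unicodedata.category(chr(cp)) == "Lo" for cp in range(FIRST, LAST + 1)
)

print(count, all_letters)
```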
<h3><a name="10_3_katakana">10.3 Katakana</a> (addition)</h3>
<h3>Katakana Phonetic Extensions (addition) </h3>
<p>Katakana Phonetic Extensions: U+31F0..U+31FF</p>
<p>These extensions to the Katakana syllabary are all “small” variants. They are
used in Japan for phonetic transcription of Ainu and other languages.</p>
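<p>That every character in the block is a “small” variant is reflected in the
character names. A minimal Python sketch that lists the names for the block
range given above; the variable names are assumptions of this example.</p>

```python
import unicodedata

FIRST, LAST = 0x31F0, 0x31FF  # Katakana Phonetic Extensions

names = [unicodedata.name(chr(cp)) for cp in range(FIRST, LAST + 1)]

# Each name is expected to have the form KATAKANA LETTER SMALL <syllable>.
all_small = all(name.startswith("KATAKANA LETTER SMALL ") for name in names)

for cp, name in zip(range(FIRST, LAST + 1), names):
    print(f"U+{cp:04X} {name}")
print(len(names), all_small)
```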
<h3><a name="10_4_hangul">10.4 Hangul</a> (addition) </h3>
<h3>Hangul Compatibility Jamo</h3>
<p>When Hangul compatibility jamo are transformed with a compatibility
normalization form, NFKD or NFKC, the characters are converted to the
corresponding conjoining jamo characters. Where the characters are intended to
remain in separate syllables after such transformation, they may require
separation from adjacent characters. This can be done by inserting any
non-Korean character.</p>
<ul>
<li>U+200B ZERO WIDTH SPACE is recommended where a line break is to be allowed
between the characters.</li>
<li>U+2060 WORD JOINER can be used where the characters are not to break
across lines.</li>
</ul>
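<p>The effect of the separator can be sketched with the normalization function
in Python's standard <tt>unicodedata</tt> module. This is an illustration only;
the helper <tt>code_points</tt> and the variable names are assumptions of this
example.</p>

```python
import unicodedata

KIYEOK = "\u3131"  # HANGUL LETTER KIYEOK (compatibility jamo)
A = "\u314F"       # HANGUL LETTER A (compatibility jamo)
ZWSP = "\u200B"    # ZERO WIDTH SPACE


def code_points(text):
    """List the code points of `text` as four-digit hexadecimal strings."""
    return [f"{ord(ch):04X}" for ch in text]


adjacent = KIYEOK + A
separated = KIYEOK + ZWSP + A

# Without a separator, the jamo map to conjoining jamo (NFKD),
# which then compose into a single syllable (NFKC).
nfkd_adjacent = code_points(unicodedata.normalize("NFKD", adjacent))
nfkc_adjacent = code_points(unicodedata.normalize("NFKC", adjacent))

# With U+200B in between, the conjoining jamo are not adjacent,
# so NFKC does not compose them into a syllable.
nfkd_separated = code_points(unicodedata.normalize("NFKD", separated))
nfkc_separated = code_points(unicodedata.normalize("NFKC", separated))

print(nfkd_adjacent, nfkc_adjacent)
print(nfkd_separated, nfkc_separated)
```

<p>Substituting U+2060 WORD JOINER for U+200B in the sketch keeps the jamo
separate in the same way while prohibiting a line break between them.</p>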
<p>For example, the table below illustrates how two Hangul compatibility jamo
can be separated in display, even after transforming with NFKD or NFKC.</p>
<center>
<table border="1" cellspacing="0" cellpadding="4">
<caption><b>Separating Jamo Characters</b></caption>
<tr>
<th width="25%" style="text-align: center">Original</th>
<th width="25%" style="text-align: center"> NFKD</th>
<th width="25%" style="text-align: center"> NFKC</th>
<th width="25%" style="text-align: center">Display</th>
</tr>
<tr>
<td class="n">
<table>
<tr>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-3131" alt="U+3131"><br>
<tt class="n">3131</tt></td>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-314F" alt="U+314F"><br>
<tt class="n">314F</tt></td>
</tr>
</table>
</td>
<td class="n">
<table>
<tr>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-1100" alt="U+1100"><br>
<tt class="n">1100</tt></td>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-1161" alt="U+1161"><br>
<tt class="n">1161</tt></td>
</tr>
</table>
</td>
<td class="n">
<table>
<tr>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-AC00" alt="U+AC00"><br>
<tt class="n">AC00</tt></td>
</tr>
</table>
</td>
<td class="n"><img src="http://www.unicode.org/cgi-bin/refglyph?24-AC00"
alt="Glyph for U+AC00"></td>
</tr>
<tr>
<td class="n">
<table>
<tr>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-3131" alt="U+3131"><br>
<tt class="n">3131</tt></td>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-200B" alt="U+200B"><br>
<tt class="n">200B</tt></td>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-314F" alt="U+314F"><br>
<tt class="n">314F</tt></td>
</tr>
</table>
</td>
<td class="n">
<table>
<tr>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-1100" alt="U+1100"><br>
<tt class="n">1100</tt></td>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-200B" alt="U+200B"><br>
<tt class="n">200B</tt></td>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-1161" alt="U+1161"><br>
<tt class="n">1161</tt></td>
</tr>
</table>
</td>
<td class="n">
<table>
<tr>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-1100" alt="U+1100"><br>
<tt class="n">1100</tt></td>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-200B" alt="U+200B"><br>
<tt class="n">200B</tt></td>
<td class="q">
<img src="http://www.unicode.org/cgi-bin/refglyph?24-1161" alt="U+1161"><br>
<tt class="n">1161</tt></td>
</tr>
</table>
</td>
<td class="n"><img src="http://www.unicode.org/cgi-bin/refglyph?24-3131"
alt="Glyph for U+3131"><img
src="http://www.unicode.org/cgi-bin/refglyph?24-314F"
alt="Glyph for U+314F"></td>
</tr>
</table>
</center>
<p><br>
</p>
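<p>The behavior shown in the table can be reproduced with a short Python sketch.
It assumes the standard <tt>unicodedata</tt> module; the helper name
<tt>normalized_codepoints</tt> is hypothetical and used only for this
illustration.</p>

```python
import unicodedata


def normalized_codepoints(text, form):
    """Return the code points of text after normalization, as hex strings."""
    return [f"{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]


# Two compatibility jamo: first adjacent, then separated by U+200B.
adjacent = "\u3131\u314F"
separated = "\u3131\u200B\u314F"

for form in ("NFKD", "NFKC"):
    print(form,
          normalized_codepoints(adjacent, form),
          normalized_codepoints(separated, form))
```

<p>With the separator in place, NFKC leaves the two conjoining jamo uncomposed,
so they do not merge into the syllable U+AC00.</p>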
<h3><a name="11_4_mongolian">11.4 Mongolian</a> (addition)</h3>
<h3>Standardized Variants of Mongolian Characters (addition) </h3>
<p>Like Arabic letters, Mongolian letters have various presentation forms
depending on their positions in words. There are additional linguistic
constraints that result in variations that must be employed in specific
contexts, creating the need for several Mongolian-specific variant selectors,
which are encoded at U+180B, U+180C, and U+180D.</p>
<p>The table of standardized variants in the Unicode Character Database found at
<a href="http://www.unicode.org/Public/3.2-Update/StandardizedVariants-3.2.0.html">
http://www.unicode.org/Public/3.2-Update/StandardizedVariants-3.2.0.html</a>
provides a description of the variant appearances corresponding to the use of
appropriate variation selectors with all allowed base Mongolian characters. Only
some presentation forms of the base Mongolian characters used with the Mongolian
free variation selectors produce variant appearances. These combinations are
exhaustively listed and described in the table. All combinations not listed in
the table are unspecified and are reserved for future standardization; no
conformant process may interpret them as standardized variants.</p>
<p>For more information, see <i><a href="#13_7_variation_selectors">Section
13.7, Variation Selectors</a></i>, later in this document.</p>
<h3><a name="12_4_mathematical_operators">12.4 Mathematical Operators</a>
(additions)</h3>
<p>In addition to the symbols in these blocks, mathematical and scientific
notation makes frequent use of arrows, punctuation characters, letterlike
symbols, geometrical shapes and other miscellaneous and technical symbols. For
additional information on all the mathematical operators and other symbols, see
<a href="http://www.unicode.org/reports/tr25/">Proposed Draft Unicode Technical Report #25, “Unicode Support
for Mathematics.”</a></p>
<p>Other symbols used in mathematical and scientific notation can be found in
the Geometric Shapes block. For an extensive discussion of mathematical
alphanumeric symbols, see <i>Section 12.2, Letterlike Symbols</i> in <i>The
Unicode Standard, Version 3.0</i>.</p>
<h3>Supplements to Mathematical Operators and Arrows</h3>
<p>The Unicode Standard defines a number of additional blocks to supplement the
repertoire of mathematical operators and arrows. These additions are intended to
extend the Unicode repertoire sufficiently to cover the needs of such
applications as MathML, modern mathematical formula editing and presentation
software, and symbolic algebra systems.</p>
<p><i><b>Standards.</b></i> MathML, an XML application, is intended to support
the full legacy collection of the ISO mathematical entity sets. Accordingly, the
repertoire of mathematical symbols for the Unicode Standard has been
supplemented by the full list of mathematical entity sets in ISO TR 9573-13, <i>
Public entity sets for mathematics and science</i>. Additional repertoire was
provided from the amalgamated collection of the STIX Project (Scientific and
Technical Information Exchange). That collection includes, but is not limited
to, symbols gleaned from mathematical publications by experts of the American
Mathematical Society and symbol sets provided by Elsevier Publishing and by the
American Physical Society.</p>
<p><i><b>Semantics.</b></i> The same mathematical symbol may have different
meanings in different subdisciplines or different contexts. The Unicode Standard
only encodes a single character for a single symbolic form. For example, the “+”
symbol normally denotes addition in a mathematical context, but might refer to
concatenation in a computer science context dealing with strings, or
incrementation, or have any number of other functions in given contexts. It is
up to the application to distinguish such meanings according to the appropriate
context. Where information is available about the usage (or usages) of
particular symbols, it has been indicated in the character annotations in
<i>Chapter 14, Code Charts</i> in <i>The Unicode Standard, Version 3.0</i>.</p>
<h3>Supplemental Mathematical Operators: U+2A00–U+2AFF</h3>
<p>This block contains many additional symbols to supplement the collection of
mathematical operators.</p>
<h3>Miscellaneous Mathematical Symbols-A: U+27C0–U+27EF</h3>
<p>This block contains symbols used mostly as operators or delimiters in
mathematical notation.</p>
<p><i><b>Mathematical Brackets.</b></i> The mathematical white square brackets,
angle brackets, and double angle brackets encoded at U+27E6..U+27EB are intended
for ordinary mathematical use of these particular bracket types. They are
unambiguously narrow, for use in mathematical and scientific notation, and
should be distinguished from the corresponding wide forms of white square
brackets, angle brackets, and double angle brackets used in CJK typography. (See
the CJK Symbols and Punctuation block.) Note especially that the “bra” and “ket”
angle brackets, U+2329 LEFT-POINTING ANGLE BRACKET and U+232A RIGHT-POINTING
ANGLE BRACKET, are now deprecated for use with mathematics because of their
canonical equivalence to CJK angle brackets, which is likely to result in
unintended spacing problems if used in mathematical formulae.</p>
<h3>Miscellaneous Mathematical Symbols-B: U+2980–U+29FF</h3>
<p>This block contains miscellaneous symbols used for mathematical notation,
including fences and other delimiters. Some of the symbols in this block may
also be used as operators in some contexts.</p>
<p><b><i>Wiggly Fence</i></b>. U+29DB LEFT WIGGLY FENCE has a superficial
similarity to U+FE34 PRESENTATION FORM FOR VERTICAL LOW LINE. The latter is a
wiggly sidebar character, intended for legacy support as a style of underlining
character in a vertical text layout context; it has a compatibility mapping to
U+005F LOW LINE. This represents a very different usage from the standard use of
fence characters in mathematical notation.</p>
<h3>Supplemental Arrows-A: U+27F0–U+27FF</h3>
<p>This block contains a small additional set of arrows to supplement the main
set in the Arrows block.</p>
<p><i><b>Long Arrows.</b></i> The long arrows encoded in the range
U+27F5..U+27FF map to standard SGML entity sets supported by MathML. Long arrows
represent distinct semantics from their short counterparts, rather than mere
stylistic glyph differences. For example, the shorter forms of arrows are often
used in connection with limits, whereas the longer ones are associated with
mappings. The use of the long arrows is so common that they were assigned entity
names in the ISOAMSA entity set, one of the suite of mathematical symbol entity
sets covered by the Unicode Standard.</p>
<h3>Supplemental Arrows-B: U+2900–U+297F</h3>
<p>This block contains a large additional repertoire of arrows to round out the
main set in the Arrows block.</p>
<h3><a name="12_5_technical_symbols">12.5 Technical Symbols</a> (additions)</h3>
<h3>Miscellaneous Technical: U+2300-U+23FF (additions)</h3>
<p><b><i>Keytop Labels.</i></b> [to precede “Crops and Quine Corners”] Where
possible, keytop labels have been unified with other symbols of like appearance,
for example U+21E7 UPWARDS WHITE ARROW to indicate the shift key. While symbols
such as U+2318 PLACE OF INTEREST SIGN and U+2388 HELM SYMBOL are generic symbols
that have been adapted to use on keytops, other symbols specifically follow ISO/IEC
9995-7.</p>
<p><b><i>Angle Brackets.</i></b> [to follow “Crops and Quine Corners”] U+2329
LEFT-POINTING ANGLE BRACKET and U+232A RIGHT-POINTING ANGLE BRACKET have long
been canonically equivalent to the CJK punctuation characters, U+3008 LEFT ANGLE
BRACKET and U+3009 RIGHT ANGLE BRACKET, respectively. This canonical equivalence
implies that the use of the latter (CJK) code points is preferred, and that
U+2329 and U+232A are also “wide” characters. (See <a href="http://www.unicode.org/reports/tr11/"><i>Unicode
Standard Annex #11, “East Asian Width”</i></a>, for the
definition of the East Asian wide property.) Because of this fact, the use of
U+2329 and U+232A is deprecated for mathematics and technical publication, where
the wide property of the characters has the potential for interfering with
proper formatting of mathematical formulae. Instead, use the angle brackets
specifically provided for mathematics: U+27E8 MATHEMATICAL LEFT ANGLE BRACKET
and U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET. See <i>
<a href="#12_4_mathematical_operators">Section 12.4, Mathematical Operators</a>
</i>earlier in this document<i>.</i></p>
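<p>A minimal Python sketch of this recommendation follows. It assumes the
standard <tt>unicodedata</tt> module; the mapping <tt>LEGACY_TO_MATH</tt> and the
function <tt>prefer_math_angle_brackets</tt> are hypothetical names, and whether
such a substitution is appropriate depends on knowing that the text is
mathematical.</p>

```python
import unicodedata

# Deprecated-for-mathematics brackets and their mathematical replacements.
LEGACY_TO_MATH = {"\u2329": "\u27E8", "\u232A": "\u27E9"}


def prefer_math_angle_brackets(text):
    """Replace U+2329/U+232A with U+27E8/U+27E9 in mathematical text."""
    return text.translate(str.maketrans(LEGACY_TO_MATH))


for legacy, replacement in LEGACY_TO_MATH.items():
    canonical = unicodedata.normalize("NFC", legacy)
    print(f"U+{ord(legacy):04X}",
          "normalizes to", f"U+{ord(canonical):04X}",
          "width", unicodedata.east_asian_width(legacy),
          "replacement width", unicodedata.east_asian_width(replacement))
```

<p>The substitution has to happen before any normalization step, because
normalization turns U+2329 and U+232A into the CJK brackets, after which they can
no longer be told apart from genuine CJK punctuation.</p>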
<p><i><b>Symbol Pieces.</b></i> [to follow “APL Functional Symbols”] The
characters in the range U+239B..U+23B3, plus U+23B7, comprise a set of bracket
and other symbol fragments for use in mathematical typesetting. These pieces
originated in older font standards, but have been used in past mathematical
processing as characters in their own right to make up extra-tall glyphs for
enclosing multi-line mathematical formulae. Mathematical fences are ordinarily
sized to the content that they enclose. However, in creating a large fence, the
glyph is not scaled proportionally; in particular the displayed stem weights
must remain compatible with the accompanying smaller characters. Thus, simple
scaling of font outlines cannot be used to create tall brackets. Instead, a
common technique is to build up the symbol from pieces. In particular, the
characters U+239B LEFT PARENTHESIS UPPER HOOK through U+23B3 SUMMATION BOTTOM
represent a set of glyph pieces for building up large versions of the fences (,
), [, ], {, and }, and of the large operators ∑ and ∫. These brace and operator
pieces are compatibility characters. They should not be used in stored
mathematical text, but are often used in the data stream created by display and
print drivers.</p>
<p>The following table shows which pieces are intended to be used together to
create specific symbols.</p>
<p align="center"><b>Use of Symbol Pieces</b></p>
<div align="center">
<table border="2" cellpadding="2" cellspacing="0">
<tr>
<td> </td>
<td>2-row</td>
<td>3-row</td>
<td>5-row</td>
</tr>
<tr>
<td>Summation </td>
<td>23B2, 23B3 </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>Integral</td>
<td>2320, 2321</td>
<td>2320, 23AE, 2321</td>
<td>2320, 3×23AE, 2321</td>
</tr>
<tr>
<td>Left Parenthesis</td>
<td>239B, 239D</td>
<td>239B, 239C, 239D</td>
<td>239B, 3×239C, 239D</td>
</tr>
<tr>
<td>Right Parenthesis</td>
<td>239E, 23A0</td>
<td>239E, 239F, 23A0</td>
<td>239E, 3×239F, 23A0</td>
</tr>
<tr>
<td>Left Bracket </td>
<td>23A1, 23A3</td>
<td>23A1, 23A2, 23A3</td>
<td>23A1, 3×23A2, 23A3</td>
</tr>
<tr>
<td>Right Bracket </td>
<td>23A4, 23A6</td>
<td>23A4, 23A5, 23A6</td>
<td>23A4, 3×23A5, 23A6</td>
</tr>
<tr>
<td>Left Brace</td>
<td>23B0, 23B1</td>
<td>23A7, 23A8, 23A9</td>
<td>23A7, 23AA, 23A8, 23AA, 23A9</td>
</tr>
<tr>
<td>Right Brace </td>
<td>23B1, 23B0</td>
<td>23AB, 23AC, 23AD</td>
<td>23AB, 23AA, 23AC, 23AA, 23AD</td>
</tr>
</table>
</div>
<p>For example, an instance of U+239B can be positioned relative to instances of
U+239C and U+239D to form an extra-tall (three or more line) left parenthesis.
The center sections encoded here are meant to be used only with the top and
bottom pieces encoded adjacent to them because the segments are usually
graphically constructed within the fonts so that they match perfectly when
positioned at the same <i>x</i> coordinates.</p>
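<p>The stacking rule for fences that consist of a top piece, a repeatable
extension, and a bottom piece can be sketched in Python as follows. The
dictionary <tt>PIECES</tt> and the function <tt>tall_symbol</tt> are hypothetical
names for this illustration. Braces are left out because they also need a middle
piece, and the sketch makes no attempt at positioning, which is the job of the
font and the display driver.</p>

```python
import unicodedata

# (top, extension, bottom) code points for each symbol.
PIECES = {
    "left parenthesis": (0x239B, 0x239C, 0x239D),
    "right parenthesis": (0x239E, 0x239F, 0x23A0),
    "left bracket": (0x23A1, 0x23A2, 0x23A3),
    "right bracket": (0x23A4, 0x23A5, 0x23A6),
    "integral": (0x2320, 0x23AE, 0x2321),
}


def tall_symbol(kind, rows):
    """Return the pieces for a symbol of the given height, top to bottom."""
    if rows < 2:
        raise ValueError("a built-up symbol needs at least two rows")
    top, extension, bottom = PIECES[kind]
    # One top, one bottom, and as many extensions as needed in between.
    return [chr(top)] + [chr(extension)] * (rows - 2) + [chr(bottom)]


for piece in tall_symbol("left parenthesis", 5):
    print(f"U+{ord(piece):04X}", unicodedata.name(piece))
```

<p>Each returned character occupies one row; a two-row request yields only the
top and bottom pieces.</p>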
<p><i><b>Vertical Square Brackets.</b></i> The vertical square brackets, U+23B4
TOP SQUARE BRACKET and U+23B5 BOTTOM SQUARE BRACKET, are compatibility
characters for legacy applications emulating certain terminals. They are
intended for those terminal applications only, for limited use in
vertically-oriented bracketed expressions. U+23B6 BOTTOM SQUARE BRACKET OVER TOP
SQUARE BRACKET is used when a single character cell is both the end of one such
expression and the start of another. These compatibility characters should not
be confused with the general need for rotated <i>glyphs</i> for parentheses,
brackets, braces, and quotation marks for vertically rendered CJK text. Such
rotations should be handled by fonts and rendering software, rather than by
separate encoding of each rotated glyph as a character. See further discussion
in <i>Section 6.1, General Punctuation</i> in <i>The Unicode Standard, Version
3.0.</i></p>
<p><i><b>Terminal Graphics Characters.</b></i> In addition to the box-drawing
characters in the Box Drawing block, a small number of additional vertical or
horizontal line characters are encoded in the Miscellaneous Technical symbols
block to complete the set of compatibility characters needed for applications
which need to emulate various old terminals. The horizontal scan line
characters, U+23BA HORIZONTAL SCAN LINE-1 through U+23BD HORIZONTAL SCAN LINE-9,
in particular, represent characters that were encoded in character ROM for use
with 9-line character graphic cells. Horizontal scan line characters are encoded
for scan lines 1, 3, 7, and 9. The horizontal scan line character for scan line
5 is unified with U+2500 BOX DRAWINGS LIGHT HORIZONTAL.</p>
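<p>For a terminal emulator, the assignment of scan lines to characters described
above amounts to a small lookup table. The following Python sketch assumes the
standard <tt>unicodedata</tt> module; the name <tt>SCAN_LINE_TO_CHAR</tt> is
hypothetical.</p>

```python
import unicodedata

# Scan line number within a 9-line cell, mapped to the encoded character.
# Scan line 5 reuses the existing box-drawing character.
SCAN_LINE_TO_CHAR = {
    1: "\u23BA",
    3: "\u23BB",
    5: "\u2500",
    7: "\u23BC",
    9: "\u23BD",
}

for line, ch in sorted(SCAN_LINE_TO_CHAR.items()):
    print(line, f"U+{ord(ch):04X}", unicodedata.name(ch))
```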
<p><i><b>Dental Symbols. </b></i>The symbols from U+23BE to U+23CC form a set
from JIS X 0213 for use in dental notation.</p>
<p><i><b>Standards. </b></i>This block contains a large number of symbols from
ISO/IEC 9995-7:1994, <i>Information technology—Keyboard layouts for text and
office systems—Part 7: Symbols used to represent functions</i>. </p>
<h3><a name="12_7_miscellaneous_symbols_and_dingbats">12.7 Miscellaneous Symbols
and Dingbats</a> (new subsection, revision and addition)</h3>
<h3>Recycling Symbols (new subsection in Miscellaneous Symbols: U+2600-U+26FF)</h3>
<p><i><b>Plastic Bottle Material Code System</b></i>. The seven numbered logos
encoded from U+2673 to U+2679
<img src="PBMCS.jpg"
alt="images for U+2673 to U+2679" width="211" height="31"> are from “The Plastic Bottle Material Code
System,” introduced in 1988 by the Society of the Plastics Industry (SPI) (see
<a href="http://www.socplas.org">http://www.socplas.org</a>). This set
consistently uses thin, two-dimensional curved arrows suitable for use in
plastics molding. In actual use, the symbols often are combined with an
abbreviation of the material class below the triangle. Such abbreviations are
not universal, therefore they are not present in the representative glyphs in <i>Chapter 14, Code Charts</i> in <i>The Unicode Standard, Version 3.0</i>.</p>
<p><i><b>Recycling Symbol for Generic Materials</b></i>. An unnumbered plastic
resin code symbol U+267A <img src="U-267A.jpg" width="33" height="30"
alt="U+267A"> RECYCLING SYMBOL FOR GENERIC MATERIALS is not formally part of the
SPI system, but is found in many fonts. Occasional use of this symbol as a
generic materials code symbol can be found in the field, usually with a text
legend below, but sometimes also surrounding (or overlaid by) other text or
symbols. Sometimes, the UNIVERSAL RECYCLING SYMBOL is substituted for the
generic symbol in this context.</p>
<p><i><b>Universal Recycling Symbol.</b></i> Unicode encodes two common glyph
variants of this symbol, U+2672 <img src="U-2672.jpg" width="38" height="31"
alt="U+2672"> UNIVERSAL RECYCLING SYMBOL and U+267B <img src="U-267B.jpg"
width="35" height="32" alt="U+267B"> BLACK UNIVERSAL RECYCLING SYMBOL. Both are
used to indicate that the material is recyclable. The white form is the
traditional version of the symbol, but the black form is sometimes substituted,
presumably because the thin outlines of the white form do not always reproduce
well.</p>
<p><b><i>Paper Recycling Symbols.</i></b> The two paper recycling symbols U+267C
<img src="U-267C.jpg" width="32" height="30" alt="U+267C"> RECYCLED PAPER SYMBOL
and U+267D <img src="U-267D.jpg" width="33" height="29" alt="U+267D">
PARTIALLY-RECYCLED PAPER SYMBOL can be used to distinguish fully and partially
recycled fiber content in paper products or packaging. They are usually
accompanied by additional text.</p>
<h3>Dingbats: U+2700-U+27BF (revision) </h3>
<p>The following text replaces the text on Dingbats on pages 305-306 of <i>The
Unicode Standard, Version 3.0</i>:</p>
<p>The Dingbats are derived from a well-established set of glyphs, the ITC Zapf
Dingbats series 100, which comprises the industry standard “Zapf Dingbat” font
currently available in most laser printers. Other series of dingbat glyphs also
exist, but are not encoded in the Unicode Standard because they are not widely
implemented in existing hardware and software as character-encoded fonts. The
order of the Dingbats block basically follows the PostScript encoding.</p>
<p><b><i>Unifications.</i></b> Where a dingbat from the ITC Zapf Dingbats series
100 could be unified with a generic symbol widely used in other contexts, only
the generic symbol was encoded. This accounts for the encoding gaps in the
Dingbats block. Examples of such unifications include card suits, BLACK STAR,
BLACK TELEPHONE, and BLACK RIGHT-POINTING INDEX (see “Miscellaneous Symbols”);
BLACK CIRCLE and BLACK SQUARE (see “Geometric Shapes”); white encircled numbers
1 to 10 (see “Enclosed Alphanumerics”); and several generic arrows (see
“Arrows”). Those four entries appear elsewhere in this section.</p>
<p>In other instances, other glyphs from the ITC Zapf Dingbats series 100 glyphs
have come to be recognized as having applicability as generic symbols, despite
having originally been encoded in the Dingbats block. For example, the series of
negative (black) circled numbers 1 to 10 are now treated as generic symbols for
this sequence, the continuation of which can be found in “Enclosed Alphanumerics”.
Other examples include U+2708 AIRPLANE and U+2709 ENVELOPE, which have definite
semantics independent of the specific glyph shape, and which therefore should be
considered generic symbols, rather than as symbols representing only the Zapf
Dingbat glyph shapes.</p>
<p>For many of the remaining characters in the Dingbat block, their semantic
value is primarily their shape; unlike characters that represent letters from a
script, there is no well-established range of typeface variations for a dingbat
that will retain its identity and therefore its semantics. It would be incorrect
to arbitrarily replace U+279D TRIANGLE-HEADED RIGHTWARDS ARROW with any other
right arrow dingbat or with any of the generic arrows from the Arrows block
(U+2190..U+21FF). But exact shape retention for the glyphs is not always
required in order to maintain the relevant distinctions. For example, ornamental
characters such as U+2741 EIGHT PETALLED OUTLINE BLACK FLORETTE have been
successfully implemented in font faces other than Zapf Dingbats with glyph
shapes which are similar, but not identical to the ITC Zapf Dingbats series 100.</p>
<p>The following guidelines are provided for font developers wishing to support
this block of characters. Characters showing large sets of contrastive glyph
shapes in the Dingbats block, and in particular the various arrow shapes at
U+2794..U+27BE, should have glyphs that are closely modeled on the ITC Zapf
Dingbats series 100, which are shown as representative glyphs in the code charts. The
same applies to the various stars, asterisks, and snowflakes, drop-shadowed
squares, checkmarks, and x’s, many of which are ornamental, and have an
elaborate name describing their glyph.</p>
<p>Where the above does not apply, or where dingbats have more generic
applicability as a symbol, their glyphs do not need to match the representative
glyphs in the code charts in every detail.</p>
<h3>Ornamental Brackets (addition to Dingbats: U+2700-U+27BF)</h3>
<p><b><i>Ornamental Brackets.</i></b> The 14 ornamental brackets encoded at
U+2768..U+2775 are a late addition to the set of Zapf Dingbats encoded in the
Unicode Standard. Although they have always been included in Zapf Dingbats
fonts, they were unencoded in PostScript versions of the fonts on some
platforms, and hence were omitted from the original set encoded in Unicode. They
have been added for compatibility and consistency in handling of the cmaps for
current versions of the fonts.</p>
<h3><a name="12_12_standardized_variants_of_mathematical_symbols">12.12
Standardized Variants of Mathematical Symbols</a> (new section)</h3>
<p>These mathematical variants are all produced with the addition of U+FE00
VARIATION SELECTOR-1 (VS1) to mathematical operator base characters. Only the
valid, recognized combinations are listed in the table of standardized variants.
All combinations not listed here are unspecified and are reserved for future
standardization; no conformant process may interpret them as standardized
variants.</p>
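<p>One way a process might honor this rule is to treat a variation sequence as
meaningful only when it appears in a table of listed combinations. The Python
sketch below is an assumption about how such a check could look. The names
<tt>SAMPLE_VARIANTS</tt> and <tt>is_standardized_variant</tt> are hypothetical,
and the sample table holds only the two VS1 sequences that this document
mentions for U+2278 and U+2279; a real implementation would load the full table
from the Unicode Character Database.</p>

```python
VS1 = 0xFE00

# Illustrative subset only, not the complete table of standardized variants.
SAMPLE_VARIANTS = {(0x2278, VS1), (0x2279, VS1)}


def is_standardized_variant(base, selector, table=SAMPLE_VARIANTS):
    """Return True only if the (base, selector) pair is listed in the table."""
    return (base, selector) in table


print(is_standardized_variant(0x2278, VS1))  # listed in the sample table
print(is_standardized_variant(0x002B, VS1))  # PLUS SIGN with VS1 is not listed
```

<p>A pair that is not listed is simply not interpreted as a variant; the base
character keeps its ordinary range of glyphs.</p>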
<h3>Change in Representative Glyphs for U+2278 and U+2279</h3>
<p>In Unicode 3.2 the representative glyphs for U+2278 NEITHER LESS-THAN NOR
GREATER-THAN and U+2279 NEITHER GREATER-THAN NOR LESS-THAN are changed from using a vertical cancellation to using a
slanted cancellation. This change was made to match the long standing canonical decompositions for these characters, which use
U+0338 COMBINING LONG SOLIDUS OVERLAY. Irrespective of this change to the
representative glyphs, the symmetric forms using the vertical stroke are
acceptable glyph variants. Using U+2278 or U+2279 with VS1 will request these
variants explicitly, as will using U+2276 LESS-THAN OR GREATER-THAN or U+2277
GREATER-THAN OR LESS-THAN with U+20D2 COMBINING LONG VERTICAL LINE
OVERLAY. Unless fonts are
created with the intention to add support for both forms (via VS1 for the
upright forms), there is no need to revise the glyphs in existing fonts; the
glyphic range implied by using the base character code alone encompasses both
shapes.</p>
<p>For more information, see <i><a href="#13_7_variation_selectors">Section
13.7, Variation Selectors</a></i>, later in this document.</p>
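<p>The canonical decompositions that motivated the glyph change can be inspected
with a few lines of Python. This sketch assumes the standard <tt>unicodedata</tt>
module; the function name <tt>canonical_parts</tt> is hypothetical.</p>

```python
import unicodedata


def canonical_parts(ch):
    """Return the names of the characters in the canonical decomposition."""
    return [unicodedata.name(part) for part in unicodedata.normalize("NFD", ch)]


for precomposed in ("\u2278", "\u2279"):
    print(f"U+{ord(precomposed):04X}", canonical_parts(precomposed))

# The vertical-stroke spelling uses a different overlay and is left alone
# by normalization.
vertical = "\u2276\u20D2"
print(unicodedata.normalize("NFC", vertical) == vertical)
```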
<h3><a name="13_2_layout_controls">13.2 Layout Controls</a> (additions)</h3>
<h3>Combining Grapheme Joiner (U+034F) (addition) </h3>
<p>The <i>combining grapheme joiner</i> is used to indicate that adjacent characters
belong to the same grapheme cluster. Grapheme clusters are sequences of one or
more encoded characters that correspond to what users think of as characters.
They include, but are not limited to, combining character sequences such as (g +
°), digraphs such as Slovak “ch”, or sequences with letter modifiers such as k<sup>w</sup>.
Grapheme cluster boundaries are important for collation, regular expressions,
and counting “character” positions within text. The Unicode Standard provides a
determination of where the default grapheme boundaries fall in a string of
characters. This algorithm can be customized for specific locales. </p>
<p>Note: The rules for default grapheme cluster boundaries, default word boundaries and default sentence
boundaries are in the process of being superseded by a new
<a href="http://www.unicode.org/unicode/reports/tr29/">Unicode Technical
Report #29, Text Boundaries</a>.</p>
<p>There are circumstances where even the locale-specific determination of
grapheme boundaries may need to be further tailored on a local basis. These
include:</p>
<ul>
<li>Determining the placement of combining accents that should apply to a
sequence of base characters, rather than a single base character.</li>
<li>Distinguishing in collation between sequences of characters that are
normally considered a grapheme in a particular language, and that same
sequence in foreign words.</li>
</ul>
<p>The character U+034F COMBINING GRAPHEME JOINER has been added to prevent
inappropriate grapheme breaks. The properties of this character are specified so
as to work well with current software for such processes as grapheme-cluster
determination, line-break, and collation. In terms of grapheme determination it
functions like the Indic <i>viramas</i>. Thus a sequence &lt;X, CGJ, Y&gt;
functions as a single grapheme.</p>
<p>The grapheme joiner prevents line breaking between adjacent characters;
however, where the prevention of line breaking is the only desired effect, the word joiner should be used
instead (see <a href="http://www.unicode.org/reports/tr14/">Unicode Standard Annex #14, “Line Breaking
Properties”</a>). In collation, the grapheme joiner should be ignored unless it
specifically occurs within a tailored collation element mapping. Thus it is
given a completely ignorable collation element in the default collation table,
like NULL (see <a href="http://www.unicode.org/reports/tr10/">Unicode Technical Standard #10, “Unicode
Collation Algorithm”</a> and also ISO/IEC 14651). However, it can be entered
into the tailoring rules for any given language, using the UCA and ISO/IEC 14651
tailoring capabilities.</p>
<p>For rendering, the grapheme joiner is an invisible combining character with
canonical class of zero. It can bind adjacent characters into a base for
combining marks in circumstances described in “Applications of Combining Marks”
in <i><a href="#3_9_special_character_properties">Section 3.9, Special Character
Properties (revision)</a></i> in this document. For
any specified repertoire, implementation support for this capability can be
provided by means of ligature tables in the font, or by means of special
placement rules (see
<a href="http://partners.adobe.com/asn/developer/opentype/main.html">
http://partners.adobe.com/asn/developer/opentype/main.html</a>). Some display
engines may be able to supply runtime generative support. As with other
combining marks, there is considerable latitude for display depending on the
environment (such as the choice of font). </p>
<p>The combining grapheme joiner must not be confused with the <i>zero width
joiner,</i> or the <i>word joiner,</i> which have very different functions. In
particular, inserting a <i>combining grapheme joiner</i> between two characters
has no effect on their ligation or cursive joining behavior.</p>
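<p>The character properties mentioned above can be checked with a short Python
sketch. It assumes the standard <tt>unicodedata</tt> module of a current Python
release; the sample string reuses the Slovak “ch” digraph from the discussion
above, and the variable names are illustrative.</p>

```python
import unicodedata

CGJ = "\u034F"

# General category and canonical combining class of the grapheme joiner.
print(unicodedata.name(CGJ),
      unicodedata.category(CGJ),
      unicodedata.combining(CGJ))

# The joiner has no decomposition, so normalization leaves it in place.
text = "c" + CGJ + "h"
preserved = all(unicodedata.normalize(form, text) == text
                for form in ("NFC", "NFD", "NFKC", "NFKD"))
print(preserved)
```

<p>Because the joiner survives normalization, a process that relies on it for
tailored collation or grapheme determination can normalize text first without
losing the marking.</p>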
<h3>Word Joiner (U+2060) (addition) </h3>
<p>In Unicode 3.1.1 and before, the codepoint U+FEFF serves two very different
purposes:</p>
<ul>
<li>It is used as a zero-width non-breaking space (ZWNBSP), with applicability
across a wide range of scripts and usages.</li>
<li>It is also used as a signature, with a very specific use at the start of
files or streams. See <i>Section 2.7, Special Character and Noncharacter
Values</i>, <i>Section 3.8, Transformations</i>, and <i>Section 13.6, Specials</i>
in <i>The Unicode Standard, Version 3.0</i>.</li>
</ul>
<p>If U+FEFF had only the semantic of a signature codepoint, it could be freely
deleted from text without affecting the interpretation of the rest of the text.
Carelessly appending files together, for example, can result in a signature
codepoint in the middle of text. Unfortunately, U+FEFF also has significance as
a character. As a ZWNBSP, it indicates that line breaks are not allowed between
the adjoining characters. Thus U+FEFF impacts the interpretation of text, and
cannot be freely deleted. The overloading of semantics for this codepoint has
caused problems for programs and protocols.</p>
<p>The new character U+2060 WORD JOINER has the same semantics in all cases as
U+FEFF, except that it <i>cannot</i> be used as a signature. That is, the
function of the character is to indicate that the two adjacent characters should
not be broken across lines. See the GL category in <a href="http://www.unicode.org/reports/tr14/">Unicode
Standard Annex #14, “Line Breaking Properties”</a>. In other contexts the
character should be ignored.</p>
<p>Unicode 3.2 implementations should support this new character, but also
support the ZWNBSP semantic of U+FEFF.</p>
<p>Note: Implementers are strongly encouraged to use the word joiner whenever
word joining semantics is intended.</p>
<p>The word joiner must not be confused with the <i>zero width joiner</i> or the
<i>combining grapheme joiner,</i> which have very different functions. In
particular, inserting a <i>word joiner</i> between two characters has no effect on
their ligating or cursive joining behavior.</p>
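<p>One possible policy for separating the two uses of U+FEFF is sketched below in
Python. It is an assumption, not a requirement of this document: a single
leading U+FEFF is treated as a signature and dropped, and any later occurrence
is kept as a line-break prohibition by rewriting it as U+2060. The function
names <tt>strip_signature</tt> and <tt>modernize_zwnbsp</tt> are hypothetical.</p>

```python
BOM = "\uFEFF"
WORD_JOINER = "\u2060"


def strip_signature(text):
    """Drop one leading U+FEFF, treating it as a signature."""
    return text[1:] if text.startswith(BOM) else text


def modernize_zwnbsp(text):
    """Rewrite interior U+FEFF (ZWNBSP semantics) as U+2060 WORD JOINER."""
    return strip_signature(text).replace(BOM, WORD_JOINER)


sample = BOM + "non" + BOM + "breaking"
print([f"U+{ord(ch):04X}" for ch in modernize_zwnbsp(sample) if ch in (BOM, WORD_JOINER)])
```

<p>Rewriting interior occurrences rather than deleting them is the conservative
choice, since an interior U+FEFF may be a stray signature from concatenated
files or an intentional ZWNBSP, and the two cannot be told apart from the text
alone.</p>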
<h3>Ligatures and Latin Typography (addition)</h3>
<p>It is the task of the rendering system to select a ligature (where ligatures
are possible) as part of the task of creating the most pleasing line layout.
Fonts that provide more ligatures give the rendering system more options.</p>
<p>However, defining the locations where ligatures are possible cannot be done
by the rendering system, because there are many languages in which this depends
not on simple letter pair context but on the meaning of the word in question. </p>
<p>ZWJ and ZWNJ are to be used for the latter task, marking the non-regular
cases where ligatures are required or prohibited. This is different from
selecting a degree of ligation for stylistic reasons. Such selection is best
done with style markup. See
<a href="http://www.unicode.org/unicode/reports/tr20/">Unicode Technical Report
#20, “Unicode in XML and other Markup Languages”</a> for more information.</p>
<h3><a name="13_7_variation_selectors">13.7 Variation Selectors</a> (new
section)</h3>
<p>Unicode characters can be represented by a wide variety of glyphs, as
discussed in <i>Chapter 2, General Structure</i> in <i>The Unicode Standard,
Version 3.0</i>. Occasionally the need arises in text
processing to restrict or change the set of glyphs that are to be used to
represent a character. Normally such changes are indicated by choice of font or
style in rich-text documents. In special circumstances, such a variation from
the normal range of appearance needs to be expressed side-by-side in the same
document in plain-text contexts, where it is impossible or inconvenient to
exchange formatted text. For example, in languages employing the Mongolian
script, sometimes a specific variant range of glyphs is needed for a specific
textual purpose for which the range of “generic” glyphs is considered
inappropriate. The variation selectors are used when characters have essentially
the same semantic.</p>
<p>Variation selectors provide a mechanism for specifying a restriction on the
set of glyphs that are used to represent a particular character. They also
provide a mechanism for specifying variants, such as for CJK Ideographs and
Mongolian, that have essentially the same semantic but have substantially
different ranges of glyphs. A variation sequence, which always consists of a
base character followed by the variation selector, may be specified as part of
the Unicode Standard. That sequence is referred to as a <i>variant</i> of the
base character. The variation selector affects <i>only</i> the appearance of the
base character,* and only in the variation sequences defined in this Standard.
The variation selector is <i>not</i> used as a general code extension mechanism:</p>
<blockquote>
<p><i>Only the variation sequences specifically defined in the Unicode
Character Database in the file <a href="http://www.unicode.org/Public/3.2-Update/StandardizedVariants-3.2.0.html">StandardizedVariants.html</a>
are sanctioned for standard use; in all other cases the variation selector
cannot change the visual appearance of the preceding base character from what
it would have had, in the absence of the variation selector.</i></p>
</blockquote>
<p>The base character in a variation sequence is never a combining character or
a decomposable character.* The variation selectors themselves are combining marks
of combining class 0, and are default ignorable characters. Thus if the
variation sequence is not supported, the variation selector should be invisible
and ignored. As with all default ignorable characters, this does not preclude
modes or environments where the variation selectors should be given visible
appearance. For example, a “Show Hidden” mode could reveal the presence of
such characters with specialized glyphs, or a particular environment could use or
require a visual indication of a base character (such as a wavy underline) to
show that it is part of a standardized variation sequence that cannot be
supported by the current font.</p>
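<p>The default handling described above might be sketched in Python as follows.
This is an illustration only: the table <tt>SUPPORTED_SEQUENCES</tt> holds just
the two mathematical sequences named in the note in this section, and the
function names are hypothetical. A real implementation would load the full list
from StandardizedVariants and would apply the filter to the glyph run prepared
for display, not to the stored text.</p>

```python
# Illustrative subset only: the two sequences named in the note in this section.
SUPPORTED_SEQUENCES = {(0x2278, 0xFE00), (0x2279, 0xFE00)}


def is_variation_selector(cp: int) -> bool:
    """True for U+FE00..U+FE0F and the Mongolian selectors U+180B..U+180D."""
    return 0xFE00 <= cp <= 0xFE0F or 0x180B <= cp <= 0x180D


def drop_unsupported_selectors(text: str) -> str:
    """Return the text as it would be passed on for display.

    A variation selector is kept only when it and the character before it
    form a supported variation sequence; otherwise it is ignored.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if is_variation_selector(cp):
            base = ord(out[-1]) if out else None
            if (base, cp) not in SUPPORTED_SEQUENCES:
                continue  # unsupported: invisible and ignored
        out.append(ch)
    return "".join(out)


print(drop_unsupported_selectors("A\ufe00B") == "AB")
print(drop_unsupported_selectors("\u2278\ufe00") == "\u2278\ufe00")
```

<p>A “Show Hidden” mode, as mentioned above, would skip this filter and map the
selectors to visible glyphs instead.</p>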
<p>The standardization or support of a particular variation sequence does <i>not</i>
limit the set of glyphs that can be used to represent the base character alone.
If a user <i>requires</i> a visual distinction between a character and a
particular variant of that character, then fonts must be used to make that
distinction. The existence of a variation sequence does not preclude the later
encoding of a new character with a distinct semantic and a similar or
overlapping range of glyphs.</p>
<blockquote>* Note: Just before publication, an inconsistency was discovered between the
above principles and the standardization of the two variant sequences <2278,
FE00> and <2279, FE00> because U+2278 and U+2279 are in fact decomposable
characters. Those variant sequences denote glyph variants of these mathematical
symbols with a vertical line instead of a slanted line as the diacritic to indicate the negation.
<p>The sequence <2278, FE00> is canonically equivalent to <2276, 0338, FE00>, and
the sequence <2279, FE00> is canonically equivalent to <2277, 0338, FE00>. So
that these equivalent sequences are given equivalent rendering treatment, the
use of U+FE00 would have to be interpreted—exceptionally—as
defining a variant appearance for the <i>entire</i> sequence.</p>
<p>Because a combining vertical line overlay, U+20D2 COMBINING LONG VERTICAL LINE
OVERLAY, is also available in the Standard, an alternate way of explicitly
indicating these particular variants already exists. That alternative mechanism
is a safer and more stable way to indicate the distinction, as the inherent
complications in allowing variation selectors to follow combining marks may
require future corrective action to remove the exceptional variant sequences
<2278, FE00> and <2279, FE00> from the table.</p>
</blockquote>
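<p>The canonical equivalence discussed in the note can be checked with Python's
standard <tt>unicodedata</tt> module. The sketch below is only a demonstration.
It assumes that the Unicode data shipped with the Python runtime agrees with
Unicode 3.2 for these characters, and the helper name <tt>nfd_code_points</tt>
is hypothetical.</p>

```python
import unicodedata


def nfd_code_points(text: str) -> list:
    """Return the code points of the canonical decomposition (NFD) of text."""
    return [ord(ch) for ch in unicodedata.normalize("NFD", text)]


for base in (0x2278, 0x2279):
    sequence = chr(base) + "\ufe00"
    decomposed = nfd_code_points(sequence)
    # After decomposition the selector follows U+0338, a combining mark,
    # and no longer directly follows the base character.
    print(f"U+{base:04X} U+FE00 ->", " ".join(f"U+{cp:04X}" for cp in decomposed))
```

<p>Because the variation selector has combining class 0, normalization does not
reorder it. It ends up after the combining overlay, which is the complication
the note describes.</p>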
<h3><a name="14.1_character_names_list">14.1 Character Names List</a> (addition)</h3>
<p>Add the following text to the end of <i>Section 14.1, Character Names List</i>
on page 335, <i>The Unicode Standard, Version 3.0</i>:</p>
<h3>Subheads</h3>
<p>The character names list contains a number of informative subheads which help divide up the list into smaller sublists of similar characters. For example, in the Miscellaneous Symbols block, U+2600..U+26FF, there are subheads for
“Astrological symbols”, “Chess symbols”, and so on. Such subheads are editorial and informative, and should not be taken as providing any definitive, normative status information about characters in the sublists they mark, nor about any constraints on what characters could be encoded in the future at reserved code points within their ranges.
The subheads are subject to change.</p>
<h2 class="bb"><a name="charts">V Code Charts</a></h2>
<p>The following code charts contain the characters added in Unicode 3.2. They
are shown together with the characters that were part of Unicode 3.1. New
characters are shown on a yellow background in these code charts.</p>
<ul>
<li><a href="http://www.unicode.org/charts/PDF/U32-0180.pdf">Latin Extended-B</a></li>
<li><a href="http://www.unicode.org/charts/PDF/U32-0300.pdf">Combining
Diacritical Marks</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-0370.pdf">Greek and Coptic</a>
</li>
<li><a href="http://www.unicode.org/charts/PDF/U32-0400.pdf">Cyrillic</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-0500.pdf">Cyrillic
Supplement</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-0600.pdf">Arabic</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-0780.pdf">Thaana</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-10A0.pdf">Georgian</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-1700.pdf">Tagalog</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-1720.pdf">Hanunoo</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-1740.pdf">Buhid</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-1760.pdf">Tagbanwa</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2000.pdf">General
Punctuation</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2070.pdf">Superscripts and
Subscripts</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-20A0.pdf">Currency Symbols</a>
</li>
<li><a href="http://www.unicode.org/charts/PDF/U32-20D0.pdf">Combining
Diacritical Marks for Symbols</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2100.pdf">Letterlike
Symbols</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2190.pdf">Arrows</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2200.pdf">Mathematical
Operators</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2300.pdf">Miscellaneous
Technical</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2460.pdf">Enclosed
Alphanumerics</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2580.pdf">Block Elements</a>
</li>
<li><a href="http://www.unicode.org/charts/PDF/U32-25A0.pdf">Geometric Shapes</a>
</li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2600.pdf">Miscellaneous
Symbols</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2700.pdf">Dingbats</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-27C0.pdf">Miscellaneous
Mathematical Symbols-A</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-27F0.pdf">Supplemental
Arrows-A</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2900.pdf">Supplemental
Arrows-B</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2980.pdf">Miscellaneous
Mathematical Symbols-B</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-2A00.pdf">Supplemental
Mathematical Operators</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-3000.pdf">CJK Symbols and
Punctuation</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-3040.pdf">Hiragana</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-30A0.pdf">Katakana</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-31F0.pdf">Katakana Phonetic
Extensions</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-3200.pdf">Enclosed CJK
Letters and Months</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-A490.pdf">Yi Radicals</a>
</li>
<li><a href="http://www.unicode.org/charts/PDF/U32-F900.pdf">CJK Compatibility
Ideographs</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-FE70.pdf">Arabic
Presentation Forms-B</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-FE00.pdf">Variation
Selectors</a> </li>
<li><a href="http://www.unicode.org/charts/PDF/U32-FE30.pdf">CJK Compatibility
Forms</a></li>
</ul>
<p> </p>
<blockquote>
<table border="1" width="85%" cellpadding="3" cellspacing="0">
<tr>
<td width="85%" height="15">
<p align="center"><b><i><u>Code Charts Notice:</u></i></b> </p>
<p>Annotations for many characters have been added or revised throughout
the code charts. These are not mentioned explicitly in the list above.
Please see <a href="http://www.unicode.org/charts">
http://www.unicode.org/charts</a> for a list of all code charts.</td>
</tr>
</table>
</blockquote>
<h2 class="bb"><a name="errata">VI Errata</a></h2>
<p>This article contains errata rolled up since the publication of <i>The
Unicode Standard, Version 3.1</i>. These errata are listed by date in the table
below. For prior errata from Unicode 3.1, see the errata listed in <i>Unicode
Standard Annex #27: Unicode 3.1</i> (<a href="http://www.unicode.org/reports/tr27/#errata">http://www.unicode.org/reports/tr27/#errata</a>).</p>
<table border="1">
<tr>
<th width="20%">Date </th>
<th width="85%">Summary </th>
</tr>
<tr>
<td width="20%">2002 February 26</td>
<td width="85%">Corrigendum #3: U+F951 Normalization posted.<br>
NOTE: This corrigendum is incorporated in, and superseded by, this document.
</td>
</tr>
<tr>
<td width="20%">2002 January 18</td>
<td width="85%">In UAX #27: Unicode 3.1, in Article IV, Guidelines under the
subsection Unassigned Code Points, “U+FFFC” should instead read “U+FFFB” in
the following sentence:<br>
To allow a greater degree of compatibility across versions of the standard,
the ranges of U+2060..U+206F, U+FFF0..U+FFFB, and U+E0000..U+E0FFF are
reserved for format and control characters (General Category = Cf).</td>
</tr>
<tr>
<td width="20%" valign="top">2001 September 25</td>
<td width="80%">The character U+0B83 TAMIL SIGN VISARGA is actually a
stand-alone character, not a combining character. This character's General
Category has been changed from “Mc” to “Lo” in accordance with this. The
glyph on the left below shows the character in previous charts; the glyph on
the right shows the character as it should appear (without a dotted circle). See
<a href="http://www.unicode.org/charts/PDF/U32-0B80.pdf">http://www.unicode.org/charts/PDF/U32-0B80.pdf</a>.
<p><img border="0" src="Tamil0B83-before.jpg" alt="prior U+0B83" width="149" height="171">
<img border="0"
src="Tamil0B83-after.jpg" alt="corrected U+0B83" width="149" height="171">
</td>
</tr>
<tr>
<td width="20%" valign="top">2001 April 25</td>
<td width="80%">On p. 500, in the Unicode names list in TUS 3.0, the glyph
for U+2032 was omitted. It is shown correctly in the code chart on page 498;
see also <a href="http://www.unicode.org/charts/PDF/U2000.pdf">http://www.unicode.org/charts/PDF/U2000.pdf</a>.</td>
</tr>
</table>
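<p>Two of the errata above affect character data and can be spot-checked
programmatically. The following Python sketch assumes that the data in the
runtime's <tt>unicodedata</tt> module incorporates both corrections. The
expected mapping of U+F951 to U+964B is taken from Corrigendum #3 itself rather
than from the table above, and the helper names are hypothetical.</p>

```python
import unicodedata


def general_category(cp: int) -> str:
    """General Category of a code point, for example 'Lo' or 'Mc'."""
    return unicodedata.category(chr(cp))


def canonical_decomposition(cp: int) -> list:
    """Code points of the canonical decomposition mapping, or [] if none."""
    mapping = unicodedata.decomposition(chr(cp))
    if not mapping or mapping.startswith("<"):
        return []  # no mapping, or a compatibility mapping
    return [int(field, 16) for field in mapping.split()]


print("Unicode data version:", unicodedata.unidata_version)
print("U+0B83 category:", general_category(0x0B83))
print("U+F951 maps to:", [f"U+{cp:04X}" for cp in canonical_decomposition(0xF951)])
```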
<h2 class="bb"><a name="database">VII Unicode Character Database Changes</a></h2>
<p>The main change to the <a href="http://www.unicode.org/Public/3.2-Update/">
Unicode Character Database for Unicode 3.2</a> is the extension of the data
files to cover the additions to the character repertoire. This most importantly affects
UnicodeData.txt, LineBreak.txt, and EastAsianWidth.txt, each of which has been
extended to cover all the newly encoded characters. Also, an updated informative
NamesList.txt file is provided to cover the new repertoire.</p>
<p><b><i>Property and Property Value Aliases.</i></b> The PropertyAliases and
PropertyValueAliases files contain recommended UCD property identifiers
and property value identifiers. These identifiers can be used for XML formats of
UCD data, for regular-expression property tests, and other programmatic textual
descriptions of Unicode data. In comparing identifiers, case differences should
not be significant, and the presence or absence of an underbar should be
ignored. The identifiers in the PropertyAliases and PropertyValueAliases files
are normative in the following sense: </p>
<blockquote>
<p>Where the identifiers are used to refer to Unicode properties or property
values, they can only be used in accordance with the Unicode Character
Database semantics.</p>
</blockquote>
<p>This does not prevent implementations from using other identifiers to refer
to Unicode properties or property values. For example, there is nothing to prevent
the use of French translations of the identifiers.</p>
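<p>The comparison rule for identifiers (case differences not significant,
underbars ignored) could be implemented along the following lines. This is a
minimal Python sketch, and the function names <tt>loose_key</tt> and
<tt>same_identifier</tt> are hypothetical.</p>

```python
def loose_key(identifier: str) -> str:
    """Reduce an identifier to a comparison key.

    Underbars are removed and letters are lowercased, so that identifiers
    differing only in those respects produce the same key.
    """
    return identifier.replace("_", "").lower()


def same_identifier(a: str, b: str) -> bool:
    """True if two property identifiers match under the loose comparison."""
    return loose_key(a) == loose_key(b)


print(same_identifier("Bidi_Mirrored", "bidimirrored"))
print(same_identifier("Grapheme_Base", "Grapheme_Extend"))
```

<p>Storing the key produced by <tt>loose_key</tt> in a lookup table allows each
incoming identifier to be resolved with a single normalization step.</p>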
<p><b><i>Blocks.</i></b> The normative blocks defined in Blocks.txt have been
adjusted slightly, in accordance with Unicode Technical Committee decisions.</p>
<ul>
<li>Every block starts and ends on a column boundary. That is, the last digit
of the first code point in the block is always 0, and the last digit of the
final code point in the block is always F.</li>
<li>Every block is contiguous. That is, if any two code points are in the same
block, then all intermediate code points are in that block. </li>
</ul>
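<p>The column-boundary constraint can be verified mechanically. The Python
sketch below assumes that each data line has the form
<tt>XXXX..YYYY; Block Name</tt>; the sample lines are illustrative and are not
copied from the data file. Contiguity follows from representing each block as
a single range.</p>

```python
def parse_block_line(line: str):
    """Parse 'XXXX..YYYY; Name' into (start, end, name). Format is assumed."""
    range_part, name = line.split(";", 1)
    start_hex, end_hex = range_part.strip().split("..")
    return int(start_hex, 16), int(end_hex, 16), name.strip()


def on_column_boundaries(start: int, end: int) -> bool:
    """True if the first code point ends in 0 and the last ends in F."""
    return start % 16 == 0 and end % 16 == 15


SAMPLE_LINES = [
    "2600..26FF; Miscellaneous Symbols",
    "FE00..FE0F; Variation Selectors",
]

for line in SAMPLE_LINES:
    start, end, name = parse_block_line(line)
    print(f"{name}: {on_column_boundaries(start, end)}")
```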
<p>The block property values are listed in the Blocks datafile, and are not
repeated in the PropertyValueAliases datafile. (Block property values should be
used with caution; for more information see
<a href="http://www.unicode.org/reports/tr18/tr18-6d2.html">Unicode Technical
Report #18, “Unicode Regular Expression Guidelines”</a>, Annex A.)</p>
<p>The notes for SpecialCasing.txt have been updated, and the rules for casing
involving dotted letters (i, j) have been reformulated more generically.</p>
<p>An updated Index.txt has been provided, to make it easier to locate the newly
added characters, particularly for mathematics.</p>
<h3>New Properties</h3>
<p>The following new property files have been added:</p>
<ul>
<li>PropertyValueAliases and PropertyAliases: These contain recommended UCD
property names and property value names. These names can be used for XML
formats of UCD data, for regular-expression property tests, and other
programmatic textual descriptions of Unicode data.</li>
<li>DerivedAge: This file shows when various code points were designated in
successive versions of the Unicode Standard.</li>
<li>NormalizationCorrections: This file contains any corrections required
to maintain backwards compatibility for normalization. Currently it
lists code point differences for
<a href="http://www.unicode.org/versions/corrigendum3.html">Corrigendum #3: U+F951 Normalization</a>.</li>
</ul>
<p>Other new properties include:</p>
<ul>
<li>Grapheme_Base, Grapheme_Extend, Grapheme_Link: For programmatic
determination of grapheme cluster boundaries.</li>
<li>IDS_Binary_Operator, IDS_Trinary_Operator, Radical, Unified_Ideograph: For
a machine-readable list of Ideographic Description Sequences.</li>
<li>Default_Ignorable_Code_Point: For programmatic determination of
default-ignorable code points. These code points are to be ignored by processes that do not explicitly support them. This
permits programs to be compatible with future assignments of such characters.
Ordinarily they are invisible, have no glyph, and have no advance width.</li>
<li>Deprecated: For a machine-readable list of deprecated characters. No
characters will ever be removed from the standard, but the use of deprecated
characters is strongly
discouraged.</li>
<li>Soft_Dotted: Characters with a “soft dot”, like <i>i</i> or <i>j</i>. An
accent placed on these characters causes the dot to disappear.</li>
<li>Logical_Order_Exception: There are a small number of characters (in the
Thai and Lao scripts) that do not use logical order. These characters require
special handling in most processing.</li>
</ul>
<p>For more information on these new properties, see the relevant documentation
in the Unicode Character Database.</p>
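<p>Binary properties such as those listed above are typically consumed by
reading range entries from a data file into sets of code points. The Python
sketch below assumes entries of the form
<tt>XXXX..YYYY ; Property_Name # comment</tt>. The sample lines are illustrative
rather than copied from the database, and the function names are
hypothetical.</p>

```python
def parse_property_line(line: str):
    """Parse one entry into (first, last, property), or None if it has no data."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    range_part, prop = (field.strip() for field in data.split(";"))
    if ".." in range_part:
        first, last = range_part.split("..")
    else:
        first = last = range_part  # single code point
    return int(first, 16), int(last, 16), prop


def build_property_sets(lines) -> dict:
    """Map each property name to the set of code points that have it."""
    props = {}
    for line in lines:
        parsed = parse_property_line(line)
        if parsed is None:
            continue
        first, last, prop = parsed
        props.setdefault(prop, set()).update(range(first, last + 1))
    return props


# Illustrative entries only.
SAMPLE_LINES = [
    "# sample entries",
    "0069..006A    ; Soft_Dotted # LATIN SMALL LETTER I..LATIN SMALL LETTER J",
    "0E40..0E44    ; Logical_Order_Exception # THAI CHARACTER SARA E..SARA AI MAIMALAI",
]

props = build_property_sets(SAMPLE_LINES)
print(ord("i") in props["Soft_Dotted"])
print(0x0E40 in props["Logical_Order_Exception"])
```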
<p>Note: For consistency with the property naming conventions, the property <i>
BidiMirrored</i> has been renamed to <i>Bidi_Mirrored</i> (see
DerivedBinaryProperties.txt). Also the property <i>Comp_Ex</i> has been renamed
to <i>Full_Composition_Exclusion</i> (see DerivedNormalizationProps.txt).</p>
<h3>File Name Length Restriction</h3>
<p>For cross-platform interoperability, the file names will be restricted to no
more than 31 characters in length. Due to this change in policy,
DerivedNormalizationProps.txt is the new file name for the file formerly known
as DerivedNormalizationProperties.txt.</p>
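<p>The restriction is easy to check when preparing data files. The Python
sketch below tests the two names mentioned above; the constant and function
names are hypothetical.</p>

```python
MAX_FILE_NAME_LENGTH = 31  # limit stated in this section


def fits_length_limit(file_name: str) -> bool:
    """True if the file name is no longer than the stated limit."""
    return len(file_name) <= MAX_FILE_NAME_LENGTH


for name in ("DerivedNormalizationProps.txt", "DerivedNormalizationProperties.txt"):
    print(name, len(name), fits_length_limit(name))
```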
<p>The documentation files for the Unicode Character Database have been updated
to reflect the additions of new property files and new character properties to
existing files, and the new file name length restriction.</p>
<h2 class="bb"><a name="relation">VIII Relation to ISO/IEC 10646</a></h2>
<p>ISO/IEC 10646 is a multi-part standard. Part 1, published as ISO/IEC
10646-1:2000(E), covers the Architecture and Basic Multilingual Plane. Part 2,
published as ISO/IEC 10646-2:2001(E), covers the supplementary planes. Amendment
1 to Part 1 makes a few modifications to the architecture of 10646 and adds
about a thousand characters to the BMP. </p>
<p>Unicode 3.2 contains all of the characters of Amendment 1, including the two
characters of Amendment 1 that had already been added to Unicode 3.1. With the
publication of Amendment 1 to ISO/IEC 10646-1:2000 and the Unicode Standard,
Version 3.2, the two standards are fully synchronized. </p>
<p>The Unicode Consortium and ISO/IEC JTC1/SC2/WG2 are committed to maintaining
the synchronization between the two standards. </p>
<p>Notable among the architectural changes to ISO/IEC 10646 approved in
Amendment 1 are: </p>
<ul>
<li>The range of characters available for private use has been restricted to
those characters accessible via UTF-16, and the intent not to encode
characters past Plane 16 has been clarified. This guarantees the
interoperability of UTF-8 and UTF-16, and the equivalence of UTF-32 and UCS-4.</li>
<li>The definition of UCS short identifiers has been modified and UCS sequence
identifiers have been added. This brings 10646 in line with Unicode
conventions for representing characters and sequences of characters. </li>
<li>The clause reserving characters for internal use has been updated, so that
the 10646 specification is in line with the Unicode specification of
noncharacters, including the noncharacters at U+FDD0..U+FDEF.</li>
</ul>
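<p>The limits described above can be made concrete with a short Python sketch.
It treats U+10FFFF, the last code point of Plane 16, as the upper bound, and it
recognizes the noncharacters at U+FDD0..U+FDEF together with the last two code
points of every plane. The function names are hypothetical, and the code is an
illustration rather than a conformance test.</p>

```python
MAX_CODE_POINT = 0x10FFFF  # last code point of Plane 16


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+nFFFE and U+nFFFF in every plane."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("code point out of range")
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def utf16_code_units(cp: int) -> list:
    """Encode one code point as a list of UTF-16 code units."""
    if not 0 <= cp <= MAX_CODE_POINT or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not encodable in UTF-16")
    if cp < 0x10000:
        return [cp]  # BMP code point: a single code unit
    offset = cp - 0x10000
    # Supplementary code point: high surrogate, then low surrogate.
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


print([f"{unit:04X}" for unit in utf16_code_units(MAX_CODE_POINT)])
print(is_noncharacter(0xFDD0), is_noncharacter(0xFDF0))
```

<p>The highest surrogate pair corresponds to U+10FFFF, so no code point past
Plane 16 can be represented in UTF-16.</p>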
<h2 class="bb"><a name="references">IX References</a> and Sources</h2>
<h3>Standards and Specifications</h3>
<p>ISO/IEC 9573-13: International Organization for Standardization. <i>
Information technology—SGML support facilities—Techniques for using SGML—Part
13: Public entity sets for mathematics and science.</i> [Geneva], 1991. (ISO/IEC
TR 9573-13:1991).</p>
<p>ISO/IEC 9995-7: <i>Information technology—Keyboard layouts for text and
office systems—Part 7: Symbols used to represent functions</i>. [Geneva], 1994.
(ISO/IEC 9995-7:1994).</p>
<p>ISO/IEC 14651: International Organization for Standardization. <i>Information
technology—International string ordering and comparison—Method for comparing
character strings and description of the common template tailorable ordering</i>.
[Geneva], 2001. (ISO/IEC 14651:2001).</p>
<p>JIS X 0213: Japanese Industrial Standards Committee. <i>7 bitto oyobi 8 bitto
no 2 baito jouhou koukan you fugouka kakuchou kanji shuugou</i> (<i>7-bit and
8-bit double byte coded extended KANJI sets for information interchange</i>).
Tokyo, 2000. (JIS X 0213:2000).</p>
<h3>Other References and Sources</h3>
<p> <i>Doctrina christiana: the first book printed in the Philippines, Manila
1593.</i> A facsimile of the copy in the Lessing J.
Rosenwald Collection...with an introductory essay by Edwin Wolf II. Washington, DC, Library of Congress,
1947.</p>
<p>Kuipers, Joel C., and Ray McDermott. “Insular Southeast Asian Scripts.” In <i>The World’s Writing Systems</i>.
Edited by Peter T. Daniels and William Bright. New York, Oxford University Press,
1996. ISBN 0-19-507993-0.</p>
<p>Santos, Hector. <i>The Living Scripts</i>. Los Angeles: Sushi Dog Graphics,
1995. (Ancient Philippine Scripts Series; 2).<br>
User’s guide accompanying <i>Computer Fonts, Living Scripts</i> software.</p>
<p>Santos, Hector. <i>Our Living Scripts</i>. January 31, 1997. <br>
<a href="http://www.bibingka.com/dahon/living/living.htm">http://www.bibingka.com/dahon/living/living.htm
</a>
<br>
Part of his <i>A Philippine Leaf</i>.</p>
<p>Santos, Hector. <i>The Tagalog Script</i>. Los Angeles: Sushi Dog Graphics,
1994. (Ancient Philippine Scripts Series; 1).
<br>
User’s guide accompanying <i>Tagalog Script Fonts</i> software.</p>
<p>Santos, Hector. <i>The Tagalog Script</i>. October 26, 1996. <br>
<a href="http://www.bibingka.com/dahon/tagalog/tagalog.htm">http://www.bibingka.com/dahon/tagalog/tagalog.htm</a>
<br>
Part of his <i>A Philippine Leaf</i>.</p>
<p>STIPUB Consortium. STIX (Scientific and Technical Information Exchange)
Project. <br>
<a href="http://www.ams.org/STIX/">http://www.ams.org/STIX/</a></p>
<h2><a name="Modifications">X Modifications</a></h2>
<p>The following summarizes modifications from the previous version of this
document. Modifications to this document will be limited to repairing
straightforward typographical and production errors. Updates in content will be
carried out via a future version of the Unicode Standard, published in a
separate document.</p>
<table cellspacing="4" cellpadding="0" width="100%" border="0" class="noborder" style="border-collapse: collapse">
<tr>
<td valign="top" width="1" class="noborder"><a name="tracking_number">3</a></td>
<td valign="top" class="noborder">
<ul>
<li>None</li>
</ul>
</td>
</tr>
</table>
<hr align="LEFT">
<p><font size="-1">Copyright © 2001-2002 Unicode, Inc. All Rights Reserved. The
Unicode Consortium makes no expressed or implied warranty of any kind, and
assumes no liability for errors or omissions. No liability is assumed for
incidental and consequential damages in connection with or arising out of the
use of the information or programs contained or accompanying this technical
report.</font></p>
<p><font size="-1">Unicode and the Unicode logo are trademarks of Unicode, Inc.,
and are registered in some jurisdictions.</font></p>
</div>
</body>
</html>