<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"

       "http://www.w3.org/TR/REC-html40/loose.dtd"> 

<html>



<head><base href="https://www.unicode.org/reports/tr27/tr27-4.html">





<link rel="stylesheet" href="http://www.unicode.org/unicode.css" type="text/css">

<meta name="GENERATOR" content="Microsoft FrontPage 4.0">

<meta name="ProgId" content="FrontPage.Editor.Document">

<title>UAX #27: Unicode 3.1</title>

</head>



<body>



<table border="0" cellpadding="0" cellspacing="0" width="100%">

  <tbody>

    <tr>

      <td>

        <table border="0" cellpadding="0" cellspacing="0" width="100%">

          <tbody>

            <tr>

              <td class="icon"><a href="http://www.unicode.org"><img

                align="middle" alt="[Unicode]" border="0"

                src="http://www.unicode.org/webscripts/logo60s2.gif" width="34"

                height="33"></a>&nbsp;&nbsp;<a class="bar"

                href="http://www.unicode.org/unicode/reports">Technical Reports</a></td>                                                                             

            </tr>                                                                             

          </tbody>                                                                             

        </table>                                                                             

      </td>                                                                             

    </tr>                                                                             

    <tr>                                                                             

      <td class="gray">&nbsp;</td>                                                                             

    </tr>                                                                             

  </tbody>                                                                             

</table>                                                                             

<h2 align="center">Unicode Standard Annex #27</h2>                                                                            

<h1 align="center">Unicode 3.1</h1>                                                                            

<table border="1" cellpadding="2" width="100%">                                     

  <tr>                                     

    <td height="24" valign="TOP" width="20%">Version</td>                                     

    <td valign="TOP">Unicode 3.1.0</td>                                    

  </tr>                                    

  <tr>                                    

    <td height="24" valign="TOP">Authors</td>                                    

    <td valign="TOP">Mark Davis, Michael Everson, Asmus Freytag, John H. Jenkins                                  

      and other members of the editorial                                                                            

      committee</td>                                     

  </tr>                                     

  <tr>                                     

    <td height="24" valign="TOP">Date</td>                                     

    <td valign="TOP">2001-05-16</td>                               

  </tr>                               

  <tr>                               

    <td height="24" valign="TOP">This Version</td>                               

    <td valign="TOP"><a  

      href="http://www.unicode.org/unicode/reports/tr27/tr27-4.html">http://www.unicode.org/unicode/reports/tr27/tr27-4.html</a></td>                              

  </tr>                              

  <tr>                              

    <td height="24" valign="TOP">Previous Version</td>                              

    <td valign="TOP"><a                           

      href="http://www.unicode.org/unicode/reports/tr27/tr27-3.html">http://www.unicode.org/unicode/reports/tr27/tr27-3.html</a></td>                              

  </tr>                              

  <tr>                              

    <td height="24" valign="TOP">Latest Version</td>                              

    <td valign="TOP"><a href="http://www.unicode.org/unicode/reports/tr27">http://www.unicode.org/unicode/reports/tr27</a></td>                              

  </tr>                              

  <tr>                              

    <td height="24" valign="TOP">Tracking Number</td>                              

    <td valign="TOP"><a href="#tracking_number4">4</a></td>                             

  </tr>                             

</table>                                                         

<h3><i>Summary</i></h3>                                                             

<p><i><em>This document defines Version 3.1 of the Unicode Standard. It                                                               

overrides certain features of Unicode 3.0.1, and adds a large number of coded                                                               

characters.</em></i></p>                                                              

<h3><i>Status</i></h3>                                                              

<p><i>This document has been reviewed by Unicode members and other interested                              

parties, and has been approved by the Unicode Technical Committee as a <b>Unicode                              

Standard Annex</b>. It is a stable document and may be used as reference                              

material or cited as a normative reference from another document.</i></p>                             

<blockquote>                             

  <p><i><b>A Unicode Standard Annex (UAX)</b> forms an integral part of the                              

  Unicode Standard, carrying the same version number, but is published as a                              

  separate document. Note that conformance to a version of the Unicode Standard                              

  includes conformance to its Unicode Standard Annexes.</i></p>                             

</blockquote>                             

<p><i>A list of current Unicode Technical Reports is found on <a                                                             

href="http://www.unicode.org/unicode/reports/">http://www.unicode.org/unicode/reports/</a>.                                                              

For more information about versions of the Unicode Standard, see <a                                                             

href="http://www.unicode.org/unicode/standard/versions/">http://www.unicode.org/unicode/standard/versions/</a>.</i></p>                                                             

<p><i>The <a href="#references">References</a> provide related information that                                  

is useful in understanding this document. Please mail corrigenda and other                                  

comments to the author(s).</i></p>                                                            

<h3><i>Contents</i></h3>                                                              

<ul>                                                              

  <li><a href="#description">I Description</a></li>                                                              

  <li><a href="#notation">II Notational Changes for the Standard</a></li>                                                              

  <li><a href="#conformance">III Conformance</a></li>                                                              

  <li><a href="#guidelines">IV Guidelines</a></li>                                                              

  <li><a href="#block">V Block Descriptions</a></li>                                                              

  <li><a href="#charts">VI Code Charts</a></li>                                                              

  <li><a href="#errata">VII Errata</a></li>                                                              

  <li><a href="#database">VIII Unicode Character Database Changes</a></li>                                                              

  <li><a href="#relation">IX Relation to 10646</a></li>                                                              

  <li><a href="#references">X References and Sources</a></li>                                                               

  <li><a href="#Modifications">XI Modifications</a></li>                                                              

</ul>                                                              

<hr align="LEFT">                                                              

<h2 class="bb"><a name="description">I Description</a></h2>                                                              

<p>Unicode 3.1 is a minor version of the Unicode Standard. It overrides certain                                                               

features of Unicode 3.0.1, and adds a large number of coded characters.</p>                                                              

<h3>Formal Definition of Unicode 3.1</h3>                                                              

<p>The Unicode Standard, Version 3.1 is defined by the following list. The                 

version numbering and the role of each component are explained in <a                 

href="http://www.unicode.org/unicode/standard/versions/">Versions of The Unicode                 

Standard</a>. The symbols in the change status column are explained in the <a                

href="#ChangeStatusKey">key</a> below. A summary of modifications in the Unicode                 

Character Database for this version can be found in <a                 

href="http://www.unicode.org/Public/3.1-Update/UnicodeCharacterDatabase-3.1.0.html">UnicodeCharacterDatabase-3.1.0.html</a>,

together with a list of which data files contain normative vs. informative data.</p>                 

<blockquote>                 

  <table border="0" cellspacing="0">                 

    <tr>                 

      <th align="left" colspan="4">Major Reference</th>                 

    </tr>                 

    <tr>                 

      <th align="left"></th>                 

      <td colspan="2"></td>                 

      <td>The Unicode Consortium. <a                 

        href="http://www.unicode.org/unicode/uni2book/u2.html">The Unicode                 

        Standard, Version 3.0</a><br>                 

        Reading, MA, Addison-Wesley Developers Press, 2000. ISBN 0-201-61633-5.</td>                 

    </tr>                 

    <tr>                 

      <th align="left" colspan="4">Minor Reference</th>                 

    </tr>                 

    <tr>                 

      <td></td>                 

      <td colspan="2"></td>                 

      <td>UAX #27:                 

        Unicode 3.1</td>                 

    </tr>                 

    <tr>                 

      <th align="left" colspan="4">Update Reference</th>                 

    </tr>                 

    <tr>                 

      <td></td>                 

      <td colspan="2"></td>                 

      <th align="left">n/a</th>

    </tr>                 

    <tr>                 

      <th align="left" colspan="4"><a                 

        href="http://www.unicode.org/unicode/reports/">Unicode Standard Annexes</a></th>                 

    </tr>                 

    <tr>                 

      <td></td>                 

      <td colspan="2"></td>                 

      <td><a href="http://www.unicode.org/unicode/reports/tr9/tr9-9.html">UAX                    

        #9:&nbsp;The Bidirectional Algorithm, V3.1.0</a><br>                    

        <a href="http://www.unicode.org/unicode/reports/tr11/tr11-8.html">UAX                    

        #11:&nbsp;East Asian Width, V3.1.0</a><br>                    

        <a href="http://www.unicode.org/unicode/reports/tr13/tr13-8.html">UAX                    

        #13: Unicode Newline Guidelines, V3.1.0</a><br>                    

        <a href="http://www.unicode.org/unicode/reports/tr14/tr14-10.html">UAX                    

        #14: Line Breaking Properties, V3.1.0</a><br>                    

        <a href="http://www.unicode.org/unicode/reports/tr15/tr15-21.html">UAX                    

        #15: Unicode Normalization Forms, V3.1.0</a><br>                    

        <a href="http://www.unicode.org/unicode/reports/tr19/tr19-8.html">UAX                    

        #19: UTF-32, V3.1.0</a></td>                    

    </tr>                    

    <tr>                    

      <th align="left" colspan="4">Unicode Character Database</th>                    

    </tr>                    

    <tr>                    

      <td></td>                    

      <td colspan="2"></td>                    

      <th align="left"><a href="http://www.unicode.org/Public/3.1-Update">http://www.unicode.org/Public/3.1-Update</a>,                    

        or<br>                    

        <a href="ftp://www.unicode.org/Public/3.1-Update/">ftp://www.unicode.org/Public/3.1-Update/</a></th>

    </tr>

    <tr>                    

      <td></td>                    

      <td></td>                    

      <th colspan="2" align="left">Documentation</th>                    

    </tr>                    

    <tr>                    

      <td><i>N</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a                    

        href="http://www.unicode.org/Public/3.1-Update/DerivedProperties-3.1.0.html">DerivedProperties-3.1.0.html</a></td>                    

    </tr>                    

    <tr>                    

      <td><i>-</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a href="http://www.unicode.org/Public/3.0-Update/Index-3.0.0.txt">Index-3.0.0.txt</a></td>                    

    </tr>                    

    <tr>                    

      <td><i>T</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a                    

        href="http://www.unicode.org/Public/3.1-Update/NamesList-3.1.0.html">NamesList-3.1.0.html</a></td>                    

    </tr>                    

    <tr>                    

      <td><i>N</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a href="http://www.unicode.org/Public/3.1-Update/PropList-3.1.0.html">PropList-3.1.0.html</a></td>                    

    </tr>                    

    <tr>                    

      <td><i>T</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a href="http://www.unicode.org/Public/3.1-Update/ReadMe-3.1.0.txt">ReadMe-3.1.0.txt</a></td>

    </tr>

    <tr>                    

      <td><i>T</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a                    

        href="http://www.unicode.org/Public/3.1-Update/UnicodeCharacterDatabase-3.1.0.html">UnicodeCharacterDatabase-3.1.0.html</a></td>                    

    </tr>                    

    <tr>                    

      <td><i>T</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a                    

        href="http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.html">UnicodeData-3.1.0.html</a></td>                    

    </tr>                    

    <tr>                    

      <td></td>                    

      <td></td>                    

      <th colspan="2" align="left">Core Data</th>

    </tr>

    <tr>                    

      <td><i>-</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a                    

        href="http://www.unicode.org/Public/3.0-Update1/ArabicShaping-3.txt">ArabicShaping-3.txt</a></td>                    

    </tr>                    

    <tr>                    

      <td><i>-</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a                    

        href="http://www.unicode.org/Public/3.0-Update1/BidiMirroring-1.txt">BidiMirroring-1.txt</a></td>                    

    </tr>                    

    <tr>                    

      <td><i>D</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a href="http://www.unicode.org/Public/3.1-Update/Blocks-4.txt">Blocks-4.txt</a></td>                    

    </tr>                    

    <tr>                    

      <td><i>D</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a                    

        href="http://www.unicode.org/Public/3.1-Update/CompositionExclusions-3.txt">CompositionExclusions-3.txt</a></td>                    

    </tr>                    

    <tr>                    

      <td><i>D</i></td>                    

      <td></td>                    

      <td></td>                    

      <td><a                    

        href="http://www.unicode.org/Public/3.1-Update/EastAsianWidth-4.txt">EastAsianWidth-4.txt</a></td>                    

    </tr>                    

    <tr>         

      <td><i>-</i></td>                         

      <td></td>                         

      <td></td>                         

      <td><a href="http://www.unicode.org/Public/3.0-Update1/Jamo-3.txt">Jamo-3.txt</a></td>                       

    </tr>        

    <tr>                   

      <td><i>D</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a href="http://www.unicode.org/Public/3.1-Update/LineBreak-6.txt">LineBreak-6.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>D</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a href="http://www.unicode.org/Public/3.1-Update/NamesList-3.1.0.txt">NamesList-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>D</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a href="http://www.unicode.org/Public/3.1-Update/PropList-3.1.0.txt">PropList-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>D</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a href="http://www.unicode.org/Public/3.1-Update/Scripts-3.1.0.txt">Scripts-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>D</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a href="http://www.unicode.org/Public/3.1-Update/SpecialCasing-4.txt">SpecialCasing-4.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>D</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/UnicodeData-3.1.0.txt">UnicodeData-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>D</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a href="http://www.unicode.org/Public/3.1-Update/Unihan-3.1.txt">Unihan-3.1.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td></td>                   

      <td></td>                   

      <th colspan="2" align="left">Derived Data</th>                   

    </tr>                   

    <tr>                   

      <td><i>D</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a href="http://www.unicode.org/Public/3.1-Update/CaseFolding-3.txt">CaseFolding-3.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedBinaryProperties-3.1.0.txt">DerivedBinaryProperties-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedCombiningClass-3.1.0.txt">DerivedCombiningClass-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedCoreProperties-3.1.0.txt">DerivedCoreProperties-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedDecompositionType-3.1.0.txt">DerivedDecompositionType-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedEastAsianWidth-3.1.0.txt">DerivedEastAsianWidth-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedGeneralCategory-3.1.0.txt">DerivedGeneralCategory-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedJoiningGroup-3.1.0.txt">DerivedJoiningGroup-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedJoiningType-3.1.0.txt">DerivedJoiningType-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedLineBreak-3.1.0.txt">DerivedLineBreak-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedNormalizationProperties-3.1.0.txt">DerivedNormalizationProperties-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedNumericType-3.1.0.txt">DerivedNumericType-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td><i>N</i></td>                   

      <td></td>                   

      <td></td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/DerivedNumericValues-3.1.0.txt">DerivedNumericValues-3.1.0.txt</a></td>                   

    </tr>                   

    <tr>                   

      <td></td>                   

      <td></td>                   

      <th colspan="2" align="left">Conformance Test Data</th>                   

    </tr>                   

    <tr>                   

      <td><i>D</i></td>                   

      <td></td>                   

      <td>&nbsp;&nbsp;</td>                   

      <td><a                   

        href="http://www.unicode.org/Public/3.1-Update/NormalizationTest-3.1.0.txt">NormalizationTest-3.1.0.txt</a></td>                   

    </tr>                   

  </table>                   

  <p><b><a name="ChangeStatusKey">Key:</a></b></p>                   

  <table border="1" cellspacing="0" cellpadding="2">                   

    <tr>                   

      <td><i>N</i></td>                   

      <td>New in this release</td>                   

    </tr>                   

    <tr>                   

      <td><i>D</i></td>                   

      <td>Data change (possibly also format/text change)</td>                   

    </tr>                   

    <tr>                   

      <td><i>F</i></td>                   

      <td>Data format change (possibly also text change)</td>                   

    </tr>                   

    <tr>                   

      <td><i>T</i></td>                   

      <td>Text annotation change</td>                   

    </tr>                   

    <tr>                   

      <td><i>-</i></td>                   

      <td>Unchanged</td>                   

    </tr>                   

  </table>                   

</blockquote>                   

<h3>New Character Allocations</h3>                                                                

<p>The primary feature of Unicode 3.1 is the addition of 44,946 new encoded                                                                 

characters. These characters cover several historic scripts, several sets of                                                                 

symbols, and a very large collection of additional CJK ideographs.</p>                                                                

<p>For the first time, characters are encoded beyond the original 16-bit                                                                 

codespace or Basic Multilingual Plane (BMP or Plane 0). These new characters,                                                                 

encoded at code positions of U+10000 or higher, are synchronized with the                                                                 

forthcoming standard ISO/IEC 10646-2. For further information, see <a                                                                

href="#relation">Article IX, Relation to 10646</a>. Unicode 3.1 and 10646-2                                                                 

define three new supplementary planes:</p>                                                                

<ul>                                                                

  <li>Supplementary Multilingual Plane (SMP) U+10000..U+1FFFF</li>                                                                

  <li>Supplementary Ideographic Plane (SIP) U+20000..U+2FFFF</li>                                                                

  <li>Supplementary Special-purpose Plane (SSP) U+E0000..U+EFFFF</li>                                                                

</ul>                                                                
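<p>The plane of a code point follows directly from the ranges above: each plane spans 0x10000 code points, so the plane number is the code point shifted right by 16 bits. The following Python sketch illustrates this; the names <code>plane_of</code>, <code>plane_name</code>, and <code>PLANE_NAMES</code> are hypothetical, and only the BMP and the three supplementary planes listed above are given names.</p>

```python
from typing import Optional

# Plane abbreviations as listed in this section. Planes not listed here
# (for example the private-use Planes 15 and 16) map to None in this sketch.
PLANE_NAMES = {
    0: "BMP",   # Basic Multilingual Plane, U+0000..U+FFFF
    1: "SMP",   # Supplementary Multilingual Plane, U+10000..U+1FFFF
    2: "SIP",   # Supplementary Ideographic Plane, U+20000..U+2FFFF
    14: "SSP",  # Supplementary Special-purpose Plane, U+E0000..U+EFFFF
}


def plane_of(cp: int) -> int:
    """Return the plane number (0..16) of a code point.

    Each plane spans 0x10000 code points, so the plane is the value
    of the bits above the low 16 bits.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is outside the Unicode codespace")
    return cp >> 16


def plane_name(cp: int) -> Optional[str]:
    """Return the abbreviation of the plane containing cp, if listed above."""
    return PLANE_NAMES.get(plane_of(cp))


for cp in (0x0041, 0x10000, 0x20000, 0xE0001):
    print(f"U+{cp:04X}: plane {plane_of(cp)} ({plane_name(cp)})")
```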

<p>The Supplementary Multilingual Plane, or Plane 1, contains several historic

scripts and several sets of symbols: Old Italic, Gothic, Deseret, Byzantine

Musical Symbols, (Western) Musical Symbols, and Mathematical Alphanumeric

Symbols. Together these comprise 1,594 newly encoded characters.</p>

<p>The Supplementary Ideographic Plane, or Plane 2, contains a very large                                                                 

collection of additional unified Han ideographs known as Vertical Extension B,                                                                 

comprising 42,711 characters, as well as 542 additional CJK Compatibility                                                                 

ideographs.</p>                                                                
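<p>As a check on the Extension B figure, here is a minimal Python sketch. It assumes that Extension B occupies the contiguous range U+20000..U+2A6D6 (the endpoints recorded for it in the Unicode Character Database; treat them as an assumption here). The helper <code>range_size</code> is hypothetical.</p>

```python
def range_size(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


# Assumed endpoints of CJK Unified Ideographs Extension B, recorded as a
# First/Last pair of entries in UnicodeData-3.1.0.txt.
EXT_B_FIRST = 0x20000
EXT_B_LAST = 0x2A6D6

print(range_size(EXT_B_FIRST, EXT_B_LAST))  # 42711
```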

<p>The Supplementary Special-purpose Plane, or Plane 14, contains a set of tag                                                                 

characters, 97 in all.</p>                                                                

<p>Complete introductions to the newly encoded scripts and symbols, and to the

additional Han ideographs, can be found in <a href="#block">Article V, Block

Descriptions</a>, below.</p>

<p>In addition, Unicode 3.1 adds two mathematical symbols in the BMP:</p>                                                                

<p>U+03F4 GREEK CAPITAL THETA SYMBOL<br>                                                                

U+03F5 GREEK LUNATE EPSILON SYMBOL</p>                                                                

<p>These two characters are not part of ISO/IEC 10646-2, but are among the                                                                 

additions in the forthcoming Amendment 1 to ISO/IEC 10646-1:2000. They are                                                                 

included in Unicode 3.1 so that decompositions for the Mathematical Alphanumeric                                                                 

Symbols can be internally consistent.</p>                                                                
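<p>This relationship can be inspected with Python's standard <code>unicodedata</code> module. The following is an illustrative sketch only: the module reflects the Unicode Character Database version the interpreter was built with, which is later than 3.1, and it is assumed here that these particular entries are unchanged since 3.1.</p>

```python
import unicodedata

# The two BMP symbols added in Unicode 3.1.
for cp in (0x03F4, 0x03F5):
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")

# Two Mathematical Alphanumeric Symbols whose compatibility decompositions
# target those BMP symbols. decomposition() returns an optional bracketed
# formatting tag followed by hex code points; keep only the last field.
for name in ("MATHEMATICAL BOLD CAPITAL THETA SYMBOL",
             "MATHEMATICAL BOLD EPSILON SYMBOL"):
    ch = unicodedata.lookup(name)
    target = int(unicodedata.decomposition(ch).split()[-1], 16)
    print(f"U+{ord(ch):04X} {name} decomposes to U+{target:04X}")
```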

<p>Counting the additions to the three supplementary planes and the two                                                                 

characters on the BMP, Unicode 3.1 adds 44,946 new encoded characters. Together                                                                 

with the 49,194 already existing characters in Unicode 3.0, that comes to a                                                                 

grand total of 94,140 encoded characters in Unicode 3.1.</p>

<p>Of those 94,140 characters, 70,207 are unified Han ideographs and an

additional 832 are CJK Compatibility ideographs; together they account for

slightly more than 75% of the encoded characters in the standard.</p>
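<p>The character counts given in this section can be cross-checked with a short Python sketch. All figures are taken from the text above; the dictionary labels and variable names are only descriptive.</p>

```python
# Counts of newly encoded characters, as stated in this section.
NEW_BY_AREA = {
    "Plane 1 (SMP) scripts and symbols": 1594,
    "Plane 2 (SIP) unified ideographs": 42711,
    "Plane 2 (SIP) compatibility ideographs": 542,
    "Plane 14 (SSP) tag characters": 97,
    "BMP mathematical symbols": 2,
}
UNICODE_3_0_TOTAL = 49194  # characters already encoded in Unicode 3.0
UNIFIED_HAN_3_1 = 70207    # unified Han ideographs in Unicode 3.1
COMPAT_HAN_3_1 = 832       # CJK Compatibility ideographs in Unicode 3.1

new_total = sum(NEW_BY_AREA.values())
grand_total = UNICODE_3_0_TOTAL + new_total
han_share = (UNIFIED_HAN_3_1 + COMPAT_HAN_3_1) / grand_total

print(f"new: {new_total:,}")          # new: 44,946
print(f"total: {grand_total:,}")      # total: 94,140
print(f"Han share: {han_share:.2%}")  # Han share: 75.46%
```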

<p>In addition, 32 more code points have been allocated as noncharacters. For                                                                 

more information, see <a href="#conformance">Article III, Conformance</a>.</p>                                                                

<p>See <a href="#charts">Article VI, Code Charts</a>, for links to online charts                                                                 

of the new characters for Unicode 3.1.</p>                                                                

<h3>Additional Features of Unicode 3.1</h3>                                                                

<p>Unicode 3.1 also features amended contributory data files, to bring the data                                                                 

files up to date with the much expanded repertoire of characters. A summary

of the new data files and changes to old data files can be found in <a                                                                

href="#database">Article VIII, Unicode Character Database Changes</a>. A                                                                 

complete specification of the contributory data files constituting the Unicode                                                                 

Standard, Version 3.1 can be found in <a                                                                

href="../../standard/versions/enumeratedversions.html">Enumerated Versions</a>.</p>                                                                

<p>All errata and corrigenda to Unicode 3.0 and Unicode 3.0.1 are included in                                                                 

this specification. Major corrigenda and other changes having a bearing on                                                                 

conformance to the standard are listed in <a href="#conformance">Article III,                                                                 

Conformance</a>. Other minor errata are listed in <a href="#errata">Article VII,                                                                 

Errata</a>.</p>                                                                

<p>Most notable among the corrigenda to the standard is a tightening of the                                                                 

definition of UTF-8, to eliminate a possible security issue with                                                                 

non-shortest-form UTF-8.</p>                                                                

<h3>Conventions Used in this Document</h3>                                                                

<p>The sections of this document are referred to as &quot;articles&quot; to                                                                 

prevent confusion with references to sections of <i>The Unicode Standard,                                                                 

Version 3.0</i>. In addition, the articles in this document are numbered with                                                                 

Roman numerals, to highlight the distinction. The word &quot;section&quot;                                                                 

always refers to sections of <i>The Unicode Standard, Version 3.0</i>. Page                                                                 

numbers also refer to <i><a href="../../uni2book/u2ord.html">The Unicode                                                                 

Standard, Version 3.0</a></i>.</p>                                                                

<p>New or replacement text for the standard is indicated with <u>underlined</u>                                                                 

text, when this new text is a corrigendum of an existing section of the                                                                 

standard.</p>                                                                

<p>Deleted text from the standard is indicated with <strike>struck-through</strike>                                                                 

text.</p>                                                                

<p>In instances where entire new sections or subsections are to be added to the                                                                 

standard, as for the block descriptions for newly encoded scripts or symbol                                                                 

sets, new section numbers are provided that interleave reasonably with the                                                                 

existing sections of the published Unicode 3.0 book. The text of these added
sections is not underlined, since the entire sections are new.</p>

<p>In this document, unambiguous dates of the current common era, such as 1999,                                                                 

are unlabeled. In cases of ambiguity, CE is used. Dates before the common era                                                                 

are labeled with BCE.</p>                                                                

<p>Some of the characters in Article V, Block Descriptions, are Greek and may

not be displayed by all browsers. For assistance, see <a                                                                

href="../../../help/display_problems.html">Display Problems</a>.</p>                                                                

<h2 class="bb"><a name="notation">II Notational Changes for the Standard</a></h2>

<p><b>Section 0.2 Notational Conventions,</b> page <i>xxviii:</i> change the                                                                 

description of the U+ notation to read:</p>                                                                

<blockquote>                                                                

  <p><u>In running text, an individual Unicode code point can be expressed as U+<i>n</i>,                                                                 

  where <i>n</i> is from four to six hexadecimal digits, using the digits 0-9                                                                 

  and A-F (for 10 through 15, respectively). There should be no leading zeros,                                                                 

  unless the code point would have fewer than four hexadecimal digits; for

  example, U+0001, U+0012, U+0123, U+1234, U+12345, U+102345.</u></p>                                                                

</blockquote>                                                                
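<p>The revised U+ convention can be illustrated with a short Python 3 sketch. The function names to_u_plus and from_u_plus are hypothetical and not part of any Unicode API. The parser assumes uppercase hexadecimal digits, as in the examples above, and rejects leading zeros beyond the four-digit minimum.</p>

```python
import re

# "U+" followed by four to six uppercase hexadecimal digits.
_U_PLUS = re.compile(r"U\+([0-9A-F]{4,6})")

def to_u_plus(cp: int) -> str:
    """Format a code point as U+n: at least four hex digits, no extra leading zeros."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # "04X" pads short values to four digits and never adds zeros beyond that.
    return f"U+{cp:04X}"

def from_u_plus(text: str) -> int:
    """Parse U+n notation, rejecting leading zeros beyond the four-digit minimum."""
    m = _U_PLUS.fullmatch(text)
    if m is None:
        raise ValueError(f"not U+ notation: {text!r}")
    digits = m.group(1)
    if len(digits) > 4 and digits[0] == "0":
        raise ValueError(f"unnecessary leading zero: {text!r}")
    cp = int(digits, 16)
    if cp > 0x10FFFF:
        raise ValueError(f"beyond U+10FFFF: {text!r}")
    return cp

for cp in (0x1, 0x12, 0x123, 0x1234, 0x12345, 0x102345):
    print(to_u_plus(cp))
```

<p>Because the format specifier pads to four digits but never truncates, supplementary code points such as U+12345 and U+102345 come out with five or six digits and no padding.</p>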

<p><b>Section 0.2 Notational Conventions</b>, page <i>xxviii</i>: replace the                                                                 

paragraph starting &quot;A sequence of characters&quot; with the following text:</p>                                                                

<blockquote>                                                                

  <p><u>A sequence of two or more code points may be represented by a comma-delimited list,                                                                 

  set off by angle brackets. For this purpose angle brackets consist of U+003C                                

  LESS-THAN SIGN and U+003E GREATER-THAN SIGN. Spaces are optional after the                                

  comma, and U+ notation for the code point is also optional. A sequence                                

  identified with this notation is called a Unicode Sequence Identifier (USI).</u></p>                                                               

  <p><u>When the usage is clear from the context, a sequence of characters may                                

  also be represented with generic short names, for example as in &quot;&lt;a,                                

  grave&gt;&quot;, or the angle brackets may be omitted.</u></p>                                                               

  <p><u>In contrast to sequences of code points, a sequence of one or more code <i>                                                                

  units</i> may be represented by a list set off by angle brackets, but without                                                                 

  comma delimitation or U+ notation. For example, the notation &quot;&lt;nn nn nn nn&gt;&quot;                                                                 

  represents a sequence of bytes, as for the UTF-8 encoding form of a Unicode                                                                 

  character. The notation &quot;&lt;nnnn nnnn&gt;&quot; represents a sequence of                                                                 

  16-bit code units, as for the UTF-16 encoding form of a Unicode character. In                                                                 

  the text, the angle brackets are occasionally omitted from this notation when                                                                 

  the usage is clear in context.</u></p>                                                                

  <p><u>In other environments, such as programming languages or markup,

  alternative notation for sequences of code points or code units may be used.</u></p>                                                                

</blockquote>                                                                
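<p>The difference between the two sequence notations (comma-delimited code points in U+ notation versus space-separated code units) can be sketched in Python 3.9 or later. The helpers format_usi, parse_usi, and format_code_units are hypothetical names used only for illustration. The parser accepts the optional space after each comma and the optional U+ prefix, and it assumes uppercase hexadecimal digits.</p>

```python
import re

# One USI element: an optional "U+" prefix, then four to six uppercase hex digits.
_USI_ITEM = re.compile(r"(?:U\+)?([0-9A-F]{4,6})")

def format_usi(cps: list[int]) -> str:
    """Format two or more code points as a USI, using U+ notation and ", " separators."""
    if len(cps) < 2:
        raise ValueError("a USI lists two or more code points")
    return "<" + ", ".join(f"U+{cp:04X}" for cp in cps) + ">"

def parse_usi(text: str) -> list[int]:
    """Parse a USI; spaces after each comma and the U+ prefix are optional."""
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError(f"missing angle brackets: {text!r}")
    cps = []
    for item in text[1:-1].split(","):
        m = _USI_ITEM.fullmatch(item.lstrip(" "))
        if m is None:
            raise ValueError(f"bad element {item!r} in {text!r}")
        cps.append(int(m.group(1), 16))
    if len(cps) < 2:
        raise ValueError("a USI lists two or more code points")
    return cps

def format_code_units(units: list[int], hex_digits: int) -> str:
    """Code units in angle brackets, space-separated, without commas or U+."""
    return "<" + " ".join(f"{u:0{hex_digits}X}" for u in units) + ">"

print(format_usi([0x0061, 0x0300]))                            # a + combining grave
print(parse_usi("<0061,U+0300>"))                              # [97, 768]
print(format_code_units(list("\u00E9".encode("utf-8")), 2))    # UTF-8 bytes of U+00E9
print(format_code_units([0xD801, 0xDC00], 4))                  # UTF-16 units of U+10400
```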

<h2 class="bb"><a name="conformance">III Conformance</a></h2>                                                                

<h3>0.1 About the Unicode Standard (revision)</h3>                                                                

<p>On page <i>xxvii</i>, in the section &quot;The Unicode Character
Database and Technical Reports,&quot; the paragraph beginning &quot;The
following Unicode Technical Reports...&quot; is updated to read as follows:</p>

<blockquote>                                                                             

  <p>The following Unicode <strike>Technical Reports</strike> <u>Standard Annexes</u>

  are formally part of this standard:</p>                                                                             

  <ul>                                                                             

    <li><u>UAX #9: The Bidirectional Algorithm, Version 3.1.0</u></li>                                                                            

    <li><strike>UTR</strike> <u>UAX</u> #11: East Asian Width, Version <strike>5.0</strike>                                                                              

      <u>3.1.0</u></li>                                                                            

    <li><strike>UTR</strike> <u>UAX</u> #13: Unicode Newline Guidelines, Version                                                                              

      <strike>5.0</strike> <u>3.1.0</u></li>                                                                            

    <li><strike>UTR</strike> <u>UAX</u> #14: Line Breaking Properties, Version <strike>6.0</strike>                                                                              

      <u>3.1.0</u></li>                                                                            

    <li><strike>UTR</strike> <u>UAX</u> #15: Unicode Normalization Forms,                                                                              

      Version <strike>18.0</strike> <u>3.1.0</u></li>                                                                            

    <li><u>UAX #19: UTF-32, Version 3.1.0</u></li>                                                                            

  </ul>                                                                            

</blockquote>                                                                            

<h3>3.1 Conformance Requirements (revision)</h3>                                                                            

<p>There are three major changes to the conformance clauses of the Unicode                                                                             

Standard for Version 3.1. The first of these is the addition of new                                                                             

noncharacters and a clarification regarding noncharacter status. The second is a                                                                             

major corrigendum to the definition of UTF-8 to address security issues. The                                                                             

third change is that UTF-32 is now part of the standard. There are additional                                                                             

normative changes in Unicode 3.1 that have implications for conformance. These                                                                             

are described in <a href="#database">Article VIII, Unicode Character Database                                                                             

Changes</a>, and in <a href="#layout">Section 13.2 Layout Controls</a> of                                                                             

Article V, Block Descriptions.</p>                                                                            

<h3>Stability of the Standard</h3>                                                                            

<p>In <i>Section 3.1, Conformance Requirements</i> on page 37, add the following                                                                             

paragraph immediately after the first paragraph and before the subsection,                                                                             

&quot;Byte Ordering&quot;:</p>                                                                            

<blockquote>                                                                            

  <p><u>Each version of the Unicode Standard, once published, is absolutely                                                                             

  stable and will <i>never</i> change. Implementations or specifications that                                                                             

  refer to a specific version of the Unicode Standard can rely upon this                                                                             

  stability. If future versions of these implementations or specifications                                                                             

  upgrade to a future version of the Unicode Standard, then some changes may be                                                                             

  necessary.</u></p>                                                                            

</blockquote>                                                                            

<h3>Interpretation of Unicode Code Units</h3>                                                                            

<p>To clarify the interpretation of Unicode code units in the context of the                                                                             

transformation formats, conformance clause C1 has been reworded:</p>                                                                            

<blockquote>                                                                            

  <table border="0" cellspacing="10" cellpadding="0">                                                                            

    <tr>                                                                            

      <td valign="top">C1</td>                                                                            

      <td valign="top">&nbsp;A process shall interpret the Unicode code <strike>values                                                                              

        as 16-bit quantities</strike> <u>units in accordance with the Unicode                                                                              

        Transformation Format used</u>.</td>                                                                             

    </tr>                                                                             

  </table>                                                                             

  <ul>                                                                             

    <li><strike>Unicode values can be stored in native 16-bit machine words.</strike></li>                                                                             

    <li><u>The Unicode Standard defines code points (scalar values) that can be                                                                              

      encoded in any of three transformation formats (encoding forms): UTF-8,                                                                              

      UTF-16, or UTF-32.</u></li>                                                                             

    <li>For information on the use of wchar_t or other programming language                                                                              

      types to represent Unicode <strike>values</strike> <u>code units</u>, see <i>Section                                                                              

      5.2, ANSI/ISO C wchar_t</i>.</li>                                                                             

  </ul>                                                                             

</blockquote>                                                                             
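<p>Clause C1 and its notes can be made concrete with a Python 3.9+ sketch that lists the code units of one code point in each encoding form, using Python's built-in codecs. The helper name code_units is hypothetical. The example uses U+10400 DESERET CAPITAL LETTER LONG I, one of the supplementary characters added in Unicode 3.1, which needs four UTF-8 code units, two UTF-16 code units (a surrogate pair), and one UTF-32 code unit.</p>

```python
def code_units(cp: int) -> dict[str, list[int]]:
    """Return the code units encoding one code point in UTF-8, UTF-16 and UTF-32."""
    ch = chr(cp)  # raises ValueError outside 0..0x10FFFF
    # Python's codecs raise UnicodeEncodeError for surrogate code points,
    # which are not scalar values and so have no encoding of their own.
    utf8 = list(ch.encode("utf-8"))                    # 8-bit code units
    b16 = ch.encode("utf-16-be")                       # big-endian, no BOM
    utf16 = [int.from_bytes(b16[i:i + 2], "big") for i in range(0, len(b16), 2)]
    b32 = ch.encode("utf-32-be")
    utf32 = [int.from_bytes(b32, "big")]               # one 32-bit code unit
    return {"UTF-8": utf8, "UTF-16": utf16, "UTF-32": utf32}

HEX_DIGITS = {"UTF-8": 2, "UTF-16": 4, "UTF-32": 8}
for form, units in code_units(0x10400).items():
    print(form, "<" + " ".join(f"{u:0{HEX_DIGITS[form]}X}" for u in units) + ">")
```

<p>A process conforming to C1 reads each list as code units of its own encoding form: in UTF-16, the two units D801 and DC00 together denote a single code point, not two.</p>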

<h3>Noncharacters</h3>                                                                             

<p>Unicode 3.0 designates 34 specific code points as <i>noncharacters</i>.
Unicode 3.1 adds 32 more. To clarify the status of all

66, a definition (page 41) is added, and conformance rules C5 and C10 (pages 38,                                                                              

39) are amended as follows:</p>                                                                             

<blockquote>                                                                             

  <table border="0" cellspacing="10" cellpadding="0">                                                                             

    <tr>                                                                             

      <td valign="top"><u>D7b</u></td>                                                                             

      <td valign="top"><u><i>Noncharacter:</i> a code point that is permanently                                                                              

        reserved for internal use, and that should never be interchanged. In                                                                              

        Unicode 3.1, these consist of the values U+<i>n</i>FFFE and U+<i>n</i>FFFF                                                                              

        (where <i>n</i> is from 0 to 10<sub>16</sub>) and the values                                                                              

        U+FDD0..U+FDEF.</u></td>                                                                             

    </tr>                                                                             

  </table>                                                                             

  <ul>                                                                             

    <li><u>For more information, see the discussions under &quot;Special                                                                              

      Noncharacter Values&quot; in <i>Section 2.7, Special Character and                                                                              

      Noncharacter Values, </i>and under &quot;Noncharacters&quot; in <i>Section                                                                              

      13.6, Specials</i>.</u></li>                                                                             

    <li><u>These code points are permanently reserved as noncharacters.
      Additional code points may be designated as noncharacters in the
      future.</u></li>

  </ul>                                                                             

  <table border="0" cellspacing="10" cellpadding="0">                                                                             

    <tr>                                                                             

      <td valign="top">C5</td>                                                                             

      <td valign="top">A process shall not interpret <strike>either U+FFFE or                                                                              

        U+FFFF</strike> <u>a <i>noncharacter</i> code point</u> as an abstract                                                                              

        character.</td>                                                                             

    </tr>                                                                             

  </table>                                                                             

  <ul>                                                                             

    <li><u>The code points may be used internally, such as for sentinel values                                                                              

      or delimiters, but should not be exchanged publicly.</u></li>                                                                             

  </ul>                                                                             

  <table border="0" cellspacing="10" cellpadding="0">                                                                             

    <tr>                                                                             

      <td valign="top">C10</td>                                                                             

      <td valign="top">A process shall make no change in a valid coded character                                                                              

        representation other than the possible replacement of character                                                                              

        sequences by their canonical-equivalent sequences <u>or the

        deletion of <i>noncharacter</i> code points</u>, if that process                                                                              

        purports not to modify the interpretation of that coded character                                                                              

        sequence.</td>                                                                             

    </tr>                                                                             

  </table>                                                                             

  <ul>                                                                             

    <li><u>If a noncharacter which does not have a specific internal use is                                                                              

      unexpectedly encountered in processing, an implementation may signal an                                                                              

      error or delete or ignore the noncharacter. If these options are not                                                                              

      taken, the noncharacter should be treated as an unassigned code point. For                                                                              

      example, an API that returned a character property value for a                                                                              

      noncharacter would return the same value as the default value for an                                                                              

      unassigned code point.</u></li>                                                                             

  </ul>                                                                             

</blockquote>                                                                             
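<p>Definition D7b reduces to a simple range test. The Python sketch below uses hypothetical helper names. delete_noncharacters shows only one of the options that C10 and its note allow (deletion); a process that uses some noncharacters internally, for example as sentinels, would need to preserve those instead of deleting them.</p>

```python
def is_noncharacter(cp: int) -> bool:
    """True for the 66 noncharacters of Unicode 3.1 (definition D7b)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # U+FDD0..U+FDEF: the 32 noncharacters added in Unicode 3.1.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # U+nFFFE and U+nFFFF for planes n = 0..0x10: the low 16 bits are FFFE or FFFF.
    return (cp & 0xFFFE) == 0xFFFE

def delete_noncharacters(text: str) -> str:
    """One option under C10: drop noncharacters that have no internal use here."""
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))

print(sum(is_noncharacter(cp) for cp in range(0x110000)))   # 66
print(delete_noncharacters("a\uFFFEb\U0010FFFFc"))          # abc
```

<p>The count of 66 is the 34 plane-final pairs (two per plane, 17 planes) plus the 32 code points U+FDD0..U+FDEF.</p>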

<h3>UTF-8 Corrigendum</h3>                                                                             

<p>The current conformance clause C12 in <a                                                                             

href="http://www.unicode.org/unicode/uni2book/u2.html"><i>The Unicode Standard,                                                                              

Version 3.0</i></a> forbids the <i>generation</i> of &quot;non-shortest                                                                              

form&quot; UTF-8, and forbids the <i>interpretation</i> of illegal sequences,                                                                              

but not the interpretation of &quot;non-shortest form&quot; sequences. Where software does
interpret the non-shortest forms, security issues can arise. For example:</p>

<ul>                                                                             

  <li>Process <i>A</i> performs security checks, but does not check for                                                                              

    non-shortest forms.</li>                                                                             

  <li>Process <i>B</i> accepts the byte sequence from process <i>A</i>, and                                                                              

    transforms it into UTF-16 while interpreting non-shortest forms.</li>                                                                             

  <li>The UTF-16 text may then contain characters that should have been filtered                                                                              

    out by process <i>A</i>.</li>                                                                             

</ul>                                                                             
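<p>The scenario above can be reproduced with a Python sketch. Both functions are hypothetical stand-ins for processes <i>A</i> and <i>B</i>. Process <i>A</i> refuses any input containing the byte 2F (SOLIDUS), while process <i>B</i> uses a deliberately lenient decoder that omits the shortest-form check. The non-shortest sequence &lt;C0 AF&gt; then passes <i>A</i> and comes out of <i>B</i> as &quot;/&quot;.</p>

```python
def process_a_allows(data: bytes) -> bool:
    """Hypothetical process A: reject any input containing the byte 0x2F ('/')."""
    return 0x2F not in data

def lenient_decode(data: bytes) -> str:
    """Hypothetical process B: decodes ASCII and two-byte UTF-8 sequences only,
    and deliberately skips the shortest-form check."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            # Flaw: the result is never compared with 0x80, the smallest value
            # that needs two bytes, so the bytes C0 AF decode to U+002F.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError(f"byte {b:#04x} at offset {i} not handled by this sketch")
    return "".join(out)

payload = b"..\xc0\xafetc\xc0\xafpasswd"
print(process_a_allows(payload))    # True: no 0x2F byte is present
print(lenient_decode(payload))      # ../etc/passwd
try:
    payload.decode("utf-8")         # Python's strict codec refuses the non-shortest form
except UnicodeDecodeError as err:
    print("strict decoder:", err.reason)
```

<p>Python's own UTF-8 codec rejects &lt;C0 AF&gt;, which is the behavior the corrigendum requires of a conformant decoder.</p>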

<p>To address this issue, the Unicode Technical Committee has modified the                                                                              

definition of UTF-8 to forbid conformant implementations from interpreting                                                                              

non-shortest forms for <a href="http://www.unicode.org/glossary/#BMP_character">BMP                                                                              

characters</a>, and has clarified some of the conformance clauses.</p>

<p><i>These modifications make use of updated notation: see the <a                                                                             

href="http://www.unicode.org/glossary">Glossary</a> for any unfamiliar terms.</i></p>                                                                             

<p><i><b>Change C12 to the following:</b></i></p>

<table border="0" cellspacing="6" cellpadding="0">                                                                             


  <tr>                                                                             

    <td align="CENTER" valign="TOP">C12</td>                                                                             

    <td align="LEFT" valign="TOP"><u>(a)</u> When a process generates data in a                                                                              

      Unicode Transformation Format, it shall not emit ill-formed <strike>byte</strike>                                                                              

      <u>code unit</u> sequences.<br>                                                                             

      <u>(b)</u> When a process interprets data in a Unicode Transformation                                                                              

      Format, it shall treat illegal <strike>byte</strike> <u>code unit</u>                                                                              

      sequences as an error condition.<br>                                                                             

      <u>(c) A conformant process shall not interpret illegal UTF code unit                                                                              

      sequences as characters.<br>                                                                             

      (d) Irregular UTF code unit sequences shall not be used for encoding any                                                                              

      other information.</u></td>                                                                             

  </tr>                                                                             

</table>                                                                             
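<p>On the interpreting side of C12, a decoder has to distinguish well-formed UTF-8 from illegal sequences. The Python sketch below (the function name utf8_errors is hypothetical) decodes each multibyte sequence and flags it when it is truncated, not in shortest form, or beyond U+10FFFF. It also rejects encoded surrogate code points U+D800..U+DFFF, as the current definition of UTF-8 does; that check is stricter than the non-shortest-form rule introduced by this corrigendum.</p>

```python
def utf8_errors(data: bytes) -> list[int]:
    """Return offsets where an ill-formed sequence starts; [] means well-formed."""
    errors, i, n = [], 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            i += 1
            continue
        if 0xC0 <= lead <= 0xDF:
            length, cp, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:
            length, cp, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF7:
            length, cp, minimum = 4, lead & 0x07, 0x10000
        else:  # stray continuation byte, or a lead byte of an obsolete 5/6-byte form
            errors.append(i)
            i += 1
            continue
        tail = data[i + 1:i + length]
        if len(tail) < length - 1 or any(not 0x80 <= b <= 0xBF for b in tail):
            errors.append(i)  # truncated or interrupted sequence
            i += 1
            continue
        for b in tail:
            cp = (cp << 6) | (b & 0x3F)
        # Shortest form, Unicode range, and (as in current UTF-8) no surrogates.
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            errors.append(i)
        i += length
    return errors

print(utf8_errors("A\u00e9\U00010400".encode("utf-8")))   # []
print(utf8_errors(b"..\xc0\xafetc"))                       # [2]
```

<p>Reporting offsets rather than silently substituting characters keeps the decision with the caller, which may signal an error as C12(b) requires.</p>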

<p><i><b>Add the following notes after C12:</b></i></p>

<ul>                                                                             

  <li><u>The definition of each UTF specifies the illegal code unit sequences in                                                                              

    that UTF. For example, the definition of UTF-8 (D36) specifies that code                                                                              

    unit sequences such as &lt;C0 AF&gt; are illegal.</u></li>                                                                             

  <li><u>Internally, a particular function might be used that does not check for                                                                              

    illegal code unit sequences. However, a conformant process can use that                                                                              

    function <b>only</b> on data that has already been certified to not contain                                                                              

    any illegal code unit sequences.</u></li>                                                                             

  <li><u>Processes that require unique representation must not interpret                                                                              

    irregular UTF code unit sequences as characters. They may, for example,                                                                              

    reject or remove those sequences.</u></li>                                                                             

  <li><u>Processes may transform irregular code unit sequences into the                                                                              

    equivalent well-formed code unit sequences.</u></li>                                                                             

  <li><u>Conformant processes cannot interpret illegal code unit sequences.                                                                              

    However, the conformance clauses do not prevent utility                                                                              

    programs from operating on &quot;mangled&quot; text. For example, a UTF-8                                                                              

    file could have had CRLF sequences introduced at every 80 bytes by a bad                                                                              

    mailer program. This could result in some UTF-8 byte sequences being                                                                              

    interrupted by CRLFs, producing illegal byte sequences. This mangled text is                                                                              

    no longer UTF-8. It is permissible for a conformant program to repair such                                                                              

    text, recognizing that the mangled text originally consisted of well-formed UTF-8                                                                              

    byte sequences. However, such repair of mangled data is a special case, and                                                                              

    must not be used in circumstances where it would cause security problems.</u></li>                                                                             

</ul>                                                                             
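<p>The repair described in the last note can be sketched as follows. This is a minimal Python illustration, not a procedure defined by the standard. The function names <code>repair_crlf_mangled_utf8</code> and <code>expected_trailing_bytes</code> are hypothetical, and the sketch assumes that any CR LF pair found where a trailing byte is still expected was injected by the mailer. CR LF pairs between complete characters are kept as genuine line breaks. The repair step does not certify its output, so the result must still be checked for illegal sequences before it is interpreted.</p>

```python
def expected_trailing_bytes(lead: int) -> int:
    """Trailing bytes implied by a lead byte (0 for single bytes or non-lead bytes)."""
    if 0xC0 <= lead <= 0xDF:
        return 1
    if 0xE0 <= lead <= 0xEF:
        return 2
    if 0xF0 <= lead <= 0xF7:
        return 3
    return 0


def repair_crlf_mangled_utf8(data: bytes) -> bytes:
    """Drop CR LF pairs that interrupt a multi-byte UTF-8 sequence.

    Hypothetical helper: assumes such a CR LF was injected by a mailer.
    The result may still contain illegal sequences and must be validated.
    """
    out = bytearray()
    pending = 0  # trailing bytes still expected for the current character
    i = 0
    while i < len(data):
        if pending and data[i:i + 2] == b"\r\n":
            i += 2  # injected line break inside a character: skip it
            continue
        byte = data[i]
        if pending and 0x80 <= byte <= 0xBF:
            pending -= 1  # an expected trailing byte
        else:
            pending = expected_trailing_bytes(byte)  # start of a new character
        out.append(byte)
        i += 1
    return bytes(out)


repaired = repair_crlf_mangled_utf8(b"caf\xc3\r\n\xa9 ok\r\n")
print(repaired)  # b'caf\xc3\xa9 ok\r\n'
```

<p>Only line breaks that fall inside a character are removed; the CR LF at the end of the sample survives because no trailing byte is pending there.</p>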

<i><b>Delete the second sentence in the note under D32:</b></i>                                                                             

<blockquote>                                                                             

  <p><strike>For example, UTF-8 allows nonshortest code value sequences to be                                                                              

  interpreted: a UTF-8 conformant process may map the code value sequence C0 80                                                                              

  (11000000<sub>2</sub> 10000000<sub>2</sub>) to the Unicode value U+0000, even                                                                              

  though a UTF-8 conformant process shall <i>never</i> generate that code value                                                                              

  sequence -- it shall generate the sequence 00 (00000000<sub>2</sub>) instead.</strike>                                                                             

</blockquote>                                                                             

<p><b><i>Modify D36 as follows, and add a note:</i><br>                                                                             

</b>&nbsp;                                                                             

<table border="0" cellspacing="6" cellpadding="0">                                                                             

  <tr>                                                                             

    <td align="CENTER" valign="TOP">D36</td>                                                                             

    <td align="LEFT" valign="TOP"><u>(a)</u> UTF-8 is the Unicode Transformation                                                                              

      Format that serializes a Unicode code point as a sequence of one to four                                                                              

      bytes, as specified in <i>Table 3.1, UTF-8 Bit Distribution.</i><br>                                                                             

      <u>(b) An illegal UTF-8 code unit sequence is any byte sequence that does                                                                              

      not match the patterns listed in <i>Table 3.1B, Legal UTF-8 Byte Sequences</i>.<br>

      (c) An irregular UTF-8 code unit sequence is a six-byte sequence where                                                                              

      the first three bytes correspond to a high surrogate, and the next three                                                                              

      bytes correspond to a low surrogate. As a consequence of C12, these                                                                              

      irregular UTF-8 sequences shall not be generated by a conformant process.</u></td>                                                                             

  </tr>                                                                             

</table>                                                                             

<ul>                                                                             

  <li>In UTF-8, &lt;004D, 0061, 0072, 006B&gt; is serialized as &lt;4D 61 72                                                                              

    6B&gt;.</li>                                                                             

  <li><u>The problematic &quot;non-shortest form&quot; byte sequences in UTF-8                                                                              

    were those where BMP characters could be represented in more than one way.                                                                              

    These sequences are illegal, since they are not allowed by Table 3.1B.</u></li>                                                                             

</ul>                                                                             
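<p>The notes after C12 permit a process to transform irregular code unit sequences into their well-formed equivalents. For UTF-8, this means replacing the six-byte form of D36(c) with the four-byte form from Table 3.1. A minimal Python sketch follows. The function name <code>regularize_utf8</code> and its helpers are hypothetical, and the sketch assumes that a high-surrogate encoding &lt;ED A0..AF 80..BF&gt; immediately followed by a low-surrogate encoding &lt;ED B0..BF 80..BF&gt; is always an irregular pair. All other bytes are copied unchanged.</p>

```python
def _decode_three(seq: bytes) -> int:
    """Decode 1110zzzz 10yyyyyy 10xxxxxx into a 16-bit value."""
    return ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)


def _encode_four(cp: int) -> bytes:
    """Encode a supplementary code point as 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx."""
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def _is_trail(byte: int) -> bool:
    return 0x80 <= byte <= 0xBF


def regularize_utf8(data: bytes) -> bytes:
    """Replace irregular six-byte surrogate-pair sequences with four-byte UTF-8."""
    out = bytearray()
    i = 0
    while i < len(data):
        s = data[i:i + 6]
        if (len(s) == 6 and s[0] == 0xED and 0xA0 <= s[1] <= 0xAF and _is_trail(s[2])
                and s[3] == 0xED and 0xB0 <= s[4] <= 0xBF and _is_trail(s[5])):
            high, low = _decode_three(s[:3]), _decode_three(s[3:])
            cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
            out += _encode_four(cp)
            i += 6
        else:
            out.append(data[i])  # not an irregular pair: copy the byte as-is
            i += 1
    return bytes(out)


irregular = b"\xed\xa0\x80\xed\xbc\xb0"  # U+D800 and U+DF30, each encoded separately
print(regularize_utf8(irregular).hex(" "))  # f0 90 8c b0
```

<p>Because ED can never be a trailing byte, scanning one byte at a time cannot mistake the middle of another character for the start of an irregular pair.</p>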

<p><i><b>Retain the paragraph and table immediately below D36, but replace the                                                                              

last sentence in the paragraph.</b></i></p>                                                                             

<blockquote>                                                                             

  <p>Table 3.1 specifies the bit distribution from a Unicode character (or                                                                              

  surrogate pair) into the one- to four-byte values of the corresponding UTF-8                                                                              

  sequence. Note that the four-byte form for surrogate pairs involves an                                                                              

  addition of 10000<sub>16</sub>, to account for the starting offset to the                                                                              

  encoded values referenced by surrogates. <u>For a discussion of the difference                                                                              

  in the formulation of UTF-8 in ISO/IEC 10646, see Section C.3, UCS                                                                              

  Transformation Formats.</u><strike> The definition of UTF-8 in Annex D of ISO/IEC                                                                              

  10646-1:2000 also allows for the use of five- and six-byte sequences to encode                                                                              

  characters that are outside the range of the Unicode character set; those                                                                              

  five- and six-byte sequences are illegal for the use of UTF-8 as a                                                                              

  transformation of Unicode characters.</strike></p>                                                                             

  <div align="center">                                                                             

    <center>                                                                             

    <table border="1" cellspacing="0" cellpadding="2">                                                                             

      <caption><b><font size="4">Table 3.1. UTF-8 Bit Distribution</font></b></caption>                                                                             

      <tr>                                                                             

        <th valign="top" style="background-color: #990000"><font color="#FFFFFF">Scalar                                                                              

          Value</font></th>                                                                             

        <th valign="top" style="background-color: #990000"><font color="#FFFFFF">UTF-16</font></th>                                                                             

        <th valign="top" style="background-color: #990000"><font color="#FFFFFF">1st                                                                              

          Byte</font></th>                                                                             

        <th valign="top" style="background-color: #990000"><font color="#FFFFFF">2nd                                                                              

          Byte</font></th>                                                                             

        <th valign="top" style="background-color: #990000"><font color="#FFFFFF">3rd                                                                              

          Byte</font></th>                                                                             

        <th valign="top" style="background-color: #990000"><font color="#FFFFFF">4th                                                                              

          Byte</font></th>                                                                             

      </tr>                                                                             

      <tr>                                                                             

        <td valign="top"><code><font size="2">00000000 0xxxxxxx</font></code></td>                                                                             

        <td valign="top"><code><font size="2">00000000 0xxxxxxx</font></code></td>                                                                             

        <td valign="top"><code><font size="2">0xxxxxxx</font></code></td>                                                                             

        <td valign="top"><font size="2">&nbsp;</font></td>                                                                             

        <td valign="top">&nbsp;</td>                                                                             

        <td valign="top">&nbsp;</td>                                                                             

      </tr>                                                                             

      <tr>                                                                             

        <td valign="top"><code><font size="2">00000yyy yyxxxxxx</font></code></td>                                                                             

        <td valign="top"><code><font size="2">00000yyy yyxxxxxx</font></code></td>                                                                             

        <td valign="top"><code><font size="2">110yyyyy</font></code></td>                                                                             

        <td valign="top"><code><font size="2">10xxxxxx</font></code></td>                                                                             

        <td valign="top"><font size="2">&nbsp;</font></td>                                                                             

        <td valign="top">&nbsp;</td>                                                                             

      </tr>                                                                             

      <tr>                                                                             

        <td valign="top"><code><font size="2">zzzzyyyy yyxxxxxx</font></code></td>                                                                             

        <td valign="top"><code><font size="2">zzzzyyyy yyxxxxxx</font></code></td>                                                                             

        <td valign="top"><code><font size="2">1110zzzz</font></code></td>                                                                             

        <td valign="top"><code><font size="2">10yyyyyy</font></code></td>                                                                             

        <td valign="top"><code><font size="2">10xxxxxx</font></code></td>                                                                             

        <td valign="top"><font size="2">&nbsp;</font></td>                                                                             

      </tr>                                                                             

      <tr>                                                                             

        <td valign="top"><code><font size="2">000uuuuu zzzzyyyy<br>                                                                             

          yyxxxxxx</font></code></td>                                                                             

        <td valign="top"><code><font size="2">110110ww wwzzzzyy<br>                                                                             

          110111yy yyxxxxxx&nbsp;</font></code></td>                                                                             

        <td valign="top"><code><font size="2">11110uuu</font></code></td>                                                                             

        <td valign="top"><code><font size="2">10uuzzzz</font></code></td>                                                                             

        <td valign="top"><code><font size="2">10yyyyyy</font></code></td>                                                                             

        <td valign="top"><code><font size="2">10xxxxxx</font></code></td>                                                                             

      </tr>                                                                             

    </table>                                                                             

    </center>                                                                             

  </div>                                                                             

  <ul>                                                                             

    <li><font size="2">Where uuuuu = wwww + 1 (to account for the addition of 10000<sub>16</sub>                                                                              

      as in <i>Section 3.7, Surrogates</i>).</font></li>                                                                             

  </ul>                                                                             

</blockquote>                                                                             
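<p>The bit distribution in Table 3.1 translates directly into shifts and masks. The following Python sketch is illustrative only, and the function names are hypothetical. <code>utf8_encode</code> works from a scalar value, while <code>utf8_from_surrogates</code> works from a UTF-16 surrogate pair and shows where uuuuu = wwww + 1 comes from. As an assumption of this sketch, <code>utf8_encode</code> refuses surrogate code points, so it can never emit the irregular six-byte form described in D36(c).</p>

```python
def utf8_encode(cp: int) -> bytes:
    """Serialize one code point using the bit distribution of Table 3.1."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"cannot encode U+{cp:04X} on its own")
    if cp <= 0x7F:        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


def utf8_from_surrogates(high: int, low: int) -> bytes:
    """Build the four-byte form directly from 110110ww wwzzzzyy 110111yy yyxxxxxx."""
    wwww = (high >> 6) & 0xF
    zzzz = (high >> 2) & 0xF
    yy_high = high & 0x3          # top two y bits, carried in the high surrogate
    yyyy_low = (low >> 6) & 0xF   # bottom four y bits, carried in the low surrogate
    xxxxxx = low & 0x3F
    uuuuu = wwww + 1              # restores the 10000 (hex) offset
    return bytes([0xF0 | (uuuuu >> 2),
                  0x80 | ((uuuuu & 0x3) << 4) | zzzz,
                  0x80 | (yy_high << 4) | yyyy_low,
                  0x80 | xxxxxx])


print(b"".join(utf8_encode(c) for c in (0x004D, 0x0061, 0x0072, 0x006B)).hex(" "))  # 4d 61 72 6b
print(utf8_from_surrogates(0xD800, 0xDF30).hex(" "))  # f0 90 8c b0
```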

<p><i><b>Delete the two text paragraphs after Table 3.1. (The relevant portions                                                                              

have been elevated into definitions or conformance clauses.)</b></i></p>                                                                             

<blockquote>                                                                             

  <p><strike>When converting a Unicode scalar value to UTF-8, the shortest form                                                                              

  that can represent those values shall be used. This practice preserves                                                                              

  uniqueness of encoding. For example, the Unicode binary value                                                                              

  &lt;0000000000000001&gt; is encoded as &lt;00000001&gt;, not as &lt;11000000                                                                              

  10000001&gt;. The latter is an example of an irregular UTF-8 byte sequence.                                                                              

  Irregular UTF-8 sequences shall not be used for encoding any other                                                                              

  information.</strike>                                                                             

  <p><strike>When converting from UTF-8 to a Unicode scalar value,                                                                              

  implementations do not need to check that the shortest encoding is being used.                                                                              

  This simplifies the conversion algorithm.</strike>                                                                             

</blockquote>                                                                             

<p><b><i>Replace them by the following table and text:</i><br>                                                                             

</b>&nbsp;

<blockquote>

<center>

  <table border="1" cellspacing="0" cellpadding="4" cols="5">                                                                             

    <caption><b><font size="4">Table 3.1B. Legal UTF-8 Byte Sequences</font></b></caption>                                                                             

    <tr>                                                                             

      <th bgcolor="#CCCCCC" style="background-color: #990000" width="10%"><font                                                                             

        color="#FFFFFF">&nbsp;Code Points</font></th>                                                                             

      <th width="10%" style="background-color: #990000"><font color="#FFFFFF">1st                                                                              

        Byte</font></th>                                                                             

      <th width="10%" style="background-color: #990000"><font color="#FFFFFF">2nd                                                                              

        Byte</font></th>                                                                             

      <th width="10%" style="background-color: #990000"><font color="#FFFFFF">3rd                                                                              

        Byte</font></th>                                                                             

      <th width="10%" style="background-color: #990000"><font color="#FFFFFF">4th                                                                              

        Byte</font></th>                                                                             

    </tr>                                                                             

    <tr>                                                                             

      <th style="background-color: #990000" width="10%"><tt><font                                                                             

        color="#FFFFFF">U+0000..U+007F</font></tt></th>                                                                             

      <td width="10%"><tt>00..7F</tt></td>                                                                             

      <td width="10%"><tt>&nbsp;</tt></td>                                                                             

      <td width="10%"><tt>&nbsp;</tt></td>                                                                             

      <td width="10%"><tt>&nbsp;</tt></td>                                                                             

    </tr>                                                                             

    <tr>                                                                             

      <th style="background-color: #990000" width="10%"><tt><font                                                                             

        color="#FFFFFF">U+0080..U+07FF</font></tt></th>                                                                             

      <td width="10%"><tt>C2..DF</tt></td>                                                                             

      <td width="10%"><tt>80..BF&nbsp;</tt></td>                                                                             

      <td width="10%"><tt>&nbsp;</tt></td>                                                                             

      <td width="10%"><tt>&nbsp;</tt></td>                                                                             

    </tr>                                                                             

    <tr>                                                                             

      <th style="background-color: #990000" width="10%"><tt><font                                                                             

        color="#FFFFFF">U+0800..U+0FFF</font></tt></th>                                                                             

      <td width="10%"><tt>E0</tt></td>                                                                             

      <td width="10%"><tt><u>A0</u>..BF</tt></td>                                                                             

      <td width="10%"><tt>80..BF&nbsp;</tt></td>                                                                             

      <td width="10%"><tt>&nbsp;</tt></td>                                                                             

    </tr>                                                                             

    <tr>                                                                             

      <th style="background-color: #990000" width="10%"><tt><font                                                                             

        color="#FFFFFF">U+1000..U+FFFF</font></tt></th>                                                                             

      <td width="10%"><tt>E1..EF</tt></td>                                                                             

      <td width="10%"><tt>80..BF</tt></td>                                                                             

      <td width="10%"><tt>80..BF&nbsp;</tt></td>                                                                             

      <td width="10%"><tt>&nbsp;</tt></td>                                                                             

    </tr>                                                                             

    <tr>                                                                             

      <th style="background-color: #990000" width="10%"><tt><font                                                                             

        color="#FFFFFF">U+10000..U+3FFFF</font></tt></th>                                                                             

      <td width="10%"><tt>F0</tt></td>                                                                             

      <td width="10%"><tt><u>90</u>..BF</tt></td>                                                                             

      <td width="10%"><tt>80..BF</tt></td>                                                                             

      <td width="10%"><tt>80..BF</tt></td>                                                                             

    </tr>                                                                             

    <tr>                                                                             

      <th style="background-color: #990000" width="10%"><tt><font                                                                             

        color="#FFFFFF">U+40000..U+FFFFF</font></tt></th>                                                                             

      <td width="10%"><tt>F1..F3</tt></td>                                                                             

      <td width="10%"><tt>80..BF</tt></td>                                                                             

      <td width="10%"><tt>80..BF</tt></td>                                                                             

      <td width="10%"><tt>80..BF</tt></td>                                                                             

    </tr>                                                                             

    <tr>                                                                             

      <th style="background-color: #990000" width="10%"><tt><font                                                                             

        color="#FFFFFF">U+100000..U+10FFFF</font></tt></th>                                                                             

      <td width="10%"><tt>F4</tt></td>                                                                             

      <td width="10%"><tt>80..<u>8F</u></tt></td>                                                                             

      <td width="10%"><tt>80..BF&nbsp;</tt></td>                                                                             

      <td width="10%"><tt>80..BF</tt></td>                                                                             

    </tr>                                                                             

  </table>                                                                             

</center>                                                                             

<p><u>Table 3.1B lists all of the byte sequences that are legal in UTF-8. A                                                                              

range of byte values such as A0..BF indicates that any byte from A0 to BF                                                                              

(inclusive) is legal in that position. Any byte value outside of the ranges                                                                              

listed is illegal. For example, the byte sequence &lt;C0 AF&gt; is <i>illegal</i>                                                                              

since C0 is not legal in the 1st Byte column. The byte sequence &lt;E0 9F 80&gt;                                                                              

is <i>illegal</i> since in the row where E0 is legal as a first byte, 9F is not                                                                              

legal as a second byte. The byte sequence &lt;F4 80 83 92&gt; is <i>legal</i>,                                                                              

since every byte in that sequence matches a byte range in a row of the table                                                                              

(the last row).</u>                                                                             

</blockquote>                                                                             

<ul>                                                                             

  <li><u>Cases where a trailing byte range is not 80..BF are underlined in the                                                                              

    table to draw attention to them. These occur only in the second byte of a                                                                              

    sequence.</u></li>                                                                             

</ul>                                                                             
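<p>Because Table 3.1B is a short list of byte ranges, a validator can be driven directly from it. The Python sketch below is illustrative only, and the names <code>TABLE_3_1B</code> and <code>is_legal_utf8</code> are hypothetical. At each position it tries every row of the table and reports the data as illegal as soon as no row matches. Since the first-byte ranges do not overlap, at most one row can match at any position.</p>

```python
# Each row of Table 3.1B as a tuple of inclusive (low, high) byte ranges.
TABLE_3_1B = [
    ((0x00, 0x7F),),
    ((0xC2, 0xDF), (0x80, 0xBF)),
    ((0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)),
    ((0xE1, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)),
]


def _row_matches(row, data: bytes, pos: int) -> bool:
    """True if every byte range in the row matches the bytes starting at pos."""
    if pos + len(row) > len(data):
        return False  # truncated sequence
    return all(lo <= data[pos + k] <= hi for k, (lo, hi) in enumerate(row))


def is_legal_utf8(data: bytes) -> bool:
    pos = 0
    while pos < len(data):
        for row in TABLE_3_1B:
            if _row_matches(row, data, pos):
                pos += len(row)
                break
        else:
            return False  # no row of the table accepts the bytes at pos
    return True


for sample in (b"\xc0\xaf", b"\xe0\x9f\x80", b"\xf4\x80\x83\x92"):
    print(sample.hex(" "), is_legal_utf8(sample))
# c0 af False
# e0 9f 80 False
# f4 80 83 92 True
```

<p>The sketch follows Table 3.1B exactly, so it accepts the three-byte encoding of an individual surrogate code point such as &lt;ED A0 80&gt;. Detecting the irregular six-byte pairs of D36(c) is a separate check.</p>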

<p><i><b>Add to Appendix C: Relationship to ISO/IEC 10646, Section C.3: UCS                                                                              

Transformation Formats, at the end of the subsection UTF-8:</b></i></p>                                                                             

<blockquote>                                                                             

  <p>

  <u>The definition of UTF-8 in Annex D of ISO/IEC 10646-1:2000 also allows for                                                                              

  the use of five- and six-byte sequences to encode characters that are outside                                                                              

  the range of the Unicode character set; those five- and six-byte sequences are                                                                              

  illegal for the use of UTF-8 as a transformation of Unicode characters. ISO/IEC                                                                              

  10646 does not allow mapping of unpaired surrogates, nor U+FFFE and U+FFFF                                                                              

  (but it <i>does</i> allow other <a                                                                             

  href="http://www.unicode.org/glossary/#noncharacter">noncharacters</a>).</u></p>                                                                             

</blockquote>                                                                             

<h3>Status of UTF-32</h3>                                                                             

<p>Unicode Technical Report #19, UTF-32, has been elevated to the status of a                                                                              

Unicode Standard Annex, making UTF-32 officially a part of the Unicode Standard.                                                                              

UAX #19 adds specific definition clauses to <i>Section 3.8, Transformations</i>,                                                                              

of <i>The Unicode Standard, Version 3.0</i>. See <a href="../tr19/">UAX #19</a>                                                                              

for the exact definitions of UTF-32 as well as a discussion of the relation of                                                                              

UTF-32 to ISO/IEC 10646 and UCS-4.</p>                                                                             

<p>With the addition of UTF-32, the Unicode Standard now has three sanctioned                                                                              

encoding forms: UTF-8, UTF-16, and UTF-32. These are the 8-bit, 16-bit, and                                                                              

32-bit forms, respectively, for representing the Unicode scalar values in                                                                              

particular implementations of the standard.</p>                                                                             
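<p>As an illustration only (this sketch is not part of the annex), the following C code encodes one Unicode scalar value in each of the three encoding forms. The function names are hypothetical. Each function returns the number of code units written, or 0 if the input is a surrogate code point or lies above U+10FFFF, neither of which is a scalar value.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if cp is a Unicode scalar value: 0..10FFFF minus D800..DFFF. */
static int is_scalar_value(uint32_t cp)
{
    return cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF);
}

/* UTF-8: 1 to 4 eight-bit code units. Returns the count, or 0 if invalid. */
static size_t encode_utf8(uint32_t cp, uint8_t out[4])
{
    if (!is_scalar_value(cp)) return 0;
    if (cp < 0x80) { out[0] = (uint8_t)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* UTF-16: one 16-bit code unit, or a surrogate pair above U+FFFF. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (!is_scalar_value(cp)) return 0;
    if (cp < 0x10000) { out[0] = (uint16_t)cp; return 1; }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));     /* high 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));   /* low 10 bits */
    return 2;
}

/* UTF-32: always one 32-bit code unit, numerically equal to the scalar value. */
static size_t encode_utf32(uint32_t cp, uint32_t out[1])
{
    if (!is_scalar_value(cp)) return 0;
    out[0] = cp;
    return 1;
}
```

<p>For example, U+1D11E MUSICAL SYMBOL G CLEF becomes the UTF-8 bytes F0 9D 84 9E, the UTF-16 surrogate pair D834 DD1E, and the single UTF-32 code unit 0001D11E.</p>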

<p>Considerations of byte-order serialization lead to a further subdivision of the encoding forms into five sanctioned encoding schemes for the Unicode Standard:

UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE.</p>                                                                             

<p>Because UTF-32 is a fixed-width, 32-bit encoding form, the numerical value of                                                                              

a Unicode character in UTF-32 is always precisely identical to the Unicode                                                                              

scalar value.</p>                                                                             

<p>The encoding scheme UTF-32BE (UTF-32 serialized as bytes in most significant                                                                              

byte first order) is structurally the same as UCS-4, as defined in ISO/IEC                                                                              

10646-1:2000.</p>                                                                             
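<p>The following C sketch (hypothetical names, not taken from the annex) shows how byte order alone distinguishes the UTF-32BE and UTF-32LE encoding schemes. Because it works on values with shifts rather than reinterpreting memory, its output does not depend on the byte order of the host machine.</p>

```c
#include <stddef.h>
#include <stdint.h>

enum utf32_scheme { UTF32_BE, UTF32_LE };

/* Writes n UTF-32 code units as bytes in the chosen scheme.
   out must have room for 4 * n bytes. */
static void utf32_serialize(const uint32_t *units, size_t n,
                            enum utf32_scheme scheme, uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t u = units[i];
        /* Bytes of u, most significant first. */
        uint8_t b[4] = { (uint8_t)(u >> 24), (uint8_t)(u >> 16),
                         (uint8_t)(u >> 8), (uint8_t)u };
        for (int k = 0; k < 4; k++)
            out[4 * i + k] = (scheme == UTF32_BE) ? b[k] : b[3 - k];
    }
}

/* Reads one code unit back from 4 bytes serialized in the chosen scheme. */
static uint32_t utf32_deserialize(const uint8_t *in, enum utf32_scheme scheme)
{
    uint32_t u = 0;
    for (int k = 0; k < 4; k++) {
        uint8_t byte = (scheme == UTF32_BE) ? in[k] : in[3 - k];
        u = (u << 8) | byte;   /* accumulate from most significant byte */
    }
    return u;
}
```

<p>Serialized as UTF-32BE, U+1D11E becomes the bytes 00 01 D1 1E, which is also its UCS-4 form; serialized as UTF-32LE, it becomes 1E D1 01 00.</p>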

<p>See also <a href="../tr17/">Unicode Technical Report #17, Character Encoding                                                                              

Model</a>, for a discussion of the general framework for understanding the                                                                              

Unicode character encoding and its relationship to the Unicode Transformation                                                                              

Formats.</p>                                                                             

<h3>3.9 Special Character Properties (revision)</h3>                                                                             

<p>Add the following entry to the end of the special character properties                                                                              

listing, on page 50:</p>                                                                             

<ul>                                                                             

  <li>Musical format control</li>                                                                             

</ul>                                                                             

<blockquote>                                                                             

  1D173 MUSICAL SYMBOL BEGIN BEAM<br>                                                                             

  1D174 MUSICAL SYMBOL END BEAM<br>                                                                             

  1D175 MUSICAL SYMBOL BEGIN TIE<br>                                                                             

  1D176 MUSICAL SYMBOL END TIE<br>                                                                             

  1D177 MUSICAL SYMBOL BEGIN SLUR<br>                                                                             

  1D178 MUSICAL SYMBOL END SLUR<br>                                                                             

  1D179 MUSICAL SYMBOL BEGIN PHRASE<br>                                                                             

  1D17A MUSICAL SYMBOL END PHRASE                                                                             

</blockquote>                                                                             

<h3>Chapter 4, Character Properties (revision)</h3>                                                                             

<p>All of the General Category values plus the case mappings in UnicodeData.txt
and SpecialCasing.txt are now normative. The case mapping row from <i>Table 4-2,
Informative Character Properties</i>, page 74, is moved to <i>Table 4-1,
Normative Character Properties</i>. The word &quot;informative&quot; is struck
from <i>Table 4-5, General Category</i>, page 88. The header of <i>Section 4.5,
General Category--Normative in Part</i>, page 87, is changed to <i>Section 4.5,
General Category--Normative</i>. The other textual changes in Chapter 4
resulting from this change in status are not detailed here.</p>

<p>On page 73, make the following changes:</p>                                                                    

<blockquote>                                                                     

<p><i><b>Normative Properties.</b></i> <i>Normative</i> means that implementations that claim conformance to the Unicode Standard (at a particular version) and that make use of a particular property must follow the specifications of the standard for that property to be conformant. <u>Thus, for example, the Bidirectional
Character Type is required for conformance whenever displaying bidirectional
text, such as Arabic or Hebrew.</u> The term <i>normative</i> when applied to a character property does
<i>not</i> mean that the value of the property will never change. Corrections and extensions to the standard in the future may require minor changes to normative values, even though the Unicode Technical Committee strives to minimize such changes.</p>

  <p><b><i>Informative Properties.</i></b> If a character property is only <i>informative</i>, a conformant implementation is free to use or change such values as it may require,
  while still remaining conformant to the standard. <u>However, their use is strongly recommended.</u>
  Particular implementations may choose to override the properties that are not normative. In that case, the implementer has the option of establishing a protocol to convey that information.</p>

  <p><u><b><i>Normative References.</i></b> Other specifications may choose to make                                                                     

normative references to Unicode character properties irrespective                                                                     

of their status as normative or informative in the Unicode Standard.</u></p>                                                                                    

</blockquote>                                                                    

<p>On page 102, add the following at the bottom of the page:</p>                                                                           

<blockquote>                                                                           

  <p><b><i><u>Identifier Stability. </u></i></b><u>Unicode General Category values are kept as stable as possible, but they
  may change in ways that affect identifiers in new versions (see <a
  href="../../standard/policies.html">Unicode Policies</a> for more
  information). When another standard or product upgrades to a new version of
  the Unicode Standard, it may have to handle characters that were formerly part
  of ID_Start or ID_Continue but no longer are.</u></p>

  <p><u>This situation can be handled by having two explicit backwards                                                                             

  compatibility lists: ID_Start_Supplement and ID_Continue_Supplement. The                                                                             

  implementation's specification of identifiers would include the union of the                                                                             

  respective Unicode properties and those supplement lists.</u></p>                                                                            

</blockquote>                                                                            
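<p>A minimal C sketch of the union described above. It assumes a hypothetical property lookup <code>base_id_start</code>, reduced here to ASCII letters so the example is self-contained, and a hypothetical <code>ID_Start_Supplement</code> list whose values are placeholders chosen for illustration, not data from the Unicode Character Database.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ID_Start lookup generated from the
   Unicode Character Database; here it recognizes only ASCII letters. */
static int base_id_start(uint32_t cp)
{
    return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
}

/* Hypothetical ID_Start_Supplement: characters that were ID_Start in an
   earlier version but are not in the current one. Placeholder values,
   kept sorted so the list can be binary-searched. */
static const uint32_t id_start_supplement[] = { 0x2118, 0x212E };

static int in_sorted_list(const uint32_t *list, size_t n, uint32_t cp)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (list[mid] == cp) return 1;
        if (list[mid] < cp) lo = mid + 1; else hi = mid;
    }
    return 0;
}

/* Identifier start = current property OR backward-compatibility supplement. */
static int is_identifier_start(uint32_t cp)
{
    size_t n = sizeof id_start_supplement / sizeof id_start_supplement[0];
    return base_id_start(cp) || in_sorted_list(id_start_supplement, n, cp);
}
```

<p>A test for identifier continuation would follow the same pattern, combining the ID_Continue property with its own ID_Continue_Supplement list.</p>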

<h3>Unicode Standard Annex #9, The Bidirectional Algorithm (revision)</h3>

<p>UAX #9 supersedes the text in <i>Section 3.12, Bidirectional Behavior</i>, in
<i>The Unicode Standard, Version 3.0</i>. For Unicode 3.1, there are minor, non-normative
revisions to the text of <a href="../tr9/">UAX #9</a>.</p>

<h3>Unicode Standard Annex #15, Unicode Normalization Forms (revision)</h3>

<p>In a corrigendum to UAX #15, U+FB1D HEBREW LETTER YOD WITH HIRIQ has been added to the Composition Exclusion List.

For more information, see <a                                                                            

href="../tr15/">UAX #15</a>.</p>                                                                            
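<p>To show the effect of the exclusion, the following C sketch uses a two-entry composition table and an excerpt of the exclusion list containing only U+FB1D. It is a hypothetical illustration of the pair-composition step, not a complete normalization implementation: canonical reordering, blocking, and Hangul composition are omitted.</p>

```c
#include <stddef.h>
#include <stdint.h>

struct canonical_pair { uint32_t first, second, composite; };

/* Two canonical decompositions from UnicodeData.txt used as a tiny table:
   U+00C1 LATIN CAPITAL LETTER A WITH ACUTE = U+0041 U+0301
   U+FB1D HEBREW LETTER YOD WITH HIRIQ      = U+05D9 U+05B4 */
static const struct canonical_pair pairs[] = {
    { 0x0041, 0x0301, 0x00C1 },
    { 0x05D9, 0x05B4, 0xFB1D },
};

/* Excerpt of the Composition Exclusion List: only the entry added by the
   corrigendum, not the full list. */
static const uint32_t composition_exclusions[] = { 0xFB1D };

static int is_composition_excluded(uint32_t cp)
{
    size_t n = sizeof composition_exclusions / sizeof composition_exclusions[0];
    for (size_t i = 0; i < n; i++)
        if (composition_exclusions[i] == cp) return 1;
    return 0;
}

/* Returns the composite for (first, second), or 0 if there is none.
   An excluded character still decomposes but is never produced here. */
static uint32_t compose_pair(uint32_t first, uint32_t second)
{
    size_t n = sizeof pairs / sizeof pairs[0];
    for (size_t i = 0; i < n; i++) {
        if (pairs[i].first == first && pairs[i].second == second)
            return is_composition_excluded(pairs[i].composite) ? 0 : pairs[i].composite;
    }
    return 0;
}
```

<p>With this table, U+0041 U+0301 still composes to U+00C1, while U+05D9 U+05B4 is left as two characters, even though U+FB1D continues to decompose to that sequence.</p>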

<h2 class="bb"><a name="guidelines">IV Guidelines</a></h2>                                                                            

<p>The following text amends portions of <i>Chapter 5, Implementation Guidelines</i>                                                                             

in <i>The Unicode Standard, Version 3.0</i>.</p>                                                                            

<h3>5.2 ANSI/ISO C wchar_t (revision)</h3>                                                                            

<p>In <i>Section 5.2, ANSI/ISO C wchar_t</i>, pages 107-108, the text is amended

with the following additions and deletions.</p>                                                                            

<blockquote>                                                                            

  With the wchar_t wide character type, ANSI/ISO C provides for the inclusion of                                                                             

  fixed-width, wide characters. ANSI/ISO C leaves the semantics of the wide                                                                             

  character set to the specific implementation but requires that the characters                                                                             

  from the portable C execution set correspond to their wide character                                                                             

  equivalents by zero extension. The Unicode characters in the ASCII range                                                                             

  U+0020 to U+007E satisfy these conditions. Thus, if an implementation uses                                                                             

  ASCII to code the portable C execution set, the use of the Unicode character                                                                             

  set for the wchar_t type, <strike>with a width of 16 bits </strike><u>in                                                                             

  either UTF-16 or UTF-32 form</u>, fulfills the requirement.                                                                            

</blockquote>                                                                            

<blockquote>                                                                            

  The width of wchar_t is compiler-specific and can be as little as 8 bits.                                                                             

  Consequently, programs that need to be portable across any C or C++ compiler                                                                             

  should not use wchar_t for storing Unicode text. The wchar_t type is intended                                                                             

  for storing compiler-defined wide characters, which may be Unicode characters                                                                             

  in some compilers. However, <strike>some </strike>programmers <u>who want a                                                                             

  UTF-16 implementation </u>can use a macro or typedef (for example, UNICHAR)                                                                             

  that can be compiled as unsigned short or wchar_t depending on the target                                                                             

  compiler and platform. <u>Other programmers who want a UTF-32 implementation                                                                             

  can use a macro or typedef which might be compiled as unsigned int or wchar_t,                                                                             

  depending on the target compiler and platform. </u>This choice enables correct                                                                             

  compilation on different platforms and compilers. Where a 16-bit                                                                             

  implementation of wchar_t is guaranteed, such macros or typedefs may be                                                                             

  predefined (for example, WCHAR on Win32 API).                                                                            

</blockquote>                                                                            
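<p>A sketch of such typedefs, using the hypothetical names <code>UNICHAR16</code> and <code>UNICHAR32</code> (this is an illustration, not part of the amended text). It chooses <code>wchar_t</code> only when <code>WCHAR_MAX</code> indicates a suitable range. Otherwise it falls back to the <code>&lt;stdint.h&gt;</code> types <code>uint_least16_t</code> and <code>uint_least32_t</code> rather than <code>unsigned short</code> and <code>unsigned int</code>, because C guarantees <code>unsigned int</code> only a 16-bit range.</p>

```c
#include <stdint.h>
#include <wchar.h>

/* WCHAR_MAX is required to be usable in #if directives (C99). */

/* UTF-16 code unit: use wchar_t only if it is exactly an unsigned 16-bit range. */
#if WCHAR_MAX == 0xFFFF
typedef wchar_t UNICHAR16;
#else
typedef uint_least16_t UNICHAR16;
#endif

/* UTF-32 code unit: use wchar_t only if it can hold every value up to U+10FFFF. */
#if WCHAR_MAX >= 0x10FFFF
typedef wchar_t UNICHAR32;
#else
typedef uint_least32_t UNICHAR32;
#endif
```

<p>With a 16-bit unsigned <code>wchar_t</code>, as on Win32, <code>UNICHAR16</code> becomes <code>wchar_t</code>; where <code>wchar_t</code> is a 32-bit type, <code>UNICHAR32</code> does.</p>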

<blockquote>                                                                            

  On systems where the native character type or wchar_t is implemented as a                                                                             

  32-bit quantity, an implementation may <u>use the UTF-32 form </u><strike>transiently                                                                             

  use 32-bit quantities</strike> to represent Unicode characters. <strike>during                                                                             

  processing. The internal workings of this representation are treated as a                                                                             

  black box and are not Unicode-conformant. In particular, any API or runtime                                                                             

  library interfaces that accept strings of 32-bit characters are not                                                                             

  Unicode-conformant. If such an implementation interchanges 16-bit Unicode                                                                             

  characters with the outside world, then this interchange can be conformant as                                                                             

  long as the interface for this interchange complies with the requirements of <i>Chapter                                                                             

  3, Conformance</i>.</strike>                                                                            

</blockquote>                                                                            

<blockquote>                                                                            

  <u>A limitation of the ISO/ANSI C model is its assumption that characters can                                                                             

  always be processed in isolation.</u> <u>Implementations that choose to go                                                                             

  beyond the ISO/ANSI C model may find it useful to mix widths within their                                                                             

  APIs.</u> <u>For example, an implementation may have a 32-bit wchar_t and                                                                             

  process strings in any of UTF-8, UTF-16 or UTF-32 forms. Another                                                                             

  implementation may have a 16-bit wchar_t and process strings as UTF-8 or                                                                             

  UTF-16, but have additional APIs that process individual characters as UTF-32,                                                                             

  or deal with pairs of UTF-16 code units.</u>                                                                            

</blockquote>                                                                            
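<p>As an example of an API that processes individual characters as UTF-32 while strings are stored as UTF-16, the following C sketch reads one character at a time from a UTF-16 buffer. The function name is hypothetical, and returning U+FFFD REPLACEMENT CHARACTER for an unpaired surrogate is a policy chosen for this sketch; an implementation could report an error instead.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Reads one character from s[*pos..len), returns it as a UTF-32 code unit,
   and advances *pos past the one or two UTF-16 code units consumed.
   The caller must ensure *pos < len. */
static uint32_t utf16_next(const uint16_t *s, size_t len, size_t *pos)
{
    uint16_t u = s[(*pos)++];
    if (u >= 0xD800 && u <= 0xDBFF) {                 /* high surrogate */
        if (*pos < len && s[*pos] >= 0xDC00 && s[*pos] <= 0xDFFF) {
            uint16_t lo = s[(*pos)++];                /* low surrogate */
            return 0x10000u + (((uint32_t)u - 0xD800u) << 10)
                            + ((uint32_t)lo - 0xDC00u);
        }
        return REPLACEMENT_CHAR;                      /* unpaired high surrogate */
    }
    if (u >= 0xDC00 && u <= 0xDFFF)
        return REPLACEMENT_CHAR;                      /* unpaired low surrogate */
    return u;                                         /* BMP character */
}
```

<p>Calling this function in a loop until <code>*pos</code> reaches <code>len</code> visits each character once, so the rest of the code can work with UTF-32 values without ever handling surrogate pairs itself.</p>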

<h3>Unassigned Code Points</h3>                                                                            

<p><i>Section 5.3, Unknown and Missing Characters: Unassigned and Private Use                                                                             

Character Codes,</i> pages 108-109: add the following to the end of the                                                                             

subsection.</p>                                                                            

<blockquote>                                                                            

  <p>In practice, applications must deal with unassigned code points or unknown                                                                             

  private use characters. This may occur, for example, when the application is                                                                             

  handling text that originated on a system implementing a later release of                                                                             

  Unicode, with additional assigned characters. To work properly in                                                                             

  implementations, unassigned code points must be given default properties as if                                                                             

  they were characters, since various algorithms require properties to be                                                                             

  assigned to every character in order to function at all. These properties are                                                                             

  not uniform across all unassigned code points, since certain ranges of code                                                                             

  points need different properties to maximize compatibility.</p>                                                                            

  <p>Normally, code points outside the repertoire of supported characters would                                                                             

  be displayed with a fallback glyph, such as a black box. However, format and

  control characters must not have visible glyphs (although they may have an                                                                             

  effect on other characters in display). These characters are also ignored                                                                             

  except with respect to specific, defined processes: for example, ZERO WIDTH                                                                             

  NON-JOINER is ignored in collation. To allow a greater degree of compatibility                                                                             

  across versions of the standard, the ranges U+2060..U+206F, U+FFF0..U+FFFC,                                                                             

  and U+E0000..U+E0FFF are reserved for format and control characters (General                                                                             

  Category = Cf). Unassigned code points in these ranges should be ignored in                                                                             

  processing and display.</p>                                                                            

  <p>The Unicode Bidirectional Algorithm assigns a Bidirectional Category to                                                                              

  unassigned code points based on the expected direction of characters to be                                                                              

  added in the future. For more information, see Bidirectional Character Types                                                                              

  in <a href="http://www.unicode.org/unicode/reports/tr9/">Unicode Standard                                                                              

  Annex #9:&nbsp;The Bidirectional Algorithm</a>.</p>                                                                             

  <p><a href="http://www.unicode.org/unicode/reports/tr14/">Unicode Standard
  Annex #14: Line Breaking Properties</a> assigns the line break property value
  &quot;XX&quot; (Unknown) to all unassigned code points (see the Definitions section of that annex).</p>

  <p>In determining character widths for East Asian display, <a                                                                             

  href="http://www.unicode.org/unicode/reports/tr11/">Unicode Standard Annex                                                                              

  #11:&nbsp;East Asian Width</a> includes a section on Unassigned and Private                                                                              

  Use characters.</p>                                                                             

  <p>In <a href="http://www.unicode.org/unicode/reports/tr15/">Unicode Standard
  Annex #15, Unicode Normalization Forms</a>, unassigned code points are given
  Canonical Combining Class 0 and no decomposition mapping.</p>

</blockquote>                                                                             
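<p>A minimal C sketch of the display rule above for code points that an implementation does not recognize. The function and enumeration names are hypothetical; the ranges are the ones listed in the text.</p>

```c
#include <stdint.h>

/* Ranges reserved for format and control characters (General Category Cf). */
static int in_reserved_format_range(uint32_t cp)
{
    return (cp >= 0x2060 && cp <= 0x206F) ||
           (cp >= 0xFFF0 && cp <= 0xFFFC) ||
           (cp >= 0xE0000 && cp <= 0xE0FFF);
}

enum display_action { DISPLAY_IGNORE, DISPLAY_FALLBACK_GLYPH };

/* For an unrecognized code point: ignore it if it falls in a range reserved
   for format characters, otherwise show a fallback glyph such as a black box. */
static enum display_action unknown_code_point_action(uint32_t cp)
{
    return in_reserved_format_range(cp) ? DISPLAY_IGNORE : DISPLAY_FALLBACK_GLYPH;
}
```

<p>For example, an unrecognized U+2065 or U+E0080 is ignored, while an unrecognized U+0378 is shown as a fallback glyph.</p>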

  <h3>Identifiers</h3>                                                                             

  <p><i>Section 5.16, Identifiers: Specific Character Additions,</i>
  page 134: the subsection name is changed to <i>Specific Character Adjustments,</i>

  and the following note is added:</p>                                                                             

<blockquote>                                                                            

    <p><u><b>Note: </b>a useful set of characters to consider for exclusion from                                                                              

    identifiers consists of all characters whose compatibility mappings have a <code>&lt;font&gt;</code>                                                                              

    tag.</u></p>                                                                             

</blockquote>                                                                             
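<p>One way to find the characters described in this note is to scan UnicodeData.txt, where the decomposition mapping is the sixth semicolon-separated field (index 5). The following C sketch, with a hypothetical function name, tests a single record for a <code>&lt;font&gt;</code> tag.</p>

```c
#include <string.h>

/* Nonzero if a UnicodeData.txt record has a compatibility decomposition
   tagged <font>. Fields are separated by ';' and the decomposition mapping
   is field 5 (0-based). */
static int has_font_compat_tag(const char *line)
{
    const char *p = line;
    for (int field = 0; field < 5; field++) {
        p = strchr(p, ';');
        if (p == NULL) return 0;   /* malformed record: too few fields */
        p++;                       /* step past the separator */
    }
    return strncmp(p, "<font>", 6) == 0;
}
```

<p>For example, the record for U+1D400 MATHEMATICAL BOLD CAPITAL A has the mapping <code>&lt;font&gt; 0041</code> and is matched, while U+00C1, whose canonical mapping has no tag, is not.</p>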

<h3>5.11 Language Tagging (revision)</h3>                                                                             

<p><i>Section 5.11, Language Tagging in Plain Text, </i>page 114: delete the                                                                              

following paragraph:</p>                                                                             

<blockquote>                                                          

<p><strike>For interchange purposes, it is becoming common to use tagged                                                                              

information, which is embedded in the text. Unicode Technical Report #7,                                                                              

&quot;Plane 14 Characters for Language Tags,&quot; which is found on the CD-ROM                                                                              

or in its up-to-date version on the Unicode Web site, provides a proposed                                                                              

mechanism for representing language tags. Like most tagging mechanisms, these                                                                              

language tags are stateful: a start tag establishes an attribute for the text,                                                                              

and an end tag concludes it.</strike></p>                                                                             

</blockquote>                                                          

<p>The subsection <i>Working with Language Tags,</i> pages 114-115, has been                                                                              

moved to the newly created <i><a href="#tag">Section 13.7, Tag Characters</a></i>,                                                                              

which is part of Article V, Block Descriptions, because its
recommendations are specific to the tag characters described there.</p>

<h2 class="bb"><a name="block">V Block Descriptions</a></h2>                                                                             

<p>Note: The numbering used here for block descriptions and revised text follows                                                                              

<i>The Unicode Standard, Version 3.0</i> for ease of cross-reference.</p>                                                                             

<h3>6.1 General Punctuation (revision)</h3>                                                                             

<h3>Numeric Separators</h3>                                                                             

<p><i>Section 6.1, General Punctuation, Punctuation: U+0020-U+00BF,</i> page 149:                                                                              

the following note is added:</p>                                                                             

<blockquote>                                                                             

  <p><u><b>Note: </b>any of the characters U+002C, U+002E, U+060C, U+066B, or                                                                              

  U+066C (and possibly others) can be used as numeric separator characters,                                                                              

  depending on the locale and user customizations.</u></p>                                                                             

</blockquote>                                                                             
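<p>Because the separator roles depend on the locale, a parser can take them as parameters rather than hard-coding them. The following C sketch assumes a hypothetical <code>numeric_locale</code> structure that supplies the grouping and decimal separators, and accepts ASCII digits and the Arabic-Indic digits U+0660..U+0669. It does not validate where grouping separators appear, and it does not check for overflow.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical locale settings: which characters separate digit groups
   and which one marks the decimal point. */
struct numeric_locale { uint32_t group_sep, decimal_sep; };

/* Digit value for ASCII digits and ARABIC-INDIC DIGIT ZERO..NINE, else -1. */
static int digit_value(uint32_t cp)
{
    if (cp >= '0' && cp <= '9') return (int)(cp - '0');
    if (cp >= 0x0660 && cp <= 0x0669) return (int)(cp - 0x0660);
    return -1;
}

/* Parses s[0..n) so that value = mantissa / 10^scale.
   Returns 0 on success, -1 on malformed input. */
static int parse_number(const uint32_t *s, size_t n, const struct numeric_locale *loc,
                        long long *mantissa, int *scale)
{
    long long m = 0;
    int sc = 0, seen_decimal = 0, digits = 0;
    for (size_t i = 0; i < n; i++) {
        int d = digit_value(s[i]);
        if (d >= 0) {
            m = m * 10 + d;
            digits++;
            if (seen_decimal) sc++;       /* count fractional digits */
        } else if (s[i] == loc->decimal_sep && !seen_decimal) {
            seen_decimal = 1;
        } else if (s[i] == loc->group_sep && !seen_decimal) {
            continue;                     /* grouping separators are skipped */
        } else {
            return -1;                    /* unexpected character */
        }
    }
    if (digits == 0) return -1;
    *mantissa = m;
    *scale = sc;
    return 0;
}
```

<p>With U+002C as the grouping separator and U+002E as the decimal separator, &quot;1,234.5&quot; yields mantissa 12345 and scale 1. The same number written with Arabic-Indic digits, U+066C ARABIC THOUSANDS SEPARATOR, and U+066B ARABIC DECIMAL SEPARATOR gives the same result, while a locale that swaps the roles of U+002C and U+002E rejects &quot;1,234.5&quot;.</p>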

<h3>CJK Symbols and Punctuation: U+3000-U+303F</h3>                                                                             

<p><i>Section 6.1, General Punctuation, CJK Symbols and Punctuation:                                                                              

U+3000-U+303F</i>, page 155: The first paragraph is updated as follows:</p>                                                                             

<blockquote>                                                                             

  <p>This block encodes punctuation marks and symbols used primarily by writing                                                                              

  systems that employ Han ideographs. <u>Some of the punctuation marks, in                                                                              

  particular the brackets, are used in other typographic contexts as well.</u>                                                                              

  Most of these characters are found in East Asian standards.</p>                                                                             

</blockquote>                                                                             

<p><i>Section 6.1, General Punctuation, CJK Symbols and Punctuation:

U+3000-U+303F</i>, page 155: add the following paragraph after the paragraph on                                                                              

&quot;U+3006&quot;:</p>                                                                             

<blockquote>                                                                             

  <p><u>U+3008, U+3009 angle brackets have ambiguous width. They are wide in an                                                                              

  East Asian context, but are narrow when used in other contexts, such as                                                                              

  mathematics. There are other characters in this block that have the same                                                                              

  characteristics, including double angle brackets, tortoise shell brackets, and                                                                              

  white square brackets.</u></p>                                                                             

</blockquote>                                                                             
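<p>A hypothetical C sketch of context-dependent width for the brackets named above. The context flag is an assumption of this example; how an implementation decides that text is in an East Asian context is outside its scope, and the character list is limited to the examples given in the paragraph.</p>

```c
#include <stdint.h>

/* Brackets named in the text: angle (U+3008/9), double angle (U+300A/B),
   tortoise shell (U+3014/5), and white square (U+301A/B). */
static int is_context_width_bracket(uint32_t cp)
{
    switch (cp) {
    case 0x3008: case 0x3009: case 0x300A: case 0x300B:
    case 0x3014: case 0x3015: case 0x301A: case 0x301B:
        return 1;
    default:
        return 0;
    }
}

/* Display columns: wide (2) in an East Asian context, narrow (1) elsewhere,
   for example in mathematics. Returns -1 for characters not handled here. */
static int bracket_columns(uint32_t cp, int east_asian_context)
{
    if (!is_context_width_bracket(cp)) return -1;
    return east_asian_context ? 2 : 1;
}
```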

<h3>7.5 Georgian (revision)</h3>                                                                             

<p>Note: The following text replaces the entire text of <i>Section 7.5, Georgian</i>,                                                

on page 173.</p>                                                                            

<h4>Georgian: U+10A0-U+10FF</h4>                                                                            

<p>The Georgian script is used primarily for writing the Georgian language and                                                                             

its dialects. It is also used for the Svan and Mingrelian languages, and in the                                                                             

past was used for Abkhaz and other languages of the Caucasus.</p>                                                                            

<p><b><i>Script Forms.</i></b> The Georgian script originates from an                                                                             

inscriptional form called <i>Asomtavruli</i>, from which was derived a                                                                             

manuscript form called <i>Nuskhuri</i>. Together these forms are categorized as <i>Khutsuri</i>                                                                             

(ecclesiastical), but <i>Khutsuri</i> is not itself the name of a script form.                                                                             

Although no longer seen in most modern texts, the <i>Nuskhuri</i> style is still                                                                             

used for liturgical purposes. It was replaced, through a history now uncertain,                                                                             

by an alphabet called <i>Mkhedruli</i> (military), which is now the form used                                                                             

for nearly all modern Georgian writing.</p>                                                                            

<p><b><i>Case Forms.</i></b> The Georgian alphabet is fundamentally caseless,

and is used as such in most texts. However, possibly owing to the influence of                                                                             

case forms in other alphabets, modern Georgian is occasionally written with
uppercase letters. In this typographic departure, it is the <i>Asomtavruli</i>

forms that serve to represent uppercase letters, while the lowercase is <i>Mkhedruli</i>                                                                             

or <i>Nuskhuri</i>. This usage parallels the evolution of the Latin alphabet, in                                                                             

which the original linear monumental style came to be considered uppercase,                                                                             

while manuscript styles of the same alphabet came to be represented as                                                                             

lowercase. The Unicode encoding of Georgian follows the Latin analogy: the range
U+10A0..U+10CF is used to encode the uppercase forms (<i>Asomtavruli</i>),
and the basic alphabetic range U+10D0..U+10FF may be regarded as lowercase (<i>Mkhedruli</i>
or <i>Nuskhuri</i>). In lowercase (that is, normal caseless) Georgian text, <i>Mkhedruli</i>
and <i>Nuskhuri</i> are distinguished via font, as are regular and italic forms
in Latin lowercase.</p>

<div align="center"><table cellSpacing=0 cellPadding=4 border=1>                                        

  <tbody>                                        

  <tr>                                        

    <th align=right>Font style                                         

    <th>"uppercase"<br>U+10A0..U+10CF                                         

    <th>basic/"lower"<br>U+10D0..U+10FF                                         

  <tr>                                        

    <th align=right>Secular                                         

    <td align=center>Asomtavruli                                         

    <td align=center>Mkhedruli                                         

  <tr>                                        

    <th align=right>Ecclesiastical                                         

    <td align=center>Asomtavruli                                         

    <td align=center>Nuskhuri </td></tr></tbody></table></div>                                        

                                                                           

<p>The figure below shows how the Georgian code chart would appear if presented in an                            

ecclesiastical font:</p>                                                                          

                                                                          

<p align="center"><img border="0" src="georgian-asom-nuskh2.gif" alt="Georgian code chart showing ecclesiastical font" width="297" height="717"></p>                                                                         

                                                                         

<p>Because Georgian is predominantly used as a caseless alphabet, no default                                                                           

case mappings are provided for Georgian in the Unicode Character Database. It is                                                                           

inadvisable for generic Unicode text processing to convert Georgian <i>Mkhedruli</i>                                                                           

text to <i>Asomtavruli</i> via a casing operation. In instances where software                                                                           

dealing with Georgian text treats <i>Asomtavruli</i> forms as uppercase letters                                                                           

and requires case folding, this should be done via extended casing rules that                                    

constitute a higher-level protocol.</p>                                                                         
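<p>As an illustration only, the following Python sketch shows what such a higher-level protocol might look like. It assumes the Unicode 3.1 model described above, in which each <i>Asomtavruli</i> letter U+10A0..U+10C5 pairs with the basic-range letter 0x30 code points higher (U+10D0..U+10F5). The function name <code>fold_georgian</code> is hypothetical. The sketch deliberately avoids Python's built-in <code>str.lower()</code>, whose Georgian mappings come from later versions of the Unicode Character Database.</p>

```python
# Hypothetical higher-level protocol: fold Asomtavruli letters to the basic
# (Mkhedruli/Nuskhuri) range, following the Unicode 3.1 model.
ASOMTAVRULI_FIRST = 0x10A0  # GEORGIAN CAPITAL LETTER AN
ASOMTAVRULI_LAST = 0x10C5   # GEORGIAN CAPITAL LETTER HOE
OFFSET = 0x10D0 - 0x10A0    # parallel letters sit 0x30 code points apart


def fold_georgian(text: str) -> str:
    """Replace each Asomtavruli letter with its basic-range counterpart.

    All other characters, including letters already in U+10D0..U+10FF,
    are passed through unchanged.
    """
    return "".join(
        chr(ord(ch) + OFFSET)
        if ASOMTAVRULI_FIRST <= ord(ch) <= ASOMTAVRULI_LAST
        else ch
        for ch in text
    )
```

<p>For example, <code>fold_georgian("\u10A0\u10D1")</code> returns <code>"\u10D0\u10D1"</code>. Keeping the mapping in application code, rather than in the default case tables, matches the recommendation that such folding be a higher-level protocol.</p>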

<p><b><i>Georgian Paragraph Separator.</i></b> The Georgian paragraph separator                                                                          

has a distinct representation, so it has been separately encoded as U+10FB. It                                                                          

visually marks a paragraph end, but it must be followed by a newline character                                                                          

as described in <a href="../tr13/">Unicode Standard Annex #13, Unicode Newline                                                                          

Guidelines</a>, to cause a paragraph termination.</p>                                                                         
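<p>A minimal Python sketch of this rule, with hypothetical function names: when writing, each paragraph is followed by U+10FB and a newline; when reading, paragraphs are split only at newline characters, so a U+10FB without a following newline does not end a paragraph.</p>

```python
from typing import List

GEORGIAN_PARAGRAPH_SEPARATOR = "\u10FB"


def terminate_paragraphs(paragraphs: List[str]) -> str:
    """Append the visible separator and a newline to each paragraph.

    Only the newline terminates the paragraph; U+10FB is a visible mark.
    """
    return "".join(p + GEORGIAN_PARAGRAPH_SEPARATOR + "\n" for p in paragraphs)


def split_paragraphs(text: str) -> List[str]:
    """Split text at newline characters; U+10FB alone never splits.

    str.splitlines() uses Python's own set of line boundaries (LF, CR, CRLF,
    NEL, U+2028, U+2029 and a few others). Treating all of them as paragraph
    ends is a simplification of the newline guidelines.
    """
    return text.splitlines()
```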

<p><b><i>Other Punctuation.</i></b> For the Georgian full stop, use U+0589                                                                          

ARMENIAN FULL STOP or U+002E FULL STOP.</p>                                                                         

<p>For additional punctuation to be used with this script, see C0 Controls and                                                                          

ASCII Punctuation (U+0000..U+007F) and General Punctuation (U+2000..U+206F).</p>                                                                         

<h3>7.10 Old Italic (new section)</h3>                                                                         

<h4>Old Italic: U+10300-U+1032F</h4>                                                                         

<p>The Old Italic script unifies a number of related historical alphabets                                                                          

located on the Italian peninsula. Some of these were used for non-Indo-European                                                                          

languages (Etruscan and probably North Picene), and some for various                                                                          

Indo-European languages belonging to the Italic branch (Faliscan and members of                                                                          

the Sabellian group, including Oscan, Umbrian, and South Picene). The ultimate                                                                          

source for the alphabets in ancient Italy is Euboean Greek used at Ischia and                                                                          

Cumae in the Bay of Naples in the eighth century BCE. Unfortunately, no Greek

abecedaries from southern Italy have survived. Faliscan, Oscan, Umbrian, North                                                                          

Picene, and South Picene all derive from an Etruscan form of the alphabet.</p>                                                                         

<p>There are some 10,000 inscriptions in Etruscan. By the time of the earliest                                                                          

Etruscan inscriptions, circa 700 BCE, local distinctions are already found in                                                                          

the use of the alphabet. Three major stylistic divisions are identified: the                                                                          

Northern, Southern, and Caere/Veii. Use of Etruscan can be divided into two                                                                          

stages, owing largely to the phonological changes that occurred: the                                                                          

&quot;archaic Etruscan alphabet&quot;, used from the seventh to the fifth                                                                          

centuries BCE, and the &quot;neo-Etruscan alphabet&quot;, used from the fourth                                                                          

to the first centuries BCE. Glyphs for eight of the letters differ between the                                                                          

two periods; additionally, neo-Etruscan abandoned the letters KA, KU, and EKS.</p>                                                                         

<p>The unification of these alphabets into a single Old Italic script requires                                                                          

language-specific fonts because the glyphs most commonly used may differ                                                                          

somewhat depending on the language being represented.</p>                                                                         

<p>Most of the languages have added characters to the common repertoire:                                                                          

Etruscan and Faliscan add LETTER EF; Oscan adds LETTER EF, LETTER II, and LETTER                                                                          

UU; Umbrian adds LETTER EF, LETTER ERS, and LETTER CHE; North Picene adds LETTER                                                                          

UU; and Adriatic adds LETTER II and LETTER UU.</p>                                                                         
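<p>Software that needs to know which additional letters a language uses (for example, to choose a language-specific font) could record the list above as a table. This Python sketch is hypothetical: the names <code>ADDED_LETTERS</code> and <code>added_letter_names</code> are illustrative, and the code points U+1031A..U+1031E are the Old Italic block's assignments for EF, ERS, CHE, II, and UU. Character names are looked up with the standard <code>unicodedata</code> module.</p>

```python
import unicodedata

# Old Italic block assignments for the added letters.
EF, ERS, CHE, II, UU = 0x1031A, 0x1031B, 0x1031C, 0x1031D, 0x1031E

# Letters each language adds to the common repertoire, per the text above.
ADDED_LETTERS = {
    "Etruscan": [EF],
    "Faliscan": [EF],
    "Oscan": [EF, II, UU],
    "Umbrian": [EF, ERS, CHE],
    "North Picene": [UU],
    "Adriatic": [II, UU],
}


def added_letter_names(language):
    """Return the Unicode character names of the letters a language adds."""
    return [unicodedata.name(chr(cp)) for cp in ADDED_LETTERS[language]]
```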

<p>The Latin script itself derives from a south Etruscan model, probably from
Caere or Veii, around the mid-seventh century BCE or a bit earlier. However,
Latin and Faliscan of the seventh and sixth centuries BCE differ significantly,
both formally (glyph shapes, directionality) and in the repertoire of letters
used, and these differences warrant encoding Old Italic in a character block
distinct from Latin. Fonts for early Latin should use the <i>uppercase</i>
code positions U+0041..U+005A. The unified Alpine script, which includes the

Venetic, Rhaetic, Lepontic, and Gallic alphabets, has not yet been proposed for                                                                          

addition to the Unicode Standard but is considered to differ enough from both                                                                          

Old Italic and Latin to warrant independent encoding. The Alpine script is                                                                          

thought to be the source for Runic, which is encoded at U+16A0..U+16FF.</p>                                                                         

<p>Character names assigned to the Old Italic block are unattested but have been                                                                          

reconstructed according to the analysis made by Geoffrey Sampson. While the                                                                          

Greek character names (ALPHA, BETA, GAMMA, etc.) were borrowed directly from the                                                                          

Phoenician names (modified to Greek phonology), the Etruscans are thought to                                                                          

have abandoned the Greek names in favor of a phonetically based nomenclature,
where stops were pronounced with a following <i>-e</i> sound, and liquids and sibilants
(which can be pronounced more or less on their own) were pronounced with a
leading <i>e-</i> sound (so [k], [d] became [ke:], [de:] but [l:], [m:] became
[el], [em]). It is these names, according to Sampson, which were borrowed by the

Romans when they took their script from the Etruscans.</p>                                                                         

<p><b><i>Directionality.</i></b> Most early Etruscan texts have right-to-left                                                                          

directionality. From the third century BCE, left-to-right texts appear, showing                                                                          

the influence of Latin. Oscan, Umbrian, and Faliscan also generally have                                                                          

right-to-left directionality. Boustrophedon appears rarely, and not especially                                                                          

early (for instance, the Forum inscription dates to 550-500 BCE). Despite this,                                                                          

for reasons of implementation simplicity, many scholars prefer left-to-right                                                                          

presentation of texts, as this is also their practice when transcribing the                                                                          

texts into Latin script. Accordingly, the Old Italic script has a default                                                                          

directionality of strong left-to-right in this standard. When directional                                                                          

overrides are used to produce right-to-left presentation, the glyphs in fonts                                                                          

must be mirrored from the glyphs shown in the tables below.</p>                                                                         
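<p>The following Python sketch, with hypothetical function names, illustrates both points: Old Italic letters report the bidirectional class L through Python's <code>unicodedata</code> module, and right-to-left presentation is obtained by wrapping the text in U+202E RIGHT-TO-LEFT OVERRIDE and U+202C POP DIRECTIONAL FORMATTING. The override changes only display order; the mirrored glyph shapes must still come from the font.</p>

```python
import unicodedata

RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def is_strong_ltr(ch):
    """True if the character has the bidirectional class L."""
    return unicodedata.bidirectional(ch) == "L"


def old_italic_rtl(text):
    """Wrap Old Italic text so that it is displayed right-to-left.

    The stored (logical) order is unchanged; Old Italic letters are not
    Bidi_Mirrored, so the font must supply mirrored glyphs itself.
    """
    return RLO + text + PDF
```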

<p><b><i>Punctuation.</i></b> The earliest inscriptions are written with no                                                                          

space between words in what is called <i>scriptio continua</i>. There are                                                                          

numerous Etruscan inscriptions with dots separating word forms, attested as                                                                          

early as the second quarter of the seventh century BCE. This punctuation is                                                                          

sometimes, but rarely, used to separate syllables rather than words. From the
sixth century BCE, words were often separated by one, two, or three dots stacked
vertically.</p>

<p><b><i>Numerals.</i></b> Etruscan numerals are not well-attested in the                                                                          

available materials, but are employed in the same fashion as Roman numerals.                                                                          

Several additional numerals are attested, but as their use is at present                                                                          

uncertain, they are not yet encoded in the Unicode Standard.</p>                                                                         
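<p>A minimal Python sketch of reading such a numeral, assuming a purely additive system in which the values of the numeral characters are simply summed; subtractive forms, if any, are not handled. It uses the four numerals U+10320..U+10323 (ONE, FIVE, TEN, FIFTY) encoded in the Old Italic block. The names <code>OLD_ITALIC_NUMERALS</code> and <code>old_italic_value</code> are hypothetical.</p>

```python
# Values of the Old Italic numerals encoded at U+10320..U+10323.
OLD_ITALIC_NUMERALS = {
    "\U00010320": 1,   # OLD ITALIC NUMERAL ONE
    "\U00010321": 5,   # OLD ITALIC NUMERAL FIVE
    "\U00010322": 10,  # OLD ITALIC NUMERAL TEN
    "\U00010323": 50,  # OLD ITALIC NUMERAL FIFTY
}


def old_italic_value(numeral):
    """Sum the numeral values (assumes purely additive notation).

    Raises KeyError if the string contains anything other than these
    four numeral characters.
    """
    return sum(OLD_ITALIC_NUMERALS[ch] for ch in numeral)
```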

<p><b><i>Glyphs.</i></b> The default glyphs in the code charts are based on the                                                                          

most common shapes found for each letter. Most of these are similar to the                                                                          

Marsiliana abecedary (mid-seventh century BCE). Note that the phonetic values                                                                          

for U+10317 OLD ITALIC LETTER EKS [ks] and U+10319 OLD ITALIC LETTER KHE [kh]                                                                          

show the influence of western, Euboean Greek; eastern Greek has U+03A7 GREEK                                                                          

CAPITAL LETTER CHI [x] and U+03A8 GREEK CAPITAL LETTER PSI [ps], instead.</p>                                                                         

<p align="center"><img border="0" src="old-italic-map.gif" alt="Map of Old Italic" width="365" height="330"></p>                                                                         

<p>The geographic distribution of the Old Italic script is shown in the figure                                                                          

above. In the figure, the approximate distribution of ancient languages which                                                                          

used Old Italic alphabets is shown in white. Areas for ancient languages which                                                                          

used other scripts are shown in gray, and the labels for those languages are                                                                          

shown in oblique type. In particular, note that the ancient Greek colonies of                                                                          

the southern Italian and the Sicilian coasts used the Greek script proper. And                                                                          

languages such as Ligurian, Venetic, etc., of the far north of Italy made use of                                                                          

alphabets of the Alpine script. Rome, of course, is also shown in gray, since                                                                          

Latin was written with the Latin alphabet, now encoded in the Latin script.</p>                                                                         

<h3>7.11 Gothic (new section)</h3>                                                                         

<h4>Gothic: U+10330-U+1034F</h4>                                                                         

<p>The Gothic script was devised in the fourth century by the Gothic bishop,                                                                          

Wulfila (311-383 CE), to provide his people with a written language and a means                                                                          

of reading his translation of the Bible. Written Gothic materials are largely                                                                          

restricted to fragments of Wulfila's translation of the Bible; these fragments                                                                          

are of considerable importance in New Testament textual studies. The chief                                                                          

manuscript, kept at Uppsala, is the Codex Argenteus or &quot;the Silver                                                                          

Book,&quot; which is partly written in gold on purple parchment. Gothic is an                                                                          

East Germanic language; this branch of Germanic has died out and thus the Gothic                                                                          

texts are of great importance in historical and comparative linguistics. Wulfila                                                                          

appears to have used the Greek script as a source for the Gothic, as can be seen                                                                          

from the basic alphabetical order. Some of the character shapes suggest Runic or                                                                          

Latin influence, but this is apparently coincidental.</p>                                                                         

<p align="left"><b><i>Diacritics.</i></b> The tenth letter U+10339 GOTHIC LETTER                                                                          

EIS is used with U+0308 COMBINING DIAERESIS when word-initial, when                                                                          

syllable-initial after a vowel, and in compounds with a verb as second member as                                                                          

shown below:</p>                                                                         

<p align="center"><img border="0" src="gothic-ex1.gif" alt="Gothic example" width="384" height="72"></p>                                                                         

<p align="left">To indicate contractions or omitted letters, U+0305 COMBINING                                                                          

OVERLINE is used.</p>                                                                         
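<p>A short Python sketch of how these sequences are stored, with hypothetical helper names. Because no precomposed Gothic letter with diaeresis exists, the pair remains two code points even after NFC normalization. Placing one U+0305 after each letter of a contracted form is an assumption about usage; U+0305 is designed so that adjacent overlines connect.</p>

```python
import unicodedata

EIS = "\U00010339"    # GOTHIC LETTER EIS
DIAERESIS = "\u0308"  # COMBINING DIAERESIS
OVERLINE = "\u0305"   # COMBINING OVERLINE


def eis_with_diaeresis():
    """EIS followed by a combining diaeresis (no precomposed form exists)."""
    return EIS + DIAERESIS


def mark_contraction(letters):
    """Put a combining overline after each letter of an abbreviated form."""
    return "".join(ch + OVERLINE for ch in letters)
```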

<p><b><i>Numerals.</i></b> Gothic letters, like those of other early Western                                                                          

alphabets, can be used as numbers; two of the characters have only a numeric                                                                          

value, and are not used alphabetically. To indicate numeric use of a letter, it                                                                          

is either flanked on both sides by U+00B7 MIDDLE DOT or followed by both

U+0304 COMBINING MACRON and U+0331 COMBINING MACRON BELOW as shown in the                                                                          

following example:</p>                                                                         

<p align="center"><img border="0" src="gothic-ex2.gif" alt="Gothic example" width="237" height="29"></p>                                                                         
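<p>The two conventions can be sketched in Python as follows. The function names are hypothetical, and U+10330 GOTHIC LETTER AHSA serves only as a sample letter. One practical detail: U+0304 (canonical combining class 230) and U+0331 (class 220) are reordered by normalization, so the macron below comes first in NFD and NFC. Both orders are canonically equivalent, so comparisons should be made on normalized strings.</p>

```python
import unicodedata

MIDDLE_DOT = "\u00B7"
MACRON = "\u0304"        # COMBINING MACRON, combining class 230
MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW, combining class 220


def gothic_numeric(letter, style="dots"):
    """Mark a Gothic letter as a numeral using one of the two conventions."""
    if style == "dots":
        return MIDDLE_DOT + letter + MIDDLE_DOT
    if style == "macrons":
        return letter + MACRON + MACRON_BELOW
    raise ValueError("style must be 'dots' or 'macrons'")


def same_numeral(a, b):
    """Compare two marked numerals up to canonical equivalence."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```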

<p><b><i>Punctuation.</i></b> Gothic manuscripts are written with no space                                                                          

between words in what is called <i>scriptio continua</i>. Sentences and major                                                                          

phrases are often separated by U+0020 SPACE, U+00B7 MIDDLE DOT, or U+003A COLON.</p>

<h3>10.1 Han (revision)</h3>                                                                         

<p>Because of the addition of CJK Unified Ideographs Extension B, change the                                                                          

definition of UnifiedIdeograph on page 269 from the following:</p>                                                                         

<pre><i>UnifiedIdeograph</i> ::= U+3400 | U+3401 | ... | U+4DB4 | U+4DB5 | U+4E00 | U+4E01 | ...
                   | U+9FA4 | U+9FA5 | U+FA0E | U+FA0F | U+FA11 | U+FA13 | U+FA14
                   | U+FA1F | U+FA21 | U+FA23 | U+FA24 | U+FA27 | U+FA28 | U+FA29</pre>

<p>to this:</p>                                                                         

<pre><i>UnifiedIdeograph</i> ::= U+3400 | U+3401 | ... | U+4DB4 | U+4DB5 | U+4E00 | U+4E01 | ...
                   | U+9FA4 | U+9FA5 | U+FA0E | U+FA0F | U+FA11 | U+FA13 | U+FA14
                   | U+FA1F | U+FA21 | U+FA23 | U+FA24 | U+FA27 | U+FA28 | U+FA29
                   | U+20000 | U+20001 | ... | U+2A6D5 | U+2A6D6</pre>
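<p>The revised production can be expressed directly as a membership test. This Python sketch encodes exactly the ranges and single code points listed in the revised definition, so it reflects Unicode 3.1 only; later versions of the standard add further unified ideographs. The function name <code>is_unified_ideograph</code> is hypothetical.</p>

```python
# Ranges from the revised UnifiedIdeograph definition (inclusive bounds).
UNIFIED_RANGES = [
    (0x3400, 0x4DB5),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FA5),    # CJK Unified Ideographs
    (0x20000, 0x2A6D6),  # CJK Unified Ideographs Extension B
]

# Unified ideographs located in the CJK Compatibility Ideographs block.
UNIFIED_SINGLES = frozenset({
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
})


def is_unified_ideograph(cp):
    """True if the code point matches the Unicode 3.1 UnifiedIdeograph rule."""
    return cp in UNIFIED_SINGLES or any(lo <= cp <= hi for lo, hi in UNIFIED_RANGES)
```

<p>Note that U+FA10, for example, lies between two listed code points but is a compatibility ideograph and therefore returns <code>False</code>.</p>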

<h3>10.1 Han (new subsections)</h3>                                                                         

<h4>CJK Unified Ideographs Extension B: U+20000-U+2A6D6</h4>                                                                         

<p>The ideographs in the CJK Unified Ideographs Extension B represent an                                                                          

additional set of 42,711 ideographs beyond the 27,484 included in <i>The Unicode                                                                          

Standard, Version 3.0</i>.</p>                                                                         
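<p>Both figures follow from the block extents. A quick Python check, assuming the 27,484 figure covers only U+3400..U+4DB5 and U+4E00..U+9FA5 and excludes the twelve unified ideographs in the compatibility block:</p>

```python
# Count code points in each block (inclusive ranges).
ext_b = 0x2A6D6 - 0x20000 + 1  # CJK Unified Ideographs Extension B
uro = 0x9FA5 - 0x4E00 + 1      # CJK Unified Ideographs: 20,902
ext_a = 0x4DB5 - 0x3400 + 1    # CJK Unified Ideographs Extension A: 6,582
print(ext_b, uro + ext_a)      # 42711 27484
```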

<p><i>Section 10.1, Han</i> in <i>The Unicode Standard</i> describes the basic                                                                          

principles underlying the selection, organization, and unification of Han                                                                          

ideographs. These same principles apply to the ideographs in the CJK Unified                                                                          

Ideographs Extension B block.</p>                                                                         

<p>The ideographs in this block are derived from the six IRG sources: G-source,                                                                          

H-source, T-source, J-source, K-source, and V-source. There is no U-source for                                                                          

ideographs in the CJK Unified Ideographs Extension B block. The H-source                                                                          

represents a new IRG source beyond the ones used for earlier blocks of Han                                                                          

ideographs and is used for characters derived from standards published by the                                                                          

Hong Kong SAR.</p>                                                                         

<p>The standards and other references associated with these six IRG sources are                                                                          

listed in the table below. For each of the six IRG sources, the second column of                                                                          

the table contains an abbreviated name of the source; the third column gives a                                                                          

descriptive name. The abbreviated names are used in various data files published                                                                          

by the Unicode Consortium and ISO/IEC to identify the specific IRG sources. For                                                                          

a more detailed explanation of the format of this table, refer to <i>Table 10-1,                                                                          

Sources for Unified Han</i>, on page 259 of <i>The Unicode Standard, Version 3.0</i>.</p>                                                                         

<div align="center">                                                                         

  <center>                                                                         

  <table border="2" cellpadding="2" cellspacing="0" width="594">                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">G source:</td>                                                                         

      <td width="53" valign="top" align="left">G_KX</td>                                                                         

      <td width="422" valign="top" align="left">KangXi dictionary ideographs                                                                          

        (including the addendum) not already encoded in the BMP</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">G_HZ</td>                                                                         

      <td width="422" valign="top" align="left">Hanyu Da Zidian ideographs not                                                                          

        already encoded in the BMP</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">G_CY</td>                                                                         

      <td width="422" valign="top" align="left">Ci Yuan</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">G_CH</td>                                                                         

      <td width="422" valign="top" align="left">Ci Hai</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">G_HC</td>                                                                         

      <td width="422" valign="top" align="left">Hanyu Da Cidian</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">G_BK</td>                                                                         

      <td width="422" valign="top" align="left">Chinese Encyclopedia</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">G_FZ</td>                                                                         

      <td width="422" valign="top" align="left">Founder Press System</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">G_4K</td>                                                                         

      <td width="422" valign="top" align="left">Siku Quanshu</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">H source:</td>                                                                         

      <td width="53" valign="top" align="left">H</td>                                                                         

      <td width="422" valign="top" align="left">Hong Kong Supplementary                                                                          

        Character Set</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">T source:</td>                                                                         

      <td width="53" valign="top" align="left">T4</td>                                                                         

      <td width="422" valign="top" align="left">CNS 11643-1992, 4th plane</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">T5</td>                                                                         

      <td width="422" valign="top" align="left">CNS 11643-1992, 5th plane</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">T6</td>                                                                         

      <td width="422" valign="top" align="left">CNS 11643-1992, 6th plane</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">T7</td>                                                                         

      <td width="422" valign="top" align="left">CNS 11643-1992, 7th plane</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">TF</td>                                                                         

      <td width="422" valign="top" align="left">CNS 11643-1992, 15th plane</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">J source:</td>                                                                         

      <td width="53" valign="top" align="left">J3</td>                                                                         

      <td width="422" valign="top" align="left">JIS X 0213:2000, level 3</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                         

      <td width="53" valign="top" align="left">J4</td>                                                                         

      <td width="422" valign="top" align="left">JIS X 0213:2000, level 4</td>                                                                         

    </tr>                                                                         

    <tr>                                                                         

      <td width="97" valign="top" align="left">K source:</td>                                                                         

      <td width="53" valign="top" align="left">K4</td>                                                                        

      <td width="422" valign="top" align="left">PKS 5700-3:1998</td>                                                                        

    </tr>                                                                        

    <tr>                                                                        

      <td width="97" valign="top" align="left">V source:</td>                                                                        

      <td width="53" valign="top" align="left">V0</td>                                                                        

      <td width="422" valign="top" align="left">TCVN 5773:1993</td>                                                                        

    </tr>                                                                        

    <tr>                                                                        

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                        

      <td width="53" valign="top" align="left">V2</td>                                                                        

      <td width="422" valign="top" align="left">VHN 01:1998</td>                                                                        

    </tr>                                                                        

    <tr>                                                                        

      <td width="97" valign="top" align="left">&nbsp;</td>                                                                        

      <td width="53" valign="top" align="left">V3</td>                                                                        

      <td width="422" valign="top" align="left">VHN 02:1998</td>                                                                        

    </tr>                                                                        

  </table>                                                                        

  </center>                                                                        

</div>                                                                        

<p>As with other Han ideograph blocks, the ideographs in the CJK Unified                                                                         

Ideographs Extension B block are derived from versions of national standards                                                                         

submitted to the IRG by its members. They may in some instances be slightly                                                                         

different from published versions of these standards.</p>                                                                        

<p>As with other CJK unified ideographs, the names for these characters are                                                                         

algorithmic. Thus, CJK UNIFIED IDEOGRAPH-20000 is the name for the ideograph at                                                                         

U+20000.</p>                                                                        
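<p>Because the names are algorithmic, an implementation can derive them instead of storing them. The sketch below shows one way to do this. The helper <code>ext_b_name</code> is hypothetical. The range bounds U+20000..U+2A6D6 are the Extension B range assigned in Unicode 3.1 and are assumed here. The result is cross-checked against Python's <code>unicodedata</code> module.</p>

```python
import unicodedata

# Assumed bounds of CJK Unified Ideographs Extension B (Unicode 3.1).
EXT_B_FIRST, EXT_B_LAST = 0x20000, 0x2A6D6

def ext_b_name(cp: int) -> str:
    """Hypothetical helper: derive the algorithmic name of an Extension B ideograph."""
    if not EXT_B_FIRST <= cp <= EXT_B_LAST:
        raise ValueError(f"U+{cp:04X} is not in CJK Unified Ideographs Extension B")
    # Fixed prefix followed by the code point in uppercase hexadecimal.
    return f"CJK UNIFIED IDEOGRAPH-{cp:04X}"

print(ext_b_name(0x20000))  # CJK UNIFIED IDEOGRAPH-20000
# Python's character database derives the same name.
assert ext_b_name(0x20000) == unicodedata.name(chr(0x20000))
```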

<p>These ideographs may be used in Ideographic Description Sequences (see <i>The                                                                         

Unicode Standard, Version 3.0, Section 10.1, Han</i>, pages 268-271).</p>                                                                        

<h4>CJK Compatibility Ideographs Supplement: U+2F800-U+2FA1D</h4>                                                                        

<p>This block consists of additional compatibility ideographs required for                                                                         

round-trip compatibility with CNS 11643-1992, planes 3, 4, 5, 6, 7, and 15. They                                                                         

should not be used for any other purpose and, in particular, may not be used in                                                                         

Ideographic Description Sequences.</p>

<p>The names for the compatibility ideographs are also algorithmic. Thus, the name

for the compatibility ideograph U+2F800 is CJK COMPATIBILITY IDEOGRAPH-2F800.</p>                                                              
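<p>The naming rule can also be applied in reverse to recover a code point from a name. The following is a minimal sketch. The function <code>compat_code_point</code> is a hypothetical helper, and it deliberately accepts only names that fall inside this block (U+2F800..U+2FA1D).</p>

```python
import unicodedata

PREFIX = "CJK COMPATIBILITY IDEOGRAPH-"
# Bounds of the CJK Compatibility Ideographs Supplement block.
SUPP_FIRST, SUPP_LAST = 0x2F800, 0x2FA1D
HEX_DIGITS = set("0123456789ABCDEF")

def compat_code_point(name: str) -> int:
    """Hypothetical helper: recover the code point encoded in an algorithmic name."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not a compatibility ideograph name: {name!r}")
    hex_part = name[len(PREFIX):]
    # The names use uppercase hexadecimal digits only.
    if not hex_part or not set(hex_part) <= HEX_DIGITS:
        raise ValueError(f"malformed code point in name: {name!r}")
    cp = int(hex_part, 16)
    if not SUPP_FIRST <= cp <= SUPP_LAST:
        raise ValueError(f"U+{cp:04X} is outside U+2F800..U+2FA1D")
    return cp

cp = compat_code_point("CJK COMPATIBILITY IDEOGRAPH-2F800")
print(f"U+{cp:04X}")  # U+2F800
# Python's character database pairs the same name with the same character.
assert unicodedata.name(chr(cp)) == "CJK COMPATIBILITY IDEOGRAPH-2F800"
```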

<h3>10.5 Bopomofo (revision)</h3>                                                               

<p>On page 278, modify the "Standard Mandarin Bopomofo" paragraph as follows:</p>                                                               

<p>The order of the Mandarin Bopomofo letters U+3105..U+3129 is standard worldwide. The code offset of the first letter U+3105 BOPOMOFO LETTER

B from a multiple of 16 is included to match the offset in the ISO-registered standard GB 2312.                                                        

The character U+3127 BOPOMOFO LETTER I <u> may be rendered as either a                                                        

horizontal stroke or a vertical stroke </u><strike>is usually written as a vertical stroke when Bopomofo text is set                                                       

vertically.</strike> <u>Often the glyph is chosen to stand                                                        

perpendicular to the text baseline (e.g., a horizontal stroke in

vertically-set text), but other usage is also common.</u> In the Unicode                                                        

Standard,<strike> this representation is considered to be a rendering variation; the variant is not assigned a separate character                                                       

code.</strike><u> the form shown in the charts is a horizontal stroke; the vertical                                                        

stroke form is considered to be a rendering variant. The variant glyph is                                                        

not assigned a separate character code.</u></p>                                                        

<h3>11.5 Deseret (new section)</h3>                                                                        

<h4>Deseret: U+10400-U+1044F</h4>                                                                        

<p>Deseret is a phonemic alphabet devised to write the English language. It was                                                                         

originally developed in the 1850s at the University of Deseret, now the                                                                         

University of Utah. It was promoted by The Church of Jesus Christ of Latter-day                                                                         

Saints, also known as the &quot;Mormon&quot; or LDS Church, under Church                                                                         

President Brigham Young (1801-1877). The name Deseret is taken from a word in                                                                         

the Book of Mormon defined to mean &quot;honeybee&quot; and reflects the LDS use                                                                         

of the beehive as a symbol of cooperative industry. Most literature about the                                                                         

script treats the term Deseret Alphabet as a proper noun and capitalizes it as                                                                         

such.</p>                                                                        

<p>Among the designers of the Deseret Alphabet was George D. Watt, who had                                             

been trained in shorthand and served as Brigham Young's secretary.                                             

It is possible that, under Watt's influence, Sir Isaac Pitman's 1847                                             

English Phonotypic Alphabet was used as the model for the Deseret                                             

Alphabet.</p>                                            

<p>The Church commissioned two typefaces and published four books using the                                                                          

Deseret Alphabet. The Church-owned <i>Deseret News</i> also published passages                                                                          

of scripture using the alphabet on occasion. In addition, some historical                                                                          

records, diaries, and other materials were handwritten using this script, and it                                             

had limited use on coins and signs. There is also one tombstone in Cedar City,                                             

Utah, written in the Deseret Alphabet. However, the script failed to gain wide acceptance and was not actively promoted after                                                                          

1869. Today, the Deseret Alphabet remains of interest primarily to historians                                                                          

and hobbyists.</p>                                                                         

<p><b><i>Letter Names and Shapes.</i></b> Pedagogical materials produced by the                                                                          

LDS Church gave names to all of the non-vowel letters and indicated the vowel                                                                          

sounds with English examples. In the Unicode Standard, the spelling of the                                                                          

non-vowel letter names has been modified to clarify their pronunciations, and                                                                          

the vowels have been given names that emphasize the parallel structure of the

two vowel runs.</p>                                                                         

<p>The glyphs used in the Unicode Standard are derived from the second typeface                                                                          

commissioned by the LDS Church and represent the shapes most commonly found.                                                                          

Alternate glyphs are found in the first typeface and in some instructional                                                                          

material.</p>                                                                         

<p><b><i>Structure.</i></b> The script consists of thirty-eight letters. The                                                                          

alphabet is bicameral; capital and small letters differ only in size and not in                                                                          

shape. The order of the letters is phonetic: letters for similar classes of                                                                          

sound are grouped together. In particular, most consonants come in                                                                          

unvoiced/voiced pairs.</p>                                                                         
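<p>Because capital and small letters differ only in size, case mapping can be expressed as a fixed offset. The sketch below assumes the Unicode 3.1 code chart layout, which is not spelled out in this section. The 38 capitals are at U+10400..U+10425 and the corresponding small letters are at U+10428..U+1044D, a constant distance of 0x28 apart. The helper <code>deseret_to_small</code> is hypothetical, and it is compared against Python's built-in <code>str.lower</code>.</p>

```python
# Assumed Unicode 3.1 layout: 38 capitals at U+10400..U+10425.
CAPITAL_FIRST, CAPITAL_LAST = 0x10400, 0x10425
# Each small letter sits a constant 0x28 (40) code points after its capital.
CASE_OFFSET = 0x28

def deseret_to_small(text: str) -> str:
    """Hypothetical helper: lowercase Deseret capitals, pass others through."""
    out = []
    for ch in text:
        cp = ord(ch)
        if CAPITAL_FIRST <= cp <= CAPITAL_LAST:
            out.append(chr(cp + CASE_OFFSET))
        else:
            out.append(ch)
    return "".join(out)

print(hex(ord(deseret_to_small("\U00010400"))))  # 0x10428
```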

<p><b><i>Sorting.</i></b> The order of the letters in the Unicode Standard is                                                                          

the one used in all but one of the nineteenth-century descriptions of the                                                                          

alphabet. The exception is one in which the letters WU and YEE are inverted. The                                                                          

order YEE-WU follows the order of the &quot;coalescents&quot; in Pitman's work;                                                                          

the order WU-YEE, however, appears in a greater number of Deseret materials and

has been followed here.</p>

<p>There is no evidence that any early materials written using the Deseret                                                                          

Alphabet were alphabetized. It is assumed that sorting and collation would have                                                                          

been based directly on the order of the letters within the alphabet.</p>                                                                         
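<p>If sorting follows the order of the letters in the alphabet, a simple collation key can be built from code points once case is folded. Folding is needed because every capital precedes every small letter in code point order. The sketch below makes two assumptions: case is ignored for ordering, and the Unicode 3.1 layout applies (capitals at U+10400..U+10425, small letters 0x28 higher). The key function <code>deseret_sort_key</code> is hypothetical. It is not a collation algorithm defined by the standard.</p>

```python
def deseret_sort_key(word: str) -> tuple:
    """Hypothetical key: alphabet position of each letter, ignoring case."""
    key = []
    for ch in word:
        cp = ord(ch)
        # Fold capitals onto small letters, since case differs only in size.
        if 0x10400 <= cp <= 0x10425:
            cp += 0x28
        key.append(cp)
    return tuple(key)

words = ["\U00010411", "\U00010428"]  # capital PEE, small LONG I
# Raw code point order would put capital PEE first; the folded key does not.
print([hex(ord(w)) for w in sorted(words, key=deseret_sort_key)])  # ['0x10428', '0x10411']
```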

<p><b><i>Typographic Conventions.</i></b> The Deseret Alphabet is written from                                                                          

left to right. Punctuation, capitalization, and digits are the same as in                                                                          

English. All words are written phonemically with the exception of short words                                                                          

that have pronunciations equivalent to letter names.</p>                                                                         

<p align="center"><img border="0" src="deseret-ex1.gif" width="294" height="132" alt="Deseret example"></p>                                                                         

<p><b><i>Phonetics.</i></b> An approximate IPA transcription of the sounds                                                                          

represented by the Deseret Alphabet is shown below.</p>                                                                         

<p align="center"><img border="0" src="deseret-ipa-chart.gif" alt="Deseret IPA chart" width="273" height="365"></p>                                                                         

<h3>12.2 Mathematical Alphanumeric Symbols (new subsection)</h3>                                                                         

<h4>Mathematical Alphanumeric Symbols: U+1D400-U+1D7FF</h4>                                                                         

<p>The Mathematical Alphanumeric Symbols block contains a large extension of

letterlike symbols used in mathematical notation, typically for variables. The

characters in this block are intended for use only in mathematical or technical

notation; they are not intended for use in non-technical text. When they are used with

markup languages, for example with <i><a

href="http://www.w3.org/TR/REC-MathML/">Mathematical Markup Language (MathML&trade;)</a></i>

<a href="#mathml">[MathML]</a>, the characters are expected to be used directly, instead of indirectly via

entity references or by composing them from base letters and style markup.</p>
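<p>To illustrate what direct use means in markup, the sketch below builds a MathML <code>&lt;mi&gt;</code> (identifier) element whose content is the bold letter itself, U+1D407 MATHEMATICAL BOLD CAPITAL H. It does not use a plain H wrapped in style markup. The helper <code>bold_mi</code> is hypothetical. It also assumes that the bold capitals A-Z are contiguous starting at U+1D400.</p>

```python
import unicodedata
import xml.etree.ElementTree as ET

# MATHEMATICAL BOLD CAPITAL A; the bold capitals A-Z are contiguous from here.
BOLD_CAPITAL_A = 0x1D400

def bold_mi(letter: str) -> ET.Element:
    """Hypothetical helper: a MathML <mi> holding a bold capital directly."""
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("this sketch handles only a single letter A-Z")
    mi = ET.Element("mi")
    # The bold style is carried by the character itself, not by markup.
    mi.text = chr(BOLD_CAPITAL_A + ord(letter) - ord("A"))
    return mi

markup = ET.tostring(bold_mi("H"), encoding="unicode")
print(markup)
print(unicodedata.name(ET.fromstring(markup).text))  # MATHEMATICAL BOLD CAPITAL H
```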

<p><b><i>Words Used as Variables.</i></b> In some specialties, whole words are                                                                              

used as variables, not just single letters. For these cases, style markup is                                                                              

preferred because in ordinary mathematical notation the juxtaposition of                                                                              

variables generally implies multiplication, not word formation as in ordinary                                                                              

text. Markup not only provides the necessary scoping in these cases, it also                                                                              

allows the use of a more extended alphabet.</p>                                                                             

<h4>Mathematical Alphabets</h4>                                                                             

<p><b><i>Basic Set of Alphanumeric Characters. </i></b>Mathematical notation                                                                              

uses a basic set of mathematical alphanumeric characters which consists of:</p>                                                                             

<ul>                                                                             

   <li>the set of basic Latin digits (0 - 9) (U+0030..U+0039)</li>                                                                            

   <li>the set of basic upper- and lowercase Latin letters (A - Z, a - z) (U+0041..U+005A, U+0061..U+007A)</li>

   <li>the uppercase Greek letters &#0913; - &#0937; (U+0391..U+03A9),                                                                              

plus the nabla &#8711; (U+2207) and the variant of theta &#1012; given by                                                                              

U+03F4</li>                                                                             

   <li>the lowercase Greek letters &#0945; - &#0969; (U+03B1..U+03C9),                                                                              

plus the partial differential sign &#8706; (U+2202) and the six glyph variants of                                                                 

&#0949;, &#0952;, &#0954;, &#0966;, &#0961;, and &#0960;,                                                                             

	given by U+03F5, U+03D1, U+03F0, U+03D5, U+03F1, and U+03D6.                                                                           

   </li>                                                                             

</ul>                                                                             
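<p>The basic set listed above can be written down as a set of code points, which is useful, for example, when testing whether a character has mathematical style variants. Below is a minimal sketch. It adds one assumption not stated in the list: U+03A2 in the capital Greek range is unassigned and must be skipped. The lowercase range includes U+03C2 final sigma. The function name <code>basic_math_set</code> is hypothetical.</p>

```python
import unicodedata

def basic_math_set() -> set:
    """Hypothetical helper: code points of the basic set listed above."""
    cps = set(range(0x0030, 0x003A))    # digits 0-9
    cps |= set(range(0x0041, 0x005B))   # A-Z
    cps |= set(range(0x0061, 0x007B))   # a-z
    # Capital Greek U+0391..U+03A9; U+03A2 is unassigned, so skip it.
    cps |= {cp for cp in range(0x0391, 0x03AA) if cp != 0x03A2}
    cps |= {0x2207, 0x03F4}             # nabla, capital theta variant
    cps |= set(range(0x03B1, 0x03CA))   # small Greek alpha..omega
    cps |= {0x2202}                     # partial differential
    # Glyph variants of epsilon, theta, kappa, phi, rho, and pi.
    cps |= {0x03F5, 0x03D1, 0x03F0, 0x03D5, 0x03F1, 0x03D6}
    return cps

BASIC = basic_math_set()
# Every member should be an assigned character.
assert all(unicodedata.category(chr(cp)) != "Cn" for cp in BASIC)
print(len(BASIC))  # 120
```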

<p>Only unaccented forms of the letters are used for mathematical notation,                                                                              

because general accents such as the acute accent would interfere with common                                                                              

mathematical diacritics. Examples of common mathematical diacritics that can

interfere with general accents are the circumflex, the macron, and the single or

double dot above; the latter two are used in physics to denote

derivatives with respect to the time variable. Mathematical symbols with

diacritics are always represented by combining character sequences.</p>                                                                             
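<p>As an example of such a combining character sequence, the sketch below marks a time derivative by appending U+0307 COMBINING DOT ABOVE, or U+0308 COMBINING DIAERESIS for a double dot, to a mathematical letter. The helper <code>time_derivative</code> is hypothetical. The base used here, U+1D431 MATHEMATICAL BOLD SMALL X, has no precomposed dotted form, so the pair remains a two-character sequence under NFC.</p>

```python
import unicodedata

COMBINING_DOT_ABOVE = "\u0307"
COMBINING_DIAERESIS = "\u0308"  # used as the double dot above

def time_derivative(var: str, order: int = 1) -> str:
    """Hypothetical helper: append a dot (first) or double dot (second derivative)."""
    marks = {1: COMBINING_DOT_ABOVE, 2: COMBINING_DIAERESIS}
    if order not in marks:
        raise ValueError("this sketch handles only first and second derivatives")
    return var + marks[order]

bold_x = "\U0001D431"  # MATHEMATICAL BOLD SMALL X
x_dot = time_derivative(bold_x)
print([unicodedata.name(c) for c in x_dot])
# ['MATHEMATICAL BOLD SMALL X', 'COMBINING DOT ABOVE']
```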

<p>For some characters in the basic set of Greek characters, two variants of the                                                                              

same character are included. This is because they can appear in the same                                                                              

mathematical document with different meanings, even though they would have the                                                                              

same meaning in Greek text.</p>                                                                             

<p><b><i>Additional Characters.</i></b> In addition to this basic set,                                                                              

mathematical notation also uses the four Hebrew-derived characters                                                                              

(U+2135..U+2138). Occasional uses of other alphabetic and numeric characters are                                                                              

known. Examples include U+0428 CYRILLIC CAPITAL LETTER SHA, U+306E HIRAGANA                                                                              

LETTER NO, and Eastern Arabic-Indic digits (U+06F0..U+06F9). However, these                                                                              

characters are used only in their basic form.</p>

<p><b><i>Semantic Distinctions.</i></b> Mathematical notation requires a number                                                                              

of Latin and Greek alphabets that initially appear to be mere font variations of                                                                              

one another. For example, the letter H can appear as plain or upright (H), bold

(<b>H</b>), italic (<i>H</i>), or script. However, in any given document, these

characters have distinct and usually unrelated mathematical semantics. For

example, a normal H represents a different variable from a bold <b>H</b>. If

these attributes are dropped in plain text, the distinctions are lost and the

meaning of the text is altered. Without the distinctions, the well-known

Hamiltonian formula</p>

<blockquote>                                                                             

  <p><img border="0" src="hamilton.gif" width="218" height="43" alt="Hamiltonian formula"></p>                                                                           

</blockquote>                                                                           

<p>turns into this <i>integral</i> equation in the variable H</p>                                                                           

<blockquote>                                                                           

  <img border="0" src="integral.gif" width="213" height="40" alt="Integral equation">                                                                           

</blockquote>                                                                           

<p>By encoding a separate set of alphabets, it is possible to preserve such                                                                            

distinctions in plain text.</p>                                                                           
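<p>The sketch below shows both sides of this point. As separate characters, a plain H and U+1D407 MATHEMATICAL BOLD CAPITAL H stay distinct in plain text. However, the mathematical alphanumerics carry compatibility decompositions, so NFKC normalization folds them back to their base letters and removes the distinction again. Applications that must keep these distinctions would therefore need to avoid NFKC on mathematical text.</p>

```python
import unicodedata

plain_h = "H"
bold_h = "\U0001D407"  # MATHEMATICAL BOLD CAPITAL H

# As separate characters, the two variables stay distinct in plain text.
print(plain_h == bold_h)  # False
# Compatibility normalization (NFKC) folds the math style away,
# so the distinction is lost again.
print(unicodedata.normalize("NFKC", bold_h) == plain_h)  # True
```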

<p><b><i>Mathematical Alphabets. </i></b>The alphanumeric symbols encountered in                                                                            

mathematics and encoded in the Unicode Standard are given in the following                                                                            

table:</p>                                                                           

<div align="center">                                                                           

  <table border="2" cellpadding="2">                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p><b>Math Style</b></p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p><b>Characters from Basic Set</b></p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p><b>Location</b></p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>plain (upright, serifed)</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Latin, Greek and digits</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>BMP</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>bold</p>

      </td>

      <td valign="top">                                                                           

        <p>Latin, Greek and digits</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Plane 1</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>italic</p>

      </td>

      <td valign="top">                                                                           

        <p>Latin and Greek</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Plane 1*</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>bold italic</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Latin and Greek</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Plane 1</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>script (calligraphic)</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Latin</p>

      </td>

      <td valign="top">                                                                           

        <p>Plane 1*</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>bold script (calligraphic)</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Latin</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Plane 1</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>Fraktur</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Latin</p>

      </td>

      <td valign="top">                                                                           

        <p>Plane 1*</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>bold Fraktur</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Latin</p>

      </td>

      <td valign="top">                                                                           

        <p>Plane 1</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>double-struck</p>

      </td>

      <td valign="top">                                                                           

        <p>Latin and digits</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Plane 1*</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>sans-serif</p>

      </td>

      <td valign="top">                                                                           

        <p>Latin and digits</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Plane 1</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>sans-serif bold</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Latin, Greek and digits</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Plane 1</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>sans-serif italic</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Latin</p>

      </td>

      <td valign="top">                                                                           

        <p>Plane 1</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>sans-serif bold italic</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Latin and Greek</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Plane 1</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

    <tr>                                                                           

      <td valign="top">                                                                           

        <p>monospace</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Latin and digits</p>                                                                           

      </td>                                                                           

      <td valign="top">                                                                           

        <p>Plane 1</p>                                                                           

      </td>                                                                           

    </tr>                                                                           

  </table>                                                                           

</div>                                                                           

<p align="center">* Some of these alphabets have characters in the BMP as noted                                                                            

in the text that follows.</p>                                                                           

<p>The plain letters have been unified with the existing characters in the Basic

Latin and Greek blocks. There are 25 double-struck, italic, Fraktur, and script

characters that already exist in the Letterlike Symbols block (U+2100..U+214F).

These are explicitly unified with the corresponding Letterlike Symbols characters,

and matching holes have been left in the mathematical alphabets.</p>
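<p>The offset arithmetic behind the mathematical alphabets, and the effect of the holes, can be illustrated with a short Python sketch. It covers only the bold, italic, and double-struck Latin letters. The starting code points (U+1D400, U+1D434, U+1D538) and the hole assignments are taken from the Unicode Character Database rather than from this text, and the names <code>STYLE_BASE</code>, <code>HOLES</code>, and <code>math_letter</code> are hypothetical. A complete mapping would also need the script and Fraktur holes.</p>

```python
import unicodedata

# Code point of the capital A for each style (a subset of the table above);
# the 26 small letters follow immediately after the 26 capitals.
STYLE_BASE = {
    "bold": 0x1D400,
    "italic": 0x1D434,
    "double-struck": 0x1D538,
}

# Letters already encoded in the Letterlike Symbols block. Their Plane 1
# positions are left unassigned, so the BMP character must be used instead.
HOLES = {
    ("italic", "h"): 0x210E,         # PLANCK CONSTANT
    ("double-struck", "C"): 0x2102,
    ("double-struck", "H"): 0x210D,
    ("double-struck", "N"): 0x2115,
    ("double-struck", "P"): 0x2119,
    ("double-struck", "Q"): 0x211A,
    ("double-struck", "R"): 0x211D,
    ("double-struck", "Z"): 0x2124,
}


def math_letter(letter: str, style: str) -> str:
    """Return the mathematical alphanumeric symbol for an ASCII letter.

    Raises KeyError for a style not in STYLE_BASE and ValueError for
    anything other than an ASCII letter.
    """
    if (style, letter) in HOLES:
        return chr(HOLES[(style, letter)])
    base = STYLE_BASE[style]
    if "A" <= letter <= "Z":
        return chr(base + ord(letter) - ord("A"))
    if "a" <= letter <= "z":
        return chr(base + 26 + ord(letter) - ord("a"))
    raise ValueError(f"not an ASCII letter: {letter!r}")


print(unicodedata.name(math_letter("h", "italic")))  # PLANCK CONSTANT
```

<p>A plain offset would map italic <i>h</i> to the reserved code point U+1D455; the hole table is what redirects it to the unified character U+210E.</p>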

<p>The alphabets in this block encode only a semantic distinction; they do not specify which

specific font will be used to supply the actual plain, script, Fraktur,

double-struck, sans-serif, or monospace glyphs. The script and double-struck

styles in particular can show considerable variation across fonts. Mathematical

Alphanumeric Symbols are not to be used for non-mathematical styled text.</p>

<p><i><b>Compatibility Decompositions.</b></i> All mathematical alphanumeric

symbols have compatibility decompositions to the base Latin letters, Greek letters,

or digits. However, folding away these distinctions is usually not desirable, as it loses

the semantic distinctions for which these characters were encoded. See <a

href="../tr15/">Unicode Standard Annex #15, Unicode Normalization Forms</a> for

more information.</p>
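<p>The folding described above can be observed with Python's standard <code>unicodedata</code> module. In this illustrative sketch, bold and italic capital H are distinct characters, yet NFKC maps both to U+0048, just as it maps MATHEMATICAL BOLD DIGIT ZERO to U+0030. The variable and helper names are illustrative only.</p>

```python
import unicodedata

bold_H = "\U0001D407"     # MATHEMATICAL BOLD CAPITAL H
italic_H = "\U0001D43B"   # MATHEMATICAL ITALIC CAPITAL H
bold_zero = "\U0001D7CE"  # MATHEMATICAL BOLD DIGIT ZERO

# The decomposition field holds a "font" formatting tag followed by the
# code point of the base character (here U+0048 LATIN CAPITAL LETTER H).
tag, base = unicodedata.decomposition(bold_H).split()
print(tag[1:-1], base)                  # font 0048


def fold(s: str) -> str:
    """Apply NFKC, which replaces compatibility characters by their bases."""
    return unicodedata.normalize("NFKC", s)


# As encoded, the two variables are different characters ...
print(bold_H == italic_H)               # False
# ... but after NFKC the distinction between them is gone.
print(fold(bold_H), fold(italic_H), fold(bold_zero))  # H H 0
print(fold(bold_H) == fold(italic_H))   # True
```

<p>This is why normalization Forms KC and KD are generally unsuitable for mathematical text: they erase exactly the distinction that keeps the Hamiltonian formula above from turning into an integral equation.</p>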

<h4>Fonts Used for Mathematical Alphabets</h4>                                                                             

<p>Mathematicians place strict requirements on the <i>specific</i> fonts being

used to represent mathematical variables. Readers of a mathematical text need to

be able to distinguish single-letter variables from each other, even when they

do not appear in close proximity. They must be able to recognize the letter

itself, whether it is part of the text or a mathematical variable, and, lastly,

which mathematical alphabet it is from.</p>

<p>Mathematical variables are most commonly set in a form of italics, but not

all italic fonts can be used successfully. In common text fonts, the <i>italic

letter v</i> and <i>Greek letter nu</i> are not very distinct. A rounded <i>italic

letter v</i> is therefore preferred in a mathematical font. Other characters

also sometimes have similar shapes and require special attention to

avoid ambiguity. Examples are shown in the table below.</p>

<p align="center"><img border="0" src="greek.gif" alt="Examples" width="369" height="217"></p>                                                                             

<p><b><i>Hard-to-distinguish Letters.</i></b> Not all sans-serif fonts allow an

easy distinction between <i>lowercase l</i> and <i>uppercase I</i>, and not all

monospaced (monowidth) fonts allow a distinction between the <i>letter l</i> and

the <i>digit one</i>. Such fonts are not usable for mathematics. In Fraktur, the

letters <span style="font-family:TmsBlackLttPF">I</span> and <span

style="font-family:TmsBlackLttPF">J</span> in particular must be made

distinguishable. Overburdened Black Letter forms are inappropriate. Similarly,

the <i>digit zero</i> must be distinct from the <i>uppercase letter O</i> in

all mathematical alphanumeric sets. Some characters, such as <i>uppercase A</i>

and <i>uppercase Alpha</i>, are so similar that even mathematical fonts do not

attempt to provide distinct glyphs for them. Their use is normally avoided in

mathematical notation unless no confusion is possible in a given context.</p>

<p><i><b>Font Support for Combining Diacritics.</b></i> Mathematical equations                                                                             

require that characters be combined with diacritics (dots, tilde, circumflex, or                                                                             

arrows above are common), as well as followed or preceded by super- or                                                                             

subscripted letters or numbers. This requirement leads to designs for <i>italic</i>                                                                             

styles that are less inclined, and <i>script</i> styles that have smaller                                                                             

overhangs and less slant than equivalent styles commonly used for text such as                                                                             

wedding invitations.</p>                                                                            
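<p>In plain text, such a diacritic is represented as a base character followed by a combining mark. The Python sketch below, an illustration rather than a recommended practice, pairs MATHEMATICAL ITALIC SMALL X with U+0307 COMBINING DOT ABOVE (as in a dotted time derivative) and shows that only a compatibility normalization alters the sequence: once the italic x is folded to a plain x, canonical composition produces the precomposed U+1E8B.</p>

```python
import unicodedata

# MATHEMATICAL ITALIC SMALL X followed by COMBINING DOT ABOVE.
x_dot = "\U0001D465\u0307"

# NFC keeps both code points: no precomposed "math italic x with dot" exists.
nfc = unicodedata.normalize("NFC", x_dot)

# NFKC first folds the italic x to plain "x"; canonical composition then
# merges x + U+0307 into U+1E8B LATIN SMALL LETTER X WITH DOT ABOVE.
nfkc = unicodedata.normalize("NFKC", x_dot)

print(len(nfc), unicodedata.name(nfc[0]))  # 2 MATHEMATICAL ITALIC SMALL X
print(len(nfkc), unicodedata.name(nfkc))   # 1 LATIN SMALL LETTER X WITH DOT ABOVE
```

<p>Besides losing the italic style, NFKC can therefore merge a mathematical base letter and its diacritic into an ordinary text character.</p>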

<p><i><b>Typestyle for Script Characters.</b></i> In some instances, a

deliberate unification with a non-mathematical symbol has been undertaken; for

example, U+2133 SCRIPT CAPITAL M is unified with the pre-1949 symbol for the German

currency unit <i>Mark</i>, and U+2113 SCRIPT SMALL L is unified with the common non-SI

symbol for the liter. This unification restricts the range of glyphs that can be used

for these characters in the charts. Therefore the font used for the reference glyphs in the

code charts uses a simplified ‘English Script’ style, as recommended

by the American Mathematical Society. For consistency, other script characters

in the Letterlike Symbols block are now shown in the same typestyle.</p>

<p><i><b>Double-struck Characters.</b></i> The double-struck glyphs shown in                                                                             

earlier editions of the standard attempted to match the design used for all the                                                                             

other Latin characters in the standard, which is based on Times. The current set                                                                             

of fonts was prepared in consultation with the American Mathematical Society and                                                                             

leading mathematical publishers, and shows much simpler forms that are derived                                                                             

from the forms written on a blackboard. However, both serifed and non-serifed                                                                             

forms can be used in mathematical texts, and inline fonts are found in works                                                                             

published by certain publishers.</p>                                                                            

<h3>12.10 Byzantine Musical Symbols (new section)</h3>                                                                            

<h4>Byzantine Musical Symbols: U+1D000-U+1D0FF</h4>                                                                            

<p>Byzantine musical notation first appeared in the seventh or eighth century                                                                             

CE, developing more fully by the tenth century. Byzantine Musical Symbols are                                                                             

chiefly used to write the religious music and hymns of the Christian

Orthodox Church, though folk music manuscripts are also known. In 1881, the                                                                             

Orthodox Patriarchy Musical Committee redefined some of the signs and                                                                             

established the New Analytical Byzantine Musical Notation System, which is in                                                                             

use today. About 95% of the more than 7000 musical manuscripts using this system                                                                             

are in Greek. Other manuscripts are in Russian, Bulgarian, Romanian, and Arabic.</p>                                                                            

<p><b><i>Processing.</i></b> Computer representation of Byzantine Musical                                                                             

Symbols is quite recent, although typographic publication of religious music                                                                             

books began in 1820. Two kinds of applications have been developed: applications

that enable musicians to write the books they use, and applications that compare

this musical notation system with, or convert it to, the standard Western system. (See <i>Musical

Symbols</i>, U+1D100..U+1D1FF.)</p>

<p>Byzantine Musical Symbols are divided into fifteen classes according to

function. Characters interact with one another in both the horizontal and vertical

dimensions. The various classes generally appear in three horizontal &quot;stripes,&quot;

and rules govern how other characters interact within them. These rules are still

being specified, and at present the plain-text manipulation of Byzantine musical

symbols, like that of Western musical symbols, is outside the scope of the Unicode Standard.</p>

<h3>12.11 Musical Symbols (new section)</h3>                                                                            

<h4>Musical Symbols: U+1D100-U+1D1FF</h4>                                                                            

<p>The Musical Symbols encoded in the Unicode Standard are intended to cover basic

Western musical notation and its antecedents: mensural notation and plainsong

(or Gregorian) notation. The most comprehensive coded language in regular use

for representing sound is the common musical notation (CMN) of the Western

world. Western musical notation is a system of symbols that is relatively, but

not completely, self-consistent, and relatively stable but, like music

itself, still evolving. It is an open-ended system that has survived over time partly

because of its flexibility and extensibility. In the Unicode Standard, Musical

Symbols have been drawn primarily from CMN. Commonly recognized additions to the

CMN repertoire, such as quarter-tone accidentals, cluster noteheads, and

shape-note noteheads, have also been included.</p>

<p>Graphical score elements are not included in the Musical Symbols

block. These are pictographs usually created for a specific repertoire

(sometimes even a single piece). Characters that have some specialized meaning

in music but are found in other character sets are also not included. These

include, for example, numbers for time signatures and figured basses, and letters

for section labels and Roman numeral harmonic analysis.</p>

<p>Musical Symbols are used worldwide in a more-or-less standard manner by a                                                                              

very large group of users. The symbols frequently occur in running text and may                                                                              

be treated as simple spacing characters with no special properties, with a few                                                                              

exceptions. Musical symbols are used in contexts such as theoretical works,                                                                              

pedagogical texts, terminological dictionaries, bibliographic databases,                                                                              

thematic catalogues, and databases of musical data. The Musical Symbol                                                                              

characters are also intended to be used within higher-level protocols, such as                                                                              

music description languages and file formats for the representation of musical                                                                              

data and musical scores.</p>                                                                             

<p>Because of the complexities of layout and of pitch representation in general,                                                                              

the encoding of musical pitch is intentionally outside the scope of the Unicode                                                                              

Standard. The Musical Symbol block provides a common set of elements for                                                                              

interchange and processing. Encoding pitch and laying out the resulting musical
structure involve specifying not only the vertical relationship between
simultaneous notes, but also relationships across multiple staves, between
instrumental parts, and so forth. These musical features are expected to be
handled entirely in higher-level protocols that make use of the graphical
elements encoded here. The lack of pitch encoding is therefore not a
shortcoming but a necessary feature of the encoding.</p>

<p>Three characters that occur frequently in music notation, U+266D MUSIC FLAT
SIGN, U+266E MUSIC NATURAL SIGN, and U+266F MUSIC SHARP SIGN, are encoded in
the Miscellaneous Symbols block (U+2600..U+267F). However, four other characters
encoded in that block are to be interpreted merely as dingbats or miscellaneous
symbols, not as representing actual musical notes. These are:</p>

<ul>                                                                             

  <li>U+2669 QUARTER NOTE</li>                                                                             

  <li>U+266A EIGHTH NOTE</li>                                                                             

  <li>U+266B BEAMED EIGHTH NOTES</li>                                                                             

  <li>U+266C BEAMED SIXTEENTH NOTES</li>                                                                             

</ul>                                                                             
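<p>The separation between these dingbats and the Musical Symbols block is visible in the Unicode Character Database: the dingbats carry no decomposition, so no normalization form relates them to the musical note characters. The following minimal Python sketch checks this with the standard <code>unicodedata</code> module; the helper name <code>equivalent_under_any_form</code> is illustrative only.</p>

```python
import unicodedata

# Dingbat note characters from the Miscellaneous Symbols block.
DINGBAT_NOTES = ["\u2669", "\u266A", "\u266B", "\u266C"]

# U+1D15F MUSICAL SYMBOL QUARTER NOTE, from the Musical Symbols block.
QUARTER_NOTE = "\U0001D15F"

def equivalent_under_any_form(a: str, b: str) -> bool:
    """Return True if a and b normalize identically under NFC, NFD, NFKC, or NFKD."""
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    )

for ch in DINGBAT_NOTES:
    # The dingbats have an empty decomposition field, so every check is False.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"decomposition={unicodedata.decomposition(ch)!r}, "
          f"equivalent to MUSICAL SYMBOL QUARTER NOTE: "
          f"{equivalent_under_any_form(ch, QUARTER_NOTE)}")
```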

<p>The <i>punctum</i>, or Gregorian <i>brevis</i>, a square shape, is unified
with U+1D147 MUSICAL SYMBOL SQUARE NOTEHEAD BLACK. The Gregorian
<i>semi-brevis</i>, a diamond or lozenge shape, is unified with U+1D1BA MUSICAL
SYMBOL SEMIBREVIS BLACK. Thus, in practice, Gregorian notation, medieval
notation, and modern notation require either separate fonts or font features
that differentiate the subtly different shapes where required.</p>

<p><b><i>Processing.</i></b> Most musical symbols can be thought of as simple                                                                              

spacing characters when used in-line within texts and examples, even though they                                                                              

behave in a more complex manner in full musical layout. Some characters are                                                                              

meant only to be combined with others to produce combining character sequences
representing musical notes and their particular articulations. Musical symbols
can be input, processed, and displayed in a manner similar to mathematical
symbols. A few characters have format control functions; these are described
below.</p>
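<p>The three kinds of behavior mentioned above (plain spacing symbols, combining marks used to build notes, and format controls) correspond to General_Category values in the Unicode Character Database. A minimal Python sketch, assuming only the standard <code>unicodedata</code> module; the role labels returned by the hypothetical <code>symbol_role</code> helper are descriptive names chosen for this example, not Unicode property values.</p>

```python
import unicodedata

def symbol_role(ch: str) -> str:
    """Map a character's General_Category to a rough processing role."""
    cat = unicodedata.category(ch)
    if cat == "Cf":                      # format characters, e.g. BEGIN BEAM
        return "format control"
    if cat in ("Mn", "Mc", "Me"):        # nonspacing, spacing, enclosing marks
        return "combining mark"
    return "spacing character"           # e.g. So for clefs and precomposed notes

samples = {
    "\U0001D11E": "G CLEF",
    "\U0001D15F": "QUARTER NOTE",
    "\U0001D165": "COMBINING STEM",
    "\U0001D17C": "COMBINING STACCATO",
    "\U0001D173": "BEGIN BEAM",
}
for ch, label in samples.items():
    print(f"U+{ord(ch):05X} {label:<20} {unicodedata.category(ch)}  {symbol_role(ch)}")
```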

<p><b><i>Input Methods.</i></b> Musical symbols can be entered via a standard
alphanumeric keyboard, a piano keyboard or other device, or a graphical method.

Keyboard input of the musical symbols may make use of techniques similar to                                                                              

those used for Chinese, Japanese, and Korean. In addition, input methods                                                                              

utilizing pointing devices or piano keyboards could be developed similar to                                                                              

those in existing musical layout systems. For example, within a graphical user                                                                              

interface, the user could choose symbols from a palette-style menu.</p>                                                                             
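<p>As one illustration of keyboard input, a conversion step could map short typed mnemonics to the formal character names and resolve those names with <code>unicodedata.lookup</code>; the same table could populate a palette menu. This is a sketch only: the <code>MNEMONICS</code> table and the <code>convert</code> function are hypothetical, not part of any standard input method.</p>

```python
import unicodedata

# Hypothetical mnemonic table: short codes a user types, mapped to the
# formal Unicode character names they stand for.
MNEMONICS = {
    "gclef": "MUSICAL SYMBOL G CLEF",
    "q": "MUSICAL SYMBOL QUARTER NOTE",
    "h": "MUSICAL SYMBOL HALF NOTE",
    "stac": "MUSICAL SYMBOL COMBINING STACCATO",
}

def convert(codes: str) -> str:
    """Convert space-separated mnemonics into the corresponding characters.

    Raises KeyError for an unknown mnemonic (or an unknown character name).
    """
    return "".join(unicodedata.lookup(MNEMONICS[code]) for code in codes.split())

text = convert("gclef q stac h")
print([f"U+{ord(c):05X}" for c in text])  # ['U+1D11E', 'U+1D15F', 'U+1D17C', 'U+1D15E']
```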

<p><i><b>Directionality.</b></i> There are no known bidirectional implications                                                                              

for Musical Symbols. When combined with right-to-left texts, in Hebrew or Arabic                                                                              

for example, the music notation is still written left-to-right as usual. The                                                                              

words are divided into syllables and placed under or above the notes in the same                                                                              

fashion as for Latin scripts. The individual words or syllables corresponding to                                                                              

each note, however, are written in the dominant direction of the script.</p>                                                                             
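<p>At the character level, the left-to-right behavior follows from the Bidi_Class property: the spacing musical symbols are strong left-to-right (L), so the bidirectional algorithm keeps them in left-to-right order even when they sit next to Hebrew or Arabic letters. A minimal Python sketch that reports the classes; the <code>bidi_report</code> helper is illustrative, and placing syllables above or below notes remains a job for higher-level layout.</p>

```python
import unicodedata

def bidi_report(text: str) -> list:
    """Pair each code point (as U+XXXX) with its Bidi_Class property value."""
    return [(f"U+{ord(c):04X}", unicodedata.bidirectional(c)) for c in text]

# Hebrew alef + bet, a space, then QUARTER NOTE + COMBINING STACCATO and HALF NOTE.
sample = "\u05D0\u05D1 \U0001D15F\U0001D17C\U0001D15E"
for code_point, bidi_class in bidi_report(sample):
    print(code_point, bidi_class)
```

<p>The Hebrew letters report R, the notes report L, and the staccato, a nonspacing mark, reports NSM and so takes the direction of the note it attaches to.</p>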

<p><i><b>Format Characters.</b></i> Extensive ligature-like beams are used                                                                              

frequently in music notation between groups of notes having short values. The                                                                              

practice is widespread and very regular, and is amenable to algorithmic                                                                              

handling. The format characters U+1D173 MUSICAL SYMBOL BEGIN BEAM and U+1D174                                                                              

MUSICAL SYMBOL END BEAM can be used to indicate the extents of beam groupings. In some exceptional cases, beams are left unclosed on

one end. This can be indicated with a U+1D159 MUSICAL SYMBOL NULL NOTEHEAD                                                                              

character if no stem is to appear at the end of the beam.</p>                                                                             
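<p>Because beam groups are delimited explicitly, finding their extents is a simple scan. The Python sketch below assumes that beam groups do not nest and treats an unmatched marker as an error; the function name <code>beam_groups</code> and the choice of EIGHTH NOTE characters inside the groups are illustrative assumptions, not requirements of this annex.</p>

```python
BEGIN_BEAM = "\U0001D173"  # MUSICAL SYMBOL BEGIN BEAM
END_BEAM = "\U0001D174"    # MUSICAL SYMBOL END BEAM

def beam_groups(text: str) -> list:
    """Return the substrings enclosed by BEGIN BEAM ... END BEAM pairs.

    Assumes beam groups are not nested; raises ValueError on an unmatched marker.
    """
    groups = []
    start = None
    for i, ch in enumerate(text):
        if ch == BEGIN_BEAM:
            if start is not None:
                raise ValueError(f"nested BEGIN BEAM at index {i}")
            start = i + 1
        elif ch == END_BEAM:
            if start is None:
                raise ValueError(f"END BEAM without BEGIN BEAM at index {i}")
            groups.append(text[start:i])
            start = None
    if start is not None:
        raise ValueError("BEGIN BEAM without matching END BEAM")
    return groups

EIGHTH = "\U0001D160"   # EIGHTH NOTE
QUARTER = "\U0001D15F"  # QUARTER NOTE
line = BEGIN_BEAM + EIGHTH * 2 + END_BEAM + QUARTER + BEGIN_BEAM + EIGHTH * 4 + END_BEAM
print([len(g) for g in beam_groups(line)])  # [2, 4]
```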

<p>Similarly, format characters have been provided for other connecting                                                                              

structures. The characters U+1D175 MUSICAL SYMBOL BEGIN TIE, U+1D176 MUSICAL                                                                              

SYMBOL END TIE, U+1D177 MUSICAL SYMBOL BEGIN SLUR, U+1D178 MUSICAL SYMBOL END                                                                              

SLUR, U+1D179 MUSICAL SYMBOL BEGIN PHRASE, and U+1D17A MUSICAL SYMBOL END PHRASE                                                                              

indicate the extent of these features. Like beaming, these features are easily                                                                              

handled in an algorithmic fashion.</p>                                                                             
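<p>A simple well-formedness check for these connecting structures can count each feature independently. The sketch below assumes that a slur, tie, phrase, or beam may overlap a different feature (for example, a slur ending inside a beam group), so it only checks that each BEGIN/END pair balances on its own; the <code>PAIRS</code> table and <code>unbalanced_features</code> function are hypothetical names.</p>

```python
# BEGIN/END format characters from the Musical Symbols block, keyed by feature.
PAIRS = {
    "beam": ("\U0001D173", "\U0001D174"),
    "tie": ("\U0001D175", "\U0001D176"),
    "slur": ("\U0001D177", "\U0001D178"),
    "phrase": ("\U0001D179", "\U0001D17A"),
}

def unbalanced_features(text: str) -> list:
    """Return the names of features whose BEGIN/END markers do not balance.

    Each feature is counted independently, so different features may overlap.
    """
    problems = []
    for name, (begin, end) in PAIRS.items():
        depth = 0
        for ch in text:
            if ch == begin:
                depth += 1
            elif ch == end:
                depth -= 1
                if depth < 0:  # an END with no open BEGIN
                    break
        if depth != 0:
            problems.append(name)
    return problems

note = "\U0001D160"  # EIGHTH NOTE
begin_slur, end_slur = PAIRS["slur"]
begin_beam, end_beam = PAIRS["beam"]
# The beam closes before the slur does: overlapping, but each feature balances.
print(unbalanced_features(begin_beam + note + begin_slur + note + end_beam + note + end_slur))  # []
```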

<p>The paired BEGIN and END format characters modify the layout and grouping of notes and phrases

in full music notation. When musical examples are written or rendered in plain                                                                              

text without special software, the start/end format characters may be rendered                                                                              

as brackets or left uninterpreted. To the extent possible, more

sophisticated in-line software may interpret them in their actual format control                                                                              

capacity, rendering slurs, beams, and so forth as appropriate.</p>                                                                             
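<p>For the plain-text case, a fallback renderer can simply substitute visible brackets for the format characters. The following Python sketch is one possible convention, assumed for illustration: each BEGIN character becomes an opening bracket labeled with its feature and each END character becomes a closing bracket. The <code>FALLBACK</code> table and <code>render_plain</code> function are hypothetical.</p>

```python
# Hypothetical plain-text fallback for the musical BEGIN/END format characters.
FALLBACK = {
    "\U0001D173": "[beam ", "\U0001D174": "]",
    "\U0001D175": "[tie ", "\U0001D176": "]",
    "\U0001D177": "[slur ", "\U0001D178": "]",
    "\U0001D179": "[phrase ", "\U0001D17A": "]",
}

# str.translate expects a mapping from code points (ints) to replacement strings.
_TABLE = {ord(k): v for k, v in FALLBACK.items()}

def render_plain(text: str) -> str:
    """Replace musical format characters with bracket placeholders."""
    return text.translate(_TABLE)

quarter = "\U0001D15F"
print(render_plain("\U0001D177" + quarter * 2 + "\U0001D178"))
```

<p>Characters without an entry in the table, including the notes themselves, pass through unchanged.</p>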

<p><b><i>Precomposed Note Characters.</i></b> For maximum flexibility, the                                                                              

character set includes both precomposed note values and primitives from which                                                                              

complete notes may be constructed. The precomposed versions are provided mainly                                                                              

for convenience. However, if any normalization form is applied, the characters
will be decomposed, because they are excluded from canonical composition. For
further information, see <a

href="http://www.unicode.org/unicode/reports/tr15/">Unicode Standard Annex #15,                                                                              

Unicode Normalization Forms</a>. The canonical equivalents for these characters                                                                              

are given in the Unicode Character Database, and illustrated in the table below.                                                                              

In this table and subsequent examples, the names of the Unicode Musical Symbol                                                                              

characters are abbreviated by omitting the phrases MUSICAL SYMBOL or MUSICAL                                                                              

SYMBOL ORNAMENT.</p>                                                                             

<table border="0" cellspacing="2" cellpadding="2">                                                                             

  <tr>                                                                             

    <td valign="top">&nbsp;                                                                             

    <td valign="top"><b>Precomposed note</b>                                                                             

    <td valign="top"><b>Equivalent to</b>                                                                             

  <tr>                                                                             

    <td valign="top"><img src="half-note.gif" alt="half note" width="85"                                                                             

      height="44">                                                                             

    <td valign="top">1D15E HALF NOTE                                                                             

    <td valign="top">1D157 VOID NOTEHEAD + 1D165 COMBINING STEM                                                                             

  <tr>                                                                             

    <td valign="top"><img src="quarter-note.gif" alt="quarter note" width="85"                                                                             

      height="44">                                                                             

    <td valign="top">1D15F QUARTER NOTE                                                                             

    <td valign="top">1D158 NOTEHEAD BLACK + 1D165 COMBINING STEM                                                                             

  <tr>                                                                             

    <td valign="top"><img src="eighth-note.gif" alt="eighth note" width="136"                                                                             

      height="44">                                                                             

    <td valign="top">1D160 EIGHTH NOTE                                                                             

    <td valign="top">1D158 NOTEHEAD BLACK + 1D165 COMBINING STEM + 1D16E                                                                              

      COMBINING FLAG-1                                                                             

  <tr>                                                                             

    <td valign="top"><img src="sixteenth-note.gif" alt="sixteenth note"                                                                             

      width="136" height="44">                                                                             

    <td valign="top">1D161 SIXTEENTH NOTE                                                                             

    <td valign="top">1D158 NOTEHEAD BLACK + 1D165 COMBINING STEM + 1D16F                                                                              

      COMBINING FLAG-2                                                                             

  <tr>                                                                             

    <td valign="top"><img src="thirty-second-note.gif" alt="thirty-second note"                                                                             

      width="136" height="44">                                                                             

    <td valign="top">1D162 THIRTY-SECOND NOTE                                                                             

    <td valign="top">1D158 NOTEHEAD BLACK + 1D165 COMBINING STEM + 1D170                                                                              

      COMBINING FLAG-3                                                                             

  <tr>                                                                             

    <td valign="top"><img src="sixty-fourth-note.gif" alt="sixty-fourth note"                                                                             

      width="136" height="44">                                                                             

    <td valign="top">1D163 SIXTY-FOURTH NOTE                                                                             

    <td valign="top">1D158 NOTEHEAD BLACK + 1D165 COMBINING STEM + 1D171                                                                              

      COMBINING FLAG-4                                                                             

  <tr>                                                                             

    <td valign="top"><img src="one-twenty-eighth-note.gif"                                                                             

      alt="one hundred twenty-eighth note" width="136" height="44">                                                                             

    <td valign="top">1D164 ONE HUNDRED TWENTY-EIGHTH NOTE                                                                             

    <td valign="top">1D158 NOTEHEAD BLACK + 1D165 COMBINING STEM + 1D172                                                                              

      COMBINING FLAG-5                                                                             

</table>                                                                             
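<p>The decompositions in this table can be reproduced with Python's standard <code>unicodedata</code> module. The sketch below normalizes each precomposed note from U+1D15E to U+1D164 and confirms that NFC yields the same decomposed sequence as NFD, since these characters are not recomposed.</p>

```python
import unicodedata

# U+1D15E HALF NOTE .. U+1D164 ONE HUNDRED TWENTY-EIGHTH NOTE
for cp in range(0x1D15E, 0x1D165):
    note = chr(cp)
    nfd = unicodedata.normalize("NFD", note)
    nfc = unicodedata.normalize("NFC", note)
    parts = " + ".join(f"{ord(c):X}" for c in nfd)
    # NFC gives the same decomposed sequence: these notes are composition exclusions.
    print(f"{cp:X} -> {parts}  (NFC == NFD: {nfc == nfd})")
```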

<p><b><i>Alternative Noteheads.</i></b> More complex notes built up from                                                                              

alternative noteheads, stems, flags, and articulation symbols are necessary for                                                                              

complete implementations and complex scores. Examples of their use include                                                                              

American shape-note and modern percussion notations. For example:</p>                                                                             

<table border="0" cellspacing="2" cellpadding="1">                                                                             

  <tr>                                                                             

    <td valign="top"><img src="square-notehead.gif" alt="square notehead"                                                                             

      width="85" height="44">                                                                             

    <td valign="top">1D147 SQUARE NOTEHEAD BLACK + 1D165 COMBINING STEM                                                                             

  <tr>                                                                             

    <td valign="top"><img src="x-notehead.gif" alt="x notehead" width="85" height="44">                                                                             

    <td valign="top">1D143 X NOTEHEAD + 1D165 COMBINING STEM                                                                             

</table>                                                                             
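<p>Built-up notes follow a regular pattern: a notehead, a combining stem, and optionally one flag character, where COMBINING FLAG-<i>n</i> stands for <i>n</i> flags (U+1D16E through U+1D172). A minimal Python sketch; the <code>build_note</code> helper is hypothetical and always adds a stem, so it does not cover stemless notes.</p>

```python
import unicodedata

NOTEHEAD_BLACK = "\U0001D158"
SQUARE_NOTEHEAD_BLACK = "\U0001D147"
X_NOTEHEAD = "\U0001D143"
COMBINING_STEM = "\U0001D165"
FLAG_1 = 0x1D16E  # COMBINING FLAG-1; FLAG-2 .. FLAG-5 follow consecutively

def build_note(notehead: str, flags: int = 0) -> str:
    """Hypothetical helper: notehead + stem + an optional FLAG-n character (n = 1..5)."""
    if not 0 <= flags <= 5:
        raise ValueError("flags must be between 0 and 5")
    seq = notehead + COMBINING_STEM
    if flags:
        seq += chr(FLAG_1 + flags - 1)
    return seq

# Rows of the table above, built from primitives.
square = build_note(SQUARE_NOTEHEAD_BLACK)
x_note = build_note(X_NOTEHEAD)
# A black notehead with stem and one flag is canonically equivalent to U+1D160 EIGHTH NOTE.
print(build_note(NOTEHEAD_BLACK, flags=1) == unicodedata.normalize("NFD", "\U0001D160"))  # True
```

<p>Since there are no precomposed characters for the alternative noteheads, NFC leaves sequences such as <code>square</code> unchanged.</p>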

<p><i><b>Augmentation Dots and Articulation Symbols.</b></i> Augmentation dots                                                                              

and articulation symbols may be appended to either the precomposed or built-up                                                                              

notes. In addition, augmentation dots and articulation symbols may be repeated                                                                              

as necessary to build a complete note symbol. Examples of the use of
augmentation dots and articulation symbols are shown in the table below.</p>

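<table border="0" cellspacing="2" cellpadding="1">
  <tr>
    <td valign="top">&nbsp;
    <td valign="top"><b>Sequence</b>
    <td valign="top"><b>Equivalent to</b>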

  <tr>                                                                             

    <td valign="top"><img src="eighth-note-aug.gif" alt="dotted eighth note"

      width="176" height="44">                                                                             

    <td valign="top">1D160 EIGHTH NOTE + 1D16D COMBINING AUGMENTATION DOT                                                                             

    <td valign="top">1D158 NOTEHEAD BLACK + 1D165 COMBINING STEM + 1D16E                                                                              

      COMBINING FLAG-1 + 1D16D COMBINING AUGMENTATION DOT                                                                             

  <tr>                                                                             

    <td valign="top"><img border="0" src="quarter-note-stacatto.gif" alt="quarter note with staccato"

      width="129" height="44">                                                                             

    <td valign="top">1D15F QUARTER NOTE + 1D17C COMBINING STACCATO                                                                             

    <td valign="top">1D158 NOTEHEAD BLACK + 1D165 COMBINING STEM + 1D17C                                                                              

      COMBINING STACCATO                                                                             

  <tr>                                                                             

    <td valign="top"><img border="0" src="eighth-note-acc-aug-aug.gif" alt="double-dotted eighth note with accent"

      width="263" height="44">                                                                             

    <td valign="top">1D160 EIGHTH NOTE + 1D16D COMBINING AUGMENTATION DOT +                                                                              

      1D16D COMBINING AUGMENTATION DOT + 1D17B COMBINING ACCENT                                                                             

    <td valign="top">1D158 NOTEHEAD BLACK + 1D165 COMBINING STEM + 1D16E                                                                              

      COMBINING FLAG-1 + 1D17B COMBINING ACCENT + 1D16D COMBINING AUGMENTATION                                                                              

      DOT + 1D16D COMBINING AUGMENTATION DOT                                                                             

</table>                                                                             
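<p>The order of the marks in the last column of this table results from canonical reordering: after decomposition, combining marks are sorted by their Canonical_Combining_Class values. The stem and flags have class 216, COMBINING ACCENT has class 220, and COMBINING AUGMENTATION DOT has class 226, so the accent moves ahead of the dots. A short Python check using the standard <code>unicodedata</code> module:</p>

```python
import unicodedata

# EIGHTH NOTE + AUGMENTATION DOT + AUGMENTATION DOT + ACCENT, in the order typed.
typed = "\U0001D160\U0001D16D\U0001D16D\U0001D17B"
nfd = unicodedata.normalize("NFD", typed)
for c in nfd:
    print(f"{ord(c):X} ccc={unicodedata.combining(c):<3} {unicodedata.name(c)}")

# Typing the accent before the dots gives a canonically equivalent sequence.
print(unicodedata.normalize("NFD", "\U0001D160\U0001D17B\U0001D16D\U0001D16D") == nfd)  # True
```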

<p><b><i>Ornamentation Chart.</i></b> The table below lists common
eighteenth-century ornaments and the character sequences from which they can be
generated.</p>

<table border="0" cellspacing="2" cellpadding="1">                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-2-3.gif" alt="ornament" width="59"                                                                             

      height="20">                                                                             

    <td valign="top">1D19C STROKE-2 + 1D19D STROKE-3                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-2-6-3.gif" alt="ornament" width="59" height="20">                                                                             

    <td valign="top">1D19C STROKE-2 + 1D1A0 STROKE-6 + 1D19D STROKE-3                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-6-2-2-3.gif" alt="ornament" width="59" height="20">                                                                             

    <td valign="top">1D1A0 STROKE-6 + 1D19C STROKE-2 + 1D19C STROKE-2 + 1D19D                                                                              

      STROKE-3                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-2-2-6-3.gif" alt="ornament" width="59" height="20">                                                                             

    <td valign="top">1D19C STROKE-2 + 1D19C STROKE-2 + 1D1A0 STROKE-6 + 1D19D                                                                              

      STROKE-3                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-2-2-9.gif" alt="ornament" width="59"                                                                             

      height="20">                                                                             

    <td valign="top">1D19C STROKE-2 + 1D19C STROKE-2 + 1D1A3 STROKE-9                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-7-2-2-3.gif" alt="ornament" width="59"                                                                             

      height="20">                                                                             

    <td valign="top">1D1A1 STROKE-7 + 1D19C STROKE-2 + 1D19C STROKE-2 + 1D19D                                                                              

      STROKE-3                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-8-2-2-3.gif" alt="ornament" width="59"                                                                             

      height="20">                                                                             

    <td valign="top">1D1A2 STROKE-8 + 1D19C STROKE-2 + 1D19C STROKE-2 + 1D19D                                                                              

      STROKE-3                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-2-2-3-5.gif" alt="ornament" width="59"                                                                             

      height="20">                                                                             

    <td valign="top">1D19C STROKE-2 + 1D19C STROKE-2 + 1D19D STROKE-3 + 1D19F                                                                              

      STROKE-5                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-7-2-2-6-3.gif" alt="ornament" width="59" height="20">                                                                             

    <td valign="top">1D1A1 STROKE-7 + 1D19C STROKE-2 + 1D19C STROKE-2 + 1D1A0                                                                              

      STROKE-6 + 1D19D STROKE-3                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-7-2-2-3-5.gif" alt="ornament" width="59"                                                                             

      height="20">                                                                             

    <td valign="top">1D1A1 STROKE-7 + 1D19C STROKE-2 + 1D19C STROKE-2 + 1D19D                                                                              

      STROKE-3 + 1D19F STROKE-5                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-8-2-2-6-3.gif" alt="ornament" width="59" height="20">                                                                             

    <td valign="top">1D1A2 STROKE-8 + 1D19C STROKE-2 + 1D19C STROKE-2 + 1D1A0                                                                              

      STROKE-6 + 1D19D STROKE-3                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-1-2-2-3.gif" alt="ornament" width="59"                                                                             

      height="20">                                                                             

    <td valign="top">1D19B STROKE-1 + 1D19C STROKE-2 + 1D19C STROKE-2 + 1D19D                                                                              

      STROKE-3                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-1-2-2-3-4.gif" alt="ornament" width="59"                                                                             

      height="20">                                                                             

    <td valign="top">1D19B STROKE-1 + 1D19C STROKE-2 + 1D19C STROKE-2 + 1D19D                                                                              

      STROKE-3 + 1D19E STROKE-4                                                                             

  <tr>                                                                             

    <td valign="top"><img src="orn-2-3-4.gif" alt="ornament" width="59"                                                                             

      height="20">                                                                             

    <td valign="top">1D19C STROKE-2 + 1D19D STROKE-3 + 1D19E STROKE-4                                                                             

</table>                                                                             
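<p>The ornament strokes are consecutive code points, with STROKE-<i>n</i> at U+1D19B + (<i>n</i> &minus; 1), so the sequences in this chart can be generated from stroke numbers alone. A minimal Python sketch; the <code>ornament</code> helper is hypothetical, and it assumes strokes STROKE-1 through STROKE-11 (U+1D19B..U+1D1A5).</p>

```python
import unicodedata

STROKE_1 = 0x1D19B  # MUSICAL SYMBOL ORNAMENT STROKE-1; STROKE-2..STROKE-11 follow

def ornament(*strokes: int) -> str:
    """Hypothetical helper: build an ornament from stroke numbers, e.g. ornament(2, 3)."""
    for n in strokes:
        if not 1 <= n <= 11:
            raise ValueError(f"no ORNAMENT STROKE-{n}")
    return "".join(chr(STROKE_1 + n - 1) for n in strokes)

seq = ornament(7, 2, 2, 6, 3)
# Print in the chart's notation, with the MUSICAL SYMBOL ORNAMENT prefix omitted.
print(" + ".join(
    f"{ord(c):X} {unicodedata.name(c).replace('MUSICAL SYMBOL ORNAMENT ', '')}"
    for c in seq
))
# 1D1A1 STROKE-7 + 1D19C STROKE-2 + 1D19C STROKE-2 + 1D1A0 STROKE-6 + 1D19D STROKE-3
```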

<h3><a name="layout">13.2 Layout Controls</a> (revision)</h3>

<h4>Controlling Ligatures</h4>                                                                             

<p>In some orthographies the same letters may either ligate or not, depending on
the intended reading. To account for this, the semantics of U+200C ZERO WIDTH
NON-JOINER (ZWNJ) and U+200D ZERO WIDTH JOINER (ZWJ) have been extended.</p>

<p><i>Section 13.2, Controlling Ligatures,</i> page 318: the text is
superseded by the following.</p>

<blockquote>                                                                             

  <p>To allow for finer control over ligature formation, starting with Unicode                                                                              

  3.0.1 the definitions of the following characters have been broadened to cover                                                                              

  ligatures as well as cursive connection:</p>                                                                             

  <p><img align="middle" alt="X" src="U200C.gif" width="39"

  height="64"> U+200C ZERO WIDTH NON-JOINER</p>                                                                             

  <ul>                                                                             

    <li>The intended semantic is to break both cursive connections and ligatures                                                                              

      in rendering.</li>                                                                             

  </ul>                                                                             

  <p><img align="middle" alt="X" src="U200D.gif" width="39"

  height="64"> U+200D ZERO WIDTH JOINER</p>                                                                             

  <ul>                                                                             

    <li>The intended semantic is to produce a more connected rendering of                                                                              

      adjacent characters than would otherwise be the case, <i>if possible.</i>                                                                              

      In particular:<br>                                                                             

      <ol>                                                                             

        <li>If the two characters could form a ligature, but do not normally,                                                                              

          ZWJ requests that the ligature be used.</li>                                                                             

        <li>Otherwise, if either of the characters could cursively connect, but                                                                              

          do not normally, ZWJ requests that each of the characters take a                                                                              

          cursive-connection form where possible.                                                                             

          <ul>                                                                             

            <li>In a sequence like &lt;X, ZWJ, Y&gt;, where a cursive form                                                                              

              exists for X, but not for Y, the presence of ZWJ requests a                                                                              

              cursive form for X.</li>                                                                             

          </ul>                                                                             

        </li>                                                                             

        <li>Otherwise, where neither a ligature nor a cursive connection is
          available, ZWJ has no effect.</li>

      </ol>                                                                             

    </li>                                                                             

  </ul>                                                                             

  <p>In other words, given the three broad categories below, ZWJ requests that
  glyphs in the highest available category (for the given font) be used, while
  ZWNJ requests that glyphs in the lowest available category (for the given
  font) be used:</p>

  <ol>                                                                             

    <li>unconnected</li>                                                                             

    <li>cursively connected</li>                                                                             

    <li>ligated</li>                                                                             

  </ol>                                                                             
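  <p>The selection rule above can be sketched in C as follows. This is a minimal
  illustration under stated assumptions, not a reference implementation:
  <code>PairCapabilities</code>, <code>JoinControl</code>, and
  <code>choose_category</code> are hypothetical names, and a real shaping engine
  would derive the capabilities from the font's own tables rather than from a
  hand-filled record.</p>

```c
#include <stdbool.h>

/* The three broad categories, ordered from least to most connected. */
typedef enum { UNCONNECTED = 0, CURSIVE = 1, LIGATED = 2 } GlyphCategory;

/* Hypothetical summary of what the current font offers for a pair X, Y. */
typedef struct {
    bool has_ligature;     /* the font has a ligature glyph for X + Y */
    bool has_cursive;      /* X and/or Y has a cursive-connection form */
    GlyphCategory normal;  /* category the font uses when no control is present */
} PairCapabilities;

/* Hypothetical: which join control, if any, sits between X and Y. */
typedef enum { NO_CONTROL, ZWJ_BETWEEN, ZWNJ_BETWEEN } JoinControl;

/* ZWJ asks for the highest available category, ZWNJ for the lowest. */
GlyphCategory choose_category(PairCapabilities caps, JoinControl ctl)
{
    switch (ctl) {
    case ZWJ_BETWEEN:
        if (caps.has_ligature) return LIGATED;  /* rule 1: request the ligature */
        if (caps.has_cursive)  return CURSIVE;  /* rule 2: request cursive forms */
        return caps.normal;                     /* rule 3: ZWJ has no effect */
    case ZWNJ_BETWEEN:
        return UNCONNECTED;  /* break both ligatures and cursive connections */
    case NO_CONTROL:
    default:
        return caps.normal;
    }
}
```

  <p>For a pair such as "f" + "i" in a font that has the ligature but does not
  use it by default, the sketch yields <code>LIGATED</code> for ZWJ and
  <code>UNCONNECTED</code> for ZWNJ.</p>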

  <p>In the unusual case where an author wants to forbid a ligature in a
  sequence XY but still promote cursive connection, the sequence &lt;X, ZWJ,
  ZWNJ, ZWJ, Y&gt; can be used. The ZWNJ breaks the ligature, while the joiners
  adjacent to X and Y cause them to take adjacent cursive forms (where such
  forms exist). Similarly, if an author wants X to take a cursive form but Y to
  remain isolated, the sequence &lt;X, ZWJ, ZWNJ, Y&gt; can be used (as in
  previous versions of the Unicode Standard). Examples are shown in the table
  below.</p>
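  <p>For illustration, the following C sketch builds each of the control
  sequences discussed above as an array of UTF-32 code points. The enumeration
  <code>PairRendering</code> and the function <code>build_pair</code> are
  hypothetical helpers; only the sequences themselves come from the text.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define ZWNJ 0x200Cu  /* ZERO WIDTH NON-JOINER */
#define ZWJ  0x200Du  /* ZERO WIDTH JOINER */

/* Hypothetical: how the author wants the pair X, Y to render. */
typedef enum {
    PAIR_DEFAULT,         /* <X, Y>                   */
    PAIR_MORE_CONNECTED,  /* <X, ZWJ, Y>              */
    PAIR_BREAK_ALL,       /* <X, ZWNJ, Y>             */
    PAIR_CURSIVE_NO_LIG,  /* <X, ZWJ, ZWNJ, ZWJ, Y>   */
    PAIR_X_CURSIVE_ONLY   /* <X, ZWJ, ZWNJ, Y>        */
} PairRendering;

/* Writes the sequence for X, Y into out (room for 5 code points) and
 * returns the number of code points written. */
size_t build_pair(uint32_t x, uint32_t y, PairRendering r, uint32_t out[5])
{
    size_t n = 0;
    out[n++] = x;
    switch (r) {
    case PAIR_MORE_CONNECTED:
        out[n++] = ZWJ;
        break;
    case PAIR_BREAK_ALL:
        out[n++] = ZWNJ;
        break;
    case PAIR_CURSIVE_NO_LIG:   /* ZWNJ blocks the ligature; both ZWJs keep joins */
        out[n++] = ZWJ;
        out[n++] = ZWNJ;
        out[n++] = ZWJ;
        break;
    case PAIR_X_CURSIVE_ONLY:   /* X joins toward ZWJ; Y stays isolated */
        out[n++] = ZWJ;
        out[n++] = ZWNJ;
        break;
    case PAIR_DEFAULT:
    default:
        break;
    }
    out[n++] = y;
    return n;
}
```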

  <p>Note: Zero width joiner (ZWJ) has a special function when used with Indic                                                                              

  scripts. See <i>Section 9.1, Devanagari</i>, page 215.</p>                                                                             

  <p><i><b>Examples.</b></i> The following table shows samples of desired
  renderings when a joiner or non-joiner is inserted between two characters. In
  the Arabic examples, the characters on the left side are already in visual
  order but have not yet been shaped. The samples presume that all of the glyphs
  are available in the font. If, for example, the ligatures are not available,
  the display would fall back to the unligatured forms.</p>

  <p align="center"><img border="0" src="zwjaction.gif" alt="Sample Display Actions" width="455" height="380"></p>                                                                             

</blockquote>                                                                             

<blockquote>                                                                             

  <p><i><b>Implementation Notes.</b></i> For modern font technologies, such as                                                                              

  OpenType or AAT, font vendors should add ZWJ to their ligature mapping tables                                                                              

  as appropriate. Thus where a font has a mapping from <code>&quot;f&quot; +
  &quot;i&quot;</code> to <img alt="fi ligature" src="UFB01.gif"
  width="11" height="32" align="middle">, the font designer should add the
  additional mapping from <code>&quot;f&quot; + ZWJ + &quot;i&quot;</code> to <img
  alt="fi ligature" src="UFB01.gif" width="11" height="32"
  align="middle">. On the other hand, ZWNJ will normally have the desired effect
  in most fonts without any change, since it simply obstructs the normal
  ligature or cursive-connection behavior. As with all other alternate format
  characters, fonts should use an invisible zero-width glyph to represent both
  ZWJ and ZWNJ.</p>
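  <p>The effect of adding the ZWJ mapping can be modeled with a toy ligature
  table. This is a simplified sketch under stated assumptions: it matches code
  points rather than glyph IDs, unlike real OpenType or AAT tables, and
  <code>LigatureRule</code>, <code>rules</code>, and <code>match_ligature</code>
  are hypothetical. It shows that "f" + ZWJ + "i" reaches the ligature only
  because of the added rule, while "f" + ZWNJ + "i" matches no rule at all, so
  ZWNJ needs no table change.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define ZWNJ 0x200Cu  /* ZERO WIDTH NON-JOINER: deliberately absent from the table */
#define ZWJ  0x200Du  /* ZERO WIDTH JOINER */

/* Hypothetical, simplified ligature rule: a sequence of code points that the
 * font replaces with one glyph. Real OpenType/AAT tables work on glyph IDs. */
typedef struct {
    uint32_t input[3];
    size_t   length;
    uint32_t ligature;  /* stand-in for the ligature glyph (here U+FB01) */
} LigatureRule;

static const LigatureRule rules[] = {
    { { 'f', 'i' },      2, 0xFB01 },  /* existing mapping: f + i       */
    { { 'f', ZWJ, 'i' }, 3, 0xFB01 },  /* added mapping:    f + ZWJ + i */
};

/* Returns the ligature for the longest rule matching at text[pos], or 0 if
 * none matches; *consumed receives the number of code points covered. */
uint32_t match_ligature(const uint32_t *text, size_t len, size_t pos,
                        size_t *consumed)
{
    uint32_t best = 0;
    size_t best_len = 0;
    *consumed = 0;
    if (pos > len)
        return 0;
    for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
        const LigatureRule *rule = &rules[r];
        if (rule->length > len - pos || rule->length <= best_len)
            continue;  /* too long for the remaining text, or not longer */
        size_t k = 0;
        while (k < rule->length && text[pos + k] == rule->input[k])
            k++;
        if (k == rule->length) {
            best = rule->ligature;
            best_len = k;
        }
    }
    *consumed = best_len;
    return best;
}
```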

  <p><i><b>Effects on Existing Data.</b></i> Existing data should only rarely
  contain ZWJ between characters that normally connect cursively, since in
  previous versions of the standard such use was simply redundant. In poor
  implementations, such a redundant ZWJ could conceivably have resulted in a
  broken cursive connection; data generated for such implementations would
  therefore almost certainly be free of ZWJs not needed for shaping. The vast
  majority of existing data can be rendered by newer implementations without
  any change in appearance.</p>

<p><i><b>Effects on Existing Implementations.</b></i> Existing rendering
algorithms support ZWJ only insofar as it affects shaping. If such an
implementation receives newer text, the ZWJ either has no effect or, in a poor
implementation of a shaping algorithm, could lead to a broken cursive
connection. However, the occurrence of ZWJ was never restricted, so even
existing algorithms should have been prepared to handle it gracefully.</p>

</blockquote>                                                                             

<h3><a name="tag">13.7 Tag Characters</a> (new section)</h3>                                                                             

<h4>Tag Characters: U+E0000-U+E007F</h4>                                                                             

<p>The characters in this block provide a mechanism for language tagging in                                                                              

Unicode plain text. <i>However, the use of these characters is strongly                                                                              

discouraged.</i> The characters in this block are reserved for use with special                                                                              

protocols. They are <i>not</i> to be used in the absence of such protocols, or                                                                              

with <i>any</i> protocols that provide alternate means for language tagging,                                                                              

such as HTML or XML. The requirement for language information embedded in plain                                                                              

text data is often overstated. See <i>Section 5.11, Language Information in                                                                              

Plain Text</i> in <i>The Unicode Standard, Version 3.0</i>.</p>                                                                             

<p>This block encodes a set of 95 special-use tag characters to enable the                                                                              

spelling out of ASCII-based string tags using characters which can be strictly                                                                              

separated from ordinary text content characters in Unicode. These tag characters                                                                              

can be embedded by protocols into plain text. They can be identified and/or
ignored by implementations with trivial algorithms, because these tag characters
are not overloaded with any other usage: they can express only tag values,
never textual content itself.</p>
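<p>Because each of the 95 tag characters U+E0020..U+E007E corresponds to the
printable ASCII character 0x20..0x7E with the same low bits, conversion in
either direction is a single addition or subtraction. A minimal C sketch; the
function names are hypothetical:</p>

```c
#include <stdbool.h>
#include <stdint.h>

#define TAG_BASE 0xE0000u  /* tag character = TAG_BASE + ASCII value */

/* True for the 95 tag characters U+E0020..U+E007E that spell tag values. */
bool is_tag_spelling_char(uint32_t cp)
{
    return cp - 0xE0020u <= 0x5Eu;  /* unsigned wraparound rejects cp < E0020 */
}

/* Tag character for a printable ASCII byte 0x20..0x7E, or 0 if there is none. */
uint32_t ascii_to_tag(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7E) ? TAG_BASE + c : 0;
}

/* ASCII byte spelled by a tag character, or -1 for any other code point. */
int tag_to_ascii(uint32_t cp)
{
    return is_tag_spelling_char(cp) ? (int)(cp - TAG_BASE) : -1;
}
```

<p>Since no ordinary character falls in this range, a single comparison
separates tag content from text content.</p>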

<p>In addition to these 95 characters, one language tag identification character                                                                              

and one cancel tag character are also encoded. The language tag identification                                                                              

character identifies a tag string as a language tag; the language tag itself                                                                              

makes use of RFC 3066 (or its successors) language tag strings spelled out using                                                                              

the tag characters from this block.</p>

<p>Four terms (tagging, annotation, out-of-band and in-band) which are used in                                                                              

special senses here are defined in the <a href="../../../glossary/">Glossary</a>.</p>                                                                             

<h4>Syntax for Embedding Tags</h4>                                                                             

<p>To embed any ASCII-derived tag in Unicode plain text, the tag is spelled out
with the corresponding tag characters, prefixed with the relevant tag
identification character. The resulting string is embedded directly in the
text.</p>

<p><i><b>Tag Identification.</b> </i>The tag identification character is used as                                                                              

a mechanism for identifying tags of different types. In the future, this could                                                                              

enable multiple types of tags embedded in plain text to coexist.                                                                             

<p><i><b>Tag Termination.</b></i> No termination character is required for the                                                                              

tag itself, because all characters that make up the tag are numerically distinct                                                                              

from any non-tag character. A tag terminates either at the first non-tag
character (that is, any other Unicode character) or at the next tag
identification character. A detailed BNF syntax for tags is listed below.
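<p>The termination rule can be sketched as a scanner over UTF-32 text. In this
hypothetical helper, <code>tag_end</code>, a tag continues through characters in
the range U+E0020..U+E007F and stops at anything else. This covers both
termination cases, since U+E0001 lies below that range. Including U+E007F
CANCEL TAG in the range is an assumption made here so that the cancel sequence
U+E0001 U+E007F is scanned as one unit.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Assumes text[start] is a tag identification character (e.g. U+E0001).
 * Returns the index one past the end of the tag: scanning stops at the first
 * character outside U+E0020..U+E007F, which is either a non-tag character or
 * the next tag identification character. */
size_t tag_end(const uint32_t *text, size_t len, size_t start)
{
    if (start >= len)
        return len;
    size_t i = start + 1;                       /* skip the identifier */
    while (i < len && text[i] - 0xE0020u <= 0x5Fu)
        i++;                                    /* U+E0020..U+E007F */
    return i;
}
```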

<p><i><b>Language Tags.</b> </i>A string of tag characters prefixed by U+E0001                                                                              

LANGUAGE TAG is specified to constitute a language tag. Furthermore, the tag                                                                              

values for the language tag are to be spelled out as specified in RFC 3066,                                                                              

making use only of registered tag values or of user-defined language tags                                                                              

starting with the characters &quot;x-&quot;.</p>                                                                             

<p>For example, consider embedding a language tag for Japanese. The Japanese tag                                                                              

from RFC 3066 is &quot;ja&quot; (an ISO 639 language code) or, alternatively,
&quot;ja-JP&quot; (an ISO 639 language code plus an ISO 3166 country code).
Since RFC 3066 specifies that language tags are not case

significant, it is recommended that for language tags, the entire tag be                                                                              

lowercased before conversion to tag characters.                                                                             

<p>Thus the entire language tag in its &quot;ja-JP&quot; form, lowercased to
&quot;ja-jp&quot;, would be converted to the tag characters as follows:

<p>U+E0001 U+E006A U+E0061 U+E002D U+E006A U+E0070                                                                             

<p>The language tag in its shorter &quot;ja&quot; form would be expressed as
follows:

<p>U+E0001 U+E006A U+E0061                                                                             
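<p>The conversion in these examples can be sketched in C. The function
<code>encode_language_tag</code> is hypothetical; it lowercases ASCII letters as
recommended above but does not check the tag against RFC 3066 or the
registry.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0xE0001u  /* U+E0001 LANGUAGE TAG */
#define TAG_BASE     0xE0000u  /* tag character = TAG_BASE + ASCII value */

/* Converts a language tag such as "ja-JP" into U+E0001 followed by tag
 * characters, lowercasing ASCII letters first. Writes at most cap code points
 * into out and returns the count, or 0 if the tag is empty, does not fit, or
 * contains a byte outside printable ASCII. */
size_t encode_language_tag(const char *tag, uint32_t *out, size_t cap)
{
    size_t n = 0;
    if (cap == 0 || tag[0] == '\0')
        return 0;
    out[n++] = LANGUAGE_TAG;
    for (const char *p = tag; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (c < 0x20 || c > 0x7E || n == cap)
            return 0;
        if (c >= 'A' && c <= 'Z')  /* ASCII-only, locale-independent */
            c = (unsigned char)(c - 'A' + 'a');
        out[n++] = TAG_BASE + c;
    }
    return n;
}
```

<p>Given "ja-JP", the sketch writes the six code points listed above; given
"ja", the three.</p>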

<p><b><i>Tag Scope and Nesting</i>. </b>The value of an established tag                                                                              

continues from the point the tag is embedded in text until either:</p>                                                                             

<blockquote>                                                                             

  A. The text itself goes out of scope, as defined by the application. (E.g. for                                                                              

  line-oriented protocols, when reaching the end-of-line or end-of-string; for                                                                              

  text streams, when reaching the end-of-stream; etc.)                                                                             

</blockquote>                                                                             

or                                                                             

<blockquote>                                                                             

  B. The tag is explicitly canceled by the U+E007F CANCEL TAG character.                                                                             

</blockquote>                                                                             

Tags of the <i>same</i> type cannot be nested in any way. For example, if a new                                                                              

embedded language tag occurs following text which was already language tagged,                                                                              

the tagged value for subsequent text simply changes to that specified in the new                                                                              

tag.                                                                             
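<p>The scope rules (no nesting, replacement by a new tag, explicit cancellation,
and the end of the text) can be illustrated by a forward scan. In this
hypothetical sketch, <code>language_scope</code> treats the end of the array as
the point where the text goes out of scope and records, for each character, the
index of the LANGUAGE TAG whose value applies, or -1 for untagged text.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0xE0001u  /* U+E0001 LANGUAGE TAG */
#define CANCEL_TAG   0xE007Fu  /* U+E007F CANCEL TAG */

static int in_tag_block(uint32_t cp) { return cp - 0xE0000u <= 0x7Fu; }

/* For each text[i], stores in scope[i] the index of the LANGUAGE TAG whose
 * value applies to it, or -1 if none does. Tag characters themselves get -1.
 * A new language tag simply replaces the previous one (no nesting); CANCEL TAG,
 * whether prefixed by U+E0001 or bare, returns to the untagged state. */
void language_scope(const uint32_t *text, size_t len, long *scope)
{
    long current = -1;
    for (size_t i = 0; i < len; i++) {
        uint32_t cp = text[i];
        if (cp == LANGUAGE_TAG)
            current = (long)i;   /* replaces any earlier value */
        else if (cp == CANCEL_TAG)
            current = -1;        /* covers both U+E0001 U+E007F and bare U+E007F */
        scope[i] = in_tag_block(cp) ? -1 : current;
    }
}
```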

<p>Tags of different types can have interdigitating scope, but not hierarchical                                                                              

scope. In effect, tags of different types completely ignore each other, so that                                                                              

the use of language tags can be completely asynchronous with the use of future                                                                              

tag types.                                                                             

<p><i><b>Canceling Tag Values.</b></i> The main function of CANCEL TAG is to                                                                              

make possible operations such as blind concatenation of strings in a tagged                                                                              

context without the propagation of inappropriate tag values across the string                                                                              

boundaries. There are two uses of CANCEL TAG. To cancel a tag value of a                                                                              

particular type, prefix the CANCEL TAG character with the tag identification                                                                              

character of the appropriate type. For example, the complete string to cancel a                                                                              

language tag is:</p>                                                                             

<p>U+E0001 U+E007F                                                                             

<p>The value of the relevant tag type returns to the default state for that tag                                                                              

type, namely: no tag value specified, the same as untagged text. To cancel <i>any</i>                                                                              

tag values of any type which may be in effect, use CANCEL TAG without a prefixed                                                                              

tag identification character.                                                                             

<blockquote>                                                                             

  <p><b>Note: </b>Currently there is no observable difference between the two
  uses of CANCEL TAG, because only one tag identification character (and
  therefore one tag type) is defined. However, inserting a bare CANCEL TAG where
  only the language tag needs to be canceled could lead to unanticipated side
  effects if this text were later inserted into a text that supports more than
  one tag type.</p>

</blockquote>                                                                             
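<p>As an illustration of blind concatenation, the following hypothetical helper
<code>concat_cancel_language</code> appends one UTF-32 string to another and
always inserts U+E0001 U+E007F between them, without inspecting either string.
Following the note above, it cancels only the language tag rather than using a
bare CANCEL TAG.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANGUAGE_TAG 0xE0001u  /* U+E0001 LANGUAGE TAG */
#define CANCEL_TAG   0xE007Fu  /* U+E007F CANCEL TAG */

/* Writes a + <U+E0001, U+E007F> + b into out, so that a language tag still in
 * effect at the end of a cannot leak into b. Returns the total length, or 0
 * if out (capacity cap) is too small. Callers pass valid, non-NULL pointers. */
size_t concat_cancel_language(const uint32_t *a, size_t alen,
                              const uint32_t *b, size_t blen,
                              uint32_t *out, size_t cap)
{
    size_t total = alen + 2 + blen;
    if (total > cap)
        return 0;
    memcpy(out, a, alen * sizeof *a);
    out[alen] = LANGUAGE_TAG;      /* cancel the language tag type only */
    out[alen + 1] = CANCEL_TAG;
    memcpy(out + alen + 2, b, blen * sizeof *b);
    return total;
}
```

<p>If the first string happens to be untagged, the inserted pair merely cancels
a value that was never set, which is harmless.</p>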

<p align="center"><img border="0" src="tagdes2.gif" alt="Tag Characters" width="559" height="423"></p>

<h4>Working With Language Tags</h4>                                                                             

<p><i><b>Avoiding Language Tags.</b> </i>Because of the extra implementation                                                                              

burden, language tags should be avoided in plain text unless language                                                                              

information is required and it is known that the receivers of the text will                                                                              

properly recognize and maintain the tags. However, where language tags must be                                                                              

used, implementers should consider the following implementation issues involved                                                                              

in supporting language information with tags and decide how to handle tags where                                                                              

they are not fully supported. This discussion applies to any mechanism for                                                                              

providing language tags in a plain text environment.</p>                                                                             

<p><i><b>Higher-Level Protocols.</b></i> Language tags should also be avoided
wherever higher-level protocols, such as a rich-text format, HTML or MIME,
provide language attributes. This practice prevents cases where the higher-level
protocol and the language tags disagree. See <a href="../tr20/">Unicode
Technical Report #20, &quot;Unicode in XML and other Markup Languages&quot;</a>.</p>

<p><i><b>Effect of Tags on Interpretation of Text.</b></i> Implementations that
support language tags may need to take them into account for special
processing, such as hyphenation or choice of font. However, the tag characters
themselves have no display and do not affect line breaking, character shaping or
joining, or any other format or layout properties. Processes interpreting the
tag may choose to impose such behavior based on the tag value that it
represents.</p>

<p><i><b>Display.</b> </i>Characters in the tag character block have no visible                                                                              

rendering in normal text and the language tags themselves are not displayed.                                                                              

This choice may not require modification of the displaying program, if the fonts                                                                              

on that platform have the language tag characters mapped to zero-width,                                                                              

invisible glyphs. For debugging or other operations which must render the tags                                                                              

themselves visible, it is advisable that the tag characters be rendered using                                                                              

the corresponding ASCII character glyphs (perhaps modified systematically to                                                                              

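differentiate them from normal ASCII characters). In any case, the tag character
values are chosen so that the tag characters are interpretable in most debuggers
even without display support: the code point of each of the 95 tag characters
is U+E0000 plus the value of the corresponding ASCII character.</p>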
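<p>A debugging display along these lines might be sketched as follows. The
function <code>debug_format</code> is hypothetical, and its marker conventions
(a caret before the ASCII counterpart of a tag character, bracketed names for
LANGUAGE TAG and CANCEL TAG, U+XXXX for everything else) are arbitrary choices
made for this sketch.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats one code point for a debug view into buf (capacity cap) and returns
 * what snprintf returns. Tag characters U+E0020..U+E007E are shown as '^'
 * followed by their ASCII counterpart, so they stay distinct from real ASCII. */
int debug_format(uint32_t cp, char *buf, size_t cap)
{
    if (cp == 0xE0001u)
        return snprintf(buf, cap, "[LANG]");
    if (cp == 0xE007Fu)
        return snprintf(buf, cap, "[CANCEL]");
    if (cp - 0xE0020u <= 0x5Eu)                 /* the 95 spelling characters */
        return snprintf(buf, cap, "^%c", (int)(cp - 0xE0000u));
    return snprintf(buf, cap, "U+%04X", (unsigned)cp);
}
```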

<p><i><b>Processing.</b></i> Sequential access to the text is generally
straightforward. If language codes are not relevant to the particular processing
operation, then they should be ignored. Random access to stateful tags is more
problematic: because the current state of the text depends on the tags that
precede it, the text must be searched backward, sometimes all the way to the
start. With these exceptions, tags pose no particular difficulties as long as no
modifications are made to the text.</p>
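<p>The backward search needed for random access can be sketched as follows.
The hypothetical function <code>language_tag_in_effect</code> walks back from a
position until it meets either a CANCEL TAG (bare or prefixed, both of which end
the language tag while only one tag type is defined) or a LANGUAGE TAG; in the
worst case it scans to the start of the text.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0xE0001u  /* U+E0001 LANGUAGE TAG */
#define CANCEL_TAG   0xE007Fu  /* U+E007F CANCEL TAG */

/* Returns the index of the LANGUAGE TAG whose value is in effect at
 * text[pos], or -1 if the text is untagged there. The first CANCEL TAG or
 * LANGUAGE TAG found scanning backward from pos decides the answer. */
long language_tag_in_effect(const uint32_t *text, size_t pos)
{
    size_t i = pos;
    while (i > 0) {
        i--;
        if (text[i] == CANCEL_TAG)
            return -1;          /* a later cancel overrides any earlier tag */
        if (text[i] == LANGUAGE_TAG)
            return (long)i;
    }
    return -1;                  /* reached the start: untagged */
}
```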

<p><i><b>Range Checking for Tag Characters.</b></i> Tag characters are encoded in
Plane 14 to support easy range checking. The following C/C++ source code
snippets show efficient implementations of range checks for characters U+E0000
to U+E007F, expressed in each of the three Unicode encoding forms. Range checks
allow implementations that do not want to support these tag characters to
filter them out efficiently.</p>

<p>Range check expressed in UTF-32:                                                                             

<blockquote>                                                                             

  if ( ((unsigned) *s) - 0xE0000 &lt;= 0x7F )

</blockquote>                                                                             

Range check expressed in UTF-16:                                                                             

<blockquote>                                                                             

  if ( ( *s == 0xDB40 ) &amp;&amp; ( ((unsigned)*(s+1)) - 0xDC00 &lt;= 0x7F ) )

</blockquote>                                                                             

Expressed in UTF-8:                                                                             

<blockquote>                                                                             

  if ( ( *s == 0xF3 ) &amp;&amp; ( *(s+1) == 0xA0 ) &amp;&amp; ( ( *(s+2) &amp; 0xFE ) == 0x80 ) )

</blockquote>                                                                             
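<p>For reference, the three checks above can be wrapped as C functions. In this sketch (function names hypothetical) the code units are read through unsigned types: with a signed plain <code>char</code>, a byte such as 0xF3 would compare unequal to the constant <code>0xF3</code>, which is why the UTF-8 version takes <code>const unsigned char *</code>. Like the original snippets, the UTF-16 and UTF-8 versions assume the following code units are present and do not validate the whole sequence.</p>

```c
#include <stdint.h>

/* UTF-32: unsigned wraparound turns values below 0xE0000 into huge numbers,
   so one comparison covers both ends of U+E0000..U+E007F. */
int is_tag_utf32(const uint32_t *s)
{
    return (uint32_t)(*s - 0xE0000u) <= 0x7Fu;
}

/* UTF-16: the tag range is high surrogate 0xDB40 followed by a low
   surrogate in 0xDC00..0xDC7F. */
int is_tag_utf16(const uint16_t *s)
{
    return s[0] == 0xDB40u && (unsigned)(s[1] - 0xDC00u) <= 0x7Fu;
}

/* UTF-8: U+E0000..U+E007F encodes as F3 A0 80 80 .. F3 A0 81 BF, so the
   third byte is 0x80 or 0x81; the fourth byte is not examined. */
int is_tag_utf8(const unsigned char *s)
{
    return s[0] == 0xF3 && s[1] == 0xA0 && (s[2] & 0xFE) == 0x80;
}
```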

Alternatively, the range checks for UTF-32 and UTF-16 can be coded with bit                                                                              

masks. Both versions should be equally efficient.                                                                             

<p>Range check expressed in UTF-32:                                                                             

<blockquote>                                                                             

  if ( ((*s) &amp; 0xFFFFFF80) == 0xE0000 )

</blockquote>                                                                             

Range check expressed in UTF-16:                                                                             

<blockquote>                                                                             

  if ( ( *s == 0xDB40 ) &amp;&amp; ( ( *(s+1) &amp; 0xFF80 ) == 0xDC00 ) )

</blockquote>                                                                             
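<p>The bit-mask forms can be checked exhaustively against the range definition. The sketch below (hypothetical names, with a small surrogate-pair encoder added only for the check) walks every code point up to U+10FFFF. The low-surrogate mask has to cover every bit above the low seven, that is 0xFF80; a narrower mask such as 0xDC80 would also accept 0xDD00, the low surrogate of U+E0100, which lies outside the tag range.</p>

```c
#include <stdint.h>

/* Clearing the low 7 bits maps exactly U+E0000..U+E007F onto 0xE0000. */
int is_tag_utf32_mask(uint32_t c)
{
    return (c & 0xFFFFFF80u) == 0xE0000u;
}

/* High surrogate 0xDB40 plus a low surrogate whose bits above the low 7
   match 0xDC00, i.e. 0xDC00..0xDC7F. */
int is_tag_utf16_mask(uint16_t hi, uint16_t lo)
{
    return hi == 0xDB40u && (lo & 0xFF80u) == 0xDC00u;
}

/* Encodes a supplementary code point (0x10000..0x10FFFF) as a surrogate pair. */
static void to_utf16(uint32_t c, uint16_t *hi, uint16_t *lo)
{
    c -= 0x10000u;
    *hi = (uint16_t)(0xD800u + (c >> 10));
    *lo = (uint16_t)(0xDC00u + (c & 0x3FFu));
}

/* Returns 1 if both mask checks agree with the subtraction-based range
   check for every code point, 0 at the first disagreement. */
int masks_agree_with_range(void)
{
    for (uint32_t c = 0; c <= 0x10FFFFu; c++) {
        int expected = (c - 0xE0000u) <= 0x7Fu;
        if (is_tag_utf32_mask(c) != expected)
            return 0;
        if (c >= 0x10000u) {
            uint16_t hi, lo;
            to_utf16(c, &hi, &lo);
            if (is_tag_utf16_mask(hi, lo) != expected)
                return 0;
        }
    }
    return 1;
}
```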

<p><b><i>Editing and Modification.</i></b> Inline tags present particular problems
for text changes because they are stateful. Any modification of the text is
more complicated: it must take the current language status into account, and
the &lt;<font face="Courier New" size="2">start</font>&gt;...&lt;<font
face="Courier New" size="2">end</font>&gt; tags must be properly maintained. If
an editing program is unaware that certain tags are stateful and cannot process
them correctly, it is very easy for the user to modify text in ways that
corrupt it. For example, a user might delete part of a tag or paste text
including a tag into the wrong context.</p>
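<p>One way an editor could avoid deleting part of a tag is to move any proposed cut or insertion point back to the start of the tag it falls inside. The following C sketch is an illustration under stated assumptions, not a method defined by this report: the text is held as UTF-32, only the language tag syntax of this document is recognized, and the names are hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define TAG_LANGUAGE 0xE0001u
#define TAG_CANCEL   0xE007Fu

/* Tag argument characters: TAG SPACE through TAG TILDE (U+E0020..U+E007E). */
static int is_tag_arg(uint32_t c)
{
    return c >= 0xE0020u && c <= 0xE007Eu;
}

/* Nonzero if a boundary just before text[i] (i > 0) would split a tag. */
static int splits_tag(const uint32_t *text, size_t i)
{
    uint32_t prev = text[i - 1], cur = text[i];
    if (is_tag_arg(cur))
        return prev == TAG_LANGUAGE || is_tag_arg(prev);
    if (cur == TAG_CANCEL)
        return prev == TAG_LANGUAGE;    /* TAG LANGUAGE + TAG CANCEL is one tag */
    return 0;
}

/* Moves a proposed edit boundary i (0..len) back to the start of any tag it
   would otherwise cut through. */
size_t snap_to_tag_boundary(const uint32_t *text, size_t len, size_t i)
{
    while (i > 0 && i < len && splits_tag(text, i))
        i--;
    return i;
}
```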

<p><b><i>Dangers of Incomplete Support.</i></b> Even programs that do not interpret
the tags should not allow editing operations to break initial tags or leave tags
unpaired. Unpaired tags should be discarded upon a save or send operation.</p>

<p>Nonetheless, malformed text may be produced and transmitted by a tag-unaware
editor. Therefore, implementations that do not ignore language tags must be
prepared to receive malformed tags. On reception of a malformed or unpaired tag,
language-tag-aware implementations should reset the language to NONE and then
ignore the tag.</p>
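<p>The reset-to-NONE rule can be illustrated with a small C sketch that consumes one tag at a time. It is a hypothetical helper, not a normative algorithm: it assumes UTF-32 text, stores the language argument as ASCII in a fixed-size buffer (the 64-character limit is arbitrary), and treats an empty or over-long argument, or any tag character that does not start a well-formed tag, as malformed. It does not check the RFC 3066 constraint from the formal tag syntax.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define TAG_LANGUAGE 0xE0001u
#define TAG_CANCEL   0xE007Fu
#define LANG_MAX     64      /* arbitrary buffer size chosen for this sketch */

/* Language in effect; an empty string stands for NONE. */
typedef struct {
    char lang[LANG_MAX];
} lang_state;

static int is_tag_arg(uint32_t c)
{
    return c >= 0xE0020u && c <= 0xE007Eu;
}

/* Consumes the tag starting at text[i] (a character in U+E0000..U+E007F),
   updates *st, and returns the index just past the tag. */
size_t consume_tag(const uint32_t *text, size_t len, size_t i, lang_state *st)
{
    if (text[i] != TAG_LANGUAGE) {
        /* Either a cancel-all tag or a stray tag character (malformed).
           In both cases the language becomes NONE and nothing else happens. */
        st->lang[0] = '\0';
        return i + 1;
    }
    size_t j = i + 1;
    if (j < len && text[j] == TAG_CANCEL) {   /* cancels the language tag */
        st->lang[0] = '\0';
        return j + 1;
    }
    size_t n = 0;
    int too_long = 0;
    while (j < len && is_tag_arg(text[j])) {
        if (n + 1 < LANG_MAX)
            st->lang[n++] = (char)(text[j] - 0xE0000u);  /* undo TAG() */
        else
            too_long = 1;
        j++;
    }
    if (too_long)
        n = 0;              /* malformed: reset to NONE (n == 0 already is) */
    st->lang[n] = '\0';
    return j;
}
```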

<h4>Unicode Conformance Issues</h4>                                                                             

The rules for Unicode conformance for the tag characters are exactly the same as
for any other Unicode characters. A conformant process is not required to
interpret the tag characters. If it does interpret them, it should interpret
them according to the standard, that is, as spelled-out tags. However, there is
no requirement to provide a particular interpretation of the text because it is
tagged with a given language. If an application does not interpret tag
characters, it should leave their values undisturbed and do whatever it does
with any other uninterpreted characters.

<p>The presence of a well-formed tag is no guarantee that the data is correctly                                                                              

tagged. For example, an application could erroneously label French data with a                                                                              

Spanish tag.                                                                             

<p>Implementations of Unicode that already use out-of-band mechanisms for
language tagging, or &quot;heavy-weight&quot; in-band mechanisms such as XML
or HTML, will continue to do exactly what they are doing and will ignore the tag
characters completely. They may also prohibit the use of tag characters to
prevent conflicts with the equivalent markup.
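<p>For an implementation that chooses to prohibit tag characters, a filter can remove them outright using the UTF-32 bit-mask check. This is a minimal sketch with a hypothetical function name. It is only appropriate under such a prohibition; applications that merely do not interpret tag characters should leave them undisturbed.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Removes every code point in U+E0000..U+E007F from a UTF-32 buffer,
   compacting it in place, and returns the new length. */
size_t strip_tag_characters(uint32_t *text, size_t len)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if ((text[i] & 0xFFFFFF80u) != 0xE0000u)   /* keep non-tag characters */
            text[out++] = text[i];
    }
    return out;
}
```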

<h4>Tag Syntax Description</h4>                                                                             

An extended BNF (Backus-Naur Form) description of the tags specified in this                                                                              

technical report is found below. Note the following BNF extensions used in this                                                                              

formalism:                                                                             

<p>1. Semantic constraints are specified by rules in the form of an assertion                                                                              

specified between double braces; the variable $$ denotes the string consisting                                                                              

of all terminal symbols matched by the non-terminal.                                                                             

<blockquote>                                                                             

  Example: {{ Assert ( $$[0] == '?' ); }}                                                                             

</blockquote>                                                                             

<blockquote>                                                                             

  Meaning: The first character of the string matched by this non-terminal must                                                                              

  be '?'                                                                             

</blockquote>                                                                             

2. Semantic constraint rules employ a number of predicate functions that are
not otherwise defined; the name of each function is sufficient to determine
what it tests.

<blockquote>                                                                             

  Example: IsRFC3066LanguageIdentifier ( tag-argument )                                                                             

</blockquote>                                                                             

<blockquote>                                                                             

  Meaning: tag-argument is a valid RFC3066 language identifier                                                                             

</blockquote>                                                                             

3. A lexical expander function, TAG, is employed to denote the tag form of an                                                                              

ASCII character; the argument to this function is either a character or a                                                                              

character set specified by a range or enumeration expression.                                                                             

<blockquote>                                                                             

  Example: TAG('-')                                                                             

</blockquote>                                                                             

<blockquote>                                                                             

  Meaning: TAG HYPHEN-MINUS                                                                             

</blockquote>                                                                             

<blockquote>                                                                             

  Example: TAG([A-Z])                                                                             

</blockquote>                                                                             

<blockquote>                                                                             

  Meaning: TAG LATIN CAPITAL LETTER A ... TAG LATIN CAPITAL LETTER Z                                                                             

</blockquote>                                                                             

4. A macro is employed to denote terminal symbols that are character literals
which cannot be directly represented in ASCII. The argument to the macro is the
Unicode character name.

<blockquote>                                                                             

  Example: '${TAG CANCEL}'                                                                             

</blockquote>                                                                             

<blockquote>                                                                             

  Meaning: character literal whose code value is U+E007F                                                                             

</blockquote>                                                                             

5. Occurrence indicators used are '+' (one or more) and '*' (zero or more);                                                                              

optional occurrence is indicated by enclosure in '[' and ']'.                                                                             
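<p>The TAG expander has a simple arithmetic reading: each tag character sits at the code point of its ASCII counterpart plus 0xE0000, so TAG('-') is U+E002D. The C sketch below (hypothetical names) applies it to the tag-characters of the formal syntax, printable ASCII plus SPACE, and builds a language tag by prefixing TAG LANGUAGE (U+E0001). It does not check RFC 3066 syntax.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* TAG(c): maps SPACE or a printable ASCII character (0x20..0x7E) to its tag
   form U+E0020..U+E007E. Returns 0 for characters with no tag form. */
uint32_t tag_of(char c)
{
    unsigned char u = (unsigned char)c;
    return (u >= 0x20 && u <= 0x7E) ? 0xE0000u + u : 0;
}

/* Writes TAG LANGUAGE followed by TAG(c) for each character of lang into
   out, which needs room for strlen(lang) + 1 code points. Returns the number
   of code points written, or 0 on failure (out may be partly written). */
size_t encode_language_tag(const char *lang, uint32_t *out)
{
    if (*lang == '\0')
        return 0;                   /* tag-argument needs at least one character */
    size_t n = 0;
    out[n++] = 0xE0001u;            /* '${TAG LANGUAGE}' */
    for (; *lang != '\0'; lang++) {
        uint32_t t = tag_of(*lang);
        if (t == 0)
            return 0;               /* no tag form for this character */
        out[n++] = t;
    }
    return n;
}
```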

<h4>Formal Tag Syntax</h4>                                                                             

<pre>tag                     :   language-tag
                        |   cancel-all-tag
                        ;</pre>

<pre>language-tag            :   language-tag-introducer language-tag-argument
                        ;</pre>

<pre>language-tag-argument   :   tag-argument
              {{ Assert ( IsRFC3066LanguageIdentifier ( $$ ) ); }}
                        |   tag-cancel
                        ;</pre>

<pre>cancel-all-tag          :   tag-cancel
                        ;</pre>

<pre>tag-argument            :   tag-character+
                        ;</pre>

<pre>tag-character           :   { c : c in
              TAG( { a : a in printable ASCII characters or SPACE } ) }
                        ;</pre>

<pre>language-tag-introducer :   '${TAG LANGUAGE}'
                        ;</pre>

<pre>tag-cancel              :   '${TAG CANCEL}'
                        ;</pre>
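<p>As an illustration, the grammar can be turned into a recognizer that decides whether a complete sequence of tag characters matches <code>tag</code>. In this C sketch the predicate IsRFC3066LanguageIdentifier is approximated by a syntactic check of the RFC 3066 form (a primary subtag of 1 to 8 letters, followed by zero or more hyphen-separated subtags of 1 to 8 letters or digits); it does not consult any registry of language codes. Function names are hypothetical.</p>

```c
#include <stddef.h>
#include <stdint.h>

#define TAG_LANGUAGE 0xE0001u
#define TAG_CANCEL   0xE007Fu

static int is_alpha(uint32_t c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

static int is_digit(uint32_t c)
{
    return c >= '0' && c <= '9';
}

/* Syntactic stand-in for IsRFC3066LanguageIdentifier, applied to tag
   characters: Primary-subtag *( "-" Subtag ), where the primary subtag is
   1*8 letters and each later subtag is 1*8 letters or digits. */
static int is_rfc3066_argument(const uint32_t *arg, size_t n)
{
    size_t len = 0;      /* length of the subtag being read */
    int primary = 1;     /* still inside the primary subtag? */
    for (size_t i = 0; i < n; i++) {
        uint32_t c = arg[i] - 0xE0000u;          /* undo TAG() */
        if (c == '-') {
            if (len == 0)
                return 0;                        /* empty subtag */
            len = 0;
            primary = 0;
        } else if (is_alpha(c) || (!primary && is_digit(c))) {
            if (++len > 8)
                return 0;                        /* subtag too long */
        } else {
            return 0;
        }
    }
    return len > 0;                              /* rejects "" and a trailing "-" */
}

/* Returns 1 if seq[0..n) matches the `tag` production. */
int is_well_formed_tag(const uint32_t *seq, size_t n)
{
    if (n == 1 && seq[0] == TAG_CANCEL)
        return 1;                                /* cancel-all-tag */
    if (n < 2 || seq[0] != TAG_LANGUAGE)
        return 0;
    if (n == 2 && seq[1] == TAG_CANCEL)
        return 1;                                /* language-tag with tag-cancel */
    for (size_t i = 1; i < n; i++)
        if (seq[i] < 0xE0020u || seq[i] > 0xE007Eu)
            return 0;                            /* not a tag-character */
    return is_rfc3066_argument(seq + 1, n - 1);
}
```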


<h2 class="bb"><a name="charts">VI Code Charts</a></h2>                                                                             

<p>The following code charts contain the characters added in Unicode 3.1. They                                                                              

are shown together with the characters that were part of Unicode 3.0. New                                                                              

characters are shown on a yellow background in these code charts.</p>                                                                             

<ul>                                                                             

  <li><a href="/charts/PDF/U31-0370.pdf">Greek and Coptic</a></li>                                                                             

  <li><a href="/charts/PDF/U31-10300.pdf">Old Italic</a></li>                                                                             

  <li><a href="/charts/PDF/U31-10330.pdf">Gothic</a></li>                                                                             

  <li><a href="/charts/PDF/U31-10400.pdf">Deseret</a></li>                                                                             

  <li><a href="/charts/PDF/U31-1D000.pdf">Byzantine Musical Symbols</a></li>                                                                             

  <li><a href="/charts/PDF/U31-1D100.pdf">Musical Symbols</a></li>                                                                             

  <li><a href="/charts/PDF/U31-1D400.pdf">Mathematical Alphanumeric Symbols</a></li>                                                                             

  <li><a href="/charts/PDF/U31-20000.pdf">CJK Unified Ideographs Extension B</a></li>                                                                             

  <li><a href="/charts/PDF/U31-2F800.pdf">CJK Compatibility Ideographs                                                                              

    Supplement</a></li>                                                                             

  <li><a href="/charts/PDF/U31-E0000.pdf">Tag Characters</a></li>                                                                             

</ul>                                                                             

<blockquote>                                                                          

  <table border="1" width="85%" height="15" cellpadding="3"                                                                          

  bordercolor="#000000" cellspacing="0">                                                                          

    <tr>                                                                          

      <td width="85%" height="15" bordercolor="#000000">                                                                          

        <p align="center"><b><i><u>Code Charts Notice:</u></i></b>                                                                             

        <p>At the time of publication, complete fonts for CJK Unified Ideographs
        Extension B were not available, so the charts are missing some
        glyphs. However, the characters in those positions in the charts are
        unambiguously defined in Unihan.txt in the Unicode Character Database.</p>

      </td>                                                                             

    </tr>                                                                             

  </table>                                                                             

</blockquote>                               

<p>Unicode 3.0 defined 34 noncharacters, 32 of which are in the supplementary
planes. Unicode 3.1 defines 32 additional noncharacters in the BMP. The following
list gives the ranges of noncharacters, with links to the corresponding charts:</p>

<ul>                          

<li><a href="/charts/PDF/U31-FB50.pdf">FDD0-FDEF</a></li>                         

<li><a href="/charts/PDF/UFFF0.pdf">FFFE-FFFF</a></li>                         

<li><a href="/charts/PDF/U31-1FF80.pdf">1FFFE-1FFFF</a></li>                         

<li><a href="/charts/PDF/U31-2FF80.pdf">2FFFE-2FFFF</a></li>                         

<li><a href="/charts/PDF/U31-3FF80.pdf">3FFFE-3FFFF</a></li>                         

<li><a href="/charts/PDF/U31-4FF80.pdf">4FFFE-4FFFF</a></li>                         

<li><a href="/charts/PDF/U31-5FF80.pdf">5FFFE-5FFFF</a></li>                        

<li><a href="/charts/PDF/U31-6FF80.pdf">6FFFE-6FFFF</a></li>                        

<li><a href="/charts/PDF/U31-7FF80.pdf">7FFFE-7FFFF</a></li>                        

<li><a href="/charts/PDF/U31-8FF80.pdf">8FFFE-8FFFF</a></li>                        

<li><a href="/charts/PDF/U31-9FF80.pdf">9FFFE-9FFFF</a></li>                        

<li><a href="/charts/PDF/U31-AFF80.pdf">AFFFE-AFFFF</a></li>                        

<li><a href="/charts/PDF/U31-BFF80.pdf">BFFFE-BFFFF</a></li>                        

<li><a href="/charts/PDF/U31-CFF80.pdf">CFFFE-CFFFF</a></li>                        

<li><a href="/charts/PDF/U31-DFF80.pdf">DFFFE-DFFFF</a></li>                        

<li><a href="/charts/PDF/U31-EFF80.pdf">EFFFE-EFFFF</a></li>                        

<li><a href="/charts/PDF/U31-FFF80.pdf">FFFFE-FFFFF</a></li>                      

<li><a href="/charts/PDF/10FF80.pdf">10FFFE-10FFFF</a></li>                        

</ul>                         
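<p>The ranges listed above reduce to a short test: U+FDD0..U+FDEF, plus any code point whose low 16 bits are FFFE or FFFF. The C sketch below (hypothetical names) encodes that test and counts the noncharacters over the whole code space, which should give 34 + 32 = 66.</p>

```c
#include <stdint.h>

/* Nonzero if c is a noncharacter as of Unicode 3.1: U+FDD0..U+FDEF, or the
   last two code points (U+xxFFFE, U+xxFFFF) of any of the 17 planes. */
int is_noncharacter(uint32_t c)
{
    if (c > 0x10FFFFu)
        return 0;                                /* outside the code space */
    return (c >= 0xFDD0u && c <= 0xFDEFu) || (c & 0xFFFEu) == 0xFFFEu;
}

/* Counts noncharacters in U+0000..U+10FFFF: 34 carried over from
   Unicode 3.0 plus the 32 added in the BMP gives 66. */
unsigned count_noncharacters(void)
{
    unsigned n = 0;
    for (uint32_t c = 0; c <= 0x10FFFFu; c++)
        n += (unsigned)is_noncharacter(c);
    return n;
}
```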


                                                                    

<h2 class="bb"><a name="errata">VII Errata</a></h2>                                                                             

<p>This section contains errata rolled up since the publication of <i>The
Unicode Standard, Version 3.0</i>. These errata are listed in the table below,
organized by date and category.</p>

<p>An online glossary was created that contained the contents of the glossary
found in <i>The Unicode Standard, Version 3.0</i>. Since that time, this
glossary has been updated. Global changes have been made to its wording to
clarify the distinction between code point and code unit. The following
definitions have been added: <i>Annotation</i>, <i>BMP Code Point</i>, <i>BMP
Character</i>, <i>Code Position</i>, <i>Code Unit</i>, <i>In-band</i>, <i>Noncharacter</i>,
<i>Out-of-band</i>, <i>Plane</i>, <i>Row</i>, <i>Supplementary Code Point</i>, <i>Supplementary
Character</i>, <i>Supplementary Planes</i>, <i>Surrogate Code Point</i>, <i>Surrogate
Character</i>, <i>Tagging</i>, <i>Unicode Sequence Identifier</i>.</p>

<table border="1">                                                                             

  <tr>                                                                             

    <th width="20%">Date&nbsp;</th>                                                                             

    <th width="85%">Summary&nbsp;</th>                                                                             

  </tr>                                                                             

  <tr>                                                                             

    <td width="20%" valign="top">2001 March 13</td>                                                                            

    <td width="80%">                                                                            

      Normalization                                             

      Corrigendum posted.                                       

      <br>NOTE: This corrigendum is incorporated in, and superseded by, this                                            

      document.                                       

    </td>                                                                            

  </tr>                                                                            

  <tr>                                                                            

    <td width="20%" valign="top">2001 January 17</td>                                                                             

    <td width="80%">                                                                             

      <p><i>Runic Alphabet, p. 174, the 9th symbol in the old futhark (10 lines                                                                              

      up from bottom of page) </i>is incorrect and should be U+16BA RUNIC LETTER                                                                              

      HAGLAZ H</p>                                                                             

      <p><i>p. 194, correction of subscript in L2</i>, text should read                                                                              

      (ALEF.LAM)<sub>r</sub> rather than (ALEF.LAM)<sub>l</sub></p>                                                                             

      <p><i>p. 201, bulleted item 1</i>, &quot;galath&quot; should read                                                                              

      &quot;dalath&quot;</p>                                                                             

      <p><i>p. 280, last sentence under &quot;Yi Radicals&quot;</i>, delete                                                                              

      &quot;, with a &quot;b&quot; added as a suffix&quot;</p>                                                                             

      <p><i>p. 324, second to last line</i>, &quot;FF<sub>16</sub>&quot; should                                                                              

      read &quot;BB<sub>16</sub>&quot;</p>                                                                             

      <p><i>p. 402</i>, the header &quot;Dependent vowel signs&quot; should                                                                              

      appear ahead of 093E DEVANAGARI VOWEL SIGN AA, instead of its current                                                                              

      location ahead of 093F DEVANAGARI VOWEL SIGN I.</p>                                                                             

    </td>                                                                             

  </tr>                                                                             

  <tr>                                                                             

    <td width="20%" valign="top">2000 November 29</td>                                                                             

    <td width="80%">UTF-8                                                                              

      Corrigendum<br>                                                                             

      Modifies the definition of UTF-8 to forbid conformant implementations from                                                                              

      interpreting non-shortest forms for BMP characters, and clarifies some of                                                                              

      the conformance clauses.                                            

      <br>NOTE: This corrigendum is incorporated in, and superseded by, this                                            

      document.                                           

    </td>                                                                            

  </tr>                                                                            

  <tr>                                                                            

    <td width="20%" valign="top">2000 September 5</td>                                                                            

    <td width="80%"><i>R.4, Selected References, p. 1008</i><br>                                                                            

      The misleadingly worded cross-reference at &quot;<i>W3C                                                                             

      Recommendation&quot;</i> should be deleted.</td>                                                                            

  </tr>                                                                            

  <tr>                                                                            

    <td width="20%" rowspan="3" valign="top">2000 August 31</td>                                                                            

    <td width="80%">                                                                            

      <p align="left"><b>Textual Errata</b></p>                                                                            

      <p><i>Bulleted list, Codespace Assignment for Graphic Characters, p. 23</i><br>                                                                            

      In the fourth bullet under <i>Codespace Assignment for Graphic Characters</i>,                                                                             

      &quot;128-byte boundaries or 1,024 byte-boundaries&quot; -- change                                                                             

      &quot;byte&quot; to &quot;code position&quot;.</p>                                                                            

      <p><i>Second bullet, second paragraph, p. 31</i><br>                                                                            

      Change U+00D4 to U+00F4 and U+004F to U+006F to match the characters used                                                                             

      in the example.</p>                                                                            

      <p><i>Line boundary control, p. 48</i><br>                                                                            

      Add &quot;2001 EM QUAD&quot; to the list under &quot;Line Boundary                                                                             

      Control&quot;.</p>                                                                            

      <p><i>Indic dead-character formation, p. 49</i><br>                                                                            

      Add &quot;0E3A THAI CHARACTER PHINTHU&quot; to the list under &quot;Indic                                                                             

      dead-character formation&quot;.</p>                                                                            

      <p><i>Step 1 of Hangul Syllable Composition, p. 54</i><br>                                                                            

      Replace the text in <i>Step 1</i> with the following: &quot;Iterate                                                                             

      through the sequence of characters in D, performing the following                                                                             

      steps:&quot;</p>                                                                            

      <p><i>Normalization, Alternative Spellings, p. 112</i><br>                                                                            

      Delete the last sentence of the bulleted item: &quot;<i>Similarly, if a                                                                             

      new combining mark is added to this standard, it may allow decompositions                                                                             

      for precomposed characters that did not have decompositions before.</i>&quot;</p>                                                                            

      <p><i>Figure 13-2, Controlling Ligatures, p. 318</i><br>                                                                            

      Move the last phrase (<i>where hyphens indicate cursive joining</i>) of                                                                             

      the sentence &quot;Usage of optional ligatures such as <i>fi</i> is not                                                                             

      currently controlled by any codes within the Unicode Standard but is                                                                             

      determined by protocols or resources external to the text sequence <i>where                                                                             

      hyphens indicate cursive joining</i>.&quot; to the end of the sentence                                                                             

      before Figure 13-2 &quot;For example, a cursive Latin font would produce                                                                             

      the results shown in Figure 13-2 <i>where hyphens indicate cursive joining</i>.&quot;<br>                                                                            

      (Note that this erratum is superseded by <a
      href="../../standard/versions/Unicode3.0.1.html">Unicode 3.0.1</a>.)</p>

    </td>                                                                            

  </tr>                                                                            

  <tr>                                                                            

    <td width="80%"><b>Figure and Table Errata</b>                                                                            

      <p><i>Figure 1-1, p. 2</i><br>                                                                            

      The third Arabic character in the Unicode Text column should show a glyph                                                                             

      for alef, and the correct code point is 0000 0110 0010 0111 (U+0627),                                                                             

      instead of 0000 0110 0011 0111 (U+0637).</p>                                                                            

      <p><i>Figure 2-1, p. 10<br>
      </i>The Devanagari example is not well-formed. See the <a
      href="../../uni2errata/figure_2_1.html">corrected figure</a>.</p>

      <p><i>Figure 2-3, p. 14</i><br>                                                                            

      The correct code point for the sixth character, DEVANAGARI VOWEL SIGN I,                                                                             

      is 0000 1001 0011 1111 (U+093F), not 0000 1001 0011 0100 (U+0934).</p>                                                                            

      <p><i>Figure 2-6, p. 19<br>                                                                            

      </i>In the fourth line of encoding examples, the values &quot;61&quot;                                                                             

      should all be replaced by &quot;41&quot;, since the examples show an                                                                             

      uppercase &quot;A&quot;, not a lowercase &quot;a&quot;.</p>                                                                            

      <p><i>Table 4-7, p. 97</i><br>                                                                            

      Add the following entry after the line for U+5104: U+4EBF 100,000,000                                                                             

      (10,000 x 10,000)</p>                                                                            

      <p><i>Table 5-5, p. 129</i><br>                                                                            

      Remove the duplicated entries for NumericPrefix and NumericPostfix.</p>                                                                            

      <p><i>Table 5-5, p. 130</i><br>                                                                            

      In the fourth line, change the text &quot;All Unicode characters&quot; to                                                                             

      &quot;All other Unicode characters&quot;.</p>                                                                            

      <p><i>Figure 5-6, p. 119<br>                                                                            

      </i>The clipping example is not clipped. For the correct version, see <i>The                                                                             

      Unicode Standard, Version 2.0</i>, page 5-13.</p>                                                                            

      <p><i>Figure 8-2, p. 190<br>                                                                            

      </i>The left-most (or final) <i>heh</i> in the &quot;Joining&quot; line                                                                             

      should be in final form.</p>                                                                            

      <p><i>Reference to Figure 9-4, p. 250<br>                                                                            

      </i>Under &quot;<i>Explicit Virama</i>&quot;, last line of paragraph                                                                             

      should refer to Figure 9-4, not Figure 9-7.</p>                                                                            

      <p><i>Table 13-1, p. 318<br>                                                                            

      </i>Interchange the abbreviations &quot;RLO&quot; and &quot;LRO&quot; in                                                                             

      the last two lines of this table.</p>                                                                            

      <p><i>Table D-3, p. 976<br>                                                                            

      </i>Change the sixth row, first column, from &quot;048E..048F&quot; to                                                                             

      &quot;048C..048F&quot;. Change the sixth row, second column, from                                                                             

      &quot;4&quot; to &quot;6&quot;. Add the character names CYRILLIC CAPITAL                                                                             

      LETTER SEMISOFT SIGN and CYRILLIC SMALL LETTER SEMISOFT SIGN to the sixth                                                                             

      row, third column.</p>                                                                            

      <p>Change the tenth row, first column, from &quot;0780..07B1&quot; to                                                                             

      &quot;0780..07B0&quot;. Change the tenth row, second column, from                                                                             

      &quot;50&quot; to &quot;49&quot;.</p>                                                                            

      <p>Change the fourteenth row, second column, from &quot;346&quot; to                                                                             

      &quot;345&quot;.</p>                                                                            

      <p><i>Table D-3, p. 977<br>                                                                            

      </i>In the first row, third column, change &quot;TIRONIAN SIGH ET&quot; to                                                                             

      &quot;TIRONIAN SIGN ET&quot;.</p>                                                                            

    </td>                                                                            

  </tr>                                                                            

  <tr>                                                                            

    <td width="80%"><b>Glyph Errata</b>                                                                            

      <p><i>Ethiopic</i><br>                                                                            

      125C: one glyph is duplicated (it should look like 124C plus a bow), and
      several other poor-quality glyphs will be improved by use of the corrected font<br>

      (see <a href="http://www.unicode.org/charts/PDF/U1200.pdf">http://www.unicode.org/charts/PDF/U1200.pdf</a>)</p>                                                                            

      <p><i>Set minus</i><br>                                                                            

      2216, the glyph should be rotated left so that it makes an angle of
      approximately 40 degrees to the horizontal<br>

      (see <a href="http://www.unicode.org/charts/PDF/U2200.pdf">http://www.unicode.org/charts/PDF/U2200.pdf</a>)</p>                                                                            

      <p><i>Khmer Rial</i><br>                                                                            

      17DB, remove vertical tick underneath the currency symbol<br>                                                                            

      (see <a href="http://www.unicode.org/charts/PDF/U1780.pdf">http://www.unicode.org/charts/PDF/U1780.pdf</a>)</p>                                                                            

      <p><i>Start of Heading<br>
      </i>0001, correct to SOH<br>

      (see <a href="http://www.unicode.org/charts/PDF/U0000.pdf">http://www.unicode.org/charts/PDF/U0000.pdf</a>)</p>                                                                            

      <p><i>Start of Text</i><br>                                                                            

      0002, correct to STX<br>                                                                            

      (see <a href="http://www.unicode.org/charts/PDF/U0000.pdf">http://www.unicode.org/charts/PDF/U0000.pdf</a>)</p>                                                                            

      <p><i>Arabic Separators<br>                                                                            

      </i>066B and 066C, the glyphs for these two characters revert to their                                                                             

      Unicode 2.0 shapes<br>                                                                            

      (see <a href="http://www.unicode.org/charts/PDF/U0600.pdf">http://www.unicode.org/charts/PDF/U0600.pdf</a>)</p>                                                                            

      <p><i>All Equal To<br>                                                                            

      </i>224C, change the lazy S glyph to a reversed tilde</p>

      <p><i>Small squares<br>
      </i>25AA and 25AB, adjust size and position<br>

      (see <a href="http://www.unicode.org/charts/PDF/U25A0.pdf">http://www.unicode.org/charts/PDF/U25A0.pdf</a>)</p>                                                                            

      <p><i>C1 control character &quot;index&quot;</i><br>                                                                            

      0084, remove glyph and C1 control alias, INDEX, and replace with the                                                                             

      notation and glyph for a control code not specified in ISO 6429<br>                                                                            

      (see <a href="http://www.unicode.org/charts/PDF/U0080.pdf">http://www.unicode.org/charts/PDF/U0080.pdf</a>)</p>                                                                            

    </td>                                                                            

  </tr>                                                                            

  <tr>                                                                            

    <td width="20%" valign="top">2000 August 30</td>                                                                            

    <td width="80%"><a href="../../standard/versions/Unicode3.0.1.html">Unicode                                                                             

      3.0.1</a> (update version)                                           

      <br>NOTE: This update is incorporated in, and superseded by, this                                            

      document.                                           

    </td>                                                                            

  </tr>                                                                            

  <tr>                                                                            

    <td width="20%" valign="top">2000 May 2</td>                                                                            

    <td width="80%">Correction of typographical errors in the Glossary. The                                                                             

      definition of BNF on p. 984 should read &quot;context-free,&quot; not                                                                             

      &quot;content-free.&quot; The definition of SGML on p. 994 should read                                                                             

      &quot;Standard Generalized Markup Language.&quot;</td>                                                                            

  </tr>                                                                            

  <tr>                                                                            

    <td width="20%" valign="top">2000 April 6</td>                                                                            

    <td width="80%">Fixed font errors for U+17BE..U+17C5 in the Khmer block on
      pages 473-474. The correction can be downloaded as a <a
      href="http://www.unicode.org/unicode/uni2errata/Khmer.pdf">PDF file</a>.</td>

  </tr>                                                                            

  <tr>                                                                            

    <td width="20%" valign="top">2000 March 15</td>                                                                            

    <td width="80%">Corrected version of page 851, Han Radical-Stroke index.                                                                             

      Download as a <a href="../../uni2errata/851/Correction.tiff">TIFF</a> or <a                                                                            

      href="../../uni2errata/851/Correction.pdf">PDF</a> file.</td>                                                                            

  </tr>                                                                            

</table>                                                                            

<h2 class="bb"><a name="database">VIII Unicode Character Database Changes</a></h2>                                                                            

<p>The main change to the <a href="http://www.unicode.org/Public/3.1-Update/"> Unicode Character Database for Unicode 3.1</a> is the
extension of the data files to cover the character repertoire additions. The
files most affected are UnicodeData.txt, LineBreaks.txt, and
EastAsianWidth.txt, each of which has been extended to cover all the newly
encoded characters. An updated informative NamesList.txt file is also provided
to cover the new repertoire.</p>

<p>As of the Unicode 3.0.1 update, UnicodeData.txt already had entries for the
user-defined characters beyond U+FFFF. Note, however, that UnicodeData.txt
(along with LineBreaks.txt and EastAsianWidth.txt) now has a large number of new
entries for encoded characters that use the five-hex-digit notation for
Unicode scalar values, such as 1D16E, 2F880, and E0061. Parsers of the
Unicode Character Database files will need to be adjusted accordingly.</p>
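<p>As an illustration of that adjustment, the following minimal Python sketch
assumes a parser that previously expected exactly four hex digits in the first
field. It accepts 4 to 6 hex digits and rejects values above U+10FFFF. The
function names <code>parse_code_point</code> and <code>parse_unicode_data_line</code>
are hypothetical and are not part of any Unicode-supplied tool.</p>

```python
import re

# Code point fields in the UCD files are 4 to 6 hex digits
# (e.g. "0041", "1D16E", "10FFFD"). A parser that assumes exactly
# four digits, or a 16-bit result, breaks on the new entries.
_CODE_POINT = re.compile(r"[0-9A-Fa-f]{4,6}")


def parse_code_point(field: str) -> int:
    """Convert a UCD code point field such as '1D16E' to an integer."""
    text = field.strip()
    if not _CODE_POINT.fullmatch(text):
        raise ValueError(f"not a UCD code point field: {field!r}")
    value = int(text, 16)
    if value > 0x10FFFF:
        raise ValueError(f"code point out of range: {field!r}")
    return value


def parse_unicode_data_line(line: str) -> tuple[int, str]:
    """Return (code point, character name) from one UnicodeData.txt line."""
    fields = line.rstrip("\n").split(";")
    return parse_code_point(fields[0]), fields[1]
```

<p>For example, <code>parse_unicode_data_line("E0061;TAG LATIN SMALL LETTER A;Cf")</code>
(a line shortened to its first three fields) returns
<code>(0xE0061, 'TAG LATIN SMALL LETTER A')</code>.</p>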

<p>The format of UnicodeData.txt has not changed. However, the formats of
LineBreaks.txt and EastAsianWidth.txt have been adjusted slightly: the name of
each Unicode character is now appended in a comment field instead of a data
field, to make clear that UnicodeData.txt is the only normative source of
Unicode character names.</p>

<p>Blocks.txt has been extended to cover the new blocks from Planes 1, 2, and                                                                             

14.</p>                                                                            

<p>The notes to SpecialCasing.txt have been updated, and a special casing rule                                                                             

has been added for i/I in Azeri.</p>                                                                            

<p>The notes to CaseFolding.txt have been greatly extended, and the                                                                             

classification used for the folding has been modified. New symbols for the                                                                             

folding partition are in use, so check this file carefully before feeding it to                                                                             

an automated process. There are also repertoire additions to cover Deseret case                                                                             

folding.</p>                                                                            

<p>The supplementary property list file, PropList.txt, has been changed rather
extensively. The format has been modified to make it easier to parse, and
property specifications that were redundant with UnicodeData.txt have been
removed. The UTC has now reviewed the contents of PropList.txt and has formally
incorporated it into the set of data files in the Unicode Character Database.
PropList.txt contains listings of normative and informative properties; for
details, see PropList.html.
Further changes and updates to PropList.txt will be subject to formal UTC review
and control.</p>

<p>A number of derived data files have been added. These contain                               

information that can be completely derived from other data files, but is                               

presented in a different format for ease of use. For more information, see                               

DerivedProperties.html.</p>                                                                            

<h3>Data File Format</h3>                                                                            

<p>The first field of each line in the Unicode Character Database files                                                                             

represents a code point. The remaining fields are properties associated with                                                                             

that code point. The format for these files has been extended in Unicode 3.1 to                                                                             

allow the specification of a range of code points. Each code point in the range                                                                             

has the associated properties. Such ranges are specified with &quot;..&quot;.                                                                             

For example:</p>                                                                            

<pre>0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement

1680      ; White_space # Zs OGHAM SPACE MARK
2000..200A; White_space # Zs [11] EN QUAD..HAIR SPACE</pre>

<p>The Blocks.txt file has been changed to use this format.</p>                                                                            
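<p>The range syntax can be handled with a small amount of code. The Python
sketch below is one possible approach, assuming lines shaped like the examples
above. It drops the trailing comment (where LineBreaks.txt and
EastAsianWidth.txt now carry character names), treats a single code point as a
one-element range, and finds the block for a code point by binary search.
<code>Entry</code>, <code>parse_line</code>, and <code>BlockTable</code> are
hypothetical names. For files with more than two fields, the sketch simply keeps
everything after the first semicolon as one string.</p>

```python
import bisect
from typing import Iterable, List, NamedTuple, Optional


class Entry(NamedTuple):
    first: int   # first code point of the range
    last: int    # last code point (same as first for a single code point)
    value: str   # the property value, e.g. a block name or "White_space"


def parse_line(line: str) -> Optional[Entry]:
    """Parse one 'XXXX..YYYY; value # comment' line; None if it has no data."""
    # Everything after '#' is comment (character names now live there).
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    cp_field, value = (part.strip() for part in data.split(";", 1))
    first_hex, _, last_hex = cp_field.partition("..")
    first = int(first_hex, 16)
    last = int(last_hex, 16) if last_hex else first
    return Entry(first, last, value)


class BlockTable:
    """Find the block containing a code point by binary search over ranges."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._entries: List[Entry] = sorted(
            e for e in map(parse_line, lines) if e is not None
        )
        self._starts = [e.first for e in self._entries]

    def lookup(self, cp: int) -> Optional[str]:
        # Rightmost range starting at or before cp, if cp falls inside it.
        i = bisect.bisect_right(self._starts, cp) - 1
        if i >= 0 and cp <= self._entries[i].last:
            return self._entries[i].value
        return None  # code point not in any listed block
```

<p>For example, a <code>BlockTable</code> built from the two Blocks.txt lines
shown above returns <code>'Latin-1 Supplement'</code> for <code>0xE9</code> and
<code>None</code> for <code>0x100</code>, which neither line covers.</p>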

<p>For more details on the data file format, see UnicodeCharacterDatabase.html.</p>                                                                           

<h3>New Normative Properties</h3>                                                                           

<p>As detailed in <a href="#conformance">Article III, Conformance</a>, all of                                                                            

the General Category values plus the case mappings in UnicodeData.txt and                                                                            

SpecialCasing.txt are now normative.</p>                                                                           

<p>In the General Category, Cn is now specified to be the default value. It                                                                            

applies to all unassigned code points, as well as to all noncharacters.</p>                                                                           
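<p>Because Cn is the default, an implementation need not list unassigned code
points or noncharacters explicitly: any code point missing from UnicodeData.txt
can be given Cn. The following is a minimal Python sketch, assuming
UnicodeData.txt lines are already in hand. The helper names are hypothetical,
and for brevity the sketch does not expand the <code>&lt;..., First&gt;</code> /
<code>&lt;..., Last&gt;</code> range pairs that UnicodeData.txt uses for large
blocks such as CJK ideographs.</p>

```python
from typing import Dict, Iterable


def load_categories(lines: Iterable[str]) -> Dict[int, str]:
    """Map code point -> General Category (third field) from UnicodeData.txt.

    Simplification: <..., First>/<..., Last> range pairs are not expanded.
    """
    table: Dict[int, str] = {}
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) > 2:
            table[int(fields[0], 16)] = fields[2]
    return table


def general_category(table: Dict[int, str], cp: int) -> str:
    """Look up the General Category, falling back to the default value Cn."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a code point: {cp:#x}")
    # Anything not listed (unassigned code points and noncharacters such
    # as U+FFFF) gets the default value Cn.
    return table.get(cp, "Cn")
```

<p>With a table loaded from the single line
<code>0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;</code>,
<code>general_category(table, 0x41)</code> returns <code>'Lu'</code>, while the
noncharacter U+FFFF falls through to <code>'Cn'</code>.</p>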

<h2 class="bb"><a name="relation">IX Relation to ISO/IEC 10646</a></h2>                                                                           

<p>ISO/IEC 10646 is a multi-part standard. Part 1, published as ISO/IEC                                                                            

10646-1:2000(E), covers the architecture and Basic Multilingual Plane. Part 2,                                                                            

which is in its final ballot, covers the supplementary planes. Unicode 3.1 adds                                                                            

all the supplementary characters that will be part of ISO/IEC 10646-2. Unicode                                                                            

3.1 introduces the terms plane, BMP, and supplementary plane to help align                                                                            

terminology with ISO/IEC 10646.</p>                                                                           

<p>The Unicode Standard is not split into parts corresponding to those of
ISO/IEC 10646, and the parts of 10646 have independent publication schedules.
Because some related characters are processed for separate parts of 10646 but
must be treated consistently in the Unicode Standard, it is occasionally
necessary to deviate from strict synchronization with a given release of 10646.</p>

<p>The Unicode Consortium and ISO/IEC JTC1/SC2/WG2 are committed to maintaining                                                                            

the synchronization between the two standards. Unicode 3.1 adds two BMP                                                                            

characters that are part of the first amendment to ISO/IEC 10646-1:2000, which                                                                            

is in final stages of development. See <a href="#description">Article I,                                                                            

Description</a>, for more information about these two characters and the reason                                                                            

for their inclusion into Unicode 3.1.</p>                                                                           

<p>The upcoming amendment of 10646-1 will also restrict the repertoire of 10646                                                                            

so that it will be formally compatible with UTF-16.</p>                                                                           
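<p>The reason for such a restriction can be seen from the UTF-16 surrogate
arithmetic: a surrogate pair carries a 20-bit offset that is added to 0x10000,
so the highest pair, DBFF DFFF, decodes to U+10FFFF. A UTF-16-compatible
repertoire therefore cannot extend past U+10FFFF, that is, beyond 17 planes. The
Python sketch below shows the encoding and decoding arithmetic; the function
and constant names are illustrative only.</p>

```python
# Highest code point reachable through a UTF-16 surrogate pair.
MAX_UTF16_CODE_POINT = 0x10FFFF
PLANE_COUNT = (MAX_UTF16_CODE_POINT + 1) // 0x10000  # 17: the BMP plus 16 planes


def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= MAX_UTF16_CODE_POINT:
        raise ValueError(f"not a supplementary code point: {cp:#x}")
    offset = cp - 0x10000  # fits in 20 bits
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("invalid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

<p>For example, <code>from_surrogates(0xDBFF, 0xDFFF)</code> returns
<code>0x10FFFF</code>, and <code>to_surrogates(0x1D16E)</code> returns
<code>(0xD834, 0xDD6E)</code>.</p>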

<h2 class="bb"><a name="references">X References</a> and Sources</h2>                                                                           

<h3>Standards and Specifications</h3>                                                                           

<p>ISO 639: International Organization for Standardization. <i>Code for the
representation of names of languages</i>. [Geneva], 1988. (ISO 639:1988).</p>

<p>ISO 3166: International Organization for Standardization. <i>Codes for the                                                                            

representation of names of countries and their subdivisions</i>. [Geneva]. Part                                                                            

1: Country Codes (ISO 3166-1:1997). Part 2: Country subdivision code (ISO                                                                            

3166-2:1998). Part 3: Code for formerly used names of countries (ISO                                                                            

3166-3:1999).</p>                                                                           

<p>ISO/IEC 10646: International Organization for Standardization. <i>Information
technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1:
Architecture and Basic Multilingual Plane</i>. [Geneva], September 2000.
(ISO/IEC 10646-1:2000).</p>

<p>ISO/IEC FDIS 10646-2: International Organization for Standardization. <i>Information                                                                            

technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 2:                                                                            

Supplementary Planes</i>. [Geneva], January 2001. (ISO/IEC 10646-2:2000 Final                                                                            

Draft International Standard).</p>                                                                           

<p>[<a name="mathml">MathML</a>] <i>Mathematical Markup Language (MathML&trade;) 1.01 Specification</i>.                                                                              

(W3C Recommendation, revision of 7 July 1999.) Editors: Patrick Ion and Robert                                                                              

Miner.<br>                                                                             

<a href="http://www.w3.org/TR/REC-MathML/">http://www.w3.org/TR/REC-MathML/</a></p>                                                                             

<p>RFC 3066: <i>Tags for the Identification of Languages</i>, by Harald                                                                              

Alvestrand. January 2001.</p>                                                                             

<p>RFC 2045: <i>Multipurpose Internet Mail Extensions (MIME). Part One: Format                                                                              

of Internet Message Bodies</i>, by N. Freed and N. Borenstein. November 1996.</p>                                                                             

<p>RFC 2046: <i>Multipurpose Internet Mail Extensions (MIME). Part Two: Media
Types</i>, by N. Freed and N. Borenstein. November 1996.</p>

<p>RFC 2047: <i>MIME (Multipurpose Internet Mail Extensions). Part Three:                                                                              

Message Header Extensions for Non-ASCII Text</i>, by K. Moore. November 1996.</p>                                                                             

<p>RFC 2048: <i>Multipurpose Internet Mail Extensions (MIME). Part Four:                                                                              

Registration Procedures</i>, by N. Freed, J. Klensin, and J. Postel. November                                                                              

1996.</p>                                                                             

<p>RFC 2049: <i>Multipurpose Internet Mail Extensions (MIME). Part Five:                                                                              

Conformance Criteria and Examples</i>, by N. Freed and N. Borenstein. November                                                                              

1996.</p>                                                                             

<h3>Other References and Sources</h3>                                                                             

<p>Bonfante, Larissa. &quot;The Scripts of Italy.&quot; In <i>The World's
Writing Systems</i>, edited by Peter T. Daniels and William Bright. New York,
Oxford University Press, 1996. ISBN 0-19-507993-0.</p>

<p>Catholic Church. <i>Graduale Sacrosanctae Romanae Ecclesiae de Tempore et de                                                                              

Sanctis SS. D. N. Pii X. Pontificis Maximi.</i> Parisiis, Desclée, 1961.                                                                              

(Graduale romanum, no. 696.)</p>                                                                             

<p>Cristofani, Mauro. &quot;L'alfabeto etrusco.&quot; In <i>Lingue e dialetti
dell'Italia antica</i>, a cura di Aldo L. Prosdocimi, pp. 401-428. Roma, Biblioteca
di storia patria, a cura dell'Ente per la diffusione e l'educazione storica,
1978. (Popoli e civiltà dell'Italia antica, VI.)</p>

<p>&quot;Deseret Alphabet.&quot; In <i>Encyclopedia of Mormonism</i>, edited by                                                                              

Daniel H. Ludlow. New York, Macmillan, 1992. ISBN 0-02-904040-X.</p>                                                                             

<p>Ebbinghaus, Ernst. &quot;The Gothic alphabet.&quot; In <i>The World’s                                                                              

Writing Systems</i>, edited by Peter T. Daniels and William Bright. New York,                                                                              

Oxford University Press, 1996. ISBN 0-19-507993-0.</p>                                                                             

<p>Faulmann, Carl. <i>Das Buch der Schrift: enthaltend die Schriftzeichen und                                                                              

Alphabete aller Zeiten und aller Völker des Erdkreises</i>. Reprint of 1880 ed.                                                                              

Frankfurt am Main, Eichborn, 1990. ISBN 3-8218-1720-8.</p>                                                                             

<p>Gordon, Arthur E. <i>Illustrated Introduction to Latin Epigraphy</i>.                                                                              

Berkeley, University of California Press, 1983. ISBN 0-520-03898-3.</p>                                                                             

<p>Haarmann, Harald. <i>Universalgeschichte der Schrift</i>. Frankfurt/Main, New                                                                              

York, Campus, 1990. ISBN 3-593-34346-0.</p>                                                                             

<p>Hellenic Organization for Standardization (ELOT). <i>The Greek Byzantine                                                                              

Musical Notation System</i>. Athens, 1997. (ELOT 1373.)</p>                                                                             

<p>Heussenstamm, George. <i>Norton Manual of Music Notation</i>. New York, W.W.                                                                              

Norton, 1987. ISBN 0-393-95526-5 (pbk.)</p>                                                                             

<p>Kennedy, Michael. <i>Oxford Dictionary of Music</i>. Oxford, New York, Oxford                                                                              

University Press, 1985. ISBN 0-19-311333-3.</p>                                                                             

<blockquote>                                                                             

  Second ed. published 1994. ISBN 0-19-869162-9.                                                                             

</blockquote>                                                                             

<p>Marinetti, Anna. <i>Le iscrizioni sudpicene</i>. I. Testi. Firenze, Olschki,

1985. ISBN 88-222-3331-X (v. 1).</p>                                                                             

<p>MIME. See RFCs 2045-2049.</p>                                                                             

<p>Monson, Samuel C. <i>Representative American Phonetic Alphabets</i>. New                                                                              

York, 1954. Ph.D. dissertation -- Columbia University.</p>                                                                             

<p>&quot;Music.&quot; In <i>New Encyclopedia Britannica</i>. 15th ed. Chicago:                                                                              

Encyclopedia Britannica, 199-.</p>                                                                             

<p><i>The New Harvard Dictionary of Music</i>, edited by Don Michael Randel.                                                                              

Cambridge, Massachusetts, Belknap Press of Harvard University Press, 1986. ISBN                                                                              

0-674-61525-5.</p>                                                                             

<p>Ottman, Robert W. <i>Elementary Harmony: Theory and Practice</i>. 2nd ed.                                                                              

Englewood Cliffs, Prentice-Hall, 1970. ISBN 0-13-257451-9.</p>                                                                             

<blockquote>                                                                             

  Fifth ed. published 1998. ISBN 0-13-281610-5.                                                                             

</blockquote>                                                                             

<p>Parlangèli, Oronzo. <i>Studi Messapici</i>. Milano, Istituto Lombardo di                                                                              

Scienze e Lettere, 1960.</p>                                                                             

<p>Rastall, Richard. <i>The Notation of Western Music: An Introduction</i>.                                                                              

London: Dent, 1983. ISBN 0-460-04205-X.</p>                                                                             

<blockquote>                                                                             

  Also published: New York, St. Martin's Press, 1982. ISBN 0-312-57963-2.                                                                             

</blockquote>                                                                             

<p>Read, Gardner. <i>Music Notation: A Manual of Modern Practice</i>. Boston:                                                                              

Allyn and Bacon, 1964.</p>                                                                             

<blockquote>                                                                             

  Second ed. published London, Gollancz, 1974. ISBN 0-575-01758-9.                                                                             

</blockquote>                                                                             

<p>Sampson, Geoffrey. <i>Writing Systems: a Linguistic Introduction</i>.                                                                              

Stanford, California, Stanford University Press, 1985. ISBN 0-8047-1254-9.</p>                                                                             

<p>Stone, Kurt. <i>Music Notation in the Twentieth Century: A Practical                                                                              

Guidebook</i>. New York: W.W. Norton, 1980. ISBN 0-393-95053-0.</p>                                                                             

<p><i>Understanding Music with AI: Perspectives on Music Cognition</i>, edited                                                                              

by Mira Balaban, Kemal Ebcioglu, and Otto Laske. Cambridge, Massachusetts, MIT                                                                              

Press; Menlo Park, California, AAAI Press, 1992. ISBN 0-262-52170-9.</p>                                                                             

<p>Some of the figures in this document were provided by Michael Everson and                                                                              

Asmus Freytag.</p>                                                                            

<h2>XI <a name="Modifications">Modifications</a></h2>            

<p>The following summarizes modifications from the previous version of this              

document. Modifications to this document are strictly limited to repairing    

straightforward typographical and production errors.</p>

<table cellspacing="4" cellpadding="0" width="100%" border="0">          

  <tbody>          

    <tr>          

      <td valign="top" width="1"><a name="tracking_number4">4</a></td>          

      <td valign="top">          

        <ul>          

          <li>Added Jamo-3.txt to the list in Article I, Description under     

            &quot;Formal Definition of Unicode 3.1.&quot; The file itself is 

            unchanged from The Unicode Standard, Version 3.0.</li>         

          <li>Revised the figure showing the Georgian code chart.</li>
          <li>Corrected a typo (U+0031 instead of U+0030) in the code points
            for the set of basic Latin digits in the first bullet under &quot;Basic Set
            of Alphanumeric Characters&quot; in 12.2 Mathematical Alphanumeric
            Symbols in Article V, Block Descriptions.</li>

        </ul>          

      </td>          

    </tr>          

  </tbody>          

</table>                                                                                 

<hr align="LEFT">                                                                        

<p><font size="-1">Copyright © 2001 Unicode, Inc. All Rights Reserved. The                                                                              

Unicode Consortium makes no expressed or implied warranty of any kind, and                                                                              

assumes no liability for errors or omissions. No liability is assumed for                                                                              

incidental and consequential damages in connection with or arising out of the                                                                              

use of the information or programs contained or accompanying this technical                                                                              

report.</font></p>                                                                             

<p><font size="-1">Unicode and the Unicode logo are trademarks of Unicode, Inc.,                                                                              

and are registered in some jurisdictions.</font></p>                                                                             

                                                                             

</body>                                                                             

                                                                             

</html>                                                                             
