<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
       "http://www.w3.org/TR/REC-html40/loose.dtd"> 
<html>
<head><base href="https://www.unicode.org/reports/tr8/tr8-3.html">
     
<link rel="stylesheet" href="http://www.unicode.org/unicode.css" type="text/css">
<title>UTR #8: The Unicode Standard, Version 2.1</title>
</head>
<body>
<center>
<h2>
<a href="http://www.unicode.org/"><img SRC="http://www.unicode.org/img/unilogo-72.gif" BORDER=0 height=36 width=36 align=TEXTTOP></a>
Unicode Technical Report #8</h2></center>
<center>
<h1>The Unicode Standard, Version 2.1</h1></center>
<table BORDER CELLSPACING=2 CELLPADDING=0 WIDTH="100%" >
<tr>
<td WIDTH="120">Revision</td>
<td>3.0</td>
</tr>

<tr>
<td WIDTH="120">Authors</td>
<td>Lisa Moore (<a href="mailto:lisam@us.ibm.com">lisam@us.ibm.com</a>)</td>
</tr>

<tr>
<td WIDTH="120">Date</td>
<td>1999-11-21</td>
</tr>

<tr>
<td WIDTH="120">This Version</td>
<td><a href="http://www.unicode.org/unicode/reports/tr8/tr8-3.html">http://www.unicode.org/unicode/reports/tr8/tr8-3</a></td>
</tr>

<tr>
<td WIDTH="120">Previous Version</td>

<td><a href="http://www.unicode.org/unicode/reports/tr8/tr8-2.html">http://www.unicode.org/unicode/reports/tr8/tr8-2</a></td>
</tr>

<tr>
<td WIDTH="120">Latest Version</td>

<td><a href="http://www.unicode.org/unicode/reports/tr8">http://www.unicode.org/unicode/reports/tr8</a></td>
</tr>
</table>

<h3>
<b><i>Summary</i></b></h3>

<p><i>This report documents the Unicode Standard, Version 2.1.</i></p> 

<h3>
<b><i>Status of this document</i></b></h3>

<p><i>This document contains informative material and normative specifications which have been
considered and approved by the Unicode Technical Committee for publication as a <b>Technical
Report</b> and as part of the Unicode Standard, Version 2.1. Any reference to version 2.1 of the
Unicode Standard automatically includes this technical report. Please mail corrigenda and
other comments to the author.</i></p>
<p><i>The content of all technical reports must be understood in the context of the appropriate
version of the Unicode Standard. References in this technical report to sections of the
Unicode Standard refer to the Unicode Standard, Version 2.0. See
http://www.unicode.org/unicode/standard/versions for more information.</i></p>

<h3>
<b><i>Contents</i></b></h3>

<ul>
<li>
<a href="#Description">1 Description</a>
<ul>
<li>
<a href="#Conformance">1.1 Conformance</a></li>
</ul></li>
<li>
<a href="#Object Replacement Character">2 Object Replacement Character</a></li>
<li>
<a href="#Euro Sign">3 Euro Sign</a></li>
<li>
<a href="#Errata">4 Errata</a>
<ul>
<li>
<a href="#Math Property Characters">4.1 Math Property Characters</a></li>
<li>
<a href="#Letter Errata">4.2 Letter Errata</a></li>
<li>
<a href="#Canonical Decomposition Clarification">4.3 Canonical Decomposition
    Clarification</a></li>
<li>
<a href="#Identifier Errata">4.4 Identifier Errata</a></li>
<li>
<a href="#Bidirectional Behavior Errata">4.5 Bidirectional Behavior Errata</a>
<ul>
<li>
<a href="#Basic Display Algorithm">4.5.1 Basic Display Algorithm</a></li>
<li>
<a href="#Bidirectional Character Types">4.5.2 Bidirectional Character Types</a></li>
<li>
<a href="#The Base Level">4.5.3 The Base Level</a></li>
<li>
<a href="#Terminating Embeddings and Overrides">4.5.4 Terminating Embeddings and Overrides</a></li>
<li>
<a href="#Resolving Weak Types">4.5.5 Resolving Weak Types</a></li>
<li>
<a href="#Resolving Neutral Types (1)">4.5.6 Resolving Neutral Types (1)</a></li>
<li>
<a href="#Resolving Neutral Types (2)">4.5.7 Resolving Neutral Types (2)</a></li>
<li>
<a href="#Resolving Implicit Levels">4.5.8 Resolving Implicit Levels</a></li>
<li>
<a href="#Reordering Resolved Levels">4.5.9 Reordering Resolved Levels</a></li>
<li>
<a href="#New Directional Properties">4.5.10 Characters with New Directional Properties</a></li>
</ul></li>
<li>
<a href="#Apostrophe Semantics Errata">4.6 Apostrophe Semantics Errata</a></li>
<li>
<a href="#Typographic Errata">4.7 Typographic Errata</a></li>
<li>
<a href="#Glyph Errata">4.8 Glyph Errata</a></li>
<li>
<a href="#UTF-7 Sample Code Correction">4.9 UTF-7 Sample Code Correction</a></li>
</ul></li>
<li>
<a href="#Unicode Character Database and Properties Changes">5 Unicode Character Database and Properties Changes</a></li>
<li>
<a href="#Revisions">Revisions</a></li>
</ul>

<h2>
<a NAME="Description"></a>1 Description</h2>

<p>Version 2.1 of the Unicode Standard brings together two additions to the
repertoire which are expected to be in wide use in a number of implementations, errata
collected since the publication of Version 2.0, and a number of updates to the character
properties database. The two newly added characters are the U+FFFC OBJECT REPLACEMENT
CHARACTER and the U+20AC EURO SIGN. The object replacement character is already employed
in multiple implementations, and the euro sign is expected to be widely used very soon as
the European Monetary Union (EMU) proceeds to phase in its use as the EMU unit of
currency. This modification of the Unicode Standard is made available so that implementers
can proceed with their support plans knowing that their implementation of Unicode is a
well-defined, conforming version. With the additions of Version 2.1, the Unicode Standard
contains 38,887 characters from the world&#8217;s scripts.</p>
<p>Additional characters and scripts have been accepted into the
Unicode Standard since the publication of The Unicode Standard, Version 2.0. These
are not included in Version 2.1 but are documented on the Unicode Web site at:
<a href="/unicode/alloc/Pipeline.html">http://www.unicode.org/unicode/alloc/Pipeline.html</a></p>
<h3>
<a NAME="Conformance"></a>1.1 Conformance</h3>

<p>Overall Unicode conformance criteria as described in Chapter 3 of
Version 2.0 are unchanged. Specific aspects of the bidirectional algorithm have been
modified in Version 2.1, Hangul syllable decompositions have been clarified, and certain
normative character property values have been changed.</p>

<h2>
<a NAME="Object Replacement Character"></a>2 Object Replacement Character</h2>

<p>The U+FFFC OBJECT REPLACEMENT CHARACTER is used as an insertion point
for objects located within a stream of text. All other information about the object is
kept outside the character data stream. Internally it is a dummy character which acts as
an anchor point for the object&#8217;s formatting information. In addition to assuring
correct placement of an object in a data stream, the object replacement character also
allows the use of general stream-based algorithms for any textual aspects of embedded
objects.</p>
<p>The object replacement character is classified as a <i>Symbol, Other</i> <i>(So)</i>
and has a bidirectional category of <i>Other Neutrals</i> <i>(ON)</i>. </p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr> 
    <td VALIGN="TOP"><i>Addition</i>
    <p>p 7-523. Add to the standard the following character:</p>
    <div align="left"><table border="0" cellpadding="5">
      <tr>
        <td>FFFC</td>
        <td><img align="middle" src="http://www.unicode.org/img/CJKtr8/UFFFC.gif" X-SAS-UseImageHeight X-SAS-UseImageWidth WIDTH="32" HEIGHT="32"></td>
        <td>OBJECT REPLACEMENT CHARACTER</td>
      </tr>
    </table>
    </div></td>
  </tr>
</table>
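<p>The anchoring role described above can be illustrated with a minimal sketch. It assumes a
hypothetical side list of objects kept outside the text; the function name
<code>anchor_objects</code> and the sample file names are illustrative only and are not part of
the standard.</p>

```python
OBJECT_REPLACEMENT = "\ufffc"  # U+FFFC OBJECT REPLACEMENT CHARACTER


def anchor_objects(text, objects):
    """Pair each U+FFFC in text with the next entry of a side list.

    Hypothetical helper: returns (index_in_text, object) tuples. The object
    data itself never enters the character stream.
    """
    positions = [i for i, ch in enumerate(text) if ch == OBJECT_REPLACEMENT]
    if len(positions) != len(objects):
        raise ValueError("number of anchors and objects differ")
    return list(zip(positions, objects))


sample = "See figure \ufffc and table \ufffc."
anchors = anchor_objects(sample, ["figure1.png", "table1.csv"])
print(anchors)  # [(11, 'figure1.png'), (23, 'table1.csv')]
```

<p>Because each object occupies exactly one character position, stream-based operations such as
searching or measuring length work on the text without knowing anything about the objects.</p>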
<h2>
<a NAME="Euro Sign"></a>3 Euro Sign</h2>

<p>The new single currency for member countries of the European Monetary Union (EMU) is
the euro. The euro character is encoded in the Unicode Standard as U+20AC EURO SIGN. </p>
<p>To avoid confusion, the historical character U+20A0 EURO-CURRENCY SIGN has been updated
with an informative note and a cross reference to U+20AC EURO SIGN.</p>
<p>The euro character is classified as <i>Symbol, Currency</i> <i>(Sc)</i>
and has a bidirectional category of <i>European Number Terminator (ET)</i>.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigendum</i>
    <p>p 7-161. Currency symbols character
    names list</p>
    <p>Add the following informative note for character 20A0:</p>
    <p>&quot;Historical character derived from Xerox Character Code Standard&quot;</p>
    <p>Add the following cross reference for character 20A0:</p>
    <p>&quot;20AC euro sign&quot;</td>
  </tr> 
</table>


<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Addition</i>
    <p>p 7-161. Add to the standard the following character:  
     <div align="left"><table border="0" cellpadding="5">
      <tr>
        <td>20AC</td>
        <td><img align="middle" src="http://www.unicode.org/img/CJKtr8/U20AC.gif" X-SAS-UseImageHeight X-SAS-UseImageWidth WIDTH="32" HEIGHT="32"></td>
        <td>EURO SIGN</td>
      </tr>
    </table>
    </div>
    <p>Add the following informative note for 20AC:</p>
    <p>&quot;Currency sign for the European Monetary Union&quot;</p>
    <p>Add the following cross reference for 20AC:</p>
    <p>&quot;20A0 euro-currency sign&quot;</td>
  </tr>
</table>
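<p>The properties stated for the two new characters can be checked against a character database.
The following sketch uses Python's <code>unicodedata</code> module, which reflects a much later
version of the Unicode Character Database than Version 2.1; the helper name
<code>describe</code> is an assumption made for illustration.</p>

```python
import unicodedata


def describe(code_point):
    """Return (name, general category, bidirectional category) for a code point."""
    ch = chr(code_point)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
    )


# The two characters added in Version 2.1, plus the historical U+20A0.
for cp in (0xFFFC, 0x20AC, 0x20A0):
    name, category, bidi = describe(cp)
    print(f"U+{cp:04X} {name}: {category}, {bidi}")
```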

<h2>
<a NAME="Errata"></a>4 Errata</h2>

<h3>
<a NAME="Math Property Characters"></a>4.1 Math Property Characters</h3>

<p>Additional Unicode characters have been designated as having the mathematical property.
Typos in the Version 2.0 list of characters with the mathematical property have also been
corrected.</p>
<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="top"><i>Corrigenda</i>
    <p>p 4-25. In the list following section 4.9 </p>
    <p>Change 20A6 to 2016.</p>
    <p>Change &quot;20D2..20E1&quot; to &quot;20D0..20DC, 20E1&quot;.</p>
    <p>Add the following characters to the list:</p>
    <div align="left"><table border="0" cellpadding="5">
      <tr>
        <td valign="top"><font SIZE="1">207A..207E</font></td>
        <td width="30"></td>
        <td valign="top"><font SIZE="1">SUPERSCRIPT PLUS SIGN.. SUPERSCRIPT RIGHT PARENTHESIS</font></td>
      </tr>
      <tr>
        <td valign="top"><font SIZE="1">208A..208E</font></td>
        <td width="30"></td>
        <td valign="top"><font SIZE="1">SUBSCRIPT PLUS SIGN.. SUBSCRIPT RIGHT PARENTHESIS</font></td>
      </tr>
      <tr>
        <td valign="top"><font SIZE="1">FB29 </font></td>
        <td width="30"></td>
        <td valign="top"><font SIZE="1">HEBREW LETTER ALTERNATIVE PLUS SIGN</font></td>
      </tr>
      <tr>
        <td valign="top"><font SIZE="1">FE35..FE38 </font></td>
        <td width="30"></td>
        <td valign="top"><font SIZE="1">PRESENTATION FORM FOR VERTICAL LEFT
        PARENTHESIS..PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET</font></td>
      </tr>
      <tr>
        <td valign="top"><font SIZE="1">FE59..FE5C </font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">SMALL LEFT PARENTHESIS..SMALL RIGHT CURLY BRACKET</font></td>

      </tr>

      <tr>

        <td valign="top"><font SIZE="1">FE61..FE66 </font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">SMALL ASTERISK..SMALL EQUALS SIGN</font></td>

      </tr>

      <tr>

        <td valign="top"><font SIZE="1">FE68 </font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">SMALL REVERSE SOLIDUS</font></td>

      </tr>

      <tr>

        <td valign="top"><font SIZE="1">FF08..FF0B </font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">FULLWIDTH LEFT PARENTHESIS..FULLWIDTH PLUS SIGN</font></td>

      </tr>

      <tr>

        <td valign="top"><font SIZE="1">FF0D </font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">FULLWIDTH HYPHEN-MINUS</font></td>

      </tr>

      <tr>

        <td valign="top"><font SIZE="1">FF0F </font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">FULLWIDTH SOLIDUS</font></td>

      </tr>

      <tr>

        <td valign="top"><font SIZE="1">FF1C..FF1E </font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">FULLWIDTH LESS-THAN SIGN.. FULLWIDTH GREATER-THAN SIGN</font></td>

      </tr>

      <tr>

        <td valign="top"><font SIZE="1">FF3B..FF3E </font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">FULLWIDTH LEFT SQUARE BRACKET.. FULLWIDTH CIRCUMFLEX

        ACCENT</font></td>

      </tr>

      <tr>

        <td valign="top"><font SIZE="1">FF5B..FF5E</font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">FULLWIDTH LEFT CURLY BRACKET.. FULLWIDTH TILDE</font></td>

      </tr>

      <tr>

        <td valign="top"><font SIZE="1">FFE2 </font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">FULLWIDTH NOT SIGN</font></td>

      </tr>

      <tr>

        <td valign="top"><font SIZE="1">FFE8..FFEC </font></td>

        <td width="30"></td>

        <td valign="top"><font SIZE="1">HALFWIDTH FORMS LIGHT VERTICAL.. HALFWIDTH DOWNWARDS ARROW</font></td>
      </tr>
    </table>
    </div></td>
  </tr>
</table>

<h3>
<a NAME="Letter Errata"></a>4.2 Letter Errata</h3>

<p>Two characters have been removed from the alphabetics listing, U+02BC MODIFIER LETTER
APOSTROPHE and U+055A ARMENIAN APOSTROPHE.</p>
<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigendum</i>
    <p>p 4-14. Section 4.5 Letters</p>
    <p>Remove 02BC and 055A from the table of alphabetics.</td>
  </tr>
</table>

<h3>
<a NAME="Canonical Decomposition Clarification"></a>4.3 Canonical Decomposition Clarification</h3>

<p>The status of Hangul syllable decompositions has been clarified.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigenda</i>
    <p>p 3-7.<i> </i>D23 </p>
    <p>Change the first sentence to read: &quot;<i>canonical decomposition: </i>the
   decomposition of a character which results from recursively applying the canonical
    mappings found in the names list of <i>Section 7.1, Character Names List Entries</i> and
    those described in <i>Section 3.10 Combining Jamo Behavior</i> until no characters can be
    further decomposed, and then reordering non-spacing marks according to <i>Section 3.9,
    Canonical Ordering Behavior.</i>&quot;</p>
    <p>p 3-11. Section 3.10 Combining Jamo Behavior</p>
    <p>Change the third bullet to: &quot;determine the canonical decomposition of Hangul
    syllables&quot;</p>
    <p>p 3-13. Item 1</p>
    <p>Change the first sentence to: &quot;Process C by composing the conjoining jamo wherever
    possible, according to the compatibility decomposition rules in <i>Chapter 7, Code Charts</i>.&quot;</p>
    <p>Change the fourth sentence to: &quot;Raw keyboard data, on the other hand, may be in
    the form of a compatibility decomposition.&quot;</p>
    <p>p 3-13. Hangul Syllable Decomposition</p>
    <p>Change the first sentence to: &quot;The following describes the reverse mapping - how
    to take Hangul syllable S and derive the canonical decomposition C.&quot;</td>
  </tr>

</table>
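<p>As an illustration of deriving the canonical decomposition C from a Hangul syllable S, the
following sketch applies the arithmetic mapping. The constants are those commonly published for
the Hangul syllable algorithm and are not restated in this report, so treat them as assumptions
here; the function name <code>decompose_hangul</code> is illustrative.</p>

```python
import unicodedata

# Constants assumed from the published Hangul syllable algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT  # 11172 precomposed syllables


def decompose_hangul(syllable):
    """Return the conjoining jamo sequence for one precomposed syllable."""
    s_index = ord(syllable) - S_BASE
    if s_index not in range(S_COUNT):
        return syllable  # not a precomposed Hangul syllable
    leading = L_BASE + s_index // N_COUNT
    vowel = V_BASE + (s_index % N_COUNT) // T_COUNT
    trailing = T_BASE + s_index % T_COUNT
    result = chr(leading) + chr(vowel)
    if trailing != T_BASE:  # T_BASE itself means "no trailing consonant"
        result += chr(trailing)
    return result


# U+D4DB decomposes to U+1111 U+1171 U+11B6.
print([f"U+{ord(c):04X}" for c in decompose_hangul("\ud4db")])
```

<p>The result can be cross-checked against a normalization library, for example
<code>unicodedata.normalize("NFD", s)</code> in Python.</p>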

<h3>
<a NAME="Identifier Errata"></a>4.4 Identifier Errata</h3>

<p>New distinctions have been made in the Unicode Character Database for use in
identifiers. In addition, changes have been made to the text of the standard.</p>
<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="top"><i>Corrigenda</i>
    <p>p. 5-26, 27. Section 5.14 Identifiers</p>
    <p>Add 06DD and 06DE to &lt;enclosing_char&gt;.</p>
    <p>Add compatibility low lines FE33, FE34, FE4D..FE4F to
    &lt;underscore&gt;.</p>
    <p>Remove 0387 from &lt;extender&gt;.</p>
    <p>Remove &lt;identifier_part&gt; and its definition.</p>
    <p>Change the &lt;identifier&gt; syntactic rule to:</p>
  
  <div align="left"><table border="0" cellpadding="5">
      <tr>
        <td width="150">&quot;&lt;identifier&gt;</td>
        <td width="10"></td>
        <td>::= &lt;identifier_start&gt; ( &lt;identifier_start&gt; |
        &lt;identifier_extend&gt; )*&quot;</td>
      </tr>
    </table>

    </div><p>Add the following syntactic rules at the end of the list:</p>
    <div align="left"><table border="0" cellpadding="5">
      <tr>
        <td width="150">&quot;&lt;identifier_extend&gt;</td>
        <td width="10"></td>
        <td>::= &lt;decimal_digit_char&gt; | &lt;ident_combining_char&gt; |
        &lt;underscore&gt; | &lt;extender&gt; | &lt;ident_ignorable_char&gt; | &lt;connector&gt;
      </td>
      </tr>
      <tr>
        <td width="150">&lt;connector&gt;</td>
        <td width="10"></td>
        <td>::= { 203F, 2040 }&quot;</td>
        </tr>
    </table>

    </div><p>Following the syntactic rules add the following:</p>
    <p>&quot;Identifiers are ultimately defined by a set of character categories from the
    Unicode Character Database. (The individual Terminal Classes described in the text do not
    have a one-to-one relationship with the character categories, but the resulting
    definitions of identifiers are intended to be the same.)</p>
    <div align="left"><table border="0" cellpadding="5">
      <tr>
        <td valign="top" align="left"><font face="Fixedsys" size="2"><b>Syntactic Class</b></font></td>
        <td valign="top" align="left"><font face="Fixedsys" size="2"><b>Equivalent Category Set</b></font></td>
        <td valign="top" align="left"><font face="Fixedsys" size="2"><b>Coverage</b></font></td>
      </tr>
      <tr>
        <td></td>
        <td></td>

        <td></td>

      </tr>

      <tr>
        <td valign="top"><font face="Fixedsys" size="2">&lt;identifier_start&gt;</font></td>
        <td valign="top"><font face="Fixedsys" size="2">Lu,Ll,Lt,Lm,Lo,Nl</font></td>
        <td valign="top"><font face="Fixedsys" size="2">Uppercase letter, Lowercase letter,
        Titlecase letter, Modifier letter, Other letter, Letter number</font></td>
      </tr>
      <tr>
        <td valign="top"><font face="Fixedsys" size="2">&lt;identifier_extend&gt;</font></td>
        <td valign="top"><font face="Fixedsys" size="2">Mn,Mc,Nd,Pc,Cf </font></td>
        <td valign="top"><font face="Fixedsys" size="2">Non-spacing mark, Spacing combining mark,
        Decimal number, Connector punctuation, Formatting code</font></td>
      </tr>
      <tr>
        <td valign="top"><font face="Fixedsys" size="2">&lt;ident_ignorable_char&gt;</font></td>
        <td valign="top"><font face="Fixedsys" size="2">Cf</font></td>
        <td valign="top"><font face="Fixedsys" size="2">Formatting code </font></td>
      </tr>
    </table>
    </div><p>For an explicit list of the current coverage of each of these
    syntactic classes, see &lt;identifier_start&gt;, &lt;identifier_extend&gt;, and
    &lt;ident_ignorable_char&gt;.&quot; </td>
  </tr>

</table>
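<p>The category-based definition of identifiers can be sketched as follows. This is a minimal
illustration, not a conformance test: it uses Python's <code>unicodedata</code> module, whose
category assignments come from a later version of the Unicode Character Database, and the names
<code>START</code>, <code>PART</code>, and <code>is_identifier</code> are assumptions.</p>

```python
import unicodedata

# Category sets taken from the table of syntactic classes.
START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}   # identifier_start
EXTEND = {"Mn", "Mc", "Nd", "Pc", "Cf"}        # identifier_extend
PART = START | EXTEND


def is_identifier(text):
    """Check text against: identifier_start (identifier_start | identifier_extend)*."""
    if not text or unicodedata.category(text[0]) not in START:
        return False
    return all(unicodedata.category(ch) in PART for ch in text[1:])


for candidate in ("x1", "1x", "a_b", "a\u203fb", "a-b"):
    print(repr(candidate), is_identifier(candidate))
```

<p>Note that under this grammar an underscore (category Pc) may continue an identifier but may
not start one.</p>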

<h3>
<a NAME="Bidirectional Behavior Errata"></a>4.5 Bidirectional Behavior Errata</h3>

<p>Since the publication of Version 2.0 of the Unicode Standard, many aspects of the bidirectional
behavior algorithm have been clarified or modified, including the basic display algorithm,
bidirectional character types, base levels, resolving weak and neutral types, and
resolving implicit levels. These changes affect pages 3-14 through 3-23 of the standard.
Additionally, a few characters have been assigned new bidirectional type properties.</p>

<h4>
<a NAME="Basic Display Algorithm"></a>4.5.1 Basic Display Algorithm</h4>

<p>The description of the scope of the algorithm within a block has been
clarified, and a pointer to further information on the handling of CR and LF has been
added.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigendum</i>
    <p>p 3-16. At the end of the paragraph
    before the first bullet, add:</p>
    <p>&quot;The algorithm only reorders text within a block; characters on one side of a
    block separator have no effect on characters on the other side. (Also, see <i>Section 4.3,
    Directionality</i> on the handling of CR, LF, and CRLF)&quot;</td>
  </tr>

</table>

<h4>
<a NAME="Bidirectional Character Types"></a>4.5.2 Bidirectional Character Types</h4>

<p>The following (together with a change to Reordering Resolved Levels)
clarifies how to implement the last paragraph of page 3-16.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigenda</i>
    <p>p 3-17. Before Table 3-5, add:</p>
    <p>&quot;Combining marks are given the type of the preceding letter.&quot;</p>
    <p>p 4-11. After &quot;where there are gaps.&quot;, add:</p>
    <p>&quot;Combining marks are given the type of the preceding letter, and are not called
    out in this table either.&quot;</td>
  </tr>
</table>

<h4>
<a NAME="The Base Level"></a>4.5.3 The Base Level</h4>

<p>Several of the rules were corrected to say <i>embedding direction</i> rather
than <i>global direction</i>, and the term <i>embedding direction</i> is now defined explicitly.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>

    <td VALIGN="TOP"><i>Corrigendum</i>
    <p>p 3-18. Before &quot;Explicit Levels
    and Directions&quot;, insert:</p>
    <p>&quot;The direction of the current embedding level (for a character in question) is
    called the <i>embedding direction</i>. It is L if the embedding level is even, and R if
    the embedding level is odd.&quot;</td>
  </tr>
</table>
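<p>The definition of embedding direction reduces to a parity test on the embedding level. A
minimal sketch, with the function name <code>embedding_direction</code> chosen for
illustration:</p>

```python
def embedding_direction(level):
    """Return "L" for an even embedding level and "R" for an odd one."""
    if 0 > level:
        raise ValueError("embedding level must be non-negative")
    return "L" if level % 2 == 0 else "R"


print([embedding_direction(level) for level in range(4)])  # ['L', 'R', 'L', 'R']
```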

<h4>
<a NAME="Terminating Embeddings and Overrides"></a>4.5.4 Terminating Embeddings and Overrides</h4>

<p>T6 incorrectly removed implicit and explicit directional formatting codes. The original
purpose of T6 was to allow the use of styles or style sheets instead of embedding or
override codes (see p. 3-22). T6 has been eliminated, and N4 has been changed instead (see
below).</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigendum</i>
    <p>p 3-19. T6 </p>
    <p>Delete T6.</td>
  </tr>
</table>

<h4>
<a NAME="Resolving Weak Types"></a>4.5.5 Resolving Weak Types</h4>

<p>P1 has been clarified to state that it applies to single characters, and P2 more
explicitly shows how to resolve a sequence of European terminators. </p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
   <td VALIGN="TOP"><i>Corrigendum</i>
    <p>p 3-19. P1</p>
    <p>Change to &quot;P1. A single European separator between two European numbers changes to
    an European number. A single common separator between two numbers of the same type changes
    to that type.&quot;</p>
    <p>p 3-19. P2</p>
    <p>Change to &quot;P2. A sequence of European terminators adjacent to European numbers
    changes to all European numbers.</p>
    <p>ET, ET, EN <img src="../../pending/Arrow.gif" alt="Arrow.gif (79 bytes)" WIDTH="10" HEIGHT="7"> EN, EN, EN</p>
    <p>EN, ET, ET <img src="../../pending/Arrow.gif" alt="Arrow.gif (79 bytes)" WIDTH="10" HEIGHT="7"> EN, EN, EN</p>
    <p>AN, ET, EN <img src="../../pending/Arrow.gif" alt="Arrow.gif (79 bytes)" WIDTH="10" HEIGHT="7"> AN, EN, EN&quot;</p>
    <p>p 3-19. P3</p>
    <p>Add example at end.&nbsp; &quot;ET, AN <img src="../../pending/Arrow.gif" alt="Arrow.gif (79 bytes)" WIDTH="10" HEIGHT="7"> N, AN&quot;</td>
  </tr>
</table>
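<p>The revised P1 and P2 can be sketched as a pass over a list of bidirectional type codes. This
is a simplified illustration that covers only P1 and P2 as worded above (P3 is not implemented),
and the function name <code>resolve_weak</code> is an assumption.</p>

```python
def resolve_weak(types):
    """Apply the revised rules P1 and P2 to a list of bidirectional type codes."""
    out = list(types)
    n = len(out)

    # P1: a single ES between two EN becomes EN; a single CS between two
    # numbers of the same type (EN or AN) takes that type.
    for i in range(1, n - 1):
        before, after = out[i - 1], out[i + 1]
        if out[i] == "ES" and before == "EN" and after == "EN":
            out[i] = "EN"
        elif out[i] == "CS" and before == after and before in ("EN", "AN"):
            out[i] = before

    # P2: a run of ET adjacent to an EN on either side becomes all EN.
    i = 0
    while i != n:
        if out[i] != "ET":
            i += 1
            continue
        j = i
        while j != n and out[j] == "ET":
            j += 1
        left_is_en = i > 0 and out[i - 1] == "EN"
        right_is_en = j != n and out[j] == "EN"
        if left_is_en or right_is_en:
            out[i:j] = ["EN"] * (j - i)
        i = j
    return out


print(resolve_weak(["ET", "ET", "EN"]))  # ['EN', 'EN', 'EN']
print(resolve_weak(["AN", "ET", "EN"]))  # ['AN', 'EN', 'EN']
```

<p>Because P1 requires a <i>single</i> separator, a sequence such as EN, ES, ES, EN is left
unchanged by this sketch.</p>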

<h4>
<a NAME="Resolving Neutral Types (1)"></a>4.5.6 Resolving Neutral Types (1)</h4>

<p>The wording in N2 has been modified to use the embedding direction instead of the
global direction, and the confusing term &quot;letter&quot; has been changed to
&quot;character&quot; which makes it clear that strong R punctuation should be included.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigenda</i>
    <p>p 3-19. N2 </p>
    <p>Replace &quot;global&quot; by &quot;embedding&quot;. </p>
    <p>p 3-20. N3 </p>
    <p>Change &quot;letter&quot; to &quot;character&quot; everywhere.</td>
  </tr>
</table>

<h4>
<a NAME="Resolving Neutral Types (2)"></a>4.5.7 Resolving Neutral Types (2)</h4>

<p>Since N4 describes the behavior of embedding codes, it has been moved to a more
appropriate place in the algorithm. It replaces T6 and now describes the behavior of
override codes as well.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigenda</i>
    <p>p 3-19, 20. Move N4 to where T6 was.
    Change the number to T6, and change the wording and examples to:</p>
    <p>&quot;T6. In the following rules, an embedding or override code and its matching PDF
    act as if they were strong characters of the appropriate type. All unmatched PDFs are
    ignored. If two embeddings with the same level are adjacent, then the PDF terminating the
    first embedding and the code initiating the next embedding are ignored.</p>
    <p>LRO ... PDF <img src="../../pending/Arrow.gif" alt="Arrow.gif (79 bytes)" WIDTH="10" HEIGHT="7"> L ... L</p>
    <p>LRE ... PDF <img src="../../pending/Arrow.gif" alt="Arrow.gif (79 bytes)" WIDTH="10" HEIGHT="7"> L ... L</p>
    <p>RLO ... PDF <img src="../../pending/Arrow.gif" alt="Arrow.gif (79 bytes)" WIDTH="10" HEIGHT="7"> R ... R</p>
    <p>RLE ... PDF <img src="../../pending/Arrow.gif" alt="Arrow.gif (79 bytes)" WIDTH="10" HEIGHT="7"> R ... R</p>
    <p>RLE ... PDF, RLO ... PDF <img src="../../pending/Arrow.gif" alt="Arrow.gif (79 bytes)" WIDTH="10" HEIGHT="7">
    RLE ..., ... PDF&quot;</td>
  </tr>
</table>
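<p>One part of the new T6, ignoring unmatched PDFs, can be sketched with a simple depth counter.
This illustration covers only that sentence of the rule, not the treatment of adjacent
embeddings; the function name <code>drop_unmatched_pdfs</code> is an assumption.</p>

```python
LRE, RLE, PDF, LRO, RLO = "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"
OPENERS = {LRE, RLE, LRO, RLO}


def drop_unmatched_pdfs(text):
    """Remove every PDF that has no preceding unclosed embedding or override."""
    depth = 0
    kept = []
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                continue  # unmatched PDF: ignore it
            depth -= 1
        kept.append(ch)
    return "".join(kept)


cleaned = drop_unmatched_pdfs(PDF + "a" + RLE + "b" + PDF + PDF)
print([f"U+{ord(c):04X}" for c in cleaned])
```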

<h4>
<a NAME="Resolving Implicit Levels"></a>4.5.8 Resolving Implicit Levels</h4>

<p>I1 and I2 have been modified to ensure that implementers will use the embedding
direction instead of the base direction. Also, although Table 3-7 refers to Sequence Type,
the wording was not clear that the rules applied to sequences. This is important in the
case of EN.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigenda</i>
    <p>p 3-20, 21. I1 </p>
    <p>Replace &quot;global&quot; by &quot;embedding&quot;.</p>
    <p>Replace &quot;Numeric text (EN) goes up two levels unless preceded by left-to-right
    text.&quot; by:</p>
    <p>&quot;A sequence of one or more numeric types (EN) goes up two levels unless
    immediately preceded by left-to-right text.&quot;</p>
    <p>Change the example from &quot;(L) EN&quot; to &quot;(L) EN...EN&quot;</td>
  </tr>
</table>

<h4>
<a NAME="Reordering Resolved Levels"></a>4.5.9 Reordering Resolved Levels</h4>

<p>L1 incorrectly implied that there could be more than one block separator. This has been
corrected and more explanation is provided. </p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigenda</i>
    <p>p 3-20. L1</p>
    <p>Add to the end of the paragraph before L1:&nbsp;</p>
    <p>&quot;The process of breaking a paragraph into one or more lines that fit within
    particular bounds is outside the scope of the bidirectional algorithm. Where character
    shaping is involved, it can be somewhat more complicated (see pages 6-22 through 6-32).
    Logically there are the following steps:</p>
    <ul>
      <li>The levels of the text are determined according to the bidi algorithm.</li>
      <li>The characters are shaped into glyphs according to their context (<i>taking the
        embedding levels into account</i>).</li>
      <li>The accumulated widths of those glyphs (<i>in logical order</i>) is used to determine
        line breaks.</li>
      <li>The glyphs on each line are then separately reordered according to the rules L1 and L2
        below.&quot;</li>
    </ul>
    <p>Change in L1, &quot;trailing white space (including block separators)&quot; to
    &quot;any trailing white space characters (including those of type B, S, and
    WS)&quot;.&nbsp;</p>
    <p>Add after L1, &quot;(Note: since a Block separator breaks lines, there will be at most
    one per line.)&quot;&nbsp;</p>
    <p>Before &quot;Bidirectional Conformance&quot;, add:</p>
    <p>&quot;Combining marks applied to a right-to-left base character will at this point <i>precede</i>
    their base character. See <i>Section 5.12 Rendering Non-Spacing Marks</i> for an
    illustration of this. If the rendering engine expects them to <i>follow</i> the base
    characters in the final display process, then the ordering of the marks and the base
    character will need to be reversed.&quot;</td>
 </tr>

</table>

<h4>
<a NAME="New Directional Properties"></a>4.5.10 Characters with New Directional Properties</h4>

<p>Certain characters have new bidirectional property definitions. To
improve the display of e-mail addresses and URLs, the directional types of U+0026
AMPERSAND and U+0040 COMMERCIAL AT have been changed from left-to-right to other neutral.
The directional type of U+002E FULL STOP has been changed from EUROPEAN NUMBER
SEPARATOR to COMMON NUMBER SEPARATOR to improve the display of decimal numbers;
U+2007 FIGURE SPACE has also been changed from EUROPEAN NUMBER SEPARATOR to COMMON
NUMBER SEPARATOR for consistency.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigenda</i>
    <p>p 4-11. Table 4.4 Bidirectional Character Types</p>
    <p>Remove the table entry &quot;Miscellaneous U+0026, U+0040&quot; from the strong
    left-to-right category.</p>
    <p>Remove the table entries &quot;Full Stop (Period) U+002E&quot; and &quot;Figure Space
    U+2007&quot; from the European Number Separator category.</p>
    <p>p 4-12. Table 4.4 Bidirectional Character Types</p>
    <p>Add the table entries &quot;Full Stop (Period) U+002E&quot; and &quot;Figure Space
    U+2007&quot; to the Common Number Separator category. </td>
  </tr>

</table>

<h3>
<a NAME="Apostrophe Semantics Errata"></a>4.6 Apostrophe Semantics Errata</h3>

<p>The following corrigenda clarify the semantics of different apostrophes, and correct
problems in the mapping tables from Windows and Macintosh code pages.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigendum</i>
    <p>p 6-3. Add at the end of <i><b>Loose versus Precise Semantics</b>:</I></p>
    <p>&quot;For historical reasons, U+0027 is a particularly overloaded character. In
    ASCII it is used to represent a punctuation mark (such as right single quotation mark,
    left single quotation mark, apostrophe punctuation, vertical line, or prime) or a modifier
    letter (such as apostrophe modifier or acute accent.) (Punctuation marks generally break
    words; modifier letters generally are considered part of a word.) In many systems it is
    always represented as a straight vertical line and can never represent a curly apostrophe
    or right quotation mark.</p>
    <p>In the case of an apostrophe,</p>
    <ul>
      <li>U+02BC MODIFIER LETTER APOSTROPHE is preferred where the character is to represent a
        modifier letter (for example, in transliterations to indicate a glottal stop.) In the
        latter case, it is also referred to as a <i>letter apostrophe</i>.</li>
      <li>U+2019 RIGHT SINGLE QUOTATION MARK is preferred where the character is to
        represent a punctuation mark, as in <i>&quot;We&#8217;ve been here before.&quot;</i> In the
        latter case, U+2019 is also referred to as a <i>punctuation apostrophe</i>.</li>
    </ul>
    <p>In implementation, however, you cannot assume that users&#8217; text
    always adheres to the distinction between these characters. The text may come from
    different sources, including mapping from other character sets that do not have this
    distinction between letter apostrophe and punctuation apostrophe/right single quotation
    mark. In that case, <i>all</i> of them will generally be represented by U+2019.</p>
    <p>Where you are parsing text where such distinctions are important, you will still need
    to look at the context around the characters to help disambiguate the relevant
    semantics.&quot;</td>

  </tr>

</table>
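<p>The advice to look at the surrounding context can be illustrated with a minimal C sketch.
It is not part of the standard. The names <code>classify_apostrophe</code> and
<code>ApostropheRole</code> are hypothetical, and the letter test covers ASCII letters only,
which is a deliberate simplification. The rule used here (an apostrophe-like character
between two letters is treated as word-internal) is only one possible heuristic.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of an apostrophe-like character by context. */
typedef enum {
    APOS_NOT_APOSTROPHE,  /* the character is not U+0027, U+02BC or U+2019 */
    APOS_WORD_INTERNAL,   /* letters on both sides, as in "We've" */
    APOS_QUOTE_LIKE       /* anything else, e.g. a closing quotation mark */
} ApostropheRole;

/* Assumption: ASCII letters only. A real implementation would consult
   the Unicode Character Database for the letter categories. */
static int is_ascii_letter(uint32_t cp)
{
    return (cp >= 'A' && cp <= 'Z') || (cp >= 'a' && cp <= 'z');
}

static int is_apostrophe_like(uint32_t cp)
{
    return cp == 0x0027 || cp == 0x02BC || cp == 0x2019;
}

/* Classify text[i] by looking at its immediate neighbours. */
ApostropheRole classify_apostrophe(const uint32_t *text, size_t len, size_t i)
{
    assert(text != NULL);
    if (i >= len || !is_apostrophe_like(text[i]))
        return APOS_NOT_APOSTROPHE;

    int letter_before = (i > 0) && is_ascii_letter(text[i - 1]);
    int letter_after = (i + 1 < len) && is_ascii_letter(text[i + 1]);

    return (letter_before && letter_after) ? APOS_WORD_INTERNAL
                                           : APOS_QUOTE_LIKE;
}
```

<p>This heuristic is crude: a trailing possessive apostrophe, for example, is reported as
quote-like, so real text processing would need additional context.</p>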

<p>&nbsp;</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">

  <tr>

    <td VALIGN="TOP"><i>Corrigendum</i>
    <p>p 7-7. Change character 0027 informative notes, second bullet to:</p>
    <p>&quot;preferred character for apostrophe is either 02BC MODIFIER LETTER
    APOSTROPHE or 2019 RIGHT SINGLE QUOTATION MARK (which also represents a punctuation
    apostrophe).&quot;</p></td>
  </tr>

</table>



<p>&nbsp;</p>



<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">

  <tr>

    <td VALIGN="TOP"><i>Corrigendum</i>
    <p>p 7-37. Change character 02BC informative notes, third bullet to:</p>
    <p>&quot;this is the preferred character for letter apostrophe.&quot;</p></td>
  </tr>

</table>



<p>&nbsp;</p>



<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">

  <tr>

    <td VALIGN="TOP"><i>Corrigendum</i>
    <p>p 7-155. Change character 2019 informative notes, first bullet to:</p>
    <p>&quot;this is the preferred character for quotation mark and punctuation
    apostrophe.&quot;</p></td>

  </tr>

</table>

<h3>
<a NAME="Typographic Errata"></a>4.7 Typographic Errata</h3>

<p>The following are typographic errors in the text of the standard.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigenda</i>
    <p>pp 7-50..7-55. Change the page header to &quot;0400...Cyrillic...04FF&quot;.</p>
    <p>pp 7-66..7-70. Change the page header to &quot;0600...Arabic...06FF&quot;.</p></td>
  </tr>
</table>

<h3>
<a NAME="Glyph Errata"></a>4.8 Glyph Errata</h3>

<p>A number of glyphs have been corrected. The corrections are given here and can be found
on the Unicode Web site at:</p>
<p><a href="http://www.unicode.org/unicode/uni2errata/UnicodeTypos.html">http://www.unicode.org/unicode/uni2errata/UnicodeTypos.html</a></p>


<p>Additional glyph corrections will be posted to this site as available.</p>





<table BORDER="1" CELLSPACING="1" CELLPADDING="9" WIDTH="100%">

  <tr>

    <td VALIGN="TOP" COLSPAN="3"><i>Corrigenda</i></td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">05F1</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/unicode/uni2errata/U+05F1.gif" x-sas-useimagewidth x-sas-useimageheight align="top" WIDTH="17" HEIGHT="22"></td>

    <td WIDTH="73%" VALIGN="TOP">HEBREW LIGATURE YIDDISH VAV YOD</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">2603</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/unicode/uni2errata/U+2603.gif" x-sas-useimagewidth x-sas-useimageheight border="0" align="middle" WIDTH="19" HEIGHT="30"></td>

    <td WIDTH="73%" VALIGN="TOP">SNOWMAN</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">3085</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/unicode/uni2errata/U+3085.gif" width="32" height="32" x-sas-useimagewidth x-sas-useimageheight align="middle"></td>

    <td WIDTH="73%" VALIGN="TOP">HIRAGANA LETTER SMALL YU</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA0E</td>  
    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA0E.gif" alt="UFA0E.gif" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA0F</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA0F.gif" alt="UFA0F.gif (171 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA10</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA10.gif" alt="UFA10.gif (173 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA11</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA11.gif" alt="UFA11.gif (181 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA12</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA12.gif" alt="UFA12.gif (172 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA13</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA13.gif" alt="UFA13.gif (175 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA14</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA14.gif" alt="UFA14.gif (174 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA15</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA15.gif" alt="UFA15.gif (181 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA16</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA16.gif" alt="UFA16.gif (174 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA17</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA17.gif" alt="UFA17.gif (164 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA18</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA18.gif" alt="UFA18.gif (168 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA19</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA19.gif" alt="UFA19.gif (174 bytes)" WIDTH="32" HEIGHT="32"></td>
    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA1A</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA1A.gif" alt="UFA1A.gif (169 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA1B</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA1B.gif" alt="UFA1B.gif (171 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA1C</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA1C.gif" alt="UFA1C.gif (173 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA1D</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA1D.gif" alt="UFA1D.gif (178 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA1E</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA1E.gif" alt="UFA1E.gif (173 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA1F</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA1F.gif" alt="UFA1F.gif (174 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA20</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA20.gif" alt="UFA20.gif (177 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA21</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA21.gif" alt="UFA21.gif (174 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA22</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA22.gif" alt="UFA22.gif (175 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA23</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA23.gif" alt="UFA23.gif (170 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA24</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA24.gif" alt="UFA24.gif (167 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA25</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA25.gif" alt="UFA25.gif (172 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA26</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA26.gif" alt="UFA26.gif (176 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA27</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA27.gif" alt="UFA27.gif (175 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA28</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA28.gif" alt="UFA28.gif (174 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA29</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA29.gif" alt="UFA29.gif (175 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA2A</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA2A.gif" alt="UFA2A.gif (175 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA2B</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA2B.gif" alt="UFA2B.gif (176 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA2C</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA2C.gif" alt="UFA2C.gif (174 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

  <tr>

    <td WIDTH="18%" VALIGN="TOP">FA2D</td>

    <td VALIGN="top" align="center"><img src="http://www.unicode.org/img/CJKtr8/UFA2D.gif" alt="UFA2D.gif (181 bytes)" WIDTH="32" HEIGHT="32"></td>

    <td WIDTH="73%" VALIGN="TOP">CJK Compatibility Ideograph</td>

  </tr>

</table>

<h3>
<a NAME="UTF-7 Sample Code Correction"></a>4.9 UTF-7 Sample Code Correction</h3>

<p>The UTF-7 specification was unclear on one point, which led to an error in the sample
code for converting from UCS-2 to UTF-7. The problem occurs when U+002D HYPHEN-MINUS
follows a character that must be encoded. Because ASCII 0x2D is the terminating character
for an encoded sequence, two 0x2D characters must be output in order to preserve the
U+002D when converting back to Unicode: the first terminates the encoded sequence and is
absorbed by the decoder, and the second represents the hyphen-minus itself.</p>

<p>RFC 2152 has been published with a revised version of the UTF-7 specification. The
sample code file that was included on the CD-ROM has been updated with this fix.</p>

<table CELLSPACING="0" BORDER="2" CELLPADDING="9" WIDTH="100%">
  <tr>
    <td VALIGN="TOP"><i>Corrigenda</i>
    <p>p A-5. The correction is in the code near the bottom of the page. The new text is
    highlighted.</p>
    <pre>if (!needshift)
{
    /* Write the explicit shift out character if
        1) The caller has requested that we always do it, or
        2) The directly encoded character is in the
        base64 set, or
        <strong>3) The directly encoded character is SHIFT_OUT.</strong>
        */

    if (verbose || ((!done) &amp;&amp; (invbase64[r] &gt;= 0
       <strong>|| r == SHIFT_OUT)))</strong>
    {
        TARGETCHECK;
        *target++ = SHIFT_OUT;
    }
    shifted = 0;
}</pre></td>
  </tr>
</table>
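<p>To see the corrected rule in isolation, the following is a minimal, self-contained C sketch
of a UCS-2 to UTF-7 encoder. It is not the sample code from the standard or from RFC 2152.
The function name <code>utf7_encode</code> and its signature are hypothetical, only the
RFC 2152 direct character set plus space, tab, CR and LF are passed through unencoded, every
other character (including &quot;+&quot;) is base64-encoded, and a run that reaches the end of
the input is always closed explicitly.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Characters this sketch writes unencoded (assumption: Set D plus
   space, tab, CR and LF; the optional Set O is always encoded). */
static int is_direct(uint16_t c)
{
    static const char direct[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
        "'(),-./:? \t\r\n";
    return c != 0 && c <= 0x7F && strchr(direct, (int)c) != NULL;
}

static int is_base64_char(uint16_t c)
{
    return c != 0 && c <= 0x7F && strchr(B64, (int)c) != NULL;
}

/* Append one byte, keeping room for the trailing NUL. */
#define PUT(ch) \
    do { if (out + 1 >= cap) return -1; dst[out++] = (char)(ch); } while (0)

/* Encode n UCS-2 code units into dst as a NUL-terminated UTF-7 string.
   Returns the output length, or -1 if dst is too small. */
int utf7_encode(const uint16_t *src, size_t n, char *dst, size_t cap)
{
    size_t out = 0;
    uint32_t bits = 0;   /* pending bits of the current base64 run */
    int nbits = 0;       /* how many low bits of 'bits' are pending */
    int shifted = 0;     /* nonzero while inside a "+...-" run */

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        uint16_t c = src[i];
        if (is_direct(c)) {
            if (shifted) {
                if (nbits > 0)
                    PUT(B64[(bits << (6 - nbits)) & 0x3F]);
                /* Explicit terminator is needed before a base64
                   character and, per the correction, before '-'. */
                if (is_base64_char(c) || c == '-')
                    PUT('-');
                shifted = 0;
                bits = 0;
                nbits = 0;
            }
            PUT(c);
        } else {
            if (!shifted) {
                PUT('+');
                shifted = 1;
            }
            bits = (bits << 16) | c;
            nbits += 16;
            while (nbits >= 6) {
                nbits -= 6;
                PUT(B64[(bits >> nbits) & 0x3F]);
            }
        }
    }
    if (shifted) {
        if (nbits > 0)
            PUT(B64[(bits << (6 - nbits)) & 0x3F]);
        PUT('-');   /* always close the run at end of input */
    }
    dst[out] = '\0';
    return (int)out;
}
```

<p>With this sketch, U+00A3 followed by U+002D is written as the run for U+00A3, a
terminating 0x2D, and then a second 0x2D for the hyphen-minus itself. Without the extra check
for 0x2D, only one hyphen would be written and the decoder would consume it as the
terminator.</p>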

<h2>
<a NAME="Unicode Character Database and Properties Changes"></a>5 Unicode Character Database and Properties Changes</h2>

<p>In addition to including the properties for the object replacement character and the
euro sign, the Unicode Technical Committee has approved changes to the Unicode Character
Database to reconcile problems found in an analysis of the character categories, and to
make new distinctions in the database for use in identifiers. The property changes reflect
the following:</p>

<ol>
  <li><p>Encoding of U+20AC EURO SIGN and U+FFFC OBJECT REPLACEMENT CHARACTER</p></li>
  <li><p>Removing space, white space and delimitation as characteristics of U+FEFF</p></li>
  <li><p>Narrowing the concept of white space to exclude miscellaneous ignorable Unicode controls
    and the Unicode NULL</p></li>
  <li><p>Mandated changes in directional properties, extended to compatibility forms for
    consistency</p></li>
</ol>
<p>The details are given in the following table:</p>



<table BORDER="1" CELLSPACING="1" CELLPADDING="9" WIDTH="100%">

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">Space</font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Remove FEFF</font></td>

  </tr>

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">White space</font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Remove 0000, 200C..200F, 202A..202E, 206A..206F, FEFF</font></td>

  </tr>

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">Punctuation</font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Add 00B7</font></td>

  </tr>

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">Delimiter</font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Remove FEFF</font></td>

  </tr>

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">Currency Symbol</font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Add 20AC</font></td>

  </tr>

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">Bidi: Left-to-Right</font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Remove 0026, 0040, FE60, FE6B, FF06, FF20</font></td>

  </tr>

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">Bidi: Eur Num Term</font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Add 20AC</font></td>

  </tr>

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">Bidi: Eur Num Sep</font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Remove 002E, 2007, FE52, FF0E</font></td>

  </tr>

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">Bidi: Common Sep </font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Add 002E, 2007, FE52, FF0E</font></td>

  </tr>

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">Bidi: Other Neutrals</font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Add 0026, 0040, FE60, FE6B, FF06, FF20</font></td>

  </tr>

  <tr>

    <td WIDTH="28%" VALIGN="TOP"><font SIZE="3">Unassigned Code Value</font></td>

    <td WIDTH="72%" VALIGN="TOP"><font SIZE="3">Remove 20AC, FFFC</font></td>

  </tr>

</table>
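<p>For implementations that patch 2.0-based property tables by hand, some rows of the table
above could be expressed as small lookup functions. The following C sketch is illustrative
only. The function and type names are hypothetical, and it covers just the White space
removals and the Bidi reassignments listed above, not the complete property data, which should
be taken from the updated database files.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t first;
    uint16_t last;
} Range;

/* Code points removed from the White space property in Version 2.1. */
static const Range REMOVED_FROM_WHITE_SPACE[] = {
    { 0x0000, 0x0000 },
    { 0x200C, 0x200F },
    { 0x202A, 0x202E },
    { 0x206A, 0x206F },
    { 0xFEFF, 0xFEFF }
};

static int in_ranges(uint16_t cp, const Range *ranges, size_t count)
{
    assert(ranges != NULL);
    for (size_t i = 0; i < count; i++) {
        if (cp >= ranges[i].first && cp <= ranges[i].last)
            return 1;
    }
    return 0;
}

/* Nonzero if cp lost the White space property in Version 2.1. */
int removed_from_white_space_2_1(uint16_t cp)
{
    return in_ranges(cp, REMOVED_FROM_WHITE_SPACE,
                     sizeof REMOVED_FROM_WHITE_SPACE /
                     sizeof REMOVED_FROM_WHITE_SPACE[0]);
}

/* Hypothetical labels for the Bidi category a code point moves to. */
typedef enum {
    BIDI_UNCHANGED,
    BIDI_NOW_COMMON_SEP,     /* was European Number Separator */
    BIDI_NOW_OTHER_NEUTRAL,  /* was Left-to-Right */
    BIDI_NOW_EUR_NUM_TERM    /* newly encoded U+20AC EURO SIGN */
} BidiChange;

/* Report the Bidi reassignment listed in the table, if any. */
BidiChange bidi_change_2_1(uint16_t cp)
{
    switch (cp) {
    case 0x002E: case 0x2007: case 0xFE52: case 0xFF0E:
        return BIDI_NOW_COMMON_SEP;
    case 0x0026: case 0x0040: case 0xFE60:
    case 0xFE6B: case 0xFF06: case 0xFF20:
        return BIDI_NOW_OTHER_NEUTRAL;
    case 0x20AC:
        return BIDI_NOW_EUR_NUM_TERM;
    default:
        return BIDI_UNCHANGED;
    }
}
```

<p>A table-driven form like this keeps the 2.1 changes separate from the base data, which can
make it easier to check them against the diff file. Loading the updated database files
directly is the more robust option.</p>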

<p>This new information is reflected in the newest version of the Unicode Character
Database and the additional properties files in the Unicode 2.1 Update directory on the
unicode.org ftp site:</p>
<p><a href="ftp://ftp.unicode.org/Public/2.1-Update/">ftp://ftp.unicode.org/Public/2.1-Update/</a></p>
<p>The 2.1 files in the update directory supersede the three 2.0 files listed below, which are
distributed on the CD-ROM accompanying <em>The Unicode Standard, Version 2.0</em> and are also
available at:</p>
<p><a href="ftp://ftp.unicode.org/Public/UNIDATA">ftp://ftp.unicode.org/Public/UNIDATA</a></p>
<ul>
  <li>PROPS2.TXT (superseded by PropList-2.1.1.txt)</li>
  <li>UNIDAT2.TXT (superseded by UnicodeData-2.1.1.txt)</li>
  <li>README2.TXT (superseded by ReadMe-2.1.1.txt)</li>
</ul>
<p>A diff file cataloging the changes in the Unicode Character Database file is also
available:</p>
<p><a href="ftp://ftp.unicode.org/Public/2.1-Update/diff2014v211.txt">ftp://ftp.unicode.org/Public/2.1-Update/diff2014v211.txt</a></p>

<h2>
<a NAME="Revisions"></a>Revisions</h2>

<h3>
<a NAME="Changes for Revision 3"></a>Changes for Revision 3</h3>

<p>Formatting corrections were made.</p>

<h3>
<a NAME="Changes for Revision 2"></a>Changes for Revision 2</h3>

<p>Correction of typographical and glyph errors as follows:</p>

<p>1. Typo in section 3.4 Identifier Errata, third line describing compatibility low lines,
corrected to read FE33, not FF33.</p>

<p>2. Glyph for U+FA0E in section 3.8 corrected.</p>

<p>3. Under 3.4 Identifier Errata, in the small unlined table towards the bottom, under
&quot;Coverage,&quot; second entry, changed &quot;Enclosing mark&quot; to &quot;Spacing
combining mark.&quot;</p>

<p>Internal hyperlinks added at beginning of document. </p>

<h3>
<a NAME="Changes for Revision 1"></a>Changes for Revision 1</h3>

<p>Correction of two typographical errors as follows:</p>

<p>1.&nbsp; In section 3.9 &quot;UTF-7 Sample Code Correction&quot;, in the sentence
&quot;The problem occurs when U+200D HYPHEN-MINUS follows a character that must be
encoded.&quot; the value &quot;U+200D&quot; was corrected to read &quot;U+002D&quot;.</p>

<p>2.&nbsp; In section 3.6 the third corrigendum, &quot;p 7-37. Change character 02BC
informative notes, first bullet to:&quot; &nbsp; &quot;first&quot; corrected to read
&quot;third.&quot;</p>

<hr>
<p>Copyright &copy; 1998-1999 Unicode, Inc. All Rights Reserved. The Unicode Consortium
makes no expressed or implied warranty of any kind, and assumes no liability
for errors or omissions. No liability is assumed for incidental and consequential
damages in connection with or arising out of the use of the information
or programs contained or accompanying this technical report.</p>
<p>Unicode and the Unicode logo are trademarks of Unicode, Inc., and are
registered in some jurisdictions.</p>

</body>

</html>
