<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"

       "http://www.w3.org/TR/REC-html40/loose.dtd"> 

<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta name="keywords" content="unicode, normalization, composition, decomposition">
<meta name="description" content="Describes derived Unicode properties">
<title>UCD: Derived Character Properties</title>
<link rel="stylesheet" type="text/css" href="http://www.unicode.org/unicode.css">
</head>

<body bgcolor="#ffffff">

<table width="100%" cellpadding="0" cellspacing="0" border="0">
  <tr>
    <td>
      <table width="100%" border="0" cellpadding="0" cellspacing="0">
        <tr>
          <td class="icon"><a href="http://www.unicode.org"><img border="0" src="http://www.unicode.org/webscripts/logo60s2.gif" align="middle" alt="[Unicode]" width="34" height="33"></a>&nbsp;&nbsp;<a class="bar" href="UnicodeCharacterDatabase.html">Unicode 
            Character Database</a></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td class="gray">&nbsp;</td>
  </tr>
</table>
<h1>Derived Character Properties</h1>
<table height="87" cellspacing="2" cellpadding="0" width="100%" border="1">
  <tbody>
    <tr>
      <td valign="top" width="144">Revision</td>
      <td valign="top">3.1.1</td>
    </tr>
    <tr>
      <td valign="top" width="144">Authors</td>
      <td valign="top">Mark Davis</td>
    </tr>
    <tr>
      <td valign="top" width="144">Date</td>
      <td valign="top">2001-08-08</td>
    </tr>
    <tr>
      <td valign="top" width="144">This Version</td>
      <td valign="top"><a href="http://www.unicode.org/Public/3.1-Update1/DerivedProperties-3.1.1.html">http://www.unicode.org/Public/3.1-Update1/DerivedProperties-3.1.1.html</a></td>
    </tr>
    <tr>
      <td valign="top" width="144">Previous Version</td>
      <td valign="top"><a href="http://www.unicode.org/Public/3.1-Update/DerivedProperties-3.1.0.html">http://www.unicode.org/Public/3.1-Update/DerivedProperties-3.1.0.html</a></td>
    </tr>
    <tr>
      <td valign="top" width="144">Latest Version</td>
      <td valign="top"><a href="http://www.unicode.org/Public/UNIDATA/DerivedProperties.html">http://www.unicode.org/Public/UNIDATA/DerivedProperties.html</a></td>
    </tr>
  </tbody>
</table>
<h3><br>
<i>Summary</i></h3>
<blockquote>
  <p><i>This document describes the format and content of the main derived data 
  files in the Unicode Character Database (UCD).</i></p>
</blockquote>
<h3><i>Status</i></h3>
<blockquote>
  <p><i>The file and the files described herein are part of the Unicode 
  Character Database and governed by the <a href="#UCD_Terms">UCD Terms of Use</a> 
  given below.</i></p>
  <p><i>For general information on file formats and table formats, and the 
  implications of normative vs informative properties, see 
  UnicodeCharacterDatabase.html.</i></p>
  <p><i><b>Warning: </b>the information in this file does not completely 
  describe the use and interpretation of Unicode character properties and 
  behavior. It must be used in conjunction with the data in the other files in 
  the UCD, and relies on the notation and definitions supplied in <a href="http://www.unicode.org/unicode/standard/versions/Unicode3.0.html">The 
  Unicode Standard</a>. All chapter references are to Version 3.1.0 of the 
  standard.</i></p>
</blockquote>
<blockquote>
  <hr width="50%">
</blockquote>
<h2>Introduction</h2>
<p align="left">This document describes a number of data files in the Unicode 
Character Database. These are the derived data files: they contain information 
that can be completely derived from other data files, but present it in a 
different format for ease of use.</p>
<p align="left">The files themselves are informative, although they may contain 
normative properties. For more information, see UnicodeCharacterDatabase.html.</p>
<h2>Derived Core Properties</h2>
<p>The following are important derived properties of Unicode characters, and are 
contained in DerivedCoreProperties.txt.</p>
<div align="center">
  <center>
  <table border="1" cellspacing="0" cellpadding="3" class="smallText">
    <tr>
      <th valign="top" align="left">Property Value</th>
      <th valign="top">N/I</th>
      <th>Definition and Generation</th>
    </tr>
    <tr>
      <th valign="top" align="left">Math</th>
      <th valign="top">I</th>
      <td valign="top">Characters with the Math property. For more information, 
        see <a href="http://www.unicode.org/unicode/uni2book/ch04.pdf">Chapter 
        4, Character Properties</a>.<br>
        <i>Generated from: Sm + Other_Math</i></td>
    </tr>
    <tr>
      <th valign="top" align="left">Alphabetic</th>
      <th valign="top">I</th>
      <td valign="top">Characters with the Alphabetic property. For more 
        information, see <a href="http://www.unicode.org/unicode/uni2book/ch04.pdf">Chapter 
        4, Character Properties</a>.<br>
        <i>Generated from: Lu+Ll+Lt+Lm+Lo+ Other_Alphabetic</i></td>
    </tr>
    <tr>
      <th valign="top" align="left">Lowercase</th>
      <th valign="top">I</th>
      <td valign="top">Characters with the Lowercase property. For more 
        information, see <a href="http://www.unicode.org/unicode/uni2book/ch04.pdf">Chapter 
        4, Character Properties</a> and <a href="http://www.unicode.org/unicode/reports/tr21/">UTR 
        #21: Case Mappings</a>.<br>
        <i>Generated from: Ll + Other_Lowercase</i></td>
    </tr>
    <tr>
      <th valign="top" align="left">Uppercase</th>
      <th valign="top">I</th>
      <td valign="top">Characters with the Uppercase property. For more 
        information, see <a href="http://www.unicode.org/unicode/uni2book/ch04.pdf">Chapter 
        4, Character Properties</a> and <a href="http://www.unicode.org/unicode/reports/tr21/">UTR 
        #21: Case Mappings</a>.<br>
        <i>Generated from: Lu + Other_Uppercase</i></td>
    </tr>
    <tr>
      <th valign="top" align="left">ID_Start</th>
      <th valign="top">I</th>
      <td valign="top">Characters that can start an identifier.<br>
        <i>Generated from: Lu+Ll+Lt+Lm+Lo+Nl</i></td>
    </tr>
    <tr>
      <th valign="top" align="left">ID_Continue</th>
      <th valign="top">I</th>
      <td valign="top">Characters that can continue an identifier. See <a href="#Cf_Note">Cf 
        Note</a>.<br>
        <i>Generated from: ID_Start + Mn+Mc+Nd+Pc</i></td>
    </tr>
    <tr>
      <th valign="top" align="left">XID_Start</th>
      <th valign="top">I</th>
      <td valign="top">Same as ID_Start, except for modifications to allow 
        closure under normalization forms NFKC and NFKD.<br>
        <i>Generated from: ID_Start; see <a href="#Closure_Note">Closure Note</a></i></td>
    </tr>
    <tr>
      <th valign="top" align="left">XID_Continue</th>
      <th valign="top">I</th>
      <td valign="top">Same as ID_Continue, except for modifications to allow 
        closure under normalization forms NFKC and NFKD. See <a href="#Closure_Note">Closure 
        Note</a> and <a href="#Cf_Note">Cf Note</a>.<br>
        <i>Generated from: ID_Continue; see <a href="#Closure_Note">Closure Note</a></i></td>
    </tr>
  </table>
  </center>
</div>
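<p>The generation rules for ID_Start and ID_Continue in the table above can be 
sketched in a few lines of Python. This is only an illustration, not the 
procedure used to build DerivedCoreProperties.txt: it assumes Python's standard 
<code>unicodedata</code> module, which reflects whatever Unicode version the 
interpreter ships with rather than Version 3.1.1, and the function names are 
hypothetical.</p>

```python
import unicodedata

# General categories named in the "Generated from" lines of the table above.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA_CATEGORIES = {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch):
    """True if ch falls in one of the ID_Start general categories."""
    return unicodedata.category(ch) in ID_START_CATEGORIES


def is_id_continue(ch):
    """True if ch is ID_Start or falls in Mn, Mc, Nd, or Pc."""
    if is_id_start(ch):
        return True
    return unicodedata.category(ch) in ID_CONTINUE_EXTRA_CATEGORIES


def is_identifier(text):
    """True if text starts with ID_Start and continues with ID_Continue."""
    if not text:
        return False
    return is_id_start(text[0]) and all(is_id_continue(ch) for ch in text[1:])
```

<p>Under these rules <code>is_identifier("a_b")</code> holds, because the 
underscore has general category Pc, while <code>is_identifier("_a")</code> does 
not, because Pc is not among the ID_Start categories.</p>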
<blockquote>
  <p><b><a name="Closure_Note">Closure Note</a>: </b>XID_Start and XID_Continue 
  are defined by adding or removing certain special characters as per <a href="http://www.unicode.org/unicode/reports/tr15/#Programming%20Language%20Identifiers">UAX 
  #15, Annex 7</a>. They do <i><b>not</b></i> remove characters that are not in 
  NFKD or NFKC form; if that is desired, it needs to be a separate filter. They 
  merely ensure that:</p>
  <p align="center">if <code>isIdentifier(string)<br>
  </code>then <code>isIdentifier(NFKC(string))<br>
  </code>and <code>isIdentifier(NFKD(string))</code></p>
  <p><b><a name="Cf_Note">Cf Note</a>: </b>The general category Cf characters 
  are included in neither ID_Continue nor XID_Continue; they should be allowed 
  to continue identifiers, but should be filtered out of the result.</p>
</blockquote>
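<p>The two notes above can be illustrated with a small Python sketch. It 
assumes the standard <code>unicodedata</code> module for general categories and 
normalization; the function names are hypothetical, and the identifier test is 
passed in by the caller rather than defined here.</p>

```python
import unicodedata


def strip_format_characters(text):
    """Drop general category Cf characters, as the Cf Note suggests."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def closure_holds(text, is_identifier):
    """Check the Closure Note implication for a single string.

    is_identifier is a caller-supplied predicate (an assumption of this
    sketch). The implication is vacuously true when text is not an
    identifier in the first place.
    """
    if not is_identifier(text):
        return True
    return all(
        is_identifier(unicodedata.normalize(form, text))
        for form in ("NFKC", "NFKD")
    )
```

<p>For example, <code>closure_holds(name, str.isidentifier)</code> tests the 
implication against Python's own identifier rules, which follow the 
interpreter's Unicode version and not necessarily the data described here.</p>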
<p>For more information on identifiers, see <a href="http://www.unicode.org/unicode/uni2book/ch05.pdf">Chapter 
5, Implementation Guidelines</a>, and <a href="http://www.unicode.org/unicode/reports/tr15/#Programming%20Language%20Identifiers">UAX 
#15, Annex&nbsp;7</a>.</p>
<h2>Derived Extracted Properties</h2>
<p>The following files contain other properties of the UCD that are simply 
separated out and listed in range format. These files are provided purely as a 
reformatting of existing data, with certain exceptions listed below.</p>
<table border="1" width="100%" cellspacing="0" cellpadding="4">
  <tr>
    <th>&quot;.txt&quot; Files</th>
    <th valign="top">N/I</th>
    <th>Definition and Generation</th>
  </tr>
  <tr>
    <td valign="top">DerivedGeneralCategory</td>
    <td align="center" valign="top">N</td>
    <td>From UnicodeData.txt, field 2</td>
  </tr>
  <tr>
    <td valign="top">DerivedCombiningClass</td>
    <td align="center" valign="top">N</td>
    <td>From UnicodeData.txt, field 3</td>
  </tr>
  <tr>
    <td valign="top">DerivedBidiClass</td>
    <td align="center" valign="top">N</td>
    <td>From UnicodeData.txt, field 4</td>
  </tr>
  <tr>
    <td valign="top">DerivedDecompositionType</td>
    <td align="center" valign="top">*</td>
    <td>From the &lt;tag&gt; in UnicodeData.txt, field 5. For characters with 
      canonical decomposition mappings (no tag), the value &quot;canonical&quot; 
      is used.
      <p>* The value &quot;canonical&quot; is normative; the others are 
      informative.</p>
    </td>
  </tr>
  <tr>
    <td valign="top">DerivedNumericType</td>
    <td align="center" valign="top">N</td>
    <td>The property value is based on the contents of UnicodeData.txt, 
      fields 6 through&nbsp;8:
      <div align="center">
        <center>
        <table border="1" cellspacing="0" cellpadding="4">
          <tr>
            <th width="50%">property value</th>
            <th width="50%">non-empty fields</th>
          </tr>
          <tr>
            <td width="50%">decimal</td>
            <td width="50%">6, 7, &amp; 8</td>
          </tr>
          <tr>
            <td width="50%">digit</td>
            <td width="50%">7 &amp; 8</td>
          </tr>
          <tr>
            <td width="50%">numeric</td>
            <td width="50%">8</td>
          </tr>
        </table>
        </center>
      </div>
    </td>
  </tr>
  <tr>
    <td valign="top">DerivedNumericValues</td>
    <td align="center" valign="top">N</td>
    <td>From UnicodeData.txt, field 8</td>
  </tr>
  <tr>
    <td valign="top">DerivedBinaryProperties</td>
    <td align="center" valign="top">N</td>
    <td>From UnicodeData.txt, field 9. See <a href="#Bidi_Note">Bidi Note</a>.</td>
  </tr>
  <tr>
    <td valign="top">DerivedEastAsianWidth</td>
    <td align="center" valign="top">I</td>
    <td>From EastAsianWidth.txt, field 1</td>
  </tr>
  <tr>
    <td valign="top">DerivedLineBreak</td>
    <td align="center" valign="top">*</td>
    <td>From LineBreak.txt, field 1.
      <p>* Some values are normative; some are informative. See UAX #14: Line 
      Breaking Properties for more information.</p></td>
  </tr>
  <tr>
    <td valign="top">DerivedJoiningType</td>
    <td align="center" valign="top">N</td>
    <td>From ArabicShaping.txt, field 1</td>
  </tr>
  <tr>
    <td valign="top">DerivedJoiningGroup</td>
    <td align="center" valign="top">N</td>
    <td>From ArabicShaping.txt, field 2</td>
  </tr>
</table>
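<p>The DerivedNumericType rule in the table above amounts to checking which of 
fields 6 through 8 are non-empty. A minimal sketch, assuming one 
semicolon-separated UnicodeData.txt line as input and counting fields from 0; 
the function name is hypothetical:</p>

```python
def numeric_type(unicode_data_line):
    """Return 'decimal', 'digit', 'numeric', or None for one UnicodeData.txt line."""
    fields = unicode_data_line.rstrip("\n").split(";")
    decimal_value, digit_value, numeric_value = fields[6], fields[7], fields[8]
    if decimal_value and digit_value and numeric_value:
        return "decimal"  # fields 6, 7, and 8 are all non-empty
    if digit_value and numeric_value:
        return "digit"  # fields 7 and 8 are non-empty
    if numeric_value:
        return "numeric"  # only field 8 is non-empty
    return None  # the character has no numeric type


# U+0031 DIGIT ONE has all three fields filled in.
print(numeric_type("0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;"))  # prints "decimal"
```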
<blockquote>
  <p><b><a name="Bidi_Note">Bidi Note</a>:</b> The BidiMirrored property and the 
  BidiMirroring property are different. The former is a normative property that 
  indicates whether characters are mirrored in a right-to-left context in the 
  Unicode Bidirectional Algorithm. The latter is an informative mapping of 
  BidiMirrored characters, where possible, to characters that normally have the 
  corresponding mirrored glyph.</p>
</blockquote>
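<p>Because the extracted files are listed in range format, a reader for them 
can be quite small. The sketch below assumes the usual UCD layout, which this 
document does not spell out: a code point or a <code>first..last</code> range, 
a semicolon, the property value, and an optional comment introduced by 
<code>#</code>. The function name is hypothetical.</p>

```python
def parse_derived_lines(lines):
    """Yield (first, last, value) tuples from range-format property lines."""
    for raw in lines:
        data = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not data:
            continue  # blank line or comment-only line
        fields = [field.strip() for field in data.split(";")]
        code_points, value = fields[0], fields[1]
        first, _, last = code_points.partition("..")
        # A single code point is treated as a range of length one.
        yield int(first, 16), int(last or first, 16), value
```

<p>Passing an open file object, for instance one opened on 
DerivedGeneralCategory.txt, yields one tuple per data line; expanding a tuple 
with <code>range(first, last + 1)</code> gives the individual code points.</p>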
<h2>Derived Normalization Properties</h2>
<p>The properties in DerivedNormalizationProperties.txt are useful in dealing 
with normalization forms. In the following table, NF* refers to one of NFD, NFC, 
NFKC, or NFKD.</p>
<table border="1" cellspacing="0" cellpadding="3" class="smallText">
  <tr>
    <th align="left">Property Value</th>
    <th>N/I</th>
    <th>Definition and Generation</th>
  </tr>
  <tr>
    <th valign="top" align="left">FNC</th>
    <th valign="top">N</th>
    <td valign="top">Characters that require extra mappings for closure under 
      Case Folding plus Normalization Form KC. Characters marked with this 
      property have a third field with the mapping in it. Generated with the 
      following:<font face="Courier" size="2" color="#000000">
      <pre>b = NFKC(Fold(a));
c = NFKC(Fold(b));
if (c != b) add mapping from a to c</pre>
      </font></td>
  </tr>
  <tr>
    <th valign="top" align="left">Comp_Ex</th>
    <th valign="top">N</th>
    <td valign="top">Characters that are excluded from composition: those 
      explicitly in CompositionExclusions.txt, plus:<br>
      <i>(3) Singleton Decompositions</i><br>
      <i>(4) Non-Starter Decompositions</i></td>
  </tr>
  <tr>
    <th valign="top" align="left">NF*_NO</th>
    <th valign="top">N</th>
    <td valign="top">Characters that cannot ever occur in NF*. See <a href="#QuickCheck_Note">QuickCheck 
      Note</a>.</td>
  </tr>
  <tr>
    <th valign="top" align="left">NF*_MAYBE</th>
    <th valign="top">N</th>
    <td valign="top">Characters that may occur in valid NF*, depending on the 
      context. See <a href="#QuickCheck_Note">QuickCheck Note</a>.</td>
  </tr>
  <tr>
    <th valign="top" align="left">NF*_Expands</th>
    <th valign="top">N</th>
    <td valign="top">Characters that expand to more than one character in the 
      specified normalization form.</td>
  </tr>
</table>
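<p>The generation rule given for FNC in the table above can be tried out in 
Python. This is a sketch under two assumptions: <code>str.casefold()</code> 
stands in for <code>Fold</code>, and <code>unicodedata.normalize</code> stands 
in for <code>NFKC</code>. Both follow the interpreter's Unicode version, so the 
results need not match the Version 3.1.1 data file. The function name is 
hypothetical.</p>

```python
import unicodedata


def fnc_mapping(ch):
    """Return the extra mapping for ch, or None if no mapping is needed."""
    b = unicodedata.normalize("NFKC", ch.casefold())
    c = unicodedata.normalize("NFKC", b.casefold())
    # A mapping is recorded only when a second round of folding and
    # normalization still changes the result.
    return c if c != b else None
```

<p>For most characters, including ordinary letters such as <code>"A"</code>, 
the second pass changes nothing and the function returns <code>None</code>.</p>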
<blockquote>
  <p><b><a name="QuickCheck_Note">QuickCheck Note</a>:</b> A previous version of 
  this data was in NormalizationQuickCheck.txt. For more information, see <a href="http://www.unicode.org/unicode/reports/tr15/#Annex8">UAX 
  #15 Annex&nbsp;8</a>.</p>
</blockquote>
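<p>The NF*_NO and NF*_MAYBE properties are intended for a quick check along the 
lines of the one described in UAX #15. The following Python sketch shows the 
general shape of such a check. It assumes the caller has already loaded the 
NF*_NO and NF*_MAYBE characters for one normalization form into two sets, and 
it takes combining classes from the standard <code>unicodedata</code> module; 
the function and parameter names are hypothetical.</p>

```python
import unicodedata


def quick_check(text, no_set, maybe_set):
    """Return 'NO', 'MAYBE', or 'YES' for text against one normalization form.

    no_set and maybe_set hold the characters with the NF*_NO and NF*_MAYBE
    properties for that form; they are supplied by the caller.
    """
    result = "YES"
    last_class = 0
    for ch in text:
        combining_class = unicodedata.combining(ch)
        # A non-zero class lower than its predecessor means the combining
        # marks are not in canonical order.
        if combining_class != 0 and last_class > combining_class:
            return "NO"
        if ch in no_set:
            return "NO"
        if ch in maybe_set:
            result = "MAYBE"
        last_class = combining_class
    return result
```

<p>A result of <code>"MAYBE"</code> only says that the quick check is 
inconclusive; the text then has to be normalized and compared to settle the 
question.</p>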
<h2><i><a name="UCD_Terms">UCD Terms of Use</a></i></h2>
<h3><i>Disclaimer</i></h3>
<blockquote>
  <p><i>The Unicode Character Database is provided as is by Unicode, Inc. No 
  claims are made as to fitness for any particular purpose. No warranties of any 
  kind are expressed or implied. The recipient agrees to determine applicability 
  of information provided. If this file has been purchased on magnetic or 
  optical media from Unicode, Inc., the sole remedy for any claim will be 
  exchange of defective media within 90 days of receipt.</i></p>
  <p><i>This disclaimer is applicable for all other data files accompanying the 
  Unicode Character Database, some of which have been compiled by the Unicode 
  Consortium, and some of which have been supplied by other sources.</i></p>
</blockquote>
<h3><i>Limitations on Rights to Redistribute This Data</i></h3>
<blockquote>
  <p><i>Recipient is granted the right to make copies in any form for internal 
  distribution and to freely use the information supplied in the creation of 
  products supporting the Unicode<sup>TM</sup> Standard. The files in the 
  Unicode Character Database can be redistributed to third parties or other 
  organizations (whether for profit or not) as long as this notice and the 
  disclaimer notice are retained. Information can be extracted from these files 
  and used in documentation or programs, as long as there is an accompanying 
  notice indicating the source.</i></p>
</blockquote>
<hr width="50%">
<p align="center"><a href="http://www.unicode.org/unicode/copyright.html"><img src="http://www.unicode.org/img/hb_home.gif" border="0" alt="Home" width="40" height="49"><img src="http://www.unicode.org/img/hb_mid.gif" border="0" alt="Terms of Use" width="152" height="49"><img src="http://www.unicode.org/img/hb_mail.gif" border="0" alt="E-mail" width="46" height="49"></a>

</body>

</html>