<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> 
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 4.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta name="keywords" content="unicode, character properties, PropList">
<meta name="description" content="Describes PropList.txt">
<title>UCD: Extended Character Properties</title>
<link rel="stylesheet" type="text/css" href="http://www.unicode.org/reports/reports.css">
</head>

<body bgcolor="#ffffff">

<table class="header" width="100%">
  <tr>
    <td class="icon"><a href="http://www.unicode.org"><img align="middle" alt="[Unicode]" border="0" src="http://www.unicode.org/webscripts/logo60s2.gif" width="34" height="33"></a>&nbsp;&nbsp;<a class="bar" href="UnicodeCharacterDatabase.html">Unicode 
      Character Database</a></td>
  </tr>
  <tr>
    <td class="gray">&nbsp;</td>
  </tr>
</table>
<blockquote>
  <h1>Extended Character Properties</h1>
  <table border="1" style="width:100%">
    <tbody>
      <tr>
        <td valign="top" width="144">Revision</td>
        <td valign="top">3.2.0</td>
      </tr>
      <tr>
        <td valign="top" width="144">Authors</td>
        <td valign="top">Mark Davis</td>
      </tr>
      <tr>
        <td valign="top" width="144">Date</td>
        <td valign="top">2002-03-22</td>
      </tr>
      <tr>
        <td valign="top" width="144">This Version</td>
        <td valign="top"><a href="http://www.unicode.org/Public/3.2-Update/PropList-3.2.0.html">http://www.unicode.org/Public/3.2-Update/PropList-3.2.0.html</a></td>
      </tr>
      <tr>
        <td valign="top" width="144">Previous Version</td>
        <td valign="top"><a href="http://www.unicode.org/Public/3.1-Update1/PropList-3.1.1.html">http://www.unicode.org/Public/3.1-Update1/PropList-3.1.1.html</a></td>
      </tr>
      <tr>
        <td valign="top" width="144">Latest Version</td>
        <td valign="top"><a href="http://www.unicode.org/Public/UNIDATA/PropList.html">http://www.unicode.org/Public/UNIDATA/PropList.html</a></td>
      </tr>
    </tbody>
  </table>
  <h3><i><br>
  Summary</i></h3>
  <blockquote>
    <p><i>This document describes the format and content of the PropList.txt 
    data file in the Unicode Character Database (UCD).</i></p>
  </blockquote>
  <h3><i>Status</i></h3>
  <blockquote>
    <p><i>This file and the files described herein are part of the Unicode 
    Character Database and governed by the <a href="#UCD_Terms">UCD Terms of Use</a> 
    given below.</i></p>
    <p><i>For general information on file formats and table formats, and the 
    implications of normative vs. informative properties, see 
    <a href="UnicodeCharacterDatabase.html">UnicodeCharacterDatabase.html</a>.</i></p>
    <p><i><b>Warning: </b>the information in this file does not completely 
    describe the use and interpretation of Unicode character properties and 
    behavior. It must be used in conjunction with the data in the other files in 
    the Unicode Character Database, and relies on the notation and definitions 
    supplied in <a href="http://www.unicode.org/standard/standard.html">The 
    Unicode Standard</a>. All chapter references are to Version 3.2.0 of the 
    standard unless otherwise indicated.</i></p>
  </blockquote>
  <hr width="50%">
  <h2>Introduction</h2>
  <p align="left">PropList.txt contains extended properties that supplement the 
  General Category property described in UnicodeData.html. Unlike the derived 
  properties, the properties in PropList.txt cannot be derived directly from 
  UnicodeData.txt or other data files of the UCD. These properties are listed in 
  the following table, where the N/I column marks each property as normative (N) 
  or informative (I).</p>
  <p align="center"><i>All properties in this file are binary.</i></p>
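<p>As a sketch of how these binary properties can be read, the following Python code assumes the semicolon-delimited layout used by UCD data files: a single code point or an XXXX..YYYY range, a semicolon, the property name, and an optional # comment. The helper names parse_prop_list and has_property, and the sample lines, are hypothetical and only illustrative.</p>

```python
import re
from collections import defaultdict

# Assumed data-line layout once the trailing comment is removed:
# "XXXX" or "XXXX..YYYY", optional spaces, ";", optional spaces, property name.
DATA_LINE = re.compile(r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)$")


def parse_prop_list(lines):
    """Map each binary property name to a list of inclusive (start, end) ranges."""
    props = defaultdict(list)
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing "# ..." comment
        if not data:
            continue  # blank or comment-only line
        m = DATA_LINE.match(data)
        if m is None:
            raise ValueError("unrecognized line: %r" % line)
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        props[m.group(3)].append((start, end))
    return dict(props)


def has_property(props, name, cp):
    """True if code point cp lies in one of the ranges listed for name."""
    return any(cp in range(lo, hi + 1) for lo, hi in props.get(name, ()))


# Illustrative input in the assumed format (not copied from the real file).
SAMPLE = [
    "# Sample excerpt",
    "0030..0039    ; ASCII_Hex_Digit # Nd  [10] DIGIT ZERO..DIGIT NINE",
    "0041..0046    ; ASCII_Hex_Digit # Lu   [6] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER F",
    "",
    "200C..200D    ; Join_Control # Cf   [2] ZERO WIDTH NON-JOINER..ZERO WIDTH JOINER",
]

props = parse_prop_list(SAMPLE)
print(props["Join_Control"])  # [(8204, 8205)]
```

<p>Because every property in the file is binary, a code point either falls in one of a property's ranges or it does not, so only the ranges need to be stored.</p>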
  <blockquote>
    <p align="left"><b>Note: </b>The properties of the form Other_XXX are used 
    to generate properties in DerivedCoreProperties.txt. They are not intended 
    for general use, such as in APIs that return property values.</p>
  </blockquote>
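<p>The Other_XXX entries feed simple unions in DerivedCoreProperties.txt: for example, Lowercase is General Category Ll plus Other_Lowercase, Uppercase is Lu plus Other_Uppercase, and Math is Sm plus Other_Math. The Python sketch below shows that pattern, using the frozen Unicode 3.2.0 database that CPython ships as unicodedata.ucd_3_2_0 for the General Category lookup. The range lists are illustrative subsets rather than the full Other_XXX data, and the function name derive is hypothetical.</p>

```python
import unicodedata

# CPython bundles a frozen Unicode 3.2.0 database (used for IDNA), so these
# General Category lookups match the version this page documents.
UCD_32 = unicodedata.ucd_3_2_0


def derive(categories, other_ranges):
    """Build a predicate: General Category in `categories` OR listed in Other_XXX.

    other_ranges: inclusive (start, end) pairs for an Other_XXX property,
    as read from PropList.txt by the caller.
    """
    def predicate(cp):
        if UCD_32.category(chr(cp)) in categories:
            return True
        return any(cp in range(lo, hi + 1) for lo, hi in other_ranges)
    return predicate


# Illustrative subsets only; load the full Other_XXX lists from PropList.txt.
OTHER_LOWERCASE = [(0x2170, 0x217F)]  # SMALL ROMAN NUMERAL ONE..ONE THOUSAND (Nl)
OTHER_UPPERCASE = [(0x2160, 0x216F)]  # ROMAN NUMERAL ONE..ONE THOUSAND (Nl)

is_lowercase = derive({"Ll"}, OTHER_LOWERCASE)  # Lowercase = Ll + Other_Lowercase
is_uppercase = derive({"Lu"}, OTHER_UPPERCASE)  # Uppercase = Lu + Other_Uppercase
is_math = derive({"Sm"}, [])                    # Math = Sm + Other_Math (list omitted)
```

<p>On their own, the Other_XXX lists contain only the exceptions that General Category does not cover, which is why only the derived union is meaningful to expose through an API.</p>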
  <div align="center">
    <center>
    <table border="1" cellspacing="0" cellpadding="3" class="smallText">
      <tr>
        <th>Property</th>
        <th>N/I</th>
        <th>Definition and Usage</th>
      </tr>
      <tr>
        <th valign="top" align="left">White_Space</th>
        <th valign="top">N</th>
        <td valign="top">Space characters, line and paragraph separators, and 
          those control characters (such as TAB, CR, and LF) that should be 
          treated by programming languages as &quot;white space&quot; for the 
          purpose of parsing elements.
          <p><b>Note:</b> ZERO WIDTH SPACE and ZERO WIDTH NO-BREAK SPACE are not 
          included, since their functions are restricted to line-break control. 
          Their names are unfortunately misleading in this respect.</p>
          <p><b>Note: </b>There are other senses of &quot;whitespace&quot; that 
          encompass a different set of characters.</p>
        </td>
      </tr>
      <tr>
        <th valign="top" align="left">Bidi_Control</th>
        <th valign="top">N</th>
        <td valign="top">Those format control characters which have specific 
          functions in the Bidirectional Algorithm.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Join_Control</th>
        <th valign="top">N</th>
        <td valign="top">Those format control characters which have specific 
          functions for control of cursive joining and ligation.</td>
      </tr>
      <tr>
        <th valign="top" align="left">ASCII_Hex_Digit</th>
        <th valign="top">N</th>
        <td valign="top">ASCII characters commonly used for the representation 
          of hexadecimal numbers.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Dash</th>
        <th valign="top">I</th>
        <td valign="top">Those punctuation characters explicitly called out as 
          dashes in the Unicode Standard, plus compatibility equivalents to 
          those. Most of these have the Pd General Category, but some have the 
          Sm General Category because of their use in mathematics.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Hyphen</th>
        <th valign="top">I</th>
        <td valign="top">Those dashes used to mark connections between pieces of 
          words, plus the Katakana middle dot. The Katakana middle dot functions 
          like a hyphen, but is shaped like a dot rather than a dash.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Quotation_Mark</th>
        <th valign="top">I</th>
        <td valign="top">Those punctuation characters that function as quotation 
          marks.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Terminal_Punctuation</th>
        <th valign="top">I</th>
        <td valign="top">Those punctuation characters that generally mark the 
          end of textual units.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Other_Math</th>
        <th valign="top">I</th>
        <td valign="top">Used in deriving the Math property.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Hex_Digit</th>
        <th valign="top">I</th>
        <td valign="top">Characters commonly used for the representation of 
          hexadecimal numbers, plus their compatibility equivalents.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Other_Alphabetic</th>
        <th valign="top">I</th>
        <td valign="top">Used in deriving the Alphabetic property.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Ideographic</th>
        <th valign="top">I</th>
        <td valign="top">Characters considered to be CJKV (Chinese, Japanese, 
          Korean, and Vietnamese) ideographs.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Diacritic</th>
        <th valign="top">I</th>
        <td valign="top">Characters that linguistically modify the meaning of 
          another character to which they apply. Some diacritics are not 
          combining characters, and some combining characters are not 
          diacritics.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Extender</th>
        <th valign="top">I</th>
        <td valign="top">Characters whose principal function is to extend the 
          value or shape of a preceding alphabetic character. Typical of these 
          are length and iteration marks.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Other_Lowercase</th>
        <th valign="top">I</th>
        <td valign="top">Used in deriving the Lowercase property.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Other_Uppercase</th>
        <th valign="top">I</th>
        <td valign="top">Used in deriving the Uppercase property.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Noncharacter_Code_Point</th>
        <th valign="top">N</th>
        <td valign="top">Code points that are explicitly defined as illegal for 
          the encoding of characters. See <a href="http://www.unicode.org/unicode/reports/tr27/">Unicode 
          3.1</a> for more information.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Other_Grapheme_Extend</th>
        <th valign="top">N</th>
        <td valign="top">Used in deriving the Grapheme_Extend property.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Grapheme_Link</th>
        <th valign="top">N</th>
        <td valign="top">Used in determining default grapheme cluster 
          boundaries.
          <p>For more information, see <a href="http://www.unicode.org/unicode/reports/tr29/">UTR 
          #29: Text Boundaries</a> (in proposed draft status at publication of 
          Unicode 3.2).</td>
      </tr>
      <tr>
        <th valign="top" align="left">IDS_Binary_Operator<br>
          IDS_Trinary_Operator<br>
          Radical<br>
          Unified_Ideograph</th>
        <th valign="top">N</th>
        <td valign="top">Used to define the syntax of Ideographic Description 
          Sequences in machine-readable form.
          <p>For more information, see <a href="http://www.unicode.org/unicode/reports/tr28/">Unicode 
          3.2</a>.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Other_Default_Ignorable_Code_Point</th>
        <th valign="top">N</th>
        <td valign="top">Used in deriving the Default_Ignorable_Code_Point 
          property.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Deprecated</th>
        <th valign="top">N</th>
        <td valign="top">Provides a machine-readable list of deprecated characters. 
          No characters will ever be removed from the standard, but the use of 
          deprecated characters is strongly discouraged.
          <p>For more information, see <a href="http://www.unicode.org/unicode/reports/tr28/">Unicode 
          3.2</a>.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Soft_Dotted</th>
        <th valign="top">N</th>
        <td valign="top">Characters with a &quot;soft dot&quot;, such as <i>i</i> 
          and <i>j</i>. An accent placed on these characters causes the dot to 
          disappear. An explicit <i>dot above</i> can be added where required, 
          such as in Lithuanian.
          <p>For more information, see <a href="http://www.unicode.org/unicode/uni2book/ch07.pdf">Unicode 
          3.0, Chapter 7</a>, <i>Diacritics on i and j</i>.</td>
      </tr>
      <tr>
        <th valign="top" align="left">Logical_Order_Exception</th>
        <th valign="top">N</th>
        <td valign="top">A small number of characters, such as the Thai and Lao 
          prepended vowels, that are stored in visual order rather than logical 
          order. These characters require special handling in most processing.
          <p>For more information, see <a href="http://www.unicode.org/unicode/reports/tr28/">Unicode 
          3.2</a>.</td>
      </tr>
    </table>
    </center>
  </div>
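<p>Two of the normative properties above follow fixed rules that can be checked without the data file. The Python sketch below assumes the noncharacter definition introduced in Unicode 3.1 (U+FDD0..U+FDEF plus the last two code points of each of the 17 planes) and the 22 ASCII hexadecimal digit characters; the function names are hypothetical. All other properties should be read from PropList.txt itself.</p>

```python
def is_noncharacter(cp):
    """Noncharacter_Code_Point: U+FDD0..U+FDEF, plus U+xxFFFE and U+xxFFFF
    in each of the 17 planes (U+0000..U+10FFFF)."""
    if cp not in range(0x110000):
        raise ValueError("not a Unicode code point: %#x" % cp)
    return cp in range(0xFDD0, 0xFDF0) or cp % 0x10000 in (0xFFFE, 0xFFFF)


def is_ascii_hex_digit(cp):
    """ASCII_Hex_Digit: 0-9, A-F and a-f only (fullwidth forms belong to Hex_Digit)."""
    return cp in range(0x80) and chr(cp) in "0123456789ABCDEFabcdef"


# Enumerate every noncharacter in the code space.
NONCHARACTERS = [cp for cp in range(0x110000) if is_noncharacter(cp)]
```

<p>The list comprehension yields 66 noncharacters: 32 in U+FDD0..U+FDEF and 34 (two per plane) at U+xxFFFE and U+xxFFFF.</p>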
  <h2><i><a name="UCD_Terms"><br>
  UCD Terms of Use</a></i></h2>
  <h3><i>Disclaimer</i></h3>
  <blockquote>
    <p><i>The Unicode Character Database is provided as is by Unicode, Inc. No 
    claims are made as to fitness for any particular purpose. No warranties of 
    any kind are expressed or implied. The recipient agrees to determine 
    applicability of information provided. If this file has been purchased on 
    magnetic or optical media from Unicode, Inc., the sole remedy for any claim 
    will be exchange of defective media within 90 days of receipt.</i></p>
    <p><i>This disclaimer is applicable for all other data files accompanying 
    the Unicode Character Database, some of which have been compiled by the 
    Unicode Consortium, and some of which have been supplied by other sources.</i></p>
  </blockquote>
  <h3><i>Limitations on Rights to Redistribute This Data</i></h3>
  <blockquote>
    <p><i>Recipient is granted the right to make copies in any form for internal 
    distribution and to freely use the information supplied in the creation of 
    products supporting the Unicode<sup>TM</sup> Standard. The files in the 
    Unicode Character Database can be redistributed to third parties or other 
    organizations (whether for profit or not) as long as this notice and the 
    disclaimer notice are retained. Information can be extracted from these 
    files and used in documentation or programs, as long as there is an 
    accompanying notice indicating the source.</i></p>
  </blockquote>
  <hr width="50%">
  <p align="center"><a href="http://www.unicode.org/unicode/copyright.html"><img src="http://www.unicode.org/img/hb_home.gif" border="0" alt="Home" width="40" height="49"><img src="http://www.unicode.org/img/hb_mid.gif" border="0" alt="Terms of Use" width="152" height="49"><img src="http://www.unicode.org/img/hb_mail.gif" border="0" alt="E-mail" width="46" height="49"></a>
</blockquote>

</body>

</html>