<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

<html>
<head><base href="https://www.unicode.org/reports/tr31/tr31-43.html">


<title>UAX #31: Unicode Identifiers and Syntax</title>
<link rel="stylesheet" type="text/css"
	href="https://www.unicode.org/reports/reports-v2.css">
<style type="text/css">
    .higher-resolved-level {
        background-color: palegreen;
    }
</style>
</head>
<body>

	<table class="header">
		<tr>
          <td class="icon" style="width:38px; height:35px">
          <a href="https://www.unicode.org/">
          <img border="0" src="https://www.unicode.org/webscripts/logo60s2.gif" align="middle"
          alt="[Unicode]" width="34" height="33"></a>
          </td>

          <td class="icon" style="vertical-align:middle">
          <a class="bar"> </a>
          <a class="bar" href="https://www.unicode.org/reports/"><font size="3">Technical Reports</font></a>
          </td>
		</tr>
		<tr>
			<td colspan="2" class="gray">&nbsp;</td>
		</tr>
	</table>

	<div class="body">
		<h2 class="uaxtitle">Unicode® Standard Annex #31</h2>
		<h1>Unicode Identifiers and Syntax</h1>
		<table class="simple" width="90%">
			<tr>
				<td width="20%">Version</td>
				<td>Unicode 17.0.0</td>
			</tr>
			<tr>
				<td>Editors</td>
				<td>Mark Davis (<a href="mailto:mark@unicode.org">mark@unicode.org</a>)
				and Robin Leroy (<a href="mailto:eggrobin@unicode.org">eggrobin@unicode.org</a>)</td>
			</tr>
			<tr>
				<td>Date</td>
				<td>2025-08-20</td>
			</tr>
			<tr>
				<td>This Version</td>
				<td>
				<a href="https://www.unicode.org/reports/tr31/tr31-43.html">
				https://www.unicode.org/reports/tr31/tr31-43.html</a></td>
			</tr>
			<tr>
				<td>Previous Version</td>
				<td>
				<a href="https://www.unicode.org/reports/tr31/tr31-41.html">
				https://www.unicode.org/reports/tr31/tr31-41.html</a></td>
			</tr>
			<tr>
				<td>Latest Version</td>
				<td><a href="https://www.unicode.org/reports/tr31/">https://www.unicode.org/reports/tr31/</a></td>
			</tr>
			<tr>
				<td>Latest Proposed Update</td>
				<td><a href="https://www.unicode.org/reports/tr31/proposed.html">https://www.unicode.org/reports/tr31/proposed.html</a></td>
			</tr>
			<tr>
				<td>Revision</td>
				<td><a href="#Modifications">43</a></td>
			</tr>
		</table>

		<h4>Summary</h4>
		<p>
			<i>This annex describes specifications for recommended defaults
				for the use of Unicode in the definitions of general-purpose identifiers, immutable identifiers, hashtag identifiers, and in
				pattern-based syntax. It also supplies guidelines for use of
				normalization with identifiers.</i>
		</p>
		<h4>Status</h4>

		<!-- NOT YET APPROVED 
			<p class="changed"><i>This is a<b><font color="#ff3333"> draft </font></b>document
				which may be updated, replaced, or superseded by other documents at
				any time. Publication does not imply endorsement by the Unicode
				Consortium. This is not a stable document; it is inappropriate to
				cite this document as other than a work in progress.
			</i></p>
		 END NOT YET APPROVED -->
		<!-- APPROVED -->
			<p><i>This document has been reviewed by Unicode members and other
				interested parties, and has been approved for publication by the
				Unicode Consortium. This is a stable document and may be used as
				reference material or cited as a normative reference by other
				specifications.</i></p>
		<!-- END APPROVED -->

		<blockquote>
			<p>
				<i><b>A Unicode Standard Annex (UAX)</b> forms an integral part
					of the Unicode Standard, but is published online as a separate
					document. The Unicode Standard may require conformance to normative
					content in a Unicode Standard Annex, if so specified in the
					Conformance chapter of that version of the Unicode Standard. The
					version number of a UAX document corresponds to the version of the
					Unicode Standard of which it forms a part.</i>
			</p>
		</blockquote>
		<p>
			<i>Please submit corrigenda and other comments with the online
				reporting form [<a href="https://www.unicode.org/reporting.html">Feedback</a>].
				Related information that is useful in understanding this annex is
				found in Unicode Standard Annex #41, “<a
				href="https://www.unicode.org/reports/tr41/tr41-36.html">Common
					References for Unicode Standard Annexes</a>.” For the latest version of
				the Unicode Standard, see [<a
				href="https://www.unicode.org/versions/latest/">Unicode</a>]. For a
				list of current Unicode Technical Reports, see [<a
				href="https://www.unicode.org/reports/">Reports</a>]. For more
				information about versions of the Unicode Standard, see [<a
				href="https://www.unicode.org/versions/">Versions</a>]. For any
				errata which may apply to this annex, see [<a
				href="https://www.unicode.org/errata/">Errata</a>].
			</i>
		</p>

		<h4 class="contents">Contents</h4>
		<ul class="toc">
			<li>1 <a href="#Introduction">Introduction</a>
				<ul class="toc">
					<li>Figure 1. <a
						href="#Figure_Code_Point_Categories_for_Identifier_Parsing">Code Point Categories for Identifier Parsing</a></li>
					<li>1.1 <a href="#Stability">Stability</a>
						<ul class="toc">
							<li>Table 1. <a href="#Table_Permitted_Changes_in_Future_Versions">Permitted Changes in Future Versions</a></li>
						</ul>
					</li>
					<li>1.2 <a href="#Customization">Customization</a></li>
					<li>1.3 <a href="#Display_Format">Display Format</a></li>
					<li>1.4 <a href="#Conformance">Conformance</a></li>
					<li>1.5 <a href="#Notation">Notation</a></li>
				</ul>
			</li>
			<li>2 <a href="#Default_Identifier_Syntax">Default Identifiers</a>
				<ul class="toc">
					<li>Table 2. <a href="#Table_Lexical_Classes_for_Identifiers">Properties for Lexical Classes for Identifiers</a></li>
					<li>2.1 <a href="#Combining_Marks">Combining Marks</a></li>
					<li>2.2 <a href="#Modifier_Letters">Modifier Letters</a></li>
					<li>2.3 <a href="#Layout_and_Format_Control_Characters">Layout
							and Format Control Characters</a></li>
					<li>2.4 <a href="#Specific_Character_Adjustments">Specific
							Character Adjustments</a>
						<ul class="toc">
							<li>Table 3. <a href="#Table_Optional_Start">Optional
									Characters for Start</a></li>
							<li>Table 3a. <a href="#Table_Optional_Medial">Optional
									Characters for Medial</a></li>
							<li>Table 3b. <a href="#Table_Optional_Continue">Optional Characters for Continue</a></li>
							<li>Table 4. <a href="#Table_Candidate_Characters_for_Exclusion_from_Identifiers">Excluded Scripts</a></li>
							<li>Table 5. <a href="#Table_Recommended_Scripts">Recommended Scripts</a></li>
							<li>Table 6. <a href="#Aspirational_Use_Scripts">Aspirational Use Scripts</a> (Withdrawn)</li>
							<li>Table 7. <a href="#Table_Limited_Use_Scripts">Limited Use Scripts</a></li>
						</ul>
					</li>
					<li>2.5 <a href="#Backward_Compatibility">Backward
							Compatibility</a></li>
				</ul>
			</li>
			<li>3 <a href="#Immutable_Identifier_Syntax">Immutable Identifiers</a></li>
			<li>4 <a href="#Whitespace_and_Syntax">Whitespace and Syntax</a>
				<ul class="toc">
					<li>4.1 <a href="#Whitespace">Whitespace</a>
						<ul class="toc">
							<li>4.1.1 <a href="#Bidirectional_Ordering">Bidirectional Ordering</a></li>
							<li>4.1.2 <a href="#Required_Spaces">Required Spaces</a></li>
							<li>4.1.3 <a href="#Contexts_for_Ignorable_Format_Controls">Contexts for Ignorable Format Controls</a></li>
						</ul>
					</li>
					<li>4.2 <a href="#Syntax">Syntax</a>
						<ul class="toc">
							<li>4.2.1 <a href="#User-Defined_Operators">User-Defined Operators</a></li>
						</ul>
					</li>
					<li>4.3 <a href="#Pattern_Syntax">Pattern Syntax</a></li>
				</ul>
			</li>
			<li>5 <a href="#normalization_and_case">Normalization and
					Case</a>
				<ul class="toc">
					<li>5.1 <a href="#NFKC_Modifications">NFKC Modifications</a>
						<ul class="toc">
							<li>5.1.1 <a href="#Combining_Mark_Mods">Modifications
									for Characters that Behave Like Combining Marks</a></li>
							<li>5.1.2 <a href="#Irreg_Decomp_Mods">Modifications for
									Irregularly Decomposing Characters</a></li>
							<li>5.1.3 <a href="#Identifier_Closure">Identifier
									Closure Under Normalization</a>
								<ul class="toc">
									<li>Figure 5. <a href="#Figure_Normalization_Closure">Normalization Closure</a></li>
									<li>Figure 6. <a href="#Figure_Case_Closure">Case
											Closure</a></li>
									<li>Figure 7. <a href="#Figure_Reverse_Normalization_Closure">Reverse Normalization Closure</a></li>
									<li>Table 8. <a
										href="#Figure_Compatibility_Equivalents_to_Letters_or_Decimal_Numbers">Compatibility Equivalents to Letters or Decimal Numbers</a></li>
									<li>Table 9. <a
										href="#Figure_Canonical_Equivalence_Exceptions_Prior_to_Unicode_5.1">Canonical Equivalence Exceptions Prior to Unicode 5.1</a></li>
								</ul>
							</li>
						</ul>
					</li>
					<li>5.2 <a href="#Case_and_Stability">Case and Stability</a>
						<ul class="toc">
							<li>5.2.1 <a href="#Edge_Cases_for_Folding">Edge Cases
									for Folding</a></li>
						</ul>
					</li>
				</ul>
			</li>
			<li>6 <a href="#hashtag_identifiers">Hashtag Identifiers</a></li>
			<li>7 <a href="#Standard_Profiles">Standard Profiles</a>
				<ul class="toc">
				<li>7.1 <a href="#Mathematical_Compatibility_Notation_Profile">Mathematical Compatibility Notation Profile</a></li>
				<li>7.2 <a href="#Emoji_Profile">Emoji Profile</a></li>
				<li>7.3 <a href="#Default_Ignorable_Exclusion_Profile">Default Ignorable Exclusion Profile</a></li>
				</ul>
			</li>
			<li><a href="#Acknowledgments">Acknowledgments</a></li>
			<li><a href="#References">References</a></li>
			<li><a href="#Migration">Migration</a></li>
			<li><a href="#Modifications">Modifications</a></li>
		</ul>
		<hr>
		<h2>
			1 <a name="Introduction" href="#Introduction">Introduction</a>
		</h2>
		<p>
			A common task facing an implementer of the Unicode Standard is the
			provision of a parsing and/or lexing engine for identifiers, such as
			programming language variables or domain names.
				There are also realms where identifiers need to be defined with an extended set of
				characters to align better with what end users expect, such as in
				hashtags.
		</p>
		<p>
			To assist in the standard treatment of identifiers in Unicode
			character-based parsers and lexical analyzers, a set of
			specifications is provided here as a
				basis for parsing identifiers that contain Unicode characters. These specifications
				include:
		</p>
		<ul>
			<li><a
					href="#Default_Identifier_Syntax">Default Identifiers</a>: a
				recommended default for the definition of identifiers.</li>
			<li><a href="#Immutable_Identifier_Syntax">Immutable
					Identifiers</a>: for environments that need a definition of
				identifiers that does not change across versions of Unicode.</li>
			<li><a href="#hashtag_identifiers">Hashtag
					Identifiers</a>: for identifiers that need a broader set of
				characters, principally for hashtags.</li>
		</ul>
		<p>These guidelines follow the typical pattern of identifier
			syntax rules in common programming languages, by defining an ID_Start
			class and an ID_Continue class and using a simple BNF rule for
			identifiers based on those classes; however, the composition of those
			classes is more complex and contains additional types of characters,
			due to the universal scope of the Unicode Standard.</p>
		<p>
			This annex also provides guidelines for the use of normalization and
			case insensitivity with identifiers, expanding on a section that was
			originally in Unicode Standard Annex #15, “Unicode Normalization
			Forms” [<a href="../tr41/tr41-36.html#UAX15">UAX15</a>].
		</p>
		<p>
			Lexical analysis of computer languages is also concerned with lexical
			elements other than identifiers, and with white space and line breaks
			that separate them. This annex provides guidelines for the sets of
			characters that have such lexical significance outside of identifiers.
		</p>
		<p>
			The specification in this annex provides a definition of identifiers
			that is guaranteed to be backward compatible with each successive
			release of Unicode, but also allows any appropriate new Unicode
			characters to become available in identifiers. In addition, Unicode
			character properties for stable pattern syntax are provided. The
			resulting pattern syntax is backward compatible <i>and</i> forward
			compatible over future versions of the Unicode Standard. These
			properties can either be used alone or in conjunction with the
			identifier characters.
		</p>
		<p>
			<i>Figure 1</i> shows the disjoint categories of code points defined
			in this annex. (The sizes of the boxes are not to scale.)
		</p>
		<p class="caption">Figure 1. <a
						name="Figure_Code_Point_Categories_for_Identifier_Parsing"
						href="#Figure_Code_Point_Categories_for_Identifier_Parsing">Code
						Point Categories for Identifier Parsing</a></p>
		<div align="center">
			<table class="simple" cellpadding="20">
				<tr>
					<td style="vertical-align: middle; text-align: center">ID_Start<br>
						Characters
					</td>
					<td style="vertical-align: middle; text-align: center"
						class="lightblue">Pattern_Syntax<br> Characters
					</td>
					<td style="vertical-align: middle; text-align: center"
						class="medgray" rowspan="3" width="50%">Unassigned Code
						Points</td>
				</tr>
				<tr>
					<td style="vertical-align: middle; text-align: center">ID_Nonstart<br>
						Characters
					</td>
					<td style="text-align: center; vertical-align: middle"
						class="lightblue">Pattern_White_Space<br> Characters
					</td>
				</tr>
				<tr>
					<td style="text-align: center; vertical-align: middle" height="66"
						colspan="2" class="lightyellow">Other Assigned<br> Code
						Points<br></td>
				</tr>
			</table>
		</div>
		<p>
			The set consisting of the union of <i>ID_Start</i> and <i>ID_Nonstart</i>
			characters is known as <i>Identifier Characters</i> and has the
			property <i>ID_Continue</i>. The <i>ID_Nonstart</i> set is defined as
			the set difference <i>ID_Continue</i> minus <i>ID_Start</i>: it is
			not a formal Unicode property. While lexical rules are traditionally
			expressed in terms of <i>ID_Start</i> and <i>ID_Continue</i>, the discussion here is simplified
			by referring to disjoint categories.
		</p>
		<h3>
			1.1 <a name="Stability" href="#Stability">Stability</a>
		</h3>
		<p>There are certain features that developers can depend on for
			stability:</p>
		<ul>
			<li>Identifier characters, Pattern_Syntax characters, and
				Pattern_White_Space are disjoint: they will never overlap.</li>
			<li>By definition, the Identifier characters are always a superset of the
				ID_Start characters.</li>
			<li>The Pattern_Syntax characters and Pattern_White_Space
				characters are immutable and will not change over successive
				versions of Unicode.</li>
			<li>The ID_Start and ID_Nonstart characters may grow over time,
				either by the addition of new characters provided in a future
				version of Unicode or (in rare cases) by the addition of characters
				that were in Other.</li>
		</ul>
		<p>
			In successive versions of Unicode, the only allowed changes of
			characters from one of the above classes to another are those listed
			with a plus sign (+) in <i>Table 1.</i>
		</p>
		<p class="caption">Table 1. <a name="Table_Permitted_Changes_in_Future_Versions"
						href="#Table_Permitted_Changes_in_Future_Versions">Permitted
						Changes in Future Versions</a></p>
		<div align="center">
			<table class="subtle" style="border-top: none; border-left: none">
				<tr>
					<td width="25%" style="border-top: none; border-left: none">&nbsp;</td>
					<th width="25%" style="text-align: center">ID_Start</th>
					<th width="25%" style="text-align: center">ID_Nonstart</th>
					<th width="25%" style="text-align: center">Other Assigned</th>
				</tr>
				<tr>
					<th>Unassigned</th>
					<td style="text-align: center"><b> <font size="4">+</font></b></td>
					<td style="text-align: center"><b> <font size="4">+</font></b></td>
					<td style="text-align: center"><b> <font size="4">+</font></b></td>
				</tr>
				<tr>
					<th>Other Assigned</th>
					<td style="text-align: center"><b> <font size="4">+</font></b></td>
					<td style="text-align: center"><b> <font size="4">+</font></b></td>
					<td style="text-align: center">&nbsp;</td>
				</tr>
				<tr>
					<th>ID_Nonstart</th>
					<td style="text-align: center"><b> <font size="4">+</font></b></td>
					<td style="text-align: center">&nbsp;</td>
					<td style="text-align: center">&nbsp;</td>
				</tr>
			</table>
		</div>
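		<p>
			As an illustration only, the plus signs in <i>Table 1</i> can be encoded as a lookup table.
			The following Python sketch is not part of this specification; the names
			<code>PERMITTED_CHANGES</code> and <code>is_permitted_change</code> are hypothetical.
			Pattern_Syntax and Pattern_White_Space do not appear in the table because they are immutable.
		</p>

```python
# Categories from Figure 1 that take part in Table 1.
UNASSIGNED = "Unassigned"
OTHER = "Other Assigned"
ID_NONSTART = "ID_Nonstart"
ID_START = "ID_Start"

# Key: category of a code point in an earlier version of Unicode.
# Value: categories it may move to in a later version (the "+" cells).
PERMITTED_CHANGES = {
    UNASSIGNED: {ID_START, ID_NONSTART, OTHER},
    OTHER: {ID_START, ID_NONSTART},
    ID_NONSTART: {ID_START},
    ID_START: set(),  # ID_Start code points never leave ID_Start
}


def is_permitted_change(old: str, new: str) -> bool:
    """Return True if a code point may stay in `old` or move from `old` to `new`."""
    return old == new or new in PERMITTED_CHANGES.get(old, set())
```

		<p>
			A tool that compares two versions of the derived property data could apply such a check
			to every code point whose category differs between the two versions.
		</p>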
		<p>
			The Unicode Consortium has formally adopted a stability policy on
			identifiers. For more information, see [<a
				href="../tr41/tr41-36.html#Stability">Stability</a>].
		</p>
		<h3>
			1.2 <a name="Customization" href="#Customization">Customization</a>
		</h3>
		<p>Each programming language standard has its own identifier
			syntax; different programming languages have different conventions
			for the use of certain characters such as $, @, #, and _ in
			identifiers. To extend such a syntax to cover the full behavior of a
			Unicode implementation, implementers may combine those specific rules
			with the syntax and properties provided here.</p>
		<p>
			Each programming language can define its identifier syntax as <i>relative</i>
			to the Unicode identifier syntax, such as saying that identifiers are
			defined by the Unicode properties, with the addition of “$”. By
			addition or subtraction of a small set of language-specific
			characters, a programming language standard can easily track a
			growing repertoire of Unicode characters in a compatible way. See
			also <i>Section 2.5, <a href="#Backward_Compatibility">Backward
					Compatibility</a></i>.
		</p>
		<p>Similarly, each programming language can define its own
			whitespace characters or syntax characters relative to the Unicode
			Pattern_White_Space or Pattern_Syntax characters, with some specified
			set of additions or subtractions.</p>
		<p>
			Systems that want to extend identifiers to encompass words used in
			natural languages, or narrow identifiers for security, may do so as
			described in <i>Section 2.3, <a
				href="#Layout_and_Format_Control_Characters">Layout and Format
					Control Characters</a></i>, <i>Section 2.4, <a
				href="#Specific_Character_Adjustments">Specific Character
					Adjustments</a></i>, and <i>Section 5, <a
				href="#normalization_and_case">Normalization and Case</a></i>.
		</p>
		<p>
			To preserve the disjoint nature of the categories illustrated in <i>Figure
				1</i>, any character <i>added</i> to one of the categories must be <i>subtracted</i>
			from the others.
		</p>
		<blockquote>
			<p>
				<b>Note:</b> In many cases there are important
				security implications that may require additional constraints on
				identifiers. For more information, see [<a
					href="../tr41/tr41-36.html#UTR36">UTR36</a>].
			</p>
		</blockquote>
		<h3>
			1.3 <a name="Display_Format" href="#Display_Format">Display
				Format</a>
		</h3>
		<p>
			Implementations may use a format for <em>displaying</em> identifiers
			that differs from the internal form used to <em>compare</em>
			identifiers. For example, an implementation might display what
			the user has entered, but use a normalized format for comparison.
			Examples of this include:
		</p>
		<blockquote>
			<p>
				<strong>Case. </strong>The display format retains case differences,
				but the comparison format erases them by using Case_Folding. Thus
				“A” and its lowercase variant “a” would be treated as the same
				identifier internally, even though they may have been input
				differently and may display differently.
			</p>
			<p>
				<strong>Variants. </strong>The display format retains variant
				distinctions, such as halfwidth versus fullwidth forms, or between
				variation sequences and their base characters, but the comparison
				format erases them by using NFKC_Case_Folding. Thus “A” and its
				fullwidth variant “Ａ” would be treated as the same identifier
				internally, even though they may have been input differently and may
				display differently.
			</p>
		</blockquote>
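		<p>
			The separation between display and comparison forms can be sketched as follows.
			This Python example is illustrative only: <code>comparison_key</code> is a hypothetical name,
			and the function only approximates NFKC_Case_Folding by combining the standard library's
			NFKC normalization with <code>str.casefold()</code>. In particular, it does not remove
			default ignorable code points.
		</p>

```python
import unicodedata


def comparison_key(identifier: str) -> str:
    """Approximate comparison form: normalize, case fold, then normalize again.

    Assumption: this stands in for NFKC_Case_Folding; it does not strip
    default ignorable code points as the full operation would.
    """
    nfkc = unicodedata.normalize("NFKC", identifier)
    return unicodedata.normalize("NFKC", nfkc.casefold())


# The display form is whatever the user entered; only the key is compared.
displayed = ["A", "a", "\uFF21"]  # U+FF21 is FULLWIDTH LATIN CAPITAL LETTER A
keys = {comparison_key(name) for name in displayed}
```

		<p>
			Here all three display forms share one comparison key, so they would name the same identifier
			while still being shown as entered.
		</p>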
		<p>
			For an example of the use of display versus comparison formats see <em>UTS
				#46: Unicode IDNA Compatibility Processing</em> [<a
				href="../tr41/tr41-36.html#UTS46">UTS46</a>]. For more information
			about normalization and case in identifiers see <em>Section 5, <a
				href="#normalization_and_case">Normalization and Case</a></em>.
		</p>
		<h3>
			1.4 <a name="Conformance" href="#Conformance">Conformance</a>
		</h3>
		<p>The following describes the possible ways that an
			implementation can claim conformance to this specification.</p>
		<p>
			<b><a name="C1" href="#C1">UAX31-C1</a></b>. <i>An implementation
				claiming conformance to this specification shall identify the
				version of this specification.</i>
		</p>
		<blockquote>
			<b>Note:</b> An implementation can make use of the property-based definitions from a specific version of this
			specification with property assignments from an unversioned reference to the Unicode Character Database.
			In this case, the implementation should specify a minimum version of Unicode for the properties.
		</blockquote>
		<p>
			<b><a name="C2" href="#C2">UAX31-C2</a></b>. <i>An implementation
				claiming conformance to this specification shall describe which of
				the following requirements it observes:</i>
		</p>
		<ul>
			<li><a href="#R1">R1. Default Identifiers</a></li>
			<li><a href="#R1b">R1b. Stable Identifiers</a></li>
			<li><a href="#R2">R2. Immutable Identifiers</a></li>
			<li><a href="#R3">R3. Pattern_White_Space and Pattern_Syntax
					Characters</a></li>
			<li><a href="#R3a">R3a. Pattern_White_Space Characters</a></li>
			<li><a href="#R3b">R3b. Pattern_Syntax Characters</a></li>
			<li><a href="#R3c">R3c. Operator Identifiers</a></li>
			<li><a href="#R4">R4. Equivalent Normalized Identifiers</a></li>
			<li><a href="#R5">R5. Equivalent Case-Insensitive
					Identifiers</a></li>
			<li><a href="#R6">R6. Filtered Normalized Identifiers</a></li>
			<li><a href="#R7">R7. Filtered Case-Insensitive Identifiers</a></li>
			<li><a href="#R8">R8. Hashtag Identifiers</a></li>
		</ul>

		<blockquote>
			<b>Note:</b> Requirement <a href="#R1a">R1a</a> has been removed. The characters that were added when meeting
			this requirement are now part of the default; the contextual checks required by this
			requirement remain as part of the General Security Profile in Unicode Technical Standard #39, “Unicode Security Mechanisms” [<a href="../tr41/tr41-36.html#UTS39">UTS39</a>].
		</blockquote>

		<blockquote>
			<b>Note:</b> Meeting requirement R3 is equivalent to meeting requirements R3a and R3b.
		</blockquote>

		<h3>1.5 <a name="Notation" href="#Notation">Notation</a></h3>
		<p>This annex uses <em>UnicodeSet</em> notation to illustrate the derivation of
		some properties or sets of characters.
		This notation is defined in the
		<a href="https://www.unicode.org/reports/tr35/#Unicode_Sets">“Unicode Sets” section</a> of
		<i>UTS #35, Unicode Locale Data Markup Language</i>
		[<a href="../tr41/tr41-36.html#UTS35">UTS35</a>].</p>

		<h2>
			2 <a name="Default_Identifier_Syntax"
				href="#Default_Identifier_Syntax">Default Identifiers</a>
		</h2>

		<p>The formal syntax provided here captures the general intent
			that an identifier consists of a string of characters beginning with
			a letter or an ideograph, and followed by any number of letters,
			ideographs, digits, or underscores. It provides a definition of
			identifiers that is guaranteed to be backward compatible with each
			successive release of Unicode, but also adds any appropriate new
			Unicode characters.</p>
		<p>The formulations allow for extensions, also
			known as <em>profiles</em>. That is, the particular set of code points or sequences of code points for
			each category used by the syntax can be customized according to the
			requirements of the environment. Profiles are described
			as additions to or removals from the categories used by the syntax.
			They can thus be combined, provided that there are no conflicts (whereby one profile adds a character
			and another removes it), or that the resolution of such conflicts is specified.</p>
		<p>If such extensions include characters from Pattern_White_Space or
			Pattern_Syntax, then such identifiers do not conform to an unmodified
			<i><a href="#R3">UAX31-R3 Pattern_White_Space and Pattern_Syntax
				Characters</a></i>. However, such extensions may often be necessary. For
			example, Java and C++ identifiers include ‘$’, which is a
			Pattern_Syntax character.</p>
		<p>
			<b><a name="D1" href="#D1">UAX31-D1</a></b>. <b>Default
					Identifier Syntax:</b>
		</p>
		<blockquote>
			<p>
				<code>&lt;Identifier&gt; := &lt;Start&gt; &lt;Continue&gt;*
					(&lt;Medial&gt; &lt;Continue&gt;+)*</code>
			</p>
		</blockquote>
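		<p>
			The rule above can be read as a small matcher. The following Python sketch is illustrative only
			and is not part of this specification. The names <code>is_xid_start</code>,
			<code>is_xid_continue</code>, and <code>matches_d1</code> are hypothetical. The sketch assumes that
			Python's <code>str.isidentifier()</code> reflects XID_Start and XID_Continue for the version of
			Unicode bundled with the interpreter, and that Medial is disjoint from Start and Continue.
		</p>

```python
from typing import Callable

CharTest = Callable[[str], bool]


def is_xid_start(ch: str) -> bool:
    # Assumption: str.isidentifier() accepts XID_Start plus "_" in first
    # position, so "_" is excluded to approximate XID_Start alone.
    return ch != "_" and ch.isidentifier()


def is_xid_continue(ch: str) -> bool:
    # Assumption: after a valid start character, str.isidentifier()
    # accepts exactly the XID_Continue characters.
    return ("a" + ch).isidentifier()


def matches_d1(
    s: str,
    start: CharTest = is_xid_start,
    cont: CharTest = is_xid_continue,
    medial: CharTest = lambda ch: False,  # Medial is empty by default
) -> bool:
    """Check <Start> <Continue>* (<Medial> <Continue>+)* one code point at a time."""
    if not s or not start(s[0]):
        return False
    i, n = 1, len(s)
    while i < n and cont(s[i]):  # <Continue>*
        i += 1
    while i < n:  # (<Medial> <Continue>+)*
        if not medial(s[i]):
            return False
        i += 1
        if i >= n or not cont(s[i]):  # at least one Continue must follow
            return False
        while i < n and cont(s[i]):
            i += 1
    return True
```

		<p>
			With the defaults, <code>matches_d1("x1")</code> holds and <code>matches_d1("1x")</code> does not.
			Passing <code>medial=lambda ch: ch == "-"</code> accepts <code>"a-b"</code> but still rejects
			<code>"a-"</code> and <code>"a--b"</code>, because each Medial character must be followed by at least one Continue character.
		</p>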


		<p>
			Identifiers are defined by assigning the
			sets of lexical classes defined as properties in the Unicode
			Character Database [<a href="../tr41/tr41-36.html#UAX44">UAX44</a>].
			These properties are shown in <i>Table 2</i>. The
			first column shows the property name, whose values are defined in
			the UCD. The second column provides a general description of the
			coverage for the associated class, the derivational relationship
			between the ID properties and the XID properties, and an associated
			UnicodeSet notation for the class.
		</p>
		<p class="caption">Table 2. <a name="Table_Lexical_Classes_for_Identifiers"
						href="#Table_Lexical_Classes_for_Identifiers">Properties for Lexical Classes for
						Identifiers</a></p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Properties</th>
					<th>General Description of Coverage</th>
				</tr>
				<tr>
					<td><code>ID_Start </code></td>
					<td><code>ID_Start</code> characters
						are derived from the Unicode
						General_Category of uppercase letters, lowercase letters,
						titlecase letters, modifier letters, other letters, letter
						numbers, plus Other_ID_Start, minus Pattern_Syntax and
						Pattern_White_Space code points.<br> <br>In UnicodeSet notation:<br>
						[\p{L}\p{Nl}\p{Other_ID_Start}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]</td>
				</tr>
				<tr>
					<td><code>XID_Start</code></td>
					<td><code>XID_Start</code> characters are
						derived from <code>ID_Start</code> as per <i>Section 5.1, <a
							href="#NFKC_Modifications">NFKC Modifications</a></i>.</td>
				</tr>
				<tr>
					<td><code>ID_Continue</code></td>
					<td><code>ID_Continue</code>
							characters include ID_Start characters, plus characters having the
						Unicode General_Category of nonspacing marks, spacing combining
						marks, decimal number, connector punctuation, plus
						Other_ID_Continue, minus Pattern_Syntax and Pattern_White_Space
						code points.<br> <br>In UnicodeSet notation:<br>
						[\p{ID_Start}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Other_ID_Continue}-\p{Pattern_Syntax}-\p{Pattern_White_Space}]</td>
				</tr>
				<tr>
					<td><code>XID_Continue</code></td>
					<td><code>XID_Continue</code>
							characters are derived from <code>ID_Continue</code> as per <i>Section
								5.1, <a href="#NFKC_Modifications">NFKC Modifications</a></i>.<br> <br>
								<code>XID_Continue</code>
							characters are also known simply as <i>Identifier Characters</i>,
						because they are a superset of the <code>XID_Start</code> characters.</td>
				</tr>
			</table>
		</div>
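		<p>
			The derivation of <code>ID_Start</code> in <i>Table 2</i> can be sketched from General_Category values.
			The following Python example is illustrative only; implementations should use the derived property
			data in the Unicode Character Database rather than recomputing it. The function name
			<code>derived_id_start</code> is hypothetical. The Other_ID_Start and Pattern_White_Space sets are
			hard-coded here on the assumption that they match the current property data, and Pattern_Syntax is
			passed in as a parameter (empty by default) rather than reproduced.
		</p>

```python
import unicodedata

# General_Category values named in the ID_Start row: \p{L} and \p{Nl}.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# Assumption: contents of Other_ID_Start, hard-coded for this sketch.
OTHER_ID_START = {"\u1885", "\u1886", "\u2118", "\u212E", "\u309B", "\u309C"}

# Assumption: contents of Pattern_White_Space, hard-coded for this sketch.
PATTERN_WHITE_SPACE = set("\t\n\x0b\x0c\r \x85\u200e\u200f\u2028\u2029")


def derived_id_start(ch: str, pattern_syntax: frozenset = frozenset()) -> bool:
    """Apply [\\p{L}\\p{Nl}\\p{Other_ID_Start}-\\p{Pattern_Syntax}-\\p{Pattern_White_Space}].

    `pattern_syntax` should hold the Pattern_Syntax code points; it is left
    empty by default, so that subtraction is a no-op unless the caller supplies it.
    """
    included = unicodedata.category(ch) in START_CATEGORIES or ch in OTHER_ID_START
    excluded = ch in pattern_syntax or ch in PATTERN_WHITE_SPACE
    return included and not excluded
```

		<p>
			The result depends on the version of Unicode supported by the <code>unicodedata</code> module,
			so it may differ from the published property for recently added characters.
		</p>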
		<p>
			Note that “other letters” includes ideographs. For more about the
			stability extensions, see <em>Section 2.5 <a
				href="#Backward_Compatibility">Backward Compatibility</a></em>.<br>
		</p>
		<p>The innovations in the identifier syntax to cover the Unicode
			Standard include the following:</p>
		<ul>
			<li>Incorporation of proper handling of combining marks.</li>
			<li>Allowance for layout and format control characters, which
				should be ignored when parsing identifiers.</li>
		</ul>

		<p>
			The XID_Start and XID_Continue properties are improved lexical
			classes that incorporate the changes described in <i>Section 5.1,
				<a href="#NFKC_Modifications">NFKC Modifications</a></i>.
				They are recommended for most purposes, especially for security,
			over the original ID_Start and ID_Continue properties.
		</p>

		<p>
			<b><a name="R1" href="#R1">UAX31-R1</a></b>. <b>Default
					Identifiers:</b> <i>To meet this requirement, to determine whether a string
				is an identifier an implementation shall
				choose either <a href="#R1-1">UAX31-R1-1</a> or <a href="#R1-2">UAX31-R1-2</a>.</i>
		</p>
		<p>
			<b><a name="R1-1" href="#R1-1">UAX31-R1-1</a></b>.
			<i>Use definition <a href="#D1">UAX31-D1</a>, setting Start and
					Continue to the properties XID_Start and XID_Continue, respectively, and leaving Medial empty.</i>
		</p>
		<p>
			<b><a name="R1-2" href="#R1-2">UAX31-R1-2</a></b>.
			<i>Declare that it uses a <b>profile</b>
					of <a href="#R1-1">UAX31-R1-1</a>
					and define that profile with a precise specification of the
					characters and character sequences that are added to or removed from Start,
						Continue, and Medial and/or provide a list of additional
					constraints on identifiers.
			</i>
		</p>
		<blockquote>
			<b>Note:</b> Such a specification may incorporate a reference to one or more of the
			standard profiles described in <i>Section 7, <a href="#Standard_Profiles">Standard
			Profiles</a></i>.
		</blockquote>
		<p>One such profile may
		  be to use the contents of ID_Start and ID_Continue in place of
		  XID_Start and XID_Continue, for backward compatibility.</p>
	  <p>Another such profile would be to include  some set of
		    the optional characters, for example:
      <ul>
		      <li>Start := XID_Start, plus some characters
		        from <a href="#Table_Optional_Start">Table 3</a></li>
		      <li>Continue := Start + XID_Continue, plus some
		        characters from <a href="#Table_Optional_Continue">Table 3b</a></li>
		      <li>Medial := some characters from <a
							href="#Table_Optional_Medial">Table 3a</a></li>
      </ul>
      	<blockquote>
      		<p>
      		<b>Note:</b> Characters in the Medial class must not overlap with those in
      		either the Start or Continue classes.
      		Thus, any characters added to the Medial class from <i><a href="#Table_Optional_Medial">Table 3a</a></i>
			must be checked to ensure they do not also occur in either the newly defined Start class
			or Continue class.
			</p>
		</blockquote>
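		<p>
			A profile of this kind, together with the overlap check described in the note, might be sketched as follows.
			This Python example is illustrative only. The added characters are assumptions chosen for the example
			(<code>$</code> and <code>_</code> as additions to Start, <code>-</code> as a Medial character);
			the tables referenced above are the authoritative lists. All names are hypothetical, and the sketch assumes that
			<code>str.isidentifier()</code> reflects XID_Start and XID_Continue for the interpreter's version of Unicode.
		</p>

```python
def is_xid_start(ch: str) -> bool:
    # Assumption: str.isidentifier() accepts XID_Start plus "_" in first position.
    return ch != "_" and ch.isidentifier()


def is_xid_continue(ch: str) -> bool:
    # Assumption: characters accepted after a valid start are XID_Continue.
    return ("a" + ch).isidentifier()


# Hypothetical profile: characters added to each class.
EXTRA_START = {"$", "_"}
EXTRA_CONTINUE = set()
MEDIAL = {"-"}


def in_start(ch: str) -> bool:
    """Start := XID_Start, plus the profile's additions."""
    return is_xid_start(ch) or ch in EXTRA_START


def in_continue(ch: str) -> bool:
    """Continue := Start + XID_Continue, plus the profile's additions."""
    return in_start(ch) or is_xid_continue(ch) or ch in EXTRA_CONTINUE


def medial_conflicts(medial: set) -> set:
    """Return the Medial candidates that also occur in Start or Continue."""
    return {ch for ch in medial if in_start(ch) or in_continue(ch)}
```

		<p>
			A profile definition could refuse to load when <code>medial_conflicts(MEDIAL)</code> is not empty.
			For instance, proposing <code>_</code> as a Medial character in this profile would be reported as a conflict,
			because it is already in Start.
		</p>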

		<p>
			Beyond such minor modifications, profiles could also be used to significantly extend the
			character set available in identifiers.
			In so doing, care must be taken not to unintentionally include undesired characters,
			or to violate important invariants.
		</p>
		<p>
			An implementation should be careful when adding a property-based set to a profile.
		</p>
		<p>
			For example, consider a profile that adds subscript and superscript digits and
			operators in order to support technical notations, such as:</p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Context</th>
					<th>Example Identifier</th>
				</tr>
				<tr>
					<td>Assyriology</td>
					<td><code>dun₃⁺</code></td>
				</tr>
				<tr>
					<td>Chemistry</td>
					<td><code>Ca²⁺_concentration</code></td>
				</tr>
				<tr>
					<td>Mathematics</td>
					<td><code>xₖ₊₁</code> <i>or</i> <code>f⁽⁴⁾</code></td>
				</tr>
				<tr>
					<td>Phonetics</td>
					<td><code>daan⁶</code></td>
				</tr>
			</table>
		</div>
		<p>
			That profile may be described as adding the following set to XID_Continue:
		</p>
			<blockquote>
				<code>[⁽₍⁾₎⁺₊⁼₌⁻₋⁰₀¹₁²₂³₃⁴₄⁵₅⁶₆⁷₇⁸₈⁹₉]</code>.
			</blockquote>
			<blockquote>
				<b>Note:</b> The above list is for illustration only.
				A standard profile is provided to support the use of such mathematical notation in identifiers.
				See <i>Section 7.1, <a href="#Mathematical_Compatibility_Notation_Profile">Mathematical Compatibility Notation Profile</a></i>.
			</blockquote>
		<p>
			If, instead of listing these characters explicitly, the profile had chosen to use
			properties or combinations of properties, that might result in including
			undesired characters.
		</p>
		<p>
			For example, <code>\p{General_Category=Other_Number}</code> is the general category set
			containing the subscript and superscript digits.
			But it also includes the compatibility characters [<code>⑴ 🄂 ⒈</code>], which are
			not needed for technical notations,
			and are very likely inappropriate for identifiers—on multiple counts.
		</p>
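		<p>
			The difference between the explicit list and the property-based set can be inspected directly.
			The following sketch assumes Python's standard <code>unicodedata</code> module, so the result reflects
			the Unicode version bundled with the interpreter; the names <code>EXPLICIT_DIGITS</code>,
			<code>OTHER_NUMBER</code>, and <code>UNINTENDED</code> are illustrative only.
		</p>

```python
import sys
import unicodedata

# The subscript and superscript digits from the explicit list in the text.
EXPLICIT_DIGITS = set("⁰₀¹₁²₂³₃⁴₄⁵₅⁶₆⁷₇⁸₈⁹₉")

# Every code point with General_Category=Other_Number (No) in the
# Unicode version bundled with this Python build.
OTHER_NUMBER = {
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "No"
}

# Characters the property pulls in beyond the intended digits.
UNINTENDED = OTHER_NUMBER - EXPLICIT_DIGITS

print(len(EXPLICIT_DIGITS), "intended,", len(UNINTENDED), "unintended")
```

		<p>
			The unintended remainder includes the compatibility characters mentioned above,
			such as U+2474, U+2488, and U+1F102.
		</p>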
		<p>
			On the other hand, a language that allows currency symbols in identifiers could have
			<code>\p{General_Category=Currency_Symbol}</code> as a profile,
			since that property matches the intent.
		</p>
		<p>
			Similarly, a profile based on adding entire blocks is likely to include unintended characters,
			or to miss ones that are desired.
			For the use of blocks see <i>Annex A, Character Blocks</i>,
			in [<a href="../tr41/tr41-36.html#UTS18">UTS18</a>].
		</p>
		<p>
			Defining a profile by use of a property also needs to take account of the fact that
			unless the property is designed to be stable (such as XID_Continue),
			code points could be removed in a future version of Unicode.
			If the profile also needs stable identifiers (backwards compatible),
			then it must take additional measures.
			See <i><a href="#R1b">UAX31-R1b Stable Identifiers</a></i>.
		</p>
		<p>
			Implementations that require identifier closure
			under normalization should ensure that any custom profile preserves identifier closure
			under the chosen normalization form. See
			<i>Section 5.1.3, <a href="#Identifier_Closure">Identifier Closure Under Normalization</a></i>. The example cited above regarding subscripts and superscripts preserves identifier closure under
			Normalization Forms C and D, but <em>not</em> under Forms KC and KD.
			Under NFKC and NFKD, the subscript and superscript parentheses and operators normalize
			to their ASCII counterparts.
			If an implementation that uses this profile relies on identifier closure under normalization, it
			should conform to <a href="#R4">UAX31-R4</a> using NFC, not NFKC.
		</p>
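		<p>
			The closure behavior described above can be checked with a small sketch.
			It assumes Python's <code>unicodedata.normalize</code> and again uses
			<code>str.isidentifier()</code> as an approximation of XID_Start and XID_Continue;
			the helper names <code>is_profile_identifier</code> and <code>closed_under</code> are hypothetical.
		</p>

```python
import unicodedata

# Extra Continue characters from the illustrative profile in the text.
PROFILE_CONTINUE = set("⁽₍⁾₎⁺₊⁼₌⁻₋⁰₀¹₁²₂³₃⁴₄⁵₅⁶₆⁷₇⁸₈⁹₉")


def is_profile_identifier(s: str) -> bool:
    # Assumption: str.isidentifier() approximates XID_Start XID_Continue*
    # (CPython additionally accepts "_" as a start character).
    if not s or not s[0].isidentifier():
        return False
    return all(
        ("a" + ch).isidentifier() or ch in PROFILE_CONTINUE
        for ch in s[1:]
    )


def closed_under(form: str, ident: str) -> bool:
    """True if normalizing this identifier still yields an identifier."""
    return is_profile_identifier(unicodedata.normalize(form, ident))


for ident in ("Ca²⁺", "xₖ₊₁", "daan⁶"):
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(ident, form, closed_under(form, ident))
```

		<p>
			Identifiers that use only the subscript or superscript digits remain identifiers under NFKC,
			because the digits normalize to ASCII digits; it is the operators and parentheses that break closure.
		</p>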
		<blockquote>
			<b>Note:</b> While default identifiers are less open-ended than immutable identifiers,
			they are still subject to spoofing issues arising from invisible characters,
			visually identical characters, or bidirectional reordering causing distinct sequences to appear
			in the same order.
			Where spoofing concerns are relevant, the mechanisms described in
			Unicode Technical Standard #39, “Unicode Security Mechanisms” [<a href="../tr41/tr41-36.html#UTS39">UTS39</a>],
			should be used.
			For the specific case of programming languages and programming environments,
			recommendations are provided in
			Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>].
		</blockquote>
	    <p>
			<b><a name="R1a" href="#R1a">UAX31-R1a</a></b>. <b>Restricted
					Format Characters:</b> <i>This clause has been removed.</i></p>
					<p>The characters that were added when meeting
			this requirement are now part of the default; the contextual checks required by this
			requirement remain as part of the General Security Profile in Unicode Technical Standard #39, “Unicode Security Mechanisms” [<a href="../tr41/tr41-36.html#UTS39">UTS39</a>].
		</p>

		<p>
			<b><a name="R1b" href="#R1b">UAX31-R1b</a></b>. <b>Stable
					Identifiers:</b> <i>To meet this requirement, an implementation shall
				guarantee that identifiers are stable across versions of the Unicode
				Standard: that is, once a string qualifies as an identifier, it does
				so in all future versions of the Unicode Standard.</i>
		</p>

		<blockquote>
			<p>
				<b>Note:</b> The UAX31-R1b requirement is
				relevant when an identifier definition is based on property assignments from an
				unversioned reference to the Unicode Standard, as property assignments may
				change in a future version of the standard. It is typically achieved by using
				a small list of characters that qualified as identifier characters
				in some previous version of Unicode.
				See <i>Section 2.5, <a
					href="#Backward_Compatibility">Backward Compatibility</a></i>.
					Where profiles are allowed,
					management of those profiles may also be required to guarantee backwards
					compatibility. Typically such management also uses
					a list of characters that qualified previously.
					Because of the stability policy [<a href="../tr41/tr41-36.html#Stability">Stability</a>],
					if an implementation meets either requirement
					<a href="#R1">UAX31-R1</a> or <a href="#R2">UAX31-R2</a> without declaring a
					profile, that implementation also meets requirement UAX31-R1b.
			</p>
		</blockquote>
		<blockquote>
			<p>
				<b>Example:</b> Consider an identifier definition which uses
				<a href="#R1">UAX31-R1</a> default identifiers with a profile that adds digits
				(characters with General_Category=Nd) to the set <i>Start</i>, and uses an
				unversioned reference to the Unicode Character Database,
				with a minimum version of 5.2.0.
			</p>
			<p>
				With property assignments from Unicode Version 5.2.0, both
				<code>᧚</code> (U+19DA) and <code>A᧚</code> (U+0041, U+19DA) are valid identifiers
				under this definition: U+19DA has General_Category=Nd.
			</p>
			<p>
				In Unicode Version 6.0.0, U+19DA has General_Category=No.
				The identifier <code>A᧚</code> (U+0041, U+19DA)
				remains valid, because XID_Continue includes any characters that used to be XID_Continue.
				However, <code>᧚</code> is not a valid identifier, because U+19DA is no
				longer in the set [:Nd:].
			</p>
			<p>
				In order to meet requirement <a href="#R1b">UAX31-R1b</a>, the definition would
				need to be changed to add to the set <i>Start</i> all characters that have the
				property General_Category=Nd in any version of Unicode starting from Unicode 5.2.0
				and up to the version used by the implementation.
			</p>
		</blockquote>
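		<p>
			One way to picture the fix in the example is a grandfather list that is consulted alongside the current property value.
			The sketch below is illustrative only: <code>GRANDFATHERED_ND</code> is a hypothetical list containing just the
			character from the example, whereas a real list would be derived by comparing the Unicode Character Database
			files of every supported version. It assumes Python's <code>unicodedata</code> module bundles Unicode 6.0 or later.
		</p>

```python
import unicodedata

# Hypothetical grandfather list: code points that had General_Category=Nd
# in some supported Unicode version (5.2.0 onward) but no longer do.
# Only the character from the example is listed here.
GRANDFATHERED_ND = {"\u19DA"}  # NEW TAI LUE THAM DIGIT ONE


def in_profile_start_addition(ch: str) -> bool:
    """Digits the profile adds to Start, stable across Unicode versions."""
    return unicodedata.category(ch) == "Nd" or ch in GRANDFATHERED_ND


# U+19DA is no longer Nd, but stays valid through the grandfather list.
print(unicodedata.category("\u19DA"), in_profile_start_addition("\u19DA"))
```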
		<h3>
			2.1 <a name="Combining_Marks" href="#Combining_Marks">Combining
				Marks</a>
		</h3>
		<p>
			Combining marks are accounted for in identifier syntax: a composed
			character sequence consisting of a base character followed by any
			number of combining marks is valid in an identifier. Combining marks
			are required in the representation of many languages, and the
			conformance rules in <i>Chapter 3, Conformance</i>, of [<a
				href="../tr41/tr41-36.html#Unicode">Unicode</a>] require the
			interpretation of canonical-equivalent character sequences. The
			simplest way to do this is to require identifiers in the NFC format
			(or transform them into that format); see <i>Section 5, <a
				href="#normalization_and_case">Normalization and Case</a></i>.
		</p>
		<p>
			Enclosing combining marks (such as U+20DD..U+20E0) are excluded from
			the definition of the
			lexical class
			<code>ID_Continue</code>,
			because the composite characters that result from their composition
			with letters are themselves not normally considered valid
			constituents of these identifiers.
		</p>
		<h3>
			2.2 <a name="Modifier_Letters" href="#Modifier_Letters">Modifier
				Letters</a>
		</h3>
		<p>
			Modifier letters (General_Category=Lm) are also included in the
			definition of the syntax classes for identifiers. Modifier letters
			are often part of natural language orthographies and are useful for
			making word-like identifiers in formal languages. On the other hand,
			modifier symbols (General_Category=Sk), which are seldom a part of
			language orthographies, are excluded from identifiers. For more
			discussion of modifier letters and how they function, see [<a
				href="../tr41/tr41-36.html#Unicode">Unicode</a>].
		</p>
		<p>Implementations that tailor identifier syntax for special
			purposes may wish to take special note of modifier letters, as in
			some cases modifier letters have appearances, such as raised commas,
			which may be confused with common syntax characters such as quotation
			marks.</p>
		<h3>
			2.3 <a name="Layout_and_Format_Control_Characters"
				href="#Layout_and_Format_Control_Characters">Layout and Format
				Control Characters</a>
		</h3>

		<p>
			Certain Unicode characters are known as
			Default_Ignorable_Code_Points. These include variation selectors and
			characters used to control joining behavior, bidirectional ordering
			control, and alternative formats for display (having the
			General_Category value of Cf). The use of
			default-ignorable characters in identifiers is problematic, first
			because the effects they represent are stylistic or otherwise out of
			scope for identifiers, and second because the characters themselves
			often have no visible display. It is also possible to misapply these
			characters such that users can create strings that look the same but
			actually contain different characters, which can create security
			problems. In environments where spoofing concerns are paramount, such as top-level domain names, identifiers should also be limited to
			characters that are case-folded and normalized with the NFKC_Casefold
			operation. For more information, see <i>Section 5, <a
				href="#normalization_and_case">Normalization and Case</a></i> and <i>UTR
				#36: Unicode Security Considerations</i> [<a
				href="../tr41/tr41-36.html#UTR36">UTR36</a>].
		</p>
	  <p> While not all Default_Ignorable_Code_Points are in XID_Continue, the variation selectors and joining controls <em>are</em> included in XID_Continue.
		  These variation selectors are used in standardized variation sequences, sequences from the Ideographic Variation Database, and emoji variation sequences.
			The joining controls are used in the orthographies of some languages, as well as in emoji ZWJ sequences.
		  However, these characters are subject to the same considerations as other Default_Ignorable_Code_Points listed above.
		  Because variation selectors and joining controls request a difference in display but do not guarantee it, they do not work well in general-purpose identifiers.
		  A profile should be used to remove them from general-purpose identifiers (along with other Default_Ignorable_Code_Points), unless their use is required in a particular domain, such as in a profile that includes emoji.
	      For such a profile it may be useful to explicitly retain or even add certain Default_Ignorable_Code_Points in the identifier syntax.</p>
		<p>For programming language identifiers, spoofing issues are more comprehensively addressed by higher-level diagnostics rather than at the syntactic level. See Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>].</p>
	  <p><b><em>Comparison.</em></b> In any environment where the display form for identifiers differs from the form used to compare them, Default_Ignorable_Code_Points should be ignored for comparison.
		  For example, this applies to case-insensitive identifiers.
		  For more information, see <em>Section 1.3, <a href="#Display_Format">Display Format</a></em>.</p>
			<blockquote>
			  <p><b>Notes:</b></p>
			  <ul><li>An implementation of <a href="#R4">UAX31-R4</a> and <a href="#R5">UAX31-R5</a> (Equivalent Case and Compatibility-Insensitive Identifiers) that compares identifiers under the <i>identifier caseless match</i> defined by D147 [<a href="../tr41/tr41-36.html#Unicode">Unicode</a>], that is, canonical decomposition (NFD) followed by the toNFKC_Casefold operation, ignores Default_Ignorable_Code_Points.</li>
			    <li>The Default_Ignorable_Code_Point property values are not guaranteed to be stable.
					However, the derivation of the NFKC_Casefold property will be changed if necessary to ensure that it remains stable for default identifiers.
					That means that the toNFKC_Casefold operation applied to a string with only characters in XID_Continue in a version of Unicode will have the same results in any future version of Unicode.</li>
			  </ul>
			</blockquote>
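			<p>
				The effect of ignoring such characters for comparison can be approximated as follows.
				This is only a rough sketch: Python's standard library does not expose the toNFKC_Casefold operation,
				so the code combines <code>str.casefold()</code> and NFKC normalization instead, and
				<code>IGNORABLE_SUBSET</code> is a hypothetical, deliberately incomplete subset of
				Default_Ignorable_Code_Points (the joining controls and the variation selectors).
			</p>

```python
import unicodedata

# Hypothetical, incomplete subset of Default_Ignorable_Code_Point:
# ZWNJ, ZWJ, and the two variation selector ranges. A real implementation
# would take the full set from the Unicode Character Database.
IGNORABLE_SUBSET = (
    {"\u200C", "\u200D"}
    | {chr(cp) for cp in range(0xFE00, 0xFE10)}
    | {chr(cp) for cp in range(0xE0100, 0xE01F0)}
)


def comparison_key(ident: str) -> str:
    # Approximates toNFKC_Casefold(NFD(X)); not the exact UCD mapping.
    s = unicodedata.normalize("NFD", ident)
    s = "".join(ch for ch in s if ch not in IGNORABLE_SUBSET)
    s = unicodedata.normalize("NFKC", s.casefold())
    # Repeat once, since NFKC can expose further case-foldable characters.
    return unicodedata.normalize("NFKC", s.casefold())


def same_identifier(a: str, b: str) -> bool:
    return comparison_key(a) == comparison_key(b)
```

			<p>
				Under this approximation, two identifiers that differ only in case or in the presence of a joining control
				or variation selector compare as equal.
			</p>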
			<p>
				In addition, a standard profile is provided to exclude all Default_Ignorable_Code_Points; see <i>Section 7, <a href="#Standard_Profiles">Standard Profiles</a></i>. Note however that, even if Default_Ignorable_Code_Points are excluded, spoofing issues remain unless the mechanisms in Unicode Technical Standard #39, “Unicode Security Mechanisms” [<a href="../tr41/tr41-36.html#UTS39">UTS39</a>] are utilized.
			</p>
		<p>The General Security Profile defined in Section 3.1, General Security Profile for Identifiers, in <em>UTS #39, Unicode Security Mechanisms</em> [<a href="../tr41/tr41-36.html#UTS39">UTS39</a>], excludes all Default_Ignorable_Code_Points by default, including variation selectors.</p>

		<h3>
			2.4 <a name="Specific_Character_Adjustments"
				href="#Specific_Character_Adjustments">Specific Character
				Adjustments</a>
		</h3>
		<p>
			Specific identifier syntaxes can be treated as tailorings (or <i>profiles</i>)
			of the generic syntax based on character properties. For example, SQL
			identifiers allow an underscore as an identifier continue, but not as
			an identifier start; C identifiers allow an underscore as either an
			identifier continue or an identifier start. Specific languages may
			also want to exclude the characters that have a Decomposition_Type
			other than Canonical or None, or to exclude some subset of those,
			such as those with a Decomposition_Type equal to Font.
		</p>
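		<p>
			A tailoring that excludes characters by Decomposition_Type might be sketched as below.
			The sketch assumes Python's <code>unicodedata.decomposition</code>, which returns the decomposition mapping
			with any compatibility tag in angle brackets; the helper names and the default excluded set are hypothetical.
			Algorithmic Hangul syllable decompositions are not reported by that function and so appear here as
			<code>"none"</code>, which does not affect a filter that allows both Canonical and None.
		</p>

```python
import unicodedata


def decomposition_type(ch: str) -> str:
    """Return 'none', 'canonical', or the compatibility tag (e.g. 'font')."""
    mapping = unicodedata.decomposition(ch)
    if not mapping:
        return "none"
    if mapping.startswith("<"):
        # Compatibility mappings look like "<font> 0041".
        return mapping[1:mapping.index(">")]
    return "canonical"


def allowed_by_tailoring(ch: str, excluded=frozenset({"font"})) -> bool:
    # Hypothetical tailoring: drop only the listed decomposition types.
    return decomposition_type(ch) not in excluded


for ch in ("a", "\u00E9", "\U0001D400", "\u00B2"):
    print(f"U+{ord(ch):04X}", decomposition_type(ch), allowed_by_tailoring(ch))
```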
		<p>
			There are circumstances in which identifiers are expected to more
			fully encompass words or phrases used in natural languages.
		</p>
		<p>
			For more natural-language identifiers, a profile should allow the
			characters in <i><a href="#Table_Optional_Start">Table 3</a></i>, <i><a href="#Table_Optional_Medial">Table
						3a</a></i>, and <i><a href="#Table_Optional_Continue">Table 3b</a></i> in
			identifiers, unless there are compelling reasons not to. Most additions to identifiers are restricted
				to medial positions. These are listed in <i><a
					href="#Table_Optional_Medial">Table 3a</a></i>. A few characters can
				also occur in final positions, and are listed in <i><a
					href="#Table_Optional_Continue">Table 3b</a></i>. The contents of these
				tables may overlap.
		</p>
		<p>
			In some environments even spaces and @
			are allowed in identifiers, such as in SQL: <i>SELECT * FROM
				Employee Pension.</i>
		</p>
		<p class="caption">Table 3. <a name="Table_Optional_Start"
						href="#Table_Optional_Start">Optional Characters for Start</a></p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Code Point</th>
					<th>Character</th>
					<th>Name</th>
				</tr>
				<tr>
					<td>0024</td>
					<td style="text-align: center">$</td>
					<td>DOLLAR SIGN</td>
				</tr>
				<tr>
					<td>005F</td>
					<td style="text-align: center">_</td>
					<td>LOW LINE</td>
				</tr>
			</table>
		</div>
		<p class="caption">Table 3a. <a
						name="Table_Optional_Medial" href="#Table_Optional_Medial">Optional Characters for Medial</a></p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Code Point</th>
					<th>Character</th>
					<th>Name</th>
				</tr>
				<tr>
					<td>0027</td>
					<td style="text-align: center">&#x0027;</td>
					<td>APOSTROPHE</td>
				</tr>
				<tr>
					<td>002D</td>
					<td style="text-align: center">-</td>
					<td>HYPHEN-MINUS</td>
				</tr>
				<tr>
					<td>002E</td>
					<td style="text-align: center">.</td>
					<td>FULL STOP</td>
				</tr>
				<tr>
					<td>003A</td>
					<td style="text-align: center">:</td>
					<td>COLON</td>
				</tr>
				<tr>
					<td>058A</td>
					<td style="text-align: center">&#x058A;</td>
					<td>ARMENIAN HYPHEN</td>
				</tr>
				<tr>
					<td>05F4</td>
					<td style="text-align: center">&#x05F4;</td>
					<td>HEBREW PUNCTUATION GERSHAYIM</td>
				</tr>
				<tr>
					<td>0F0B</td>
					<td style="text-align: center">&#x0F0B;</td>
					<td>TIBETAN MARK INTERSYLLABIC TSHEG</td>
				</tr>
				<tr>
					<td>2010</td>
					<td style="text-align: center">&#x2010;</td>
					<td>HYPHEN</td>
				</tr>
				<tr>
					<td>2019</td>
					<td style="text-align: center">&#x2019;</td>
					<td>RIGHT SINGLE QUOTATION MARK</td>
				</tr>
				<tr>
					<td>2027</td>
					<td style="text-align: center">&#x2027;</td>
					<td>HYPHENATION POINT</td>
				</tr>
				<tr>
					<td>30A0</td>
					<td style="text-align: center">&#x30A0;</td>
					<td>KATAKANA-HIRAGANA DOUBLE HYPHEN</td>
				</tr>
			</table>
		</div>
		<p class="caption">Table 3b. <a name="Table_Optional_Continue"
						href="#Table_Optional_Continue">Optional Characters for
						Continue</a></p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Code Point</th>
					<th>Character</th>
					<th>Name</th>
				</tr>
				<tr>
					<td>05F3</td>
					<td style="text-align: center">&#x05F3;</td>
					<td>HEBREW PUNCTUATION GERESH</td>
				</tr>
			</table>
		</div>
		<p>In UnicodeSet notation, the characters in these tables are:</p>
		<ul>
			<li>Table 3: [\$_]</li>
			<li>Table 3a: ['\-.\:֊״་‐’‧゠]</li>
			<li>Table 3b: [ ׳]</li>
		</ul>
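		<p>
			For implementations that prefer code points over literal characters, the three tables can be transcribed as data.
			The sketch below is illustrative; the list names are hypothetical, and the character names are looked up in
			Python's <code>unicodedata</code> module as a cross-check against the tables.
		</p>

```python
import unicodedata

# Code points transcribed from Tables 3, 3a, and 3b.
TABLE_3 = [0x0024, 0x005F]
TABLE_3A = [0x0027, 0x002D, 0x002E, 0x003A, 0x058A, 0x05F4,
            0x0F0B, 0x2010, 0x2019, 0x2027, 0x30A0]
TABLE_3B = [0x05F3]


def describe(code_points):
    """Pair each code point with its character name from the bundled UCD."""
    return [(f"{cp:04X}", unicodedata.name(chr(cp))) for cp in code_points]


for label, table in (("3", TABLE_3), ("3a", TABLE_3A), ("3b", TABLE_3B)):
    for code, name in describe(table):
        print(f"Table {label}: U+{code} {name}")
```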
		<p>
			In identifiers that allow for unnormalized characters, the
			compatibility equivalents of the characters listed in <i><a
				href="#Table_Optional_Start">Table 3</a></i>,
				<i><a href="#Table_Optional_Medial">Table 3a</a></i>, and <i><a
					href="#Table_Optional_Continue">Table 3b</a></i>
			may also be appropriate.
		</p>
		<p>
			For more information on characters that may occur in words, and those
			that may be used in name validation, see <i>Section 4, Word Boundaries</i>, in [<a
				href="../tr41/tr41-36.html#UAX29">UAX29</a>].
		</p>
		<p>
			Some scripts are not in customary modern use, and thus
			implementations may want to exclude them from identifiers. These
			include historic and obsolete scripts, scripts used
			mostly liturgically, and regional scripts used only in very small
			communities or with very limited current usage. Some scripts also have unresolved architectural issues that make them currently unsuitable for identifiers. The scripts in <em>Table 4, <a
				href="#Table_Candidate_Characters_for_Exclusion_from_Identifiers">Excluded Scripts</a></em> are recommended for exclusion from identifiers.</p>
		<p class="caption">Table 4. <a
						name="Table_Candidate_Characters_for_Exclusion_from_Identifiers"
						href="#Table_Candidate_Characters_for_Exclusion_from_Identifiers">Excluded Scripts</a></p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Property Notation</th>
					<th>Description</th>
				</tr>

				<tr>
					<td><code>\p{script=Aghb}</code></td>
					<td>Caucasian Albanian</td>
				</tr>
				<tr>
					<td><code>\p{script=Ahom}</code></td>
					<td>Ahom</td>
				</tr>
				<tr>
					<td><code>\p{script=Armi}</code></td>
					<td>Imperial Aramaic</td>
				</tr>
				<tr>
					<td><code>\p{script=Avst}</code></td>
					<td>Avestan</td>
				</tr>
				<tr>
					<td><code>\p{script=Bass}</code></td>
					<td>Bassa Vah</td>
				</tr>
				<tr>
					<td><code>\p{script=Berf}</code></td>
					<td>Beria Erfe</td>
				</tr>
				<tr>
					<td><code>\p{script=Bhks}</code></td>
					<td>Bhaiksuki</td>
				</tr>
				<tr>
					<td><code>\p{script=Brah}</code></td>
					<td>Brahmi</td>
				</tr>
				<tr>
					<td><code>\p{script=Bugi}</code></td>
					<td>Buginese</td>
				</tr>
				<tr>
					<td><code>\p{script=Buhd}</code></td>
					<td>Buhid</td>
				</tr>
				<tr>
					<td><code>\p{script=Cari}</code></td>
					<td>Carian</td>
				</tr>
				<tr>
					<td><code>\p{script=Chrs}</code></td>
					<td>Chorasmian</td>
				</tr>
				<tr>
					<td><code>\p{script=Copt}</code></td>
					<td>Coptic</td>
				</tr>
				<tr>
					<td><code>\p{script=Cpmn}</code></td>
					<td>Cypro-Minoan</td>
				</tr>
				<tr>
					<td><code>\p{script=Cprt}</code></td>
					<td>Cypriot</td>
				</tr>
				<tr>
					<td><code>\p{script=Diak}</code></td>
					<td>Dives Akuru</td>
				</tr>
				<tr>
					<td><code>\p{script=Dogr}</code></td>
					<td>Dogra</td>
				</tr>
				<tr>
					<td><code>\p{script=Dsrt}</code></td>
					<td>Deseret</td>
				</tr>
				<tr>
					<td><code>\p{script=Dupl}</code></td>
					<td>Duployan</td>
				</tr>
				<tr>
					<td><code>\p{script=Egyp}</code></td>
					<td>Egyptian Hieroglyphs</td>
				</tr>
				<tr>
					<td><code>\p{script=Elba}</code></td>
					<td>Elbasan</td>
				</tr>
				<tr>
					<td><code>\p{script=Elym}</code></td>
					<td>Elymaic</td>
				</tr>
				<tr>
					<td><code>\p{script=Gara}</code></td>
					<td>Garay</td>
				</tr>
				<tr>
					<td><code>\p{script=Glag}</code></td>
					<td>Glagolitic</td>
				</tr>
				<tr>
					<td><code>\p{script=Gong}</code></td>
					<td>Gunjala Gondi</td>
				</tr>
				<tr>
					<td><code>\p{script=Gonm}</code></td>
					<td>Masaram Gondi</td>
				</tr>
				<tr>
					<td><code>\p{script=Goth}</code></td>
					<td>Gothic</td>
				</tr>
				<tr>
					<td><code>\p{script=Gran}</code></td>
					<td>Grantha</td>
				</tr>
				<tr>
					<td><code>\p{script=Gukh}</code></td>
					<td>Gurung Khema</td>
				</tr>
				<tr>
					<td><code>\p{script=Hano}</code></td>
					<td>Hanunoo</td>
				</tr>
				<tr>
					<td><code>\p{script=Hatr}</code></td>
					<td>Hatran</td>
				</tr>
				<tr>
					<td><code>\p{script=Hluw}</code></td>
					<td>Anatolian Hieroglyphs</td>
				</tr>
				<tr>
					<td><code>\p{script=Hmng}</code></td>
					<td>Pahawh Hmong</td>
				</tr>
				<tr>
					<td><code>\p{script=Hung}</code></td>
					<td>Old Hungarian</td>
				</tr>
				<tr>
					<td><code>\p{script=Ital}</code></td>
					<td>Old Italic</td>
				</tr>
				<tr>
					<td><code>\p{script=Kawi}</code></td>
					<td>Kawi</td>
				</tr>
				<tr>
					<td><code>\p{script=Khar}</code></td>
					<td>Kharoshthi</td>
				</tr>
				<tr>
					<td><code>\p{script=Khoj}</code></td>
					<td>Khojki</td>
				</tr>
				<tr>
					<td><code>\p{script=Kits}</code></td>
					<td>Khitan Small Script</td>
				</tr>
				<tr>
					<td><code>\p{script=Krai}</code></td>
					<td>Kirat Rai</td>
				</tr>
				<tr>
					<td><code>\p{script=Kthi}</code></td>
					<td>Kaithi</td>
				</tr>
				<tr>
					<td><code>\p{script=Lina}</code></td>
					<td>Linear A</td>
				</tr>
				<tr>
					<td><code>\p{script=Linb}</code></td>
					<td>Linear B</td>
				</tr>
				<tr>
					<td><code>\p{script=Lyci}</code></td>
					<td>Lycian</td>
				</tr>
				<tr>
					<td><code>\p{script=Lydi}</code></td>
					<td>Lydian</td>
				</tr>
				<tr>
					<td><code>\p{script=Maka}</code></td>
					<td>Makasar</td>
				</tr>
			 	<tr>
					<td><code>\p{script=Mahj}</code></td>
					<td>Mahajani</td>
				</tr>
				<tr>
					<td><code>\p{script=Mani}</code></td>
					<td>Manichaean</td>
				</tr>
				<tr>
					<td><code>\p{script=Marc}</code></td>
					<td>Marchen</td>
				</tr>
				<tr>
					<td><code>\p{script=Medf}</code></td>
					<td>Medefaidrin</td>
				</tr>
			 	<tr>
					<td><code>\p{script=Mend}</code></td>
					<td>Mende Kikakui</td>
				</tr>
				<tr>
					<td><code>\p{script=Merc}</code></td>
					<td>Meroitic Cursive</td>
				</tr>
				<tr>
					<td><code>\p{script=Mero}</code></td>
					<td>Meroitic Hieroglyphs</td>
				</tr>
				<tr>
					<td><code>\p{script=Modi}</code></td>
					<td>Modi</td>
				</tr>
				<tr>
					<td><code>\p{script=Mong}</code></td>
					<td>Mongolian</td>
				</tr>
				<tr>
					<td><code>\p{script=Mroo}</code></td>
					<td>Mro</td>
				</tr>
				<tr>
					<td><code>\p{script=Mult}</code></td>
					<td>Multani</td>
				</tr>
				<tr>
					<td><code>\p{script=Nagm}</code></td>
					<td>Nag Mundari</td>
				</tr>
				<tr>
					<td><code>\p{script=Narb}</code></td>
					<td>Old North Arabian</td>
				</tr>
				<tr>
					<td><code>\p{script=Nand}</code></td>
					<td>Nandinagari</td>
				</tr>
				<tr>
					<td><code>\p{script=Nbat}</code></td>
					<td>Nabataean</td>
				</tr>
				<tr>
					<td><code>\p{script=Nshu}</code></td>
					<td>Nushu</td>
				</tr>
				<tr>
					<td><code>\p{script=Ogam}</code></td>
					<td>Ogham</td>
				</tr>
				<tr>
					<td><code>\p{script=Onao}</code></td>
					<td>Ol Onal</td>
				</tr>
				<tr>
					<td><code>\p{script=Orkh}</code></td>
					<td>Old Turkic</td>
				</tr>
				<tr>
					<td><code>\p{script=Osma}</code></td>
					<td>Osmanya</td>
				</tr>
				<tr>
					<td><code>\p{script=Ougr}</code></td>
					<td>Old Uyghur</td>
				</tr>
				<tr>
					<td><code>\p{script=Palm}</code></td>
					<td>Palmyrene</td>
				</tr>
				<tr>
					<td><code>\p{script=Pauc}</code></td>
					<td>Pau Cin Hau</td>
				</tr>
				<tr>
					<td><code>\p{script=Perm}</code></td>
					<td>Old Permic</td>
				</tr>
				<tr>
					<td><code>\p{script=Phag}</code></td>
					<td>Phags-pa</td>
				</tr>
				<tr>
					<td><code>\p{script=Phli}</code></td>
					<td>Inscriptional Pahlavi</td>
				</tr>
				<tr>
					<td><code>\p{script=Phlp}</code></td>
					<td>Psalter Pahlavi</td>
				</tr>
				<tr>
					<td><code>\p{script=Phnx}</code></td>
					<td>Phoenician</td>
				</tr>
				<tr>
					<td><code>\p{script=Prti}</code></td>
					<td>Inscriptional Parthian</td>
				</tr>
				<tr>
					<td><code>\p{script=Rjng}</code></td>
					<td>Rejang</td>
				</tr>
			  <tr>
					<td><code>\p{script=Runr}</code></td>
					<td>Runic</td>
			  </tr>
				<tr>
					<td><code>\p{script=Samr}</code></td>
					<td>Samaritan</td>
				</tr>
				<tr>
					<td><code>\p{script=Sarb}</code></td>
					<td>Old South Arabian</td>
				</tr>
				<tr>
					<td><code>\p{script=Sgnw}</code></td>
					<td>SignWriting</td>
				</tr>
				<tr>
					<td><code>\p{script=Shaw}</code></td>
					<td>Shavian</td>
				</tr>
				<tr>
					<td><code>\p{script=Shrd}</code></td>
					<td>Sharada</td>
				</tr>
				<tr>
					<td><code>\p{script=Sidd}</code></td>
					<td>Siddham</td>
				</tr>
				<tr>
					<td><code>\p{script=Sidt}</code></td>
					<td>Sidetic</td>
				</tr>
				<tr>
					<td><code>\p{script=Sind}</code></td>
					<td>Khudawadi</td>
				</tr>
				<tr>
					<td><code>\p{script=Sora}</code></td>
					<td>Sora Sompeng</td>
				</tr>
				<tr>
					<td><code>\p{script=Sogd}</code></td>
					<td>Sogdian</td>
				</tr>
				<tr>
					<td><code>\p{script=Sogo}</code></td>
					<td>Old Sogdian</td>
				</tr>
				<tr>
					<td><code>\p{script=Soyo}</code></td>
					<td>Soyombo</td>
				</tr>
				<tr>
					<td><code>\p{script=Sunu}</code></td>
					<td>Sunuwar</td>
				</tr>
				<tr>
					<td><code>\p{script=Tagb}</code></td>
					<td>Tagbanwa</td>
				</tr>
				<tr>
					<td><code>\p{script=Takr}</code></td>
					<td>Takri</td>
				</tr>
				<tr>
					<td><code>\p{script=Tang}</code></td>
					<td>Tangut</td>
				</tr>
				<tr>
					<td><code>\p{script=Tayo}</code></td>
					<td>Tai Yo</td>
				</tr>
				<tr>
					<td><code>\p{script=Tglg}</code></td>
					<td>Tagalog</td>
				</tr>
				<tr>
					<td><code>\p{script=Tirh}</code></td>
					<td>Tirhuta</td>
				</tr>
				<tr>
					<td><code>\p{script=Tnsa}</code></td>
					<td>Tangsa</td>
				</tr>
				<tr>
					<td><code>\p{script=Todr}</code></td>
					<td>Todhri</td>
				</tr>
				<tr>
					<td><code>\p{script=Tols}</code></td>
					<td>Tolong Siki</td>
				</tr>
				<tr>
					<td><code>\p{script=Toto}</code></td>
					<td>Toto</td>
				</tr>
				<tr>
					<td><code>\p{script=Tutg}</code></td>
					<td>Tulu-Tigalari</td>
				</tr>
				<tr>
					<td><code>\p{script=Ugar}</code></td>
					<td>Ugaritic</td>
				</tr>
				<tr>
					<td><code>\p{script=Vith}</code></td>
					<td>Vithkuqi</td>
				</tr>
				<tr>
					<td><code>\p{script=Wara}</code></td>
					<td>Warang Citi</td>
				</tr>
				<tr>
					<td><code>\p{script=Xpeo}</code></td>
					<td>Old Persian</td>
				</tr>
				<tr>
					<td><code>\p{script=Xsux}</code></td>
					<td>Cuneiform</td>
				</tr>
				<tr>
					<td><code>\p{script=Yezi}</code></td>
					<td>Yezidi</td>
				</tr>
				<tr>
					<td><code>\p{script=Zanb}</code></td>
					<td>Zanabazar Square</td>
				</tr>
			</table>
		</div>
		<p>Some characters used with recommended scripts may still be problematic for identifiers, for example because they are part of extensions that are not in modern customary use, and thus implementations may want to exclude them from identifiers. These include characters for historic and obsolete orthographies, characters used mostly liturgically, and characters in orthographies for languages used only in very small communities or with very limited current or declining usage. Some characters also have architectural issues that may make them unsuitable for identifiers. See <em>UTS #39, Unicode Security Mechanisms</em> [<a href="../tr41/tr41-36.html#UTS39">UTS39</a>] for more information.</p>
		<p>The scripts listed in <em>Table 5, <a href="#Table_Recommended_Scripts">Recommended Scripts</a></em> are generally recommended for use in
			identifiers. These are in widespread modern customary use, or are
			regional scripts in modern customary use by large communities.
		</p>
		<blockquote>
			<p><b>Note:</b> The Tibetan script is included in the list of recommended scripts because
				the language and its script are in widespread common use.
				However, implementers should be aware that the vertical stacking nature of Tibetan,
				unless constrained by additional rules,
				may lead to difficulties viewing an identifier in user interface elements
				such as address or status bars.
				In addition, at the current time, the script has not been as carefully vetted or
				seen as much practical experience when deployed for identifiers in security-relevant contexts
				as is the case for the other recommended scripts.</p>
		</blockquote>
		<p class="caption">Table 5. <a name="Table_Recommended_Scripts"
						href="#Table_Recommended_Scripts">Recommended Scripts</a></p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Property Notation</th>
					<th>Description</th>
				</tr>
				<tr>
					<td><code>\p{script=Zyyy}</code></td>
					<td>Common</td>
				</tr>
				<tr>
					<td><code>\p{script=Zinh}</code></td>
					<td>Inherited</td>
				</tr>
				<tr>
					<td><code>\p{script=Arab}</code></td>
					<td>Arabic</td>
				</tr>
				<tr>
					<td><code>\p{script=Armn}</code></td>
					<td>Armenian</td>
				</tr>
				<tr>
					<td><code>\p{script=Beng}</code></td>
					<td>Bengali</td>
				</tr>
				<tr>
					<td><code>\p{script=Cyrl}</code></td>
					<td>Cyrillic</td>
				</tr>
				<tr>
					<td><code>\p{script=Deva}</code></td>
					<td>Devanagari</td>
				</tr>
				<tr>
					<td><code>\p{script=Ethi}</code></td>
					<td>Ethiopic</td>
				</tr>
				<tr>
					<td><code>\p{script=Geor}</code></td>
					<td>Georgian</td>
				</tr>
				<tr>
					<td><code>\p{script=Grek}</code></td>
					<td>Greek</td>
				</tr>
				<tr>
					<td><code>\p{script=Gujr}</code></td>
					<td>Gujarati</td>
				</tr>
				<tr>
					<td><code>\p{script=Guru}</code></td>
					<td>Gurmukhi</td>
				</tr>
				<tr>
					<td><code>\p{script=Hang}</code></td>
					<td>Hangul</td>
				</tr>
				<tr>
					<td><code>\p{script=Hani}</code></td>
					<td>Han</td>
				</tr>
				<tr>
					<td><code>\p{script=Hebr}</code></td>
					<td>Hebrew</td>
				</tr>
				<tr>
					<td><code>\p{script=Hira}</code></td>
					<td>Hiragana</td>
				</tr>
				<tr>
					<td><code>\p{script=Kana}</code></td>
					<td>Katakana</td>
				</tr>
				<tr>
					<td><code>\p{script=Knda}</code></td>
					<td>Kannada</td>
				</tr>
				<tr>
					<td><code>\p{script=Khmr}</code></td>
					<td>Khmer</td>
				</tr>
				<tr>
					<td><code>\p{script=Laoo}</code></td>
					<td>Lao</td>
				</tr>
				<tr>
					<td><code>\p{script=Latn}</code></td>
					<td>Latin</td>
				</tr>
				<tr>
					<td><code>\p{script=Mlym}</code></td>
					<td>Malayalam</td>
				</tr>
				<tr>
					<td><code>\p{script=Mymr}</code></td>
					<td>Myanmar</td>
				</tr>
				<tr>
					<td><code>\p{script=Orya}</code></td>
					<td>Oriya</td>
				</tr>
				<tr>
					<td><code>\p{script=Sinh}</code></td>
					<td>Sinhala</td>
				</tr>
				<tr>
					<td><code>\p{script=Taml}</code></td>
					<td>Tamil</td>
				</tr>
				<tr>
					<td><code>\p{script=Telu}</code></td>
					<td>Telugu</td>
				</tr>
				<tr>
					<td><code>\p{script=Thaa}</code></td>
					<td>Thaana</td>
				</tr>
				<tr>
					<td><code>\p{script=Thai}</code></td>
					<td>Thai</td>
				</tr>
				<tr>
					<td><code>\p{script=Tibt}</code></td>
					<td>Tibetan</td>
				</tr>
			</table>
		</div>
		<p>As of Unicode 10.0, there is no longer a distinction between
			aspirational use and limited use scripts, as this has not proven
			to be productive for the derivation of identifier-related classes
			used in security profiles. (See <em>UTS #39, Unicode Security Mechanisms</em>
			[<a href="../tr41/tr41-36.html#UTS39">UTS39</a>].) Thus the aspirational use scripts
			in <em>Table 6, <a href="#Aspirational_Use_Scripts">Aspirational Use Scripts</a></em> have been recategorized
			as Limited Use and moved to <em>Table 7, <a
				href="#Table_Limited_Use_Scripts">Limited Use Scripts</a></em>.</p>
		<p class="caption">Table 6. <a name="Aspirational_Use_Scripts"
						href="#Aspirational_Use_Scripts"> Aspirational Use Scripts</a> (Withdrawn)</p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Property Notation</th>
					<th>Description</th>
				</tr>
				<tr>
				  <td colspan="2"><em>intentionally blank</em></td>
			  </tr>
			</table>
		</div>
		<p>
			Modern scripts that are in more limited use are listed in <em>Table 7, <a
				href="#Table_Limited_Use_Scripts">Limited Use Scripts</a></em>.
			To avoid security issues, some implementations may wish to disallow
			the limited-use scripts in identifiers. For more information on
			usage, see the Unicode Locale project [<a
				href="../tr41/tr41-36.html#CLDR">CLDR</a>].
		</p>
		<blockquote>
			<p><b>Note:</b> Since Unicode 17, the Bopomofo script is listed as a Limited Use script.
				It is widely used, but mainly for educational purposes,
				not for the full range of “everyday” common uses.</p>
		</blockquote>
		<p class="caption">Table 7. <a name="Table_Limited_Use_Scripts"
						href="#Table_Limited_Use_Scripts">Limited Use Scripts</a></p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Property Notation</th>
					<th>Description</th>
				</tr>
				<tr>
					<td><code>\p{script=Adlm}</code></td>
					<td>Adlam</td>
				</tr>
				<tr>
					<td><code>\p{script=Bali}</code></td>
					<td>Balinese</td>
				</tr>
				<tr>
					<td><code>\p{script=Bamu}</code></td>
					<td>Bamum</td>
				</tr>
				<tr>
					<td><code>\p{script=Batk}</code></td>
					<td>Batak</td>
				</tr>
				<tr>
					<td><code>\p{script=Bopo}</code></td>
					<td>Bopomofo</td>
				</tr>
				<tr>
					<td><code>\p{script=Cakm}</code></td>
					<td>Chakma</td>
				</tr>
				<tr>
					<td><code>\p{script=Cans}</code></td>
					<td>Canadian Aboriginal Syllabics</td>
				</tr>
				<tr>
					<td><code>\p{script=Cham}</code></td>
					<td>Cham</td>
				</tr>
				<tr>
					<td><code>\p{script=Cher}</code></td>
					<td>Cherokee</td>
				</tr>
				<tr>
					<td><code>\p{script=Hmnp}</code></td>
					<td>Nyiakeng Puachue Hmong</td>
				</tr>
			  <tr>
					<td><code>\p{script=Java}</code></td>
					<td>Javanese</td>
				</tr>
				<tr>
					<td><code>\p{script=Kali}</code></td>
					<td>Kayah Li</td>
				</tr>
				<tr>
					<td><code>\p{script=Lana}</code></td>
					<td>Tai Tham</td>
				</tr>
				<tr>
					<td><code>\p{script=Lepc}</code></td>
					<td>Lepcha</td>
				</tr>
				<tr>
					<td><code>\p{script=Limb}</code></td>
					<td>Limbu</td>
				</tr>
				<tr>
					<td><code>\p{script=Lisu}</code></td>
					<td>Lisu</td>
				</tr>
				<tr>
					<td><code>\p{script=Mand}</code></td>
					<td>Mandaic</td>
				</tr>
				<tr>
					<td><code>\p{script=Mtei}</code></td>
					<td>Meetei Mayek</td>
				</tr>
				<tr>
					<td><code>\p{script=Newa}</code></td>
					<td>Newa</td>
				</tr>
				<tr>
					<td><code>\p{script=Nkoo}</code></td>
					<td>Nko</td>
				</tr>
				<tr>
					<td><code>\p{script=Olck}</code></td>
					<td>Ol Chiki</td>
				</tr>
				<tr>
					<td><code>\p{script=Osge}</code></td>
					<td>Osage</td>
				</tr>
				<tr>
				  <td><code>\p{script=Plrd}</code></td>
				  <td>Miao</td>
			  	</tr>
			  	<tr>
  				  <td><code>\p{script=Rohg}</code></td>
				  <td>Hanifi Rohingya</td>
				</tr>
			  	<tr>
					<td><code>\p{script=Saur}</code></td>
					<td>Saurashtra</td>
			  	</tr>
				<tr>
					<td><code>\p{script=Sund}</code></td>
					<td>Sundanese</td>
				</tr>
				<tr>
					<td><code>\p{script=Sylo}</code></td>
					<td>Syloti Nagri</td>
				</tr>
				<tr>
					<td><code>\p{script=Syrc}</code></td>
					<td>Syriac</td>
				</tr>
				<tr>
					<td><code>\p{script=Tale}</code></td>
					<td>Tai Le</td>
				</tr>
				<tr>
					<td><code>\p{script=Talu}</code></td>
					<td>New Tai Lue</td>
				</tr>
				<tr>
					<td><code>\p{script=Tavt}</code></td>
					<td>Tai Viet</td>
				</tr>
				<tr>
				  <td><code>\p{script=Tfng}</code></td>
				  <td>Tifinagh</td>
			  	</tr>
				<tr>
					<td><code>\p{script=Vaii}</code></td>
					<td>Vai</td>
				</tr>
				<tr>
					<td><code>\p{script=Wcho}</code></td>
					<td>Wancho</td>
			  	</tr>
				<tr>
				  <td><code>\p{script=Yiii}</code></td>
				  <td>Yi</td>
			  	</tr>
			</table>
		</div>
		<p>
			This is the recommendation as of the current version of Unicode; as
			new scripts are added to future versions of Unicode, characters and scripts may
			be added to Tables <i><a
				href="#Table_Candidate_Characters_for_Exclusion_from_Identifiers">4</a></i>,
			<i><a href="#Table_Recommended_Scripts">5</a></i>, and <i><a
				href="#Table_Limited_Use_Scripts">7</a></i>. Scripts may also be
			moved from one table to another as more information becomes
			available.
		</p>
		<p>There are a few special cases:</p>
		<ul>
			<li>The Common and Inherited script values
				[\p{script=Zyyy}\p{script=Zinh}] are used widely with other scripts,
				rather than being scripts per se. See also the Script_Extensions
				property in the Unicode Character Database [<a
				href="../tr41/tr41-36.html#UAX44">UAX44</a>].
			</li>
			<li>The Unknown script \p{script=Zzzz} is used for Unassigned
				characters.</li>
			<li>Braille \p{script=Brai} consists only of symbols.</li>
			<li>Katakana_Or_Hiragana \p{script=Hrkt} is empty. This value was used
				in earlier versions, but is no longer used.</li>
			<li>With respect to the scripts Balinese, Cham, Ol Chiki, Vai,
				Kayah Li, and Saurashtra, there may be large communities of people
				speaking an associated language, but the script itself is not in
				widespread use. However, there are significant revival efforts.</li>
			<li>Bopomofo is used primarily in education.</li>
		</ul>
		<p>
			For programming language identifiers, normalization and case have a
			number of important implications. For a discussion of these issues,
			see <i>Section 5, <a href="#normalization_and_case">Normalization
					and Case</a></i>.
		</p>
		<h3>
			2.5 <a name="Backward_Compatibility" href="#Backward_Compatibility">Backward
				Compatibility</a>
		</h3>
		<p>
			Unicode General_Category values are kept as stable as possible, but
			they can change across versions of the Unicode Standard. The bulk of
			the characters having a given value are determined by other
			properties, and the coverage expands in the future according to the
			assignment of those properties. In addition, the Other_ID_Start
			property provides a small list of characters that qualified as
			ID_Start characters in some previous version of Unicode solely on the
			basis of their General_Category properties, but that no longer
			qualify in the current version.
		</p>
		<p>The Other_ID_Start property includes characters such as the
			following:</p>
		<blockquote>
			U+2118 ( ℘ ) SCRIPT CAPITAL P<br> U+212E ( ℮ ) ESTIMATED SYMBOL<br>
			U+309B ( ゛ ) KATAKANA-HIRAGANA VOICED SOUND MARK<br> U+309C ( ゜
			) KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
		</blockquote>
		<p>Similarly, the Other_ID_Continue property adds a small list of
			characters that qualified as ID_Continue characters in some previous
			version of Unicode solely on the basis of their General_Category
			properties, but that no longer qualify in the current version.</p>
		<p>The Other_ID_Continue property includes characters such as the
			following:</p>
		<!-- Do NOT put the actual Ethiopic characters back in the examples. For
         whatever reason, they trigger an IE8 quirk that puts the browser in
         compatibility mode for this document! -->
		<blockquote>
			U+1369 ETHIOPIC DIGIT ONE...U+1371 ETHIOPIC DIGIT NINE<br>
			U+00B7 ( · ) MIDDLE DOT<br> U+0387 ( &#x0387; ) GREEK ANO TELEIA<br>
			U+19DA ( ᧚ ) NEW TAI LUE THAM DIGIT ONE
		</blockquote>
		<p>
			The exact list of characters covered by the Other_ID_Start and
			Other_ID_Continue properties depends on the version of Unicode. For
			more information, see Unicode Standard Annex #44, “Unicode Character
			Database” [<a href="../tr41/tr41-36.html#UAX44">UAX44</a>].
		</p>
		<p>The Other_ID_Start and Other_ID_Continue properties are thus
			designed to ensure that the Unicode identifier specification is
			backward compatible. Any sequence of characters that qualified as an
			identifier in some version of Unicode will continue to qualify as an
			identifier in future versions.</p>
		<p>If a specification tailors the Unicode recommendations for
			identifiers, then this technique can also be used to maintain
			backwards compatibility across versions.</p>
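		<p>As a minimal sketch of that technique, assuming a tailoring that derives
			its start characters from General_Category alone, a specification could keep a
			frozen list of characters that once qualified and union it with the current
			derivation. The names <code>START_CATEGORIES</code>, <code>GRANDFATHERED_START</code>,
			and <code>is_tailored_start</code> are hypothetical, and the derivation is
			deliberately simplified (it does not subtract Pattern_Syntax or
			Pattern_White_Space characters).</p>

```python
import unicodedata

# General_Category values for letters and letter numbers; a simplified stand-in
# for the derivation of start characters in a tailored specification.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}

# Hypothetical frozen list maintained by the tailoring specification. It holds
# two of the Other_ID_Start examples named above, which are symbols today.
GRANDFATHERED_START = {"\u2118", "\u212E"}


def is_tailored_start(ch, grandfathered=GRANDFATHERED_START):
    """Return True if ch may start an identifier under this illustrative tailoring."""
    return unicodedata.category(ch) in START_CATEGORIES or ch in grandfathered
```

		<p>With an empty grandfathered list, U+2118 would be rejected because its
			General_Category is a symbol category; with the list, identifiers that already
			use it remain valid.</p>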


		<h2>
			3 <a name="Immutable_Identifier_Syntax"
				href="#Immutable_Identifier_Syntax">Immutable Identifiers</a><a
				name="Alternative_Identifier_Syntax"></a>
		</h2>
		<p>The disadvantage of working with the lexical classes defined
			previously is the storage space needed for the detailed definitions,
			plus the fact that with each new version of the Unicode Standard new
			characters are added, which an existing parser would not be able to
			recognize. In other words, the recommendations based on that table
			are not upwardly compatible.</p>
		<p>This problem can be addressed by turning the question around.
			Instead of defining the set of code points that are allowed, define a
			small, fixed set of code points that are reserved for syntactic use
			and allow everything else (including unassigned code points) as part
			of an identifier. All parsers written to this specification would
			behave the same way for all versions of the Unicode Standard, because
			the classification of code points is fixed forever.</p>
		<p>
			The drawback of this method is that it allows “nonsense” to be part
			of identifiers because the concerns of lexical classification and of
			human intelligibility are separated. Human intelligibility can,
			however, be addressed by other means, such as usage guidelines that
			encourage a restriction to meaningful terms for identifiers. For an
			example of such guidelines, see the XML specification by the W3C,
			Version 1.0 5th Edition or later [<a href="../tr41/tr41-36.html#XML">XML</a>].
		</p>
		<p>By increasing the set of disallowed characters, a reasonably
			intuitive recommendation for identifiers can be achieved. This
			approach uses the full specification of identifier classes, as of a
			particular version of the Unicode Standard, and permanently disallows
			any characters not recommended in that version for inclusion in
			identifiers. All code points unassigned as of that version would be
			allowed in identifiers, so that any future additions to the standard
			would already be accounted for. This approach ensures both upwardly
			compatible identifier stability and a reasonable division of
			characters into those that do and do not make human sense as part of
			identifiers.</p>
		<p>With or without such fine-tuning, such a compromise approach
			still incurs the expense of implementing large lists of code points.
			While they no longer change over time, it is a matter of choice
			whether the benefit of enforcing somewhat word-like identifiers
			justifies their cost.</p>
		<p>Alternatively, one can use the properties described below and
			allow all sequences of characters to be identifiers that are neither
			Pattern_Syntax nor Pattern_White_Space. This has the advantage of
			simplicity and small tables, but allows many more “unnatural”
			identifiers.</p>
		<p>
			<b><a name="R2" href="#R2">UAX31-R2</a></b>.
				<b>Immutable Identifiers:</b> <i>To meet this requirement,
				an implementation shall
				choose either <a href="#R2-1">UAX31-R2-1</a> or <a href="#R2-2">UAX31-R2-2</a>.</i>
		</p>
		<p>
				<b><a name="R2-1" href="#R2-1">UAX31-R2-1</a></b>.
				<i>Define identifiers to be any non-empty
				string of characters that contains no character having any of the
				following property values:</i>
		</p>

		<ul>
			<li>Pattern_White_Space=True</li>
			<li>Pattern_Syntax=True</li>
			<li>General_Category=Private_Use, Surrogate, or Control</li>
			<li>Noncharacter_Code_Point=True</li>
		</ul>

		<p>
			<b><a name="R2-2" href="#R2-2">UAX31-R2-2</a></b>.
				<i>Declare that it uses a <b>profile</b>
				of <a href="#R2-1">UAX31-R2-1</a>
				and define that profile with a precise specification of the
				characters and character sequences that are added to or removed from the sets of code points
				defined by these properties and/or provide a list of additional constraints on identifiers.
			</i>
		</p>
		<blockquote>
			<b>Note:</b> The expectation from an implementation meeting requirement UAX31-R2 Immutable Identifiers is that it will never change its definition of identifiers; in particular, that it will not switch to UAX31-R1 Default Identifiers. However, the downsides of normalization issues and the inapplicability of measures guarding against spoofing attacks may warrant such a change in definition. In such circumstances, a profile should be used to extend XID_Start and XID_Continue to cover likely existing usages. See <i>Section 3.3, Language Evolution</i>, in Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>].
		</blockquote>
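		<p>The test in <a href="#R2-1">UAX31-R2-1</a> can be sketched as follows. This is
			an illustration under stated assumptions, not a conformant implementation:
			Python's standard <code>unicodedata</code> module does not expose
			Pattern_Syntax, so the sketch takes that set as a parameter and defaults to
			its ASCII portion only. A real implementation would load the full property
			from the Unicode Character Database. The function and constant names are
			hypothetical.</p>

```python
import unicodedata

# The code points with Pattern_White_Space=True.
PATTERN_WHITE_SPACE = frozenset(
    "\u0009\u000A\u000B\u000C\u000D\u0020\u0085\u200E\u200F\u2028\u2029"
)

# ASSUMPTION: only the ASCII portion of Pattern_Syntax (all ASCII punctuation
# and symbols except the underscore). The full property covers many more code points.
ASCII_PATTERN_SYNTAX = frozenset(
    chr(cp)
    for first, last in [(0x21, 0x2F), (0x3A, 0x40), (0x5B, 0x5E), (0x60, 0x60), (0x7B, 0x7E)]
    for cp in range(first, last + 1)
)

# General_Category abbreviations for Private_Use, Surrogate, and Control.
EXCLUDED_CATEGORIES = {"Co", "Cs", "Cc"}


def is_noncharacter(cp):
    """Noncharacter_Code_Point: U+FDD0..U+FDEF and the last two code points of each plane."""
    return cp in range(0xFDD0, 0xFDF0) or cp % 0x10000 in (0xFFFE, 0xFFFF)


def is_immutable_identifier(text, pattern_syntax=ASCII_PATTERN_SYNTAX):
    """Return True if text is non-empty and contains no excluded character."""
    if not text:
        return False
    for ch in text:
        if ch in PATTERN_WHITE_SPACE or ch in pattern_syntax:
            return False
        if unicodedata.category(ch) in EXCLUDED_CATEGORIES:
            return False
        if is_noncharacter(ord(ch)):
            return False
    return True
```

		<p>Note that the check never asks whether a code point is assigned, which is
			what makes the definition stable across versions of Unicode.</p>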

		<p>In its profile, a specification can define identifiers to be
			more in accordance with the Unicode identifier definitions at the
			time the profile is adopted, while still allowing for strict
			immutability. For example, an implementation adopting a profile after
			a particular version of Unicode is released (such as Unicode 5.0)
			could define the profile as follows:</p>
		<ol>
			<li>All characters satisfying <i><a href="#R1">UAX31-R1
						Default Identifiers</a></i> according to Unicode 5.0
			</li>
			<li>Plus all code points unassigned in Unicode 5.0 that do not
				have the property values specified in
				<i><a href="#R2">UAX31-R2 Immutable Identifiers</a></i>.
			</li>
		</ol>
		<p>This technique allows identifiers to have a more natural
			format—excluding symbols and punctuation already defined—yet also
			provides absolute code point immutability.</p>
		<p>Immutable identifiers are intended for those cases (like XML) that
			cannot update across versions of Unicode, and do not require
			information about normalization form, or properties such as
			General_Category and Script. Immutable identifiers that allow
			unassigned characters cannot provide for normalization forms
			or these properties, which means that they:</p>
        <ul>
          <li>cannot be compared for NFC, NFKC, or case-insensitive equality</li>
          <li>are unsuitable for restrictions such as those in UTS #39</li>
        </ul>
        <p>For best practice, a profile disallowing unassigned characters should be provided where possible.</p>
        <p>
			Specifications should also include guidelines and recommendations for
			those creating new identifiers. Although
			<i><a href="#R2">UAX31-R2 Immutable Identifiers</a></i> permits a wide range of
			characters, as a best practice identifiers should be in the format
			NFKC, without using any unassigned characters. For more information
			on NFKC, see Unicode Standard Annex #15, “Unicode Normalization
			Forms” [<a href="../tr41/tr41-36.html#UAX15">UAX15</a>].
		</p>

		<h2>
			4 <a name="Whitespace_and_Syntax" href="#Whitespace_and_Syntax">Whitespace and Syntax</a>
		</h2>

		<p>Most programming languages have a concept of
			whitespace as part of their lexical structure, as well as some set of
			characters that are disallowed in identifiers but have syntactic
			use, such as arithmetic operators.
			Beyond general programming languages,
			there are also many circumstances where software interprets
			patterns that are a mixture of literal characters, whitespace, and syntax
			characters. Examples include regular expressions, Java collation
			rules, Excel or ICU number formats, and many others. In the past,
			regular expressions and other formal languages have been forced to
			use clumsy combinations of ASCII characters for their syntax. As
			Unicode becomes ubiquitous, some of these will start to use non-ASCII
			characters for their syntax: first as more readable optional
			alternatives, then eventually as the standard syntax.</p>
		<p>
			For forward and backward compatibility, it is advantageous to have a
			fixed set of whitespace and syntax code points.
			This follows the recommendations that the Unicode Consortium has made
			regarding completely stable identifiers, and the practice that is
			seen in XML 1.0, 5th Edition or later [<a
				href="../tr41/tr41-36.html#XML">XML</a>]. (In particular, the
			Unicode Consortium is committed to not allocating characters suitable
			for identifiers in the range U+2190..U+2BFF, which is being used by
			XML 1.0, 5th Edition.)
		</p>

		<p>As of Unicode 4.1, two Unicode character properties are defined
			to provide for stable syntax: Pattern_White_Space and
			Pattern_Syntax. Particular languages may, of course,
			override these recommendations, for example, by adding or removing
			other characters for compatibility with ASCII usage.</p>
		<p>For stability, the values of these properties are absolutely
			invariant, not changing with successive versions of Unicode. Of
			course, this does not limit the ability of the Unicode Standard to
			encode more symbol or whitespace characters, but the default sets of syntax and
			whitespace code points recommended for use in computer languages will not
			change.</p>

		<p>
			<b><a name="R3" href="#R3">UAX31-R3</a></b>. <b>Pattern_White_Space
					and Pattern_Syntax Characters:</b> <i>To meet this requirement, an
				implementation shall
				meet both <a href="#R3a">UAX31-R3a</a> and <a href="#R3b">UAX31-R3b</a>.</i>
		</p>
		<blockquote>
			<p>
				<b>Note:</b> When meeting requirement <a href="#R3">UAX31-R3</a> with no profile, all characters except
				those that have the Pattern_White_Space or Pattern_Syntax properties
				are available for use in the definition of identifiers or literals.
			</p>
		</blockquote>
		<h3>4.1 <a name="Whitespace" href="#Whitespace">Whitespace</a></h3>

		<p>
		Many computer languages treat two categories of whitespace differently: horizontal space (such as the ASCII horizontal tabulation and space), and line terminators.
		</p>
		<p>
			When a syntax supports non-ASCII characters, it is useful to consider a third category: <em>ignorable format controls</em>. Ignorable format controls may be inserted between lexical elements in order to resolve bidirectional ordering issues, as described in <i>Section 4.1.1, <a href="#Bidirectional_Ordering">Bidirectional Ordering</a></i>. The insertion of these characters does not change the meaning of the program; in particular, they are not spacing characters. See <i>Section 4.1.2, <a href="#Required_Spaces">Required Spaces</a></i>.
		</p>
		<blockquote>
			<b>Note:</b> Allowing for the insertion of ignorable format controls does not prevent spoofing based on bidirectional reordering.
			In order to guard against such spoofing, implementations should make use of the higher-level protocols and conversion to plain text described in Unicode Standard Annex #9, “Unicode Bidirectional Algorithm” [<a href="../tr41/tr41-36.html#UAX9">UAX9</a>]. See Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>].
		</blockquote>
		<blockquote>
			<b>Note:</b> Since these characters are allowed only where a boundary would, in their absence, exist between lexical elements, an implementation could ignore them when lexing, and then consider as illegal any lexical element that contains them. An exception must be made for comments and strings, which should be able to freely contain these characters.
		</blockquote>
		<p>
			Implementations should also allow these characters in other contexts where reordering issues could arise. See Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>].
		</p>
		<p>
			<b><a name="R3a" href="#R3a">UAX31-R3a</a></b>. <b>Pattern_White_Space Characters:</b> <i>To meet this requirement, an
				implementation shall
				choose either <a href="#R3a-1">UAX31-R3a-1</a> or <a href="#R3a-2">UAX31-R3a-2</a>.</i>
		</p>
		<p>
		<b><a name="R3a-1" href="#R3a-1">UAX31-R3a-1</a></b>.
			<i>Use Pattern_White_Space characters as the set of characters interpreted as whitespace in parsing, as follows:</i>
		</p>
		<ol type="1">
			<li><i>A sequence of one or more of any of the following characters shall be interpreted as a sequence of one or more end of line:</i>
				<ol type="a">
					<li>U+000A (line feed)</li>
					<li>U+000B (vertical tabulation)</li>
					<li>U+000C (form feed)</li>
					<li>U+000D (carriage return)</li>
					<li>U+0085 (next line)</li>
					<li>U+2028 LINE SEPARATOR</li>
					<li>U+2029 PARAGRAPH SEPARATOR</li>
				</ol>
			</li>
			<li><i>The Pattern_White_Space characters with the property Default_Ignorable_Code_Point shall be treated as ignorable format controls; they shall be allowed in the contexts <a href="#I1">UAX31-I1</a>, <a href="#I2">UAX31-I2</a>, and <a href="#I3">UAX31-I3</a> defined in <i>Section 4.1.3, <a href="#Contexts_for_Ignorable_Format_Controls">Contexts for Ignorable Format Controls</a></i>,  where their insertion shall have no effect on the meaning of the program.</i></li>
			<li><i>All other characters in Pattern_White_Space shall be interpreted as horizontal space.</i></li>
		</ol>
		<p>
			<b><a name="R3a-2" href="#R3a-2">UAX31-R3a-2</a></b>.
				<i>Declare that it uses a <b>profile</b>
				of <a href="#R3a-1">UAX31-R3a-1</a>
				and define that profile with a precise specification of the
				characters that are added to or removed from the set of code points
				defined by the Pattern_White_Space property, and of any changes to the criteria under which a character or sequence of characters is interpreted as an end of line, as ignorable format controls, or as horizontal space.
			</i>
		</p>
		<blockquote>
			<b>Note:</b> The characters to be treated as ignorable format controls under item 2 of <a href="#R3a-1">UAX31-R3a-1</a> are U+200E LEFT-TO-RIGHT MARK and U+200F RIGHT-TO-LEFT MARK. The characters to be treated as horizontal space under item 3 of <a href="#R3a-1">UAX31-R3a-1</a> are U+0020 SPACE and U+0009 (horizontal tabulation, TAB).
		</blockquote>
		<blockquote>
			<b>Note:</b> The characters LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK are two of the Implicit Directional Marks defined by <i>Section 2.6, Implicit Directional Marks</i>, in Unicode Standard Annex #9, “Unicode Bidirectional Algorithm” [<a href="../tr41/tr41-36.html#UAX9">UAX9</a>]. The third one, ARABIC LETTER MARK, is used far less frequently than the others, even in Arabic text; its behavior differs subtly from RIGHT-TO-LEFT MARK in ways that are not usually relevant to the ordering of source code. If it is added to the set of whitespace characters by a profile, it is interpreted as an ignorable format control.
		</blockquote>
		<blockquote>
			<b>Note:</b> Failing to interpret all characters listed in item 1 of <a href="#R3a-1">UAX31-R3a-1</a> as line terminators would lead to spoofing issues; see Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>].
		</blockquote>
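		<p>The three-way partition of Pattern_White_Space in <a href="#R3a-1">UAX31-R3a-1</a>
			can be sketched as a small classifier. The code points are those listed in the
			requirement and in the notes above. The function name and the returned labels
			are hypothetical, and a profile under <a href="#R3a-2">UAX31-R3a-2</a> would
			change the sets.</p>

```python
# Item 1: characters interpreted as end of line.
LINE_TERMINATORS = frozenset("\u000A\u000B\u000C\u000D\u0085\u2028\u2029")

# Item 2: LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK, the ignorable format controls.
IGNORABLE_FORMAT_CONTROLS = frozenset("\u200E\u200F")

# Item 3: the remaining Pattern_White_Space characters, TAB and SPACE.
HORIZONTAL_SPACE = frozenset("\u0009\u0020")


def classify_whitespace(ch):
    """Return the lexical whitespace category of ch, or None if it is not whitespace."""
    if ch in LINE_TERMINATORS:
        return "end_of_line"
    if ch in IGNORABLE_FORMAT_CONTROLS:
        return "ignorable_format_control"
    if ch in HORIZONTAL_SPACE:
        return "horizontal_space"
    return None
```

		<p>Characters outside Pattern_White_Space, such as U+00A0 NO-BREAK SPACE, are
			not whitespace for this purpose and yield <code>None</code>.</p>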

		<h4>4.1.1 <a href="#Bidirectional_Ordering" name="Bidirectional_Ordering">Bidirectional Ordering</a></h4>
			<p>
				Requirement <a href="#R3a">UAX31-R3a</a> is relevant even for languages that do not
				use immutable identifiers, or that have lexical structure outside of the
				categories of syntax and whitespace characters. In particular, the set of
				Pattern_White_Space characters is chosen to make it possible to correct
				bidirectional ordering issues that can arise in a wide range of programming
				languages, visually obfuscating the logic of expressions.
				In the absence of higher-level protocols (see Section 4.3,
				<i>Higher-Level Protocols</i>, in
				[<a href="../tr41/tr41-36.html#UAX9">UAX9</a>]), tokens may be visually
				reordered by the Unicode Bidi Algorithm in bidirectional source text,
				producing a visual result that conveys a different logical intent.
				To remedy that, two implicit directional marks are among Pattern_White_Space
				characters; if these can be freely inserted between tokens, implicit
				directional marks <i>consistent with the paragraph direction</i> can be used to
				ensure that the visual order of tokens matches their logical order.
			</p>
		<blockquote>
			<p>
				<b>Example:</b> Consider the following two lines:
			</p>

			<blockquote>
				(1) <code>x + tav == 1</code>
			</blockquote>

			<blockquote>
				(2) <code>x + תו == 1</code>
			</blockquote>

			<p>
				Internally, they are the same except that the ASCII identifier <code>tav</code> in line (1) is replaced by the Hebrew
				identifier <code>תו</code> in line (2). However, with a plain text display (with left-to-right paragraph direction) the user
				will be misled, thinking that line (2) is a comparison between <code>(x + 1)</code> and <code>תו</code>, whereas it is actually a
				comparison between <code>(x + תו)</code> and <code>1</code>.
				The misleading rendering of (2) occurs because the directionality of the identifier תו
				influences subsequent weakly-directional tokens; inserting a left-to-right
				mark after the identifier <code>תו</code> stops it from influencing the remainder of the
				line, and thus yields a better rendering in plain text with left-to-right
				paragraph direction, as demonstrated in the following table, wherein characters
				whose ordering is affected by that identifier have been highlighted.
			</p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th colspan="12">Underlying Representation</th>
					<th>Display (LTR paragraph direction)</th>
				</tr>
				<tr>
					<td><code>x</code></td>
					<td><code>&nbsp;</code></td>
					<td><code>+</code></td>
					<td><code>&nbsp;</code></td>
					<td class="higher-resolved-level"><code>ת</code></td>
					<td class="higher-resolved-level"><code>ו</code></td>
					<td class="higher-resolved-level" colspan="2"><code>&nbsp;</code></td>
					<td class="higher-resolved-level"><code>=</code></td>
					<td class="higher-resolved-level"><code>=</code></td>
					<td class="higher-resolved-level"><code>&nbsp;</code></td>
					<td class="higher-resolved-level"><code>1</code></td>
					<td dir="ltr"><code>x + <span class="higher-resolved-level">תו == 1</span></code></td>
				</tr>
				<tr>
					<td><code>x</code></td>
					<td><code>&nbsp;</code></td>
					<td><code>+</code></td>
					<td><code>&nbsp;</code></td>
					<td class="higher-resolved-level"><code>ת</code></td>
					<td class="higher-resolved-level"><code>ו</code></td>
					<td>⟨LRM⟩</td>
					<td><code>&nbsp;</code></td>
					<td><code>=</code></td>
					<td><code>=</code></td>
					<td><code>&nbsp;</code></td>
					<td><code>1</code></td>
					<td dir="ltr"><code>x + <span class="higher-resolved-level">תו</span>&lrm; == 1</code></td>
				</tr>
			</table>
		</div>
			<p>
				<i>Section 5.2, Conversion to Plain Text</i>, in Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>],
				specifies an algorithm for the automatic insertion of LRM characters.
			</p>
		</blockquote>
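		<p>The effect in the example can be examined by listing the Bidi_Class value of
			each character, using Python's standard <code>unicodedata</code> module. This
			sketch only inspects properties; it does not run the Unicode Bidirectional
			Algorithm or the insertion algorithm of UTS #55. The helper name
			<code>bidi_classes</code> is hypothetical.</p>

```python
import unicodedata

LRM = "\u200E"  # LEFT-TO-RIGHT MARK


def bidi_classes(text):
    """Return the Bidi_Class short name of every character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Line (2) of the example, without and with a left-to-right mark after the identifier.
line_2 = "x + \u05EA\u05D5 == 1"
line_2_fixed = "x + \u05EA\u05D5" + LRM + " == 1"

print(bidi_classes(line_2))
print(bidi_classes(line_2_fixed))
```

		<p>In the first line, only neutral and weak characters separate the two
			right-to-left letters from the digit, so the Hebrew identifier can influence
			their ordering. In the second line, the mark contributes a strong left-to-right
			character immediately after the identifier.</p>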
		<blockquote>
			<b>Note:</b> Left-to-right marks are used for this purpose when the main
			direction is left-to-right. Correspondingly, right-to-left marks are used
			when the main direction is right-to-left.
		</blockquote>
		<h4>4.1.2 <a href="#Required_Spaces" name="Required_Spaces">Required Spaces</a></h4>
		<p>
			Since the implicit directional marks are nonspacing, where a syntax requires
			a sequence of spaces (such as between identifiers), it should require that at
			least one of those be neither LEFT-TO-RIGHT MARK nor RIGHT-TO-LEFT MARK. The
			visual appearance would otherwise be too confusing to readers: “<code>else</code>⟨LRM⟩<code>if</code>”
			would be seen by the user as “<code>elseif</code>” but parsed by the compiler as “<code>else if</code>”,
			whereas “<code>else</code>⟨LRM⟩<code> if</code>” would be seen and parsed as “<code>else if</code>” and be harmless.
		</p>
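		<p>A lexer could enforce this recommendation with a check along the following
			lines. This is a minimal sketch, assuming the lexer has already collected the
			run of whitespace characters found between two tokens; the function name
			<code>satisfies_required_space</code> is hypothetical.</p>

```python
# The code points with Pattern_White_Space=True.
PATTERN_WHITE_SPACE = frozenset(
    "\u0009\u000A\u000B\u000C\u000D\u0020\u0085\u200E\u200F\u2028\u2029"
)

# LEFT-TO-RIGHT MARK and RIGHT-TO-LEFT MARK are nonspacing.
IMPLICIT_DIRECTIONAL_MARKS = frozenset("\u200E\u200F")


def satisfies_required_space(run):
    """Return True if run contains at least one whitespace character that is
    neither LEFT-TO-RIGHT MARK nor RIGHT-TO-LEFT MARK."""
    return any(
        ch in PATTERN_WHITE_SPACE and ch not in IMPLICIT_DIRECTIONAL_MARKS
        for ch in run
    )
```

		<p>Under this check, the run in “<code>else</code>⟨LRM⟩<code>if</code>” is
			rejected where a space is required, while the run in
			“<code>else</code>⟨LRM⟩<code> if</code>” is accepted.</p>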

		<h4>4.1.3 <a href="#Contexts_for_Ignorable_Format_Controls" name="Contexts_for_Ignorable_Format_Controls">Contexts for Ignorable Format Controls</a></h4>

		<p>Implementations should at least allow for the insertion of ignorable format controls in the following contexts, illustrated by examples wherein the ignorable format control is represented by ⟨LRM⟩.</p>
		<p><b><a href="#I1" name="I1">UAX31-I1</a></b>. Adjacent to lexical horizontal space (within a sequence of lexical horizontal spaces, or at the start or end of such a sequence).</p>
		<blockquote>
			<p><b>Example:</b> Between the following keywords separated by a space:</p>
			<p><code>else </code>⟨LRM⟩<code>if</code></p>
		</blockquote>
		<blockquote>
			<b>Note:</b> The phrase “lexical horizontal space” refers to characters that are not merely in the set of horizontal space characters, but are also in a context where they are lexically spaces. For instance, it does not include horizontal space characters in string literals. Implementations should permit these characters in string literals, but in such a literal, their insertion has an effect on the meaning of the program, as they are then present in the string represented by that literal.
		</blockquote>
		<p><b><a href="#I2" name="I2">UAX31-I2</a></b>. As optional space, that is, wherever horizontal space could be inserted without changing the meaning of the program.</p>
		<blockquote>
			<p><b>Example:</b> Before the plus sign in the following arithmetic expression:</p>
			<p><code>x</code>⟨LRM⟩<code>+1</code></p>
		</blockquote>
		<p><b><a href="#I3" name="I3">UAX31-I3</a></b>. At the start and end of a lexical line.</p>
		<blockquote>
			<p><b>Example:</b> Before the word <code>import</code> in the following line of Python:</p>
			<p>⟨LRM⟩<code>import unicodedata</code></p>
		</blockquote>
		<blockquote>
			<b>Note:</b> As is the case for <a href="#I1">UAX31-I1</a>, the start and end of a “lexical line” in <a href="#I3">UAX31-I3</a> does not include the start and end of a line in a multiline string literal, respectively. This context is distinct from <a href="#I2">UAX31-I2</a> in languages where leading or trailing spaces are meaningful.
		</blockquote>

		<h3>4.2 <a name="Syntax" href="#Syntax">Syntax</a></h3>

		<p>The lexical structure of formal languages involves characters that are not allowed in identifiers and are not whitespace, but that have some special lexical significance other than being literal characters (such as in string literals) or ignored (such as in comments). These are referred to in this document as <em>characters with syntactic use</em>.</p>
		<p>Examples of characters with syntactic use include:</p>
		<ul>
			<li>decimal marks in numeric literals</li>
			<li>arithmetic operators, such as <code>+</code>, <code>-</code>, <code>*</code>, <code>/</code></li>
			<li>parentheses and other brackets</li>
			<li>characters in comment delimiters, such as <code>#</code>, <code>/*</code>, <code>--</code>, or <code>⍝</code></li>
			<li>quotation marks delimiting strings</li>
			<li>characters such as <code>\</code> introducing escape sequences</li>
		</ul>
		<p>
		It is useful to bound the set of characters with syntactic use.
		This makes it possible to build tools that handle source code, but do not validate it, such as
		syntax highlighters, in a forward-compatible way; see Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>].
		It further provides a stable set of characters that can be used for user-defined operators.
		In addition, this allows for backward compatibility of literals (including patterns), as described in <i>Section 4.3, <a href="#Pattern_Syntax">Pattern Syntax</a></i>.
		</p>

		<p>
			<b><a name="R3b" href="#R3b">UAX31-R3b</a></b>. <b>Pattern_Syntax Characters:</b> <i>To meet this requirement, an
				implementation shall
				choose either <a href="#R3b-1">UAX31-R3b-1</a> or <a href="#R3b-2">UAX31-R3b-2</a>.</i>
		</p>
		<p>
		<b><a name="R3b-1" href="#R3b-1">UAX31-R3b-1</a></b>.
			<i>Use Pattern_Syntax characters as the set of characters
				with syntactic use. The following sets shall be disjoint:</i>
		</p>
		<ol>
			<li>characters allowed in identifiers</li>
			<li>characters treated as whitespace</li>
			<li>characters with syntactic use</li>
		</ol>
		<p>
			<b><a name="R3b-2" href="#R3b-2">UAX31-R3b-2</a></b>.
				<i>Declare that it uses a <b>profile</b>
				of <a href="#R3b-1">UAX31-R3b-1</a>
				and define that profile with a precise specification of the
				characters that are added to or removed from the set of code points
				defined by the Pattern_Syntax property.
			</i>
		</p>

		<blockquote>
			<p>
			<b>Note:</b> When meeting requirement <a href="#R3b">UAX31-R3b</a>, characters allowed in identifiers may be given special significance in the syntax even when they are not part of identifiers.
			</p>
			<p>
			For instance, in a language which uses the C syntax for hexadecimal literals and meets requirement <a href="#R1">UAX31-R1</a>, the literal <code>0xDEADBEEF</code> consists entirely of identifier characters, yet the <code>0x</code> has special significance in the syntax, and the characters after that prefix are subject to special restrictions (only 0 through 9 and A through F are allowed).
			</p>
			<p>
			However, characters outside of those allowed in identifiers, those treated as whitespace, and the set [:Pattern_Syntax:] cannot be given special significance in the syntax. For instance, if a language meets requirements <a href="#R1">UAX31-R1</a> and <a href="#R3">UAX31-R3</a> with no profile and allows for user-defined operators, that language cannot allow the user to define an operator 🐈.
			</p>
			<p>
			Characters outside of those allowed in identifiers, those treated as whitespace, and those with syntactic use can still be allowed in a program, for instance, as part of string literals or comments.
			</p>
		</blockquote>

		<h4>4.2.1 <a name="User-Defined_Operators" href="#User-Defined_Operators">User-Defined Operators</a></h4>
		<p>
			Some programming languages allow for user-defined operators. When meeting requirement <a href="#R3b">UAX31-R3b</a>, the set of characters that can be allowed in operators is limited; however, that leaves open the exact definition of operators. In order to avoid ambiguities in lexical analysis, operators should not be allowed to contain characters that may be found at the beginning of an identifier or literal; for instance, <code>+1</code> or <code>−x</code> should not be operators.
		</p>
		<p>
			The following definition avoids such interactions with default identifiers and with numbers.
		</p>

		<p>
			<b><a name="R3c" href="#R3c">UAX31-R3c</a></b>. <b>Operator Identifiers:</b> <i>To meet this requirement, an implementation shall meet requirement <a href="#R3b">UAX31-R3b</a> Pattern_Syntax Characters, and, to determine whether a string is an operator, it shall choose either UAX31-R3c-1 or UAX31-R3c-2.</i>
		</p>
		<p>
			<b><a name="R3c-1" href="#R3c-1">UAX31-R3c-1</a></b>. <i>Use definition <a href="#D1">UAX31-D1</a>, setting Start to be the set of characters with syntactic use, setting Continue to be the union of the set of characters with syntactic use and the set of characters with General_Category Mn, and leaving Medial empty.</i>
		</p>
		<p>
			<b><a name="R3c-2" href="#R3c-2">UAX31-R3c-2</a></b>. <i>Declare that it uses a profile of <a href="#R3c-1">UAX31-R3c-1</a> and define that profile with a precise specification of the characters and character sequences that are added to or removed from Start, Continue, and Medial and/or provide a list of additional constraints on operators.</i>
		</p>
		<blockquote>
				<p>
				<b>Note:</b> The set of Pattern_Syntax characters, which is the default for characters with syntactic use, contains some emoji. Implementations may wish to remove them, either to allow for their use in identifiers, or to reduce potential confusion arising from ⚽ being an operator but 🏉 not being one. This may be done using the standard profile for <a href="#R3b">UAX31-R3b</a> Pattern_Syntax Characters defined in <i>Section 7.2, <a href="#Emoji_Profile">Emoji Profile</a></i>.
				</p>
				<p>
				Nonspacing marks are included in Continue because they are part of the representation for many operators, such as some of the negated operators.
				</p>
				<p>
				Unassigned code points are not characters; they are therefore excluded by this definition.
				</p>
		</blockquote>
		<p>
		When meeting this requirement, a profile is likely to be needed depending on the specifics of the syntax. For instance, a programming language wherein string literals start with " should remove that character from the characters allowed in operators.
		</p>
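		<p>As a rough sketch of UAX31-R3c-1 with a profile, the following Python checks whether a string is an operator. Python's standard library does not expose the Pattern_Syntax property, so <code>SYNTACTIC_USE</code> is a small hand-picked subset assumed for illustration only. The profile that removes <code>"</code> and the name <code>is_operator</code> are likewise hypothetical.</p>

```python
import unicodedata

# Illustrative subset only: a real implementation would derive this set from
# the Pattern_Syntax property (or from its declared profile of UAX31-R3b).
SYNTACTIC_USE = set('+-*/<>=!&|^~%"') | {"\u2260", "\u2264", "\u2265"}

# Hypothetical profile (UAX31-R3c-2): '"' starts string literals in this toy
# language, so it is removed from the characters allowed in operators.
OPERATOR_CHARS = SYNTACTIC_USE - {'"'}


def is_operator(text, allowed=OPERATOR_CHARS):
    """Return True if text matches Start Continue*, with Medial empty.

    Start is the allowed set; Continue is that set plus General_Category Mn.
    """
    if not text or text[0] not in allowed:
        return False
    return all(
        ch in allowed or unicodedata.category(ch) == "Mn"
        for ch in text[1:]
    )


print(is_operator("<="))
print(is_operator("=\u0338"))   # '=' followed by COMBINING LONG SOLIDUS OVERLAY
print(is_operator("+1"))        # contains a digit, so not an operator
```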

		<h3>4.3 <a name="Pattern_Syntax" href="#Pattern_Syntax">Pattern Syntax</a></h3>

		<p>With a fixed set of whitespace and syntax code points, a
			pattern language can have a policy requiring all possible syntax
			characters (even ones currently unused) to be quoted if they are
			literals. Using this policy preserves the freedom to extend the
			syntax in the future by using those characters. Past patterns on
			future systems will always work; future patterns on past systems will
			signal an error instead of silently producing the wrong results.
			Consider the following scenario, for example.</p>
		<blockquote>
			<p>
				In version 1.0 of program X, &#39;≈&#39; is a reserved syntax
				character; that is, it does not perform an operation, and it needs
				to be quoted. In this example, &#39;\&#39; <i>quotes</i> the next
				character; that is, it causes it to be treated as a literal instead
				of a syntax character. In version 2.0 of program X, &#39;≈&#39; is
				given a real meaning—for example, “uppercase the subsequent
				characters”.
			</p>
			<ul>
				<li>The pattern abc...\≈...xyz works on both versions 1.0 and
					2.0, and refers to the literal character because it is quoted in
					both cases.</li>
				<li>The pattern abc...≈...xyz works on version 2.0 and
					uppercases the following characters. On version 1.0, the engine
					(rightfully) has no idea what to do with ≈. Rather than silently
					fail (by ignoring ≈ or turning it into a literal), it has the
					opportunity to signal an error.</li>
			</ul>
		</blockquote>
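		<p>The scenario above can be sketched as follows for “version 1.0” of a hypothetical pattern engine. The function name <code>parse_literals</code> and the reserved set are assumptions made for illustration; only the behavior (quoted characters are literals, unquoted reserved characters are errors) comes from the text.</p>

```python
RESERVED = {"\u2248"}  # '≈' is reserved but unused in the hypothetical version 1.0


def parse_literals(pattern, reserved=RESERVED):
    """Return the literal text of pattern, rejecting unquoted reserved syntax."""
    literals = []
    chars = iter(pattern)
    for ch in chars:
        if ch == "\\":
            # A backslash quotes the next character, whatever it is.
            quoted = next(chars, None)
            if quoted is None:
                raise ValueError("dangling backslash at end of pattern")
            literals.append(quoted)
        elif ch in reserved:
            # Signal an error rather than silently ignoring the character
            # or treating it as a literal.
            raise ValueError(f"unquoted reserved syntax character {ch!r}")
        else:
            literals.append(ch)
    return "".join(literals)


print(parse_literals("abc\\\u2248xyz"))  # quoted, so accepted as a literal
try:
    parse_literals("abc\u2248xyz")
except ValueError as error:
    print("rejected:", error)
```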
		<p>
			When <i>generating</i> rules or patterns, all whitespace and syntax
			code points that are to be literals require quoting, using whatever
			quoting mechanism is available. For readability, it is recommended
			practice to quote or escape all literal whitespace and default-ignorable code points as well.
		</p>
		<blockquote>
			<p>Consider the following example, where the items in angle
				brackets indicate literal characters:</p>
			<blockquote>
				<p>a&lt;SPACE&gt;b &#x2192; x&lt;ZERO WIDTH SPACE&gt;y&nbsp; +
					z;</p>
			</blockquote>
			<p>Because &lt;SPACE&gt; is a Pattern_White_Space character, it
				requires quoting. Because &lt;ZERO WIDTH SPACE&gt; is a default-ignorable character, it should also be quoted for readability. So in
				this example, if \uXXXX is used for a code point literal, but is
				resolved before quoting, and if single quotes are used for quoting,
				this example might be expressed as:</p>
			<blockquote>
				<p>&#39;a\u0020b&#39; &#x2192; &#39;x\u200By&#39; + z;</p>
			</blockquote>
		</blockquote>
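		<p>A pattern generator following this recommendation might look like the sketch below. The Pattern_White_Space set is listed in full, but <code>SAMPLE_DEFAULT_IGNORABLE</code> is only an illustrative subset of the default-ignorable code points. The function name <code>quote_literal</code>, and the choice to backslash-escape <code>'</code> and <code>\</code> inside the quotes, are assumptions about a hypothetical pattern language.</p>

```python
# Pattern_White_Space is a fixed set of code points.
PATTERN_WHITE_SPACE = {
    "\u0009", "\u000A", "\u000B", "\u000C", "\u000D", "\u0020",
    "\u0085", "\u200E", "\u200F", "\u2028", "\u2029",
}

# Illustrative subset of the default-ignorable code points, not the full set.
SAMPLE_DEFAULT_IGNORABLE = {"\u00AD", "\u200B", "\u200C", "\u200D", "\u2060", "\uFEFF"}


def quote_literal(text):
    """Quote text as a literal: single quotes, with \\uXXXX for invisible characters."""
    escaped = []
    for ch in text:
        if ch in PATTERN_WHITE_SPACE or ch in SAMPLE_DEFAULT_IGNORABLE:
            escaped.append(f"\\u{ord(ch):04X}")
        elif ch in ("'", "\\"):
            escaped.append("\\" + ch)  # assumed escaping rule inside quotes
        else:
            escaped.append(ch)
    return "'" + "".join(escaped) + "'"


print(quote_literal("a b"))
print(quote_literal("x\u200By"))
```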

		<h2>
			5 <a name="normalization_and_case" href="#normalization_and_case">Normalization
				and Case</a>
		</h2>

		<p>This section discusses issues that must be taken into account
			when considering normalization and case folding of identifiers in
			programming languages or scripting languages. Using normalization
			avoids many problems where apparently identical identifiers are not
			treated equivalently. Such problems can appear both during
			compilation and during linking—in particular across different
			programming languages. To avoid such problems, programming languages
			can normalize identifiers before storing or comparing them. Generally
			if the programming language has case-sensitive identifiers, then
			Normalization Form C is appropriate; whereas, if the programming
			language has case-insensitive identifiers, then Normalization Form KC
			is more appropriate.</p>
		<p>Implementations that take normalization and case into account
			have two choices: to treat variants as equivalent, or to disallow
			variants.</p>

		<p>
			<b><a name="R4" href="#R4">UAX31-R4</a></b>. <b>Equivalent
					Normalized Identifiers:</b> <i>To meet this requirement, an implementation
				shall specify the Normalization Form and shall provide a precise
				specification of the characters that are excluded from
				normalization, if any. If the Normalization Form is NFKC, the
				implementation shall apply the modifications in Section 5.1, <a
				href="#NFKC_Modifications">NFKC Modifications</a>, given by the
				properties XID_Start and XID_Continue. Except for identifiers
				containing excluded characters, any two identifiers that have the
				same Normalization Form shall be treated as equivalent by the
				implementation.</i>
		</p>

		<p>
			<b><a name="R5" href="#R5">UAX31-R5</a></b>. <b>Equivalent
					Case-Insensitive Identifiers:</b> <i>To meet this requirement, an
				implementation shall specify either simple or full case folding, and
				adhere to the Unicode specification for that folding. Any two
				identifiers that have the same case-folded form shall be treated as
				equivalent by the implementation.</i>
		</p>

		<p>
			<b><a name="R6" href="#R6">UAX31-R6</a></b>. <b>Filtered
					Normalized Identifiers:</b> <i>To meet this requirement, an implementation
				shall specify the Normalization Form and shall provide a precise
				specification of the characters that are excluded from
				normalization, if any. If the Normalization Form is NFKC, the
				implementation shall apply the modifications in Section 5.1, <a
				href="#NFKC_Modifications">NFKC Modifications</a>, given by the
				properties XID_Start and XID_Continue. Except for identifiers
				containing excluded characters, allowed identifiers must be in the
				specified Normalization Form.</i>
		</p>

		<blockquote>
			<p>
				<b>Note:</b> For requirement UAX31-R6, filtering involves disallowing any
				characters in the set \p{NFKC_QuickCheck=No}, or equivalently,
				disallowing \P{isNFKC}.
			</p>
		</blockquote>

		<p>
			<b><a name="R7" href="#R7">UAX31-R7</a></b>. <b>Filtered
					Case-Insensitive Identifiers:</b> <i>To meet this requirement, an
				implementation shall specify either simple or full case folding, and
				adhere to the Unicode specification for that folding. Except for
				identifiers containing excluded characters, allowed identifiers must
				be in the specified case folded form.</i>
		</p>
	  <blockquote>
			<p>
				<b>Note:</b> For requirement UAX31-R7 with full case folding, filtering
				involves disallowing any characters in the set <code>\p{Changes_When_Casefolded}</code>.
		  </p>
		</blockquote>
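		<p>The difference between the “equivalent” and “filtered” requirements can be sketched with Python's standard library. The function names are hypothetical, and the sketch assumes that the Unicode data bundled with the Python runtime is the version the implementation targets. It uses NFC by default; an NFKC implementation would also need the modifications of Section 5.1.</p>

```python
import unicodedata


def equivalent_normalized(a, b, form="NFC"):
    """UAX31-R4 style: identifiers are equivalent if their normalized forms match."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def allowed_normalized(identifier, form="NFC"):
    """UAX31-R6 style: only identifiers already in the chosen form are allowed."""
    return unicodedata.is_normalized(form, identifier)


def equivalent_caseless(a, b):
    """UAX31-R5 style, using full case folding via str.casefold()."""
    return a.casefold() == b.casefold()


def allowed_casefolded(identifier):
    """UAX31-R7 style: only identifiers already in case-folded form are allowed."""
    return identifier == identifier.casefold()


print(equivalent_normalized("\u00E9", "e\u0301"))   # precomposed vs. decomposed é
print(allowed_normalized("e\u0301"))                # decomposed form is not NFC
print(equivalent_caseless("Stra\u00DFe", "STRASSE"))
print(allowed_casefolded("Strasse"))
```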


		<p>
			As of Unicode 5.2, an additional string transform is available for
			use in matching identifiers:
			<code>toNFKC_Casefold(S)</code>.
			See <b>R5</b> in <em>Section 3.13, Default Case Algorithms</em> in
			[<a href="../tr41/tr41-36.html#Unicode">Unicode</a>]. That operation
			case folds and normalizes a string, and also removes default-ignorable code points.
			It can be used to support an implementation of <a href="#R4">UAX31-R4</a> and <a href="#R5">UAX31-R5</a>
			<i>Equivalent Case and Compatibility-Insensitive Identifiers</i>.
			In order to implement requirement <a href="#R4">UAX31-R4</a>, canonical
			decomposition must be applied prior to the toNFKC_Casefold operation.
			The resulting equivalence relation between identifiers is an <i>identifier caseless match</i>,
			see definition D147 of [<a href="../tr41/tr41-36.html#Unicode">Unicode</a>].
			There is a corresponding boolean property,
			Changes_When_NFKC_Casefolded, which can be used to support an
			implementation of <i>Filtered Case and Compatibility-Insensitive
				Identifiers</i>. The NFKC_Casefold character mapping property and the
			Changes_When_NFKC_Casefolded property are described in Unicode
			Standard Annex #44, "Unicode Character Database" [<a
				href="../tr41/tr41-36.html#UAX44">UAX44</a>].
		</p>
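		<p>Python's standard library has no direct <code>toNFKC_Casefold</code> function, so the following is only an approximation built from <code>str.casefold()</code> and NFKC normalization. <code>SAMPLE_DEFAULT_IGNORABLE</code> is an illustrative subset of the default-ignorable code points, and the function names are hypothetical. Results may differ from the exact NFKC_Casefold mapping for some characters.</p>

```python
import unicodedata

# Illustrative subset of Default_Ignorable_Code_Point; not the full property.
SAMPLE_DEFAULT_IGNORABLE = {"\u00AD", "\u200B", "\u2060", "\uFEFF"}


def approx_nfkc_casefold(text):
    """Approximate toNFKC_Casefold; not the exact NFKC_Casefold mapping."""
    kept = "".join(ch for ch in text if ch not in SAMPLE_DEFAULT_IGNORABLE)
    once = unicodedata.normalize("NFKC", kept.casefold())
    # Fold and normalize a second time, since normalization can expose
    # characters that still need folding.
    return unicodedata.normalize("NFKC", once.casefold())


def identifier_caseless_match(a, b):
    """Apply canonical decomposition first, then the approximate fold."""
    def key(s):
        return approx_nfkc_casefold(unicodedata.normalize("NFD", s))
    return key(a) == key(b)


print(identifier_caseless_match("\uFF26\uFF49\uFF4C\uFF45", "file"))  # fullwidth "File"
print(identifier_caseless_match("\u212B", "\u00E5"))                  # ANGSTROM SIGN vs. å
print(identifier_caseless_match("co\u00ADop", "COOP"))                # soft hyphen removed
```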
		<blockquote>
			<p>
				<b>Note:</b> In mathematically oriented programming languages that
				make distinctive use of the Mathematical Alphanumeric Symbols, such
				as U+1D400 MATHEMATICAL BOLD CAPITAL A, an application of NFKC must
				filter characters to exclude characters with the property value
				Decomposition_Type=Font.
			</p>
		</blockquote>
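		<p>One possible way to exclude such characters from normalization is sketched below. It relies on <code>unicodedata.decomposition()</code>, which reports the decomposition type as a tag such as <code>&lt;font&gt;</code>. Normalizing each run of ordinary characters separately is a simplification assumed here, and the function names are hypothetical.</p>

```python
import unicodedata


def is_font_variant(ch):
    """True if ch has Decomposition_Type=Font (a "<font>" decomposition mapping)."""
    return unicodedata.decomposition(ch).startswith("<font>")


def nfkc_keeping_font_variants(identifier):
    """Apply NFKC to runs of ordinary characters, leaving font variants untouched."""
    result = []
    run = []
    for ch in identifier:
        if is_font_variant(ch):
            # Flush the pending run, then keep the font variant as-is.
            result.append(unicodedata.normalize("NFKC", "".join(run)))
            run = []
            result.append(ch)
        else:
            run.append(ch)
    result.append(unicodedata.normalize("NFKC", "".join(run)))
    return "".join(result)


bold_a = "\U0001D400"  # MATHEMATICAL BOLD CAPITAL A
print(unicodedata.normalize("NFKC", bold_a) == "A")        # plain NFKC folds it away
print(nfkc_keeping_font_variants(bold_a + "\uFB01") == bold_a + "fi")
```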

		<h3>
			5.1 <a name="NFKC_Modifications" href="#NFKC_Modifications">NFKC
				Modifications</a>
		</h3>

		<p>Where programming languages are using NFKC to fold differences
			between characters, they need the following modifications of the
			identifier syntax from the Unicode Standard to deal with the
			idiosyncrasies of a small number of characters. These modifications
			are reflected in the XID_Start and XID_Continue properties.</p>

		<h4>
			5.1.1 <a name="Combining_Mark_Mods" href="#Combining_Mark_Mods">
				Modifications for Characters that Behave Like Combining Marks</a>
		</h4>

		<p>Certain characters are not formally combining characters,
			although they behave in most respects as if they were. In most cases,
			the mismatch does not cause a problem, but when these characters have
			compatibility decompositions, they can cause identifiers not to be
			closed under Normalization Form KC. In particular, the following four
			characters are included in XID_Continue and not XID_Start:</p>

		<ul style="list-style-type: none">
			<li>U+0E33 THAI CHARACTER SARA AM</li>
			<li>U+0EB3 LAO VOWEL SIGN AM</li>
			<li>U+FF9E HALFWIDTH KATAKANA VOICED SOUND MARK</li>
			<li>U+FF9F HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK</li>
		</ul>
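		<p>The effect of this modification can be observed with a short Python check. It assumes that <code>str.isidentifier()</code> follows XID_Start and XID_Continue, as documented for Python identifiers, and that the Unicode data bundled with the runtime reflects these modifications. The helper names are hypothetical.</p>

```python
import unicodedata

# The four characters listed above.
SPECIAL_CASES = ["\u0E33", "\u0EB3", "\uFF9E", "\uFF9F"]


def can_start(ch):
    """True if ch alone is an identifier, that is, ch is usable as Start."""
    return ch.isidentifier()


def can_continue(ch):
    """True if ch is accepted after an ordinary Start character."""
    return ("a" + ch).isidentifier()


for ch in SPECIAL_CASES:
    nfkc = unicodedata.normalize("NFKC", ch)
    # The NFKC form begins with a nonspacing mark, which cannot start an
    # identifier; this is why these characters are kept out of XID_Start.
    print(
        f"U+{ord(ch):04X}",
        "start:", can_start(ch),
        "continue:", can_continue(ch),
        "NFKC begins with gc:", unicodedata.category(nfkc[0]),
    )
```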
		<h4>
			5.1.2 <a name="Irreg_Decomp_Mods" href="#Irreg_Decomp_Mods">
				Modifications for Irregularly Decomposing Characters</a>
		</h4>

		<p>U+037A GREEK YPOGEGRAMMENI and certain Arabic presentation
			forms have irregular compatibility decompositions and are excluded
			from both XID_Start and XID_Continue. It is recommended that all
			Arabic presentation forms be excluded from identifiers in any event,
			although only a few of them must be excluded for normalization to
			guarantee identifier closure.</p>

		<h4>
			5.1.3 <a name="Identifier_Closure" href="#Identifier_Closure">
				Identifier Closure Under Normalization</a>
		</h4>

		<p>
			With these amendments to the identifier syntax, all identifiers are
			closed under all four Normalization Forms. This means that for any
			string S, the implications shown in <i>Figure 5</i> hold.
		</p>
		<p class="caption">Figure 5. <a name="Figure_Normalization_Closure"
						href="#Figure_Normalization_Closure">Normalization Closure</a></p>
		<div align="center">
			<table class="simple">
				<tr>
					<td style="vertical-align: middle"><code>isIdentifier(S)</code>&nbsp;&rarr;&nbsp;</td>
					<td style="vertical-align: middle"><code>
							isIdentifier(toNFD(S))<br> isIdentifier(toNFC(S))<br>
							isIdentifier(toNFKD(S))<br> isIdentifier(toNFKC(S))
						</code></td>
				</tr>
			</table>
		</div>
		<p>
			Identifiers are also closed under case operations. For any string S
			(with exceptions involving a single character), the implications
			shown in <i>Figure 6</i> hold.
		</p>
		<p class="caption">Figure 6. <a name="Figure_Case_Closure" href="#Figure_Case_Closure">
						Case Closure</a></p>
		<div align="center">
			<table class="simple">
				<tr>
					<td style="vertical-align: middle"><code>isIdentifier(S)</code>&nbsp;&rarr;&nbsp;</td>
					<td style="vertical-align: middle"><code>
							isIdentifier(toLowercase(S))<br>
							isIdentifier(toUppercase(S))<br>
							isIdentifier(toFoldedcase(S))
						</code></td>
				</tr>
			</table>
		</div>
		<p>The one exception for casing is U+0345 COMBINING GREEK
			YPOGEGRAMMENI. In the very unusual case that U+0345 is at the start
			of S, U+0345 is not in XID_Start, but its uppercase and case-folded
			versions are. In practice, this is not a problem because of the way
			normalization is used with identifiers.</p>
		<p>
			The reverse implication is true for canonical equivalence but <i>not</i>
			true in the case of compatibility equivalence:
		</p>
		<p class="caption">Figure 7. <a name="Figure_Reverse_Normalization_Closure"
						href="#Figure_Reverse_Normalization_Closure">Reverse
						Normalization Closure</a></p>
		<div align="center">
			<table class="simple">
				<tr>
					<td style="vertical-align: middle"><code>
							isIdentifier(toNFD(S))<br> isIdentifier(toNFC(S))
						</code></td>
					<td style="vertical-align: middle">&nbsp;&rarr;&nbsp;<code>isIdentifier(S)</code></td>
				</tr>
				<tr>
					<td style="vertical-align: middle"><code>
							isIdentifier(toNFKD(S))<br> isIdentifier(toNFKC(S))
						</code>&nbsp;</td>
					<td style="vertical-align: middle">&nbsp;↛&nbsp;<code>isIdentifier(S)</code></td>
				</tr>
			</table>
		</div>
		<p>
			There are many characters for which the reverse implication is not
			true for compatibility equivalence, because there are many characters
			counting as symbols or non-decimal numbers—and thus outside of
			identifiers—whose compatibility equivalents are letters or decimal
			numbers and thus in identifiers. Some examples are shown in <i><a
				href="#Figure_Compatibility_Equivalents_to_Letters_or_Decimal_Numbers">Table
					8</a></i>.
		</p>
		<p class="caption">Table 8. <a
						name="Figure_Compatibility_Equivalents_to_Letters_or_Decimal_Numbers"
						href="#Figure_Compatibility_Equivalents_to_Letters_or_Decimal_Numbers">
						Compatibility Equivalents to Letters or Decimal Numbers</a></p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Code Points</th>
					<th>GC</th>
					<th>Samples</th>
					<th>Names</th>
				</tr>
				<tr>
					<td>2070</td>
					<td>No</td>
					<td>⁰</td>
					<td>SUPERSCRIPT ZERO</td>
				</tr>
				<tr>
					<td>20A8</td>
					<td>Sc</td>
					<td>₨</td>
					<td>RUPEE SIGN</td>
				</tr>
				<tr>
					<td>2116</td>
					<td>So</td>
					<td>№</td>
					<td>NUMERO SIGN</td>
				</tr>
				<tr>
					<td>2120..2122</td>
					<td>So</td>
					<td>℠..™</td>
					<td>SERVICE MARK..TRADE MARK SIGN</td>
				</tr>
				<tr>
					<td>2460..2473</td>
					<td>No</td>
					<td>①..⑳</td>
					<td>CIRCLED DIGIT ONE..CIRCLED NUMBER TWENTY</td>
				</tr>
				<tr>
					<td>3300..33A6</td>
					<td>So</td>
					<td>㌀..㎦</td>
					<td>SQUARE APAATO..SQUARE KM CUBED</td>
				</tr>
			</table>
		</div>
		<p>If an implementation needs to ensure both directions for
			compatibility equivalence of identifiers, then the identifier
			definition needs to be tailored to add these characters.</p>
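		<p>The forward closure and the failure of the reverse implication under compatibility equivalence can be spot-checked as follows. This sketch again assumes that Python's <code>str.isidentifier()</code> is based on XID_Start and XID_Continue, and the helper names are hypothetical. It tests individual strings and does not prove closure in general.</p>

```python
import unicodedata

FORMS = ("NFD", "NFC", "NFKD", "NFKC")


def closed_under_normalization(s):
    """Check the forward implications (identifier implies normalized identifier)."""
    if not s.isidentifier():
        return True  # the implication holds vacuously
    return all(unicodedata.normalize(form, s).isidentifier() for form in FORMS)


def reverse_fails_for_nfkc(s):
    """True if toNFKC(s) is an identifier although s itself is not."""
    return unicodedata.normalize("NFKC", s).isidentifier() and not s.isidentifier()


for sample in ["caf\u00E9", "\uFB01le", "x\u0E33"]:
    print(repr(sample), closed_under_normalization(sample))

# Characters from the table above: SUPERSCRIPT ZERO, NUMERO SIGN, TRADE MARK SIGN.
for sample in ["x\u2070", "\u2116", "\u2122"]:
    print(repr(sample), reverse_fails_for_nfkc(sample))
```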
		<p>
			For canonical equivalence the implication is true in both directions.
			<code>isIdentifier(toNFC(S))</code>
			if and only if
			<code>isIdentifier(S)</code>.
		</p>
		<p>
			There were two exceptions before Unicode 5.1, as shown in <a
				href="#Figure_Canonical_Equivalence_Exceptions_Prior_to_Unicode_5.1"><em>Table
					9</em></a>. If an implementation needs to ensure full canonical equivalence
			of identifiers, then the identifier definition must be tailored so
			that these characters have the same value, so that either both
			isIdentifier(S) and isIdentifier(toNFC(S)) are true, or so that both
			values are false.
		</p>
		<p class="caption">Table 9. <a
						name="Figure_Canonical_Equivalence_Exceptions_Prior_to_Unicode_5.1"
						href="#Figure_Canonical_Equivalence_Exceptions_Prior_to_Unicode_5.1">
						Canonical Equivalence Exceptions Prior to Unicode 5.1</a></p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>isIdentifier(toNFC(S))=True</th>
					<th>isIdentifier(S)=False</th>
					<th>Different in</th>
				</tr>
				<tr>
					<td>02B9 ( ʹ ) MODIFIER LETTER PRIME</td>
					<td>0374 ( ʹ ) GREEK NUMERAL SIGN</td>
					<td>XID and ID</td>
				</tr>
				<tr>
					<td>00B7 ( · ) MIDDLE DOT</td>
					<td>0387 ( · ) GREEK ANO TELEIA</td>
					<td>XID alone</td>
				</tr>
			</table>
		</div>
		<p>
			Those programming languages with case-insensitive identifiers should
			use the case foldings described in <i>Section 3.13, Default Case
				Algorithms</i>, of [<a href="../tr41/tr41-36.html#Unicode">Unicode</a>]
			to produce a case-insensitive normalized form.
		</p>
		<p>When source text is parsed for identifiers, the folding of
			distinctions (using case mapping or NFKC) must be delayed until after
			parsing has located the identifiers. Thus such folding of
			distinctions should not be applied to string literals or to comments
			in program source text.</p>
		<p>
			The Unicode Standard supports case folding with normalization, with
			the function toNFKC_Casefold(X). See definition R5 in <em>Section
				3.13, Default Case Algorithms</em> in [<a
				href="../tr41/tr41-36.html#Unicode">Unicode</a>] for the
			specification of this function and further explanation of its use.
		</p>
		<h3>
			5.2 <a name="Case_and_Stability" href="#Case_and_Stability">Case
				and Stability</a>
		</h3>
		<p>The alphabetic case of the initial character of an identifier
			is used as a mechanism to distinguish syntactic classes in some
			languages like Prolog, Erlang, Haskell, Clean, and Go. For example,
			in Prolog and Erlang, variables must begin with capital letters (or
			underscores) and atoms must not. There are some complications in the
			use of this mechanism.</p>
		<p>For such a casing distinction in a programming language to work
			with unicameral writing systems (such as Kanji or Devanagari),
			another mechanism (such as underscores) needs to substitute for the
			casing distinction.</p>
		<p>
			Casing stability is also an issue for bicameral writing systems. The
			assignment of General_Category property values, such as gc=Lu, is not
			guaranteed to be stable, nor is the assignment of characters to the
			broader properties such as Uppercase. So these property values cannot
			be used by themselves, without incorporating a 
			mechanism that preserves backward compatibility,
			such as is done for Unicode identifiers in <em>Section
				2.5 <a href="#Backward_Compatibility">Backward Compatibility</a></em>.
				That is, the implementation would maintain its own list of special
			inclusions and exclusions that require updating for each new version
			of Unicode.
		</p>
		<p>
			Alternatively, a programming language specification can use the
			operation specified in <a
				href="https://www.unicode.org/policies/stability_policy.html#Case_Folding">Case
				Folding Stability</a> as the basis for its casing distinction. That
			operation <em>is</em> guaranteed to be stable. That is, one can use a
			casing distinction such as the following:
		</p>
		<ol>
			<li>S is a <strong>variable</strong> if S begins with an
				underscore.
			</li>
			<li>Otherwise, produce S' = toCasefold(toNFKC(S))
				<ol type="a">
					<li>S is a <strong>variable</strong> if firstCodePoint(S) ≠
						firstCodePoint(S'),
					</li>
					<li>otherwise S is an <strong>atom</strong>.<br>
					</li>
				</ol></li>
		</ol>
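		<p>The casing distinction above can be sketched directly in Python, using <code>str.casefold()</code> for toCasefold and <code>unicodedata.normalize</code> for toNFKC. The function name <code>classify</code> and the rejection of empty input are assumptions made for this illustration.</p>

```python
import unicodedata


def classify(identifier):
    """Classify an identifier as 'variable' or 'atom' using stable case folding."""
    if not identifier:
        raise ValueError("empty identifier")
    # Step 1: a leading underscore always marks a variable.
    if identifier.startswith("_"):
        return "variable"
    # Step 2: S' = toCasefold(toNFKC(S)).
    folded = unicodedata.normalize("NFKC", identifier).casefold()
    # Step 2a: a changed first code point marks a variable.
    if identifier[0] != folded[0]:
        return "variable"
    # Step 2b: otherwise the identifier is an atom.
    return "atom"


for name in ["_tmp", "Count", "count", "\u5909\u6570"]:
    print(name, classify(name))
```

		<p>An identifier in a unicameral script, such as the last one, is classified as an atom unless it begins with an underscore.</p>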
		<p>This test can clearly be optimized for the normal cases, such
			as initial ASCII. It is also recommended that identifiers be in NFKC
			format, which makes the detection even simpler.</p>
		<h4>
			5.2.1 <a name="Edge_Cases_for_Folding" href="#Edge_Cases_for_Folding">Edge
				Cases for Folding</a>
		</h4>
		<p>In Unicode 8.0, the Cherokee script letters have been changed
			from gc=Lo to gc=Lu, and corresponding lowercase letters (gc=Ll) have
			been added. This is an unusual pattern; typically when case pairs are
			added, existing letters are changed from gc=Lo to gc=Ll, and new
			corresponding uppercase letters (gc=Lu) are added. In the case of
			Cherokee, it was felt that this solution provided the most
			compatibility for existing implementations in terms of font
			treatment.</p>
		<p>The downside of this approach is that the Cherokee characters,
			when case-folded, will convert as necessary to the pre-8.0
			characters, namely to the uppercase versions. This folding is unlike
			that of any other case-mapped characters in Unicode. Thus the
			case-folded version of a Cherokee string will contain uppercase
			letters instead of lowercase letters. Compatibility with fonts for
			the current user community was felt to be more important than the
			confusion introduced by this edge case of case folding, because
			Cherokee programmatic identifiers would be rare.</p>
		<p>The upshot is that when it comes to identifiers,
			implementations should never use the General_Category or Lowercase or
			Uppercase properties to test for casing conditions, nor use
			toUppercase(), toLowercase(), or toTitlecase() to fold or test
			identifiers. Instead, they should use Case_Folding or
			NFKC_Casefold.</p>
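		<p>The Cherokee edge case can be observed in Python, assuming a runtime whose bundled Unicode data is Version 8.0 or later. The helper name <code>is_casefolded</code> is hypothetical. The sketch shows why a check based on the Lowercase property (here <code>str.islower()</code>) disagrees with a check based on case folding.</p>

```python
UPPER_A = "\u13A0"  # CHEROKEE LETTER A (gc=Lu since Unicode 8.0)
LOWER_A = "\uAB70"  # CHEROKEE SMALL LETTER A (gc=Ll)


def is_casefolded(identifier):
    """True if the identifier is unchanged by full case folding."""
    return identifier == identifier.casefold()


# Case folding maps the small letter to the pre-8.0 (uppercase) letter.
print(LOWER_A.casefold() == UPPER_A)

# The case-folded form is therefore not lowercase.
print(is_casefolded(UPPER_A), UPPER_A.islower())

# The lowercase letter is lowercase, yet it is not in case-folded form.
print(is_casefolded(LOWER_A), LOWER_A.islower())
```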
		<h2>
			6 <a href="#hashtag_identifiers" name="hashtag_identifiers">Hashtag
				Identifiers</a>
		</h2>
		<p>Hashtag identifiers have become very popular in
			social media. They consist of a number sign in front of some string
			of characters, such as #emoji. The actual composition of allowable
			Unicode hashtag identifiers varies between vendors. It has also
			become common for hashtags to include emoji characters, without a
			clear notion of exactly which characters are included.</p>
		<p>This section presents a syntax that can be used
			for parsing Unicode hashtag identifiers for increased interoperability.</p>
		<p>
			<b><a name="D2" href="#D2">UAX31-D2</a></b>. <b>Default
					Hashtag Identifier Syntax:</b>
		</p>
		<blockquote>
			<p><code>&lt;Hashtag-Identifier&gt; := &lt;Start&gt; &lt;Continue&gt;*
          (&lt;Medial&gt; &lt;Continue&gt;+)*</code></p>
		</blockquote>
		<p>When parsing hashtags in flowing text, it is
				recommended that an extended hashtag only be recognized when there
				is no Continue character before a Start character. For example, in
				“abc#def” there would be no hashtag, while there would be in “abc
				#def” or “abc.#def”.</p>
		<p>
			<b><a name="R8" href="#R8">UAX31-R8</a></b>. <b>Extended
					Hashtag Identifiers:</b> <i>To meet this requirement, to determine whether
				a string is a hashtag identifier an implementation shall
				choose either <a href="#R8-1">UAX31-R8-1</a> or <a href="#R8-2">UAX31-R8-2</a>.</i>
		</p>
		<p>
		<b><a name="R8-1" href="#R8-1">UAX31-R8-1</a></b>.
			<i>Use definition <a href="#D2">UAX31-D2</a>, setting:</i>
		</p>
		<ol>
			<li>Start := [#﹟＃]
				<ul>
					<li>U+0023 NUMBER SIGN</li>
					<li>U+FE5F SMALL NUMBER SIGN</li>
					<li>U+FF03 FULLWIDTH NUMBER SIGN</li>
					<li>(These are # and its compatibility equivalents.)</li>
				</ul>
			</li>
			<li>Medial is currently empty, but can be used for customization.</li>
			<li>Continue := XID_Continue, plus Extended_Pictographic, Emoji_Component, and “_”, “-”, “+”, minus Start characters.
				<ul>
					<li>Note the subtraction of # characters.</li>
					<li>This is expressed in UnicodeSet notation as:<br>
					[\p{XID_Continue}\p{Extended_Pictographic}\p{Emoji_Component}[-+_]-[#﹟＃]]</li>
				</ul>
		  	</li>
		</ol>
		<p>
			<b><a name="R8-2" href="#R8-2">UAX31-R8-2</a></b>.
			<i>Declare that
			it uses a <b>profile</b> of <a href="#R8-1">UAX31-R8-1</a> as in <b><a href="#R1">UAX31-R1</a></b>.</i>
		</p>
		<p>The emoji properties are from the corresponding version of [<a href="../tr41/tr41-36.html#UTS51">UTS51</a>]. The version of the emoji properties is tied to the version of the Unicode Standard, starting with Version 11.0.</p>
		<p>The techniques mentioned in Section 2.5 <a
				href="#Backward_Compatibility">Backward Compatibility</a> may be
			used where stability between successive versions is required.</p>
		<p>Comparison and matching should be done after converting to NFKC_CF format. Thus #MötleyCrüe should match #MÖTLEYCRÜE and other variants.</p>
		<p>Implementations may choose to add characters in <em>Table 3a, <a href="#Table_Optional_Medial">Optional Characters for Medial</a></em> to <strong>Medial</strong> and <em>Table 3b, <a href="#Table_Optional_Continue">Optional Characters for Continue</a></em> to <strong>Continue</strong> for better identifiers for natural languages.</p>
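		<p>A simplified scanner for hashtags in flowing text is sketched below, with Medial left empty. Python's standard library does not expose Extended_Pictographic or Emoji_Component, so this sketch omits emoji from Continue and approximates XID_Continue through <code>str.isidentifier()</code>. Both are assumptions, as are the function names.</p>

```python
START = {"\u0023", "\uFE5F", "\uFF03"}  # NUMBER SIGN and its compatibility equivalents
EXTRA_CONTINUE = {"_", "-", "+"}


def is_continue(ch):
    """Approximate Continue: XID_Continue plus [-+_], minus Start (emoji omitted)."""
    if ch in START:
        return False
    return ch in EXTRA_CONTINUE or ("a" + ch).isidentifier()


def find_hashtags(text):
    """Return hashtags in text, skipping any Start preceded by a Continue character."""
    tags = []
    i = 0
    while i < len(text):
        preceded_by_continue = i > 0 and is_continue(text[i - 1])
        if text[i] in START and not preceded_by_continue:
            j = i + 1
            while j < len(text) and is_continue(text[j]):
                j += 1
            # The syntax permits zero Continue characters after Start;
            # an application may well want to require at least one.
            tags.append(text[i:j])
            i = j
        else:
            i += 1
    return tags


print(find_hashtags("abc#def"))
print(find_hashtags("abc #def"))
print(find_hashtags("abc.#def"))
```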

		<h2>
			7 <a name="Standard_Profiles"
				href="#Standard_Profiles">Standard Profiles</a>
		</h2>
		<p>
		Two standard profiles for default identifiers are provided to cater to common patterns of use observed in programming languages with less restrictive identifier syntaxes, including those that use UAX31-R2 immutable identifiers: the inclusion of characters suitable for mathematical usage in identifiers, and the inclusion of emoji in identifiers.
		</p>
		<p>
		These profiles are associated with profiles for requirement <a href="#R3b">UAX31-R3b</a>.
		</p>
		<p>
		Further, a standard profile is provided to exclude default-ignorable code points from identifiers. Having no visible effect in most contexts, these characters can lead to spoofing issues; see <i>Section 2.3, <a href="#Layout_and_Format_Control_Characters">Layout and Format Control Characters</a></i>.
		</p>
		<p>
		For guidance on the applicability of these profiles to programming languages, see Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>].
		</p>
		<h3>7.1 <a href="#Mathematical_Compatibility_Notation_Profile" name="Mathematical_Compatibility_Notation_Profile">Mathematical Compatibility Notation Profile</a></h3>
		<p>
		The Mathematical Compatibility Notation Profile for default identifiers consists of the addition of the set [:ID_Compat_Math_Start:] to the set <i>Start</i>, and the set [:ID_Compat_Math_Continue:] to the set <i>Continue</i>, in definition <a href="#D1">UAX31-D1</a>.
		</p>
		<blockquote>
			<b>Note:</b> The set [:ID_Compat_Math_Start:] comprises ∂, ∇, and their mathematical style variants, as well as ∞.
			The set [:ID_Compat_Math_Continue:] comprises [:ID_Compat_Math_Start:], as well as subscript and superscript digits and signs with mathematical use.
		</blockquote>
		<p>
		It is associated with a profile for <a href="#R3b">UAX31-R3b</a>, which consists of removing the characters in the intersection [[:Pattern_Syntax:] &amp; [:ID_Compat_Math_Continue:]] from the set of characters with syntactic use (these are the characters ∂, ∇, and ∞).
		</p>
		<blockquote>
			<b>Note:</b> While <em>supporting</em> these characters is recommended for some computer languages because they can be beneficial in some applications, these characters, like many other characters that are allowed in default identifiers, are discouraged in general use, as they are confusing to most readers. See Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>].
		</blockquote>
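		<p>
		The following is a minimal sketch of how this profile extends <i>Start</i> and <i>Continue</i>, not a definitive implementation. It assumes that Python's <code>str.isidentifier()</code> is an acceptable stand-in for default identifiers (it tests XID_Start and XID_Continue, and additionally accepts an underscore as a start character). The constants <code>MATH_START</code> and <code>MATH_CONTINUE</code> are hypothetical names holding only an illustrative subset of the two properties; a real implementation would load the full sets from the Unicode Character Database.
		</p>

```python
# Illustrative subset only: the full sets are defined by the
# ID_Compat_Math_Start and ID_Compat_Math_Continue properties.
MATH_START = frozenset("\u2202\u2207\u221E")  # partial differential, nabla, infinity

SUPER_SUB_DIGITS = frozenset(
    "\u00B2\u00B3\u00B9\u2070"                        # superscript 2, 3, 1, 0
    + "".join(chr(c) for c in range(0x2074, 0x207A))  # superscript 4..9
    + "".join(chr(c) for c in range(0x2080, 0x208A))  # subscript 0..9
)
MATH_CONTINUE = MATH_START | SUPER_SUB_DIGITS


def is_start(ch: str) -> bool:
    # Assumption: str.isidentifier() on one character approximates Start
    # (XID_Start, plus "_" which is a Python-specific addition).
    return ch.isidentifier() or ch in MATH_START


def is_continue(ch: str) -> bool:
    # Prefixing a known Start character turns the call into a Continue test.
    return ("a" + ch).isidentifier() or ch in MATH_CONTINUE


def is_identifier(text: str) -> bool:
    """Default identifier syntax with the (partial) math sets added."""
    if not text:
        return False
    return is_start(text[0]) and all(is_continue(ch) for ch in text[1:])
```

		<p>
		With these definitions, a superscript digit is accepted after the first character but not as the first character, because it belongs only to the <i>Continue</i> addition.
		</p>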

		<h3>7.2 <a href="#Emoji_Profile" name="Emoji_Profile">Emoji Profile</a></h3>
		<p>
			The Emoji Profile for  default identifiers provides for the inclusion of emoji characters and sequences in identifiers. A large subset of emoji are already supported in some programming languages, but this profile provides a mechanism for treating them consistently as part of the lexical structure of a language.
		</p>
		<p>
			The Emoji Profile for default identifiers consists of:
		</p>
		<ol>
			<li>
				The addition of the RGI emoji set defined by ED-27 in Unicode Technical Standard #51, “Unicode Emoji” [<a href="../tr41/tr41-36.html#UTS51">UTS51</a>] for a given version of Unicode to the sets <i>Start</i> and <i>Continue</i> in definition <a href="#D1">UAX31-D1</a>.
			</li>
			<li>
				The removal of the code point U+FE0E VARIATION SELECTOR-15 (the Text Presentation Selector) from the set <i>Continue</i>.
			</li>
		</ol>
		<blockquote>
			<b>Note:</b> The Emoji Profile requires the use of character sequences, rather than individual code points, in the sets <i>Start</i> and <i>Continue</i> defined by <a href="#D1">UAX31-D1</a>.  When using this profile, U+002A asterisk (*), U+203C double exclamation mark (‼), or U+263A white smiling face (☺) are not legal identifiers, but the sequences (U+002A, U+FE0F, U+20E3) *️⃣, (U+203C, U+FE0F) ‼️, and (U+263A, U+FE0F) ☺️ are allowed in identifiers. This requires some changes to lexers: on encountering a character that starts an emoji sequence, a lexer must (logically) switch to a mechanism that matches whole sequences rather than individual code points.
		</blockquote>
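		<p>
		The sequence-based matching described in the note can be sketched as follows. This is an illustration under stated assumptions rather than a reference lexer: <code>RGI_SAMPLE</code> is a hypothetical five-entry stand-in for the RGI emoji set, <code>str.isidentifier()</code> stands in for the default <i>Start</i> and <i>Continue</i> sets, and the function names are invented for the example.
		</p>

```python
VS15 = "\uFE0E"  # Text Presentation Selector, removed from Continue by the profile

# Hypothetical stand-in for the RGI emoji set (ED-27); a real lexer would
# load the full set for its Unicode version from the UTS #51 data files.
RGI_SAMPLE = frozenset({
    "\u002A\uFE0F\u20E3",      # keycap asterisk
    "\u203C\uFE0F",            # double exclamation mark, emoji presentation
    "\u263A\uFE0F",            # white smiling face, emoji presentation
    "\U0001F408",              # cat
    "\U0001F408\u200D\u2B1B",  # black cat (ZWJ sequence)
})
# Longest sequences first, so that the longest match wins.
RGI_BY_LENGTH = sorted(RGI_SAMPLE, key=len, reverse=True)


def emoji_length_at(text: str, pos: int) -> int:
    """Length of the longest sample emoji sequence starting at pos, or 0."""
    for seq in RGI_BY_LENGTH:
        if text.startswith(seq, pos):
            return len(seq)
    return 0


def is_identifier(text: str) -> bool:
    pos = 0
    while pos != len(text):
        step = emoji_length_at(text, pos)
        if step == 0:
            # Not an emoji sequence: fall back to per-code-point rules.
            ch = text[pos]
            if ch == VS15:
                return False
            ok = ch.isidentifier() if pos == 0 else ("a" + ch).isidentifier()
            if not ok:
                return False
            step = 1
        pos += step
    return len(text) != 0
```

		<p>
		Trying the emoji match before the per-code-point test is what lets U+002A start an identifier only when it is followed by U+FE0F and U+20E3.
		</p>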
		<p>
			The Emoji Profile includes characters that are in Pattern_Syntax; it is therefore associated with a profile for <a href="#R3b">UAX31-R3b</a>, which consists of replacing each emoji character of a certain subset of [:Pattern_Syntax:] by its <b><i>text presentation sequence</i></b> (ED-8a):
		</p>
		<ol>
			<li>
			Remove the characters in the set [[:Pattern_Syntax:]&amp;[:Emoji_Presentation:]] from the set of characters with syntactic use.
			</li>
			<li>
			For all C in [[:Pattern_Syntax:]&amp;[:Emoji_Presentation:]], add the sequence consisting of C followed by U+FE0E VARIATION SELECTOR-15 (the Text Presentation Selector) to the set of characters with syntactic use.
			</li>
		</ol>
		<p>
			In addition, in order to avoid lexical ambiguities between identifiers and operators, the Emoji Profile includes a profile for <a href="#R3c">UAX31-R3c</a>, which consists of the removal of the character U+FE0F VARIATION SELECTOR-16 (the Emoji Presentation Selector) from the set <i>Continue</i>.
		</p>
		<blockquote><b>Example:</b> Consider a language that meets requirements <a href="#R3b">UAX31-R3b</a> and <a href="#R3c">UAX31-R3c</a> with no profile. U+2615 HOT BEVERAGE (☕) is a character with syntactic use, and therefore it is an operator. When meeting these requirements with the Emoji Profile, U+2615 HOT BEVERAGE (☕) is not a character with syntactic use (which allows it to be an identifier character) and ☕ is not a valid operator. However, the sequence U+2615 U+FE0E (☕︎) is added to the set of characters with syntactic use, and therefore ☕︎ is a valid operator.
		</blockquote>
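		<p>
		The two steps of the associated profile for UAX31-R3b, and the HOT BEVERAGE example, can be sketched as a set transformation. The sample sets below are hypothetical and deliberately tiny; a real implementation would derive them from the Pattern_Syntax and Emoji_Presentation properties. The function name is likewise an assumption made for illustration.
		</p>

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15, the Text Presentation Selector

# Hypothetical samples standing in for the full property sets.
PATTERN_SYNTAX_SAMPLE = frozenset({"+", "\u2194", "\u2615", "\u2B1B"})
EMOJI_PRESENTATION_SAMPLE = frozenset({"\u2615", "\u2B1B", "\U0001F408"})


def emoji_profile_syntax(syntactic: frozenset) -> frozenset:
    """Apply the Emoji Profile's R3b profile to a set of syntactic characters."""
    affected = PATTERN_SYNTAX_SAMPLE.intersection(EMOJI_PRESENTATION_SAMPLE)
    # Step 1: remove the bare characters from syntactic use.
    kept = syntactic - affected
    # Step 2: add each character followed by VS15 as a syntactic sequence.
    added = frozenset(ch + VS15 for ch in affected)
    return kept | added


# Assumption for the example: the language starts with all of
# Pattern_Syntax (here, the sample) in syntactic use.
result = emoji_profile_syntax(PATTERN_SYNTAX_SAMPLE)
```

		<p>
		In this sketch U+2194 stays in syntactic use unchanged, since it does not have the Emoji_Presentation property, while U+2615 is replaced by its text presentation sequence.
		</p>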
		<p>
			This change means that if some of the Pattern_Syntax characters with the Emoji_Presentation property were in syntactic use (e.g., in operators) prior to adopting the Emoji Profile, they become identifier characters once the profile is adopted, but can be turned back into operators by appending U+FE0E VARIATION SELECTOR-15, allowing for a migration path.
		</p>
		<p>
			Of course, if a programming language only uses a subset of the Pattern_Syntax characters that does not include these characters, no action needs to be taken.
		</p>
		<p>
			Some other characters in Pattern_Syntax (such as ↔) are used in emoji (such as ↔️), but they are not emoji on their own, so they do not need to be removed from the set of characters with syntactic use as long as lexical analysis properly takes sequences into account.
		</p>
		<p>
		The emoji sequences require 98 default-ignorable characters:
		</p>
		<ul>
			<li>U+200D ZERO WIDTH JOINER (also known as ZWJ)</li>
			<li>U+FE0F VARIATION SELECTOR-16 (also known as Emoji Presentation Selector)</li>
			<li>U+E0020..U+E007F 96 TAG characters</li>
		</ul>
		<p>
		Thus, if this profile is combined with any profile that removes default-ignorable characters, such as the Default-Ignorable Exclusion Profile, those characters need to be retained in the context of emoji sequences.
		</p>
		<p>Consider the following examples, in a language that meets requirement <a href="#R1">UAX31-R1</a> with both the Emoji Profile and the Default-Ignorable Exclusion Profile:</p>
		<div align="center">
		<table class="subtle">
		<thead><tr><th>Sequence</th><th>Appearance</th><th>Legal Identifier?</th><th>Reason</th></tr></thead>
		<tbody>
		<tr><td>A+ZWJ+B</td><td>A‍B</td><td>No</td><td>ZWJ is not part of an emoji sequence</td></tr>
		<tr><td>U+1F408 + ZWJ + U+2B1B</td><td>🐈‍⬛</td><td>Yes</td><td rowspan="2">ZWJ is part of an emoji sequence
(for <i>black cat</i>)</td></tr>
		<tr><td>BIG + U+1F408 + ZWJ + U+2B1B</td><td>BIG🐈‍⬛</td><td>Yes</td></tr>
		</tbody>
		</table>
		</div>
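		<p>
		The rows of the table can be reproduced with a small sketch that rejects the default-ignorable code points listed above unless they occur inside an emoji sequence. Several simplifications are assumed: only those 98 code points are modelled (a full implementation would test the Default_Ignorable_Code_Point property), <code>RGI_SAMPLE</code> is a hypothetical three-entry stand-in for the RGI emoji set, and <code>str.isidentifier()</code> stands in for the default <i>Start</i> and <i>Continue</i> sets.
		</p>

```python
ZWJ = "\u200D"
VS16 = "\uFE0F"
TAGS = frozenset(chr(c) for c in range(0xE0020, 0xE0080))  # 96 TAG characters
EMOJI_IGNORABLES = frozenset({ZWJ, VS16}) | TAGS            # 1 + 1 + 96 = 98

# Hypothetical stand-in for the RGI emoji set, longest sequences first.
RGI_SAMPLE = sorted(
    ["\U0001F408\u200D\u2B1B", "\U0001F408", "\u2B1B"],
    key=len,
    reverse=True,
)


def is_identifier(text: str) -> bool:
    pos = 0
    while pos != len(text):
        seq = next((s for s in RGI_SAMPLE if text.startswith(s, pos)), None)
        if seq is not None:
            # Ignorables inside an emoji sequence are retained.
            pos += len(seq)
            continue
        ch = text[pos]
        if ch in EMOJI_IGNORABLES:
            # Outside an emoji sequence the exclusion profile applies.
            return False
        ok = ch.isidentifier() if pos == 0 else ("a" + ch).isidentifier()
        if not ok:
            return False
        pos += 1
    return len(text) != 0
```

		<p>
		Checking for an emoji sequence before applying the exclusion is the design choice that keeps the ZWJ in <i>black cat</i> while still rejecting A+ZWJ+B.
		</p>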
		<h3>7.3 <a name="Default_Ignorable_Exclusion_Profile" href="#Default_Ignorable_Exclusion_Profile">Default-Ignorable Exclusion Profile</a></h3>
		<p>
			The default-ignorable exclusion profile for default identifiers consists of the exclusion of the code points with property Default_Ignorable_Code_Point from the sets <i>Start</i> and <i>Continue</i> in definition <a href="#D1">UAX31-D1</a>.
		</p>
		<blockquote>
			<b>Note:</b> While it reduces the attack surface, excluding default-ignorable code points does not prevent spoofing issues. More comprehensive mechanisms are described in Unicode Technical Standard #39, “Unicode Security Mechanisms” [<a href="../tr41/tr41-36.html#UTS39">UTS39</a>]; in particular, the exclusion of default-ignorable code points is part of the General Security Profile for Identifiers.
		</blockquote>
		<blockquote>
			<b>Note:</b> Where higher-level diagnostics are available, such as in programming environments, more targeted measures can be taken in order to still allow for the legitimate use of these characters. See Unicode Technical Standard #55, “Unicode Source Code Handling” [<a href="../tr41/tr41-36.html#UTS55">UTS55</a>].
		</blockquote>

<h2 class="nonumber">
	  <a name="Acknowledgments" href="#Acknowledgments">Acknowledgments</a>
	</h2>
	  <p>Mark Davis is the author of the initial version and has added
			to and maintained the text of this annex. Robin Leroy has assisted in updating it starting with Version 15.0.</p>
		<p>
			The attendees of the Source Code Working Group meetings assisted with the substantial changes made in Versions 15.0 and 15.1:
			Peter Constable,
			Elnar Dakeshov,
			Mark Davis,
			Barry Dorrans,
			Steve Dower,
			Michael Fanning,
			Asmus Freytag,
			Dante Gagne,
			Rich Gillam,
			Manish Goregaokar,
			Tom Honermann,
			Jan Lahoda,
			Nathan Lawrence,
			Robin Leroy,
			Chris Ries,
			Markus Scherer,
			Richard Smith.
		</p>
		<p>Thanks to Eric Muller, Asmus Freytag, Lisa Moore, Julie Allen, Jonathan Warden, Kenneth
			Whistler, David Corbett, Klaus Hartke, Martin Dürst, Deborah Anderson, Steve Downey, Ned Holbrook, Corentin Jabot, 梁海 Liang Hai, Jens Maurer, Hubert Tong, and Crystal Durham for feedback on this annex.</p>
		<h2 class="nonumber">
			<a name="References" href="#References">References</a>
		</h2>
		<p>
			For references for this annex, see Unicode Standard Annex #41, “<a
				href="../tr41/tr41-36.html">Common References for Unicode
				Standard Annexes</a>.”
		</p>
		<h2>
			<a name="Migration" href="#Migration">Migration</a>
		</h2>
		<p><strong>Version 15.1</strong></p>
			<p>Requirement <a href="#R1a">UAX31-R1a Restricted Format Characters</a> has been withdrawn.</p>
			<p>If implementations that claimed conformance to UAX31-R1a wish to retain the contextual checks for ZWJ and ZWNJ, they should refer to the General Security Profile in Unicode Technical Standard #39, “Unicode Security Mechanisms” [<a href="../tr41/tr41-36.html#UTS39">UTS39</a>].</p>
			<p>In previous versions, requirement <a href="#R3">UAX31-R3 Pattern_White_Space and Pattern_Syntax Characters</a> did not require any particular interpretation of whitespace characters. It now specifies which characters are to be treated as line terminators, horizontal space, and ignorable format controls. The meaning of syntactic use has also been clarified.</p>
			<p>Implementations that claim conformance to UAX31-R3 should check that they interpret the characters in Pattern_White_Space as described in <a href="#R3a">UAX31-R3a Pattern_White_Space Characters</a>, and that their use of Pattern_Syntax characters is consistent with <a href="#R3b">UAX31-R3b Pattern_Syntax Characters</a>.</p>
		<p><strong>Version 15.0</strong></p>
			<p>In previous versions, the note explaining how to implement requirement <a href="#R7">UAX31-R7 Filtered Case-Insensitive Identifiers</a> with full case folding referred to the wrong property, and the requirement itself incorrectly referred to Normalization Form rather than case folded form.</p>
			<p>Implementations that claim conformance to UAX31-R7 should check that they use the correct property.</p>
		<p><strong>Version 13.0</strong></p>
        <p>Version 13.0 changed the structure of Table 4. <a href="#Table_Candidate_Characters_for_Exclusion_from_Identifiers">Excluded Scripts</a> significantly, dropping conditions that were not based on script. Implementations that were based on Table 4 should refer to <em>UTS #39, Unicode Security Mechanisms</em> [<a href="../tr41/tr41-36.html#UTS39">UTS39</a>] for additional restrictions.</p>
        <p><strong>Version 11.0</strong></p>
		<p>Version 11.0 refines the use of ZWJ in identifiers (adding some restrictions and relaxing others slightly), and broadens the definition of hashtag identifiers somewhat. For details, see the <a href="#Modifications">Modifications</a>.</p>

		<p><strong>Version 9.0</strong>
	  </p>
		<p>In previous versions, the text favored the use
			of XID_Start and XID_Continue, as in the following paragraph. However, the formal definition used ID_Start and ID_Continue.</p>
		<blockquote>
			<p>
				The XID_Start and XID_Continue properties are improved lexical
				classes that incorporate the changes described in <i>Section
					5.1, <a href="#NFKC_Modifications">NFKC Modifications</a></i>.
					They are recommended for most purposes, especially for security,
				over the original ID_Start and ID_Continue properties.
			</p>
		</blockquote>
		<p>In Version 9.0, this is reversed: the XID versions are
			stated explicitly in the formal definition. This affects just the
			following characters.</p>
		<blockquote>
			<p>
				<code>
					037A ; GREEK YPOGEGRAMMENI<br> 0E33 ; THAI CHARACTER SARA AM<br>
					0EB3 ; LAO VOWEL SIGN AM<br> 309B ; KATAKANA-HIRAGANA VOICED
					SOUND MARK<br> 309C ; KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK<br>
					FC5E..FC63 ; ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED
					FORM<br> FDFA ; ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM<br>
					FDFB ; ARABIC LIGATURE JALLAJALALOUHOU<br> FE70 ; ARABIC
					FATHATAN ISOLATED FORM<br> FE72 ; ARABIC DAMMATAN ISOLATED
					FORM<br> FE74 ; ARABIC KASRATAN ISOLATED FORM<br> FE76 ;
					ARABIC FATHA ISOLATED FORM<br> FE78 ; ARABIC DAMMA ISOLATED
					FORM<br> FE7A ; ARABIC KASRA ISOLATED FORM<br> FE7C ;
					ARABIC SHADDA ISOLATED FORM<br> FE7E ; ARABIC SUKUN ISOLATED
					FORM<br> FF9E ; HALFWIDTH KATAKANA VOICED SOUND MARK<br>
					FF9F ; HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
				</code>
			</p>
		</blockquote>
		<p> Implementations that wish to maintain
			conformance to the older recommendation need only declare a profile
			that uses  ID_Start and ID_Continue instead of XID_Start and XID_Continue.</p>
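		<p>
		As a rough illustration of why the listed characters differ between the two pairs of properties, the sketch below inspects two of them with Python's standard library. It assumes that <code>str.isidentifier()</code> reflects XID_Start and XID_Continue, which is how the Python documentation describes it; the helper name <code>nfkc_code_points</code> is invented for the example.
		</p>

```python
import unicodedata


def nfkc_code_points(ch: str) -> list:
    """Code points of the NFKC form of ch, formatted as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKC", ch)]


# Two of the characters listed above.
for ch in ("\u037A", "\u0E33"):
    print(f"U+{ord(ch):04X}", nfkc_code_points(ch), ch.isidentifier())
```

		<p>
		In both cases the NFKC form begins with a code point that cannot itself start an identifier (a space, or a combining mark), which is the kind of situation that <i>Section 5.1, <a href="#NFKC_Modifications">NFKC Modifications</a></i> addresses.
		</p>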
		<p>Version 9.0 splits the older Table 3 from Version 8.0 into 3
			parts.</p>
		<div align="center">
		<table class="subtle">
			<tr>
				<th>Current Tables</th>
				<th>Unicode 8.0</th>
			</tr>
			<tr>
				<td><em>Table 3, <a href="#Table_Optional_Start">Optional Characters for Start</a></em></td>
				<td style="text-align: center" rowspan="2"><em>Table 3, Candidate Characters for Inclusion in ID_Continue</em></td>
			</tr>
			<tr>
				<td><em>Table 3a, <a href="#Table_Optional_Medial">Optional Characters for Medial</a></em></td>
			</tr>
			<tr>
				<td><em>Table 3b, <a href="#Table_Optional_Continue">Optional Characters for Continue</a></em></td>
				<td style="text-align: center"><em>only outlined in text</em></td>
			</tr>
		</table>
		</div>
		<p>
			<strong>Version 6.1</strong>
		</p>
		<p>Between Unicode Versions 5.2, 6.0 and 6.1, Table 5 was split in
			three. In Version 6.1, the resulting tables were renumbered for
			easier reference. The titles and links remain the same, for
			stability.</p>
		<p>The following shows the correspondences:</p>
		<div align="center">
			<table class="subtle">
				<tr>
					<th>Current Tables</th>
					<th>Unicode 6.0</th>
					<th>Unicode 5.2</th>
				</tr>
				<tr>
					<td><em>Table 5, <a href="#Table_Recommended_Scripts">Recommended Scripts</a></em></td>
					<td style="text-align: center" rowspan="2">5a</td>
					<td style="text-align: center" rowspan="3">5</td>
				</tr>
				<tr>
					<td><em>Table 6, <a href="#Aspirational_Use_Scripts">Aspirational Use Scripts</a></em></td>
				</tr>
				<tr>
					<td><em>Table 7, <a href="#Table_Limited_Use_Scripts">Limited Use Scripts</a></em></td>
					<td style="text-align: center">5b</td>
				</tr>
				<tr>
					<td><i>Table 8, <a
							href="#Figure_Compatibility_Equivalents_to_Letters_or_Decimal_Numbers">Compatibility Equivalents to Letters or Decimal Numbers</a></i></td>
					<td style="text-align: center">6</td>
					<td style="text-align: center">6</td>
				</tr>
				<tr>
					<td><em>Table 9, <a
						href="#Figure_Canonical_Equivalence_Exceptions_Prior_to_Unicode_5.1">Canonical Equivalence Exceptions Prior to Unicode 5.1</a></em></td>
					<td style="text-align: center">7</td>
					<td style="text-align: center">7</td>
				</tr>
			</table>
		</div>

		<h2 class="nonumber">
			<a name="Modifications" href="#Modifications">Modifications</a>
		</h2>

		<p>The following summarizes modifications from the previously published version
			of this annex.</p>

		<h3><b>Revision 43</b></h3>
		<ul>
			<li><b>Reissued</b> for Unicode 17.0.</li>
			<li><i>Section 7.1, <a href="#Mathematical_Compatibility_Notation_Profile">Mathematical Compatibility Notation Profile</a></i>:
				Corrected the description of the associated profile for UAX31-R3b based on public feedback. [<a href="https://www.unicode.org/cgi-bin/GetL2Ref.pl?184-C32">184-C32</a>]</li>
			<li>Before <i>Table 5. <a href="#Table_Recommended_Scripts">Recommended Scripts</a></i>:
				Added a note about challenges with using the Tibetan script in identifiers.
				([<a href="https://www.unicode.org/cgi-bin/GetL2Ref.pl?183-A74">183-A74</a>])</li>
			<li>Moved Bopomofo from <i>Table 5. <a href="#Table_Recommended_Scripts">Recommended Scripts</a></i>
				to <i>Table 7. <a href="#Table_Limited_Use_Scripts">Limited Use Scripts</a></i>.
				([<a href="https://www.unicode.org/cgi-bin/GetL2Ref.pl?183-A78">183-A78</a>])</li>
			<li>Added the scripts newly encoded in Unicode 16 —
				Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar, Todhri, and Tulu-Tigalari —
				to <i>Table 4. <a href="#Table_Candidate_Characters_for_Exclusion_from_Identifiers">Excluded Scripts</a></i>.
				([<a href="https://www.unicode.org/cgi-bin/GetL2Ref.pl?184-A73">184-A73</a>])</li>
			<li>Added the scripts newly encoded in Unicode 17 —
				Sidetic, Tolong Siki, Beria Erfe, and Tai Yo —
				to <i>Table 4. <a href="#Table_Candidate_Characters_for_Exclusion_from_Identifiers">Excluded Scripts</a></i>.
				([<a href="https://www.unicode.org/cgi-bin/GetL2Ref.pl?183-A80">183-A80</a>])</li>
		</ul>


	  <p>Modifications for previous versions are listed in those respective versions.</p>

  <hr width="50%">
  <p class="copyright">© 2005–2025 Unicode, Inc. This publication is protected by copyright, and permission must be obtained from Unicode, Inc. prior to any reproduction, modification, or other use not permitted by the <a href="https://www.unicode.org/copyright.html">Terms of Use</a>. Specifically, you may make copies of this publication and may annotate and translate it solely for personal or internal business purposes and not for public distribution, provided that any such permitted copies and modifications fully reproduce all copyright and other legal notices contained in the original. You may not make copies of or modifications to this publication for public distribution, or incorporate it in whole or in part into any product or publication without the express written permission of Unicode.</p>

  <p class="copyright">Use of all Unicode Products, including this publication, is governed by the Unicode <a href="https://www.unicode.org/copyright.html">Terms of Use</a>. The authors, contributors, and publishers have taken care in the preparation of this publication, but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.</p>

  <p class="copyright">Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries.</p>

	</div>
	<!-- body -->
</body>
</html>