<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><base href="https://www.unicode.org/reports/tr57/tr57-5.html">
<title>UAX #57: Unicode Egyptian Hieroglyph Database</title>
<link rel="stylesheet" type="text/css" href="https://www.unicode.org/reports/reports-v2.css">
</head>
<body>
<table class="header">
<tr>
<td class="icon" style="width:38px; height:35px">
<a href="https://www.unicode.org/">
<img border="0" src="https://www.unicode.org/webscripts/logo60s2.gif" align="middle"
alt="[Unicode]" width="34" height="33"></a>
</td>
<td class="icon" style="vertical-align:middle">
<a class="bar"> </a>
<a class="bar" href="https://www.unicode.org/reports/"><font size="3">Technical Reports</font></a>
</td>
</tr>
<tr>
<td colspan="2" class="gray"> </td>
</tr>
</table>
<div class="body">
<h2 class="uaxtitle">Unicode® Standard Annex #57</h2>
<h1>Unicode Egyptian Hieroglyph Database (Unikemet)</h1>
<table class="simple" width="90%">
<tbody>
<tr>
<td valign="top" width="20%">Version</td>
<td valign="top">Unicode 17.0.0</td>
</tr>
<tr>
<td valign="top">Editors</td>
<td valign="top">Michel Suignard</td>
</tr>
<tr>
<td valign="top">Date</td>
<td valign="top">2025-07-31</td>
</tr>
<tr>
<td valign="top">This Version</td>
<td valign="top">
<a href="https://www.unicode.org/reports/tr57/tr57-5.html">https://www.unicode.org/reports/tr57/tr57-5.html</a></td>
</tr>
<tr>
<td valign="top">Previous Version</td>
<td valign="top">
<a href="https://www.unicode.org/reports/tr57/tr57-3.html">https://www.unicode.org/reports/tr57/tr57-3.html</a></td>
</tr>
<tr>
<td valign="top">Latest Version</td>
<td valign="top"><a href="https://www.unicode.org/reports/tr57/">https://www.unicode.org/reports/tr57/</a></td>
</tr>
<tr>
<td valign="top">Latest Proposed Update</td>
<td valign="top"><a href="https://www.unicode.org/reports/tr57/proposed.html">https://www.unicode.org/reports/tr57/proposed.html</a></td>
</tr>
<tr>
<td valign="top">Revision</td>
<td valign="top"><a href="#Modifications">5</a></td>
</tr>
</tbody>
</table>
<h4 style="margin-top: 1em;">Summary</h4>
<p><em>This document describes the organization and content of the Egyptian
Hieroglyph database.</em></p>
<h4 class="status">Status</h4>
<!-- NOT YET APPROVED
<p class="changed"><em>This is a<strong><font color="#ff3333"> draft </font></strong>document which may be updated, replaced, or superseded by other documents at any time. Publication does not imply endorsement by the Unicode Consortium. This is not a stable document; it is inappropriate to cite this document as other than a work in progress.</em></p>
END NOT YET APPROVED -->
<!-- APPROVED -->
<p><em>This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium. This is a stable document and may be used as reference material or cited as a normative reference by other specifications.</em></p>
<!-- END APPROVED -->
<blockquote>
<p><em><strong>A Unicode Standard Annex (UAX)</strong> forms an integral part of the Unicode Standard, but is published online as a separate document. The Unicode Standard may require conformance to normative content in a Unicode Standard Annex, if so specified in the Conformance chapter of that version of the Unicode Standard. The version number of a UAX document corresponds to the version of the Unicode Standard of which it forms a part.</em></p>
</blockquote>
<p><em>Please submit corrigenda and other comments with the online reporting
form [<a href="https://www.unicode.org/reporting.html">Feedback</a>].
Related information that is useful in understanding this annex is found in Unicode Standard Annex #41,
“<a href="https://www.unicode.org/reports/tr41/tr41-36.html">Common References for Unicode Standard Annexes</a>.”
For the latest version of the Unicode Standard, see [<a href="https://www.unicode.org/versions/latest/">Unicode</a>].
For a list of current Unicode Technical Reports, see [<a href="https://www.unicode.org/reports/">Reports</a>].
For more information about versions of the Unicode Standard, see [<a href="https://www.unicode.org/versions/">Versions</a>].
For any errata which may apply to this annex, see [<a href="https://www.unicode.org/errata/">Errata</a>].</em></p>
<h4 class="contents">Contents</h4>
<ul class="toc">
<li>1 <a href="#Introduction">Introduction</a></li>
<li>2 <a href="#Mechanics">Mechanics</a>
<ul class="toc">
<li>2.1 <a href="#DatabaseDesign">Database Design</a></li>
<li>2.2 <a href="#Unikemet.txt">Unikemet.txt</a></li>
</ul>
</li>
<li>3 <a href="#PropertyTypes">Property Types</a>
<ul class="toc">
<li>3.1 <a href="#CatalogIndex">Catalog Indexes</a></li>
<li>3.2 <a href="#Sources">Sources</a></li>
<li>3.3 <a href="#Description">Description</a></li>
<li>3.4 <a href="#Function">Function</a></li>
<li>3.5 <a href="#Core">Core</a></li>
<li>3.6 <a href="#MirroringRotation">Mirroring and Rotation</a></li>
<li>3.7 <a href="#OtherMappings">Other Mappings</a></li>
</ul>
</li>
<li>4 <a href="#Properties">The Properties</a>
<ul class="toc">
<li>4.1 <a href="#AlphabeticalListing">Alphabetical Listing</a></li>
<li>4.2 <a href="#ChronologicalListing">Listing by Version of Addition to the Unicode Standard</a></li>
</ul>
</li>
<li>5 <a href="#History">History</a></li>
<li><a href="#EncodingPrinciples">Appendix: Encoding Principles</a></li>
<li><a href="#References">References</a></li>
<li><a href="#Acknowledgements">Acknowledgements</a></li>
<li><a href="#Modifications">Modifications</a></li>
</ul>
<hr>
<h2>1 <a name="Introduction" href="#Introduction">Introduction</a></h2>
<p>The Unikemet database is the repository for the Unicode Consortium’s collective knowledge regarding the
Egyptian hieroglyphs contained in the Unicode Standard. It contains
ancillary data to help implement support for the Egyptian hieroglyphs. (The
term 'kemet' meant 'black land' in old Egyptian and was used as the official
name of their country.)</p>
<p>Formally, Egyptian hieroglyphs are defined within the Unicode Standard via their
names and assigned code points. However, while the first block, Egyptian
Hieroglyphs (U+13000..U+1342F), has character names based on the Gardiner
convention, the extended block, Egyptian Hieroglyphs Extended-A
(U+13460..U+143FF), uses algorithmic names of the form EGYPTIAN
HIEROGLYPH-xxxxx, where xxxxx is the 5-digit hexadecimal value of the code
point, and which therefore provide little information about the identity of the
character. The ancillary data provided by the database define additional
information such as a detailed description of the character, various
sources, catalog entries, and function. It also defines properties related
to these hieroglyphs, such as belonging to a Core set, whether they
rotate or not, and whether they mirror or not.</p>
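<p>The algorithmic naming rule for the extended block can be sketched as follows. This is a minimal illustration, not a normative definition: the function name <code>extended_a_name</code> and the constants are hypothetical, and the range check uses the block boundaries quoted above, so it does not tell assigned code points from unassigned ones.</p>

```python
# Block boundaries as quoted in the text for Egyptian Hieroglyphs Extended-A.
EXT_A_FIRST = 0x13460
EXT_A_LAST = 0x143FF


def extended_a_name(code_point: int) -> str:
    """Return the algorithmic name EGYPTIAN HIEROGLYPH-xxxxx for a code point.

    Hypothetical helper: it only checks the block range, not whether the
    code point is actually assigned in a given version of the standard.
    """
    if code_point not in range(EXT_A_FIRST, EXT_A_LAST + 1):
        raise ValueError(f"U+{code_point:04X} is outside the Extended-A block")
    # xxxxx is the 5-digit uppercase hexadecimal value of the code point.
    return f"EGYPTIAN HIEROGLYPH-{code_point:05X}"
```

<p>Because the name carries nothing beyond the code point itself, identifying a sign in practice relies on the ancillary data described in this annex.</p>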
<p>This document is a guide to that data, describing the mechanics of the Unikemet database, the nature of its contents, and the status of the various properties.</p>
<h2>2 <a name="Mechanics" href="#Mechanics">Mechanics</a></h2>
<h3>2.1 <a name="DatabaseDesign" href="#DatabaseDesign">Database Design</a></h3>
<p>The database consists of a number of fields containing data for each
Egyptian hieroglyph in the Unicode Standard. The fields, all of which correspond to properties, have names that consist entirely of ASCII letters and digits with no spaces or other punctuation except for underscore. For historical reasons, they all start with a lowercase <code>k</code>.</p>
<p>All data in the Unikemet database is stored in UTF-8 using Normalization Form C (NFC). Note, however, that the “Syntax” descriptions below, used for validation of property values, operate on Normalization Form D (NFD), primarily because that makes the regular expressions simpler.</p>
<h3>2.2 <a name="Unikemet.txt" href="#Unikemet.txt">Unikemet.txt</a></h3>
<p>Included with the [<a href="../tr41/tr41-36.html#UCD">UCD</a>] is a file
called <code>Unikemet.txt</code>. This is a snapshot of the public contents of
the Unikemet database as of the release date for this version of
the Unicode Standard.</p>
<p>The file is a single text
file, in UTF-8 and NFC with Unix line endings, that
contains the values for all properties in the Unikemet
database. Properties are described by category in this document but
are nevertheless included in a single file (unlike, for example, the Unihan
database).</p>
<p>In this file, blank lines may be ignored; lines beginning with <code>#</code> are comment lines used to provide the header and footer. Each of the remaining lines is one entry, with three, tab-separated fields: the Unicode Scalar Value, the property name, and the value for the property for the given Unicode Scalar Value. For most of the properties, if multiple values are possible, the values are separated by spaces. No
hieroglyph may have more than one instance of a given property associated
with it, and no empty properties are included in <code>Unikemet.txt</code>.</p>
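<p>The line format described above can be read with a few lines of code. The following is a sketch under stated assumptions: the function name <code>parse_unikemet</code> is hypothetical, the scalar value is assumed to be written as <code>U+</code> followed by hexadecimal digits, and the sample lines are built from examples quoted in this document rather than copied from the data file.</p>

```python
from typing import Dict, Iterable, Tuple


def parse_unikemet(lines: Iterable[str]) -> Dict[Tuple[int, str], str]:
    """Map (code point, property name) to the raw property value."""
    entries: Dict[Tuple[int, str], str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        # Blank lines may be ignored; '#' lines carry the header and footer.
        if not line or line.startswith("#"):
            continue
        # Each data line has exactly three tab-separated fields.
        scalar, prop, value = line.split("\t")
        if not value:
            raise ValueError(f"empty value for {prop} on {scalar}")
        # Assumption: the scalar value is written as U+ followed by hex digits.
        hex_digits = scalar[2:] if scalar.startswith("U+") else scalar
        key = (int(hex_digits, 16), prop)
        # A hieroglyph may carry at most one instance of a given property.
        if key in entries:
            raise ValueError(f"duplicate {prop} for {scalar}")
        entries[key] = value
    return entries


# Illustrative lines built from examples quoted in this document.
SAMPLE = [
    "# illustrative comment line",
    "",
    "U+1346C\tkEH_UniK\tHJ A072A",
]
```

<p>Values are kept as raw strings here; splitting a value into multiple subentries depends on the delimiter defined for each property.</p>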
<p>There is no formal limit on the lengths of any of the property values. Any Unicode characters may be used in the property values except for
control characters (especially tab, newline, and carriage return). Note that
unlike Unihan, double quotes are allowed but are discouraged, and will likely
be removed in a future version.</p>
<p>The data lines are sorted by Unicode Scalar Value and property type as primary and secondary keys, respectively.</p>
<p>The file’s header includes a summary of the properties the file contains.</p>
<h2>3 <a name="PropertyTypes" href="#PropertyTypes">Property Types</a></h2>
<p>The data in the Unikemet database serves a multitude of purposes, and the properties are most conveniently grouped into categories according to the purpose they fulfill. We provide here a general discussion of the various categories, followed by a detailed description of the individual properties, alphabetically arranged.</p>
<!-- Section 3.1 -->
<h3>3.1 <a name="CatalogIndex" href="#CatalogIndex">Catalog Indexes</a></h3>
<p>Two catalog indexes are defined: <tt>kEH_Cat</tt> and <tt>kEH_UniK</tt>.
The catalog index <tt>kEH_Cat</tt> is defined using a sign taxonomy based on a publication by Institut Français d’Archéologie Orientale (IFAO), see <a href="#kEH_IFAO">kEH_IFAO</a>. It is
written using a three-level classification: a group index, a sub-group
index, and an index within that sub-group. The highest level, the group index, is a combination of the
Gardiner A-Z (and Aa) classification and the IFAO chapter classification
(I to XXX in Roman notation). The second level uses the IFAO sub-chapter
classification already present in the IFAO publication. The third level is a
new index and just orders items sequentially within each sub-group. For
example, the catalog index 'A-01-001' represents the first element, designated by 001, of the
sub-group 'A-01'. The element 01 in 'A-01' represents the first sub-group of the group
'A'.</p>
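<p>The three-level structure of a <tt>kEH_Cat</tt> value can be taken apart as sketched below. The pattern is the <tt>kEH_Cat</tt> syntax given in Section 4.1 with capture groups added; the names <code>CatalogIndex</code> and <code>parse_cat</code> are hypothetical and used only for illustration.</p>

```python
import re
from typing import NamedTuple

# kEH_Cat syntax from Section 4.1, with capture groups added for the levels.
CAT_RE = re.compile(r"([A-IK-Z]|AA)-(\d{2})-(\d{3})")


class CatalogIndex(NamedTuple):
    group: str      # Gardiner-style group letter, or AA
    subgroup: int   # sub-group within the group
    item: int       # sequential index within the sub-group


def parse_cat(value: str) -> CatalogIndex:
    """Split a catalog index such as 'A-01-001' into its three levels."""
    match = CAT_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a kEH_Cat value: {value!r}")
    return CatalogIndex(match.group(1), int(match.group(2)), int(match.group(3)))
```

<p>For the example above, <code>parse_cat("A-01-001")</code> yields group 'A', sub-group 1, and item 1.</p>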
<p>Within the group level, IFAO may include a few more items,
but these can
be easily mapped into existing Gardiner groups. For example, the IFAO groupings Gods
(Chapter III) and Goddesses (Chapter IV) can be combined in the Gardiner
group C (Anthropomorphic Deities). The following is the list of the first
level groups and their relationship with the IFAO groups:</p>
<table summary="Unikemet data base properties" border="1" cellpadding="2">
<tr>
<td bgcolor="#CCFFCC" width="50%">Gardiner groups </td>
<td bgcolor="#CCFFCC" width="50%">IFAO (translated from French)</td>
</tr>
<tr>
<td width="50%">A. Man and his occupations </td>
<td width="50%">I. Men and monarchs</td>
</tr>
<tr>
<td width="50%">B. Woman and her occupations</td>
<td width="50%">II. Women and monarchs</td>
</tr>
<tr>
<td width="50%">C. Anthropomorphic deities</td>
<td width="50%">III. Gods<br>IV. Goddesses</td>
</tr>
<tr>
<td width="50%">D. Parts of the human body</td>
<td width="50%">V. Human body parts</td>
</tr>
<tr>
<td width="50%">E. Mammals</td>
<td width="50%">VI. Mammals</td>
</tr>
<tr>
<td width="50%">F. Parts of mammals</td>
<td width="50%">VII. Mammal body parts</td>
</tr>
<tr>
<td width="50%">G. Birds</td>
<td width="50%">VIII. Birds</td>
</tr>
<tr>
<td width="50%">H. Parts of birds</td>
<td width="50%">IX. Bird parts</td>
</tr>
<tr>
<td width="50%">I. Amphibious animals, reptiles, etc.</td>
<td width="50%">X. Reptiles, amphibians</td>
</tr>
<tr>
<td width="50%">K. Fishes and parts of fishes</td>
<td width="50%">XI. Fishes and parts of fishes</td>
</tr>
<tr>
<td width="50%">L. Invertebrate and lesser animals</td>
<td width="50%">XII. Insects and arachnids</td>
</tr>
<tr>
<td width="50%">M. Trees and plants</td>
<td width="50%">XIII. Plants</td>
</tr>
<tr>
<td width="50%">N. Sky, earth, water</td>
<td width="50%">XIV. Sky, earth, water</td>
</tr>
<tr>
<td width="50%">O. Buildings, parts of buildings, etc.</td>
<td width="50%">XV. Edifices and parts of edifices</td>
</tr>
<tr>
<td width="50%">P. Ships and parts of ships</td>
<td width="50%">XVI. Boats and parts of boats</td>
</tr>
<tr>
<td width="50%">Q. Domestic and funerary furniture</td>
<td width="50%">XVII. Everyday and funeral furniture</td>
</tr>
<tr>
<td width="50%">R. Temple furniture and sacred emblems</td>
<td width="50%">XVIII. Temple furniture</td>
</tr>
<tr>
<td width="50%">S. Crowns, dresses, staves, etc.</td>
<td width="50%">XIX. Crowns<br>XX. Jewels, clothes, staves</td>
</tr>
<tr>
<td width="50%">T. Warfare, hunting, butchery</td>
<td width="50%">XXII. Warfare, hunting, fishery, butchery</td>
</tr>
<tr>
<td width="50%">U. Agriculture, crafts, and professions</td>
<td width="50%">XXI. Agriculture and workshop tools</td>
</tr>
<tr>
<td width="50%">V. Rope, fiber, baskets, bags, etc.</td>
<td width="50%">XXIII. Rope, baskets, bags</td>
</tr>
<tr>
<td width="50%">W. Vessels of stone and earthenware</td>
<td width="50%">XXIV. Vases</td>
</tr>
<tr>
<td width="50%">X. Loaves and cakes</td>
<td width="50%">XXV. Bread loaves</td>
</tr>
<tr>
<td width="50%">Y. Writings, games, music</td>
<td width="50%">XXVI. Writings, games, music</td>
</tr>
<tr>
<td width="50%">Z. Strokes, signs derived from Hieratic,<br>geometrical figures</td>
<td width="50%">XXVII. Geometric shapes</td>
</tr>
<tr>
<td width="50%">AA. Unclassified</td>
<td width="50%">XXVIII. Ill-defined signs</td>
</tr>
</table>
<p>Notes:</p>
<ul>
<li>The order of the A-Z and I-XXVIII lists is identical, except for the
two groups XXI and XXII (which correspond to the groups U and T,
respectively).</li>
<li>IFAO Chapter XXIX (Uncertain
identity signs) and Chapter XXX (Conventional signs) are not used in the taxonomy because they are seldom used by other references.</li>
<li>Some characters
originally in the group ‘AA.XXVIII Unclassified Ill-defined signs’ have
been moved to other groups when their identity could be confirmed. Some
members originally in the IFAO group XXIX have also been reclassified.</li>
</ul>
<p>Because this catalog number is still a work in progress, its status is
provisional. </p>
<p>The <tt>kEH_UniK </tt>catalog index was originally defined exclusively
for the original Unicode Egyptian Hieroglyph block and is part of the
formal character name for these code points. This catalog index has been extended to cover
all newly encoded signs. The code points which
refer to the same Hieroglyphica and JSesh source
value use the prefix HJ
followed by a space and the common value between Hieroglyphica and JSesh
but zero padded to 3 digits. For
example, the catalog index for U+1346C is HJ A072A indicating that the
code point is
associated with the same Hieroglyphica and JSesh source value: A72A. New
entries not common to Hieroglyphica and JSesh were given new values
without a prefix. The main rationale for the catalog index
is to provide a Gardiner-like notation for all Egyptian hieroglyphs, which
is a feature requested by Egyptologists. A significant issue is that
the name space shared among the original Gardiner notation, the Unikemet
original catalog index, Hieroglyphica and JSesh values has many
collisions. For example, U+1304E has A71 as sources for both
Hieroglyphica and JSesh, but was assigned to A069 in the original block. In comparison,
U+1346A in the extended block has A69 as sources for
both Hieroglyphica and JSesh. To avoid an apparent name collision, the
catalog index for this character is not HJ A069, but A069A. Therefore, the notation
'HJ' is only used for new characters when the common Hieroglyphica and
JSesh source values do not collide with <tt>kEH_UniK</tt> values used in
the original block.</p>
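<p>The zero-padding and collision rule described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the set of values used in the original block is passed in by the caller, and the sketch does not attempt to reproduce how a replacement value (such as A069A) is chosen when a collision occurs, since that assignment is editorial.</p>

```python
import re
from typing import Optional, Set


def pad_sign_value(value: str) -> str:
    """Zero-pad the numeric part of a Gardiner-like value to 3 digits."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)([A-Za-z]*)", value)
    if match is None:
        raise ValueError(f"unexpected sign value: {value!r}")
    letters, digits, suffix = match.groups()
    return f"{letters}{int(digits):03d}{suffix}"


def hj_catalog_index(common_value: str, original_block: Set[str]) -> Optional[str]:
    """Return 'HJ <padded value>' unless it collides with the original block.

    common_value is the value shared by Hieroglyphica and JSesh.
    original_block holds the kEH_UniK values already used in the original block.
    None signals a collision, where a new value is assigned instead.
    """
    padded = pad_sign_value(common_value)
    if padded in original_block:
        return None
    return f"HJ {padded}"
```

<p>With an original-block set containing A069, the shared value A72A gives 'HJ A072A', while A69 is reported as a collision, which matches the two examples in the text.</p>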
<!-- Section 3.2 -->
<h3>3.2 <a name="Sources" href="#Sources">Sources</a></h3>
<p>Sources are among the normative parts of the Unikemet database, and
refer to some well-known Egyptian hieroglyphs collections. These
sources are defined as <tt>kEH_HG</tt>, the Hieroglyphica classification, <tt>kEH_JSesh</tt>, the JSesh index, and
<tt>kEH_IFAO</tt>, the IFAO entries. While these values are normative, they are not
immutable. Some values may be a matter of interpretation or may contain errors.
Many of these sources only use glyphic evidence, don't refer to
the original paleographic attestations, and don't provide a formal description of the
referred sign.</p>
<p>Detailed descriptions of the syntax used for these sources are to be found in <a href="#AlphabeticalListing">Section 4.1</a>, <em>Alphabetical Listing</em>, below.</p>
<!-- Section 3.3 -->
<h3>3.3 <a name="Description" href="#Description">Description</a></h3>
<p>While the description <tt>kEH_Desc</tt> is only informative, it is an essential
part of the identity of an Egyptian hieroglyph. Because many attestations of
these signs are imprecise, due to the imperfect preservation of the original
evidence, Egyptologists had to come to a rough consensus on how to describe
the abstract form of these signs as precisely as possible. While this
description
still allows variation in the font style used for their representation, it is
expected that all these variants will adhere to the description as stated by
this property. Due to the complexity of some of these signs, the description
can be a rather long expression.</p>
<p>For example, the description for U+13A6E reads as follows:</p>
<p> 'A ram
(Ovis longipes palaeo-aegyptiacus), standing, without a beard, with a
cobra (Naja haja), standing up, with expanded hood (Uraeus)(I64) on its
head, with the wings of a bird on its back, spread in a v-shape.' </p>
<p>Note that many descriptions currently use Hieroglyphica/JSesh
references to designate another sign included
in the sign. In the example above, 'I64' refers to U+13D79,
which is itself described as 'A cobra (Naja haja), standing up, with
expanded hood (Uraeus)'. Because Hieroglyphica and JSesh do not always
coincide, in case of differences, the JSesh reference prevails.</p>
<!-- Section 3.4 -->
<h3>3.4 <a name="Function" href="#Function">Function</a></h3>
<p>The function type <tt>kEH_Func</tt> and its corresponding function value <tt>kEH_FVal</tt> are
only provisional; they are still a work in
progress. All signs are expected to have a function type representing either
a pictogram, a logogram, a phonogram (or “phonemogram”), a classifier (or “determinative”),
a phono-repeater, a radicogram, or an interpretant. The function type also includes a function value with transliterated
text.</p>
<p>The following text defines the function types:</p>
<ul>
<li>Pictogram – pictorial symbol. It typically has no pronunciation.</li>
<li>Logogram – sign that represents a word in the Egyptian
language. As such it has a pronunciation and a meaning.</li>
<li>Phonogram or phonemogram – sign that represents a sound in the writing system. It does
not carry a semantic value. Strictly speaking, if we make a distinction
between phonetics and phonology, the term phonemogram would be preferred
to denote a phonological concept, but the two terms tend to be used
interchangeably.</li>
<li> Classifier – sign written at the end of words
that indicates the semantic category to which the respective word belongs.
As such it is always mute. It is traditionally called a determinative.</li>
<li> Phono-repeater – a sub-category of classifier which has a phonetic
meaning.</li>
<li> Radicogram – graphemes that both point to some form and
some content, but are not able to refer to an autonomous lexeme alone.</li>
<li> Interpretant – non-autonomous graphemes that interpret the phonemic
values of other semograms or phonograms. </li>
</ul>
<p>The function value uses the transliteration format convention that is commonly
known as the Gardiner 1957 convention. This convention already appears in the names list
annotations
of the original Egyptian Hieroglyph block. The transliteration format uses the
following letters: ꜣ, ꞽ, y, ꜥ, w, b, p, f, m, n, r, h, ḥ, ḫ, ẖ, s, š, ḳ, k,
g, t, ṯ, d, ḏ. The initial letter may be capitalized to indicate a proper
noun, following <a href="https://thesaurus-linguae-aegyptiae.de"><em>TLA</em></a> practice.
The value may also contain additional punctuation to mark optional parts,
alternatives, semantic elements, etc. This will be developed in future versions of this document.</p>
<p>A single hieroglyph may have multiple function types. At the moment,
most of the hieroglyphs have a single documented type, but in reality many
of them have multiple types. For example, when a sign has one
documented function type and a variant of it has a different documented
function type, this should be interpreted as the
base sign having both function types (or more), and not as a
discrepancy.</p>
<!-- Section 3.5 -->
<h3>3.5 <a name="Core" href="#Core">Core</a></h3>
<p>The provisional property <tt>kEH_Core</tt> determines whether an Egyptian hieroglyph is part
of a 'Core' set. The 'Core' set is a curated subset of characters from the
full Egyptian hieroglyph encoded set. It is the recommended set for
Egyptologists and should be implemented in widely used fonts. The Core set
represents the opinion of experts who reviewed the
evidence that was provided to them. (The same
group reviewed the full set.) This set is similar to UnihanCore2020 for CJK, which is
the minimal set of required ideographs for East Asia. For a description of
the selection process for the Core set by the Egyptologists involved, see
the “Principles” Appendix. Characters in the Core set were verified against
images in photographs and trustworthy facsimiles. Transcription (a hand-drawn
sketch of a sign) alone was not normally considered to be verified
evidence. Images from hieratic texts could be considered if the hieroglyphic
nature of the sign could be easily reconstructed (cursive hieroglyphs).
Possible values for this enumerated property are 'C' for Core, 'L' for Legacy, and 'N' for None.
The Legacy value is used primarily for code points located in the Egyptian
Hieroglyphs block (U+13000..U+1342F) to denote that these characters may be
present in fonts for legacy reasons, but that their usage is discouraged. The 'None'
value is used in the new Egyptian Hieroglyphs Extended-A block
(U+13460..U+143FF) to denote that the code points with that property value
are not fully attested, but may eventually become part of the 'Core' set.</p>
<p>The following are the exceptions to the requirement for verification:</p>
<ul>
<li>the sign appears in the Unicode 5.2 repertoire, </li>
<li>the sign could not be
verified and could not be constructed using an overlay
or insertion mechanism.</li></ul>
<p>While the property is provisional, the eventual intent is to make it normative in a future
version of this document.</p>
<h3>3.6 <a name="MirroringRotation" href="#MirroringRotation">Mirroring and Rotation</a></h3>
<p>The properties <tt>kEH_NoMirror</tt> and <tt>kEH_NoRotate</tt> indicate specific and rare
behavior for some Egyptian hieroglyphs. </p>
<p>Most Egyptian hieroglyphs are expected to mirror relative to the
reading direction. For example, for asymmetrical 'faces', the face is
expected to face the start of the text, whether the line runs RTL or LTR.
However, the format control character U+13440 EGYPTIAN HIEROGLYPH MIRROR
HORIZONTALLY can be used to mirror a sign for aesthetic reasons.
Mirroring is based on the line direction, and the use of this formatting
character is independent of any mirroring produced by changing the base
direction of the text. In very rare
cases, the sign has a fixed orientation concerning mirroring. For example,
U+130BB and U+130BD are an apparent set of
mirrored walking legs. However, these two signs indicate opposite walking
directions. In these rare cases, U+13440 should not be used; the
separately encoded characters should be used instead. To indicate this scenario,
the property value <tt>kEH_NoMirror</tt> is set to 'Y'
for signs related to these cases.</p>
<p>Similarly, most Egyptian hieroglyphs can be rotated without changing
their meaning. Because these rotations are a common occurrence, variation
selectors should be used to represent these alternate representations.
However, there are some signs where the rotation is significant and
therefore, they cannot be rotated. In these rare cases, the property value
<tt>kEH_NoRotate</tt> will be set to 'Y'.</p>
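<p>A renderer or editing tool could consult these two properties before applying a mirroring control or a rotation variant. The following is a minimal sketch: the function names are hypothetical, <code>props</code> is assumed to be a mapping from property name to value for a single code point, and a missing entry is treated as not 'Y'.</p>

```python
from typing import Mapping


def may_mirror(props: Mapping[str, str]) -> bool:
    """True unless the sign is flagged as having a fixed orientation."""
    # Assumption: an absent kEH_NoMirror entry means the sign may be mirrored.
    return props.get("kEH_NoMirror") != "Y"


def may_rotate(props: Mapping[str, str]) -> bool:
    """True unless rotation is significant for the sign."""
    # Assumption: an absent kEH_NoRotate entry means the sign may be rotated.
    return props.get("kEH_NoRotate") != "Y"
```

<p>For a sign such as U+130BB, for which <tt>kEH_NoMirror</tt> is 'Y', <code>may_mirror</code> returns false, and the separately encoded counterpart should be used instead of the mirroring control.</p>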
<h3>3.7 <a name="OtherMappings" href="#OtherMappings">Other Mappings</a></h3>
<p>This category contains a single property, <tt>kEH_AltSeq</tt>, which describes alternate sequences for encoded signs. The sequence may
be a single code point for variants, or multiple code points for other cases, such
as mirrored, rotated, and compound signs. While it is mostly used for 'Legacy' signs (<tt>kEH_Core</tt> property set to 'L'),
it may also be defined for other signs.</p>
<h2>4 <a name="Properties" href="#Properties">The Properties</a></h2>
<p>We now give two listings of the properties in the Unikemet database. The first is an alphabetical listing, with information on the property contents and syntax.
The second is a listing of the properties by the version of the Unicode Standard in which they were first introduced.</p>
<h3>4.1 <a name="AlphabeticalListing" href="#AlphabeticalListing">Alphabetical Listing</a></h3>
<p>For each property we give the following information in the alphabetical listing: its <em>Property</em> tag, its Unicode <em>Status</em>, its <em>Category</em> as
defined above, the Unicode version in which it was <em>Introduced</em>, its <em>Delimiter</em>, its <em>Syntax</em>, and its <em>Description</em>.</p>
<p>The <em>Property</em> name is the tag used in the Unikemet database to mark instances of this property.</p>
<p>The Unicode <em>Status</em> is either <em>Normative</em>, <em>Informative</em>, or <em>Provisional</em>, depending on whether it is a normative part of the standard,
an informative part of the standard, or neither. We may also include <em>Deprecated</em> as a Unicode Status if the property is no longer to be used.</p>
<p>Most of the properties which allow multiple property values have a <em>Delimiter</em> defined as “space” (<code>U+0020</code> <span class="name">SPACE</span>).
Properties which do not have multiple property values have this defined as “N/A.” Some properties do not currently have multiple values in the data but may do so in the future.</p>
<p>For most properties with multiple values, the order of the values is arbitrary and has no particular significance. The most common order in such cases is alphabetical
or numerical.</p>
<p>Because the property <tt>kEH_Func</tt> describing the function
type may correspond to multiple types and may also have multiple values,
the syntax is more complex. If there are multiple types, the types are
separated by '/', but in most cases they share the same value. Multiple
values are typically separated by either '/' or '|'; the "space" cannot be
used because it may be part of a value field. Note that this is a work in
progress: it reflects the current status among Egyptologists and may evolve
over time. Note, however, that the vast majority of Egyptian hieroglyphs have
a single function type and a single function value.</p>
<p>Validation is done as follows: The entry is split into subentries using the <em>Delimiter</em> (if defined), and each subentry converted to Normalization Form D (NFD). The value is valid if and only if each normalized subentry matches the property’s <em>Syntax</em> regular expression. Note that any given property’s <em>Syntax</em> is not guaranteed to be stable and may change in the future.</p>
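<p>The validation procedure can be sketched as follows. This is an illustration under stated assumptions: the table <code>SYNTAX</code> and the function <code>is_valid</code> are hypothetical names, only three properties are included, and “matches” is interpreted here as a match of the whole subentry.</p>

```python
import re
import unicodedata
from typing import Dict, Optional, Tuple

# (delimiter, syntax) pairs copied from the property tables in this section.
# A delimiter of None stands for "N/A".
SYNTAX: Dict[str, Tuple[Optional[str], str]] = {
    "kEH_Cat": (None, r"([A-IK-Z]|AA)-\d{2}-\d{3}"),
    "kEH_Core": (None, r"C|L|N"),
    "kEH_Desc": (None, r'[^\t"]+'),
}


def is_valid(prop: str, value: str) -> bool:
    """Check a property value against the Syntax of its property."""
    delimiter, pattern = SYNTAX[prop]
    # Split into subentries only when the property defines a delimiter.
    subentries = value.split(delimiter) if delimiter else [value]
    regex = re.compile(pattern)
    # Each subentry is converted to NFD before matching.
    # Assumption: the expression must match the entire subentry.
    return all(
        regex.fullmatch(unicodedata.normalize("NFD", sub)) is not None
        for sub in subentries
    )
```

<p>Since a property’s <em>Syntax</em> is not guaranteed to be stable, a table like this would need to be refreshed for each version of the data.</p>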
<p>Finally, the <em>Description</em> contains not only a description of what the property contains, but also source information, known limitations, methodology used in deriving the data, and so on.</p>
<p>The properties covered in the table are:
<a href="#kEH_Cat">kEH_Cat</a>,
<a href="#kEH_Core">kEH_Core</a>,
<a href="#kEH_Desc">kEH_Desc</a>,
<a href="#kEH_Func">kEH_Func</a>,
<a href="#kEH_FVal">kEH_FVal</a>,
<a href="#kEH_HG">kEH_HG</a>,
<a href="#kEH_IFAO">kEH_IFAO</a>,
<a href="#kEH_JSesh">kEH_JSesh</a>,
<a href="#kEH_NoMirror">kEH_NoMirror</a>,
<a href="#kEH_NoRotate">kEH_NoRotate</a>,
<a href="#kEH_Unik">kEH_UniK</a>, and <a href="#kEH_AltSeq">kEH_AltSeq</a>.</p>
<!-- START MAIN TABLE -->
<!-- kEh_Cat -->
<table summary="kEH_Cat" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_Cat"><strong>kEH_Cat</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Informative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Catalog Indexes</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">([A-IK-Z]|AA)-\d{2}-\d{3}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Catalog entry corresponding to the IFAO-based taxonomy.
This index uniquely identifies all Egyptian hieroglyphs.</td>
</tr>
</table><br>
<!-- kEH_Core -->
<table summary="kEH_Core" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_Core"><strong>kEH_Core</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Core</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%" style="height: 46px">Syntax</td>
<td width="90%">C|L|N</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%" style="height: 46px">Default</td>
<td width="90%">N</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">This enumerated property determines whether an Egyptian hieroglyph is part of
the 'Core' set (value 'C'), Legacy (value 'L') or None (value 'N'). The Legacy value is
primarily used for hieroglyphs in the
original Egyptian Hieroglyphs block but which are not part of the Core Set.</td>
</tr>
</table><br>
<!-- kEH_Desc -->
<table summary="kEH_Desc" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_Desc"><strong>kEH_Desc</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Informative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Description</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">[^\t"]+</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Detailed description of the appearance of the
hieroglyph. The value may contain any Unicode character except control
characters. Note that the text may use Gardiner-syntax source
references, which are being converted to kEH_UniK values to ensure
uniqueness. This is still a work in progress and will be improved in the
next revision.</td>
</tr>
</table><br>
<!-- kEH_Func -->
<table summary="kEH_Func" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_Func"><strong>kEH_Func</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Function</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">/ (see description)</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">[^\t"]+</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">All signs are expected to have a function type
representing a pictogram, a logogram, a phonemogram (or “phonogram”), a classifier (or “determinative”), a
phono-repeater (sub-category of classifier), a radicogram, or an
interpretant. The value may contain any Unicode character except control
characters. Some types, such as logogram, have an English description,
while others, such as phonemogram, typically do not. Most signs have a
single type, but some have multiple types (separated by '/'). Sometimes
additional context may be included in the type description, including
transliterated text. This text can also use '/' to denote alternative
descriptions. Finally, while some signs are clearly attested, their type
is uncertain, unknown, or as yet undocumented. That uncertainty is
mentioned in the text itself.</td>
</tr>
</table>
<br>
<!-- kEH_FVal -->
<table summary="kEH_FVal" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_FVal"><strong>
kEH_FVal</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Function</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">/ or | (see description)</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">
<span>[BDF-HJKMNPR-TWY-bdf-hjkmnpr-twy\.,\/\-\+=;?>\&\(\)\{\}\s\x{303}\x{30C}\x{323}\x{32E}\x{331}\x{A722}\x{A723}\x{A724}\x{A725}\x{A7BC}\x{A7BD}]+</span> </td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">All signs are expected to have a function value
corresponding to their function type. The value is expressed using the
Gardiner 1957 convention for Egyptian hieroglyph transliteration. The
delimiters '/' or '|' are used to separate alternative values, while
other punctuation may represent syntax
elements, optional values, etc. The current values are a draft: work is still in progress, and the values will be refined based
on feedback. Some signs do not yet have a function value but are
expected to be documented in the future.</td>
</tr>
</table><br>
<!-- kEH_HG -->
<table summary="kEH_HG" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_HG"><strong>kEH_HG</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Sources</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">space</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">([A-IK-Z]|AA)\d{1,3}[A-Za-z]{0,2}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Hieroglyphica source as specified in <em>Hieroglyphica –
Sign List</em>, Nicholas Grimal, Jochen Hallof, Dirk van der Plas, 2nd
edition, 2000. Multiple Hieroglyphica entries could
be assigned to the same code point.</td>
</tr>
</table><br>
<!-- kEH_IFAO -->
<table summary="kEH_IFAO" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_IFAO"><strong>kEH_IFAO</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Sources</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">space</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%" style="height: 3px">Syntax</td>
<td width="90%" >\d{1,3},\d{1,2}[ab]?</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">IFAO source value defined as page number and order in
that page, separated by a comma. IFAO is defined as <em>Catalogue de la
fonte hiéroglyphique de l’imprimerie de l’I.F.A.O.</em>, Institut Français
d’Archéologie Orientale du Caire, 1983, IF607, SEVPO, Paris, France. Multiple
IFAO entries could
be assigned to the same code point.</td>
</tr>
</table><br>
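<p>As an illustration of the kEH_IFAO syntax above, the following minimal Python sketch splits a space-delimited value into page, order, and optional suffix. The function name <code>parse_ifao</code> and the sample value are hypothetical and are not taken from the data file; the pattern is the one in the Syntax row, with capture groups added.</p>

```python
import re

# Pattern from the Syntax row of kEH_IFAO, with capture groups added:
# page number, order within the page, optional 'a' or 'b' suffix.
IFAO_ENTRY = re.compile(r"(\d{1,3}),(\d{1,2})([ab]?)")


def parse_ifao(value):
    """Split a space-delimited kEH_IFAO value into (page, order, suffix) tuples."""
    entries = []
    for token in value.split(" "):
        match = IFAO_ENTRY.fullmatch(token)
        if match is None:
            raise ValueError(f"not a valid kEH_IFAO entry: {token!r}")
        page, order, suffix = match.groups()
        entries.append((int(page), int(order), suffix))
    return entries


# Hypothetical sample value with two entries assigned to one code point.
print(parse_ifao("12,3 104,17a"))  # [(12, 3, ''), (104, 17, 'a')]
```

<p>Returning a list keeps the case of multiple IFAO entries per code point explicit, even when only one entry is present.</p>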
<!-- kEH_JSesh -->
<table summary="kEH_JSesh" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_JSesh"><strong>kEH_JSesh</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Sources</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">space</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">([A-IK-Z]|Aa|NL|NU|Ff)\d{1,3}[A-Za-z]{0,5}<br>
|(US1|US22|US248|US685)([A-IK-Z]|Aa|NL|NU)\d{1,3}[A-Za-z]{0,5}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">JSesh source as specified in Rosmorduc, Serge. (2014).
JSesh Documentation. [Online, version 7.5.5] Available at:
<a href="http://jseshdoc.qenherkhopeshef.org">
http://jseshdoc.qenherkhopeshef.org</a> [Accessed Feb 23rd 2021].
The current version is 7.6.1 as of October 4th 2023, and source values may
have to be updated accordingly. Multiple JSesh entries could
be assigned to the same code point.</td>
</tr>
</table><br>
<!-- kEH_NoMirror -->
<table summary="kEH_NoMirror" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_NoMirror"><strong>kEH_NoMirror</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Mirroring and Rotation</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">Y|N</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Indicates whether an Egyptian hieroglyph must not be
mirrored. Note that the property is expressed in the negative because, by default, most hieroglyphs
can be mirrored depending on the reading direction.</td>
</tr>
</table><br>
<!-- kEH_NoRotate -->
<table summary="kEH_NoRotate" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_NoRotate"><strong>kEH_NoRotate</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Normative</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Mirroring and Rotation</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">Y|N</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Indicates whether an Egyptian hieroglyph must not be
rotated. Note that the property is expressed in the negative because, by default, most hieroglyphs
can be rotated without affecting their meaning.</td>
</tr>
</table><br>
<!-- kEH_UniK -->
<table summary="kEH_UniK" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_Unik"><strong>kEH_UniK</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Catalog Indexes</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">16.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">([A-IK-Z]|AA|NL|NU)\d{3}[A-Z]{0,2}<br>|HJ ([A-IK-Z]|AA)\d{3}[A-Z]{0,2}</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Original Unikemet catalog index used by the Egyptian
Hieroglyph block, augmented for the extended blocks. This index uniquely
identifies all Egyptian hieroglyphs.</td>
</tr>
</table>
<br>
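<p>The two alternatives in the kEH_UniK syntax can be checked separately, as in the following minimal Python sketch. The names <code>parse_unik</code>, <code>PLAIN</code>, and <code>HJ_FORM</code> are hypothetical; the sample values A016 and HJ A087 are the ones quoted in the appendix of this annex.</p>

```python
import re

# The two alternatives from the Syntax row of kEH_UniK, with capture groups added.
PLAIN = re.compile(r"([A-IK-Z]|AA|NL|NU)(\d{3})([A-Z]{0,2})")
HJ_FORM = re.compile(r"HJ ([A-IK-Z]|AA)(\d{3})([A-Z]{0,2})")


def parse_unik(value):
    """Return (prefix, family, number, suffix) for a kEH_UniK value."""
    for prefix, pattern in (("", PLAIN), ("HJ", HJ_FORM)):
        match = pattern.fullmatch(value)
        if match is not None:
            family, number, suffix = match.groups()
            return prefix, family, int(number), suffix
    raise ValueError(f"not a valid kEH_UniK value: {value!r}")


print(parse_unik("A016"))     # ('', 'A', 16, '')
print(parse_unik("HJ A087"))  # ('HJ', 'A', 87, '')
```

<p>Keeping the two patterns apart mirrors the syntax as written: the NL and NU families are accepted only without the HJ prefix.</p>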
<!-- kEH_AltSeq -->
<table summary="kEH_AltSeq" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#FFFF99" width="10%">Property</td>
<td bgcolor="#CCFFCC" width="90%"><a id="kEH_AltSeq"><strong>kEH_AltSeq</strong></a></td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Status</td>
<td width="90%">Provisional</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Category</td>
<td width="90%">Other Mappings</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Introduced</td>
<td width="90%">17.0</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Delimiter</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Syntax</td>
<td width="90%">[0-9A-F]{5}(\s[0-9A-F]{4,5})*</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Default</td>
<td width="90%">N/A</td>
</tr>
<tr>
<td bgcolor="#FFFFCC" width="10%">Description</td>
<td width="90%">Alternate sequence for some Egyptian hieroglyphs. For
variants, it is a single Egyptian Hieroglyph code point. For other signs
(mirrored, rotated, and compound), it is a sequence of code points
combining Egyptian hieroglyphs and Egyptian Format controls. While this
property is mostly used for 'Legacy' signs (kEH_Core property set to
'L'), it may also be defined for others, especially when these signs
have been encoded as single code points despite being compound signs.</td>
</tr>
</table>
<br>
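<p>A kEH_AltSeq value can be turned into the corresponding character sequence by validating it against the syntax above and converting each hexadecimal code point. The following Python sketch is illustrative only: the function name <code>altseq_to_text</code> is hypothetical, and the sample value reuses the sequence U+13132 U+13436 U+132F4 discussed in the appendix, on the assumption that the data file records it in this form.</p>

```python
import re

# Pattern from the Syntax row of kEH_AltSeq.
ALTSEQ = re.compile(r"[0-9A-F]{5}(\s[0-9A-F]{4,5})*")


def altseq_to_text(value):
    """Convert a kEH_AltSeq value into the string of characters it denotes."""
    if ALTSEQ.fullmatch(value) is None:
        raise ValueError(f"not a valid kEH_AltSeq value: {value!r}")
    return "".join(chr(int(code_point, 16)) for code_point in value.split())


# Assumed sample value: intestine, overlay format control, piece of cloth.
text = altseq_to_text("13132 13436 132F4")
print([f"U+{ord(ch):04X}" for ch in text])  # ['U+13132', 'U+13436', 'U+132F4']
```

<p>The result is an ordinary string, so it can be passed directly to a renderer that supports the Egyptian hieroglyph format controls.</p>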
<!-- Chronological Listing -->
<h3>4.2 <a name="ChronologicalListing" href="#ChronologicalListing">Listing by Version of Addition to the Unicode Standard</a></h3>
<p>The table below lists the properties of the Unikemet database by the version of the Unicode Standard in which they were first added.</p>
<table summary="Properties added or dropped" border="1" cellpadding="2" width="100%">
<tr>
<td bgcolor="#CCFFCC">Version</td>
<td bgcolor="#CCFFCC">Properties Added</td>
<td bgcolor="#CCFFCC">Properties Removed</td>
</tr>
<tr>
<td>16.0</td>
<td>
<a href="#kEH_Cat">kEH_Cat</a>,
<a href="#kEH_Core">kEH_Core</a>,
<a href="#kEH_Desc">kEH_Desc</a>,
<a href="#kEH_Func">kEH_Func</a>,
<a href="#kEH_FVal">kEH_FVal</a>,
<a href="#kEH_HG">kEH_HG</a>,
<a href="#kEH_IFAO">kEH_IFAO</a>,
<a href="#kEH_JSesh">kEH_JSesh</a>,
<a href="#kEH_NoMirror">kEH_NoMirror</a>,
<a href="#kEH_NoRotate">kEH_NoRotate</a>,
<a href="#kEH_Unik">kEH_UniK</a>
</td>
<td></td>
</tr>
<tr>
<td>17.0</td>
<td>
<a href="#kEH_AltSeq">kEH_AltSeq</a>
</td>
<td></td>
</tr>
</table>
<!-- 5 -->
<h2>5 <a name="History" href="#History">History</a></h2>
<p>The Unikemet database originated as a concept proposed by the original
Egyptian Hieroglyph proposal (ISO/IEC JTC1/SC2/WG2 N3237 = L2/07-097) as
an appendix to that document but never materialized as a true dataset. It
contained original source references which have been partly superseded by
this version. It should also be noted that N3237 is not 100% identical to
what was eventually adopted by ISO and Unicode and was not updated to
reflect the final code point values.</p>
<h2> <a name="EncodingPrinciples" href="#EncodingPrinciples">Appendix: Encoding Principles</a></h2>
<h3>General Principles</h3>
<ol>
<li>Hieroglyphic signs almost always have a number of different shapes that provide a visual contextualization of
the text in which the sign is used. They inform the ancient (and modern) reader about the intentions and the
mindset of the author. As such, the shapes are culturally relevant (chronological, geographical, social,
technological, botanical, biological, religious) and can also be linguistically relevant. Different shapes may
even be used within a single text. Included in such a Full List are the discrete differences that are clearly
identifiable and allow one shape to be differentiated from another. The Full List (extended repertoire) may
grow as more evidence is identified. The Full List contains Core characters, Legacy characters (signs encoded
in Unicode v5.2 which are not Core characters), and other non-Core characters encoded in the extended block(s).</li>
<li>If shapes build a “continuum” or a cluster of sign variants and cannot be clearly differentiated, only a limited
number of shapes will be included (e.g. a man bowing down/inclining his back will be limited to a partial
inclination, U+13013 = A016 <img width="12px" height="14px" src="images/A16.svg" alt="man bowing down">, and a deep 90° bow,
<img width="14px" height="14px" src="images/A87.svg" alt="man deep bowing down"> U+134AA = HJ A087). “Chubby” or
“slender” signs are part of this continuum and will not be differentiated.</li>
<li>Signs with a large number of repetitive elements (e.g. the waves of
<img width="28px" height="14px" src="images/N35.svg" alt="water">, the number of loops of a snake tail
<img width="21px" height="14px" src="images/I15.svg" alt="snake 2 coils">, besides
<img width="21px" height="14px" src="images/I15B.svg" alt="snake 3 coils">,
<img width="28px" height="14px" src="images/I15C.svg" alt="snake 4 coils">,
<img width="35px" height="14px" src="images/I15D.svg" alt="snake 5 coils">, ...,
<img width="56px" height="14px" src="images/I15F.svg" alt="snake 7 coils">) will be limited to a few selected examples based on
philologists’ judgment. At this moment, among these cobras, only the 2-coil, 3-coil, and 4-coil versions are
encoded, respectively as U+1319A (Core), U+13DB7 (Core), and U+13DC6 (non-Core).</li>
<li>Seemingly identical signs which refer to different realities and to different functions/words will be
differentiated and included (e.g. circular signs like the pupil of the eye, the pellet of sand, the geometrical
circle, the tambourine, the ring, the hole in the ground).</li>
<li>Signs that exist with and without inner detailing will be included with the inner detailing
(albeit sometimes reduced to “essential” details). This inner detailing is considered a relevant part of the sign
and helps in avoiding confusion (e.g. the sun disk with and without an inner circle will always get the inner
circle: U+131F3 = N005 <img width="14px" height="14px" src="images/N5.svg" alt="sun disk">; the Sed festival chapel
U+133B3 = W004 <img width="14px" height="14px" src="images/W4.svg" alt="Sed festival chapel"> will always have the
rhomb/diamond/lozenge at the bottom). If a shape is only attested in a source that provides only an outline
of the hieroglyph, necessary detailing may be added as far as appropriate.</li>
<li>Differences only based on color (e.g. a yellow, red, green sun disk in some royal tombs from
the Valley of the Kings) will not be included.</li>
<li>Not every detail is relevant (e.g. armlets and anklets on a person are usually irrelevant,
whereas the type of clothing might be relevant [simple loincloth vs. loincloth with triangular protrusion;
short vs. long loincloth, dress of the vizier, Ramesside courtier dress]; the number of arrows the soldier
U+1300E = A012 <img width="14px" height="14px" src="images/A12.svg" alt="soldier"> is holding is irrelevant,
apart from the distinction between a single arrow and several arrows). Philologists need to judge in
which cases details might be relevant (e.g. a quiver in A012 might be relevant).</li>
<li>The proposed characters are not intended to reflect fine paleographic detail, e.g.
<img width="21px" height="14px" src="images/E34.svg" alt="desert hare"> vs.
<img width="15px" height="10px" src="images/E34D.svg" alt="desert hare, smaller"> (signs of desert hares that
differ only in size). In such cases, users should instead rely on images or facsimiles.</li>
<li>Signs are not eligible for separate inclusion in the proposed repertoire if they can be
constructed by a base sign and overlay and/or insertion of another sign(s), or mirroring and/or rotation of
a sign already in the repertoire. For description and examples of overlay, insertion etc., please see
<a href="https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-11/#G32499">Unicode Core Spec,
Egyptian Hieroglyphs, Format Controls</a>. This set includes common variants like the mirrored wickerwork basket
<img width="28px" height="14px" src="images/V31A.svg" alt="wickerwork basket handle to the front">
versus the regular <img width="28px" height="14px" src="images/V31.svg" alt="wickerwork basket handle to the front">, or
<img width="14px" height="14px" src="images/F51H.svg" alt="three pieces of flesh"> which is three stacked pieces of
flesh <img width="14px" height="14px" src="images/F51.svg" alt="flesh">, etc.
(Note that these two “legacy” variants were encoded before these principles were adopted and are not included in
the Core list.)</li>
<li>Also excluded from the full list are signs that can be constructed with common signs such as
<em>ḥw.t</em> (rectangular enclosure) <img width="14px" height="14px" src="images/O6.svg" alt="rectangular enclosure">,
<em>srḫ</em> (palace façade) <img width="21px" height="14px" src="images/O33C.svg" alt="palace facade">, etc.
However, exceptions can be made for signs which are widely used, such as the logogram for the Hathor divinity
U+13261 = O010 <img width="18px" height="14px" src="images/O10.svg" alt="Hathor divinity">.</li>
<li>Sequences are added as atomic characters when the functions of the component parts do not
correspond to the function of a sequence. For example,
<img width="14px" height="14px" src="images/M14.svg" alt="cobra over papyrus stem with a bud"> (U+131C6 phonemogram
<em>wꜣḏ</em>) is a functionally explainable overlap, as the components correspond to
<img width="14px" height="14px" src="images/I10.svg" alt="cobra"> (U+13193 cobra, phonemogram
<em>ḏ</em>) and <img width="14px" height="14px" src="images/M13.svg" alt="papyrus stem with a bud">
(U+131C5 papyrus stem with a bud, phonemogram <em>wꜣḏ</em>). This sign was encoded before these principles
were adopted and consequently is not included in the Core list. On the other hand,
<img width="14px" height="14px" src="images/D57.svg" alt="bent leg with a knife over"> U+130BF, a classifier for
“damage, injury” (<em>nkn</em>), is not a functionally explainable overlap, as the components correspond to
<img width="14px" height="14px" src="images/D56.svg" alt="bent leg"> classifier <em>rd</em>
(leg) and
<img width="16px" height="14px" src="images/T30.svg" alt="knife"> classifier <em>ds</em>
(cutting). So, the latter is included
as an atomic character. Note that Graeco-Roman standards were applied to determine attestations of the
components’ functions.</li>
<li>If a sequence can be graphically created, but there is evidence that the sign is considered a
single entity in any of the Ancient Egyptian scripts (outside ligatures in cursive scripts), such as
<img width="17px" height="14px" src="images/N35A.svg" alt="three ripples of water"> U+13217, a monogram for <em>mw</em>,
the sequence is added atomically.</li>
</ol>
<h3>Additions to the Core list from the characters currently encoded in Unicode since v5.2</h3>
<p>A sign from the Unicode v5.2 repertoire is added to the Core list if there is no conflict created
by adding the sign, even though it might not be verified. A conflict would arise, for example, if the sign could be created
as an overlay or an insertion. (An example of a conflict would be
<img width="17px" height="14px" src="images/F50.svg" alt="piece of cloth over intestine"> U+13138 which should have been encoded as
<img width="17px" height="14px" src="images/F46.svg" alt="intestine">
<img width="14px" height="14px" src="images/Uni13436.svg" alt="overlay">
<img width="16px" height="14px" src="images/S29.svg" alt="piece of cloth">
U+13132 U+13436 U+132F4 if following these principles.)
Consequently, that character is not in the Core list.</p>
<h3>Additions to the Core list from the Full list</h3>
<p>To be added to the Core list from the Full list, a sign needs to be verified with an image. Images
(photographs and trustworthy facsimiles) originate from carved, painted or written hieroglyphic texts, and cursive
hieroglyphic texts. Images from hieratic texts will only be considered if the hieroglyphic nature of the sign can
easily be reconstructed (cursive hieroglyphs).</p>
<p>If verified, the sign is included in both the Core and Full lists, regardless of priority or origin.
Exceptions are made for a sign that appears in verified sequences that require the addition of one of the parts of the
sequence into the Core list, even though there is not yet an attestation of the sign on its own, e.g.
U+137CB <img width="14px" height="14px" src="images/C337.svg" alt="falcon headed god with a star above"> logogram (god)
<em>nṯr</em>, which is needed to compose
<img width="14px" height="14px" src="images/O372.svg" alt="falcon headed god with a star above within an enclosure">; the latter can then be encoded as
<img width="14px" height="14px" src="images/O6.svg" alt="rectangular enclosure">
<img width="14px" height="14px" src="images/Uni13439.svg" alt="insertion">
<img width="14px" height="14px" src="images/C337.svg" alt="falcon headed god with a star above">: U+13257 U+13439 U+137CB.
(A sign within the sequence counts for the existence of the sign.)</p>
<p>Signs with a lot of variant forms need to be checked as a group (<em>šzp</em>,
<em>ẖr.t-nṯr</em> etc.) by philologists/specialists. Certain variants of such signs were excluded from the
Core list based on manual review.</p>
<p>For the sake of standardization (for example: seated god with sun-disk and seated god with sun-disk
with uraeus), an unverified sign might be required to be added to the Core list.</p>
<h3>Inclusion in the Full List</h3>
<p>To be automatically included in the Full list, the sign should have three or more attestations in
printed or manuscript sign lists, in published computer-generated books like Dendara X, Esna VII and Athribis, in
<a href="https://thotsignlist.org"><em>Thot Sign List</em></a>, <a href="http://ramses.ulg.ac.be"><em>Ramses</em></a>,
<a href="https://thesaurus-linguae-aegyptiae.de"><em>TLA</em></a>, <a href="https://aku-pal.uni-mainz.de"><em>Karnak text
database, AKU-PAL</em></a>, and it should have a <a href="https://zenodo.org/records/5849135"><em>JSesh</em></a> entry
and/or Hieroglyphica number. Verification with an actual image is not a requirement.</p>
<p>Manual additions to the Full list can be made based on common use by Egyptologists (see Leitz,
Kurth or H&amp;S sign lists), using a manual override.</p>
<p>Transcription (e.g. a hand-drawn sketch of a sign) alone is not considered to be evidence for
verification for the Core list, but if the other principles are covered, the sign might still be added to the Full list.</p>
<h3>Missing signs</h3>
<p>For many of the signs that may be considered missing,
no evidence has yet been found that they are listed or mentioned in existing sign lists (i.e., images are lacking or the quality of the images at our
disposal was not sufficient to decide). Users who find a meaningful variant that is not in the list and can provide
evidence of the sign (photo or facsimile of an ancient source with precise identification of the monument) are invited to make
a submission to Thot Sign List (thotsignlist@gmail.com), including the evidence and full bibliographic information so that
it can be reviewed. Eligible signs will be added to a future Unicode proposal, and the proposal will refer to the contributors.</p>
<h2 class="nonumber"><a name="References" href="#References">References</a></h2>
<p>[Dendara X to XV] Le Temple de Dendara, Les chapelles Osiriennes, Sylvie Cauville, Institut Français d’Archéologie Orientale, Tomes X to XV, Le Caire, 1997-2012
(composition hiéroglyphique Jochen HALLOF et Hans VAN DEN BERG). X: ISBN 2-7247-0199-2, XI: ISBN 2-7247-0279-4, XII: ISBN 978-2-7247-0460-0. Online versions available
at: <a href = "https://www.ifao.egnet.net/publications/catalogue/">https://www.ifao.egnet.net/publications/catalogue</a>.</p>
<p>[Esna I to VIII] Le Temple d’Esna, Serge Sauneron, Institut Français d’Archéologie Orientale. Le Caire, 1934-2009 (composition hiéroglyphique Jochen HALLOF
for vol VII). Online versions available at: <a href = "https://www.ifao.egnet.net/publications/catalogue/">https://www.ifao.egnet.net/publications/catalogue/</a>.</p>
<p>[Hieroglyphica] Hieroglyphica – Sign List, Nicholas Grimal, Jochen Hallof, Dirk van der Plas, 2nd edition, 2000.</p>
<p>For other references for this annex, see Unicode Standard Annex #41, “<a href="https://www.unicode.org/reports/tr41/tr41-36.html">Common References for Unicode Standard Annexes</a>.”</p>
<h2><a name="Acknowledgements" href="#Acknowledgements">Acknowledgements</a></h2>
<p>This new database is the result of a collective work by many
Egyptologists and is still a work in progress.</p>
<h2><a name="Modifications" href="#Modifications">Modifications</a></h2>
<p>The following summarizes modifications from the previous revision of this annex.</p>
<h3>Revision 5</h3>
<ul>
<li><strong>Reissued</strong> for Unicode 17.0.0.</li>
<li>Added a new appendix: Encoding Principles.</li>
<li>Updated description of the kEH_NoMirror property in section 3.</li>
<li>Added a new property: kEH_AltSeq.</li>
</ul>
<p>Revision 4 being a proposed update, only changes between revisions 5 and 3 are noted here.</p>
<p>Previous revisions can be accessed with the “Previous Version” link in the header.</p>
<hr width="50%">
<p class="copyright">© 2024–2025 Unicode, Inc. This publication is protected by copyright, and permission must be obtained from Unicode, Inc. prior to any reproduction, modification, or other use not permitted by the <a href="https://www.unicode.org/copyright.html">Terms of Use</a>. Specifically, you may make copies of this publication and may annotate and translate it solely for personal or internal business purposes and not for public distribution, provided that any such permitted copies and modifications fully reproduce all copyright and other legal notices contained in the original. You may not make copies of or modifications to this publication for public distribution, or incorporate it in whole or in part into any product or publication without the express written permission of Unicode.</p>
<p class="copyright">Use of all Unicode Products, including this publication, is governed by the Unicode <a href="https://www.unicode.org/copyright.html">Terms of Use</a>. The authors, contributors, and publishers have taken care in the preparation of this publication, but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom. This publication is provided “AS-IS” without charge as a convenience to users.</p>
<p class="copyright">Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries.</p>
</div>
</body>
</html>