Dead Scripts in Unicode
Unicode encodes dozens of historic and extinct scripts — from Cuneiform and Egyptian Hieroglyphs to Linear B and Gothic — preserving them digitally for scholars and researchers. This guide surveys the historic scripts in Unicode, explains why they were included, and describes the academic and cultural communities that depend on them.
One of Unicode's most ambitious missions is the encoding of dead and ancient scripts: writing systems that are no longer in daily use but remain essential for historical research, archaeological documentation, and cultural preservation. From Egyptian Hieroglyphs to Sumerian Cuneiform, and from Minoan Linear A to Mycenaean Linear B, Unicode has systematically cataloged much of humanity's written heritage, although some undeciphered scripts, such as the Rongorongo of Easter Island, are not yet encoded. This guide surveys the major dead scripts encoded in Unicode, the challenges of encoding ancient writing, and how scholars use these encodings today.
Why Encode Dead Scripts?
Before Unicode, scholars working with ancient texts had to use custom fonts, ad hoc encoding schemes, or image-based representations. This made digital scholarship fragmented: a cuneiform tablet transcription created at one university was unreadable on another's system.
Unicode encoding provides:
| Benefit | Explanation |
|---|---|
| Interoperability | Texts are readable on any Unicode-compliant system |
| Searchability | Ancient texts can be indexed, searched, and cross-referenced |
| Preservation | Digital archives have a stable, standards-based encoding |
| Analysis | Computational linguistics tools can process encoded text |
| Accessibility | Scholars worldwide can share data without proprietary fonts |
Major Dead Scripts in Unicode
Egyptian Hieroglyphs (U+13000–U+1342F, U+13430–U+1345F)
| Property | Value |
|---|---|
| Block | Egyptian Hieroglyphs, Egyptian Hieroglyph Format Controls |
| Code points | 1,071 hieroglyphs + 38 format controls |
| Added | Unicode 5.2 (2009), extended in later versions |
| Period | ~3200 BCE – 4th century CE |
| Languages | Ancient Egyptian |
| Plane | SMP (Plane 1) |
Egyptian hieroglyphs present unique encoding challenges. The script is logographic and phonographic at the same time: signs can represent sounds (phonograms), meanings (ideograms), or provide classification (determinatives). Additionally, hieroglyphs can be arranged in horizontal rows or vertical columns, reading either left-to-right or right-to-left. The reading direction is indicated by the animal and human signs, which face toward the start of the line.
Unicode encodes the individual hieroglyphic signs and provides format control characters (U+13430–U+1345F) for specifying quadrat arrangements — the grouping of signs into square blocks for aesthetic layout.
```python
import unicodedata

# Egyptian hieroglyph example
hiero = chr(0x13000)  # EGYPTIAN HIEROGLYPH A001
print(f"{hiero} U+{ord(hiero):04X} {unicodedata.name(hiero)}")
# Output: 𓀀 U+13000 EGYPTIAN HIEROGLYPH A001
```
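To make the role of the format controls more concrete, here is a minimal sketch of how a vertically stacked group might be stored as a logical sequence. It assumes U+13430 is EGYPTIAN HIEROGLYPH VERTICAL JOINER (added in Unicode 12.0). The pairing of signs A001 and A002 is purely illustrative and does not spell a real word, and the helper name `describe` is hypothetical.

```python
import unicodedata

VERTICAL_JOINER = "\U00013430"  # assumed: EGYPTIAN HIEROGLYPH VERTICAL JOINER

# Illustrative only: stack sign A001 above sign A002 in one quadrat.
quadrat = "\U00013000" + VERTICAL_JOINER + "\U00013001"

def describe(text):
    """Return (code point label, name) pairs for each code point in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<not in this Python's Unicode data>"))
        for ch in text
    ]

for label, name in describe(quadrat):
    print(label, name)

# The stored text is a flat sequence of three code points; the stacking
# itself is left to a font and shaping engine that understand the joiner.
print(len(quadrat))
```

Whether the group actually renders as a stacked quadrat depends on the font and text-shaping support available; without it, the two signs typically appear side by side.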
Cuneiform (U+12000–U+123FF, U+12400–U+1247F, U+12480–U+1254F)
| Property | Value |
|---|---|
| Blocks | Cuneiform, Cuneiform Numbers/Punctuation, Early Dynastic Cuneiform |
| Code points | 1,234+ |
| Added | Unicode 5.0 (2006); Early Dynastic Cuneiform in Unicode 8.0 (2015) |
| Period | ~3400 BCE – 1st century CE |
| Languages | Sumerian, Akkadian, Hittite, Elamite, and others (Old Persian uses a separate script with its own block) |
| Plane | SMP (Plane 1) |
Cuneiform is generally regarded as the oldest known writing system, originating in ancient Mesopotamia (modern-day Iraq). The name means "wedge-shaped," referring to the impressions made by a reed stylus on wet clay tablets. Cuneiform was used by multiple civilizations speaking unrelated languages, much as the Latin alphabet serves many languages today.
The Unicode encoding follows the sign lists established by Assyriologists, particularly the Borger and Labat numbering systems. Each sign is encoded as a single character, though many signs have multiple readings depending on context and language.
```python
import unicodedata

# A single cuneiform sign
an_sign = chr(0x1202D)  # CUNEIFORM SIGN AN (sky/god)
print(f"{an_sign} U+{ord(an_sign):04X}")

# Iterate through the first 16 cuneiform signs
for cp in range(0x12000, 0x12010):
    ch = chr(cp)
    name = unicodedata.name(ch, "UNKNOWN")
    print(f"{ch} U+{cp:04X} {name}")
```
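The "Code points" figure in the table can be checked against whatever Unicode data a given Python build ships. The sketch below counts assigned code points in the three cuneiform blocks by treating General Category `Cn` as unassigned. The `CUNEIFORM_BLOCKS` table and the `count_assigned` helper are illustrative names, and the totals printed depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Inclusive block ranges as listed in the section heading above.
CUNEIFORM_BLOCKS = {
    "Cuneiform": (0x12000, 0x123FF),
    "Cuneiform Numbers and Punctuation": (0x12400, 0x1247F),
    "Early Dynastic Cuneiform": (0x12480, 0x1254F),
}

def count_assigned(start, end):
    """Count code points in [start, end] that are not unassigned (Cn)."""
    return sum(
        1 for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    )

counts = {name: count_assigned(*rng) for name, rng in CUNEIFORM_BLOCKS.items()}
for name, n in counts.items():
    print(f"{name}: {n}")
print("Total:", sum(counts.values()))
```

Because blocks reserve more space than they currently use, the total will be smaller than the combined size of the three ranges.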
Linear B (U+10000–U+1007F, U+10080–U+100FF)
| Property | Value |
|---|---|
| Blocks | Linear B Syllabary, Linear B Ideograms |
| Code points | 211 |
| Added | Unicode 4.0 (2003) |
| Period | ~1450–1200 BCE |
| Languages | Mycenaean Greek |
| Plane | SMP (Plane 1) |
Linear B was the writing system of the Mycenaean civilization and records the earliest attested form of Greek. It was deciphered in 1952 by Michael Ventris, who showed that its language was Greek, centuries older than the earliest texts in the Greek alphabet. Linear B is a syllabary with about 87 syllabic signs plus over 100 ideograms (logograms for commodities like wheat, oil, and livestock).
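Linear B also holds a distinctive position in the code space: its syllabary begins at U+10000, the very first code point of the Supplementary Multilingual Plane (Plane 1). It was not the first script added to that plane, however; Gothic, Old Italic, and Deseret were encoded there earlier, in Unicode 3.1 (2001).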
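The plane a code point belongs to follows directly from its value: each plane spans 0x10000 code points, so the plane number is the code point shifted right by 16 bits. A small sketch, with `plane_of` as a hypothetical helper name:

```python
def plane_of(code_point):
    """Return the Unicode plane number (0-16) for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return code_point >> 16

print(plane_of(0x10000))  # 1: first Linear B syllable, start of the SMP
print(plane_of(0xFFFF))   # 0: last code point of the BMP
print(plane_of(0x16A0))   # 0: Runic sits in the BMP
```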
Linear A (U+10600–U+1077F)
| Property | Value |
|---|---|
| Block | Linear A |
| Code points | 341 |
| Added | Unicode 7.0 (2014) |
| Period | ~1800–1450 BCE |
| Languages | Unknown (Minoan) |
Linear A, the predecessor of Linear B, remains undeciphered. It was used by the Minoan civilization on Crete. Unicode encodes it to allow scholars to create searchable digital transcriptions of the tablets and compare sign forms across corpora — even without knowing what the signs mean.
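As a rough illustration of searching without decipherment, the sketch below treats a transcription as an ordinary string and counts how often a sign pair recurs. The sign sequences are invented placeholders drawn from the start of the Linear A block, not readings from any real tablet, and `count_sequence` is a hypothetical helper.

```python
def count_sequence(transcription, pattern):
    """Count non-overlapping occurrences of a sign sequence."""
    return transcription.count(pattern)

# Placeholder signs: the first three code points of the Linear A block.
s1, s2, s3 = chr(0x10600), chr(0x10601), chr(0x10602)

# Invented "tablets" for demonstration only.
tablets = {
    "tablet_a": s1 + s2 + s3 + s1 + s2,
    "tablet_b": s3 + s3 + s1,
}

pattern = s1 + s2
for tablet_id, text in tablets.items():
    print(tablet_id, count_sequence(text, pattern))
```

The point is only that encoded signs behave like any other text: standard string operations work on them regardless of whether anyone can read the result.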
Old Persian Cuneiform (U+103A0–U+103DF)
| Property | Value |
|---|---|
| Block | Old Persian |
| Code points | 50 |
| Added | Unicode 4.1 (2005) |
| Period | ~525–330 BCE |
| Languages | Old Persian |
Created by order of King Darius I for royal inscriptions (most famously at Behistun), Old Persian cuneiform is a semi-alphabetic script distinct from Sumero-Akkadian cuneiform. It has 36 phonetic signs, 8 ideograms, and several number signs.
Other Notable Dead Scripts
| Script | Unicode Block | Period | Code Points | Notes |
|---|---|---|---|---|
| Phoenician | U+10900–U+1091F | ~1050–150 BCE | 29 | Ancestor of Greek, Latin, Arabic, Hebrew |
| Coptic | U+2C80–U+2CFF | ~2nd century CE – present (liturgical) | 123 | Last stage of Egyptian language |
| Gothic | U+10330–U+1034F | 4th century CE | 27 | Bible translation by Bishop Wulfila |
| Ogham | U+1680–U+169F | 4th–10th century CE | 29 | Irish/Pictish stone inscriptions |
| Runic | U+16A0–U+16FF | 2nd–15th century CE | 89 | Germanic inscriptions (Elder/Younger Futhark, Anglo-Saxon) |
| Cypriot | U+10800–U+1083F | 11th–2nd century BCE | 55 | Syllabary for Cypriot Greek |
| Ugaritic | U+10380–U+1039F | 14th–12th century BCE | 31 | Cuneiform alphabet from ancient Syria |
| Lydian | U+10920–U+1093F | 7th–1st century BCE | 27 | Anatolian kingdom of Lydia |
| Carian | U+102A0–U+102DF | 7th–1st century BCE | 49 | Southwest Anatolia |
| Lycian | U+10280–U+1029F | 5th–4th century BCE | 29 | Southern Anatolia |
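The block ranges in the table can be used to classify characters found in a text. The following is a minimal sketch that copies the ranges from the table into a list and looks a character up by code point; `SCRIPT_RANGES` and `script_block` are illustrative names. It identifies the block a code point falls in, which is only an approximation of the Unicode Script property.

```python
# Inclusive ranges copied from the table above.
SCRIPT_RANGES = [
    (0x1680, 0x169F, "Ogham"),
    (0x16A0, 0x16FF, "Runic"),
    (0x2C80, 0x2CFF, "Coptic"),
    (0x10280, 0x1029F, "Lycian"),
    (0x102A0, 0x102DF, "Carian"),
    (0x10330, 0x1034F, "Gothic"),
    (0x10380, 0x1039F, "Ugaritic"),
    (0x10800, 0x1083F, "Cypriot"),
    (0x10900, 0x1091F, "Phoenician"),
    (0x10920, 0x1093F, "Lydian"),
]

def script_block(ch):
    """Return the table's script name for a character, or None."""
    cp = ord(ch)
    for start, end, name in SCRIPT_RANGES:
        if start <= cp <= end:
            return name
    return None

print(script_block("\u16A0"))      # Runic
print(script_block("\U00010900"))  # Phoenician
print(script_block("A"))           # None
```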
Challenges of Encoding Dead Scripts
Sign Identification
Many ancient signs have uncertain readings. Cuneiform signs evolved over 3,000 years, and the same concept might be written differently in Sumerian, Akkadian, and Hittite texts. Unicode generally encodes signs by form rather than meaning, but establishing which variant forms should be unified as a single code point requires expert consultation.
Undeciphered Scripts
Unicode encodes undeciphered scripts such as Linear A, while others, such as Proto-Elamite, exist only as proposals. This raises a philosophical question: how do you assign character properties (letter, number, punctuation) to signs whose function is unknown? Unicode uses the General Category Lo (Letter, other) as a default for most undeciphered signs.
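The assigned category can be inspected with Python's `unicodedata` module. This sketch assumes a Python build whose Unicode data includes Linear A (Unicode 7.0 or later), in which case both lines should print `Lo`; `category_of` is a hypothetical helper name.

```python
import unicodedata

def category_of(code_point):
    """Return the two-letter General Category for a code point."""
    return unicodedata.category(chr(code_point))

print(category_of(0x10600))  # first code point of the Linear A block
print(category_of(0x10000))  # LINEAR B SYLLABLE B008 A, for comparison
```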
Complex Layout
Egyptian hieroglyphs can be arranged in quadrats (2D groupings). Cuneiform was originally pictographic with complex layouts before becoming wedge-shaped. Unicode focuses on encoding the logical sequence of signs, leaving visual arrangement to rendering engines and format control characters.
SMP Rendering
Most dead scripts are encoded in the Supplementary Multilingual Plane (Plane 1, U+10000–U+1FFFF), so each character requires a surrogate pair in UTF-16 and four bytes in UTF-8. Some legacy systems and fonts do not fully support SMP characters, leading to rendering failures.
```javascript
// Checking SMP support
const linearB = "\u{10000}"; // LINEAR B SYLLABLE B008 A
console.log(linearB.length); // 2 (UTF-16 code units: one surrogate pair)
console.log([...linearB].length); // 1 (code point count)
```
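The surrogate pair itself comes from simple arithmetic: subtract 0x10000 from the code point, then split the remaining 20 bits into two 10-bit halves added to 0xD800 and 0xDC00. A short Python sketch of that calculation, with `to_surrogates` as a hypothetical helper name:

```python
def to_surrogates(code_point):
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need surrogates")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

linear_b = "\U00010000"  # LINEAR B SYLLABLE B008 A
high, low = to_surrogates(ord(linear_b))
print(f"{high:04X} {low:04X}")                  # D800 DC00
print(len(linear_b))                            # 1 (Python counts code points)
print(len(linear_b.encode("utf-16-be")) // 2)   # 2 UTF-16 code units
print(len(linear_b.encode("utf-8")))            # 4 bytes in UTF-8
```

Unlike JavaScript, Python strings are indexed by code point, so `len` reports 1 here; the UTF-16 length only appears after explicit encoding.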
Working with Dead Scripts
Font Resources
| Font | Scripts Covered |
|---|---|
| Noto Sans Egyptian Hieroglyphs | Egyptian Hieroglyphs |
| Noto Sans Cuneiform | Cuneiform |
| Noto Sans Linear B | Linear B |
| Noto Sans Ogham | Ogham |
| Noto Sans Runic | Runic |
| Noto Sans Coptic | Coptic |
| Google Noto family | Most encoded scripts |
Scholarly Tools
Several digital humanities projects build on Unicode dead script encoding:
- CDLI (Cuneiform Digital Library Initiative) — 350,000+ cuneiform texts
- ORACC (Open Richly Annotated Cuneiform Corpus) — linguistic annotations
- JSesh — hieroglyphic text editor using Unicode code points
- DAMOS — Database of Mycenaean at Oslo, using Linear B encoding
Key Takeaways
- Unicode encodes dozens of dead and ancient scripts, enabling digital scholarship, searchable archives, and computational analysis of ancient texts.
- Egyptian Hieroglyphs (1,071+ signs) and Cuneiform (1,234+ signs) are the largest dead script encodings covered here, followed by Linear A (341 signs) and Linear B (211 signs); all four are in the Supplementary Multilingual Plane.
- Even undeciphered scripts like Linear A are encoded to support scholarly transcription and corpus analysis, with default character properties assigned.
- Dead script encoding faces unique challenges: sign identification across millennia of variation, complex layout systems (hieroglyphic quadrats), and SMP rendering support in legacy systems.
- The Noto font family provides comprehensive coverage for most encoded dead scripts, making them displayable on modern devices.
- Digital corpora like CDLI and ORACC depend on Unicode encoding to make hundreds of thousands of ancient texts searchable and interoperable.