
Dead Scripts in Unicode

Unicode encodes dozens of historic and extinct scripts — from Cuneiform and Egyptian Hieroglyphs to Linear B and Gothic — preserving them digitally for scholars and researchers. This guide surveys the historic scripts in Unicode, explains why they were included, and describes the academic and cultural communities that depend on them.

Published 2023-12-04 · Updated 2025-03-18

One of Unicode's most ambitious missions is the encoding of dead and ancient scripts: writing systems that are no longer in daily use but remain essential for historical research, archaeological documentation, and cultural preservation. From Egyptian Hieroglyphs to Sumerian Cuneiform, and from Mycenaean Linear B to the still-undeciphered Linear A of Minoan Crete, Unicode has systematically cataloged much of humanity's written heritage. (Some famous undeciphered scripts, such as the Rongorongo of Easter Island, have not been encoded.) This guide surveys the major dead scripts encoded in Unicode, the challenges of encoding ancient writing, and how scholars use these encodings today.

Why Encode Dead Scripts?

Before Unicode, scholars working with ancient texts had to use custom fonts, ad hoc encoding schemes, or image-based representations. This made digital scholarship fragmented: a cuneiform tablet transcription created at one university was unreadable on another's system.

Unicode encoding provides:

Benefit	Explanation
Interoperability	Texts are readable on any Unicode-compliant system
Searchability	Ancient texts can be indexed, searched, and cross-referenced
Preservation	Digital archives have a stable, standards-based encoding
Analysis	Computational linguistics tools can process encoded text
Accessibility	Scholars worldwide can share data without proprietary fonts

Major Dead Scripts in Unicode

Egyptian Hieroglyphs (U+13000–U+1342F, U+13430–U+1345F)

Property	Value
Block	Egyptian Hieroglyphs, Egyptian Hieroglyph Format Controls
Code points	1,071 hieroglyphs + 38 format controls
Added	Unicode 5.2 (2009), extended in later versions
Period	~3200 BCE – 4th century CE
Languages	Ancient Egyptian
Plane	SMP (Plane 1)

Egyptian hieroglyphs present unique encoding challenges. The script is logographic and phonographic at the same time: a sign can represent sounds (phonogram), a meaning (ideogram), or a semantic class that disambiguates the preceding word (determinative). Additionally, hieroglyphs can be arranged in horizontal rows or vertical columns, reading either left-to-right or right-to-left. The reading direction is shown by the signs themselves: animal and human figures face the start of the line, so the text is read toward their faces.

Unicode encodes the individual hieroglyphic signs and provides format control characters (U+13430–U+1345F) for specifying quadrat arrangements — the grouping of signs into square blocks for aesthetic layout.

import unicodedata

# Egyptian hieroglyph example
hiero = chr(0x13000)  # EGYPTIAN HIEROGLYPH A001
print(f"{hiero} U+{ord(hiero):04X} {unicodedata.name(hiero)}")
# Output: 𓀀 U+13000 EGYPTIAN HIEROGLYPH A001
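To make the role of the format controls more concrete, the following minimal sketch builds a three-character sequence (sign, vertical joiner, sign) and prints each character's general category and name. The choice of signs is arbitrary and purely illustrative, and the `describe` helper is a name introduced here. It assumes a Python build whose Unicode database includes the format controls (Unicode 12.0 or later).

```python
import unicodedata

# Two signs joined vertically into one quadrat: sign, joiner, sign.
# The signs are chosen arbitrarily for illustration only.
VERTICAL_JOINER = chr(0x13430)
quadrat = chr(0x13000) + VERTICAL_JOINER + chr(0x13001)

def describe(text):
    """Return (code point, general category, name) for each character."""
    rows = []
    for ch in text:
        name = unicodedata.name(ch, "<not in this Unicode database>")
        rows.append((f"U+{ord(ch):04X}", unicodedata.category(ch), name))
    return rows

for row in describe(quadrat):
    print(*row)
```

The joiner is a format character (category `Cf`), not a sign: it carries no glyph of its own and only tells a capable rendering engine how to stack its neighbours. A font or renderer without quadrat support will simply show the two signs side by side.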

Cuneiform (U+12000–U+123FF, U+12400–U+1247F, U+12480–U+1254F)

Property	Value
Blocks	Cuneiform, Cuneiform Numbers/Punctuation, Early Dynastic Cuneiform
Code points	1,234+
Added	Unicode 5.0 (2006)
Period	~3400 BCE – 1st century CE
Languages	Sumerian, Akkadian, Hittite, Elamite (Old Persian cuneiform is a separate script with its own block)
Plane	SMP (Plane 1)

Cuneiform is generally regarded as the oldest known writing system, originating in ancient Mesopotamia (modern-day Iraq). The name means "wedge-shaped," referring to the impressions made by a reed stylus on wet clay tablets. Cuneiform was used by multiple civilizations speaking unrelated languages, much as the Latin alphabet serves many languages today.

The Unicode encoding follows the sign lists established by Assyriologists, particularly the Borger and Labat numbering systems. Each sign is encoded as a single character, though many signs have multiple readings depending on context and language.

import unicodedata

# Cuneiform signs
an_sign = chr(0x1202D)  # CUNEIFORM SIGN AN (sky/god)
print(f"{an_sign} U+{ord(an_sign):04X}")

# Iterate through some cuneiform signs
for cp in range(0x12000, 0x12010):
    ch = chr(cp)
    name = unicodedata.name(ch, "UNKNOWN")
    print(f"{ch} U+{cp:04X} {name}")
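The code-point total in the table can be checked against whatever Unicode database ships with your Python. The sketch below counts named (assigned) code points in each of the three block ranges given in this section's heading; `count_assigned` is a helper name introduced here, and the numbers it prints depend on the Unicode version of the interpreter.

```python
import unicodedata

# Block ranges as listed in the section heading (inclusive).
CUNEIFORM_BLOCKS = {
    "Cuneiform": (0x12000, 0x123FF),
    "Cuneiform Numbers and Punctuation": (0x12400, 0x1247F),
    "Early Dynastic Cuneiform": (0x12480, 0x1254F),
}

def count_assigned(start, end):
    """Count code points in [start, end] that have a character name."""
    return sum(1 for cp in range(start, end + 1) if unicodedata.name(chr(cp), ""))

counts = {name: count_assigned(*rng) for name, rng in CUNEIFORM_BLOCKS.items()}
for name, n in counts.items():
    print(f"{name}: {n}")
print("total:", sum(counts.values()))
print("Unicode database version:", unicodedata.unidata_version)
```

Unassigned positions inside a block have no name, which is why the count is lower than the size of the range.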

Linear B (U+10000–U+1007F, U+10080–U+100FF)

Property	Value
Blocks	Linear B Syllabary, Linear B Ideograms
Code points	211
Added	Unicode 4.0 (2003)
Period	~1450–1200 BCE
Languages	Mycenaean Greek
Plane	SMP (Plane 1)

Linear B was the writing system of the Mycenaean civilization, and the language it records is the earliest attested form of Greek. It was deciphered in 1952 by Michael Ventris, who showed that the tablets were written in Greek centuries before the Greek alphabet existed. Linear B is a syllabary with about 87 syllabic signs plus over 100 ideograms (logograms for commodities like wheat, oil, and livestock).

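Linear B holds a special place in the Unicode code space: its code points start at U+10000, the very first position in the Supplementary Multilingual Plane (Plane 1). It was not the first script added to that plane, however. Gothic, Old Italic, and Deseret were encoded there earlier, in Unicode 3.1 (2001).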
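The plane of any character follows directly from its code point: each plane spans 0x10000 code points, so shifting the code point right by 16 bits gives the plane number. A minimal sketch (the `plane_of` helper is a name introduced here):

```python
def plane_of(ch):
    """Return the Unicode plane number (0-16) of a single character."""
    return ord(ch) >> 16

samples = {
    "Linear B (U+10000)": chr(0x10000),
    "Ogham (U+1680)": chr(0x1680),
    "Egyptian Hieroglyphs (U+13000)": chr(0x13000),
}
for label, ch in samples.items():
    print(f"{label}: plane {plane_of(ch)}")
```

Ogham falls in Plane 0 (the Basic Multilingual Plane), while Linear B and Egyptian Hieroglyphs fall in Plane 1.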

Linear A (U+10600–U+1077F)

Property	Value
Block	Linear A
Code points	341
Added	Unicode 7.0 (2014)
Period	~1800–1450 BCE
Languages	Unknown (Minoan)

Linear A, the predecessor of Linear B, remains undeciphered. It was used by the Minoan civilization on Crete. Unicode encodes it to allow scholars to create searchable digital transcriptions of the tablets and compare sign forms across corpora — even without knowing what the signs mean.
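Searching by form needs nothing more than ordinary string operations once the signs are encoded. The sketch below uses a hypothetical mini-corpus: the tablet IDs and sign sequences are invented for illustration and do not reproduce any real inscription. `find_sequence` is likewise a name introduced here.

```python
# Hypothetical mini-corpus: tablet IDs and sign sequences are invented
# for illustration and do not reproduce any real inscription.
corpus = {
    "tablet-1": [0x10600, 0x10601, 0x10602],
    "tablet-2": [0x10603, 0x10600, 0x10601],
    "tablet-3": [0x10602, 0x10603],
}
# Turn each list of code points into an ordinary Python string.
texts = {tid: "".join(map(chr, cps)) for tid, cps in corpus.items()}

def find_sequence(texts, signs):
    """Return the IDs of texts that contain the given sign sequence."""
    return sorted(tid for tid, text in texts.items() if signs in text)

query = chr(0x10600) + chr(0x10601)
print(find_sequence(texts, query))  # ['tablet-1', 'tablet-2']
```

The same substring search would work on a real transcription file, since nothing in it depends on knowing how the signs were pronounced or what they meant.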

Old Persian Cuneiform (U+103A0–U+103DF)

Property	Value
Block	Old Persian
Code points	50
Added	Unicode 4.1 (2005)
Period	~525–330 BCE
Languages	Old Persian

Traditionally said to have been created by order of King Darius I for royal inscriptions (most famously at Behistun), Old Persian cuneiform is a semi-alphabetic script distinct from Sumero-Akkadian cuneiform. The block's 50 characters comprise 36 phonetic signs, 8 ideograms, 5 number signs, and a word divider.
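The make-up of the block can be inspected through the general categories Unicode assigns to its characters. The following sketch tallies the categories of all named code points in the Old Persian range; `category_counts` is a helper name introduced here, and the tally reflects the Unicode database bundled with your Python.

```python
import unicodedata
from collections import Counter

def category_counts(start, end):
    """Tally general categories of the named code points in [start, end]."""
    counts = Counter()
    for cp in range(start, end + 1):
        ch = chr(cp)
        if unicodedata.name(ch, ""):  # skip unassigned positions
            counts[unicodedata.category(ch)] += 1
    return counts

old_persian = category_counts(0x103A0, 0x103DF)
for category, n in sorted(old_persian.items()):
    print(category, n)
print("total:", sum(old_persian.values()))
```

Phonetic signs and ideograms share the letter category `Lo`, the number signs are letter-like numbers (`Nl`), and the word divider is punctuation (`Po`).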

Other Notable Dead Scripts

Script	Unicode Block	Period	Code Points	Notes
Phoenician	U+10900–U+1091F	~1050–150 BCE	29	Ancestor of Greek, Latin, Arabic, Hebrew
Coptic	U+2C80–U+2CFF	~2nd century CE – present (liturgical)	123	Last stage of the Egyptian language
Gothic	U+10330–U+1034F	4th century CE	27	Bible translation by Bishop Wulfila
Ogham	U+1680–U+169F	4th–10th century CE	29	Irish/Pictish stone inscriptions
Runic	U+16A0–U+16FF	2nd–15th century CE	89	Germanic inscriptions (Elder/Younger Futhark, Anglo-Saxon)
Cypriot	U+10800–U+1083F	11th–2nd century BCE	55	Syllabary for Cypriot Greek
Ugaritic	U+10380–U+1039F	14th–12th century BCE	31	Cuneiform alphabet from ancient Syria
Lydian	U+10920–U+1093F	7th–1st century BCE	27	Anatolian kingdom of Lydia
Carian	U+102A0–U+102DF	7th–1st century BCE	49	Southwest Anatolia
Lycian	U+10280–U+1029F	5th–4th century BCE	29	Southern Anatolia
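Block ranges like these are enough to label the characters of a mixed text by script. The sketch below copies the ranges from the table into a list and does a simple linear lookup; `script_of` is a name introduced here, and a production tool would more likely rely on the Unicode Script property than on hand-copied block ranges.

```python
# Ranges copied from the table above (inclusive).
BLOCKS = [
    (0x1680, 0x169F, "Ogham"),
    (0x16A0, 0x16FF, "Runic"),
    (0x2C80, 0x2CFF, "Coptic"),
    (0x10280, 0x1029F, "Lycian"),
    (0x102A0, 0x102DF, "Carian"),
    (0x10330, 0x1034F, "Gothic"),
    (0x10380, 0x1039F, "Ugaritic"),
    (0x10800, 0x1083F, "Cypriot"),
    (0x10900, 0x1091F, "Phoenician"),
    (0x10920, 0x1093F, "Lydian"),
]

def script_of(ch):
    """Return the table's script name for a character, or None if not listed."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None

text = "\u16A0\u16A2\u16A6 \U00010330"  # three runes, a space, one Gothic letter
print([script_of(ch) for ch in text])
# ['Runic', 'Runic', 'Runic', None, 'Gothic']
```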

Challenges of Encoding Dead Scripts

Sign Identification

Many ancient signs have uncertain readings. Cuneiform signs evolved over 3,000 years, and the same concept might be written differently in Sumerian, Akkadian, and Hittite texts. Unicode generally encodes signs by form rather than meaning, but establishing which variant forms should be unified as a single code point requires expert consultation.

Undeciphered Scripts

Unicode encodes some undeciphered scripts, such as Linear A and the signs of the Phaistos Disc, and proposals exist for others, such as Proto-Elamite. This raises a philosophical question: how do you assign character properties (letter, number, punctuation) to signs whose function is unknown? In practice Unicode falls back on generic categories. Linear A signs, for example, are assigned the General Category Lo (Letter, other).
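The assigned properties can be read straight from the Unicode database. This minimal sketch looks up the general category of one Linear A sign, one sign from the Phaistos Disc block (U+101D0–U+101FF, another undeciphered repertoire encoded in Unicode), and one deciphered Linear B syllable for comparison. `categories_of` is a helper name introduced here.

```python
import unicodedata

def categories_of(code_points):
    """Map each code point to its Unicode general category."""
    return {cp: unicodedata.category(chr(cp)) for cp in code_points}

# One Linear A sign, one Phaistos Disc sign, one Linear B syllable.
cats = categories_of([0x10600, 0x101D0, 0x10000])
for cp, category in cats.items():
    name = unicodedata.name(chr(cp), "<unnamed>")
    print(f"U+{cp:04X} {category} {name}")
```

The undeciphered Linear A sign gets the same `Lo` category as the deciphered Linear B syllable, while the Phaistos Disc sign is classed as a symbol (`So`), so the category alone says little about how well a script is understood.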

Complex Layout

Egyptian hieroglyphs can be arranged in quadrats (2D groupings). Cuneiform was originally pictographic with complex layouts before becoming wedge-shaped. Unicode focuses on encoding the logical sequence of signs, leaving visual arrangement to rendering engines and format control characters.

SMP Rendering

Most dead scripts are encoded in the Supplementary Multilingual Plane (Plane 1, U+10000–U+1FFFF), where every character requires a surrogate pair (two 16-bit code units) in UTF-16. Some legacy systems and fonts do not fully support SMP characters, leading to rendering failures.

// Checking SMP support
const linearB = "\u{10000}";  // LINEAR B SYLLABLE B008 A
console.log(linearB.length);  // 2 (UTF-16 surrogate pair)
console.log([...linearB].length);  // 1 (code point count)
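The surrogate pair itself comes from simple arithmetic: subtract 0x10000 from the code point, then put the top 10 bits of the result into a high surrogate (starting at 0xD800) and the bottom 10 bits into a low surrogate (starting at 0xDC00). A minimal Python sketch, with `to_surrogates` as a helper name introduced here:

```python
def to_surrogates(cp):
    """Split a supplementary code point into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low

an = 0x1202D  # CUNEIFORM SIGN AN
high, low = to_surrogates(an)
print(f"U+{an:04X} -> {high:04X} {low:04X}")  # U+1202D -> D808 DC2D

# Cross-check against Python's own UTF-16 encoder.
expected = high.to_bytes(2, "big") + low.to_bytes(2, "big")
assert chr(an).encode("utf-16-be") == expected
```

Code that indexes or truncates UTF-16 strings by code unit can cut between the two halves, which is one common source of broken SMP text in older software.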

Working with Dead Scripts

Font Resources

Font	Scripts Covered
Noto Sans Egyptian Hieroglyphs	Egyptian Hieroglyphs
Noto Sans Cuneiform	Cuneiform
Noto Sans Linear B	Linear B
Noto Sans Ogham	Ogham
Noto Sans Runic	Runic
Noto Sans Coptic	Coptic
Google Noto family	Most encoded scripts

Scholarly Tools

Several digital humanities projects build on Unicode dead script encoding:

CDLI (Cuneiform Digital Library Initiative) — 350,000+ cuneiform texts
ORACC (Open Richly Annotated Cuneiform Corpus) — linguistic annotations
JSesh — hieroglyphic text editor using Unicode code points
DAMOS — Database of Mycenaean at Oslo, using Linear B encoding

Key Takeaways

Unicode encodes dozens of dead and ancient scripts, enabling digital scholarship, searchable archives, and computational analysis of ancient texts.
Egyptian Hieroglyphs (1,071+ signs) and Cuneiform (1,234+ signs) are the largest dead script encodings surveyed here, followed by Linear A (341 signs) and Linear B (211 signs); all are in the Supplementary Multilingual Plane.
Even undeciphered scripts like Linear A are encoded to support scholarly transcription and corpus analysis, with default character properties assigned.
Dead script encoding faces unique challenges: sign identification across millennia of variation, complex layout systems (hieroglyphic quadrats), and SMP rendering support in legacy systems.
The Noto font family provides comprehensive coverage for most encoded dead scripts, making them displayable on modern devices.
Digital corpora like CDLI and ORACC depend on Unicode encoding to make hundreds of thousands of ancient texts searchable and interoperable.