📜 Script Stories

Dead Scripts in Unicode

Unicode encodes dozens of historic and extinct scripts — from Cuneiform and Egyptian Hieroglyphs to Linear B and Gothic — preserving them digitally for scholars and researchers. This guide surveys the historic scripts in Unicode, explains why they were included, and describes the academic and cultural communities that depend on them.

·

One of Unicode's most ambitious missions is the encoding of dead and ancient scripts — writing systems that are no longer in daily use but remain essential for historical research, archaeological documentation, and cultural preservation. From Egyptian Hieroglyphs to Sumerian Cuneiform, from Minoan Linear B to the mysterious Rongorongo of Easter Island, Unicode has systematically cataloged humanity's written heritage. This guide surveys the major dead scripts encoded in Unicode, the challenges of encoding ancient writing, and how scholars use these encodings today.

Why Encode Dead Scripts?

Before Unicode, scholars working with ancient texts had to use custom fonts, ad hoc encoding schemes, or image-based representations. This made digital scholarship fragmented: a cuneiform tablet transcription created at one university was unreadable on another's system.

Unicode encoding provides:

Benefit Explanation
Interoperability Texts are readable on any Unicode-compliant system
Searchability Ancient texts can be indexed, searched, and cross-referenced
Preservation Digital archives have a stable, standards-based encoding
Analysis Computational linguistics tools can process encoded text
Accessibility Scholars worldwide can share data without proprietary fonts

Major Dead Scripts in Unicode

Egyptian Hieroglyphs (U+13000–U+1342F, U+13430–U+1345F)

Property Value
Block Egyptian Hieroglyphs, Egyptian Hieroglyph Format Controls
Code points 1,071 hieroglyphs + 38 format controls
Added Unicode 5.2 (2009), extended in later versions
Period ~3200 BCE – 4th century CE
Languages Ancient Egyptian
Plane SMP (Plane 1)

Egyptian hieroglyphs present unique encoding challenges. The script is logographic and phonographic simultaneously — signs can represent sounds (phonograms), meanings (ideograms), or provide classification (determinatives). Additionally, hieroglyphs can be arranged in horizontal rows or vertical columns, reading either left-to-right or right-to-left (the direction is indicated by which way the animal/human signs face).

Unicode encodes the individual hieroglyphic signs and provides format control characters (U+13430–U+1345F) for specifying quadrat arrangements — the grouping of signs into square blocks for aesthetic layout.

import unicodedata

# Egyptian hieroglyph example
hiero = chr(0x13000)  # EGYPTIAN HIEROGLYPH A001
print(f"{hiero} U+{ord(hiero):04X} {unicodedata.name(hiero)}")
# Output: U+13000 EGYPTIAN HIEROGLYPH A001

Cuneiform (U+12000–U+123FF, U+12400–U+1247F, U+12480–U+1254F)

Property Value
Blocks Cuneiform, Cuneiform Numbers/Punctuation, Early Dynastic Cuneiform
Code points 1,234+
Added Unicode 5.0 (2006)
Period ~3400 BCE – 1st century CE
Languages Sumerian, Akkadian, Hittite, Elamite, Old Persian
Plane SMP (Plane 1)

Cuneiform is the oldest known writing system, originating in ancient Mesopotamia (modern-day Iraq). The name means "wedge-shaped," referring to the impressions made by a reed stylus on wet clay tablets. Cuneiform was used by multiple civilizations speaking unrelated languages — much as the Latin alphabet serves many languages today.

The Unicode encoding follows the sign lists established by Assyriologists, particularly the Borger and Labat numbering systems. Each sign is encoded as a single character, though many signs have multiple readings depending on context and language.

# Cuneiform signs
an_sign = chr(0x1202D)  # CUNEIFORM SIGN AN (sky/god)
print(f"{an_sign} U+{ord(an_sign):04X}")

# Iterate through some cuneiform signs
for cp in range(0x12000, 0x12010):
    ch = chr(cp)
    name = unicodedata.name(ch, "UNKNOWN")
    print(f"{ch} U+{cp:04X} {name}")

Linear B (U+10000–U+1007F, U+10080–U+100FF)

Property Value
Blocks Linear B Syllabary, Linear B Ideograms
Code points 211
Added Unicode 4.0 (2003)
Period ~1450–1200 BCE
Languages Mycenaean Greek
Plane SMP (Plane 1)

Linear B was the writing system of the Mycenaean civilization, the earliest known form of Greek. It was deciphered in 1952 by Michael Ventris, who showed it recorded an early form of Greek — predating the Greek alphabet by centuries. Linear B is a syllabary with about 87 syllabic signs plus over 100 ideograms (logograms for commodities like wheat, oil, and livestock).

Linear B holds a special place in Unicode history: its code points start at U+10000, making it the first script encoded in the Supplementary Multilingual Plane (Plane 1).

Linear A (U+10600–U+1077F)

Property Value
Block Linear A
Code points 341
Added Unicode 7.0 (2014)
Period ~1800–1450 BCE
Languages Unknown (Minoan)

Linear A, the predecessor of Linear B, remains undeciphered. It was used by the Minoan civilization on Crete. Unicode encodes it to allow scholars to create searchable digital transcriptions of the tablets and compare sign forms across corpora — even without knowing what the signs mean.

Old Persian Cuneiform (U+103A0–U+103DF)

Property Value
Block Old Persian
Code points 50
Added Unicode 4.1 (2005)
Period ~525–330 BCE
Languages Old Persian

Created by order of King Darius I for royal inscriptions (most famously at Behistun), Old Persian cuneiform is a semi-alphabetic script distinct from Sumero-Akkadian cuneiform. It has 36 phonetic signs, 8 ideograms, and several number signs.

Other Notable Dead Scripts

Script Unicode Block Period Code Points Status
Phoenician U+10900–U+1091F ~1050–150 BCE 29 Ancestor of Greek, Latin, Arabic, Hebrew
Coptic U+2C80–U+2CFF ~2nd century CE – present (liturgical) 128 Last stage of Egyptian language
Gothic U+10330–U+1034F 4th century CE 27 Bible translation by Bishop Wulfila
Ogham U+1680–U+169F 4th–10th century CE 29 Irish/Pictish stone inscriptions
Runic U+16A0–U+16FF 2nd–15th century CE 89 Germanic inscriptions (Elder/Younger Futhark, Anglo-Saxon)
Cypriot U+10800–U+1083F 11th–2nd century BCE 55 Syllabary for Cypriot Greek
Ugaritic U+10380–U+1039F 14th–12th century BCE 31 Cuneiform alphabet from ancient Syria
Lydian U+10920–U+1093F 7th–1st century BCE 27 Anatolian kingdom of Lydia
Carian U+102A0–U+102DF 7th–1st century BCE 49 Southwest Anatolia
Lycian U+10280–U+1029F 5th–4th century BCE 29 Southern Anatolia

Challenges of Encoding Dead Scripts

Sign Identification

Many ancient signs have uncertain readings. Cuneiform signs evolved over 3,000 years, and the same concept might be written differently in Sumerian, Akkadian, and Hittite texts. Unicode generally encodes signs by form rather than meaning, but establishing which variant forms should be unified as a single code point requires expert consultation.

Undeciphered Scripts

Unicode encodes several undeciphered scripts (Linear A, Proto-Elamite proposals). This raises a philosophical question: how do you assign character properties (letter, number, punctuation) to signs whose function is unknown? Unicode uses the General Category Lo (Letter, other) as a default for most undeciphered signs.

Complex Layout

Egyptian hieroglyphs can be arranged in quadrats (2D groupings). Cuneiform was originally pictographic with complex layouts before becoming wedge-shaped. Unicode focuses on encoding the logical sequence of signs, leaving visual arrangement to rendering engines and format control characters.

SMP Rendering

Most dead scripts are encoded in the Supplementary Multilingual Plane (Plane 1, U+10000+), which requires surrogate pairs in UTF-16. Some legacy systems and fonts do not fully support SMP characters, leading to rendering failures.

// Checking SMP support
const linearB = "\u{10000}";  // LINEAR B SYLLABLE B008 A
console.log(linearB.length);  // 2 (UTF-16 surrogate pair)
console.log([...linearB].length);  // 1 (actual character count)

Working with Dead Scripts

Font Resources

Font Scripts Covered
Noto Sans Egyptian Hieroglyphs Egyptian Hieroglyphs
Noto Sans Cuneiform Cuneiform
Noto Sans Linear B Linear B
Noto Sans Ogham Ogham
Noto Sans Runic Runic
Noto Sans Coptic Coptic
Google Noto family Most encoded scripts

Scholarly Tools

Several digital humanities projects build on Unicode dead script encoding:

  • CDLI (Cuneiform Digital Library Initiative) — 350,000+ cuneiform texts
  • ORACC (Open Richly Annotated Cuneiform Corpus) — linguistic annotations
  • JSesh — hieroglyphic text editor using Unicode code points
  • DAMOS — Database of Mycenaean at Oslo, using Linear B encoding

Key Takeaways

  • Unicode encodes dozens of dead and ancient scripts, enabling digital scholarship, searchable archives, and computational analysis of ancient texts.
  • Egyptian Hieroglyphs (1,071+ signs), Cuneiform (1,234+ signs), and Linear B (211 signs) are the largest dead script encodings, all in the Supplementary Multilingual Plane.
  • Even undeciphered scripts like Linear A are encoded to support scholarly transcription and corpus analysis, with default character properties assigned.
  • Dead script encoding faces unique challenges: sign identification across millennia of variation, complex layout systems (hieroglyphic quadrats), and SMP rendering support in legacy systems.
  • The Noto font family provides comprehensive coverage for most encoded dead scripts, making them displayable on modern devices.
  • Digital corpora like CDLI and ORACC depend on Unicode encoding to make hundreds of thousands of ancient texts searchable and interoperable.

Script Stories में और