Ethiopic Script

The Ethiopic script (Ge'ez) is an abugida used to write Amharic, Tigrinya, Oromo, and many other languages of the Horn of Africa, with Unicode's five Ethiopic blocks together containing over 500 characters. This guide explores the history and structure of Ethiopic script, its Unicode encoding, and the challenges of digital Ethiopic text.

Ethiopic, also known as Ge'ez script, is one of the oldest writing systems still in active daily use. With roots stretching back over 2,000 years to the ancient Kingdom of Aksum in modern-day Ethiopia and Eritrea, Ethiopic is an abugida: a writing system where each character represents a consonant-vowel syllable. Used today by over 100 million people writing in Amharic, Tigrinya, and other Ethio-Semitic and Cushitic languages, the Ethiopic script occupies a substantial footprint in Unicode with over 500 encoded characters. This guide explores the script's history, syllabic structure, Unicode encoding, and practical considerations for developers.

History of Ethiopic Script

From Sabean to Ge'ez

Ethiopic script evolved from the South Arabian (Sabean) script, which was brought to the Horn of Africa by Semitic-speaking peoples around the 8th century BCE. The earliest known Ethiopic inscriptions, from the early centuries of the Kingdom of Aksum (before the 4th century CE), were written in a purely consonantal script, like its Sabean ancestor.

The revolutionary development came around the 4th century CE, when Ethiopic was transformed from a consonantal script (abjad) into an abugida by adding vowel diacritics that were incorporated directly into the consonant forms. This coincided with the Christianization of the Aksumite Empire under King Ezana. The modified script enabled the translation of the Bible into Ge'ez, which became the liturgical language of the Ethiopian Orthodox Church.

Ge'ez Language vs. Ge'ez Script

An important distinction:

| Term | Meaning |
| --- | --- |
| Ge'ez (language) | Ancient Ethio-Semitic language, now used only in Ethiopian/Eritrean Orthodox liturgy |
| Ge'ez (script) / Ethiopic | The writing system used for multiple living languages |

The script outlived the language. While Ge'ez as a spoken language declined around the 10th century, the Ge'ez script was adopted by successor languages: Amharic (Ethiopia's official language, ~50 million speakers), Tigrinya (~10 million speakers in Eritrea and Ethiopia), Tigre, Harari, Gurage languages, and non-Semitic languages like Oromo (in some contexts) and Blin.

How the Ethiopic Abugida Works

The Syllable Matrix

Ethiopic is organized as a matrix of consonants and vowels. Each consonant has seven orders (forms), each representing the consonant combined with one of seven vowels:

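| Order | Vowel | Name | Example (ሀ h-row) |
| --- | --- | --- | --- |
| 1st | ä (default) | Ge'ez | ሀ (hä) |
| 2nd | u | Ka'eb | ሁ (hu) |
| 3rd | i | Salis | ሂ (hi) |
| 4th | a | Rabe' | ሃ (ha) |
| 5th | e | Hamis | ሄ (he) |
| 6th | (none/ə) | Sadis | ህ (hə/h) |
| 7th | o | Sabe' | ሆ (ho) |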

The 6th order represents the bare consonant or a reduced vowel (schwa). The visual modifications between orders follow recurring patterns but are not fully predictable: some orders modify the right leg, others add small appendages, and some change the character's shape entirely.

Consonant Families

The basic Ethiopic syllabary has 26 base consonants (the traditional Ge'ez set), each with 7 vowel forms, giving 182 base syllable characters. Languages like Amharic and Tigrinya add additional consonants:

| Language | Base Consonants | Total Syllable Characters |
| --- | --- | --- |
| Ge'ez (classical) | 26 | 182 |
| Amharic | 33+ | 231+ |
| Tigrinya | 32+ | 224+ |
| Extended (all languages) | 50+ | 350+ |

Numerals and Punctuation

Ethiopic has its own numeral system (derived from Greek numerals) and punctuation:

| Character | Code Point | Name |
| --- | --- | --- |
| ፩ | U+1369 | Ethiopic digit one |
| ፪ | U+136A | Ethiopic digit two |
| ፲ | U+1372 | Ethiopic number ten |
| ፻ | U+137B | Ethiopic number hundred |
| ፼ | U+137C | Ethiopic number ten thousand |
| ። | U+1362 | Ethiopic full stop |
| ፡ | U+1361 | Ethiopic wordspace |
| ፣ | U+1363 | Ethiopic comma |
| ፤ | U+1364 | Ethiopic semicolon |
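
As a rough illustration of the numeral signs listed above, Python's standard `unicodedata.numeric` reports the value each sign carries. The helper names `numeral_values` and `small_number` are hypothetical. `small_number` is only a sketch for a tens sign followed by a units sign (ten plus two for 12); it deliberately rejects the hundred and ten-thousand signs, which act as multipliers and need a fuller algorithm.

```python
import unicodedata

# Ethiopic numeral signs occupy U+1369 (digit one) through U+137C (ten thousand).
FIRST_NUMERAL, LAST_NUMERAL = 0x1369, 0x137C

def numeral_values(text):
    """Return the value of every Ethiopic numeral sign found in text."""
    return [
        int(unicodedata.numeric(ch))
        for ch in text
        if FIRST_NUMERAL <= ord(ch) <= LAST_NUMERAL
    ]

def small_number(text):
    """Value of a simple tens-plus-units numeral (assumption: below 100)."""
    values = numeral_values(text)
    if any(v >= 100 for v in values):
        raise ValueError("multiplier signs (hundred, ten thousand) are not handled")
    return sum(values)

twelve = "\u1372\u136A"  # number ten followed by digit two
print(numeral_values(twelve))  # [10, 2]
print(small_number(twelve))    # 12
```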

Notably, Ethiopic traditionally uses U+1361 (Ethiopic wordspace ፡) rather than a regular space character to separate words, though modern usage increasingly uses ordinary spaces (U+0020).
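
Because both conventions occur in real text, a tokenizer usually has to accept either separator. The following is a minimal sketch, not a complete tokenizer: the function name `split_words` is hypothetical, and it treats only the wordspace, full stop, comma, and semicolon (U+1361 to U+1364) plus ordinary whitespace as separators.

```python
import re

# Separators: Ethiopic wordspace (U+1361), full stop (U+1362), comma (U+1363),
# semicolon (U+1364), and any ordinary whitespace.
SEPARATORS = re.compile(r"[\u1361-\u1364\s]+")

def split_words(text):
    """Split text into words, dropping separators and empty pieces."""
    return [word for word in SEPARATORS.split(text) if word]

# The same two words, separated by a wordspace and by an ordinary space.
traditional = "\u1230\u120B\u121D\u1361\u12D3\u1208\u121D\u1362"
modern = "\u1230\u120B\u121D \u12D3\u1208\u121D\u1362"
print(split_words(traditional) == split_words(modern))  # True
```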

Ethiopic in Unicode

Unicode Blocks

Ethiopic characters span five Unicode blocks:

| Block | Range | Block size (code points) | Content |
| --- | --- | --- | --- |
| Ethiopic | U+1200–U+137F | 384 | Core syllabary, numerals, punctuation |
| Ethiopic Supplement | U+1380–U+139F | 32 | Tonal marks, additional characters |
| Ethiopic Extended | U+2D80–U+2DDF | 96 | Characters for Sebatbeit, Me'en, Blin |
| Ethiopic Extended-A | U+AB00–U+AB2F | 48 | Characters for Gamo-Gofa-Dawro, Basketo |
| Ethiopic Extended-B | U+1E7E0–U+1E7FF | 32 | Characters for additional languages |

Together the five blocks reserve 592 code points, of which more than 500 are assigned. That makes Ethiopic one of the larger script encodings in Unicode, though still far smaller than CJK ideographs or Hangul.
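
The block figures can be checked with a short script. This is a sketch under two assumptions: the ranges are copied from the table above, and "assigned" is approximated as "has a character name in Python's `unicodedata`", so the assigned counts depend on the Unicode version bundled with the interpreter. The helper name `block_stats` is hypothetical.

```python
import unicodedata

# Block ranges as listed in the table above (inclusive).
BLOCKS = {
    "Ethiopic": (0x1200, 0x137F),
    "Ethiopic Supplement": (0x1380, 0x139F),
    "Ethiopic Extended": (0x2D80, 0x2DDF),
    "Ethiopic Extended-A": (0xAB00, 0xAB2F),
    "Ethiopic Extended-B": (0x1E7E0, 0x1E7FF),
}

def block_stats(first, last):
    """Return (reserved code points, code points that have a name)."""
    size = last - first + 1
    assigned = sum(
        1 for cp in range(first, last + 1) if unicodedata.name(chr(cp), "")
    )
    return size, assigned

total_size = 0
for name, (first, last) in BLOCKS.items():
    size, assigned = block_stats(first, last)
    total_size += size
    print(f"{name}: {size} reserved, {assigned} assigned")
print(f"Total reserved: {total_size}")  # Total reserved: 592
```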

Encoding Structure

Unlike many Indic abugidas where vowel diacritics are separate combining characters, Ethiopic encodes each consonant-vowel combination as a single precomposed code point. There are no combining marks for vowels:

# Each syllable is a single code point — no decomposition
import unicodedata

syllable = "ሀ"  # ha
print(f"U+{ord(syllable):04X}")  # U+1200
print(unicodedata.name(syllable))  # ETHIOPIC SYLLABLE HA
print(unicodedata.decomposition(syllable))  # "" (empty — no decomposition)

# Compare: 7 orders of the "h" consonant
h_row = [chr(0x1200 + i) for i in range(7)]
for s in h_row:
    print(f"{s} U+{ord(s):04X} {unicodedata.name(s)}")
# ሀ U+1200 ETHIOPIC SYLLABLE HA
# ሁ U+1201 ETHIOPIC SYLLABLE HU
# ሂ U+1202 ETHIOPIC SYLLABLE HI
# ሃ U+1203 ETHIOPIC SYLLABLE HAA
# ሄ U+1204 ETHIOPIC SYLLABLE HEE
# ህ U+1205 ETHIOPIC SYLLABLE HE
# ሆ U+1206 ETHIOPIC SYLLABLE HO

This design means:

  • No normalization issues: there is only one way to encode each syllable
  • Simple string processing: each code point is one syllable
  • Larger block size: many code points are needed (7 per consonant)
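
One consequence of the precomposed layout is that a syllable's vowel order can be read off its code point. The sketch below assumes that rows in the main block start at multiples of 8, with the seven orders in the first seven positions, as the h-row listing above shows for U+1200 to U+1206. The function name `split_syllable` is hypothetical, and rows with gaps or labialized forms may not map cleanly, so treat it as an illustration rather than a complete analyzer.

```python
ORDER_NAMES = ["Ge'ez", "Ka'eb", "Salis", "Rabe'", "Hamis", "Sadis", "Sabe'"]

def split_syllable(ch):
    """Return (first-order form of the same row, order number 1-7)."""
    cp = ord(ch)
    # Assumption: only the main run of syllables, U+1200-U+1357, is handled.
    if not 0x1200 <= cp <= 0x1357:
        raise ValueError(f"U+{cp:04X} is outside the handled syllable range")
    column = cp % 8  # rows start at multiples of 8, so this is the column index
    if column == 7:
        raise ValueError("the eighth column is not one of the seven orders")
    return chr(cp - column), column + 1

base, order = split_syllable("ሆ")
print(base, order, ORDER_NAMES[order - 1])  # ሀ 7 Sabe'
```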

Detecting Ethiopic

import unicodedata

def is_ethiopic(ch):
    try:
        return "ETHIOPIC" in unicodedata.name(ch)
    except ValueError:
        return False

# Or by code point range
def is_ethiopic_range(ch):
    cp = ord(ch)
    return (0x1200 <= cp <= 0x137F or   # Ethiopic
            0x1380 <= cp <= 0x139F or   # Ethiopic Supplement
            0x2D80 <= cp <= 0x2DDF or   # Ethiopic Extended
            0xAB00 <= cp <= 0xAB2F or   # Ethiopic Extended-A
            0x1E7E0 <= cp <= 0x1E7FF)   # Ethiopic Extended-B

The same check in JavaScript uses a Unicode property escape:

// JavaScript Unicode property escapes
const ethiopicRegex = /\p{Script=Ethiopic}/u;
console.log(ethiopicRegex.test("ሀ")); // true
console.log(ethiopicRegex.test("A")); // false
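
Per-character checks like these are often combined into a whole-string measure, for example to decide whether a text field is mostly Ethiopic. The following is one possible sketch: the names `in_ethiopic_block` and `ethiopic_ratio` are hypothetical, and the range test also counts unassigned code points inside the blocks.

```python
# Inclusive block ranges; Ethiopic and Ethiopic Supplement are adjacent,
# so they are merged into a single range here.
ETHIOPIC_RANGES = (
    (0x1200, 0x139F),    # Ethiopic + Ethiopic Supplement
    (0x2D80, 0x2DDF),    # Ethiopic Extended
    (0xAB00, 0xAB2F),    # Ethiopic Extended-A
    (0x1E7E0, 0x1E7FF),  # Ethiopic Extended-B
)

def in_ethiopic_block(ch):
    """True if the character's code point lies in any Ethiopic block."""
    cp = ord(ch)
    return any(first <= cp <= last for first, last in ETHIOPIC_RANGES)

def ethiopic_ratio(text):
    """Fraction of non-whitespace characters that fall in an Ethiopic block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(in_ethiopic_block(ch) for ch in chars) / len(chars)

print(ethiopic_ratio("ሀሁ ab"))  # 0.5
```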

Practical Considerations

Font Support

| Font | Platform | Notes |
| --- | --- | --- |
| Noto Sans Ethiopic | Cross-platform | Full coverage, recommended |
| Noto Serif Ethiopic | Cross-platform | Serif variant |
| Abyssinica SIL | Cross-platform | SIL-designed, excellent |
| Nyala | Windows | System font since Vista |
| Kefa | macOS/iOS | Apple system font |

Text Direction

Ethiopic is written left-to-right, top-to-bottom — the same direction as Latin text. This simplifies layout compared to bidirectional scripts.

Line Breaking and Word Spacing

Traditionally, Ethiopic uses the wordspace character (U+1361 ፡) between words. In modern digital text, regular spaces are increasingly common. The Unicode Line Breaking Algorithm treats Ethiopic syllables as class AL (Alphabetic) and the wordspace as class BA (Break After), so line breaks are permitted at ordinary space boundaries and after a wordspace, which stays visible at the end of the line.
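
A greedy wrapper illustrates where those breaks fall. This is a simplified sketch and not an implementation of the Unicode Line Breaking Algorithm: it assumes a break is allowed only after a wordspace (U+1361) or an ordinary space, measures width in code points, and leaves an overlong word on a line of its own. The function name `wrap` is hypothetical.

```python
import re

# A chunk is a run of non-separators plus any separators that trail it.
# Assumption: a break is allowed after U+1361 (wordspace) or U+0020 (space).
CHUNK = re.compile(r"[^\u1361 ]+[\u1361 ]*")

def wrap(text, width):
    """Greedy wrap that breaks only after a wordspace or an ordinary space."""
    lines, current = [], ""
    for chunk in CHUNK.findall(text):
        candidate = (current + chunk).rstrip(" ")
        if current and len(candidate) > width:
            # Trailing spaces are dropped; a trailing wordspace is kept.
            lines.append(current.rstrip(" "))
            current = chunk
        else:
            current += chunk
    if current:
        lines.append(current.rstrip(" "))
    return lines

# Two three-syllable words, each followed by a wordspace.
text = "\u1230\u120B\u121D\u1361\u12D3\u1208\u121D\u1361"
for line in wrap(text, 5):
    print(line)
```

With a width of 5 the text is split into two lines of four code points each, and each line keeps its trailing wordspace.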

Key Takeaways

  • Ethiopic (Ge'ez script) is an abugida where each character represents a consonant-vowel syllable, organized in a matrix of consonants (rows) by seven vowel orders (columns).
  • With over 500 encoded characters across five Unicode blocks, Ethiopic is one of the larger script encodings in Unicode, serving Amharic (~50M speakers), Tigrinya (~10M), and multiple other languages.
  • Unlike most Indic abugidas, Ethiopic uses precomposed code points for each syllable (no combining marks for vowels), which eliminates normalization issues but requires many code points.
  • The script evolved from South Arabian consonantal writing into a full abugida around the 4th century CE, coinciding with the Christianization of the Aksumite Empire.
  • Ethiopic has its own numeral system (U+1369–U+137C) and punctuation including the traditional wordspace character (U+1361 ፡).
  • Modern font support is excellent through the Noto Ethiopic family and platform-specific fonts (Nyala on Windows, Kefa on macOS).
