Writing Systems of the World · Chapter 9

Hebrew: Ancient Script in the Digital Age

Hebrew is one of the oldest scripts still in daily use, with unique challenges including right-to-left directionality, cantillation marks, and the distinction between biblical and modern forms.


Hebrew is a writing system that has survived a near-death experience. By the 4th century CE, Hebrew had ceased to be a spoken vernacular language — it persisted only in religious texts, liturgy, and scholarly writing. Yet through an unprecedented linguistic resurrection in the 19th and 20th centuries, driven by Zionist intellectuals and particularly by Eliezer Ben-Yehuda, Hebrew became a living spoken language again, with millions of native speakers today. This unique history — ancient script, medieval continuity, modern revival — makes Hebrew one of the most remarkable writing systems in Unicode, and one with particularly interesting technical challenges in the digital age.

Three Thousand Years of Continuity

The Hebrew alphabet's history reaches back to the ancient Canaanite writing tradition of the Levant. The Paleo-Hebrew script used before the Babylonian exile (6th century BCE) is virtually identical to Phoenician, the ancestor of Greek and thus of Latin. After the exile, Jews returning from Babylon brought a different script tradition: the Aramaic square script (also called Assyrian script), which gradually replaced Paleo-Hebrew for most purposes. This square script, the recognizable block letters of modern Hebrew, became dominant during the Second Temple period.

Paleo-Hebrew survived in two contexts: on coins (for national-symbolic reasons) and in some scrolls for writing the divine name (the Tetragrammaton, YHWH). Unicode does not give Paleo-Hebrew a block of its own: it is unified with Phoenician and represented with the Phoenician block (U+10900–U+1091F), which serves academic and historical purposes. Standard Hebrew uses the square script of the Hebrew block (U+0590–U+05FF).

The 22 letters of the Hebrew alphabet are all consonants — Hebrew is, like Arabic, an abjad. The letters are:

Letter | Name | Numeric value | Sound
א | Alef | 1 | silent or glottal stop
ב | Bet | 2 | /b/ or /v/
ג | Gimel | 3 | /g/
ד | Dalet | 4 | /d/
ה | He | 5 | /h/
ו | Vav | 6 | /v/ or vowel
ז | Zayin | 7 | /z/
ח | Het | 8 | /x/ (voiceless velar or pharyngeal fricative)
ט | Tet | 9 | /t/
י | Yod | 10 | /j/ or vowel
כ/ך | Kaf/Final Kaf | 20 | /k/ or /x/
ל | Lamed | 30 | /l/
מ/ם | Mem/Final Mem | 40 | /m/
נ/ן | Nun/Final Nun | 50 | /n/
ס | Samekh | 60 | /s/
ע | Ayin | 70 | silent or pharyngeal
פ/ף | Pe/Final Pe | 80 | /p/ or /f/
צ/ץ | Tsadi/Final Tsadi | 90 | /ts/
ק | Qof | 100 | /k/
ר | Resh | 200 | /r/
ש | Shin/Sin | 300 | /ʃ/ or /s/
ת | Tav | 400 | /t/

Five letters (כ מ נ פ צ) have special final forms (sofit) used at the end of words. Unicode encodes both the regular and final forms as separate code points.
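The numeric values in the table can be turned into a small gematria calculator. The sketch below is illustrative only: the names `LETTER_VALUE`, `fold_finals`, and `gematria` are made up for this example, and it assumes the common reckoning in which a final form counts the same as its regular form. It relies on the fact that each final form sits one code point before its regular counterpart in the Hebrew block.

```python
import unicodedata

# All Hebrew letters occupy U+05D0..U+05EA; five of them are final forms.
LETTERS = [chr(cp) for cp in range(0x05D0, 0x05EB)]

# Each final form is immediately followed by its regular form
# (e.g. U+05DA FINAL KAF, U+05DB KAF), so "+1" maps final to regular.
FINAL_TO_REGULAR = {
    ch: chr(ord(ch) + 1)
    for ch in LETTERS
    if "FINAL" in unicodedata.name(ch)
}

# The 22 regular letters in alphabetical order, paired with their values.
REGULAR = [ch for ch in LETTERS if ch not in FINAL_TO_REGULAR]
VALUES = [1, 2, 3, 4, 5, 6, 7, 8, 9,
          10, 20, 30, 40, 50, 60, 70, 80, 90,
          100, 200, 300, 400]
LETTER_VALUE = dict(zip(REGULAR, VALUES))


def fold_finals(text: str) -> str:
    """Replace final forms with their regular counterparts."""
    return "".join(FINAL_TO_REGULAR.get(ch, ch) for ch in text)


def gematria(word: str) -> int:
    """Sum the letter values; non-letters (niqqud, spaces) count as 0."""
    return sum(LETTER_VALUE.get(ch, 0) for ch in fold_finals(word))


if __name__ == "__main__":
    # shalom: shin (300) + lamed (30) + vav (6) + final mem (40)
    print(gematria("\u05E9\u05DC\u05D5\u05DD"))
```

Because unknown characters contribute zero, the same function can be applied to pointed text without stripping the vowel marks first.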

Niqqud: The Vowel Pointing System

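Unvocalized Hebrew, standard in modern Israeli publishing, is read fluently by native speakers, who supply the vowels from context. Such text is normally written in full spelling (כתיב מלא, ktiv male), which adds extra vav and yod letters as vowel hints; the sparser defective spelling (כתיב חסר, ketiv ḥaser) is typical of pointed text. Religious texts, children's literature, poetry, and Hebrew learning materials use niqqud (נִיקּוּד), a system of dots and dashes placed above, below, and within letters to indicate vowels.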

The major niqqud marks in Unicode:

Mark | Code point | Name | Sound
◌ָ | U+05B8 | Qamats | /a/
◌ַ | U+05B7 | Patah | /a/
◌ֵ | U+05B5 | Tsere | /e/
◌ֶ | U+05B6 | Segol | /e/
◌ִ | U+05B4 | Hiriq | /i/
◌ֹ | U+05B9 | Holam | /o/
◌ּ | U+05BC | Dagesh | consonant doubling or hardening
◌ׁ | U+05C1 | Shin Dot | marks ש as /ʃ/
◌ׂ | U+05C2 | Sin Dot | marks ש as /s/
◌ׇ | U+05C7 | Qamats Qatan | /o/ in some contexts

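A single Hebrew letter with full pointing can carry several combining marks at once, and three or four once cantillation is included. Consider the word בְּרֵאשִׁית (bereshit, "in the beginning", the first word of the Torah): the bet carries a dagesh and a sheva, the resh a tsere, and the shin both a shin dot and a hiriq. In a biblical edition, cantillation marks are added on top of these.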
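To make the structure of a pointed word concrete, the following sketch groups each base letter with the combining marks that follow it and also shows how to strip the pointing. It uses only Python's standard `unicodedata` module; the helper names `clusters` and `strip_marks` are invented for this illustration, and the word is spelled with explicit escapes so that the mark order is unambiguous.

```python
import unicodedata

# bereshit, spelled out code point by code point
BERESHIT = (
    "\u05D1\u05BC\u05B0"  # bet + dagesh + sheva
    "\u05E8\u05B5"        # resh + tsere
    "\u05D0"              # alef
    "\u05E9\u05C1\u05B4"  # shin + shin dot + hiriq
    "\u05D9"              # yod
    "\u05EA"              # tav
)


def clusters(text: str) -> list[tuple[str, list[str]]]:
    """Pair every base character with the names of its combining marks."""
    out: list[tuple[str, list[str]]] = []
    for ch in text:
        # Niqqud and cantillation marks have general category Mn
        # (nonspacing mark); everything else starts a new cluster.
        if unicodedata.category(ch) == "Mn" and out:
            out[-1][1].append(unicodedata.name(ch))
        else:
            out.append((ch, []))
    return out


def strip_marks(text: str) -> str:
    """Drop all nonspacing marks, leaving the consonantal skeleton."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


if __name__ == "__main__":
    for base, marks in clusters(BERESHIT):
        print(f"U+{ord(base):04X}", marks)
```

Filtering on the `Mn` category removes vowel points and cantillation alike; punctuation such as maqaf (U+05BE) belongs to a different category and is left in place.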

Cantillation Marks

Printed biblical Hebrew texts include teamim (טְעָמִים), cantillation marks that indicate the melodic pattern for chanting and also serve as a system of syntactic punctuation. Within the Unicode Hebrew block (U+0590–U+05FF), the cantillation marks occupy U+0591–U+05AF and include:

  • U+0591 HEBREW ACCENT ETNAHTA
  • U+0592 HEBREW ACCENT SEGOL
  • U+0593 HEBREW ACCENT SHALSHELET
  • U+05A0 HEBREW ACCENT TELISHA GEDOLA
  • U+05A1 HEBREW ACCENT PAZER
  • ... (dozens of cantillation marks)

In a fully marked biblical text, a single consonant can carry a dagesh, a vowel point, and one or more cantillation marks, all encoded as separate combining characters after the base letter. This stacking makes fully vocalized biblical Hebrew among the most diacritic-dense text in Unicode.
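Applications often need to remove the cantillation while keeping the vowel points, or to compare two strings whose marks were typed in a different order. The sketch below shows one possible approach using only the standard library; the helper names are hypothetical, and identifying accents by the `HEBREW ACCENT` prefix of their Unicode character names is a convenience chosen for this example rather than an official classification API.

```python
import unicodedata


def is_accent(ch: str) -> bool:
    """True for cantillation marks, whose names start with HEBREW ACCENT."""
    return unicodedata.name(ch, "").startswith("HEBREW ACCENT ")


def strip_cantillation(text: str) -> str:
    """Remove cantillation marks but keep niqqud and letters."""
    return "".join(ch for ch in text if not is_accent(ch))


def mark_density(word: str) -> float:
    """Combining marks per base character in a word."""
    marks = sum(1 for ch in word if unicodedata.category(ch) == "Mn")
    bases = len(word) - marks
    return marks / bases if bases else 0.0


def canonical(text: str) -> str:
    """NFC puts stacked marks into canonical order, so two strings that
    differ only in the order of marks on a letter compare equal."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    # bereshit with a tipeha accent (U+0596) on the shin
    word = ("\u05D1\u05BC\u05B0\u05E8\u05B5\u05D0"
            "\u05E9\u05C1\u05B4\u0596\u05D9\u05EA")
    print(mark_density(word))
    print(len(word), len(strip_cantillation(word)))
```

Normalizing before comparison matters in practice because a dagesh typed before a sheva and a sheva typed before a dagesh render identically but are different code point sequences.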

Bidirectional Hebrew and IDN Security

Hebrew, like Arabic, is written right-to-left and participates in the Unicode Bidirectional Algorithm. The same technical challenges apply: the bidi algorithm must correctly interleave Hebrew text with embedded Latin words, URLs, numbers, and punctuation. On the web, the HTML dir="rtl" attribute sets the base direction; in plain text, the isolate controls RLI (U+2067) and LRI (U+2066), each closed by PDI (U+2069), keep an embedded run from disturbing the text around it.

Hebrew's IDN security concerns parallel those of Cyrillic. Hebrew-Latin confusables are fewer than Cyrillic-Latin ones because the scripts are visually more distinct, but some exist: ס (Samekh, U+05E1) can pass for a Latin o, ו (Vav, U+05D5) and ן (Final Nun, U+05DF) can pass for a vertical stroke such as l or I, and other letters may cause confusion when poorly rendered.
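One common defence against such spoofing is to reject domain labels that mix scripts. The sketch below is a heuristic only: Python's standard library does not expose the Unicode Script property, so it approximates a character's script by the first word of its Unicode name. The function names are hypothetical, and a production check would more likely rely on a dedicated library and the Unicode security data files.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters of a label.

    Assumption: the first word of the character name (HEBREW, LATIN,
    CYRILLIC, ...) stands in for the real Script property.
    """
    found = set()
    for ch in label:
        if ch.isalpha():  # ignore digits, hyphens, and combining marks
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


def is_mixed_hebrew_latin(label: str) -> bool:
    """True when a single label contains both Hebrew and Latin letters."""
    return {"HEBREW", "LATIN"} <= scripts_in(label)


if __name__ == "__main__":
    spoof = "g\u05E1\u05E1gle"  # Latin letters with two samekhs inside
    print(scripts_in(spoof), is_mixed_hebrew_latin(spoof))
    print(is_mixed_hebrew_latin("google"))
```

The check is applied per label rather than per domain, since a legitimate name can pair a Hebrew label with a Latin top-level domain.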

Yiddish and Ladino

Hebrew script is used for languages beyond Hebrew itself. Yiddish, a High German language written in Hebrew script and historically the vernacular of Ashkenazi Jews, has its own orthographic conventions that differ significantly from those of Hebrew. Yiddish spells out its vowels with regular letters (vov, yud, alef, ayin) rather than relying on optional niqqud, and Unicode encodes several precomposed letters and variants that appear in Yiddish typography:

  • U+FB1D HEBREW LETTER YOD WITH HIRIQ (khirik yud)
  • U+FB20 HEBREW LETTER ALTERNATIVE AYIN (a typographic variant of ayin)
  • U+FB2E HEBREW LETTER ALEF WITH PATAH (pasekh alef; komets alef is U+FB2F HEBREW LETTER ALEF WITH QAMATS)
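These precomposed letters have canonical decompositions into a base letter plus a point, and Unicode normalization always produces the decomposed sequence for them, even in the composed form NFC. The short sketch below, using only the standard `unicodedata` module, shows why text should be normalized before searching or comparing; the helper names `describe` and `same_text` are hypothetical.

```python
import unicodedata


def describe(ch: str) -> tuple[str, list[str]]:
    """Return the character name and the code points of its NFC form."""
    nfc = unicodedata.normalize("NFC", ch)
    return unicodedata.name(ch), [f"U+{ord(c):04X}" for c in nfc]


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalization, so a precomposed letter
    matches the equivalent letter-plus-point sequence."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for ch in ("\uFB1D", "\uFB2E", "\uFB2F"):
        print(describe(ch))
```

A naive code point comparison would treat U+FB2F and the sequence alef plus qamats as different strings, although they are canonically equivalent and render the same.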

Ladino (Judeo-Spanish), traditionally written in Rashi script or in standard square Hebrew script, is a Romance language that likewise uses Hebrew letters with its own specialized orthographic conventions.

The Dead Sea Scrolls and Digital Hebrew

The Dead Sea Scrolls, ancient Jewish manuscripts discovered between 1947 and 1956 near Qumran, include the oldest known biblical Hebrew manuscripts, dating from the 3rd century BCE to the 1st century CE. The Israel Antiquities Authority has undertaken a massive digitization project, photographing the fragments with multispectral imaging and making them available online.

Encoding these texts poses interesting challenges: most scrolls are written in early forms of the square script and a minority in Paleo-Hebrew, they use non-standard orthography, and they include texts outside the standard biblical canon. Unicode's Phoenician block (which covers Paleo-Hebrew) and the standard Hebrew block together can represent most characters, but scholarly apparatus requires additional specialized encoding in some cases.

Modern Hebrew computing — from Israeli government systems to WhatsApp messages in Tel Aviv — runs on the standard Unicode Hebrew block, a remarkable continuity between the world's newest revival language and one of its most ancient scripts.