Hebrew Script
Hebrew is an abjad script written right-to-left, used for Biblical Hebrew, Modern Hebrew, and Yiddish, with optional vowel diacritics called nikkud that are encoded as combining characters. This guide covers the Hebrew Unicode block, how the bidirectional algorithm handles Hebrew text, and the history of this ancient script.
Hebrew is one of the oldest writing systems still in daily use. Its alphabet has been in continuous use for over 3,000 years — from the ancient inscriptions of Iron Age Israel to the smartphones of modern Tel Aviv. Hebrew script is used for both Modern Hebrew (spoken by 9 million people) and Yiddish (with its distinct set of orthographic conventions), as well as Ladino and Judeo-Arabic. For Unicode, Hebrew presents a fascinating combination of challenges: right-to-left directionality, optional vowel points (nikkud), cantillation marks for biblical text, and a tradition of calligraphic and typographic complexity. This guide explores how Unicode encodes Hebrew script, how the bidirectional algorithm handles it, and what developers need to know.
History
Hebrew script descends from the Phoenician alphabet (c. 1050 BCE), one of the earliest alphabetic writing systems. The earliest known Hebrew inscriptions date to the 10th century BCE. The modern "square" letter forms (Ktav Ashuri) were adopted during the Babylonian exile (6th century BCE) and have remained largely unchanged for over 2,000 years.
| Period | Script Form | Example |
|---|---|---|
| 10th c. BCE | Paleo-Hebrew | Gezer Calendar inscription |
| 6th c. BCE | Square script (Ktav Ashuri) | Dead Sea Scrolls |
| 2nd c. CE | Standardized square forms | Mishna, Talmud manuscripts |
| 10th c. CE | Tiberian vocalization system | Masoretic Text of the Bible |
| 1880s | Modern Hebrew revival | Eliezer Ben-Yehuda |
| Today | Modern Hebrew (Israel) | 9M+ speakers |
The revival of Hebrew as a spoken language in the late 19th and early 20th centuries is one of the most remarkable linguistic achievements in history. Hebrew went from being primarily a liturgical and scholarly language to the everyday tongue of an entire nation.
The Hebrew Alphabet
Hebrew has 22 consonant letters. It is an abjad — a writing system that primarily represents consonants, with vowels optionally indicated by diacritical marks:
| Letter | Name | Unicode | Transliteration | Final Form |
|---|---|---|---|---|
| א | Alef | U+05D0 | (glottal stop) | — |
| ב | Bet | U+05D1 | b/v | — |
| ג | Gimel | U+05D2 | g | — |
| ד | Dalet | U+05D3 | d | — |
| ה | He | U+05D4 | h | — |
| ו | Vav | U+05D5 | v/o/u | — |
| ז | Zayin | U+05D6 | z | — |
| ח | Het | U+05D7 | ch | — |
| ט | Tet | U+05D8 | t | — |
| י | Yod | U+05D9 | y/i | — |
| כ | Kaf | U+05DB | k/kh | ך U+05DA |
| ל | Lamed | U+05DC | l | — |
| מ | Mem | U+05DE | m | ם U+05DD |
| נ | Nun | U+05E0 | n | ן U+05DF |
| ס | Samekh | U+05E1 | s | — |
| ע | Ayin | U+05E2 | (pharyngeal) | — |
| פ | Pe | U+05E4 | p/f | ף U+05E3 |
| צ | Tsadi | U+05E6 | ts | ץ U+05E5 |
| ק | Qof | U+05E7 | q | — |
| ר | Resh | U+05E8 | r | — |
| ש | Shin | U+05E9 | sh/s | — |
| ת | Tav | U+05EA | t | — |
Final Forms (Sofit)
Five Hebrew letters have alternate forms used when they appear at the end of a word: Kaf (ך), Mem (ם), Nun (ן), Pe (ף), and Tsadi (ץ). These are encoded as separate characters in Unicode — they are not contextual variants like Arabic positional forms.
```python
# Final forms are distinct code points
FINAL_FORMS: dict[str, str] = {
    "\u05DB": "\u05DA",  # Kaf → Final Kaf
    "\u05DE": "\u05DD",  # Mem → Final Mem
    "\u05E0": "\u05DF",  # Nun → Final Nun
    "\u05E4": "\u05E3",  # Pe → Final Pe
    "\u05E6": "\u05E5",  # Tsadi → Final Tsadi
}
```
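Because final forms are separate code points, code that compares or indexes words sometimes folds them back to the regular letters. The following is a minimal sketch of that idea; the name `fold_final_forms` is hypothetical, and whether folding is appropriate depends on the application.

```python
# Map each final (sofit) letter back to its regular form.
FINAL_TO_BASE = {
    "\u05DA": "\u05DB",  # Final Kaf -> Kaf
    "\u05DD": "\u05DE",  # Final Mem -> Mem
    "\u05DF": "\u05E0",  # Final Nun -> Nun
    "\u05E3": "\u05E4",  # Final Pe -> Pe
    "\u05E5": "\u05E6",  # Final Tsadi -> Tsadi
}
_FOLD_TABLE = str.maketrans(FINAL_TO_BASE)


def fold_final_forms(text: str) -> str:
    """Replace final forms with regular forms; other characters pass through."""
    return text.translate(_FOLD_TABLE)


word = "\u05E9\u05DC\u05D5\u05DD"  # shalom, ending in Final Mem
print(fold_final_forms(word) == "\u05E9\u05DC\u05D5\u05DE")  # True
```

Folding is lossy, so it is better suited to building a search key than to rewriting stored text.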
Unicode Blocks for Hebrew
| Block | Range | Characters | Purpose |
|---|---|---|---|
| Hebrew | U+0590–U+05FF | 88 | Consonants, vowels, accents |
| Alphabetic Presentation Forms | U+FB00–U+FB4F | 58 (46 Hebrew, U+FB1D–U+FB4F) | Ligatures, wide/alternative letters |
The Main Hebrew Block (U+0590–U+05FF)
This block is organized into these main groups:
- Cantillation marks (U+0591–U+05AF): Accents used in biblical text
- Points and vowels (U+05B0–U+05BD, U+05BF, U+05C1–U+05C2, U+05C4–U+05C5, U+05C7): Nikkud
- Letters (U+05D0–U+05EA): The 22 consonants + 5 final forms
- Yiddish digraphs (U+05F0–U+05F2): Double Vav, Vav-Yod, double Yod
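The ranges above translate directly into a lookup. This is a rough sketch rather than a complete classifier: the function name `hebrew_block_section` and the category labels are made up for illustration, and U+05C7 (QAMATS QATAN) is counted here as a point.

```python
# Code points treated as vowel points and related marks.
POINTS = set(range(0x05B0, 0x05BE)) | {0x05BF, 0x05C1, 0x05C2, 0x05C4, 0x05C5, 0x05C7}


def hebrew_block_section(ch: str) -> str:
    """Return a coarse label for a single character, based on its code point."""
    cp = ord(ch)
    if 0x0591 <= cp <= 0x05AF:
        return "cantillation"
    if cp in POINTS:
        return "point"
    if 0x05D0 <= cp <= 0x05EA:
        return "letter"
    if 0x0590 <= cp <= 0x05FF:
        return "other"  # punctuation such as maqaf, Yiddish digraphs, unassigned
    return "outside"


for ch in ["\u05D0", "\u05B8", "\u0591", "\u05BE", "a"]:
    print(f"U+{ord(ch):04X} {hebrew_block_section(ch)}")
```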
Alphabetic Presentation Forms (Hebrew Subset)
The Hebrew portion of the Alphabetic Presentation Forms block (U+FB1D–U+FB4F) contains:
- Wide letters for justified text
- Alternative letter forms (e.g., alternative Ayin)
- Yiddish forms (e.g., Yod with Hiriq U+FB1D, double Yod with Patah U+FB1F)
- Precomposed letter + dagesh combinations
Like Arabic Presentation Forms, these are primarily for compatibility. New text should use the base characters from the main Hebrew block.
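Normalization is one way to move existing text off the presentation forms. The sketch below uses Python's standard `unicodedata` module and two sample characters: a precomposed letter-plus-dagesh form, which has a canonical decomposition, and a wide letter, which has only a compatibility decomposition.

```python
import unicodedata

bet_dagesh = "\uFB31"  # HEBREW LETTER BET WITH DAGESH (precomposed)
wide_alef = "\uFB21"   # HEBREW LETTER WIDE ALEF

# NFC decomposes the precomposed form and does not recompose it.
nfc = unicodedata.normalize("NFC", bet_dagesh)
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+05D1', 'U+05BC']

# The wide letter survives NFC and is only replaced by NFKC.
print(unicodedata.normalize("NFC", wide_alef) == wide_alef)   # True
print(unicodedata.normalize("NFKC", wide_alef) == "\u05D0")   # True
```

NFKC also changes many non-Hebrew characters, so it is usually applied to search keys rather than to the stored text.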
Nikkud: The Vowel System
Everyday Modern Hebrew is written without vowel marks. Instead it uses ktiv maleh ("full spelling"), in which the letters Vav and Yod double as vowel indicators (matres lectionis). The full vowel system, called nikkud (ניקוד, "dotting"), is used in:
- The Torah and other religious texts
- Children's books and educational materials
- Dictionaries and poetry
- Disambiguation of homographs
- Texts for Hebrew language learners
The Vowel Marks
| Mark | Name | Unicode | Sound | Position |
|---|---|---|---|---|
| ַ | Patach | U+05B7 | /a/ | Below |
| ָ | Qamats | U+05B8 | /a/ or /o/ | Below |
| ֶ | Segol | U+05B6 | /e/ | Below |
| ֵ | Tsere | U+05B5 | /e/ | Below |
| ִ | Hiriq | U+05B4 | /i/ | Below |
| ֹ | Holam | U+05B9 | /o/ | Above |
| ֻ | Qubuts | U+05BB | /u/ | Below |
| ְ | Shva | U+05B0 | /e/ or silent | Below |
| ֲ | Hataf Patach | U+05B2 | /a/ (reduced) | Below |
| ֳ | Hataf Qamats | U+05B3 | /o/ (reduced) | Below |
| ֱ | Hataf Segol | U+05B1 | /e/ (reduced) | Below |
The Dagesh
The dagesh (דגש, U+05BC) is a dot placed inside a consonant that changes its pronunciation. There are two types:
- Dagesh Kal (light): Changes fricative to plosive (e.g., ב /v/ → בּ /b/)
- Dagesh Chazak (strong): Indicates gemination (doubling of the consonant)
Six letters change pronunciation with dagesh: Bet (בּ/ב), Gimel (גּ/ג), Dalet (דּ/ד), Kaf (כּ/כ), Pe (פּ/פ), Tav (תּ/ת). These are known as the BeGeD KeFeT letters.
The Shin Dot and Sin Dot
The letter Shin (ש) represents two different sounds, distinguished by a dot:
| Form | Name | Unicode Sequence | Sound |
|---|---|---|---|
| שׁ | Shin | U+05E9 + U+05C1 | /sh/ |
| שׂ | Sin | U+05E9 + U+05C2 | /s/ |
Each dot is a combining mark: the shin dot (U+05C1) sits above the right side of the letter and the sin dot (U+05C2) above the left.
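Since the distinction lives in a combining mark, code has to look at the marks that follow each ש to tell the two readings apart. A small sketch, with the hypothetical helper name `shin_readings`, assuming the dot may appear anywhere in the run of marks after the letter:

```python
import unicodedata

SHIN, SHIN_DOT, SIN_DOT = "\u05E9", "\u05C1", "\u05C2"


def shin_readings(text: str) -> list[str]:
    """Label every U+05E9 in text as 'shin', 'sin', or 'unmarked'."""
    readings = []
    for i, ch in enumerate(text):
        if ch != SHIN:
            continue
        # Collect the combining marks attached to this letter.
        j = i + 1
        marks = ""
        while j < len(text) and unicodedata.combining(text[j]):
            marks += text[j]
            j += 1
        if SHIN_DOT in marks:
            readings.append("shin")
        elif SIN_DOT in marks:
            readings.append("sin")
        else:
            readings.append("unmarked")
    return readings


print(shin_readings("\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"))  # ['shin']
```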
Encoding Order for Pointed Text
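When a consonant carries several marks (shin/sin dot, dagesh, vowel, cantillation), the order in which they are stored matters for comparison and search. The conventional input order, and the one used in the example that follows, is:
Base consonant + Shin/Sin dot + Dagesh + Vowel + Cantillation marks
This is not the order produced by Unicode normalization. NFC and NFD sort marks by canonical combining class, and the Hebrew vowel points (classes 10–20) sort before the dagesh (21) and the shin and sin dots (24, 25). The two orders are canonically equivalent and should render identically, but they are different code point sequences, so normalize both sides before comparing strings.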
Example: שָׁלוֹם (shalom) is encoded as:
```text
U+05E9  SHIN       ש
U+05C1  SHIN DOT   ׁ  (marks shin, not sin)
U+05B8  QAMATS     ָ  (vowel /a/)
U+05DC  LAMED      ל
U+05D5  VAV        ו
U+05B9  HOLAM      ֹ  (vowel /o/)
U+05DD  FINAL MEM  ם
```

```python
import unicodedata

shalom = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"
print(shalom)  # שָׁלוֹם

# Inspect each code point
for ch in shalom:
    print(f"  U+{ord(ch):04X}  {unicodedata.name(ch)}  "
          f"cat={unicodedata.category(ch)}")
```
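Normalization can change the stored order of the marks on a letter, because it sorts them by canonical combining class. The sketch below normalizes the first syllable of the word above and prints each code point with its class; only the standard `unicodedata` module is used.

```python
import unicodedata

typed = "\u05E9\u05C1\u05B8"  # shin + shin dot + qamats, as in the example
normalized = unicodedata.normalize("NFC", typed)

for ch in normalized:
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)}")
# U+05E9 ccc=0
# U+05B8 ccc=18
# U+05C1 ccc=24

print(typed == normalized)  # False
```

The two strings are canonically equivalent, so comparing text after normalizing both sides avoids false mismatches.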
Cantillation Marks (Te'amim)
For biblical Hebrew text, Unicode provides a comprehensive set of cantillation marks (טעמים, te'amim) — accent marks that indicate melodic patterns for liturgical reading. These occupy U+0591–U+05AF in the Hebrew block:
| Mark | Name | Unicode | Position |
|---|---|---|---|
| ֑ | Etnahta | U+0591 | Below |
| ֒ | Segol (accent) | U+0592 | Above |
| ֓ | Shalshelet | U+0593 | Above |
| ֔ | Zaqef Qatan | U+0594 | Above |
| ֕ | Zaqef Gadol | U+0595 | Above |
| ֖ | Tipeha | U+0596 | Below |
| ֗ | Revia | U+0597 | Above |
| ֚ | Yetiv | U+059A | Below |
| ֛ | Tevir | U+059B | Below |
| ֣ | Munah | U+05A3 | Below |
| ֤ | Mahapakh | U+05A4 | Below |
| ֥ | Merkha | U+05A5 | Below |
A fully pointed and accented biblical text can have three or more combining marks on a single consonant — a vowel, a dagesh, and one or more cantillation marks.
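To see how many marks stack on each letter, text can be grouped into a base character plus the combining marks that follow it. This is a simplified sketch and not full grapheme cluster segmentation; `split_clusters` is a hypothetical name, and the sample is the first word of Genesis with vowels and one accent.

```python
import unicodedata


def split_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    clusters: list[str] = []
    for ch in text:
        if unicodedata.combining(ch) and clusters:
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)
    return clusters


# bet+shva+dagesh, resh+tsere, alef, shin+hiriq+shin dot+tipeha, yod, tav
bereshit = "\u05D1\u05B0\u05BC\u05E8\u05B5\u05D0\u05E9\u05B4\u05C1\u0596\u05D9\u05EA"
clusters = split_clusters(bereshit)
print(len(clusters))                      # 6
print(max(len(c) - 1 for c in clusters))  # 3
```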
Bidirectional Text
Hebrew, like Arabic, is written right-to-left (RTL). The Unicode Bidirectional Algorithm (UBA) handles Hebrew text alongside LTR content. Hebrew characters have Bidi_Class R (Right-to-Left).
Common Bidi Challenges with Hebrew
Mixing Hebrew and English:
```html
<!-- Proper isolation of embedded LTR text -->
<p dir="rtl">הפרוטוקול <bdi>HTTP/2</bdi> הוא מהיר יותר.</p>
```
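Outside HTML, the same isolation can be approximated in plain text with the Unicode isolate controls U+2068 (FIRST STRONG ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE). A minimal sketch, with `isolate` as a hypothetical helper name:

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(fragment: str) -> str:
    """Wrap a fragment so its direction is resolved independently of its surroundings."""
    return f"{FSI}{fragment}{PDI}"


# Hebrew sentence with an embedded Latin token, stored in logical order.
sentence = "\u05D4\u05E4\u05E8\u05D5\u05D8\u05D5\u05E7\u05D5\u05DC " + isolate("HTTP/2")
print([f"U+{ord(c):04X}" for c in isolate("HTTP/2")][:1])  # ['U+2068']
```

These controls are invisible, so it is worth stripping them again before comparing or storing the text in contexts that do not expect them.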
Numbers in Hebrew text: Hebrew uses Western digits (0-9), which are classified as European Number (EN) in the Bidi algorithm. They generally render correctly, but punctuation adjacent to numbers can jump to unexpected positions.
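Parentheses and brackets: These are neutral characters whose direction is resolved by context. They also have the Bidi_Mirrored property. Text is stored in logical order, with U+0028 opening and U+0029 closing the group, and in a right-to-left run the renderer swaps the displayed glyphs to match the reading direction:
```text
English: Hello (world)
Hebrew: שלום (עולם)
```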
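The properties that drive this behaviour can be inspected with Python's `unicodedata` module. The sketch below prints the bidirectional class and the mirrored flag for a few sample characters; `bidi_info` is a hypothetical helper.

```python
import unicodedata


def bidi_info(ch: str) -> tuple[str, bool]:
    """Return (Bidi_Class, Bidi_Mirrored) for a single character."""
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))


print(bidi_info("\u05D0"))  # ('R', False)    Hebrew letter Alef
print(bidi_info("\u05B8"))  # ('NSM', False)  Qamats, a nonspacing mark
print(bidi_info("A"))       # ('L', False)
print(bidi_info("7"))       # ('EN', False)   European Number
print(bidi_info(","))       # ('CS', False)   Common Separator
print(bidi_info("("))       # ('ON', True)    neutral and mirrored
```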
HTML/CSS for Hebrew
```html
<html dir="rtl" lang="he">
<head>
<style>
  body {
    direction: rtl;
    unicode-bidi: isolate;
    text-align: start;
    font-family: "Frank Ruhl Libre", "David", serif;
  }
  /* CSS logical properties */
  .indent {
    margin-inline-start: 2rem;  /* Right margin in RTL */
    padding-inline-end: 1rem;   /* Left padding in RTL */
  }
  /* Pointed text needs extra line-height for marks */
  .nikkud {
    line-height: 2;
  }
</style>
</head>
```
Yiddish in Unicode
Yiddish uses Hebrew script but with significant orthographic differences. While Hebrew uses consonant letters with optional vowels, Yiddish uses certain Hebrew letters as full vowels:
| Hebrew Letter | Yiddish Use | Sound |
|---|---|---|
| א (Alef) | Silent or /a/ | Depends on context |
| אַ (Alef + Patach) | /a/ | Always /a/ |
| אָ (Alef + Qamats) | /o/ | Always /o/ |
| ו (Vav) | /u/ | Always /u/ |
| וּ (Vav + dagesh) | /u/ (explicitly marked) | /u/ |
| וו (double Vav) | /v/ | Consonant |
| י (Yod) | /i/ | Always /i/ |
| יי (double Yod) | /ey/ | Diphthong |
| ײַ (double Yod + Patach) | /ay/ | Diphthong |
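The Alphabetic Presentation Forms block includes compatibility characters used in Yiddish and traditional typesetting. The plain Yiddish digraphs (double Vav U+05F0, Vav-Yod U+05F1, double Yod U+05F2) are in the main Hebrew block.
| Character | Unicode | Name |
|---|---|---|
| ﬠ | U+FB20 | HEBREW LETTER ALTERNATIVE AYIN |
| ﬡ | U+FB21 | HEBREW LETTER WIDE ALEF |
| ײַ | U+FB1F | HEBREW LIGATURE YIDDISH YOD YOD PATAH |
| וּ | U+FB35 | HEBREW LETTER VAV WITH DAGESH (for Yiddish /u/) |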
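Yiddish text may spell the same word with single digraph characters (U+05F0–U+05F2 in the main Hebrew block) or with two separate letters, and the digraphs have no Unicode decomposition. One possible way to make the spellings comparable is to expand them by hand after normalizing. This is a sketch with a hypothetical function name, not an established convention:

```python
import unicodedata

# The digraph code points have no decomposition, so they are mapped manually.
DIGRAPHS = {
    "\u05F0": "\u05D5\u05D5",  # double Vav -> Vav Vav
    "\u05F1": "\u05D5\u05D9",  # Vav Yod    -> Vav + Yod
    "\u05F2": "\u05D9\u05D9",  # double Yod -> Yod Yod
}
_DIGRAPH_TABLE = str.maketrans(DIGRAPHS)


def fold_yiddish_digraphs(text: str) -> str:
    """Expand Yiddish digraphs into letter pairs for comparison purposes."""
    # NFC first: U+FB1F decomposes canonically to U+05F2 + U+05B7.
    text = unicodedata.normalize("NFC", text)
    return text.translate(_DIGRAPH_TABLE)


print(fold_yiddish_digraphs("\uFB1F") == "\u05D9\u05D9\u05B7")  # True
```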
Gematria: Numerical Values
Hebrew letters have traditional numerical values, a system called gematria (גימטריה). This is used in religious texts, dates on Jewish gravestones, and page numbering in some Hebrew books:
| Letters | Values |
|---|---|
| א-ט | 1–9 |
| י-צ | 10–90 |
| ק-ת | 100–400 |
```python
GEMATRIA: dict[str, int] = {
    "\u05D0": 1, "\u05D1": 2, "\u05D2": 3, "\u05D3": 4,
    "\u05D4": 5, "\u05D5": 6, "\u05D6": 7, "\u05D7": 8,
    "\u05D8": 9, "\u05D9": 10, "\u05DB": 20, "\u05DC": 30,
    "\u05DE": 40, "\u05E0": 50, "\u05E1": 60, "\u05E2": 70,
    "\u05E4": 80, "\u05E6": 90, "\u05E7": 100, "\u05E8": 200,
    "\u05E9": 300, "\u05EA": 400,
    # Final forms take the same standard value as their regular forms
    "\u05DA": 20, "\u05DD": 40, "\u05DF": 50, "\u05E3": 80, "\u05E5": 90,
}


def gematria_value(word: str) -> int:
    """Calculate the gematria value of a Hebrew word."""
    return sum(GEMATRIA.get(ch, 0) for ch in word)


# שלום (shalom) = 300 + 30 + 6 + 40 = 376
print(gematria_value("\u05E9\u05DC\u05D5\u05DD"))  # 376
```
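The reverse direction, writing a number with letters as on gravestones and page numbers, can be sketched in the same style. The function below is an illustration under simplifying assumptions: it covers 1 to 999 only, omits the geresh and gershayim punctuation that usually marks such numerals, and writes 15 and 16 as 9+6 and 9+7 by convention. The name `to_hebrew_numeral` is hypothetical.

```python
UNITS = ["", "\u05D0", "\u05D1", "\u05D2", "\u05D3", "\u05D4",
         "\u05D5", "\u05D6", "\u05D7", "\u05D8"]           # 1-9
TENS = ["", "\u05D9", "\u05DB", "\u05DC", "\u05DE", "\u05E0",
        "\u05E1", "\u05E2", "\u05E4", "\u05E6"]            # 10-90
HUNDREDS = {100: "\u05E7", 200: "\u05E8", 300: "\u05E9"}   # 400 is handled below


def to_hebrew_numeral(n: int) -> str:
    """Write 1..999 with Hebrew letters (no geresh/gershayim added)."""
    if not 1 <= n <= 999:
        raise ValueError("only 1..999 is supported in this sketch")
    hundreds, rest = divmod(n, 100)
    parts = []
    h = hundreds * 100
    while h >= 400:          # 500-900 are written as Tav plus a remainder
        parts.append("\u05EA")
        h -= 400
    if h:
        parts.append(HUNDREDS[h])
    if rest in (15, 16):     # conventionally written 9+6 and 9+7
        parts.append("\u05D8" + ("\u05D5" if rest == 15 else "\u05D6"))
    else:
        tens, units = divmod(rest, 10)
        parts.append(TENS[tens] + UNITS[units])
    return "".join(parts)


print(to_hebrew_numeral(376) == "\u05E9\u05E2\u05D5")  # True
```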
Working with Hebrew in Code
Python
```python
import unicodedata

# Modern Hebrew (unpointed)
text = "\u05E9\u05DC\u05D5\u05DD"  # שלום (shalom)
print(len(text))  # 4: one code point per letter


# Strip nikkud from pointed text
def strip_nikkud(text: str) -> str:
    """Remove vowel points and cantillation marks."""
    return "".join(
        ch for ch in text
        if not (unicodedata.category(ch) == "Mn"
                and 0x0591 <= ord(ch) <= 0x05C7)
    )


pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"  # שָׁלוֹם
print(strip_nikkud(pointed))  # שלום
```
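Python's standard `re` module has no script-property escapes, so detecting Hebrew characters needs another approach. One rough approximation, shown here as a sketch with hypothetical function names, is to check the Unicode character name, since the names of Hebrew letters, points, accents, and presentation forms all begin with "HEBREW":

```python
import unicodedata


def is_hebrew(ch: str) -> bool:
    """Approximate a script check by looking at the Unicode character name."""
    return unicodedata.name(ch, "").startswith("HEBREW ")


def contains_hebrew(text: str) -> bool:
    return any(is_hebrew(ch) for ch in text)


print(contains_hebrew("abc \u05E9\u05DC\u05D5\u05DD"))  # True
print(contains_hebrew("abc"))                           # False
```

A range check on U+0590–U+05FF and U+FB1D–U+FB4F would be faster; the name-based version is simply easier to read.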
JavaScript
```javascript
// Match Hebrew characters
const hebrewPattern = /\p{Script=Hebrew}/u;
const text = "\u05E9\u05DC\u05D5\u05DD";
console.log(hebrewPattern.test(text)); // true

// Strip nikkud: combining marks only, so maqaf (U+05BE) and other
// Hebrew punctuation in the same range are kept
function stripNikkud(text) {
  return text
    .normalize("NFD")
    .replace(/[\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]/g, "");
}
console.log(stripNikkud("\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"));
// שלום
```
Summary
Hebrew script combines ancient tradition with modern practicality. Its Unicode encoding handles everything from casual Modern Hebrew text messages to fully pointed and accented biblical manuscripts. Key takeaways:
- Hebrew is an abjad: consonant-based writing with optional vowels (nikkud) encoded as combining marks
- Final forms are separate code points: unlike Arabic contextual shaping, Hebrew final letters (ך, ם, ן, ף, ץ) have their own code points
- Mark order matters: the conventional input order is base letter, shin/sin dot, dagesh, vowel, cantillation, but Unicode normalization reorders marks by combining class, so normalize before comparing
- Right-to-left handling requires proper dir="rtl" attributes and CSS logical properties
- Strip nikkud for search: Modern Hebrew text is usually unpointed, so search logic should normalize by removing combining marks
- Yiddish uses Hebrew letters differently: certain letters serve as vowels, and Yiddish has its own ligatures in the Presentation Forms block
- Biblical Hebrew adds cantillation marks on top of nikkud, potentially stacking 3+ combining marks per consonant, so ensure adequate line-height