📜 Script Stories

Writing Systems of the World

Many writing systems are in use around the world today, from alphabets and syllabaries to abjads and abugidas, and Unicode aims to encode all of them in a single standard. This overview explains the major types of writing systems, how they are classified, and which ones are currently supported by or still missing from Unicode.

Published 2024-01-15 · Updated 2025-10-06

Human writing systems are among the greatest inventions in history, and they come in far more varieties than most people realize. The Latin alphabet used by English is just one of many approaches to encoding spoken language in visual form. Across the world and throughout history, humans have developed alphabets, abjads, abugidas, syllabaries, and logographic systems — each with fundamentally different strategies for representing sound and meaning. Understanding these categories is essential for working with Unicode, which encodes all of them. This guide provides a comprehensive overview of the world's writing system types, with examples, comparisons, and Unicode encoding details.

The Five Major Types

Writing systems are classified by what each basic unit represents:

Type	Unit Represents	Vowels	Example Scripts
Alphabet	Individual consonants AND vowels	Written as full letters	Latin, Greek, Cyrillic, Armenian, Georgian
Abjad	Primarily consonants	Omitted or optional diacritics	Arabic, Hebrew, Syriac, Thaana
Abugida	Consonant-vowel syllables	Inherent vowel modified by marks	Devanagari, Thai, Ethiopic, Tibetan
Syllabary	Whole syllables	Integral to syllable sign	Japanese Kana, Cherokee, Yi
Logography	Words or morphemes	N/A (meaning-based)	Chinese characters, ancient Egyptian

Most real-world scripts are not pure examples of a single type. Japanese combines three strategies at once: logographic kanji, syllabic kana, and alphabetic romaji. English uses an alphabet but with logographic elements (& for "and"). Korean Hangul is an alphabet whose letters are grouped into syllable blocks. The classification is a spectrum, not a set of rigid boxes.

Alphabets

Definition

An alphabet writes both consonants and vowels as independent, co-equal letters. The word "alphabet" itself comes from the first two letters of the Greek alphabet: alpha (Α) and beta (Β).

Major Alphabets in Unicode

Script	Unicode Block(s)	Letters	Languages
Latin	Basic Latin, Latin Extended-A/B/Additional	26 base + hundreds of extended	English, Spanish, French, Turkish, Vietnamese, ...
Greek	Greek and Coptic	24	Greek
Cyrillic	Cyrillic, Cyrillic Supplement/Extended	33 (Russian) / 74+ total	Russian, Ukrainian, Bulgarian, Serbian, ...
Armenian	Armenian	38	Armenian
Georgian	Georgian, Georgian Extended	33 active	Georgian
Hangul*	Hangul Jamo, Hangul Syllables	24 jamo → 11,172 syllable blocks	Korean

*Hangul is sometimes classified as a featural alphabet because its letter shapes systematically reflect the phonetic features of the sounds they represent (place and manner of articulation).
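Because the 11,172 precomposed Hangul syllables occupy one contiguous range (U+AC00 to U+D7A3) in a fixed order, a syllable's code point can be computed from the indices of its initial consonant, vowel, and optional final consonant. The following is a minimal sketch of the composition arithmetic described in the Unicode Standard (19 initials × 21 vowels × 28 final slots, where slot 0 means "no final"); the helper name `compose_hangul` is hypothetical.

```python
import unicodedata

S_BASE = 0xAC00                                   # U+AC00 가, first precomposed syllable
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7   # conjoining jamo bases (T_BASE itself is not a jamo)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28            # initials, vowels, final slots (0 = no final)

def compose_hangul(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Build a precomposed syllable from jamo indices (hypothetical helper)."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT and 0 <= t_index < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

# 한 = initial ㅎ (index 18) + vowel ㅏ (index 0) + final ㄴ (index 4)
han = compose_hangul(18, 0, 4)
print(han, f"U+{ord(han):04X}")          # 한 U+D55C
print(L_COUNT * V_COUNT * T_COUNT)       # 11172
# Canonical decomposition (NFD) splits the block back into its three conjoining jamo
jamo = unicodedata.normalize("NFD", han)
print(jamo == chr(L_BASE + 18) + chr(V_BASE + 0) + chr(T_BASE + 4))  # True
```

The same arithmetic run in reverse (division and remainder by 28 and 21) recovers the jamo indices, which is how normalization can decompose every syllable block without a lookup table.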

Characteristics

Vowels and consonants have equal status as letters
Word spelling consists of a linear sequence of letters
Typically 20–40 letters
Case distinctions (uppercase/lowercase) in many alphabets
Unicode encodes letters individually; rendering is linear
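The case distinction and letter-by-letter encoding can be inspected with Python's standard `unicodedata` module. In this sketch, uppercase and lowercase letters report the General Category values `Lu` and `Ll`, while a Hebrew letter (an abjad, included only for contrast) reports `Lo`, "other letter", because Hebrew has no case. The sample characters are illustrative choices.

```python
import unicodedata

# Latin A/a, Greek Ω/ω, Cyrillic Ж/ж, Armenian Ա/ա, and Hebrew א for contrast
letters = ["A", "a", "\u03A9", "\u03C9", "\u0416", "\u0436", "\u0531", "\u0561", "\u05D0"]

# Each letter is one code point; General Category records its case:
# Lu = uppercase letter, Ll = lowercase letter, Lo = other letter (caseless)
for ch in letters:
    print(ch, f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

# Case mapping is not always one-to-one: German ß uppercases to two letters
print("straße".upper())                              # STRASSE
# casefold() is the mapping intended for caseless comparison
print("STRASSE".casefold() == "straße".casefold())   # True
```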

Abjads

Definition

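An abjad writes primarily consonants. Short vowels are either omitted entirely or indicated by optional diacritical marks, and some long vowels are spelled with consonant letters (such as Arabic alif, waw, and ya). Readers reconstruct the missing vowels from context and their knowledge of the language. The term "abjad" comes from the first four letters of the traditional Semitic letter order, as named in Arabic: alif, ba, jim, dal.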

Major Abjads in Unicode

Script	Unicode Block	Consonants	Vowel Marking	Direction
Arabic	Arabic, Arabic Supplement/Extended	28 base	Optional harakat (diacritics)	Right-to-left
Hebrew	Hebrew	22	Optional nikkud (diacritics)	Right-to-left
Syriac	Syriac	22	Dots/diacritics	Right-to-left
Thaana	Thaana	24	Obligatory diacritics	Right-to-left

Note that Thaana (used for Dhivehi/Maldivian) is sometimes classified as an alphabet rather than an abjad because its vowel diacritics are obligatory, not optional.
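Every script in the table above is written right to left, and Unicode records this per character in the Bidi_Class property used by the Unicode Bidirectional Algorithm. A quick check with Python's `unicodedata.bidirectional()` (one sample letter per script, chosen for illustration) should show Hebrew classed as `R` and Arabic, Syriac, and Thaana letters as `AL` (Arabic Letter), with Latin `L` for contrast.

```python
import unicodedata

# One sample letter per script (illustrative choices), plus Latin for contrast
samples = {
    "Arabic": "\u0628",   # ARABIC LETTER BEH
    "Hebrew": "\u05D0",   # HEBREW LETTER ALEF
    "Syriac": "\u0710",   # SYRIAC LETTER ALAPH
    "Thaana": "\u0780",   # THAANA LETTER HAA
    "Latin": "A",         # LATIN CAPITAL LETTER A
}
for script, ch in samples.items():
    # Bidi_Class: L = left-to-right, R = right-to-left, AL = right-to-left Arabic letter
    print(f"{script:<7} U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
```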

How Abjad Vowel Marking Works

Consider the Arabic root k-t-b (كتب), meaning "write":

Form	Arabic	Vowel Marks	Meaning
Unvoweled	كتب	None	"wrote" / "books" / "was written" (context-dependent)
Fully voweled	كَتَبَ	Fatha on each consonant	kataba: "he wrote"
Fully voweled	كُتُب	Damma on the first two consonants	kutub: "books"
Fully voweled	كُتِبَ	Damma, kasra, fatha	kutiba: "was written"

In practice, most Arabic text is written without vowel marks. Only the Quran, children's books, poetry, and texts for learners are fully voweled.

Unicode Encoding

In Unicode, abjad vowel marks are encoded as combining characters that follow the base consonant:

import unicodedata

# Arabic: consonant + vowel mark (combining character)
# Ba + fatha = "ba" sound
ba = "\u0628"        # ARABIC LETTER BEH (the letter ba)
fatha = "\u064E"     # ARABIC FATHA (combining mark)
ba_with_fatha = ba + fatha
print(ba_with_fatha)                # بَ
print(len(ba_with_fatha))           # 2 code points, 1 grapheme cluster
print(unicodedata.name(fatha))      # ARABIC FATHA
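Because harakat are separate combining code points, removing them from voweled text recovers the unvoweled consonant skeleton that most Arabic text uses. The sketch below (the helper name `strip_harakat` is hypothetical) simply drops every nonspacing mark (General Category `Mn`), so it would also remove marks such as shadda and sukun; a production version might target a specific set of Arabic marks instead. Applied to the three voweled forms of k-t-b from the table above, it yields one shared skeleton.

```python
import unicodedata

def strip_harakat(text: str) -> str:
    """Drop all nonspacing marks (category Mn), keeping the base letters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

voweled = [
    "\u0643\u064E\u062A\u064E\u0628\u064E",  # kataba: KAF+FATHA, TEH+FATHA, BEH+FATHA
    "\u0643\u064F\u062A\u064F\u0628",        # kutub:  KAF+DAMMA, TEH+DAMMA, BEH
    "\u0643\u064F\u062A\u0650\u0628\u064E",  # kutiba: KAF+DAMMA, TEH+KASRA, BEH+FATHA
]
skeletons = {strip_harakat(word) for word in voweled}
print(len(skeletons))                          # 1: all three forms share one skeleton
print(skeletons == {"\u0643\u062A\u0628"})     # True (كتب, k-t-b)
```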

Abugidas

Definition

An abugida (also called an alphasyllabary) writes consonant-vowel units. Each consonant letter has an inherent vowel (typically /a/), and other vowels are indicated by modifying the consonant sign — adding marks above, below, before, or after. The term comes from Ethiopic letter names: a, bu, gi, da.

Major Abugidas in Unicode

Script	Unicode Block(s)	Inherent Vowel	Languages
Devanagari	Devanagari	/a/	Hindi, Sanskrit, Marathi, Nepali
Bengali	Bengali	/ɔ/	Bengali, Assamese
Tamil	Tamil	/a/	Tamil
Thai	Thai	/a/ or /o/ (context-dependent)	Thai
Tibetan	Tibetan	/a/	Tibetan, Dzongkha
Ethiopic	Ethiopic	/a/ (1st order)	Amharic, Tigrinya
Khmer	Khmer	/a/ or /o/	Khmer (Cambodian)

How Abugidas Work

Using Devanagari (Hindi) as an example:

Written	Code Points	Sound	Explanation
क	U+0915	/ka/	Consonant with inherent vowel /a/
कि	U+0915 U+093F	/ki/	Vowel sign is displayed BEFORE the consonant but stored after it
कु	U+0915 U+0941	/ku/	Vowel sign appears BELOW consonant
की	U+0915 U+0940	/ki:/	Long vowel sign AFTER consonant
क्	U+0915 U+094D	/k/	Virama "kills" inherent vowel

The virama (halant) is crucial — it suppresses the inherent vowel, allowing bare consonants and consonant clusters (conjuncts). Rendering engines must handle complex rules for when to show a virama and when to form a ligature.
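Unicode stores Devanagari in logical (pronunciation) order, not visual order: in कि the vowel sign is drawn to the left of the consonant but follows it in memory, and a conjunct is stored as consonant + virama + consonant. A short sketch that lists the code points of a few strings (the helper `code_points` is hypothetical) makes the stored order visible; whether क्ष actually displays as a ligature depends on the font and shaping engine.

```python
import unicodedata

def code_points(s: str) -> list[str]:
    """Describe each code point of s in logical (storage) order."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s]

for sample in ("\u0915\u093F",         # कि: vowel sign drawn left of KA, stored after it
               "\u0915\u094D",         # क्: KA + VIRAMA, a bare /k/
               "\u0915\u094D\u0937"):  # क्ष: KA + VIRAMA + SSA, normally shown as a conjunct
    print(sample, " + ".join(code_points(sample)))
```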

Encoding Difference: Indic vs. Ethiopic

Indic abugidas use base consonant + combining vowel marks (compositional encoding). Ethiopic uses precomposed syllable characters (each consonant-vowel combination is a separate code point). Both are abugidas, but their Unicode encodings follow different strategies.
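Normalization makes the difference concrete. In this sketch, a Devanagari consonant + vowel-sign pair stays two code points under both NFC and NFD because Unicode defines no precomposed consonant-plus-vowel-sign characters for Devanagari, while an Ethiopic syllable such as ሉ (lu) is a single code point with no canonical decomposition. The sample characters are illustrative.

```python
import unicodedata

samples = {
    "Devanagari ki": "\u0915\u093F",  # LETTER KA + VOWEL SIGN I: compositional
    "Ethiopic lu": "\u1209",          # ETHIOPIC SYLLABLE LU: precomposed
}
for label, s in samples.items():
    nfc = unicodedata.normalize("NFC", s)
    nfd = unicodedata.normalize("NFD", s)
    # Neither form changes these strings: the Indic pair has no precomposed
    # equivalent, and the Ethiopic syllable has no decomposition.
    print(label, len(nfc), len(nfd), [unicodedata.name(c) for c in s])
```

Hangul sits between the two models: its syllable blocks are precomposed like Ethiopic, yet each block canonically decomposes into conjoining jamo.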

Syllabaries

Definition

A syllabary has one distinct symbol for each syllable in the language. In a pure syllabary there is no systematic graphical relationship between the sign for "ka" and the signs for "ki" or "ta"; each is an independent symbol.

Major Syllabaries in Unicode

Script	Unicode Block	Syllables	Languages
Hiragana	Hiragana	46 base	Japanese (native words)
Katakana	Katakana	46 base	Japanese (loanwords, emphasis)
Cherokee	Cherokee	85	Cherokee
Yi	Yi Syllables	1,165	Yi (Nuosu), China
Cypriot	Cypriot Syllabary	55	Ancient Cypriot Greek
Linear B	Linear B Syllabary	87	Mycenaean Greek

Japanese Kana

Japanese Hiragana and Katakana are the best-known modern syllabaries. Each has 46 base characters: the five vowels, the moraic nasal (ん/ン), and 40 consonant-vowel (CV) syllables:

Hiragana	Katakana	Syllable
あ	ア	a
か	カ	ka
さ	サ	sa
た	タ	ta
な	ナ	na
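The Hiragana and Katakana blocks are laid out in parallel, so for the letters from U+3041 to U+3096 each katakana character sits exactly 0x60 code points above its hiragana counterpart (あ U+3042 and ア U+30A2, for example). A minimal conversion sketch under that assumption; the function name is hypothetical, and it deliberately leaves iteration marks and other kana-block punctuation untouched.

```python
def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana letters U+3041..U+3096 up by 0x60; leave everything else alone."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )

print(hiragana_to_katakana("あかさたな"))  # アカサタナ
```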

Additional syllables are created with dakuten (゛, voicing mark) and handakuten (゜, semi-voicing mark): か (ka) → が (ga) and は (ha) → ば (ba) with dakuten, while は (ha) → ぱ (pa) with handakuten.
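In Unicode, voiced kana such as が are encoded as precomposed characters, but each is canonically equivalent to its base kana followed by a combining mark: U+3099 (dakuten) or U+309A (handakuten). These combining marks are distinct from the standalone spacing forms ゛ (U+309B) and ゜ (U+309C). A short normalization sketch:

```python
import unicodedata

VOICED = "\u3099"       # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (dakuten)
SEMI_VOICED = "\u309A"  # COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK (handakuten)

# NFD splits a precomposed voiced kana into base kana + combining mark
print([hex(ord(c)) for c in unicodedata.normalize("NFD", "が")])  # ['0x304b', '0x3099']
# NFC recombines base + mark into the single precomposed character
print(unicodedata.normalize("NFC", "は" + VOICED))       # ば
print(unicodedata.normalize("NFC", "は" + SEMI_VOICED))  # ぱ
```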

Cherokee Syllabary

Created by Sequoyah in 1821, the Cherokee syllabary is one of the few modern writing systems invented by a single individual. Its 85 characters represent the syllables of the Cherokee language. Remarkably, Sequoyah was illiterate in English when he created the system — some letters resemble Latin letters but represent completely different sounds.

Logographies

Definition

A logographic system uses symbols that represent words or morphemes (meaningful units) rather than sounds. The best-known example is Chinese characters (hanzi/kanji/hanja).

CJK in Unicode

Block Family	Code Points	Languages
CJK Unified Ideographs	97,000+ (across multiple blocks)	Chinese, Japanese, Korean, Vietnamese
CJK Compatibility Ideographs	~1,000	Legacy compatibility
CJK Radicals	~300	Components/indexing

CJK characters are the largest single category in Unicode. The Han Unification principle merges characters that are considered the same across Chinese, Japanese, Korean, and Vietnamese traditions into single code points, despite visual variations between regional typefaces.
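The size of the Han repertoire can be checked against whatever Unicode version a Python build ships. This sketch counts code points whose names start with `CJK UNIFIED IDEOGRAPH-` or `CJK COMPATIBILITY IDEOGRAPH-`; the name prefix only approximates the block families in the table above, and the totals depend on `unicodedata.unidata_version`, so no specific numbers are assumed here. The last line illustrates Han Unification: the character for "mountain" has one code point shared by Chinese, Japanese, and Korean text.

```python
import sys
import unicodedata

prefixes = ("CJK UNIFIED IDEOGRAPH-", "CJK COMPATIBILITY IDEOGRAPH-")
counts = dict.fromkeys(prefixes, 0)

# Scan every code point once (takes a second or so); unnamed code points yield ""
for cp in range(sys.maxunicode + 1):
    name = unicodedata.name(chr(cp), "")
    for prefix in prefixes:
        if name.startswith(prefix):
            counts[prefix] += 1

print("Unicode data version:", unicodedata.unidata_version)
for prefix, n in counts.items():
    print(f"{prefix.rstrip('-')}: {n:,}")
# Han Unification: one code point for "mountain" across Chinese, Japanese, and Korean
print(unicodedata.name("山"))  # CJK UNIFIED IDEOGRAPH-5C71
```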

How Logographies Work

Each character carries a meaning and one or more pronunciations, which differ from language to language:

Character	Meaning	Mandarin (pinyin)	Japanese On/Kun	Korean
山	mountain	shān	san / yama	san
水	water	shuǐ	sui / mizu	su
火	fire	huǒ	ka / hi	hwa
木	tree/wood	mù	moku / ki	mok

Chinese uses characters almost exclusively. Japanese mixes characters (kanji) with two syllabaries (hiragana and katakana) plus Latin letters (romaji). Korean historically used Chinese characters (hanja) alongside Hangul but now uses Hangul almost exclusively.

Mixed and Hybrid Systems

Many real-world writing practices combine multiple script types:

Language	Systems Used	Classification
Japanese	Kanji (logographic) + Hiragana (syllabary) + Katakana (syllabary) + Romaji (alphabet)	Mixed
Korean	Hangul (alphabetic syllable blocks) + occasional Hanja (logographic)	Primarily alphabetic
Hindi	Devanagari (abugida) + Arabic numerals	Primarily abugida
Arabic	Arabic consonants (abjad) + optional vowel marks + Arabic-Indic numerals	Primarily abjad
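Software often needs to tell which writing systems a mixed string contains. Python's standard `unicodedata` module does not expose the Script property, so the sketch below uses a crude stand-in: the first word of each character's Unicode name (the helper `name_prefix` is hypothetical). It is a heuristic only; for real script detection the Script property itself is needed, which, for example, the third-party `regex` package exposes through patterns such as `\p{Han}`.

```python
import unicodedata
from collections import Counter

def name_prefix(ch: str) -> str:
    """Heuristic script guess: first word of the character's Unicode name.

    Not the real Script property; e.g. digits report "DIGIT" and the kana
    prolonged sound mark reports "KATAKANA-HIRAGANA".
    """
    name = unicodedata.name(ch, "")
    return name.split()[0] if name else "UNKNOWN"

# A Japanese-style mix: kanji, hiragana, katakana, and Latin letters
text = "漢字とカタカナとPython"
tally = Counter(name_prefix(ch) for ch in text)
print(tally)
```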

Unicode's Approach to Diversity

Unicode handles this diversity through several strategies:

Strategy	Example
Compositional encoding	Indic vowels as combining marks on consonant bases
Precomposed encoding	Ethiopic syllables, Hangul syllable blocks
Separate blocks per script	Armenian, Georgian, Thai each in dedicated blocks
Han Unification	Shared CJK code points across Chinese/Japanese/Korean
Properties and algorithms	Script property, bidirectional algorithm, line breaking

Key Takeaways

The world's writing systems fall into five major types: alphabets (consonants + vowels as equal letters), abjads (consonants primary, vowels optional), abugidas (consonant-vowel units with inherent vowel), syllabaries (one symbol per syllable), and logographies (symbols for words/morphemes).
Most scripts are not pure types: Japanese combines logographic, syllabic, and alphabetic writing; Korean is an alphabet arranged in syllable blocks; and Thaana is an abjad with mandatory vowels.
Unicode handles this diversity through both compositional (base + combining marks) and precomposed (single code point per unit) encoding strategies.
CJK ideographs make up the largest share of Unicode's encoded characters (97,000+ unified ideographs), reflecting the logographic nature of Chinese characters as used in Chinese, Japanese, and Korean writing.
Understanding writing system typology is essential for correct text rendering, searching, sorting, and line breaking in internationalized software.
The Unicode Standard assigns a Script property to every character, enabling programmatic identification of which writing system a character belongs to.