
Writing Systems of the World

There are hundreds of writing systems in use around the world today, from alphabets and syllabaries to abjads and abugidas, and Unicode aims to encode all of them in a single standard. This overview explains the major types of writing systems, how they are classified, and which ones are currently supported or missing from Unicode.


Human writing systems are among the greatest inventions in history, and they come in far more varieties than most people realize. The Latin alphabet used by English is just one of many approaches to encoding spoken language in visual form. Across the world and throughout history, humans have developed alphabets, abjads, abugidas, syllabaries, and logographic systems — each with fundamentally different strategies for representing sound and meaning. Understanding these categories is essential for working with Unicode, which encodes all of them. This guide provides a comprehensive overview of the world's writing system types, with examples, comparisons, and Unicode encoding details.

The Five Major Types

Writing systems are classified by what each basic unit represents:

Type | Unit Represents | Vowels | Example Scripts
Alphabet | Individual consonants AND vowels | Written as full letters | Latin, Greek, Cyrillic, Armenian, Georgian
Abjad | Primarily consonants | Omitted or optional diacritics | Arabic, Hebrew, Syriac, Thaana
Abugida | Consonant-vowel syllables | Inherent vowel modified by marks | Devanagari, Thai, Ethiopic, Tibetan
Syllabary | Whole syllables | Integral to syllable sign | Japanese Kana, Cherokee, Yi
Logography | Words or morphemes | N/A (meaning-based) | Chinese characters, ancient Egyptian

Most real-world scripts are not pure examples of a single type. Japanese combines three of these strategies in everyday text: logographic kanji, syllabic kana, and alphabetic romaji. English uses an alphabet but with logographic elements (& for "and"). Korean Hangul is an alphabet whose letters are grouped into syllable blocks. The classification is a spectrum, not a set of rigid boxes.

Alphabets

Definition

An alphabet writes both consonants and vowels as independent, co-equal letters. The word "alphabet" itself comes from the first two letters of the Greek alphabet: alpha (Α) and beta (Β).

Major Alphabets in Unicode

Script | Unicode Block(s) | Letters | Languages
Latin | Basic Latin, Latin Extended-A/B/Additional | 26 base + hundreds of extended | English, Spanish, French, Turkish, Vietnamese, ...
Greek | Greek and Coptic | 24 | Greek
Cyrillic | Cyrillic, Cyrillic Supplement/Extended | 33 (Russian) / 74+ total | Russian, Ukrainian, Bulgarian, Serbian, ...
Armenian | Armenian | 38 | Armenian
Georgian | Georgian, Georgian Extended | 33 active | Georgian
Hangul* | Hangul Jamo, Hangul Syllables | 24 jamo → 11,172 syllable blocks | Korean

* Hangul is sometimes classified as a featural alphabet because its letter shapes systematically reflect the phonetic features of the sounds they represent (place and manner of articulation).
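The 11,172 figure for Hangul follows from arithmetic defined in the Unicode Standard: 19 leading consonants × 21 vowels × 28 trailing options (27 final consonants or clusters, plus "none"). These counts are larger than the 24 basic jamo because doubled and compound letters are counted as separate positions. The sketch below is a minimal illustration; the function name compose_syllable is hypothetical and not part of any library.

```python
import unicodedata

# Constants from the Unicode Standard's Hangul syllable composition rules.
S_BASE = 0xAC00                          # first precomposed syllable (가)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # leads, vowels, trails (incl. "none")

def compose_syllable(lead: int, vowel: int, trail: int = 0) -> str:
    """Return the precomposed syllable for zero-based jamo indices."""
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + trail)

total_blocks = L_COUNT * V_COUNT * T_COUNT
print(total_blocks)  # 11172

# ㅎ (lead index 18) + ㅏ (vowel index 0) + ㄴ (trail index 4)
han = compose_syllable(18, 0, 4)
print(han, f"U+{ord(han):04X}")  # 한 U+D55C

# NFD normalization splits the block back into its conjoining jamo.
jamo = unicodedata.normalize("NFD", han)
print([f"U+{ord(ch):04X}" for ch in jamo])  # ['U+1112', 'U+1161', 'U+11AB']
```

Both forms are canonically equivalent, so NFC normalization turns the three-jamo sequence back into the single syllable code point.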

Characteristics

  • Vowels and consonants have equal status as letters
  • Word spelling consists of a linear sequence of letters
  • Typically 20–40 letters
  • Case distinctions (uppercase/lowercase) in many alphabets
  • Unicode encodes letters individually; rendering is linear
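As a quick illustration of the case distinction, Python's built-in str.upper() applies Unicode case mappings, so it can show which scripts have case at all. This is a rough sketch with arbitrarily chosen sample strings; comparing a string with its uppercase form is only a heuristic for "has case".

```python
samples = {
    "Greek": "αβγ",
    "Cyrillic": "абв",
    "Armenian": "աբգ",
    "Hangul": "한글",   # alphabetic, but without a case distinction
}

# A script "has case" here if uppercasing changes the sample text.
has_case = {script: text.upper() != text for script, text in samples.items()}

for script, text in samples.items():
    print(script, text, "->", text.upper(), has_case[script])
```

The first three samples change when uppercased, while the Hangul sample is returned unchanged.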

Abjads

Definition

An abjad writes consonants only. Vowels are either omitted entirely or indicated by optional diacritical marks. Readers reconstruct vowels from context and their knowledge of the language. The term "abjad" comes from the Arabic letter names: alif, ba, jim, dal.

Major Abjads in Unicode

Script | Unicode Block | Consonants | Vowel Marking | Direction
Arabic | Arabic, Arabic Supplement/Extended | 28 base | Optional harakat (diacritics) | Right-to-left
Hebrew | Hebrew | 22 | Optional nikkud (diacritics) | Right-to-left
Syriac | Syriac | 22 | Dots/diacritics | Right-to-left
Thaana | Thaana | 24 | Obligatory diacritics | Right-to-left

Note that Thaana (used for Dhivehi/Maldivian) is sometimes classified as an alphabet rather than an abjad because its vowel diacritics are obligatory, not optional.

How Abjad Vowel Marking Works

Consider the Arabic root k-t-b (كتب), meaning "write":

Form | Arabic | Vowel Marks | Meaning
Unvoweled | كتب | None | "wrote" / "books" / "was written" (context-dependent)
Fully voweled | كَتَبَ | Fatha on each consonant | kataba, "he wrote"
Fully voweled | كُتُب | Damma on the first two consonants | kutub, "books"
Fully voweled | كُتِبَ | Damma, kasra, fatha | kutiba, "was written"

In practice, most Arabic text is written without vowel marks. Only the Quran, children's books, poetry, and texts for learners are fully voweled.

Unicode Encoding

In Unicode, abjad vowel marks are encoded as combining characters that follow the base consonant:

# Arabic: consonant + vowel mark (combining character)
# Beh + fatha = "ba" sound
ba = "\u0628"        # ARABIC LETTER BEH
fatha = "\u064E"     # ARABIC FATHA (combining mark)
ba_with_fatha = ba + fatha
print(ba_with_fatha)  # بَ
print(len(ba_with_fatha))  # 2 (two code points forming one grapheme cluster)
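Because the vowel marks are separate code points, voweled and unvoweled spellings of the same word do not compare equal. One common workaround for searching is to strip the marks first. The sketch below assumes that dropping every nonspacing mark (general category Mn) is acceptable; that is a blunt tool, since Mn also covers marks in other scripts that carry essential information. The helper name strip_marks is illustrative only.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Drop nonspacing marks (category Mn), which include the Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")

kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"   # كَتَبَ  kataba, "he wrote"
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"   # كُتِبَ  kutiba, "was written"
bare = "\u0643\u062A\u0628"                       # كتب   unvoweled

print(kataba == bare)                # False
print(strip_marks(kataba) == bare)   # True
print(strip_marks(kutiba) == bare)   # True
print(len(kataba), len(bare))        # 6 3
```

After stripping, both voweled forms collapse to the same three consonants, which mirrors how a reader of unvoweled text must rely on context.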

Abugidas

Definition

An abugida (also called an alphasyllabary) writes consonant-vowel units. Each consonant letter has an inherent vowel (typically /a/), and other vowels are indicated by modifying the consonant sign — adding marks above, below, before, or after. The term comes from Ethiopic letter names: a, bu, gi, da.

Major Abugidas in Unicode

Script | Unicode Block(s) | Inherent Vowel | Languages
Devanagari | Devanagari | /a/ | Hindi, Sanskrit, Marathi, Nepali
Bengali | Bengali | /a/ or /o/ | Bengali, Assamese
Tamil | Tamil | /a/ | Tamil
Thai | Thai | /a/ or /o/ (context-dependent) | Thai
Tibetan | Tibetan | /a/ | Tibetan, Dzongkha
Ethiopic | Ethiopic | /a/ (1st order) | Amharic, Tigrinya
Khmer | Khmer | /a/ or /o/ | Khmer (Cambodian)

How Abugidas Work

Using Devanagari (Hindi) as an example:

Written | Code Points | Sound | Explanation
क | U+0915 | /ka/ | Consonant with inherent vowel /a/
कि | U+0915 U+093F | /ki/ | Vowel sign appears BEFORE consonant
कु | U+0915 U+0941 | /ku/ | Vowel sign appears BELOW consonant
की | U+0915 U+0940 | /ki:/ | Long vowel sign AFTER consonant
क् | U+0915 U+094D | /k/ | Virama "kills" inherent vowel

The virama (halant) is crucial — it suppresses the inherent vowel, allowing bare consonants and consonant clusters (conjuncts). Rendering engines must handle complex rules for when to show a virama and when to form a ligature.
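The stored order of code points can differ from what appears on screen. The short sketch below uses Python's standard unicodedata module to list the code points behind two Devanagari sequences: a vowel sign that is drawn before its consonant but stored after it, and a conjunct built with a virama. The sample sequences are chosen for illustration.

```python
import unicodedata

ki = "\u0915\u093F"           # कि : drawn with the vowel sign to the left of KA
ksha = "\u0915\u094D\u0937"   # क्ष : KA + VIRAMA + SSA, usually drawn as one conjunct

for label, text in (("ki", ki), ("ksha", ksha)):
    names = [unicodedata.name(ch) for ch in text]
    print(label, text, len(text), names)
```

In both cases the consonant comes first in memory. Reordering the vowel sign and forming the ligature are left to the rendering engine and font.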

Encoding Difference: Indic vs. Ethiopic

Indic abugidas use base consonant + combining vowel marks (compositional encoding). Ethiopic uses precomposed syllable characters (each consonant-vowel combination is a separate code point). Both are abugidas, but their Unicode encodings follow different strategies.
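The contrast between the two strategies shows up directly in code point counts. The following sketch compares a Devanagari consonant-plus-vowel sequence with an Ethiopic syllable, and lists the first seven code points of one Ethiopic consonant series. The choice of the "l" series is arbitrary.

```python
import unicodedata

ku_devanagari = "\u0915\u0941"   # कु : KA + VOWEL SIGN U
lu_ethiopic = "\u1209"           # ሉ : one precomposed syllable

print(len(ku_devanagari), len(lu_ethiopic))   # 2 1
print(unicodedata.name(lu_ethiopic))          # ETHIOPIC SYLLABLE LU

# Ethiopic syllables have no canonical decomposition, so NFD leaves them intact.
print(unicodedata.normalize("NFD", lu_ethiopic) == lu_ethiopic)   # True

# The vowel orders of one consonant series sit at consecutive code points.
l_series = [chr(0x1208 + order) for order in range(7)]
for ch in l_series:
    print(f"U+{ord(ch):04X}", ch, unicodedata.name(ch))
```

A practical consequence is that the consonant of an Ethiopic syllable cannot be recovered by normalization; it has to be derived from the code point layout or a lookup table.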

Syllabaries

Definition

A syllabary has one distinct symbol for each syllable in the language. There is no systematic relationship between the sign for "ka" and the signs for "ki" or "ta" — each is an independent symbol.

Major Syllabaries in Unicode

Script | Unicode Block | Syllables | Languages
Hiragana | Hiragana | 46 base | Japanese (native words)
Katakana | Katakana | 46 base | Japanese (loanwords, emphasis)
Cherokee | Cherokee | 85 | Cherokee
Yi | Yi Syllables | 1,165 | Yi (Nuosu), China
Cypriot | Cypriot Syllabary | 55 | Ancient Cypriot Greek
Linear B | Linear B Syllabary | 87 | Mycenaean Greek

Japanese Kana

Japanese Hiragana and Katakana are the best-known modern syllabaries. Each has 46 base characters, most of which represent CV (consonant-vowel) syllables; the rest are the five bare vowels and the syllabic nasal n:

Hiragana | Katakana | Syllable
あ | ア | a
か | カ | ka
さ | サ | sa
た | タ | ta
な | ナ | na

Additional syllables are created through dakuten (゛, voicing mark) and handakuten (゜, semi-voicing mark): か (ka) → が (ga), は (ha) → ば (ba) → ぱ (pa).
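In Unicode, the voiced and semi-voiced kana are encoded as precomposed characters that decompose canonically into a base kana plus a combining mark (U+3099 or U+309A, the combining counterparts of the spacing marks ゛ and ゜). The sketch below shows this with the standard unicodedata module. It also uses the fixed 0x60 offset between the Hiragana and Katakana blocks; the helper hiragana_to_katakana is a hypothetical name for illustration.

```python
import unicodedata

ga = "\u304C"   # が
pa = "\u3071"   # ぱ

ga_parts = unicodedata.normalize("NFD", ga)   # か + combining dakuten
pa_parts = unicodedata.normalize("NFD", pa)   # は + combining handakuten
print([unicodedata.name(ch) for ch in ga_parts])
print([unicodedata.name(ch) for ch in pa_parts])
print(unicodedata.normalize("NFC", ga_parts) == ga)   # True

def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana letters (U+3041..U+3096) by 0x60 into the Katakana block."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )

print(hiragana_to_katakana("かな"))   # カナ
```

Text from different sources may use either the precomposed or the decomposed form, so normalizing before comparison avoids false mismatches.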

Cherokee Syllabary

Created by Sequoyah in 1821, the Cherokee syllabary is one of the few modern writing systems invented by a single individual. Its 85 characters represent the syllables of the Cherokee language. Remarkably, Sequoyah was illiterate in English when he created the system — some letters resemble Latin letters but represent completely different sounds.

Logographies

Definition

A logographic system uses symbols that represent words or morphemes (meaningful units) rather than sounds. The best-known example is Chinese characters (hanzi/kanji/hanja).

CJK in Unicode

Block Family | Code Points | Languages / Use
CJK Unified Ideographs | 97,000+ (across multiple blocks) | Chinese, Japanese, Korean, Vietnamese
CJK Compatibility Ideographs | ~1,000 | Legacy compatibility
CJK Radicals | ~300 | Components/indexing

CJK characters are the largest single category in Unicode. The Han Unification principle merges characters that are considered the same across Chinese, Japanese, Korean, and Vietnamese traditions into single code points, despite visual variations between regional typefaces.
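A unified ideograph has the same code point no matter which language the text is in, and its character name is derived from that code point. The sketch below prints a few examples, then shows one compatibility ideograph being replaced by its unified equivalent under normalization. The specific characters are chosen for illustration.

```python
import unicodedata

# Each character has a single code point shared by all CJK languages.
for ch in "山水火木":
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))

# U+F900 is a compatibility ideograph that is canonically equivalent to U+8C48.
compat = "\uF900"
unified = unicodedata.normalize("NFC", compat)
print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")   # U+F900 -> U+8C48
```

Which regional glyph shape is displayed for a unified code point depends on the font and on language tagging, not on the code point itself.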

How Logographies Work

Each character carries a meaning and one or more readings, which differ from language to language:

Character | Meaning | Mandarin | Japanese On/Kun | Korean
山 | mountain | shan | san / yama | san
水 | water | shui | sui / mizu | su
火 | fire | huo | ka / hi | hwa
木 | tree/wood | mu | moku / ki | mok

Chinese uses characters almost exclusively. Japanese mixes characters (kanji) with two syllabaries (hiragana and katakana) plus Latin letters (romaji). Korean historically used Chinese characters (hanja) alongside Hangul but now uses Hangul almost exclusively.

Mixed and Hybrid Systems

Many real-world writing practices combine multiple script types:

Language | Systems Used | Classification
Japanese | Kanji (logographic) + Hiragana (syllabary) + Katakana (syllabary) + Romaji (alphabet) | Mixed
Korean | Hangul (alphabetic syllable blocks) + occasional Hanja (logographic) | Primarily alphabetic
Hindi | Devanagari (abugida) + Arabic numerals | Primarily abugida
Arabic | Arabic consonants (abjad) + optional vowel marks + Arabic-Indic numerals | Primarily abjad
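The mix of systems in a single sentence can be made visible by tallying characters per script. Python's standard library does not expose the Unicode Script property, so the sketch below takes the first word of each character's name as a rough stand-in. That is an assumption for illustration only and will mislabel some characters, such as shared punctuation and marks. The function name rough_script and the sample sentence are likewise hypothetical.

```python
import unicodedata
from collections import Counter

def rough_script(ch: str) -> str:
    """Guess a script label from the first word of the character name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"

# Sample sentence mixing kanji, hiragana, katakana, and Latin letters.
text = "東京でUnicodeとカタカナを学ぶ"
counts = Counter(rough_script(ch) for ch in text)

for script, n in counts.most_common():
    print(script, n)
```

For this sentence the tally is 7 LATIN, 4 HIRAGANA, 4 KATAKANA, and 3 CJK characters, matching the "Mixed" classification in the table.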

Unicode's Approach to Diversity

Unicode handles this diversity through several strategies:

Strategy | Example
Compositional encoding | Indic vowels as combining marks on consonant bases
Precomposed encoding | Ethiopic syllables, Hangul syllable blocks
Separate blocks per script | Armenian, Georgian, Thai each in dedicated blocks
Han Unification | Shared CJK code points across Chinese/Japanese/Korean
Properties and algorithms | Script property, bidirectional algorithm, line breaking
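As one concrete instance of the "properties and algorithms" strategy, every character carries a bidirectional class that the bidirectional algorithm uses to lay out mixed left-to-right and right-to-left text. The sketch below reads that class with unicodedata.bidirectional() from Python's standard library. The sample characters are chosen for illustration.

```python
import unicodedata

samples = {
    "Latin a": "a",
    "Hebrew alef": "\u05D0",
    "Arabic beh": "\u0628",
    "European digit 1": "1",
    "Arabic-Indic digit 1": "\u0661",
    "Arabic fatha": "\u064E",
}

# Map each sample to its Bidi_Class value.
bidi = {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}

for label, cls in bidi.items():
    print(f"{label}: {cls}")
```

The classes printed are L (left-to-right), R (right-to-left), AL (Arabic letter), EN (European number), AN (Arabic number), and NSM (nonspacing mark), in that order.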

Key Takeaways

  • The world's writing systems fall into five major types: alphabets (consonants + vowels as equal letters), abjads (consonants primary, vowels optional), abugidas (consonant-vowel units with inherent vowel), syllabaries (one symbol per syllable), and logographies (symbols for words/morphemes).
  • Most scripts are not pure types: Japanese mixes logographic kanji, two syllabaries, and alphabetic romaji; Korean is an alphabet arranged in syllable blocks; and Thaana is an abjad with mandatory vowels.
  • Unicode handles this diversity through both compositional (base + combining marks) and precomposed (single code point per unit) encoding strategies.
  • CJK characters constitute the largest portion of Unicode (97,000+ code points) because Chinese characters, which are also used in the Japanese, Korean, and Vietnamese traditions, are logographic.
  • Understanding writing system typology is essential for correct text rendering, searching, sorting, and line breaking in internationalized software.
  • The Unicode Standard assigns a Script property to every character, enabling programmatic identification of which writing system a character belongs to.
