Writing Systems of the World
There are hundreds of writing systems in use around the world today, from alphabets and syllabaries to abjads and abugidas, and Unicode aims to encode all of them in a single standard. This overview explains the major types of writing systems, how they are classified, and how Unicode encodes each of them.
Human writing systems are among the greatest inventions in history, and they come in far more varieties than most people realize. The Latin alphabet used by English is just one of many approaches to encoding spoken language in visual form. Across the world and throughout history, humans have developed alphabets, abjads, abugidas, syllabaries, and logographic systems — each with fundamentally different strategies for representing sound and meaning. Understanding these categories is essential for working with Unicode, which encodes all of them. This guide provides a comprehensive overview of the world's writing system types, with examples, comparisons, and Unicode encoding details.
The Five Major Types
Writing systems are classified by what each basic unit represents:
| Type | Unit Represents | Vowels | Example Scripts |
|---|---|---|---|
| Alphabet | Individual consonants AND vowels | Written as full letters | Latin, Greek, Cyrillic, Armenian, Georgian |
| Abjad | Primarily consonants | Omitted or optional diacritics | Arabic, Hebrew, Syriac, Thaana |
| Abugida | Consonant-vowel syllables | Inherent vowel modified by marks | Devanagari, Thai, Ethiopic, Tibetan |
| Syllabary | Whole syllables | Integral to syllable sign | Japanese Kana, Cherokee, Yi |
| Logography | Words or morphemes | N/A (meaning-based) | Chinese characters, ancient Egyptian |
Most real-world scripts are not pure examples of a single type. Japanese combines three of the five strategies: logographic kanji, two syllabaries (hiragana and katakana), and alphabetic romaji. English uses an alphabet but with logographic elements (& for "and"). Korean Hangul is an alphabet whose letters are grouped into syllable blocks. The classification is a spectrum, not a set of rigid boxes.
Alphabets
Definition
An alphabet writes both consonants and vowels as independent, co-equal letters. The word "alphabet" itself comes from the first two letters of the Greek alphabet: alpha (Α) and beta (Β).
Major Alphabets in Unicode
| Script | Unicode Block(s) | Letters | Languages |
|---|---|---|---|
| Latin | Basic Latin, Latin Extended-A/B/Additional | 26 base + hundreds of extended | English, Spanish, French, Turkish, Vietnamese, ... |
| Greek | Greek and Coptic | 24 | Greek |
| Cyrillic | Cyrillic, Cyrillic Supplement/Extended | 33 (Russian) / 74+ total | Russian, Ukrainian, Bulgarian, Serbian, ... |
| Armenian | Armenian | 38 | Armenian |
| Georgian | Georgian, Georgian Extended | 33 active | Georgian |
| Hangul* | Hangul Jamo, Hangul Syllables | 24 jamo → 11,172 syllable blocks | Korean |
\* Hangul is sometimes classified as a featural alphabet because its letter shapes systematically reflect the phonetic features of the sounds they represent (place and manner of articulation).
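The step from a small set of jamo to 11,172 syllable blocks is plain arithmetic. The sketch below assumes the index ranges used by Unicode's arithmetic Hangul composition: 19 leading consonants, 21 vowels, and 27 trailing consonants plus "no trailing consonant". Those counts include doubled and compound jamo beyond the 24 basic letters and are not taken from the text above; `compose_hangul` is a hypothetical helper name.

```python
import unicodedata

# Constants from Unicode's arithmetic Hangul composition (assumed here, not from the text above).
S_BASE = 0xAC00   # first precomposed syllable, U+AC00
L_COUNT = 19      # leading consonants
V_COUNT = 21      # vowels
T_COUNT = 28      # trailing consonants, with index 0 meaning "none"

def compose_hangul(lead: int, vowel: int, trail: int = 0) -> str:
    """Return the precomposed syllable for zero-based jamo indices (hypothetical helper)."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= trail < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + trail)

total_blocks = L_COUNT * V_COUNT * T_COUNT
han = compose_hangul(18, 0, 4)              # HIEUH + A + final NIEUN
jamo = unicodedata.normalize("NFD", han)    # canonical decomposition into conjoining jamo

print(total_blocks)                         # 11172
print(han, [f"U+{ord(c):04X}" for c in jamo])
```

Because the precomposed block is laid out in this regular order, a syllable and its jamo sequence can be converted in either direction without a lookup table.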
Characteristics
- Vowels and consonants have equal status as letters
- Word spelling consists of a linear sequence of letters
- Typically 20–40 letters
- Case distinctions (uppercase/lowercase) in many alphabets
- Unicode encodes letters individually; rendering is linear
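The case distinction in the list above can be observed directly with Python's built-in string methods. This is a small sketch rather than a full survey: the sample letters are arbitrary choices, and Hangul is included only as a contrasting script with no case.

```python
# Bicameral alphabets map lowercase to uppercase; caseless scripts are left unchanged.
samples = {
    "Latin": "abc",
    "Greek": "\u03b1\u03b2\u03b3",      # alpha, beta, gamma
    "Cyrillic": "\u0430\u0431\u0432",   # a, be, ve
    "Armenian": "\u0561\u0562\u0563",   # ayb, ben, gim
    "Hangul": "\ud55c\uae00",           # two syllable blocks, no case distinction
}

uppercased = {script: text.upper() for script, text in samples.items()}
has_case = {script: uppercased[script] != text for script, text in samples.items()}

for script, text in samples.items():
    print(f"{script:9} {text} -> {uppercased[script]}  case={has_case[script]}")
```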
Abjads
Definition
An abjad primarily writes consonants. Vowels are either omitted entirely or indicated by optional diacritical marks, and readers reconstruct them from context and their knowledge of the language. The term "abjad" comes from the first four letters of the traditional Arabic letter order: alif, ba, jim, dal.
Major Abjads in Unicode
| Script | Unicode Block | Consonants | Vowel Marking | Direction |
|---|---|---|---|---|
| Arabic | Arabic, Arabic Supplement/Extended | 28 base | Optional harakat (diacritics) | Right-to-left |
| Hebrew | Hebrew | 22 | Optional nikkud (diacritics) | Right-to-left |
| Syriac | Syriac | 22 | Dots/diacritics | Right-to-left |
| Thaana | Thaana | 24 | Obligatory diacritics | Right-to-left |
Note that Thaana (used for Dhivehi/Maldivian) is sometimes classified as an alphabet rather than an abjad because its vowel diacritics are obligatory, not optional.
How Abjad Vowel Marking Works
Consider the Arabic root k-t-b (كتب), meaning "write":
| Form | Arabic | Vowel Marks | Meaning |
|---|---|---|---|
| Unvoweled | كتب | None | "wrote" / "books" / "was written" (context-dependent) |
| Fully voweled | كَتَبَ | fatha on each consonant | kataba: "he wrote" |
| Fully voweled | كُتُب | damma on the first two consonants | kutub: "books" |
| Fully voweled | كُتِبَ | damma, kasra, fatha | kutiba: "was written" |
In practice, most Arabic text is written without vowel marks. Only the Quran, children's books, poetry, and texts for learners are fully voweled.
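The ambiguity of unvoweled text can be reproduced by removing the vowel marks from the three voweled forms in the table. The sketch below is a simplification: it drops every character with a nonzero canonical combining class, which would also remove other Arabic marks such as shadda, and `strip_marks` is a hypothetical helper name.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Drop combining marks (nonzero canonical combining class), keeping base letters."""
    return "".join(ch for ch in text if unicodedata.combining(ch) == 0)

KAF, TEH, BEH = "\u0643", "\u062A", "\u0628"
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"

voweled = {
    "kataba": KAF + FATHA + TEH + FATHA + BEH + FATHA,
    "kutub": KAF + DAMMA + TEH + DAMMA + BEH,
    "kutiba": KAF + DAMMA + TEH + KASRA + BEH + FATHA,
}

skeletons = {reading: strip_marks(word) for reading, word in voweled.items()}
print(skeletons)
print(len(set(skeletons.values())))  # number of distinct unvoweled spellings
```

All three readings collapse to the same three-letter skeleton, which is why search and matching code for Arabic often compares text with the marks removed.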
Unicode Encoding
In Unicode, abjad vowel marks are encoded as combining characters that follow the base consonant:
```python
# Arabic: consonant + vowel mark (combining character)
# Beh + fatha = "ba" sound
ba = "\u0628"     # ARABIC LETTER BEH
fatha = "\u064E"  # ARABIC FATHA (combining mark)
ba_with_fatha = ba + fatha
print(ba_with_fatha)       # بَ
print(len(ba_with_fatha))  # 2 (two code points, displayed as one grapheme cluster)
```
Abugidas
Definition
An abugida (also called an alphasyllabary) writes consonant-vowel units. Each consonant letter has an inherent vowel (typically /a/), and other vowels are indicated by modifying the consonant sign — adding marks above, below, before, or after. The term comes from Ethiopic letter names: a, bu, gi, da.
Major Abugidas in Unicode
| Script | Unicode Block(s) | Inherent Vowel | Languages |
|---|---|---|---|
| Devanagari | Devanagari | /a/ | Hindi, Sanskrit, Marathi, Nepali |
| Bengali | Bengali | /a/ or /o/ | Bengali, Assamese |
| Tamil | Tamil | /a/ | Tamil |
| Thai | Thai | /a/ or /o/ (context-dependent) | Thai |
| Tibetan | Tibetan | /a/ | Tibetan, Dzongkha |
| Ethiopic | Ethiopic | /a/ (1st order) | Amharic, Tigrinya |
| Khmer | Khmer | /a/ or /o/ | Khmer (Cambodian) |
How Abugidas Work
Using Devanagari (Hindi) as an example:
| Written | Code Points | Sound | Explanation |
|---|---|---|---|
| क | U+0915 | /ka/ | Consonant with inherent vowel /a/ |
| कि | U+0915 U+093F | /ki/ | Vowel sign is drawn BEFORE the consonant but encoded after it |
| कु | U+0915 U+0941 | /ku/ | Vowel sign appears BELOW consonant |
| की | U+0915 U+0940 | /kiː/ | Long vowel sign appears AFTER consonant |
| क् | U+0915 U+094D | /k/ | Virama "kills" inherent vowel |
The virama (halant) is crucial — it suppresses the inherent vowel, allowing bare consonants and consonant clusters (conjuncts). Rendering engines must handle complex rules for when to show a virama and when to form a ligature.
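The sequences in the table can be inspected with Python's `unicodedata` module. The sketch below lists the code point, general category, and character name for each example. The conjunct kṣa (KA + VIRAMA + SSA) is an extra illustration that does not appear in the table, and `describe` is a hypothetical helper name.

```python
import unicodedata

KA = "\u0915"
examples = {
    "ka": KA,
    "ki": KA + "\u093F",
    "ku": KA + "\u0941",
    "kii": KA + "\u0940",
    "k": KA + "\u094D",
    "kssa": KA + "\u094D" + "\u0937",   # conjunct: KA + VIRAMA + SSA
}

def describe(text: str) -> list[tuple[str, str, str]]:
    """Return (code point, general category, character name) for each code point."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch)) for ch in text]

for label, text in examples.items():
    print(label, text, describe(text))
```

In every case the consonant comes first in memory, including the "ki" sequence whose vowel sign is drawn to the left. Reordering for display is left to the rendering engine.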
Encoding Difference: Indic vs. Ethiopic
Indic abugidas use base consonant + combining vowel marks (compositional encoding). Ethiopic uses precomposed syllable characters (each consonant-vowel combination is a separate code point). Both are abugidas, but their Unicode encodings follow different strategies.
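The two strategies can be compared side by side. This sketch uses the Devanagari syllable "ku" and the Ethiopic syllable "bu" as arbitrary examples; the Ethiopic code points are taken from the Unicode code charts, not from the text above.

```python
import unicodedata

devanagari_ku = "\u0915\u0941"   # KA + VOWEL SIGN U: two code points
ethiopic_bu = "\u1261"           # one precomposed code point

def code_point_names(text: str) -> list[str]:
    """Character name of every code point in the string."""
    return [unicodedata.name(ch) for ch in text]

print(code_point_names(devanagari_ku))
print(code_point_names(ethiopic_bu))

# The Ethiopic "b" series occupies consecutive code points, one per vowel order.
ba_series = [unicodedata.name(chr(cp)) for cp in range(0x1260, 0x1267)]
print(ba_series)

# Normalization does not split the Ethiopic syllable into consonant and vowel parts.
ethiopic_nfd = unicodedata.normalize("NFD", ethiopic_bu)
print(ethiopic_nfd == ethiopic_bu)
```

One practical consequence is that counting code points gives different answers for what readers of each script would consider one syllable.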
Syllabaries
Definition
A syllabary has one distinct symbol for each syllable in the language. In a pure syllabary there is no systematic graphic relationship between the sign for "ka" and the signs for "ki" or "ta"; each is an independent symbol.
Major Syllabaries in Unicode
| Script | Unicode Block | Syllables | Languages |
|---|---|---|---|
| Hiragana | Hiragana | 46 base | Japanese (native words) |
| Katakana | Katakana | 46 base | Japanese (loanwords, emphasis) |
| Cherokee | Cherokee | 85 | Cherokee |
| Yi | Yi Syllables | 1,165 | Yi (Nuosu), China |
| Cypriot | Cypriot Syllabary | 55 | Ancient Cypriot Greek |
| Linear B | Linear B Syllabary | 87 | Mycenaean Greek |
Japanese Kana
Japanese Hiragana and Katakana are the best-known modern syllabaries. Each has 46 base characters: most represent CV (consonant-vowel) syllables, and the rest are the five bare vowels and the syllabic nasal n:
| Hiragana | Katakana | Syllable |
|---|---|---|
| あ | ア | a |
| か | カ | ka |
| さ | サ | sa |
| た | タ | ta |
| な | ナ | na |
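Because the Hiragana and Katakana blocks are laid out in parallel, corresponding characters sit a fixed distance apart. The sketch below relies on that layout; the offset of 0x60 and the range U+3041 to U+3096 come from the Unicode code charts, not from the text above, and `hiragana_to_katakana` is a hypothetical helper name.

```python
KANA_OFFSET = 0x60  # distance between corresponding Hiragana and Katakana code points

def hiragana_to_katakana(text: str) -> str:
    """Shift Hiragana letters U+3041..U+3096 into the Katakana block; leave the rest alone."""
    return "".join(
        chr(ord(ch) + KANA_OFFSET) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )

converted = hiragana_to_katakana("あかさたな")
print(converted)
```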
Additional syllables are created through dakuten (゛, voicing mark) and handakuten (゜, semi-voicing mark): か (ka) → が (ga), は (ha) → ば (ba) → ぱ (pa).
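Unicode encodes the voiced and semi-voiced kana as precomposed characters, but each one also has a canonical decomposition into a base kana plus a combining mark. The sketch below shows this with normalization. It uses the combining marks U+3099 and U+309A, which are distinct from the spacing forms shown in the paragraph above; `decompose` is a hypothetical helper name.

```python
import unicodedata

def decompose(kana: str) -> list[str]:
    """Names of the code points in the canonical decomposition (NFD) of `kana`."""
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", kana)]

ga = decompose("\u304C")   # HIRAGANA LETTER GA
pa = decompose("\u3071")   # HIRAGANA LETTER PA
print(ga)
print(pa)

# Recomposition (NFC) yields the single precomposed code point again.
recomposed = unicodedata.normalize("NFC", "\u304B\u3099")   # KA + combining voiced mark
print(recomposed == "\u304C")
```

Text from different sources may use either form, so comparing kana strings is more reliable after normalizing both sides to the same form.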
Cherokee Syllabary
Created by Sequoyah in 1821, the Cherokee syllabary is one of the few modern writing systems invented by a single individual. Its 85 characters represent the syllables of the Cherokee language. Remarkably, Sequoyah could not read English when he created the system: some characters resemble Latin letters but represent completely different sounds.
Logographies
Definition
A logographic system uses symbols that represent words or morphemes (meaningful units) rather than sounds. The best-known example is Chinese characters (hanzi/kanji/hanja).
CJK in Unicode
| Block Family | Code Points | Languages |
|---|---|---|
| CJK Unified Ideographs | 97,000+ (across multiple blocks) | Chinese, Japanese, Korean, Vietnamese |
| CJK Compatibility Ideographs | ~1,000 | Legacy compatibility |
| CJK Radicals | ~300 | Components/indexing |
CJK characters are the largest single category in Unicode. The Han Unification principle merges characters that are considered the same across Chinese, Japanese, Korean, and Vietnamese traditions into single code points, despite visual variations between regional typefaces.
How Logographies Work
Each character carries a meaning and a pronunciation, and the pronunciation differs from language to language:
| Character | Meaning | Mandarin | Japanese On/Kun | Korean |
|---|---|---|---|---|
| 山 | mountain | shān | san / yama | san |
| 水 | water | shuǐ | sui / mizu | su |
| 火 | fire | huǒ | ka / hi | hwa |
| 木 | tree/wood | mù | moku / ki | mok |
Chinese uses characters almost exclusively. Japanese mixes characters (kanji) with two syllabaries (hiragana and katakana) plus Latin letters (romaji). Korean historically used Chinese characters (hanja) alongside Hangul but now uses Hangul almost exclusively.
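The readings in the table differ by language, but under Han unification each character is a single code point whichever language the text is in. The sketch below prints the code point and character name for the four example characters. Unified ideograph names are derived from the code point itself, so they carry no language or reading information.

```python
import unicodedata

ideographs = "山水火木"

for ch in ideographs:
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))

names = [unicodedata.name(ch) for ch in ideographs]
```

Telling a Japanese 山 from a Chinese 山 therefore needs information from outside the character stream, such as a language tag or the choice of font.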
Mixed and Hybrid Systems
Many real-world writing practices combine multiple script types:
| Language | Systems Used | Classification |
|---|---|---|
| Japanese | Kanji (logographic) + Hiragana (syllabary) + Katakana (syllabary) + Romaji (alphabet) | Mixed |
| Korean | Hangul (alphabetic syllable blocks) + occasional Hanja (logographic) | Primarily alphabetic |
| Hindi | Devanagari (abugida) + Arabic numerals | Primarily abugida |
| Arabic | Arabic consonants (abjad) + optional vowel marks + Arabic-Indic numerals | Primarily abjad |
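Mixed text like the Japanese row can be broken down by script programmatically. Python's standard library does not expose the Unicode Script property, so the sketch below substitutes a rough heuristic: the first word of each character's name. That is an assumption made for illustration only, it fails for many characters (punctuation, digits, shared marks), and `rough_scripts` is a hypothetical helper name.

```python
import unicodedata

def rough_scripts(text: str) -> list[str]:
    """Heuristic: first word of each character name, in order of first appearance.

    This is NOT the Unicode Script property; it only approximates it for simple cases.
    """
    seen: list[str] = []
    for ch in text:
        if ch.isspace():
            continue
        label = unicodedata.name(ch, "UNKNOWN").split()[0]
        if label not in seen:
            seen.append(label)
    return seen

mixed = rough_scripts("日本語のテキストABC")
print(mixed)
```

For production use, a library that implements the real Script property would be a safer choice than name matching.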
Unicode's Approach to Diversity
Unicode handles this diversity through several strategies:
| Strategy | Example |
|---|---|
| Compositional encoding | Indic vowels as combining marks on consonant bases |
| Precomposed encoding | Ethiopic syllables, Hangul syllable blocks |
| Separate blocks per script | Armenian, Georgian, Thai each in dedicated blocks |
| Han Unification | Shared CJK code points across Chinese/Japanese/Korean |
| Properties and algorithms | Script property, bidirectional algorithm, line breaking |
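The "properties and algorithms" row can be made concrete with the bidirectional class, the per-character property that the bidirectional algorithm consumes. The sketch below queries it through Python's `unicodedata` module; the sample letters are arbitrary picks from the scripts discussed earlier.

```python
import unicodedata

letters = {
    "Latin A": "A",
    "Arabic beh": "\u0628",
    "Hebrew alef": "\u05D0",
    "Syriac alaph": "\u0710",
    "Thaana haa": "\u0780",
    "Arabic fatha": "\u064E",
}

# L = left-to-right, R / AL = right-to-left, NSM = nonspacing mark (follows its base).
bidi = {label: unicodedata.bidirectional(ch) for label, ch in letters.items()}
for label, cls in bidi.items():
    print(f"{label:13} {cls}")
```

Text direction is thus a property of the characters themselves, which is why stored text stays in logical order and is reordered only for display.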
Key Takeaways
- The world's writing systems fall into five major types: alphabets (consonants + vowels as equal letters), abjads (consonants primary, vowels optional), abugidas (consonant-vowel units with inherent vowel), syllabaries (one symbol per syllable), and logographies (symbols for words/morphemes).
- Most scripts are not pure types: Japanese combines logographic kanji, two syllabaries, and alphabetic romaji; Korean is an alphabet arranged in syllable blocks; and Thaana is an abjad with mandatory vowels.
- Unicode handles this diversity through both compositional (base + combining marks) and precomposed (single code point per unit) encoding strategies.
- CJK ideographs constitute the largest portion of Unicode (97,000+ code points) because Han characters are logographic, so each word or morpheme needs its own symbol.
- Understanding writing system typology is essential for correct text rendering, searching, sorting, and line breaking in internationalized software.
- The Unicode Standard assigns a Script property to every character, enabling programmatic identification of which writing system a character belongs to.