Writing Systems of the World · Chapter 4
The Korean Hangul: An Alphabet Designed by a King
Hangul was created in 1443 by King Sejong the Great as a scientifically designed alphabet. This chapter explores its brilliant design, jamo composition, syllable blocks, and algorithmic Unicode encoding.
In 1443, King Sejong the Great of the Joseon Dynasty convened a royal committee and charged it with an extraordinary task: to invent a new writing system for the Korean language. The result, promulgated in 1446 in a document called Hunminjeongeum ("The Proper Sounds for the Instruction of the People"), was Hangul, arguably the only major writing system in history whose creation can be attributed to a specific committee, a specific date, and a specific set of documented design principles. Today it serves as the national script of both Koreas, is widely regarded as one of the most linguistically rational writing systems ever devised, and has an algorithmic structure in Unicode that no other script shares.
The Scientific Design of Hangul
Before Hangul, educated Koreans wrote using Classical Chinese characters (Hanja), a system that required years of study and effectively restricted literacy to the aristocratic class. King Sejong's motivation was explicitly democratizing: Hunminjeongeum states that the common people, unable to express their thoughts in Classical Chinese, "have much to say but cannot." Hangul was designed to be learnable in days, not years.
The design principles are remarkable for their linguistic sophistication:
Consonant shapes mimic the vocal tract. The letter ㄱ (g/k) is shaped like the back of the tongue touching the velum. ㄴ (n) represents the tongue tip touching the alveolar ridge. ㅁ (m) represents the closed lips. ㅅ (s) represents the teeth. ㅇ (ng/silent) represents the throat. More complex consonants are derived from these basic shapes by adding strokes: ㅋ (k, aspirated) is ㄱ plus a stroke; ㄲ (tensed k) doubles ㄱ.
Vowels encode cosmological principles. The three basic vowel strokes represent heaven (·), earth (ㅡ), and man (ㅣ) — drawn from Neo-Confucian cosmology. All other vowels are derived by combining these: ㅏ (a) = ㅣ + ·, ㅗ (o) = ㅡ + ·, and so on.
Syllable blocks. Rather than writing a linear sequence of letters like Latin or Arabic, Hangul groups the letters of each syllable into a square block. The syllable 한 (han) contains three letters — ㅎ, ㅏ, ㄴ — arranged spatially: ㅎ (initial consonant) in the upper left, ㅏ (vowel) on the right, ㄴ (final consonant) below.
Jamo: The Building Blocks
The individual letters of Hangul are called jamo (자모). There are:
- 19 initial consonants (초성, choseong): ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ
- 21 vowels (중성, jungseong): ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
- 27 final consonants (종성, jongseong): ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ
Note that some jongseong values are consonant clusters (like ㄳ = ㄱ+ㅅ, ㄵ = ㄴ+ㅈ), which appear as the final consonant of a syllable block.
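The three inventories above can be written down directly and counted. The following is a minimal sketch; the names CHOSEONG, JUNGSEONG, and JONGSEONG are illustrative, and the strings use the standalone (compatibility) glyphs purely for readability.

```python
# Modern jamo in the order listed above, written with standalone
# (compatibility) glyphs so they are easy to read in source code.
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"

# A modern syllable has one initial, one vowel, and either no final
# consonant or one of the 27 finals, hence the "+ 1".
syllable_count = len(CHOSEONG) * len(JUNGSEONG) * (len(JONGSEONG) + 1)

print(len(CHOSEONG), len(JUNGSEONG), len(JONGSEONG))  # 19 21 27
print(syllable_count)                                 # 11172
```

The product 19 × 21 × 28 = 11,172 is the number of distinct modern syllable blocks that these jamo can form.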
Unicode Hangul Blocks
Unicode encodes Hangul across five blocks:
| Block | Range | Characters | Description |
|---|---|---|---|
| Hangul Jamo | U+1100–U+11FF | 256 | Archaic and modern jamo (combining) |
| Hangul Compatibility Jamo | U+3130–U+318F | 94 | Non-combining jamo for display |
| Hangul Jamo Extended-A | U+A960–U+A97F | 29 | Old Korean initial consonants |
| Hangul Syllables | U+AC00–U+D7A3 | 11,172 | All modern precomposed syllables |
| Hangul Jamo Extended-B | U+D7B0–U+D7FF | 72 | Old Korean jamo |
The Hangul Syllables block (U+AC00–U+D7A3) is perhaps the most algorithmically elegant section of the entire Unicode standard.
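As a rough illustration of how the table can be used in practice, the sketch below classifies a character by block. The ranges are copied from the table; the names HANGUL_BLOCKS and hangul_block are hypothetical helpers, not part of any library.

```python
from typing import Optional

# (first code point, last code point, block name), copied from the table above.
HANGUL_BLOCKS = [
    (0x1100, 0x11FF, "Hangul Jamo"),
    (0x3130, 0x318F, "Hangul Compatibility Jamo"),
    (0xA960, 0xA97F, "Hangul Jamo Extended-A"),
    (0xAC00, 0xD7A3, "Hangul Syllables"),
    (0xD7B0, 0xD7FF, "Hangul Jamo Extended-B"),
]


def hangul_block(ch: str) -> Optional[str]:
    """Return the name of the Hangul block containing ch, or None."""
    cp = ord(ch)
    for first, last, name in HANGUL_BLOCKS:
        if first <= cp <= last:
            return name
    return None


print(hangul_block("한"))       # Hangul Syllables
print(hangul_block("ㄱ"))       # Hangul Compatibility Jamo
print(hangul_block("\u1100"))   # Hangul Jamo
print(hangul_block("A"))        # None
```

Note that the two ㄱ-shaped characters land in different blocks, which matters whenever text is compared or searched.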
The Algorithmic Formula: Deriving Code Points
Modern Korean uses a subset of all possible jamo combinations. The 11,172 precomposed syllable code points in the Hangul Syllables block are organized according to a precise mathematical formula.
Given the indices:
- L = initial consonant index (0–18, for 19 choseong values)
- V = vowel index (0–20, for 21 jungseong values)
- T = final consonant index (0–27, where 0 = no final consonant)
The code point for a syllable is:
code_point = 0xAC00 + (L × 21 + V) × 28 + T
For example, the syllable 한 (han):
- ㅎ is initial consonant index 18
- ㅏ is vowel index 0
- ㄴ is final consonant index 4
0xAC00 + (18 × 21 + 0) × 28 + 4
= 0xAC00 + (378) × 28 + 4
= 0xAC00 + 10584 + 4
= 0xAC00 + 10588
= 0xAC00 + 0x295C
= 0xD55C
Indeed, U+D55C is 한. This means any Unicode implementation can decompose a precomposed Hangul syllable into its constituent jamo entirely through arithmetic — no lookup tables needed. This is why Unicode's canonical decomposition for Hangul is defined algorithmically rather than by explicit mappings in the Unicode Character Database.
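The formula and its inverse fit in a few lines. The sketch below is a minimal illustration, not a reference implementation: the function names are hypothetical, and the base values 0x1100, 0x1161, and 0x11A7 are the starting points of the modern initial, vowel, and final jamo in the Hangul Jamo block. The result is cross-checked against Python's built-in unicodedata.normalize.

```python
import unicodedata
from typing import Tuple

S_BASE = 0xAC00   # first precomposed syllable, 가
L_BASE = 0x1100   # first combining initial consonant
V_BASE = 0x1161   # first combining vowel
T_BASE = 0x11A7   # one below the first combining final (T = 0 means "none")
V_COUNT, T_COUNT = 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11172


def compose(l: int, v: int, t: int = 0) -> str:
    """Apply the formula: indices (L, V, T) -> precomposed syllable."""
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


def decompose(syllable: str) -> Tuple[int, int, int]:
    """Invert the formula: precomposed syllable -> indices (L, V, T)."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    l, rest = divmod(s_index, V_COUNT * T_COUNT)
    v, t = divmod(rest, T_COUNT)
    return l, v, t


def to_jamo(syllable: str) -> str:
    """Return the combining jamo sequence for a precomposed syllable."""
    l, v, t = decompose(syllable)
    jamo = chr(L_BASE + l) + chr(V_BASE + v)
    if t:
        jamo += chr(T_BASE + t)
    return jamo


print(hex(ord(compose(18, 0, 4))))   # 0xd55c
print(decompose("한"))               # (18, 0, 4)
print(to_jamo("한") == unicodedata.normalize("NFD", "한"))  # True
```

Two integer divisions recover all three indices, which is why no per-syllable table is required.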
Compatibility Jamo vs. Combining Jamo
The Hangul Compatibility Jamo block (U+3130–U+318F) exists for historical reasons. These are non-combining forms — each code point represents a jamo as a standalone character for display purposes (e.g., on Korean keyboards, dictionary entries, or labels). They should not be used to compose syllables via string concatenation; that role belongs to the combining jamo in U+1100–U+11FF.
This distinction matters for:
- Input method editors: IMEs typically work with compatibility jamo for display and combining jamo for internal composition
- Normalization: NFC and NFD compose and decompose combining jamo but leave compatibility jamo unchanged; only the compatibility forms NFKC and NFKD map compatibility jamo to combining jamo
- Search: Searching for ㄱ (U+3131, compatibility) will not find syllables containing the combining jamo U+1100
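The normalization and search behavior can be observed with Python's standard unicodedata module. This is a small demonstration under the assumption that the text is handled as ordinary Python strings; the variable names are illustrative.

```python
import unicodedata

syllable = "각"           # U+AC01, a precomposed syllable
compat_g = "\u3131"       # ㄱ in Hangul Compatibility Jamo
combining_g = "\u1100"    # the combining initial consonant in Hangul Jamo

# NFD splits the precomposed syllable into combining jamo.
decomposed = unicodedata.normalize("NFD", syllable)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+1100', 'U+1161', 'U+11A8']

# A search for the compatibility jamo misses; the combining jamo is found.
print(compat_g in decomposed)      # False
print(combining_g in decomposed)   # True

# NFC recomposes combining jamo into a single syllable...
print(unicodedata.normalize("NFC", "\u1112\u1161\u11ab") == "한")  # True
# ...but concatenated compatibility jamo stay three separate characters.
print(len(unicodedata.normalize("NFC", "ㅎㅏㄴ")))                  # 3

# Only the compatibility normalization forms map U+3131 to U+1100.
print(unicodedata.normalize("NFKD", compat_g) == combining_g)      # True
```

A search feature that should match jamo inside syllables therefore has to decide on a normalization form first and apply it to both the query and the text.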
Old Korean and Archaic Jamo
Classical Korean used additional consonant and vowel sounds no longer present in modern Korean. The Hunminjeongeum document itself uses letters that fell out of use within a century or two. Unicode encodes these archaic jamo in Hangul Jamo Extended-A and Extended-B, and the regular Hangul Jamo block contains slots for archaic forms as well.
Scholars studying Old Korean literature, historical linguistics, and classical texts require these archaic forms. Digital editions of the Joseon Annals and other historical documents increasingly use Unicode's archaic Hangul to represent the original text faithfully.
Hangul in Modern Computing
Korean computing presents some distinctive challenges:
Syllable composition during input: As a user types jamo on a Korean keyboard, the IME must compose syllables in real time. Typing ㅎ → ㅏ → ㄴ should produce the single syllable block 한, not three separate jamo. This requires the IME to maintain state about the current syllable being composed.
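The state an IME has to keep can be sketched as a toy composer. The code below is a deliberately simplified illustration, not how any real IME is implemented: compose_keystrokes is a hypothetical function, keystrokes are assumed to arrive as compatibility jamo, and compound vowels, consonant-cluster finals, and backspace handling are all left out.

```python
S_BASE = 0xAC00
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
# Index 0 is reserved for "no final consonant".
JONGSEONG = [None] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def compose_keystrokes(keys: str) -> str:
    """Compose a stream of single-jamo keystrokes into syllable blocks."""
    out = []
    l = v = None  # indices of the pending initial consonant and vowel
    t = 0         # index of the pending final consonant (0 = none)

    def flush() -> None:
        """Emit the syllable in progress and reset the state."""
        nonlocal l, v, t
        if l is not None and v is not None:
            out.append(chr(S_BASE + (l * 21 + v) * 28 + t))
        elif l is not None:
            out.append(CHOSEONG[l])  # a lone consonant stays a jamo
        l, v, t = None, None, 0

    for key in keys:
        if key in JUNGSEONG:
            if l is not None and v is None:
                v = JUNGSEONG.index(key)
            elif v is not None and t:
                # The pending final turns out to start the next syllable.
                moved = JONGSEONG[t]
                t = 0
                flush()
                l, v = CHOSEONG.index(moved), JUNGSEONG.index(key)
            else:
                flush()
                out.append(key)  # vowel with no initial: emit as typed
        elif key in CHOSEONG:
            if v is not None and t == 0 and key in JONGSEONG:
                t = JONGSEONG.index(key)
            else:
                flush()
                l = CHOSEONG.index(key)
        else:
            flush()
            out.append(key)  # anything that is not a jamo passes through
    flush()
    return "".join(out)


print(compose_keystrokes("ㅎㅏㄴ"))         # 한
print(compose_keystrokes("ㅎㅏㄴㄱㅡㄹ"))   # 한글
print(compose_keystrokes("ㅎㅏㄴㅏ"))       # 하나
```

The last example shows why state is needed: ㄴ is first attached to 한 as a final consonant, then moves to become the initial of 나 once the next vowel arrives.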
Text segmentation: Unlike Chinese, Korean uses spaces between words — but the boundaries of "words" in Korean grammar are complex, and spacing errors are common in informal writing. NLP tools for Korean must handle both correctly spaced and run-together text.
Sorting (collation): Korean collation is straightforward for modern text made up of precomposed syllables: sorting by code point within the Syllables block already yields the standard South Korean dictionary order, because the block is laid out by initial consonant, then vowel, then final consonant. For mixed modern and archaic Korean, collation becomes more complex.
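Because Python compares strings by code point, the built-in sorted function is enough to show this. The word list is an arbitrary example chosen for illustration.

```python
words = ["한글", "가나", "나라", "각", "다리"]

# Plain code point comparison orders precomposed syllables by
# initial consonant, then vowel, then final consonant.
print(sorted(words))  # ['가나', '각', '나라', '다리', '한글']

# The shortcut holds only inside the Syllables block: a standalone
# compatibility jamo (U+314E) sorts before every syllable (U+AC00 onward).
print(sorted(["가", "ㅎ"]))  # ['ㅎ', '가']
```

The second result is a reminder that text mixing standalone jamo, archaic letters, or Hanja with modern syllables generally needs a real collation algorithm rather than raw code point order.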
Hanja coexistence: Modern Korean uses Hangul almost exclusively, but Hanja (Chinese characters) still appear in formal documents, newspapers, personal names, and academic writing. Korean systems must support full CJK rendering alongside Hangul.
The story of Hangul is ultimately a story about intentional design triumphing over historical accident. Most writing systems evolved organically over centuries or millennia; Hangul was engineered by one monarch and his scholars as a phonetically precise and geometrically elegant script. Nearly six centuries later, that regularity lets Unicode decompose any modern syllable with a single arithmetic formula.