Korean Hangul System

Hangul was created in 1443 under King Sejong as a deliberately designed alphabet in which individual jamo (consonants and vowels) combine into syllable blocks. Unicode mirrors that structure with both conjoining jamo and precomposed syllable encodings.

Published 2023-08-28 · Updated 2025-02-17

Hangul is widely regarded as one of the most scientifically designed writing systems in human history. Created in 1443 by King Sejong the Great of the Joseon Dynasty, Hangul was purpose-built so that "a wise man can learn it in a morning and even a foolish man can learn it in ten days." Unlike scripts that evolved organically over millennia, Hangul was invented with systematic phonological principles — and Unicode's encoding of Hangul reflects this algorithmic design with remarkable elegance. This guide tells the story of Hangul, explains how Unicode encodes it, and covers the technical details of Korean text processing.

The Invention of Hangul

Before Hangul, Korean was written using Chinese characters (hanja), which were poorly suited to Korean grammar and phonology. Literary Chinese was the language of the court and educated elite, leaving the majority of the population effectively illiterate.

In 1443, King Sejong and a team of scholars at the Hall of Worthies (Jiphyeonjeon) created a new alphabet described in the document Hunminjeongeum ("The Correct Sounds for the Instruction of the People"), published in 1446. The script was revolutionary for several reasons:

Featural design: The shapes of consonant letters are based on the position of the tongue, lips, and throat during pronunciation
Systematic vowels: Vowel letters are composed from three elements representing heaven (dot/short stroke), earth (horizontal line), and human (vertical line)
Syllable blocks: Individual letters (jamo) are arranged into square blocks representing syllables, giving text a visual density comparable to Chinese characters

The Structure of Hangul

Jamo: The Building Blocks

Hangul jamo (자모, letters) consist of consonants and vowels:

14 Basic Consonants:

Jamo	Name	Sound	Unicode (Compatibility)
ㄱ	giyeok	g/k	U+3131
ㄴ	nieun	n	U+3134
ㄷ	digeut	d/t	U+3137
ㄹ	rieul	r/l	U+3139
ㅁ	mieum	m	U+3141
ㅂ	bieup	b/p	U+3142
ㅅ	siot	s	U+3145
ㅇ	ieung	ng/silent	U+3147
ㅈ	jieut	j	U+3148
ㅊ	chieut	ch	U+314A
ㅋ	kieuk	k	U+314B
ㅌ	tieut	t	U+314C
ㅍ	pieup	p	U+314D
ㅎ	hieut	h	U+314E

5 Double Consonants: ㄲ, ㄸ, ㅃ, ㅆ, ㅉ (tense/fortis versions)

10 Basic Vowels:

Jamo	Name	Sound	Unicode (Compatibility)
ㅏ	a	/a/	U+314F
ㅑ	ya	/ja/	U+3151
ㅓ	eo	/ʌ/	U+3153
ㅕ	yeo	/jʌ/	U+3155
ㅗ	o	/o/	U+3157
ㅛ	yo	/jo/	U+3159
ㅜ	u	/u/	U+315B
ㅠ	yu	/ju/	U+315D
ㅡ	eu	/ɯ/	U+3161
ㅣ	i	/i/	U+3163

11 Compound Vowels: ㅐ, ㅒ, ㅔ, ㅖ, ㅘ, ㅙ, ㅚ, ㅝ, ㅞ, ㅟ, ㅢ

Syllable Block Composition

Every Korean syllable follows one of two patterns:

CV (Consonant + Vowel): 가 = ㄱ + ㅏ
CVC (Consonant + Vowel + Consonant): 한 = ㅎ + ㅏ + ㄴ

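The leading consonant is the choseong (initial), the vowel is the jungseong (medial), and the optional trailing consonant is the jongseong (final). The final position can hold a single consonant or a two-consonant cluster written as one unit, such as ㄺ in 닭. When there is no initial consonant sound, the silent consonant ㅇ serves as a placeholder, as in 아 = ㅇ + ㅏ.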

The visual layout of the block depends on the vowel shape:

Vowel Type	Layout	Example
Vertical vowel (ㅏ, ㅓ, ㅣ...)	Consonant left, vowel right	가 (ㄱ + ㅏ)
Horizontal vowel (ㅗ, ㅛ, ㅜ, ㅠ, ㅡ)	Consonant top, vowel bottom	고 (ㄱ + ㅗ)
Compound vowel (ㅘ, ㅝ...)	Mixed arrangement	과 (ㄱ + ㅘ)

When a final consonant (jongseong) is present, it occupies the bottom of the block: in 한, ㅎ sits at the top left, ㅏ at the right, and ㄴ along the bottom.
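The layout rule in the table can be sketched as a small lookup. This is only an illustration: the helper name `vowel_layout` and the three layout labels are hypothetical, and the grouping assumes that ㅐ, ㅒ, ㅔ, and ㅖ sit to the right of the consonant like the other vertical vowels, even though they are listed among the compound vowels above.

```python
# Hypothetical helper: classify how a vowel positions the initial consonant.
VERTICAL = set("ㅏㅐㅑㅒㅓㅔㅕㅖㅣ")   # consonant left, vowel right
HORIZONTAL = set("ㅗㅛㅜㅠㅡ")         # consonant top, vowel bottom
COMPOUND = set("ㅘㅙㅚㅝㅞㅟㅢ")       # horizontal + vertical parts combined


def vowel_layout(vowel: str) -> str:
    """Return a layout label for one modern Hangul vowel (compatibility jamo)."""
    if vowel in VERTICAL:
        return "left-right"
    if vowel in HORIZONTAL:
        return "top-bottom"
    if vowel in COMPOUND:
        return "mixed"
    raise ValueError(f"not a modern Hangul vowel: {vowel!r}")


for v in "ㅏㅗㅘ":
    print(v, vowel_layout(v))
```

The three sets together cover all 21 modern vowels, so any vowel that falls through to the `ValueError` is either archaic or not a vowel at all.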

Unicode Encoding of Hangul

Unicode provides three separate encodings for Hangul, each serving a different purpose:

1. Precomposed Hangul Syllables (U+AC00 – U+D7A3)

This is the primary block for modern Korean text. It contains 11,172 precomposed syllable blocks — every possible combination of:

19 leading consonants (choseong)
21 medial vowels (jungseong)
28 trailing consonants (jongseong, including "no trailing consonant")

19 × 21 × 28 = 11,172 syllables.

The block is algorithmically organized, meaning you can compute the components of any syllable from its code point:

def decompose_hangul(syllable: str) -> tuple[int, int, int]:
    # Decompose a precomposed Hangul syllable into LVT indices.
    code = ord(syllable) - 0xAC00
    if not (0 <= code < 11172):
        raise ValueError("Not a Hangul syllable")

    trail = code % 28       # Jongseong index (0 = no trailing)
    code = code // 28
    vowel = code % 21       # Jungseong index
    lead = code // 21       # Choseong index

    return lead, vowel, trail

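def compose_hangul(lead: int, vowel: int, trail: int = 0) -> str:
    # Compose a Hangul syllable from LVT indices.
    if not (0 <= lead < 19 and 0 <= vowel < 21 and 0 <= trail < 28):
        raise ValueError("LVT index out of range")
    code = 0xAC00 + (lead * 21 + vowel) * 28 + trail
    return chr(code)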

# Example: 한 = lead ㅎ (18) + vowel ㅏ (0) + trail ㄴ (4)
print(decompose_hangul("한"))  # (18, 0, 4)
print(compose_hangul(18, 0, 4))  # 한

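This algorithmic structure is unusual in Unicode: Hangul syllables are the one case where the standard defines canonical decomposition and composition arithmetically instead of listing a mapping for each character.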
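One way to gain confidence in the arithmetic is to round-trip every code point in the block. The sketch below is self-contained and restates the two functions in compact form; the constant names (`S_BASE`, `L_COUNT`, and so on) are modeled on the naming in the Unicode Standard, while `decompose` and `compose` are illustrative names.

```python
S_BASE = 0xAC00
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11,172


def decompose(ch: str) -> tuple[int, int, int]:
    """Split a precomposed syllable into (lead, vowel, trail) indices."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, trail = divmod(rest, T_COUNT)
    return lead, vowel, trail


def compose(lead: int, vowel: int, trail: int = 0) -> str:
    """Build a precomposed syllable from (lead, vowel, trail) indices."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= trail < T_COUNT):
        raise ValueError("index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + trail)


# Every code point in U+AC00..U+D7A3 should survive a round trip.
all_syllables = [chr(cp) for cp in range(0xAC00, 0xD7A3 + 1)]
assert len(all_syllables) == S_COUNT
assert all(compose(*decompose(s)) == s for s in all_syllables)

print(decompose("\uAC00"), decompose("\uD7A3"))  # (0, 0, 0) (18, 20, 27)
```

The first and last syllables map to the smallest and largest index triples, which confirms that the block has no gaps.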

2. Hangul Jamo (U+1100 – U+11FF)

This block contains the conjoining jamo — individual consonant and vowel letters that rendering engines combine into syllable blocks on the fly:

Range	Content	Count
U+1100 – U+1112	Leading consonants (choseong)	19
U+1161 – U+1175	Medial vowels (jungseong)	21
U+11A8 – U+11C2	Trailing consonants (jongseong)	27
U+1113 – U+115E	Old Korean leading consonants	Historical
U+115F – U+1160	Choseong and jungseong fillers	2
U+1176 – U+11A7	Old Korean medial vowels	Historical
U+11C3 – U+11FF	Old Korean trailing consonants	Historical

When conjoining jamo appear in sequence (L + V or L + V + T), the rendering engine forms them into a syllable block visually. This encoding is essential for representing Old Korean text that uses archaic jamo not covered by the 11,172 precomposed syllables.
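The link between the precomposed block and the conjoining jamo ranges in the table can be sketched directly. This example assumes the base values U+1100 for leading consonants, U+1161 for vowels, and U+11A7 for trailing consonants (one below U+11A8, so that index 0 means "no trailing consonant"); the function name `to_conjoining_jamo` is illustrative. The result is checked against Python's own NFD normalization.

```python
import unicodedata

S_BASE = 0xAC00
L_BASE, V_BASE, T_BASE = 0x1100, 0x1161, 0x11A7  # T_BASE is one below U+11A8
V_COUNT, T_COUNT = 21, 28


def to_conjoining_jamo(syllable: str) -> str:
    """Map one precomposed syllable to its conjoining jamo sequence."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < 11172:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, trail = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if trail:  # index 0 means there is no trailing consonant
        jamo += chr(T_BASE + trail)
    return jamo


jamo = to_conjoining_jamo("\uD55C")  # 한
print([f"U+{ord(c):04X}" for c in jamo])  # ['U+1112', 'U+1161', 'U+11AB']
assert jamo == unicodedata.normalize("NFD", "\uD55C")
```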

3. Hangul Compatibility Jamo (U+3130 – U+318F)

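This block contains individual jamo for display as standalone letters (e.g., in dictionaries, linguistics texts, or keyboard labeling). Unlike conjoining jamo, these do not combine into syllable blocks during rendering. The table below also lists the two extension blocks, which hold additional conjoining jamo for Old Korean and are not compatibility characters.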

Block	Range	Purpose
Hangul Compatibility Jamo	U+3130 – U+318F	Standalone display
Hangul Jamo Extended-A	U+A960 – U+A97F	Old Korean choseong
Hangul Jamo Extended-B	U+D7B0 – U+D7FF	Old Korean jungseong/jongseong
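The difference between compatibility and conjoining jamo shows up clearly under normalization. The sketch below uses only the standard `unicodedata` module: canonical normalization (NFC) leaves compatibility jamo untouched, while compatibility normalization (NFKC) maps them to conjoining jamo, which can then compose into a syllable.

```python
import unicodedata

compat = "\u3131"      # ㄱ as a compatibility jamo
conjoining = "\u1100"  # the leading-consonant (choseong) form of the same letter

print(unicodedata.name(compat))      # HANGUL LETTER KIYEOK
print(unicodedata.name(conjoining))  # HANGUL CHOSEONG KIYEOK
print(compat == conjoining)          # False

pair = "\u3131\u314F"  # compatibility ㄱ followed by compatibility ㅏ

# NFC does not touch compatibility jamo, so the pair stays two letters.
print(unicodedata.normalize("NFC", pair) == pair)  # True

# NFKC replaces each with its conjoining counterpart, then composes them.
print(unicodedata.normalize("NFKC", pair) == "\uAC00")  # True (가)
```

A practical consequence is that applying NFKC to text containing standalone jamo, such as a choseong search query, can merge letters into syllables, so NFC is usually the safer choice for that kind of input.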

Normalization: NFC vs. NFD

The existence of both precomposed syllables and conjoining jamo means the same Korean text can be represented in two ways:

import unicodedata

# NFC: precomposed (standard for Korean text)
nfc = "한글"
print([f"U+{ord(c):04X}" for c in nfc])
# ['U+D55C', 'U+AE00']  — 2 precomposed syllable code points

# NFD: decomposed into conjoining jamo
nfd = unicodedata.normalize("NFD", nfc)
print([f"U+{ord(c):04X}" for c in nfd])
# ['U+1112', 'U+1161', 'U+11AB', 'U+1100', 'U+1173', 'U+11AF']
# ㅎ + ㅏ + ㄴ + ㄱ + ㅡ + ㄹ  — 6 conjoining jamo

# Both render identically: 한글
print(nfc == nfd)  # False — different code points!
print(nfc == unicodedata.normalize("NFC", nfd))  # True

Always normalize Korean text to NFC for storage and comparison. NFC is the form that Korean input methods, websites, and databases typically produce. Filenames created on macOS often arrive in decomposed form, which causes comparison issues with Korean filenames.
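A minimal sketch of that advice is to normalize at the boundary and compare only normalized strings. The helper names `nfc` and `same_text` are hypothetical; the only library call is `unicodedata.normalize`.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize text to NFC before storing or comparing it."""
    return unicodedata.normalize("NFC", text)


def same_text(a: str, b: str) -> bool:
    """Compare two strings regardless of composed or decomposed form."""
    return nfc(a) == nfc(b)


composed = "\uD55C\uAE00"  # 한글 as two precomposed syllables
decomposed = unicodedata.normalize("NFD", composed)

print(len(composed), len(decomposed))   # 2 6
print(composed == decomposed)           # False
print(same_text(composed, decomposed))  # True
```

The length difference matters for anything that counts characters, such as field length limits or truncation, which is another reason to normalize before storing.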

Korean Text Processing

Jamo Extraction

Extracting individual jamo from precomposed syllables is a common operation for Korean search, phonetic analysis, and input:

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = ("", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ",
             "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ",
             "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ",
             "ㅌ", "ㅍ", "ㅎ")

def extract_jamo(text: str) -> str:
    # Extract individual jamo from Korean text.
    result = []
    for char in text:
        code = ord(char) - 0xAC00
        if 0 <= code < 11172:
            lead = code // (21 * 28)
            vowel = (code // 28) % 21
            trail = code % 28
            result.append(CHOSEONG[lead])
            result.append(JUNGSEONG[vowel])
            if trail > 0:
                result.append(JONGSEONG[trail])
        else:
            result.append(char)
    return "".join(result)

print(extract_jamo("한글"))  # ㅎㅏㄴㄱㅡㄹ
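The reverse direction, building a syllable from compatibility jamo, follows from the same tables. This sketch repeats the lookup tables so that it runs on its own; `compose_from_jamo` and the index dictionaries are illustrative names. Dictionaries are used instead of `str.index` because an empty trailing consonant would otherwise match at position 0 of any string.

```python
CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONGSEONG = ("", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ",
             "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ",
             "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ",
             "ㅌ", "ㅍ", "ㅎ")

# Map each jamo to its index within its position.
LEAD_INDEX = {jamo: i for i, jamo in enumerate(CHOSEONG)}
VOWEL_INDEX = {jamo: i for i, jamo in enumerate(JUNGSEONG)}
TRAIL_INDEX = {jamo: i for i, jamo in enumerate(JONGSEONG)}


def compose_from_jamo(lead: str, vowel: str, trail: str = "") -> str:
    """Build one precomposed syllable from compatibility jamo."""
    try:
        l, v, t = LEAD_INDEX[lead], VOWEL_INDEX[vowel], TRAIL_INDEX[trail]
    except KeyError as exc:
        raise ValueError(f"not a modern Hangul jamo: {exc.args[0]!r}") from None
    return chr(0xAC00 + (l * 21 + v) * 28 + t)


print(compose_from_jamo("ㅎ", "ㅏ", "ㄴ"))  # 한
print(compose_from_jamo("ㄱ", "ㅡ", "ㄹ"))  # 글
```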

Initial Consonant Search (초성 검색)

A uniquely Korean feature is choseong search — searching by typing only the initial consonants of each syllable. For example, typing "ㅎㄱ" matches "한글" because ㅎ is the initial of 한 and ㄱ is the initial of 글:

def get_choseong(text: str) -> str:
    # Extract only initial consonants from Korean text.
    result = []
    for char in text:
        code = ord(char) - 0xAC00
        if 0 <= code < 11172:
            lead = code // (21 * 28)
            result.append(CHOSEONG[lead])
        else:
            result.append(char)
    return "".join(result)

def choseong_matches(query: str, target: str) -> bool:
    # Check if a choseong query matches the target string.
    target_choseong = get_choseong(target)
    return query in target_choseong

print(choseong_matches("ㅎㄱ", "한글"))  # True
print(choseong_matches("ㄷㅎ", "대한민국"))  # True

This feature is common in Korean search engines, address books, and autocomplete systems.
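Users often mix full syllables and bare initials in one query, for example "한ㄱ". The following sketch extends the idea to that case. It is one possible approach, not a description of how any particular product works: the names `lead_of`, `char_matches`, and `mixed_choseong_match` are hypothetical, and both inputs are assumed to be NFC already.

```python
from typing import Optional

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"


def lead_of(char: str) -> Optional[str]:
    """Return the initial consonant of a precomposed syllable, else None."""
    index = ord(char) - 0xAC00
    if 0 <= index < 11172:
        return CHOSEONG[index // (21 * 28)]
    return None


def char_matches(q: str, t: str) -> bool:
    """A bare initial matches any syllable starting with it; others match exactly."""
    if q in CHOSEONG:
        return t == q or lead_of(t) == q
    return q == t


def mixed_choseong_match(query: str, target: str) -> bool:
    """Slide the query over the target and test each aligned window."""
    n = len(query)
    return any(
        all(char_matches(q, t) for q, t in zip(query, target[i:i + n]))
        for i in range(len(target) - n + 1)
    )


print(mixed_choseong_match("한ㄱ", "한글"))      # True
print(mixed_choseong_match("ㅁㄱ", "대한민국"))  # True
print(mixed_choseong_match("한ㄴ", "한글"))      # False
```

This scan is linear in the target for each query and is fine for short lists; a large index would more likely store a precomputed choseong key per entry.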

Sorting Korean Text

Korean collation sorts by syllable block in dictionary order: first by choseong, then jungseong, then jongseong. Because precomposed Hangul syllables (U+AC00–U+D7A3) are arranged in exactly this order, simple code point sorting of NFC text produces the standard South Korean dictionary order for modern syllables, a direct benefit of the algorithmic encoding. Text that mixes in decomposed jamo, hanja, or other scripts needs normalization or locale-aware collation first:

words = ["바나나", "가나다", "사과", "나무"]
sorted_words = sorted(words)
print(sorted_words)  # ['가나다', '나무', '바나나', '사과'] — correct!
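Code point sorting only works when every string is in NFC. The sketch below shows the failure mode with one composed and one decomposed word, then repairs it with a normalizing sort key; the variable names are illustrative.

```python
import unicodedata

nfc_ga = "\uAC00"                                # 가, precomposed
nfd_ha = unicodedata.normalize("NFD", "\uD558")  # 하, as conjoining jamo

words = [nfc_ga, nfd_ha]

# Conjoining jamo (U+1100...) sort below precomposed syllables (U+AC00...),
# so the decomposed 하 wrongly lands before 가.
naive = sorted(words)
print(naive[0] == nfd_ha)  # True

# Normalizing inside the sort key restores dictionary order.
fixed = sorted(words, key=lambda w: unicodedata.normalize("NFC", w))
print(fixed[0] == nfc_ga)  # True
```

Normalizing only in the key leaves the stored strings untouched, which is useful when the original form has to be preserved, for example for filenames.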

JavaScript Considerations

// Hangul syllable detection
function isHangulSyllable(char) {
  const code = char.codePointAt(0);
  return code >= 0xAC00 && code <= 0xD7A3;
}

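// Decompose a precomposed syllable into lead/vowel/trail indices
function decomposeHangul(syllable) {
  const code = syllable.codePointAt(0) - 0xAC00;
  if (!(code >= 0 && code < 11172)) {
    throw new RangeError("Not a Hangul syllable");
  }
  const trail = code % 28;
  const vowel = Math.floor(code / 28) % 21;
  const lead = Math.floor(code / (28 * 21));
  return { lead, vowel, trail };
}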

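// Each precomposed syllable is one UTF-16 code unit, so NFC length equals the syllable count
console.log("한글".length); // 2
// The decomposed form counts every conjoining jamo separately
console.log("한글".normalize("NFD").length); // 6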

macOS NFD Problem

Apple's HFS+ file system stores filenames in a variant of NFD. APFS preserves whatever form it is given, but names created through macOS tools often still arrive decomposed. A file named "한글.txt" created on macOS may therefore be stored as a sequence of conjoining jamo, not precomposed syllables. When this filename is transferred to Windows or Linux (where NFC names are the norm), comparison and lookup can fail:

import os
import unicodedata

# Filenames from macOS may be in NFD
for name in os.listdir("."):
    normalized = unicodedata.normalize("NFC", name)
    if name != normalized:
        print(f"NFD filename detected: {name!r}")
        print(f"  NFC equivalent: {normalized!r}")

Always normalize filenames to NFC when processing Korean text across platforms.
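A lookup that tolerates either form might look like the following sketch. The function name `find_file` is hypothetical; it uses only `os.listdir` and `unicodedata.normalize`, and it returns the name exactly as stored on disk so that the caller can open the file without guessing its normalization.

```python
import os
import unicodedata
from typing import Optional


def find_file(directory: str, wanted: str) -> Optional[str]:
    """Return the on-disk name that equals `wanted` after NFC normalization."""
    target = unicodedata.normalize("NFC", wanted)
    for name in os.listdir(directory):
        if unicodedata.normalize("NFC", name) == target:
            return name  # keep the stored form for later open() calls
    return None


# Example: look for 한글.txt in the current directory, whatever form it is stored in.
match = find_file(".", "\uD55C\uAE00.txt")
print(match is not None)
```

Returning the stored name instead of the normalized one matters on file systems that treat differently normalized names as distinct files.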

Key Takeaways

Hangul is a featural alphabet invented in 1443, where letter shapes encode phonetic features and jamo combine into syllable blocks.
Unicode provides three encodings: precomposed syllables (11,172 at U+AC00–U+D7A3), conjoining jamo (U+1100–U+11FF), and compatibility jamo (U+3130–U+318F).
The precomposed block is algorithmically structured: syllable = 0xAC00 + (lead * 21 + vowel) * 28 + trail, which enables decomposition and composition without lookup tables.
NFC normalization is essential for Korean text — macOS uses NFD for filenames, causing cross-platform comparison issues.
Choseong search (초성 검색) is a distinctly Korean text feature that relies on extracting initial consonants from the algorithmic encoding.
Simple code point sorting of NFC text produces correct Korean dictionary order for modern syllables, thanks to the mathematical arrangement of the precomposed syllable block.