
Hangul Block

The Hangul Syllables block (U+AC00–U+D7AF) contains 11,172 precomposed Korean syllables (U+AC00–U+D7A3), algorithmically derived from 19 initial consonants, 21 vowels, and 27 final consonants plus the option of no final. This guide explains the structure of Korean Hangul in Unicode, how syllable composition works, and how to handle Korean text in software.


Korean is written in Hangul, one of the most systematically designed writing systems ever created. Unlike alphabets that evolved organically over centuries, Hangul was invented in 1443 by King Sejong the Great with a clear structural logic that maps directly onto the Unicode blocks that represent it today. Understanding the Hangul Unicode blocks means understanding how a syllabic alphabet can be both phonetic and algorithmically composable.

The Three Hangul Blocks

Unicode organizes Hangul across three distinct blocks, each serving a different purpose:

| Block | Range | Count | Purpose |
| --- | --- | --- | --- |
| Hangul Jamo | U+1100–U+11FF | 256 | Combining Jamo for composition |
| Hangul Compatibility Jamo | U+3130–U+318F | 96 | Standalone display Jamo |
| Hangul Syllables | U+AC00–U+D7AF | 11,172 | Precomposed syllable blocks |

Hangul Jamo: The Building Blocks (U+1100–U+11FF)

Hangul Jamo are the individual phonetic components — consonants and vowels — that combine to form syllable blocks. The block is divided into three classes:

  • Choseong (leading consonants): U+1100–U+1112, the 19 consonants used at the start of a syllable (ㄱ, ㄴ, ㄷ, ...)
  • Jungseong (vowels): U+1161–U+1175, the 21 vowels forming the nucleus (ㅏ, ㅐ, ㅑ, ...)
  • Jongseong (trailing consonants): U+11A8–U+11C2, the 27 consonants used at the end of a syllable; counting the empty coda as well gives the 28 trailing values used in composition

These are conjoining Jamo (often described informally as combining characters), meant for algorithmic composition. When a renderer sees a choseong followed by a jungseong (and optionally a jongseong), it stacks them visually into a single syllable block.

Example: 한 is composed as:

  • ᄒ U+1112 (Choseong Hieuh)
  • ᅡ U+1161 (Jungseong A)
  • ᆫ U+11AB (Jongseong Nieun)

The Algorithmic Composition Formula

The most remarkable feature of the Hangul Syllables block is that every one of its 11,172 code points can be derived mathematically. Unicode defines the syllable index as:

SIndex = (LIndex × 21 + VIndex) × 28 + TIndex
SyllableCodePoint = U+AC00 + SIndex

Where:

  • LIndex = index of the leading consonant (0–18, 19 possible)
  • VIndex = index of the vowel (0–20, 21 possible)
  • TIndex = index of the trailing consonant (0–27, where 0 means no coda)

Total syllables: 19 × 21 × 28 = 11,172
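As a minimal sketch of the formula above, the following Python function maps the three indices to a precomposed syllable. The function name `compose_syllable` and the constant names are illustrative choices for this example, not part of any library.

```python
S_BASE = 0xAC00   # first code point of the Hangul Syllables block
L_COUNT = 19      # leading consonants
V_COUNT = 21      # vowels
T_COUNT = 28      # trailing values, counting "no coda" as index 0


def compose_syllable(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Return the precomposed syllable for the given Jamo indices."""
    if not 0 <= l_index < L_COUNT:
        raise ValueError(f"LIndex out of range: {l_index}")
    if not 0 <= v_index < V_COUNT:
        raise ValueError(f"VIndex out of range: {v_index}")
    if not 0 <= t_index < T_COUNT:
        raise ValueError(f"TIndex out of range: {t_index}")
    s_index = (l_index * V_COUNT + v_index) * T_COUNT + t_index
    return chr(S_BASE + s_index)


print(compose_syllable(18, 0, 4))  # 한 (U+D55C)
print(compose_syllable(0, 0))      # 가 (U+AC00)
```

The range checks are there because an out-of-range index would otherwise silently produce some other syllable, or a character outside the block.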

This formula works in both directions. To decompose a precomposed syllable like 글 (U+AE00), reverse the arithmetic using integer division and remainders:

  • SIndex = 0xAE00 − 0xAC00 = 512
  • TIndex = 512 mod 28 = 8 → ᆯ (Rieul)
  • LVIndex = 512 div 28 = 18 (integer division)
  • VIndex = 18 mod 21 = 18 → ᅳ (Eu)
  • LIndex = 18 div 21 = 0 → ᄀ (Kiyeok)

So 글 = ᄀ + ᅳ + ᆯ, which spells the syllable geul meaning "letter" or "writing."
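The decomposition can be sketched in Python as follows. The function name `decompose_syllable` is hypothetical. The base values are the first choseong (U+1100), the first jungseong (U+1161), and U+11A7, the code point just before the first jongseong, so that trailing index 0 can stand for "no coda".

```python
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11,172 precomposed syllables


def decompose_syllable(syllable: str) -> str:
    """Split one precomposed syllable into its conjoining Jamo."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lv_index, t_index = divmod(s_index, T_COUNT)
    l_index, v_index = divmod(lv_index, V_COUNT)
    jamo = chr(L_BASE + l_index) + chr(V_BASE + v_index)
    if t_index:  # index 0 means there is no trailing consonant
        jamo += chr(T_BASE + t_index)
    return jamo


parts = decompose_syllable("글")
print([f"U+{ord(ch):04X}" for ch in parts])  # ['U+1100', 'U+1173', 'U+11AF']
```

For syllables in this block the result should agree with what a standard NFD normalizer returns, so in practice a library call is usually preferable to hand-rolled arithmetic.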

Hangul Compatibility Jamo (U+3130–U+318F)

This block exists for compatibility with legacy Korean encodings like KS X 1001. While the Jamo in U+1100–U+11FF are combining characters that only render correctly when used in sequence, the Compatibility Jamo are standalone characters that display as individual letters. They are commonly used:

  • In dictionaries and educational materials to show isolated consonants and vowels
  • In keyboard input method displays
  • For labeling consonant and vowel charts

Key examples:

  • ㄱ U+3131 (Hangul Letter Kiyeok): standalone form of the consonant
  • ㅏ U+314F (Hangul Letter A): standalone vowel

Note that ㄱ (U+3131) and ᄀ (U+1100) look identical but are different code points with different properties. Compatibility Jamo have the Unicode property Hangul_Syllable_Type=NA and do not participate in algorithmic syllable composition.
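The difference between the two kinds of Jamo can be checked with Python's standard `unicodedata` module. This is a small illustration rather than an exhaustive test: it compares the two Kiyeok characters, then shows that NFC composes a conjoining consonant and vowel pair but leaves a Compatibility Jamo pair alone.

```python
import unicodedata

compat_kiyeok = "\u3131"      # Hangul Compatibility Jamo
conjoining_kiyeok = "\u1100"  # Hangul Jamo (choseong)

print(unicodedata.name(compat_kiyeok))      # HANGUL LETTER KIYEOK
print(unicodedata.name(conjoining_kiyeok))  # HANGUL CHOSEONG KIYEOK
print(compat_kiyeok == conjoining_kiyeok)   # False

# Conjoining Jamo compose under NFC; Compatibility Jamo are left untouched.
from_conjoining = unicodedata.normalize("NFC", "\u1100\u1161")
from_compat = unicodedata.normalize("NFC", "\u3131\u314F")
print(len(from_conjoining))  # 1 (a single precomposed syllable)
print(len(from_compat))      # 2 (still two separate letters)
```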

Hangul Syllables (U+AC00–U+D7AF)

The Hangul Syllables block contains all 11,172 precomposed syllable blocks in modern Korean. These are the characters you typically see in everyday Korean text. The block is sorted in dictionary order: all syllables beginning with ᄀ come first, then ᄁ, and so on through the 19 leading consonants.
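Because each leading consonant owns a contiguous run of 21 × 28 = 588 code points, the grouping described above can be recovered with one integer division. The sketch below uses illustrative names (`SYLLABLES_PER_LEAD`, `leading_index`) that are not taken from any library.

```python
S_BASE = 0xAC00
SYLLABLES_PER_LEAD = 21 * 28  # 588 vowel/coda combinations per leading consonant

# The first syllable in the run belonging to each of the 19 leading consonants.
first_per_lead = [chr(S_BASE + i * SYLLABLES_PER_LEAD) for i in range(19)]
print("".join(first_per_lead))


def leading_index(syllable: str) -> int:
    """Return the LIndex (0-18) of a precomposed syllable."""
    return (ord(syllable) - S_BASE) // SYLLABLES_PER_LEAD


print(leading_index("가"), leading_index("나"), leading_index("한"))  # 0 2 18
```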

Some frequently encountered syllables:

| Character | Code Point | Romanization | Meaning |
| --- | --- | --- | --- |
| 가 | U+AC00 | ga | go; house (in some contexts) |
| 나 | U+B098 | na | I; me |
| 사 | U+C0AC | sa | four; person; death |
| 한 | U+D55C | han | Korean; great; one |
| 글 | U+AE00 | geul | letter; writing |

Normalization and NFC vs NFD

Hangul normalization is a key topic in text processing. Unicode defines two canonically equivalent representations for every precomposed Korean syllable:

  • NFC (Composed): A single precomposed code point like 한 (U+D55C)
  • NFD (Decomposed): Three combining Jamo like ᄒ U+1112 + ᅡ U+1161 + ᆫ U+11AB

Both represent the same syllable but are different code point sequences (and therefore different bytes), so a naive comparison treats them as unequal. String comparison and search algorithms must normalize to the same form before comparing. In Python:

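import unicodedata

decomposed = '\u1112\u1161\u11AB'  # ᄒ + ᅡ + ᆫ (three conjoining Jamo)
composed = '\uD55C'                # 한 (one precomposed code point)

decomposed == composed                                # False: different code points
unicodedata.normalize('NFC', decomposed) == composed  # True: both in composed form
unicodedata.normalize('NFD', composed) == decomposed  # True: both in decomposed form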

Historical Jamo (U+A960–U+A97F and U+D7B0–U+D7FF)

Beyond modern Korean, Unicode also encodes historical Jamo used in Old Korean texts. These include archaic consonants and vowels no longer used in contemporary writing. Some occupy the remainder of the Hangul Jamo block (U+1100–U+11FF) beyond the modern letters listed earlier; the Hangul Jamo Extended-A (U+A960–U+A97F) and Extended-B (U+D7B0–U+D7FF) blocks cover further historical forms, supporting scholarly work in classical Korean literature.
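The block ranges named in this guide can be collected into a small lookup. The sketch below is illustrative: `HANGUL_BLOCKS` and `hangul_block` are hypothetical names, and the function only tests block membership, so it also returns a block name for code points that are unassigned within that block.

```python
from typing import Optional

# Block ranges as listed in this guide.
HANGUL_BLOCKS = [
    ("Hangul Jamo", 0x1100, 0x11FF),
    ("Hangul Compatibility Jamo", 0x3130, 0x318F),
    ("Hangul Jamo Extended-A", 0xA960, 0xA97F),
    ("Hangul Syllables", 0xAC00, 0xD7AF),
    ("Hangul Jamo Extended-B", 0xD7B0, 0xD7FF),
]


def hangul_block(ch: str) -> Optional[str]:
    """Return the name of the Hangul block containing ch, or None."""
    cp = ord(ch)
    for name, start, end in HANGUL_BLOCKS:
        if start <= cp <= end:
            return name
    return None


for ch in "\uD55C\u3131\u1100\uA960A":
    print(f"U+{ord(ch):04X}", hangul_block(ch))
```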

Practical Tips for Developers

When working with Hangul in code:

  1. Always normalize to NFC before storing or comparing Korean text
  2. Use len() carefully — a composed syllable is one code point, but its NFD form is 2–3 code points
  3. Regex character classes like [가-힣] match all 11,172 precomposed syllables (U+AC00–U+D7A3), but not standalone Jamo or decomposed (NFD) text
  4. Sorting Korean text requires locale-aware collation, not simple code point ordering
  5. The syllable composition algorithm can be used to validate whether a sequence of Jamo forms a valid syllable
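Several of these tips can be illustrated with the standard library alone. The sketch below covers length counting, the regex class, and a simple validity check. The helper `is_single_syllable` is a hypothetical name, and it relies on NFC normalization rather than re-implementing the composition arithmetic. Locale-aware sorting is not shown because it depends on collation support outside the core library.

```python
import re
import unicodedata

HANGUL_SYLLABLE = re.compile(r"[가-힣]")  # U+AC00 through U+D7A3

word = "한글"
nfd = unicodedata.normalize("NFD", word)
print(len(word), len(nfd))  # 2 6

# The syllable class matches precomposed text but not its decomposed form.
print(len(HANGUL_SYLLABLE.findall(word)))  # 2
print(len(HANGUL_SYLLABLE.findall(nfd)))   # 0


def is_single_syllable(jamo: str) -> bool:
    """True if the input composes under NFC to exactly one modern syllable."""
    composed = unicodedata.normalize("NFC", jamo)
    return len(composed) == 1 and "가" <= composed <= "힣"


print(is_single_syllable("\u1112\u1161\u11AB"))  # True: consonant, vowel, coda
print(is_single_syllable("\u1161\u1112"))        # False: vowel before consonant
```

Note that `is_single_syllable` also accepts input that is already a precomposed syllable, since NFC leaves such text unchanged.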

The elegance of Hangul's Unicode representation reflects the elegance of the script itself — a perfectly logical, mathematically expressible system for phonetic writing.
