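The process of mapping a character to its constituent parts. Canonical decomposition preserves meaning (é → e + ◌́), while compatibility decomposition may change it (ﬁ → fi).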

What is Unicode Normalization?

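The process of converting Unicode text into a standard normal form. There are four forms: NFC (composed), NFD (decomposed), NFKC (compatibility composed), and NFKD (compatibility decomposed).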
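As a rough illustration of the four forms, the sketch below runs Python's standard unicodedata.normalize over the two characters used in the decomposition example above. The helper code_points and the sample labels are names invented for this illustration only.

```python
import unicodedata

# Sample inputs: a precomposed letter and a compatibility ligature.
samples = {"e-acute": "\u00e9", "fi ligature": "\ufb01"}

def code_points(text: str) -> str:
    # Render a string as a space-separated list of U+XXXX code points.
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

for label, text in samples.items():
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        normalized = unicodedata.normalize(form, text)
        print(f"{label:12} {form:5} {code_points(normalized)}")
```

The canonical forms (NFC, NFD) leave the ligature untouched, while the compatibility forms (NFKC, NFKD) replace it with the two plain letters f and i.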

Unicode Standard

Hangul Jamo

The individual consonant and vowel components (jamo) of the Korean Hangul writing system. Unicode encodes both precomposed Hangul syllables (U+AC00–U+D7A3) and decomposed jamo (U+1100–U+11FF).

What is Hangul Jamo?

Hangul is the alphabetic writing system of the Korean language, invented in 1443 by King Sejong the Great. Unlike logographic Chinese characters, Hangul is fully phonetic: each syllable is composed of individual phonetic units called jamo. A jamo is a single consonant or vowel element — similar in concept to a Latin letter — but Hangul syllables are written as compact two-dimensional blocks rather than as linear sequences.

Unicode encodes Hangul primarily across three blocks, each serving a different purpose.

The Three Unicode Hangul Blocks

1. Hangul Jamo (U+1100–U+11FF)

This block contains the individual jamo in their conjoining form. The modern set comprises 19 initial consonants (choseong), 21 vowels (jungseong), and 27 final consonants (jongseong); the composition algorithm counts 28 final positions because it adds a "no final" case, which has no code point of its own. The block also holds archaic jamo used for historical Korean. These code points are the raw building blocks. They are not normally displayed in isolation; their purpose is algorithmic syllable composition. A Unicode-conformant renderer receiving a choseong, jungseong, and optional jongseong in sequence is expected to render them as a single syllable block.

2. Hangul Compatibility Jamo (U+3130–U+318F)

This block provides jamo in their standalone "compatibility" form, suitable for display as individual characters — for example, in alphabetical lists, keyboard labels, or dictionary entries. These are distinct from the composing jamo in U+1100. They cannot be algorithmically combined into syllable blocks and are intended for display contexts only. Mixing them with composing jamo can cause unexpected rendering.
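The difference between the two jamo blocks can be seen with a small experiment. The sketch below, which assumes Python's standard unicodedata module and looks only at NFC, normalizes the same three letters taken once from the conjoining block and once from the compatibility block. The variable names are illustrative.

```python
import unicodedata

conjoining = "\u1112\u1161\u11ab"     # choseong HIEUH, jungseong A, jongseong NIEUN
compatibility = "\u314e\u314f\u3134"  # HANGUL LETTER HIEUH, A, NIEUN

for label, text in (("conjoining", conjoining), ("compatibility", compatibility)):
    nfc = unicodedata.normalize("NFC", text)
    names = [unicodedata.name(ch) for ch in text]
    # Show the character names going in and the code points coming out.
    print(label, names, "->", [f"U+{ord(ch):04X}" for ch in nfc])
```

Under NFC the conjoining sequence collapses into the single syllable U+D55C, while the compatibility sequence remains three separate characters.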

3. Hangul Syllables (U+AC00–U+D7AF)

This is the largest of the three blocks, containing all 11,172 precomposed modern Hangul syllable blocks. Every legal combination of initial consonant, vowel, and optional final consonant has a dedicated code point. The block is algorithmically structured: given a syllable's code point S, you can compute its components exactly.
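The arithmetic behind the figure of 11,172 can be checked directly. The following is a minimal sketch; the constant names are chosen for this example, and the name lookup relies on Python's standard unicodedata module.

```python
import unicodedata

S_BASE = 0xAC00                      # first precomposed syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # leads, vowels, finals (incl. "no final")

S_COUNT = L_COUNT * V_COUNT * T_COUNT   # number of precomposed syllables
LAST = S_BASE + S_COUNT - 1             # last assigned syllable code point

print(S_COUNT, f"U+{LAST:04X}")

# Every code point in the computed range should be a named Hangul syllable.
all_named = all(
    unicodedata.name(chr(cp), "").startswith("HANGUL SYLLABLE ")
    for cp in range(S_BASE, LAST + 1)
)
print(all_named)
```

The last syllable falls at U+D7A3, so the final code points of the block (up to U+D7AF) are left unassigned.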

Algorithmic Composition and Decomposition

Unicode defines a precise algorithm for mapping between precomposed syllables (U+AC00 range) and their jamo components (U+1100 range):

# Hangul syllable composition
HANGUL_BASE = 0xAC00
CHOSEONG_COUNT = 19   # initial consonants
JUNGSEONG_COUNT = 21  # vowels
JONGSEONG_COUNT = 28  # final consonants (including null)

def compose_hangul(lead: int, vowel: int, trail: int = 0) -> str:
    # Compose a Hangul syllable from jamo indices (0-based).
    code_point = (
        HANGUL_BASE
        + (lead * JUNGSEONG_COUNT + vowel) * JONGSEONG_COUNT
        + trail
    )
    return chr(code_point)

def decompose_hangul(syllable: str) -> tuple[int, int, int]:
    # Decompose a Hangul syllable to (lead, vowel, trail) indices.
    index = ord(syllable) - HANGUL_BASE
    trail = index % JONGSEONG_COUNT
    vowel = (index // JONGSEONG_COUNT) % JUNGSEONG_COUNT
    lead = index // (JUNGSEONG_COUNT * JONGSEONG_COUNT)
    return lead, vowel, trail

This algorithm underpins NFD/NFC normalization for Korean text and enables efficient Korean text processing without exhaustive lookup tables.
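To connect the index arithmetic to normalization, the sketch below maps a precomposed syllable to actual conjoining jamo code points and compares the result with what Python's unicodedata.normalize returns for NFD. The function to_jamo and the base constants are names introduced for this illustration; the trail base 0x11A7 is one below the first final consonant, so that trail index 0 means "no final".

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT

def to_jamo(syllable: str) -> str:
    # Decompose one precomposed syllable into its conjoining jamo string.
    index = ord(syllable) - S_BASE
    if not 0 <= index < S_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, trail = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    if trail:  # trail index 0 means the syllable has no final consonant
        jamo += chr(T_BASE + trail)
    return jamo

syllable = "\ud55c"  # the syllable "han"
parts = to_jamo(syllable)
print([f"U+{ord(ch):04X}" for ch in parts])
print(parts == unicodedata.normalize("NFD", syllable))
```

Composing the jamo string again with NFC should return the original syllable, which makes a convenient round-trip check.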

Quick Facts

Property	Value
Invented	1443, King Sejong the Great
Hangul Jamo block	U+1100–U+11FF (conjoining jamo)
Compatibility Jamo block	U+3130–U+318F (standalone display)
Hangul Syllables block	U+AC00–U+D7AF (11,172 precomposed)
Components	19 initial × 21 vowel × 28 final (incl. null) = 11,172 syllables
Normalization	NFD decomposes U+AC00 syllables to U+1100 jamo
Unicode algorithm	Defined in Chapter 3 of the Unicode Standard

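More terms in Unicode Standard

CJK (Han, Kana, Hangul)

Chinese, Japanese, and Korean: a collective term for the unified Han ideograph blocks and related scripts in Unicode. CJK Unified Ideographs comprise more than 20,992 characters.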

Han Unification

The process of mapping Chinese, Japanese, and Korean ideographs that share a …

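ISO 10646 / Universal Character Set

An international standard (ISO/IEC 10646) kept in sync with Unicode. It defines the same character repertoire and code points but does not include Unicode's additional algorithms and properties.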

Unicode

A universal character encoding standard that assigns a unique number (code point) to every character in every writing system. Version 16.0 contains 154,998 assigned characters.

Unicode Standard Annex (UAX)

Normative or informative documents that are integral parts of the Unicode Standard. …

Unicode Technical Report (UTR)

Informational documents published by the Unicode Consortium covering specific topics like security …

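Unicode Consortium

The non-profit organization that develops and maintains the Unicode Standard. Its members include many companies, such as Apple, Google, Microsoft, and Meta.

Unicode Scalar Value

Any code point except the surrogate code points (U+D800 to U+DFFF). This is the set of valid values that can represent actual characters, 1,112,064 in total.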

Unicode Version

A major release of the Unicode Standard that adds new characters, scripts, and features. Unicode 16.0 was released in September 2024.

Unicode Stability Policy

A policy guaranteeing that the code point and name of a character, once assigned, will never change. Properties may be revised, but assignments are permanent.
