Hangul Jamo

The individual consonant and vowel components (jamo) of the Korean Hangul writing system. Unicode encodes both precomposed Hangul syllables (U+AC00–U+D7A3) and decomposed jamo (U+1100–U+11FF).

What is Hangul Jamo?

Hangul is the alphabetic writing system of the Korean language, invented in 1443 by King Sejong the Great. Unlike logographic Chinese characters, Hangul is fully phonetic: each syllable is composed of individual phonetic units called jamo. A jamo is a single consonant or vowel element — similar in concept to a Latin letter — but Hangul syllables are written as compact two-dimensional blocks rather than as linear sequences.

Unicode encodes Hangul across three distinct blocks, each serving a different purpose.

The Three Unicode Hangul Blocks

1. Hangul Jamo (U+1100–U+11FF)

This block contains the individual jamo components in their conjoining ("combining") form. For modern Korean these are 19 initial consonants (choseong, U+1100–U+1112), 21 vowels (jungseong, U+1161–U+1175), and 27 final consonants (jongseong, U+11A8–U+11C2); the block also holds archaic jamo. The composition algorithm counts 28 final positions because it reserves index 0 for "no final consonant", which has no code point of its own. These code points are the raw building blocks. They are not normally displayed in isolation; their purpose is algorithmic syllable composition. A renderer that supports conjoining jamo displays a choseong, a jungseong, and an optional jongseong in sequence as a single syllable block.
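As a small illustration of that composition, Python's standard `unicodedata` module can show how a sequence of conjoining jamo relates to a precomposed syllable. This is only a sketch: it uses NFC normalization rather than a renderer, and the sample syllable 한 and the variable names are choices made here, not part of the glossary entry.

```python
import unicodedata

# Conjoining jamo for the syllable 한: initial HIEUH, vowel A, final NIEUN.
jamo_sequence = "\u1112\u1161\u11ab"

# NFC normalization composes the three jamo into one precomposed syllable.
composed = unicodedata.normalize("NFC", jamo_sequence)

print(len(jamo_sequence), len(composed))
print(f"U+{ord(composed):04X}")
for ch in jamo_sequence:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The three-character input and the one-character result are canonically equivalent, so text comparison should normalize both sides first.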

2. Hangul Compatibility Jamo (U+3130–U+318F)

This block provides jamo in their standalone "compatibility" form, suitable for display as individual characters — for example, in alphabetical lists, keyboard labels, or dictionary entries. These are distinct from the composing jamo in U+1100. They cannot be algorithmically combined into syllable blocks and are intended for display contexts only. Mixing them with composing jamo can cause unexpected rendering.
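The difference from conjoining jamo can be checked with Python's standard `unicodedata` module. The sketch below assumes the three compatibility letters that spell out 한 as sample input; the variable names are illustrative only.

```python
import unicodedata

# Compatibility jamo spelling out the letters of 한 one by one.
compat_letters = "\u314e\u314f\u3134"

# NFC leaves compatibility jamo untouched: they have no canonical
# decomposition and do not take part in syllable composition.
nfc_result = unicodedata.normalize("NFC", compat_letters)

# U+3131 (HANGUL LETTER KIYEOK) has only a compatibility (<compat>)
# mapping, pointing at the conjoining jamo U+1100.
kiyeok_mapping = unicodedata.decomposition("\u3131")

print(nfc_result == compat_letters)
print(kiyeok_mapping)
```

Because the link to the U+1100 block is a compatibility mapping, only the NFKC and NFKD forms follow it; NFC and NFD keep compatibility jamo as they are.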

3. Hangul Syllables (U+AC00–U+D7AF)

This is the largest of the three blocks, containing all 11,172 precomposed modern Hangul syllable blocks. They occupy U+AC00–U+D7A3; the remaining code points up to U+D7AF are unassigned. Every legal combination of initial consonant, vowel, and optional final consonant has a dedicated code point. The block is algorithmically structured: given a syllable's code point S, you can compute its components exactly.
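The size and extent of the assigned range follow directly from the component counts. A minimal sketch, with hypothetical names (`SYLLABLE_BASE`, `is_precomposed_syllable`) introduced only for this example:

```python
SYLLABLE_BASE = 0xAC00
SYLLABLE_COUNT = 19 * 21 * 28  # initials x vowels x final positions

# Last assigned precomposed syllable in the block.
SYLLABLE_LAST = SYLLABLE_BASE + SYLLABLE_COUNT - 1


def is_precomposed_syllable(ch: str) -> bool:
    """Return True if ch is one of the precomposed Hangul syllables."""
    return len(ch) == 1 and SYLLABLE_BASE <= ord(ch) <= SYLLABLE_LAST


print(SYLLABLE_COUNT, f"U+{SYLLABLE_LAST:04X}")
print(is_precomposed_syllable("한"), is_precomposed_syllable("ㅎ"))
```

A range check like this is a cheap guard before applying index arithmetic to a character.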

Algorithmic Composition and Decomposition

Unicode defines a precise algorithm for mapping between precomposed syllables (U+AC00 range) and their jamo components (U+1100 range):

# Hangul syllable composition and decomposition
HANGUL_BASE = 0xAC00
CHOSEONG_COUNT = 19   # initial consonants
JUNGSEONG_COUNT = 21  # vowels
JONGSEONG_COUNT = 28  # final positions (index 0 = no final consonant)
SYLLABLE_COUNT = CHOSEONG_COUNT * JUNGSEONG_COUNT * JONGSEONG_COUNT  # 11,172

def compose_hangul(lead: int, vowel: int, trail: int = 0) -> str:
    """Compose a Hangul syllable from 0-based jamo indices."""
    if not (0 <= lead < CHOSEONG_COUNT
            and 0 <= vowel < JUNGSEONG_COUNT
            and 0 <= trail < JONGSEONG_COUNT):
        raise ValueError("jamo index out of range")
    code_point = (
        HANGUL_BASE
        + (lead * JUNGSEONG_COUNT + vowel) * JONGSEONG_COUNT
        + trail
    )
    return chr(code_point)

def decompose_hangul(syllable: str) -> tuple[int, int, int]:
    """Decompose a precomposed Hangul syllable to (lead, vowel, trail) indices."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < SYLLABLE_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    trail = index % JONGSEONG_COUNT
    vowel = (index // JONGSEONG_COUNT) % JUNGSEONG_COUNT
    lead = index // (JUNGSEONG_COUNT * JONGSEONG_COUNT)
    return lead, vowel, trail

This algorithm underpins NFD/NFC normalization for Korean text and enables efficient Korean text processing without exhaustive lookup tables.
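To connect the index arithmetic to normalization, the sketch below turns the indices into actual conjoining jamo code points and compares the result with what `unicodedata.normalize` produces for NFD. The base constants mirror the names used in the Unicode Standard (SBase, LBase, VBase, TBase); the function name `to_jamo` and the sample word 한글 are assumptions made for this example.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def to_jamo(syllable: str) -> str:
    """Spell a precomposed syllable as conjoining jamo (U+1100 block)."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < 19 * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, trail = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + lead) + chr(V_BASE + vowel)
    # Trail index 0 means "no final consonant", so nothing is appended.
    if trail:
        jamo += chr(T_BASE + trail)
    return jamo


# The arithmetic result should match the library's NFD output.
for syllable in "한글":
    assert to_jamo(syllable) == unicodedata.normalize("NFD", syllable)
```

Note that T_BASE sits one code point below the first final consonant (U+11A8), which is what lets trail index 0 stand for an absent final.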

Quick Facts

Property	Value
Invented	1443, King Sejong the Great
Hangul Jamo block	U+1100–U+11FF (combining jamo)
Compatibility Jamo block	U+3130–U+318F (standalone display)
Hangul Syllables block	U+AC00–U+D7AF (11,172 precomposed)
Components	19 initial × 21 vowel × 28 final (incl. null) = 11,172 syllables
Normalization	NFD decomposes U+AC00 syllables to U+1100 jamo
Unicode algorithm	Defined in Chapter 3 of the Unicode Standard

Related Terms

CJK, Decomposition, Normalization

More in Unicode Standard

Basic Multilingual Plane (BMP)

Plane 0 (U+0000–U+FFFF), containing the most commonly used characters, including Latin, Greek, Cyrillic, CJK, …

CJK

Chinese, Japanese, and Korean: a collective term for the unified block of Han ideographs …

Han Unification

The process of mapping Chinese, Japanese, and Korean ideographs that share a …

ISO 10646 / Universal Character Set

The international standard (ISO/IEC 10646), synchronized with Unicode, defining the same character repertoire …

Unicode

A universal character encoding standard that assigns a unique number (code point) to every character in …

Unicode Character Database (UCD)

A machine-readable collection of data files defining all Unicode character properties, including UnicodeData.txt, Blocks.txt, …

Unicode Standard Annex (UAX)

Normative or informative documents that are integral parts of the Unicode Standard. …

Unicode Technical Report (UTR)

Informational documents published by the Unicode Consortium covering specific topics like security …

Abstract Character

A unit of information used for the organization, control, or representation of textual data: a conceptual entity …

Unicode Version

Major releases of the Unicode Standard, each adding new characters, scripts, and features. The current …
