Han Unification
The process of mapping Chinese, Japanese, and Korean ideographs that share a common historical origin to a single Unicode code point, despite regional glyph variations.
What is Han Unification?
Han Unification is the process by which the Unicode Consortium assigned a single code point to each group of Chinese, Japanese, and Korean (CJK) characters that share the same historical origin and abstract meaning, even though their printed forms can differ across regions. The result is the CJK Unified Ideographs block (U+4E00–U+9FFF) and several extension blocks (Extension A through Extension I), together containing nearly 98,000 ideographs.
The core principle is straightforward: if the characters used in Chinese, Japanese, and Korean descend from the same historical Chinese character and carry the same meaning, they are unified into a single code point. A reader in Beijing, Tokyo, or Seoul recognizes the same abstract character, even if the local typographic tradition draws the strokes slightly differently.
The Controversy: Source Separation vs. Unification
Han Unification has been one of the most debated decisions in Unicode history. Critics, especially from Japan, argue that the policy conflates characters that Japanese users consider distinct. A frequently cited example is the "grass" radical (艹): in Japanese printing it typically appears as a three-stroke form, while Traditional Chinese typography (for example, fonts following Taiwan's standard) uses a four-stroke form. Because both forms share one code point, the choice of which glyph to display falls entirely to the font, not to the text itself.
Proponents argue that encoding every regional glyph variant as a separate code point would multiply the size of the character set many times over and that Han Unification mirrors how Latin-script readers accept that the same letter looks different in different typefaces.
The controversy gave rise to formal mechanisms to handle legitimate distinctions:
- IRG (Ideographic Rapporteur Group): The ISO/IEC working group that advises Unicode on CJK matters, comprising national representatives from China, Taiwan, Japan, Korea, Vietnam, and others. The IRG reviews proposals for new ideographs, verifies source references, and mediates unification disputes.
- IVD (Ideographic Variation Database): A database, maintained by the Unicode Consortium, of registered Ideographic Variation Sequences that let text specify which glyph variant is intended. A base character followed by a Variation Selector (U+E0100–U+E01EF for ideographic variation) selects a specific registered glyph, provided the font supports the sequence. For example, the sequence U+82F1 U+E0101 (英, "England/hero", followed by VS18) requests one particular form of the character rather than whatever form the font shows by default.
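The base-plus-selector structure described in the IVD entry can be sketched in Python. This is a minimal illustration, not a lookup against the IVD itself: the helper names `make_ivs` and `find_ivs` are hypothetical, and the code only checks that the selector lies in U+E0100–U+E01EF, not that the sequence is actually registered.

```python
from typing import List, Tuple

# Variation Selectors Supplement range used for ideographic variation.
VS_FIRST = 0xE0100
VS_LAST = 0xE01EF


def make_ivs(base: str, selector_cp: int) -> str:
    """Append a supplementary variation selector to a single base character.

    Hypothetical helper: it does NOT verify that the resulting sequence
    is registered in the Ideographic Variation Database.
    """
    if len(base) != 1:
        raise ValueError("base must be exactly one character")
    if not VS_FIRST <= selector_cp <= VS_LAST:
        raise ValueError("selector must be in U+E0100..U+E01EF")
    return base + chr(selector_cp)


def find_ivs(text: str) -> List[Tuple[str, int]]:
    """Return (base character, selector code point) pairs found in text."""
    pairs = []
    for prev, cur in zip(text, text[1:]):
        if VS_FIRST <= ord(cur) <= VS_LAST:
            pairs.append((prev, ord(cur)))
    return pairs


sequence = make_ivs("\u82f1", 0xE0101)
print(len(sequence))            # two code points: base + selector
print(find_ivs("x" + sequence)) # the pair is recoverable from running text
```

Because the selector is an ordinary code point, the sequence survives any storage format that round-trips Unicode text; whether a distinct glyph appears still depends on the font.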
Technical Implications
When working with CJK text programmatically, Han Unification has several practical consequences:
- Font selection is semantically significant: A document written in Chinese needs a font designed for Chinese to show the glyph shapes its readers expect. The same bytes rendered with a Japanese font may display noticeably different glyphs.
- Locale metadata matters: The lang attribute in HTML (lang="ja" vs. lang="zh") tells the browser which font to prefer, directly affecting how unified ideographs appear.
- Variation sequences must be preserved: Text processing pipelines that strip non-printing characters can inadvertently destroy intentional glyph disambiguation encoded via variation selectors.
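One way variation selectors get lost in practice is a cleanup step that drops nonspacing marks (general category Mn), the category that variation selectors belong to. The sketch below contrasts such a filter with one that keeps them. The function names are hypothetical, and a real pipeline may strip characters on other criteria.

```python
import unicodedata


def is_variation_selector(ch: str) -> bool:
    """True for VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def strip_marks_naive(text: str) -> str:
    """Drop every nonspacing mark; this also removes variation selectors."""
    return "".join(c for c in text if unicodedata.category(c) != "Mn")


def strip_marks_keep_vs(text: str) -> str:
    """Drop nonspacing marks but leave variation selectors in place."""
    return "".join(
        c for c in text
        if unicodedata.category(c) != "Mn" or is_variation_selector(c)
    )


sample = "\u82f1\U000E0101"  # base ideograph followed by VS18
print(len(strip_marks_naive(sample)))    # selector has been stripped
print(len(strip_marks_keep_vs(sample)))  # selector is still present
```

The same allow-list idea applies to any filter that removes "invisible" characters: test for variation selectors explicitly before discarding a code point.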
Extension Blocks
Unicode has added CJK extensions as more historical and regional characters were identified:
| Block | Range | Count |
|---|---|---|
| CJK Unified Ideographs | U+4E00–U+9FFF | 20,992 |
| CJK Extension A | U+3400–U+4DBF | 6,592 |
| CJK Extension B | U+20000–U+2A6DF | 42,720 |
| CJK Extension C–I | Various | ~27,000 |
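The ranges in the table translate directly into a code point lookup. The following is a small sketch covering only the three blocks whose ranges are listed explicitly (Extensions C–I are omitted); `cjk_block` and `block_size` are illustrative names, not part of any library.

```python
from typing import Optional

# (first, last, name) for the blocks with explicit ranges in the table.
CJK_BLOCKS = [
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Extension A"),
    (0x20000, 0x2A6DF, "CJK Extension B"),
]


def cjk_block(ch: str) -> Optional[str]:
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for first, last, name in CJK_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def block_size(first: int, last: int) -> int:
    """Number of code points in an inclusive range (assigned or not)."""
    return last - first + 1


print(cjk_block("\u82f1"))      # a character from the main block
print(cjk_block("\U00020000"))  # first code point of Extension B
for first, last, name in CJK_BLOCKS:
    print(name, block_size(first, last))
```

Note that `block_size` counts positions in the range, which is an upper bound on the number of assigned ideographs; the two coincide only when a block is completely filled.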
Quick Facts
| Property | Value |
|---|---|
| Primary block | CJK Unified Ideographs U+4E00–U+9FFF |
| Total unified ideographs | ~98,000 across the main block and all extension blocks |
| Governing body | Ideographic Rapporteur Group (IRG) |
| Glyph disambiguation mechanism | Ideographic Variation Database (IVD) |
| Variation selectors range | U+E0100–U+E01EF (Supplemental) |
| Key controversy | Regional glyph differences unified onto single code points |
| Related HTML attribute | lang (guides the browser's font selection) |
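Related Terms
Other terms in Unicode Standard
Chinese, Japanese, and Korean: the collective term for the unified Han ideograph blocks and related scripts in Unicode. The CJK Unified Ideographs contain at least 20,992 characters.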
The individual consonant and vowel components (jamo) of the Korean Hangul writing …
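An international standard (ISO/IEC 10646) kept in sync with Unicode. It defines the same character repertoire and code points but does not include Unicode's additional algorithms and properties.
A universal character encoding standard that assigns a unique number (code point) to every character in every writing system. Version 16.0 contains 154,998 assigned characters.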
Normative or informative documents that are integral parts of the Unicode Standard. …
Informational documents published by the Unicode Consortium covering specific topics like security …
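The non-profit organization that develops and maintains the Unicode Standard. Many companies, including Apple, Google, Microsoft, and Meta, are members.
Any code point except the surrogate code points (U+D800–U+DFFF). This is the set of valid values that can represent actual characters, 1,112,064 in total.
A major release of the Unicode Standard that adds new characters, scripts, and features. The current version is Unicode 16.0 (September 2024).
A policy guaranteeing that once a character is assigned, its code point and name will never change. Properties may be revised, but assignments are permanent.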