What is a block?

A named, contiguous range of code points (for example, Basic Latin = U+0000–U+007F). Unicode 16.0 defines 338 blocks. Blocks never overlap, so each code point belongs to at most one block.

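What is a script?

The writing system a character belongs to (for example, Latin, Cyrillic, Han). Unicode 16.0 defines 168 scripts, and the Script property is important for security and mixed-script detection.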

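What is the Basic Multilingual Plane (BMP)?

Plane 0 (U+0000–U+FFFF), which contains the most commonly used characters, including Latin, Greek, Cyrillic, CJK, and Arabic letters and most symbols. Every character in this plane fits in a single UTF-16 code unit.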
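As a minimal sketch of the single-code-unit claim, the Python snippet below counts UTF-16 code units for one BMP character and one supplementary-plane character. The helper name utf16_code_units is made up for this illustration.

```python
def utf16_code_units(char: str) -> int:
    """Return how many 16-bit code units UTF-16 needs for one character."""
    # "utf-16-le" emits no byte-order mark, so every 2 bytes is one code unit.
    return len(char.encode("utf-16-le")) // 2

bmp_char = "中"                # U+4E2D, inside the BMP
supplementary = "\U00020000"   # U+20000, first code point of Plane 2

print(utf16_code_units(bmp_char))       # 1
print(utf16_code_units(supplementary))  # 2 (a surrogate pair)
```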

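Unicode Standard

CJK (Han, Kana, Hangul)

Chinese, Japanese, and Korean: a collective term for the unified Han ideograph blocks and related scripts in Unicode. The core CJK Unified Ideographs block alone contains 20,992 characters.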

2021-12-20 · Updated 2024-11-28

What is CJK?

CJK stands for Chinese, Japanese, and Korean: the three major East Asian languages whose writing systems share a common inventory of logographic characters known as CJK Unified Ideographs. In Unicode, the term refers specifically to how these shared ideographs are encoded once, in a unified set of blocks, rather than as separate sets for each language.

The term is sometimes extended to CJKV (adding Vietnamese) since Vietnamese historically used Chữ Nôm, a writing system also based on Chinese ideographs.

The Han Unification Decision

The central and controversial decision in Unicode's CJK handling is Han unification: assigning a single code point to an ideograph that appears in two or more of the CJK writing traditions, even if those traditions render it with slightly different glyph shapes.

For example, the ideograph meaning "country/nation" is written:
- Chinese: 国 (Simplified) or 國 (Traditional)
- Japanese: 国 (Kanji)
- Korean: 國 (Hanja; written 국 in Hangul and pronounced "guk")

Unicode assigns both Simplified 国 (U+56FD) and Traditional 國 (U+570B) as separate code points because they have distinct shapes. But many other ideographs that differ only in minor glyph details between countries are unified into a single code point, with font selection determining the rendering variant.

CJK Code Blocks in Unicode

Block	Range	Assigned characters	Contents
CJK Unified Ideographs	U+4E00–U+9FFF	20,992	Core Han characters
CJK Extension A	U+3400–U+4DBF	6,592	Rare characters
CJK Extension B	U+20000–U+2A6DF	42,720	Rare/historic (Plane 2)
CJK Extension C	U+2A700–U+2B73F	4,154	Rare
CJK Extension D	U+2B740–U+2B81F	222	Rare
CJK Extension E	U+2B820–U+2CEAF	5,762	Very rare
CJK Extension F	U+2CEB0–U+2EBEF	7,473	Very rare
CJK Extension G	U+30000–U+3134F	4,939	Plane 3, added in Unicode 13.0
CJK Extension H	U+31350–U+323AF	4,192	Plane 3, added in Unicode 15.0
CJK Extension I	U+2EBF0–U+2EE5F	622	Plane 2, added in Unicode 15.1
CJK Compatibility Ideographs	U+F900–U+FAFF	472	Compatibility (avoid)
CJK Radicals Supplement	U+2E80–U+2EFF	115	Dictionary radicals

Total CJK unified ideographs across the core block and all extensions: over 97,000 as of Unicode 16.0.
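The ranges in the table can be turned into a small block lookup. The following is a sketch rather than a complete block database: it covers only the blocks listed above, and the names CJK_BLOCKS and cjk_block are invented for this example.

```python
from bisect import bisect_right
from typing import Optional

# (start, end, block name), copied from the table above and sorted by start.
CJK_BLOCKS = sorted([
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x3400, 0x4DBF, "CJK Extension A"),
    (0x20000, 0x2A6DF, "CJK Extension B"),
    (0x2A700, 0x2B73F, "CJK Extension C"),
    (0x2B740, 0x2B81F, "CJK Extension D"),
    (0x2B820, 0x2CEAF, "CJK Extension E"),
    (0x2CEB0, 0x2EBEF, "CJK Extension F"),
    (0x30000, 0x3134F, "CJK Extension G"),
    (0x31350, 0x323AF, "CJK Extension H"),
    (0x2EBF0, 0x2EE5F, "CJK Extension I"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0x2E80, 0x2EFF, "CJK Radicals Supplement"),
])
_STARTS = [start for start, _, _ in CJK_BLOCKS]

def cjk_block(char: str) -> Optional[str]:
    """Return the table's block name for a character, or None if it is not listed."""
    cp = ord(char)
    i = bisect_right(_STARTS, cp) - 1   # last block that starts at or before cp
    if i >= 0 and cp <= CJK_BLOCKS[i][1]:
        return CJK_BLOCKS[i][2]
    return None

print(cjk_block("中"))          # CJK Unified Ideographs
print(cjk_block("\U00020000"))  # CJK Extension B
print(cjk_block("A"))           # None
```

Because blocks never overlap, a binary search over the sorted start points is enough. Note that a block range can include code points that are still unassigned.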

Hanzi, Kanji, Hanja — One Code Point

The unification means that U+4E2D (中, "middle/China") is simultaneously:
- Chinese: 中 (zhōng)
- Japanese: 中 (naka/chū)
- Korean: 中 (Hanja; written 중 in Hangul, romanized "jung")

Which glyph is rendered depends on the font and language context. A Chinese font renders 中 in the standard Chinese style; a Japanese font renders it in the Japanese kanji style. The unification work followed explicit rules, among them the source separation rule: two similar ideographs that were encoded separately within a single source national standard stayed separate in Unicode rather than being unified.
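One common way to supply that language context is to tag the text. The sketch below wraps the same code point in HTML spans with different lang attributes, on the assumption that the renderer chooses a regional font from the tag. The helper tag_language is hypothetical, and 直 (U+76F4) is used only as an ideograph whose regional glyphs are often cited as differing.

```python
from html import escape

def tag_language(text: str, lang: str) -> str:
    """Wrap text in a span carrying a BCP 47 language tag such as "ja" or "zh-Hans"."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(text)}</span>'

# One code point, four language contexts. The string content is identical;
# only the tag differs, and font selection is left to the renderer.
for lang in ("zh-Hans", "zh-Hant", "ja", "ko"):
    print(tag_language("直", lang))
```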

CJK Radicals and Strokes

CJK ideographs are built from components called radicals (部首, bùshǒu). Unicode encodes:

Kangxi Radicals (U+2F00–U+2FDF): 214 traditional radicals used in Chinese dictionaries
CJK Radicals Supplement (U+2E80–U+2EFF): Alternative and simplified radical forms
Stroke order: Not encoded in Unicode; defined by external standards per language
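Radical characters are distinct code points from the ideographs they resemble. As a small illustration using Python's unicodedata module, the Kangxi radical U+2F00 is classified as a symbol and folds to the ideograph U+4E00 only under compatibility normalization:

```python
import unicodedata

radical = "\u2F00"  # Kangxi radical that looks like the ideograph 一 (U+4E00)

print(unicodedata.name(radical))        # KANGXI RADICAL ONE
print(unicodedata.category(radical))    # So (Symbol, other), not Lo

# Compatibility normalization maps the radical to the unified ideograph.
folded = unicodedata.normalize("NFKC", radical)
print(hex(ord(folded)))                 # 0x4e00

# Canonical normalization leaves the radical untouched.
print(unicodedata.normalize("NFC", radical) == radical)  # True
```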

Working with CJK in Code

import unicodedata

char = "中"
print(unicodedata.name(char))           # CJK UNIFIED IDEOGRAPH-4E2D
print(unicodedata.category(char))       # Lo (Letter, other)
print(hex(ord(char)))                   # 0x4e2d

# Range check against the core block only
def is_cjk_unified(char: str) -> bool:
    cp = ord(char)
    return 0x4E00 <= cp <= 0x9FFF

# Range check against the core block and Extensions A through I
def is_cjk_any(char: str) -> bool:
    cp = ord(char)
    ranges = [
        (0x4E00, 0x9FFF),    # Core
        (0x3400, 0x4DBF),    # Extension A
        (0x20000, 0x2A6DF),  # Extension B
        (0x2A700, 0x2EE5F),  # Extensions C-F and I (contiguous)
        (0x30000, 0x323AF),  # Extensions G-H (contiguous)
    ]
    return any(start <= cp <= end for start, end in ranges)

print(is_cjk_unified("中"))        # True
print(is_cjk_unified("A"))         # False
print(is_cjk_any("\U0002CEB0"))    # True (first code point of Extension F)
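An alternative to hard-coded ranges is to ask the Unicode database for the character name. This is a heuristic sketch with made-up helper names, and its coverage of the newest extensions depends on the Unicode version bundled with the Python build in use.

```python
import unicodedata

def is_unified_ideograph_by_name(char: str) -> bool:
    """Heuristic: trust the character name reported by the local Unicode database."""
    # name() returns the default "" for code points this Python build does not know.
    return unicodedata.name(char, "").startswith("CJK UNIFIED IDEOGRAPH-")

def count_unified(text: str) -> int:
    """Count the unified ideographs in a string."""
    return sum(is_unified_ideograph_by_name(ch) for ch in text)

sample = "漢字とかな, plus \U00020000"
print(count_unified(sample))  # 3: 漢, 字, and U+20000 (the kana are not ideographs)
```

Compatibility ideographs carry names beginning with CJK COMPATIBILITY IDEOGRAPH, so this check deliberately excludes them.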

The Han Unification Controversy

Han unification remains controversial among East Asian scholars and users. Critics argue:
- Unifying characters obscures important cultural and linguistic distinctions
- Font selection should not determine meaning or provenance
- Japanese and Chinese variants of the "same" ideograph can differ in stroke count and form

Defenders argue:
- Most unified characters are genuinely identical in meaning and origin
- Localization (font + language tags) correctly handles rendering differences
- Encoding all variants separately would require tens of thousands more code points

The Unicode Standard's response: the CJK Compatibility Ideographs block (U+F900–U+FAFF) provides duplicate code points for round-trip compatibility with East Asian legacy encodings. However, most of these are compatibility characters with canonical singleton decompositions, so every normalization form (NFC, NFD, NFKC, and NFKD) replaces them with the corresponding unified ideographs.
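The folding can be observed with Python's unicodedata module. U+F900 is used here as one example of a compatibility ideograph. Its decomposition is a canonical singleton, so canonical normalization replaces it as well as the compatibility forms.

```python
import unicodedata

compat = "\uF900"  # CJK COMPATIBILITY IDEOGRAPH-F900

# A singleton decomposition: one target code point and no <compat> tag.
print(unicodedata.decomposition(compat))  # 8C48

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, compat)
    print(form, hex(ord(normalized)))     # every form prints 0x8c48
```

In practice this means a round trip through any normalization step loses the distinction the compatibility code point was meant to preserve.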

Quick Facts

Property	Value
CJK stands for	Chinese, Japanese, Korean
Core block	U+4E00–U+9FFF (20,992 ideographs)
Total CJK ideographs (v16.0)	Over 97,000 across all extensions
Key decision	Han unification
Largest single extension	Extension B (42,720 in Plane 2)
Newest extension	Extension I (Unicode 15.1)
Script property in UCD	Han
Radical count (Kangxi)	214
Controversial aspect	Han unification merges national variants

More terms in Unicode Standard

Han Unification

The process of mapping Chinese, Japanese, and Korean ideographs that share a …

Hangul Jamo

The individual consonant and vowel components (jamo) of the Korean Hangul writing …

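ISO 10646 / Universal Character Set

The international standard (ISO/IEC 10646) that is kept in sync with Unicode. It defines the same character repertoire and code points but does not include Unicode's additional algorithms and properties.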

Unicode

A universal character encoding standard that assigns a unique number (code point) to every character in every writing system. Version 16.0 contains 154,998 assigned characters.

Unicode Standard Annex (UAX)

Normative or informative documents that are integral parts of the Unicode Standard. …

Unicode Technical Report (UTR)

Informational documents published by the Unicode Consortium covering specific topics like security …

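Unicode Consortium

The non-profit organization that develops and maintains the Unicode Standard. Its members include Apple, Google, Microsoft, Meta, and many other companies.

Unicode Scalar Value

Any code point except the surrogate code points (U+D800–U+DFFF). Scalar values are the set of valid values that can represent actual characters, 1,112,064 in total.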

Unicode Version

A major release of the Unicode Standard that adds new characters, scripts, and features. The current version is Unicode 16.0 (September 2024).

Unicode Stability Policy

A policy guaranteeing that a character's code point and name never change once assigned. Properties may be revised, but assignments are permanent.
