CJK (Chinese, Japanese, Korean)
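CJK (Chinese, Japanese, Korean) is the collective term for the unified Han ideograph blocks in Unicode and the writing systems that use them. The CJK Unified Ideographs block alone contains 20,992 characters, and the extension blocks add many more.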
What is CJK?
CJK stands for Chinese, Japanese, and Korean, the three major East Asian languages whose writing systems share a common inventory of logographic characters known as CJK Unified Ideographs. In Unicode, CJK refers to these shared ideographs, which are encoded once in a unified repertoire (spread across several blocks) rather than as separate sets for each language.
The term is sometimes extended to CJKV (adding Vietnamese) since Vietnamese historically used Chữ Nôm, a writing system also based on Chinese ideographs.
The Han Unification Decision
The central and controversial decision in Unicode's CJK handling is Han unification: assigning a single code point to an ideograph that appears in two or more of the CJK writing traditions, even if those traditions render it with slightly different glyph shapes.
For example, the ideograph meaning "country/nation" appears as:
- Chinese: 国 (Simplified) or 國 (Traditional)
- Japanese: 国 (Kanji)
- Korean: 國 (Hanja; written 국 in Hangul and pronounced "guk")
Unicode encodes Simplified 国 (U+56FD) and Traditional 國 (U+570B) as separate code points because they have distinct shapes. But many other ideographs that differ only in minor glyph details between countries are unified into a single code point, with font selection determining the rendering variant.
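As a quick check of the two code points quoted above, the following minimal sketch prints the code point and character name of each form using Python's standard `unicodedata` module. The variable names are illustrative only.

```python
import unicodedata

simplified = "国"   # U+56FD
traditional = "國"  # U+570B

# Print the code point and algorithmic character name of each form.
for ch in (simplified, traditional):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The two forms are separate code points, so the strings never compare equal.
distinct = simplified != traditional
print(distinct)
```

Because the two forms are separate code points, plain string comparison and search treat them as different characters; matching them requires an external Simplified/Traditional mapping, which Unicode normalization does not provide.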
CJK Code Blocks in Unicode
| Block | Range | Count | Contents |
|---|---|---|---|
| CJK Unified Ideographs | U+4E00–U+9FFF | 20,992 | Core Han characters |
| CJK Extension A | U+3400–U+4DBF | 6,592 | Rare characters |
| CJK Extension B | U+20000–U+2A6DF | 42,720 | Rare/historic (Plane 2) |
| CJK Extension C | U+2A700–U+2B73F | 4,154 | Rare |
| CJK Extension D | U+2B740–U+2B81F | 222 | Rare |
| CJK Extension E | U+2B820–U+2CEAF | 5,762 | Very rare |
| CJK Extension F | U+2CEB0–U+2EBEF | 7,473 | Very rare |
| CJK Extension G | U+30000–U+3134F | 4,939 | Plane 3, added Unicode 13 |
| CJK Extension H | U+31350–U+323AF | 4,192 | Plane 3, added Unicode 15 |
| CJK Extension I | U+2EBF0–U+2EE5F | 622 | Added Unicode 15.1 |
| CJK Compatibility Ideographs | U+F900–U+FAFF | 472 | Compatibility (avoid) |
| CJK Radicals Supplement | U+2E80–U+2EFF | 115 | Dictionary radicals |
Total CJK unified ideographs across the core block and all extensions: over 97,000 as of Unicode 16.0.
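A block's range and its character count are related but not always identical: the range fixes how many code points the block spans, while some of those code points may still be unassigned. The sketch below computes the span of a few ranges copied from the table; `BLOCKS` and `block_size` are hypothetical names used only for illustration.

```python
# Inclusive code point bounds copied from the table above.
BLOCKS = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "CJK Extension A": (0x3400, 0x4DBF),
    "CJK Extension B": (0x20000, 0x2A6DF),
    "CJK Extension C": (0x2A700, 0x2B73F),
}


def block_size(start: int, end: int) -> int:
    """Number of code points in an inclusive range."""
    return end - start + 1


for name, (start, end) in BLOCKS.items():
    print(f"{name}: {block_size(start, end):,} code points")
```

Extension C, for instance, spans 4,160 code points, slightly more than the 4,154 characters listed in the table, so a range check can also match a handful of unassigned code points.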
Hanzi, Kanji, Hanja — One Code Point
The unification means that U+4E2D (中, "middle/China") is simultaneously:
- Chinese: 中 (zhōng)
- Japanese: 中 (naka/chū)
- Korean: 中 (written 중 in Hangul and pronounced "jung")
Which glyph is rendered depends on the font and language context. A Chinese font renders 中 in the standard Chinese style; a Japanese font renders it in the Japanese kanji style. The Unicode Standard's unification rules determined when two apparently similar ideographs should be unified versus kept separate. One of them, the source separation rule, kept characters apart whenever a single source national standard already encoded them separately.
CJK Radicals and Strokes
CJK ideographs are built from components called radicals (部首, bùshǒu). Unicode encodes:
- Kangxi Radicals (U+2F00–U+2FDF): 214 traditional radicals used in Chinese dictionaries
- CJK Radicals Supplement (U+2E80–U+2EFF): Alternative and simplified radical forms
- Stroke order: Not encoded in Unicode; defined by external standards per language
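Radical characters are encoded separately from the ideographs they resemble. As a minimal sketch, assuming only Python's standard `unicodedata` module, the following shows that KANGXI RADICAL ONE (U+2F00) is a symbol rather than a letter and that compatibility normalization (NFKC) folds it to the ordinary ideograph 一 (U+4E00).

```python
import unicodedata

radical = "\u2F00"  # KANGXI RADICAL ONE, visually identical to U+4E00

print(unicodedata.name(radical))      # character name of the radical
print(unicodedata.category(radical))  # general category of the radical

# NFKC applies the compatibility decomposition, yielding the unified ideograph.
folded = unicodedata.normalize("NFKC", radical)
print(f"U+{ord(folded):04X}")
```

This matters for search and comparison: text that contains radical characters instead of ideographs will not match the corresponding ideograph unless it is normalized first.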
Working with CJK in Code

    import unicodedata

    char = "中"
    print(unicodedata.name(char))      # CJK UNIFIED IDEOGRAPH-4E2D
    print(unicodedata.category(char))  # Lo (Letter, other)
    print(hex(ord(char)))              # 0x4e2d

    # Inclusive block ranges; a block may include unassigned code points.
    CJK_RANGES = [
        (0x4E00, 0x9FFF),    # Core block
        (0x3400, 0x4DBF),    # Extension A
        (0x20000, 0x2A6DF),  # Extension B
        (0x2A700, 0x2EE5F),  # Extensions C-F and I (contiguous blocks)
        (0x30000, 0x323AF),  # Extensions G-H (contiguous blocks)
    ]

    def is_cjk_unified(char: str) -> bool:
        """Return True if char is in the core CJK Unified Ideographs block."""
        return 0x4E00 <= ord(char) <= 0x9FFF

    def is_cjk_any(char: str) -> bool:
        """Return True if char is in the core block or Extensions A-I."""
        cp = ord(char)
        return any(start <= cp <= end for start, end in CJK_RANGES)

    print(is_cjk_unified("中"))      # True
    print(is_cjk_unified("A"))       # False
    print(is_cjk_any("\U00020000"))  # True (first code point of Extension B)

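When a yes/no answer is not enough, the same range data can report which block a character belongs to. The following is a sketch, not a complete block database: the ranges are copied from the table in this article, and `CJK_BLOCKS` and `cjk_block_name` are hypothetical names introduced here.

```python
from typing import Optional

# (start, end, block name), inclusive bounds, taken from the table above.
CJK_BLOCKS = [
    (0x2E80, 0x2EFF, "CJK Radicals Supplement"),
    (0x3400, 0x4DBF, "CJK Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xF900, 0xFAFF, "CJK Compatibility Ideographs"),
    (0x20000, 0x2A6DF, "CJK Extension B"),
    (0x2A700, 0x2B73F, "CJK Extension C"),
    (0x2B740, 0x2B81F, "CJK Extension D"),
    (0x2B820, 0x2CEAF, "CJK Extension E"),
    (0x2CEB0, 0x2EBEF, "CJK Extension F"),
    (0x2EBF0, 0x2EE5F, "CJK Extension I"),
    (0x30000, 0x3134F, "CJK Extension G"),
    (0x31350, 0x323AF, "CJK Extension H"),
]


def cjk_block_name(char: str) -> Optional[str]:
    """Return the name of the CJK block containing char, or None."""
    cp = ord(char)
    for start, end, name in CJK_BLOCKS:
        if start <= cp <= end:
            return name
    return None


print(cjk_block_name("中"))
print(cjk_block_name("\U00020000"))
print(cjk_block_name("A"))
```

A linear scan is adequate for a dozen ranges; a production implementation would more likely rely on a library that exposes the Unicode block or script property directly.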
The Han Unification Controversy
Han unification remains controversial among East Asian scholars and users. Critics argue:
- Unifying characters obscures important cultural and linguistic distinctions
- Font selection should not determine meaning or provenance
- Japanese and Chinese variants of the "same" ideograph can differ in stroke count and form
Defenders argue:
- Most unified characters are genuinely identical in meaning and origin
- Localization (font + language tags) correctly handles rendering differences
- Encoding all variants separately would require tens of thousands more code points
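The Unicode Standard's response: the CJK Compatibility Ideographs block (U+F900–U+FAFF) provides separate code points for round-trip compatibility with East Asian legacy encodings. However, these are compatibility characters. Their decompositions are canonical, so they are replaced by the unified forms under every normalization form (NFC and NFD as well as NFKC and NFKD), and the distinction does not survive normalization.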
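The effect of normalization on a compatibility ideograph can be observed directly. This minimal sketch, assuming only the standard `unicodedata` module, normalizes U+F900 under each of the four forms; the `results` dictionary is an illustrative name.

```python
import unicodedata

compat = "\uF900"  # CJK COMPATIBILITY IDEOGRAPH-F900

# Normalize under each form and record the resulting string.
results = {
    form: unicodedata.normalize(form, compat)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}

for form, text in results.items():
    print(form, f"U+{ord(text):04X}")
```

Each form maps U+F900 to the unified ideograph U+8C48, so any pipeline that normalizes text loses the compatibility code point and cannot round-trip it back to the legacy encoding.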
Quick Facts
| Property | Value |
|---|---|
| CJK stands for | Chinese, Japanese, Korean |
| Core block | U+4E00–U+9FFF (20,992 ideographs) |
| Total CJK ideographs (v16.0) | Over 97,000 across all extensions |
| Key decision | Han unification |
| Largest single extension | Extension B (42,720 in Plane 2) |
| Newest extension | Extension I (Unicode 15.1) |
| Script property in UCD | Han |
| Radical count (Kangxi) | 214 |
| Controversial aspect | Han unification merges national variants |