Supplementary Planes / Astral Planes
Planes 1–16 (U+10000–U+10FFFF), which contain emoji, ancient scripts, CJK extensions, musical notation, and more. UTF-16 needs surrogate pairs to encode them.
What are Supplementary Planes?
Supplementary planes are the 16 Unicode planes numbered 1 through 16, that is, every plane beyond Plane 0 (the Basic Multilingual Plane, or BMP). They cover code points U+10000 through U+10FFFF and contain characters that did not fit in the original 16-bit BMP design: historic writing systems, rare CJK ideographs, most modern emoji (some older emoji remain in the BMP), musical notation, mathematical alphanumeric symbols, and large private use areas.
The term "supplementary character" refers to any character with a code point in these planes. Supplementary characters require special handling in UTF-16 (surrogate pairs) and use 4 bytes in UTF-8.
The Supplementary Planes
| Plane | Range | Name | Notable Contents |
|---|---|---|---|
| 1 | U+10000–U+1FFFF | SMP (Supplementary Multilingual Plane) | Emoji, Linear B, Gothic, Mathematical Alphanumerics |
| 2 | U+20000–U+2FFFF | SIP (Supplementary Ideographic Plane) | CJK Unified Ideographs Extensions B–F and I, CJK Compatibility Ideographs Supplement |
| 3 | U+30000–U+3FFFF | TIP (Tertiary Ideographic Plane) | CJK Extension G (added in Unicode 13.0), CJK Extension H (added in Unicode 15.0) |
| 4–13 | U+40000–U+DFFFF | (Unassigned) | No characters assigned |
| 14 | U+E0000–U+EFFFF | SSP (Supplementary Special-purpose Plane) | Tags, Variation Selectors Supplement |
| 15 | U+F0000–U+FFFFF | SPUA-A (Supplementary PUA-A) | Private use |
| 16 | U+100000–U+10FFFF | SPUA-B (Supplementary PUA-B) | Private use |
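Because every plane spans exactly 0x10000 code points, the plane number in the table above is simply the code point with its low 16 bits shifted away. The following is a minimal sketch of that lookup; the helper names `plane_of` and `is_supplementary` are illustrative, not part of any standard library.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane number (0-16) for a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 0x10000 (65,536) code points, so the plane
    # number is whatever remains above the low 16 bits.
    return code_point >> 16


def is_supplementary(char: str) -> bool:
    """True if the single character lies in planes 1-16."""
    return plane_of(ord(char)) >= 1


print(plane_of(0x1F600))      # 1  (SMP)
print(plane_of(0x20000))      # 2  (SIP)
print(plane_of(0x10FFFF))     # 16 (SPUA-B)
print(is_supplementary("A"))  # False (BMP)
```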
Plane 1: Where Emoji Live
The Supplementary Multilingual Plane (SMP) is the most practically important supplementary plane. It contains:
- Emoji (U+1F300–U+1FAFF): The major emoji blocks (some older emoji remain in the BMP)
- Historic scripts: Linear B (the earliest attested script for Greek), Gothic, Ugaritic, Cuneiform, Egyptian Hieroglyphs, Old Persian
- Musical notation: U+1D100–U+1D1FF
- Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF (𝒜, 𝔹, 𝔻...)
- Playing Cards, Domino Tiles, Mahjong Tiles: Miscellaneous game symbols
Plane 2: The CJK Overflow
The Supplementary Ideographic Plane (SIP) was created because the CJK ideographs encoded in the BMP (roughly 27,000, counting Extension A) were insufficient for full CJK coverage. Plane 2 holds Extensions B through F, plus the later Extension I. Together with the extensions in Plane 3, they bring the total CJK unified ideograph count above 90,000. These characters are needed mainly for rare classical Chinese texts, historical documents, and uncommon characters used in personal names.
Encoding Supplementary Characters
UTF-16 Surrogate Pairs
Because UTF-16 code units are 16 bits wide and supplementary code points exceed the 16-bit range, UTF-16 encodes each supplementary code point as a pair of 16-bit surrogate code units:
Encoding U+1F600 (😀) in UTF-16:
Step 1: Subtract 0x10000: 0x1F600 - 0x10000 = 0xF600
Step 2: High 10 bits: 0x3D → add 0xD800 = 0xD83D (high surrogate)
Step 3: Low 10 bits: 0x200 → add 0xDC00 = 0xDE00 (low surrogate)
Result: 0xD83D 0xDE00
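The three steps above can be sketched in Python as follows. The function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical helpers written for this illustration; in real code the built-in `utf-16` codecs do this work, and the last line uses one of them as a cross-check.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = code_point - 0x10000      # Step 1: 20-bit offset
    high = 0xD800 + (offset >> 10)     # Step 2: top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # Step 3: bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(f"0x{high:04X} 0x{low:04X}")                # 0xD83D 0xDE00
print(hex(from_surrogate_pair(high, low)))        # 0x1f600
print("\U0001F600".encode("utf-16-be").hex())     # d83dde00
```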
UTF-8
UTF-8 uses 4 bytes for supplementary characters:
U+1F600 → 0xF0 0x9F 0x98 0x80
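To show where those four bytes come from, here is a small sketch that packs the 21 bits of a supplementary code point into the 4-byte UTF-8 pattern `11110xxx 10xxxxxx 10xxxxxx 10xxxxxx`. The function name `utf8_encode_supplementary` is illustrative only; normal code would simply call `str.encode("utf-8")`, which is used here to check the result.

```python
def utf8_encode_supplementary(code_point: int) -> bytes:
    """Hand-encode a supplementary code point as 4 UTF-8 bytes."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("expected a code point in U+10000..U+10FFFF")
    return bytes([
        0xF0 | (code_point >> 18),           # 11110xxx: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (code_point & 0x3F),          # 10xxxxxx: low 6 bits
    ])


encoded = utf8_encode_supplementary(0x1F600)
print(encoded.hex(" "))                           # f0 9f 98 80
print(encoded == "\U0001F600".encode("utf-8"))    # True
```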
UTF-32
UTF-32 stores every code point in 4 bytes — no special handling needed for supplementary characters.
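As a quick illustration, the snippet below encodes the same character with Python's built-in `utf-32-be` codec and reads the code point back as a single big-endian 32-bit integer. The explicit byte-order suffix is chosen here so that no byte order mark is prepended; the plain `utf-32` codec would add one.

```python
import struct

encoded = "\U0001F600".encode("utf-32-be")
print(len(encoded))    # 4
print(encoded.hex())   # 0001f600

# The four bytes are just the code point as a 32-bit integer.
(code_point,) = struct.unpack(">I", encoded)
print(hex(code_point))  # 0x1f600

# BMP and supplementary characters take the same space.
print(len("A\U0001F600".encode("utf-32-be")))  # 8
```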
Impact on Programming
# Python 3: str handles all Unicode natively
s = "😀" # U+1F600, Plane 1
len(s) # 1 — correct
# ord() and chr() work for all code points
print(ord("😀")) # 128512
print(chr(128512)) # 😀
print(f"U+{ord('😀'):04X}") # U+1F600 (U+ notation pads to at least 4 hex digits)
// JavaScript must use surrogate-aware methods
const emoji = "😀";
emoji.length; // 2 (UTF-16 surrogate pair!)
emoji.codePointAt(0); // 128512 (correct code point)
String.fromCodePoint(128512); // "😀"
[...emoji].length; // 1 (spread uses code points)
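When Python code exchanges strings with a UTF-16 environment such as JavaScript, it can help to predict the length the other side will report. The sketch below counts UTF-16 code units from Python; `utf16_length` is a hypothetical helper name, and the sample string is written with escapes so the code points are unambiguous.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units, i.e. what JavaScript's .length reports."""
    # Supplementary characters (above U+FFFF) take two code units each.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


text = "a\U0001F600\U00020000"   # 'a', U+1F600 (Plane 1), U+20000 (Plane 2)
print(len(text))                 # 3 code points
print(utf16_length(text))        # 5 UTF-16 code units

# Cross-check with the built-in codec: 2 bytes per code unit.
print(len(text.encode("utf-16-le")) // 2)  # 5
```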
Quick Facts
| Property | Value |
|---|---|
| Planes covered | 1–16 |
| Code point range | U+10000–U+10FFFF |
| Total code points | 1,048,576 |
| UTF-16 encoding | Surrogate pairs (2 code units) |
| UTF-8 encoding | 4 bytes |
| Most important plane | Plane 1 (SMP) — emoji and historic scripts |
| Largest CJK source | Plane 2 (SIP) |
| Completely unassigned planes | 4–13 |