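Unicode Standard

Supplementary Planes / Astral Planes

Planes 1 through 16 (U+10000 to U+10FFFF), which contain emoji, historic scripts, CJK extensions, musical notation, and more. UTF-16 needs surrogate pairs to encode them.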

What are Supplementary Planes?

Supplementary planes are the 16 Unicode planes numbered 1 through 16, that is, every plane beyond Plane 0 (the Basic Multilingual Plane, or BMP). They cover code points U+10000 through U+10FFFF and contain characters that did not fit in the original 16-bit BMP design: historic writing systems, rare CJK ideographs, most modern emoji, musical notation, mathematical alphanumeric symbols, and large private use areas.

The term "supplementary character" refers to any character with a code point in these planes. Supplementary characters require special handling in UTF-16 (surrogate pairs) and use 4 bytes in UTF-8.
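As a rough illustration of the definition above, the plane of a code point is its value divided by 0x10000, so a supplementary character is any character whose plane is 1 or higher. The helper names `plane_of` and `is_supplementary` are invented for this sketch and are not part of any standard library.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0-16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16  # each plane holds 0x10000 code points


def is_supplementary(char: str) -> bool:
    """True if the single character lies outside the BMP (planes 1-16)."""
    return plane_of(ord(char)) >= 1


print(plane_of(0x1F600))               # 1
print(is_supplementary("A"))           # False
print(is_supplementary("\U0001F600"))  # True
```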

The Supplementary Planes

Plane | Range | Name | Notable Contents
1 | U+10000–U+1FFFF | SMP (Supplementary Multilingual Plane) | Emoji, Linear B, Gothic, Mathematical Alphanumerics
2 | U+20000–U+2FFFF | SIP (Supplementary Ideographic Plane) | CJK Unified Ideographs Extensions B–F
3 | U+30000–U+3FFFF | TIP (Tertiary Ideographic Plane) | CJK Extension G (added in Unicode 13.0)
4–13 | U+40000–U+DFFFF | (Unassigned) | No characters assigned
14 | U+E0000–U+EFFFF | SSP (Supplementary Special-purpose Plane) | Tags, Variation Selectors Supplement
15 | U+F0000–U+FFFFF | SPUA-A (Supplementary Private Use Area-A) | Private use
16 | U+100000–U+10FFFF | SPUA-B (Supplementary Private Use Area-B) | Private use

Plane 1: Where Emoji Live

The Supplementary Multilingual Plane (SMP) is the most practically important supplementary plane. It contains:

  • Emoji (U+1F300–U+1FAFF): The main emoji blocks (a number of older emoji remain in the BMP)
  • Historic scripts: Linear B (the earliest attested script for Greek), Gothic, Ugaritic, Cuneiform, Egyptian Hieroglyphs, Old Persian
  • Musical notation: U+1D100–U+1D1FF
  • Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF (𝒜, 𝔹, 𝔻...)
  • Playing Cards, Domino Tiles, Mahjong Tiles: Miscellaneous game symbols
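The ranges listed above can be spot-checked with Python's standard `unicodedata` module. This is only a sketch: the sample code points are picked here for illustration, and the names returned depend on the Unicode version bundled with the Python build in use.

```python
import unicodedata

# One sample code point from several of the ranges listed above.
samples = {
    "Emoji": 0x1F600,
    "Linear B": 0x10000,
    "Musical notation": 0x1D11E,
    "Mathematical Alphanumeric Symbols": 0x1D400,
    "Playing Cards": 0x1F0A1,
}

for label, cp in samples.items():
    # Fall back to a placeholder if this Python build does not know the name.
    name = unicodedata.name(chr(cp), "<no name in this Python build>")
    print(f"{label}: U+{cp:05X}, plane {cp >> 16}, {name}")
```

Every sample reports plane 1, which is the SMP.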

Plane 2: The CJK Overflow

The Supplementary Ideographic Plane (SIP) was created because the roughly 27,500 CJK unified ideographs in the BMP (the main block plus Extension A) were insufficient for full CJK coverage. Plane 2 adds Extensions B through F, bringing the total CJK ideograph count to roughly 88,000; Extension G in Plane 3 pushes it past 90,000. These characters are needed mainly for rare classical Chinese texts, historical documents, and personal-name kanji that are not in common use.
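To make this concrete, the sketch below inspects one Plane 2 ideograph. U+20BB7 (from CJK Extension B) is chosen here purely as an example of a character that appears in personal names; it is not mentioned in the text above.

```python
import unicodedata

ch = "\U00020BB7"  # a CJK Extension B ideograph, chosen as an example

print(f"U+{ord(ch):05X}")                # U+20BB7
print(ord(ch) >> 16)                     # 2 (Plane 2, the SIP)
print(unicodedata.name(ch))              # CJK UNIFIED IDEOGRAPH-20BB7
print(len(ch.encode("utf-16-le")) // 2)  # 2 UTF-16 code units
print(len(ch.encode("utf-8")))           # 4 UTF-8 bytes
```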

Encoding Supplementary Characters

UTF-16 Surrogate Pairs

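Because UTF-16 code units are 16 bits wide and supplementary code points exceed the 16-bit range, UTF-16 encodes each of them as a pair of 16-bit surrogate code units: a high surrogate (0xD800–0xDBFF) followed by a low surrogate (0xDC00–0xDFFF).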

Encoding U+1F600 (😀) in UTF-16:
  Step 1: Subtract 0x10000:  0x1F600 - 0x10000 = 0xF600
  Step 2: High 10 bits: 0x3D → add 0xD800 = 0xD83D (high surrogate)
  Step 3: Low 10 bits: 0x200 → add 0xDC00 = 0xDE00 (low surrogate)
  Result: 0xD83D 0xDE00
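The three steps above can be sketched in Python as follows. The function names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical helpers written for this example; in practice `str.encode("utf-16-be")` does the same work.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000    # step 1: a 20-bit value
    high = 0xD800 + (offset >> 10)   # step 2: top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # step 3: bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(f"0x{high:04X} 0x{low:04X}")                  # 0xD83D 0xDE00
print(f"U+{from_surrogate_pair(high, low):04X}")    # U+1F600
```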

UTF-8

UTF-8 uses 4 bytes for supplementary characters:

U+1F600 → 0xF0 0x9F 0x98 0x80
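The four bytes come from the UTF-8 bit layout 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, which spreads the 21 bits of the code point over one lead byte and three continuation bytes. The helper below is a minimal sketch of that layout for supplementary code points only; its name is invented here, and real code would simply call `str.encode("utf-8")`.

```python
def utf8_encode_supplementary(code_point: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as 4 UTF-8 bytes."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("this sketch only handles U+10000..U+10FFFF")
    return bytes([
        0xF0 | (code_point >> 18),           # 11110xxx: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (code_point & 0x3F),          # 10xxxxxx: last 6 bits
    ])


encoded = utf8_encode_supplementary(0x1F600)
print(encoded.hex(" "))                          # f0 9f 98 80
print(encoded == "\U0001F600".encode("utf-8"))   # True
```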

UTF-32

UTF-32 stores every code point in 4 bytes — no special handling needed for supplementary characters.
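A quick way to compare the three encoding forms is to encode the same character with each built-in codec. The big-endian codec variants are used in this sketch only so that no byte order mark is prepended to the output.

```python
emoji = "\U0001F600"

for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    data = emoji.encode(codec)
    print(f"{codec:10} {len(data)} bytes  {data.hex(' ')}")

# UTF-32 is fixed width: a BMP character also takes 4 bytes.
print(len("A".encode("utf-32-be")))  # 4
```

All three encodings need 4 bytes for this character, but only UTF-32 uses the same width for every code point.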

Impact on Programming

Python:

# Python 3: str handles all Unicode natively
s = "😀"  # U+1F600, Plane 1
len(s)     # 1 (len counts code points)

# ord() and chr() work for all code points
print(ord("😀"))         # 128512
print(chr(128512))       # 😀
print(f"U+{ord('😀'):06X}")  # U+01F600

JavaScript:

// JavaScript strings are UTF-16, so use code-point-aware methods
const emoji = "😀";
emoji.length;              // 2 (UTF-16 surrogate pair!)
emoji.codePointAt(0);     // 128512 (correct code point)
String.fromCodePoint(128512); // "😀"
[...emoji].length;         // 1 (spread iterates by code point)
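When Python code has to agree with a UTF-16 based system such as JavaScript on string lengths or offsets, it can count code units explicitly. The helpers `utf16_length` and `supplementary_chars` below are hypothetical names used for this sketch.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the quantity JavaScript's .length reports."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)


def supplementary_chars(text: str) -> list:
    """Return the characters of text that lie outside the BMP."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


text = "Hi \U0001F600"
print(len(text))                  # 4 code points
print(utf16_length(text))         # 5 UTF-16 code units
print(supplementary_chars(text))  # the single emoji
```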

Quick Facts

Property | Value
Planes covered | 1–16
Code point range | U+10000–U+10FFFF
Total code points | 1,048,576
UTF-16 encoding | Surrogate pairs (2 code units)
UTF-8 encoding | 4 bytes
Most important plane | Plane 1 (SMP): emoji and historic scripts
Largest CJK source | Plane 2 (SIP)
Completely unassigned planes | 4–13
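The counts in the table follow directly from the range boundaries, as this small arithmetic sketch shows. The variable names are chosen here for illustration.

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane
first, last = 0x10000, 0x10FFFF

total = last - first + 1
print(f"{total:,}")         # 1,048,576
print(total // PLANE_SIZE)  # 16 planes

# Planes 4 through 13 span U+40000..U+DFFFF.
print((0xDFFFF - 0x40000 + 1) // PLANE_SIZE)  # 10 planes
```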

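Related Terms

More terms in Unicode Standard

CJK (Han ideographs, kana, hangul)

Chinese, Japanese, and Korean: a collective term for the unified Han ideograph blocks and related scripts in Unicode. CJK Unified Ideographs comprise more than 20,992 characters.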

Han Unification

The process of mapping Chinese, Japanese, and Korean ideographs that share a …

Hangul Jamo

The individual consonant and vowel components (jamo) of the Korean Hangul writing …

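ISO 10646 / Universal Coded Character Set

The international standard (ISO/IEC 10646) that is kept synchronized with Unicode. It defines the same character repertoire and code points but does not include Unicode's additional algorithms and properties.

Unicode

A universal character encoding standard that assigns a unique number (code point) to every character of every writing system. Version 16.0 contains 154,998 assigned characters.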

Unicode Standard Annex (UAX)

Normative or informative documents that are integral parts of the Unicode Standard. …

Unicode Technical Report (UTR)

Informational documents published by the Unicode Consortium covering specific topics like security …

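Unicode Consortium

The non-profit organization that develops and maintains the Unicode Standard. Its members include many companies, among them Apple, Google, Microsoft, and Meta.

Unicode Scalar Value

Any code point except the surrogate code points (U+D800 to U+DFFF). This is the set of valid values that can represent actual characters, 1,112,064 in total.

Unicode Version

A major release of the Unicode Standard that adds new characters, scripts, and features. Unicode 16.0 was released in September 2024.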