What is the Basic Multilingual Plane (BMP)?

Plane 0 (U+0000–U+FFFF), containing the most common characters: Latin, Greek, Cyrillic, CJK, Arabic, and most symbols. Characters here fit in a single UTF-16 code unit.

A plane is a contiguous block of 65,536 code points. Unicode has 17 planes (0–16): Plane 0 is the BMP, Plane 1 is the SMP (emoji, historic scripts), Plane 2 is the SIP (CJK extensions).

What is a Surrogate Pair?

Two 16-bit code units (a high surrogate U+D800–U+DBFF followed by a low surrogate U+DC00–U+DFFF) that together encode one supplementary character in UTF-16. 😀 = D83D DE00.

Unicode Standard

Supplementary Plane

Planes 1 to 16 (U+10000–U+10FFFF), containing emoji, historic scripts, CJK extensions, and musical notation. Requires surrogate pairs in UTF-16.

2021-06-10 · Updated 2024-05-30

What are Supplementary Planes?

Supplementary planes are the 16 Unicode planes numbered 1 through 16, that is, every plane beyond Plane 0 (the Basic Multilingual Plane). They cover code points U+10000 through U+10FFFF, and they contain characters that did not fit in the original 16-bit BMP design: historic writing systems, rare CJK ideographs, most modern emoji, musical notation, mathematical alphanumeric symbols, and large private use areas.

The term "supplementary character" refers to any character with a code point in these planes. Supplementary characters require special handling in UTF-16 (surrogate pairs) and use 4 bytes in UTF-8.

The Supplementary Planes

Plane	Range	Name	Notable Contents
1	U+10000–U+1FFFF	SMP (Supplementary Multilingual Plane)	Emoji, Linear B, Gothic, Mathematical Alphanumerics
2	U+20000–U+2FFFF	SIP (Supplementary Ideographic Plane)	CJK Unified Ideographs Extensions B–F and I
3	U+30000–U+3FFFF	TIP (Tertiary Ideographic Plane)	CJK Extensions G (added Unicode 13.0) and H (added Unicode 15.0)
4–13	U+40000–U+DFFFF	(Unassigned)	No characters assigned
14	U+E0000–U+EFFFF	SSP (Supplementary Special-purpose Plane)	Tags, Variation Selectors Supplement
15	U+F0000–U+FFFFF	SPUA-A (Supplementary PUA-A)	Private use
16	U+100000–U+10FFFF	SPUA-B (Supplementary PUA-B)	Private use
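
Because every plane spans exactly 0x10000 code points, the plane number is simply the code point shifted right by 16 bits. The following is a minimal Python sketch of that lookup; `PLANE_NAMES`, `plane_of`, `is_supplementary`, and `plane_label` are illustrative names, not part of any standard library.

```python
# Illustrative helpers; these names are hypothetical, not a standard API.
PLANE_NAMES = {
    0: "BMP",
    1: "SMP",
    2: "SIP",
    3: "TIP",
    14: "SSP",
    15: "SPUA-A",
    16: "SPUA-B",
}


def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 0x10000 code points, so the plane is the value above bit 15.
    return code_point >> 16


def is_supplementary(code_point: int) -> bool:
    """True for code points in planes 1-16 (U+10000 and above)."""
    return plane_of(code_point) >= 1


def plane_label(code_point: int) -> str:
    """Short plane name from the table above, or 'unassigned' for planes 4-13."""
    return PLANE_NAMES.get(plane_of(code_point), "unassigned")


print(plane_of(ord("😀")), plane_label(ord("😀")))   # 1 SMP
print(plane_of(ord("A")), plane_label(ord("A")))     # 0 BMP
print(plane_label(0x20000))                          # SIP
```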

Plane 1: Where Emoji Live

The Supplementary Multilingual Plane (SMP) is the most practically important supplementary plane. It contains:

Emoji (U+1F300–U+1FAFF): the main emoji blocks (some older emoji, such as ☺ U+263A, remain in the BMP)
Historic scripts: Linear B (an early script for Mycenaean Greek), Gothic, Ugaritic, Cuneiform, Egyptian Hieroglyphs, Old Persian
Musical notation: U+1D100–U+1D1FF
Mathematical Alphanumeric Symbols: U+1D400–U+1D7FF (𝒜, 𝔹, 𝔻...)
Playing Cards, Domino Tiles, Mahjong Tiles: game symbols
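
As a quick check of the list above, Python's standard `unicodedata` module can report the official names of a few Plane 1 characters. The sample code points are picked here purely for illustration. Only long-established characters are used, since a name lookup raises `ValueError` for characters newer than the Unicode version bundled with a given Python build.

```python
import unicodedata

# Sample code points chosen for illustration: emoji, Linear B, Gothic,
# musical notation, mathematical alphanumerics, playing cards.
samples = [0x1F600, 0x10000, 0x10330, 0x1D11E, 0x1D400, 0x1F0A1]

for cp in samples:
    ch = chr(cp)
    # cp >> 16 is the plane number; every sample should report plane 1.
    print(f"U+{cp:05X}  plane {cp >> 16}  {unicodedata.name(ch)}")
```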

Plane 2: The CJK Overflow

The Supplementary Ideographic Plane (SIP) was created specifically because the roughly 21,000 CJK ideographs in the original BMP block (about 27,500 once Extension A is counted) were insufficient for full CJK coverage. Plane 2 adds Extensions B through F, and together with Extension G in Plane 3 these bring the total CJK ideograph count above 90,000. These characters are needed mainly for rare classical Chinese texts, historical documents, and characters that appear in personal names but are not in common use.

Encoding Supplementary Characters

UTF-16 Surrogate Pairs

Because a UTF-16 code unit is 16 bits wide and supplementary code points do not fit in 16 bits, UTF-16 encodes each of them as a pair of 16-bit surrogate code units:

Encoding U+1F600 (😀) in UTF-16:
  Step 1: Subtract 0x10000: 0x1F600 - 0x10000 = 0xF600 (a 20-bit value)
  Step 2: High 10 bits: 0xF600 >> 10 = 0x3D → 0xD800 + 0x3D = 0xD83D (high surrogate)
  Step 3: Low 10 bits: 0xF600 & 0x3FF = 0x200 → 0xDC00 + 0x200 = 0xDE00 (low surrogate)
  Result: 0xD83D 0xDE00
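
The three steps above can be written as a short Python sketch. The function names `to_surrogate_pair` and `from_surrogate_pair` are illustrative, not a library API; the last line cross-checks the result against Python's built-in `utf-16-be` codec.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need surrogates")
    offset = code_point - 0x10000          # step 1: a 20-bit value
    high = 0xD800 + (offset >> 10)         # step 2: top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # step 3: bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Inverse of to_surrogate_pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")                       # D83D DE00
# Cross-check against Python's built-in encoder (big-endian, no BOM).
print("😀".encode("utf-16-be").hex(" ").upper())     # D8 3D DE 00
```

Each surrogate carries 10 bits, so the 1,024 high surrogates combined with the 1,024 low surrogates give exactly 1,048,576 pairs, one for every code point in the 16 supplementary planes.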

UTF-8

UTF-8 uses 4 bytes for supplementary characters:

U+1F600 → 0xF0 0x9F 0x98 0x80
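
To see where those four bytes come from, the sketch below spreads the code point's bits over the 4-byte UTF-8 pattern `11110xxx 10xxxxxx 10xxxxxx 10xxxxxx`. The function name `utf8_encode_supplementary` is illustrative; in practice `str.encode("utf-8")` does this work, and it is used here only to confirm the result.

```python
def utf8_encode_supplementary(code_point: int) -> bytes:
    """Encode a code point in U+10000..U+10FFFF as 4 UTF-8 bytes."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("expected a supplementary code point")
    return bytes([
        0xF0 | (code_point >> 18),            # 11110xxx: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),    # 10xxxxxx: next 6 bits
        0x80 | (code_point & 0x3F),           # 10xxxxxx: last 6 bits
    ])


encoded = utf8_encode_supplementary(0x1F600)
print(encoded.hex(" ").upper())               # F0 9F 98 80
print(encoded == "😀".encode("utf-8"))         # True
```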

UTF-32

UTF-32 stores every code point in 4 bytes — no special handling needed for supplementary characters.
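
A small Python sketch makes the point concrete: the UTF-32 form is just the code point packed as a 32-bit integer (big-endian is assumed here for readability). The loop then compares the encoded size of the same character in all three encoding forms.

```python
import struct

cp = 0x1F600

# UTF-32 is simply the code point as a 32-bit integer (big-endian here).
manual = struct.pack(">I", cp)
print(manual.hex(" ").upper())                  # 00 01 F6 00
print(manual == chr(cp).encode("utf-32-be"))    # True

# Size in bytes of the same supplementary character in each encoding form.
for codec in ("utf-8", "utf-16-be", "utf-32-be"):
    print(codec, len(chr(cp).encode(codec)))    # 4 bytes in every case
```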

Impact on Programming

# Python 3: str is a sequence of code points, so supplementary characters need no special handling
s = "😀"  # U+1F600, Plane 1
len(s)     # 1 (one code point)

# ord() and chr() work for all code points
print(ord("😀"))         # 128512
print(chr(128512))       # 😀
print(f"U+{ord('😀'):04X}")  # U+1F600

// JavaScript must use surrogate-aware methods
const emoji = "😀";
emoji.length;              // 2 (UTF-16 surrogate pair!)
emoji.codePointAt(0);     // 128512 (correct code point)
String.fromCodePoint(128512); // "😀"
[...emoji].length;         // 1 (spread uses code points)
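
When Python code has to exchange string offsets with a UTF-16-based environment such as JavaScript, it can help to compute the UTF-16 length explicitly. The sketch below is one way to do that; `utf16_length` and `count_supplementary` are hypothetical helper names, and the encoding step would raise `UnicodeEncodeError` for a string that contains a lone surrogate.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units, i.e. what JavaScript's .length reports."""
    return len(text.encode("utf-16-le")) // 2


def count_supplementary(text: str) -> int:
    """Number of characters above U+FFFF in text."""
    return sum(1 for ch in text if ord(ch) > 0xFFFF)


s = "Hi 😀!"
print(len(s))                   # 5 code points
print(utf16_length(s))          # 6 code units: the emoji takes two
print(count_supplementary(s))   # 1
```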

Quick Facts

Property	Value
Planes covered	1–16
Code point range	U+10000–U+10FFFF
Total code points	1,048,576
UTF-16 encoding	Surrogate pairs (2 code units)
UTF-8 encoding	4 bytes
Most important plane	Plane 1 (SMP) — emoji and historic scripts
Largest CJK source	Plane 2 (SIP)
Completely unassigned planes	4–13

Related terms

Basic Multilingual Plane (BMP), Plane, Surrogate Pair
