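Unicode Standard

Basic Multilingual Plane (BMP)

Plane 0 (U+0000–U+FFFF), which contains the most commonly used characters, including Latin, Greek, Cyrillic, CJK, Arabic, and most symbols. Every character in this plane fits in a single UTF-16 code unit.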


What is the Basic Multilingual Plane?

The Basic Multilingual Plane (BMP) is Plane 0 of the Unicode code space, covering code points U+0000 through U+FFFF, a range of exactly 65,536 positions. It was designed to hold all the characters needed for modern text in the world's actively used scripts, and it largely succeeded: Latin, Greek, Cyrillic, Arabic, Hebrew, Devanagari, the core CJK ideographs, and dozens of other scripts all fit within the BMP.

The BMP's boundaries matter beyond just organization. Because every BMP code point fits in a single 16-bit value (0x0000–0xFFFF), it can be stored as a single code unit in UTF-16. The BMP is also exactly the range that UCS-2, the fixed-width predecessor of UTF-16, was able to represent.
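The 16-bit boundary can be checked with simple arithmetic. The following is a minimal sketch, not part of any standard API: `planeOf` and `isBmp` are hypothetical helper names, and the upper limit U+10FFFF is the end of the 17-plane code space.

```javascript
// Each plane spans 0x10000 (65,536) positions, so the plane number
// is the code point shifted right by 16 bits.
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError("not a Unicode code point: " + codePoint);
  }
  return codePoint >> 16;
}

// A code point is in the BMP exactly when it lies in Plane 0.
function isBmp(codePoint) {
  return planeOf(codePoint) === 0;
}

console.log(isBmp(0x0041));  // true  (LATIN CAPITAL LETTER A)
console.log(isBmp(0xFFFF));  // true  (last BMP position)
console.log(isBmp(0x1F389)); // false (🎉 is in Plane 1)
```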

What Lives in the BMP

The BMP is organized into blocks — contiguous ranges assigned to specific scripts or purposes. Notable regions include:

| Range | Contents |
| --- | --- |
| U+0000–U+007F | Basic Latin (ASCII) |
| U+0080–U+00FF | Latin-1 Supplement |
| U+0370–U+03FF | Greek and Coptic |
| U+0400–U+04FF | Cyrillic |
| U+0600–U+06FF | Arabic |
| U+0900–U+097F | Devanagari |
| U+3040–U+309F | Hiragana |
| U+30A0–U+30FF | Katakana |
| U+4E00–U+9FFF | CJK Unified Ideographs (core) |
| U+AC00–U+D7AF | Hangul Syllables (11,172 precomposed) |
| U+D800–U+DFFF | Surrogate range (not real characters) |
| U+E000–U+F8FF | Private Use Area |
| U+FFF0–U+FFFF | Specials (including U+FFFD replacement character) |
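Because blocks are contiguous ranges, finding the block of a code point is a range lookup. The sketch below uses only the ranges listed above, so it is deliberately incomplete; `BMP_BLOCKS` and `blockOf` are hypothetical names chosen for illustration.

```javascript
// Partial block table copied from the list above.
// This is NOT the full Unicode block list.
const BMP_BLOCKS = [
  [0x0000, 0x007F, "Basic Latin (ASCII)"],
  [0x0080, 0x00FF, "Latin-1 Supplement"],
  [0x0370, 0x03FF, "Greek and Coptic"],
  [0x0400, 0x04FF, "Cyrillic"],
  [0x0600, 0x06FF, "Arabic"],
  [0x0900, 0x097F, "Devanagari"],
  [0x3040, 0x309F, "Hiragana"],
  [0x30A0, 0x30FF, "Katakana"],
  [0x4E00, 0x9FFF, "CJK Unified Ideographs"],
  [0xAC00, 0xD7AF, "Hangul Syllables"],
  [0xD800, 0xDFFF, "Surrogates"],
  [0xE000, 0xF8FF, "Private Use Area"],
  [0xFFF0, 0xFFFF, "Specials"],
];

// Returns the block name, or null when the code point is not
// covered by this partial table.
function blockOf(codePoint) {
  for (const [start, end, name] of BMP_BLOCKS) {
    if (codePoint >= start && codePoint <= end) return name;
  }
  return null;
}

console.log(blockOf("あ".codePointAt(0))); // "Hiragana" (U+3042)
console.log(blockOf("Ж".codePointAt(0)));  // "Cyrillic" (U+0416)
console.log(blockOf(0xFFFD));              // "Specials"
```

A linear scan is enough for a table this small; a full block list would more likely use binary search over the sorted start values.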

The Surrogate Hole

One important quirk: the range U+D800–U+DFFF (2,048 code points) is permanently reserved for surrogates, the mechanism UTF-16 uses to encode characters above U+FFFF. These code points will never be assigned to characters. For this reason, the set of characters that fit in a single UTF-16 code unit is sometimes described as the "BMP minus surrogates."
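A short sketch of how code might test for the surrogate range. The helper names are hypothetical. The split into high surrogates (U+D800–U+DBFF) and low surrogates (U+DC00–U+DFFF) is not stated above but is how UTF-16 divides the range.

```javascript
// The surrogate range is split into two halves of 1,024 code points each.
const isHighSurrogate = (cp) => cp >= 0xD800 && cp <= 0xDBFF;
const isLowSurrogate = (cp) => cp >= 0xDC00 && cp <= 0xDFFF;
const isSurrogate = (cp) => isHighSurrogate(cp) || isLowSurrogate(cp);

// A Unicode scalar value is any code point that is not a surrogate.
const isScalarValue = (cp) =>
  Number.isInteger(cp) && cp >= 0 && cp <= 0x10FFFF && !isSurrogate(cp);

// Size of the hole: 0xDFFF - 0xD800 + 1 = 2,048 code points.
console.log(0xDFFF - 0xD800 + 1);   // 2048
console.log(isSurrogate(0xD83C));   // true
console.log(isScalarValue(0xD83C)); // false
console.log(isScalarValue(0xFFFD)); // true
```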

BMP vs Supplementary Characters

Any character with a code point above U+FFFF is a supplementary character and requires special handling in encodings optimized for the BMP:

| Encoding | BMP character | Supplementary character |
| --- | --- | --- |
| UTF-8 | 1–3 bytes | 4 bytes |
| UTF-16 | 1 code unit (2 bytes) | 2 code units (4 bytes, surrogate pair) |
| UTF-32 | 1 code unit (4 bytes) | 1 code unit (4 bytes, no difference) |
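The sizes in this comparison follow directly from the code point value. The sketch below computes them; `encodedSizes` is a hypothetical helper, and the UTF-8 thresholds (U+007F, U+07FF, U+FFFF) are the standard boundaries between 1-, 2-, 3-, and 4-byte sequences.

```javascript
// Returns the encoded size in bytes of one code point in each encoding.
function encodedSizes(codePoint) {
  const inRange =
    Number.isInteger(codePoint) && codePoint >= 0 && codePoint <= 0x10FFFF;
  const isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
  if (!inRange || isSurrogate) {
    throw new RangeError("not a Unicode scalar value: " + codePoint);
  }

  let utf8;
  if (codePoint <= 0x7F) utf8 = 1;
  else if (codePoint <= 0x7FF) utf8 = 2;
  else if (codePoint <= 0xFFFF) utf8 = 3; // rest of the BMP
  else utf8 = 4;                          // supplementary planes

  const utf16 = codePoint <= 0xFFFF ? 2 : 4; // 1 or 2 code units
  return { utf8, utf16, utf32: 4 };
}

console.log(encodedSizes(0x0041));  // { utf8: 1, utf16: 2, utf32: 4 }
console.log(encodedSizes(0x3042));  // { utf8: 3, utf16: 2, utf32: 4 }
console.log(encodedSizes(0x1F389)); // { utf8: 4, utf16: 4, utf32: 4 }
```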

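In UTF-16, a supplementary character requires a surrogate pair: two 16-bit code units, a high surrogate followed by a low surrogate, that together encode one code point. Most emoji live in Plane 1 (U+10000–U+1FFFF), mainly between U+1F300 and U+1FAFF, and are therefore supplementary.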
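The surrogate pair arithmetic is short enough to write out. This is a sketch of the standard UTF-16 calculation with hypothetical function names: subtract 0x10000 to get a 20-bit offset, then place the top 10 bits in the high surrogate and the bottom 10 bits in the low surrogate.

```javascript
// Encode a supplementary code point as [high, low] surrogate code units.
function toSurrogatePair(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("not a supplementary code point: " + codePoint);
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xD800 + (offset >> 10);  // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

// Decode a surrogate pair back into a code point.
function fromSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

const [high, low] = toSurrogatePair(0x1F389); // 🎉
console.log(high.toString(16), low.toString(16));   // d83c df89
console.log(fromSurrogatePair(high, low) === 0x1F389); // true
```

JavaScript's built-in `String.fromCodePoint` and `codePointAt` perform the same conversion, so hand-written versions like these are mainly useful for understanding or for working with raw code unit buffers.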

Historical Significance

Early Unicode architects hoped that 65,536 code points would be enough for all world languages forever. They were wrong. By Unicode 2.0, it was clear that CJK ideographs alone would eventually overflow the BMP, and the standard was extended to 17 planes. This is why legacy systems built on UCS-2 (a fixed-width 16-bit encoding) fall short: they can represent only BMP characters, so supplementary characters cannot be stored or processed correctly.

Common Pitfalls

UCS-2 vs UTF-16: UCS-2 encodes only the BMP using fixed 2-byte units. UTF-16 extends UCS-2 with surrogate pairs for supplementary characters. Many old systems claiming "Unicode support" actually only support UCS-2 (BMP-only).
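One way to guard a BMP-only system is to check a string for surrogate code units before handing it over. This is a minimal sketch under that assumption; `isBmpOnly` is a hypothetical helper, not a built-in.

```javascript
// Returns true when every UTF-16 code unit in the string is a
// non-surrogate BMP value, meaning a UCS-2 system can represent it.
function isBmpOnly(str) {
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    // Any surrogate unit signals a supplementary character
    // (or a lone surrogate, which UCS-2 cannot interpret either).
    if (unit >= 0xD800 && unit <= 0xDFFF) return false;
  }
  return true;
}

console.log(isBmpOnly("héllo 日本語")); // true
console.log(isBmpOnly("party 🎉"));     // false
```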

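Emoji in JavaScript: JavaScript strings are sequences of UTF-16 code units, so a Plane 1 emoji such as 🎉 has a .length of 2, not 1. Iterating with spread or Array.from() walks the string by code point instead, which gives 1 for a single-code-point emoji. Emoji built from several code points (flags, skin-tone variants, ZWJ sequences) still count as more than one.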

"🎉".length      // 2 (two UTF-16 code units)
[..."🎉"].length // 1 (one Unicode code point)
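To see what each count is measuring, it can help to list the code points themselves. The sketch below uses a hypothetical `describe` helper built on the standard `Array.from` and `codePointAt`. The flag example is written with escapes for U+1F1EF and U+1F1F5, the two regional indicator symbols that make up the flag of Japan.

```javascript
// Reports the UTF-16 code unit count and the list of code points.
function describe(str) {
  const codePoints = Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
  return { codeUnits: str.length, codePoints };
}

console.log(describe("A"));
// { codeUnits: 1, codePoints: [ 'U+0041' ] }

console.log(describe("\u{1F389}")); // 🎉
// { codeUnits: 2, codePoints: [ 'U+1F389' ] }

console.log(describe("\u{1F1EF}\u{1F1F5}")); // flag of Japan
// { codeUnits: 4, codePoints: [ 'U+1F1EF', 'U+1F1F5' ] }
```

The flag shows that counting code points is still not the same as counting what a reader sees as one character; that requires grapheme cluster segmentation, which is a separate step.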

Quick Facts

| Property | Value |
| --- | --- |
| Code point range | U+0000–U+FFFF |
| Total positions | 65,536 |
| Plane number | 0 |
| Also known as | Plane 0, BMP |
| UTF-16 code units needed | 1 (for all non-surrogate BMP chars) |
| Surrogate range (excluded) | U+D800–U+DFFF (2,048 points) |
| Characters assigned (approx.) | ~55,000 |
| Predecessor encoding | UCS-2 (BMP-only, no surrogates) |

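Related Terms

More terms in Unicode Standard

CJK (Han, Kana, Hangul)

Chinese, Japanese, and Korean: a collective term for the unified ideograph blocks and related scripts in Unicode. CJK Unified Ideographs comprise more than 20,992 characters.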

Han Unification

The process of mapping Chinese, Japanese, and Korean ideographs that share a …

Hangul Jamo

The individual consonant and vowel components (jamo) of the Korean Hangul writing …

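ISO 10646 / Universal Character Set

The international standard (ISO/IEC 10646) that is kept in sync with Unicode. It defines the same character repertoire and code points but does not include Unicode's additional algorithms and properties.

Unicode

A universal character encoding standard that assigns a unique number (code point) to every character of every writing system. Version 16.0 contains 154,998 assigned characters.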

Unicode Standard Annex (UAX)

Normative or informative documents that are integral parts of the Unicode Standard. …

Unicode Technical Report (UTR)

Informational documents published by the Unicode Consortium covering specific topics like security …

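Unicode Consortium

The non-profit organization that develops and maintains the Unicode Standard. Its members include many companies, such as Apple, Google, Microsoft, and Meta.

Unicode Scalar Value

Any code point except the surrogate code points (U+D800–U+DFFF). This is the set of valid values that can represent actual characters, 1,112,064 in total.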

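Unicode Version

A major release of the Unicode Standard that adds new characters, scripts, and features. Unicode 16.0 was released in September 2024, and Unicode 17.0 followed in September 2025.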