What is an Assigned Character?

A code point that has been given a character designation in a version of Unicode. As of Unicode 16.0, 154,998 of the 1,114,112 code points are assigned.

What is a Code Point?

A numeric value in the Unicode code space (U+0000–U+10FFFF), written as U+XXXX. Not every code point is assigned to a character.

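What is the General Category?

A scheme that classifies every code point into exactly one of 30 categories (such as Lu, Ll, Nd, and So), grouped into 7 major classes (Letter, Mark, Number, Punctuation, Symbol, Separator, Other).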

Unicode Standard

Unassigned Code Point

A code point to which no character has been assigned in any Unicode version, classified as Cn (Unassigned). It may be assigned in a future version.

2021-10-18 · Updated 2024-03-06

What is an Unassigned Code Point?

An unassigned code point is a position in the Unicode code space (U+0000–U+10FFFF) that has not yet been given a character assignment in the current version of the Unicode Standard. The code point exists in the address space but has no official character, name, or properties beyond its default values.

Unassigned code points constitute the majority of the Unicode code space — approximately 819,000 of the 1,114,112 total positions (about 73.5%) in Unicode 16.0. This large reserve ensures that Unicode can accommodate new scripts, symbols, and characters discovered or invented in the future.
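
The proportion can be checked against whatever Unicode data ships with a given Python build. The sketch below assumes the standard `unicodedata` module; its figures follow `unicodedata.unidata_version`, which may be older or newer than 16.0, so the exact count can differ somewhat from the one quoted above. It counts every `Cn` code point, which also includes the 66 noncharacters.

```python
import unicodedata

TOTAL = 0x10FFFF + 1  # 1,114,112 code points in the code space

# Count every code point whose General Category is Cn
cn_count = sum(
    1 for cp in range(TOTAL) if unicodedata.category(chr(cp)) == "Cn"
)

print(f"Unicode data version: {unicodedata.unidata_version}")
print(f"Cn code points: {cn_count:,} of {TOTAL:,} ({cn_count / TOTAL:.1%})")
```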

General Category: Cn

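Unassigned code points receive the General Category value Cn ("Other, not assigned"); the 66 noncharacters share this value. Cn is easy to confuse with two neighbouring categories in the Other class, so all three are listed here: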

Co (Private Use): Code points in the PUA, permanently set aside for user-defined characters
Cs (Surrogate): The U+D800–U+DFFF range, permanently reserved for UTF-16 mechanics
Cn (Unassigned): Code points not yet used for any purpose

import unicodedata

# Cn = "not assigned" (default category for unassigned)
print(unicodedata.category("\u0378"))   # Cn — reserved/unassigned in Greek block
print(unicodedata.category("\uE001"))   # Co — Private Use
print(unicodedata.category("\uD800"))   # Cs — Surrogate
print(unicodedata.category("A"))        # Lu — assigned (Uppercase Letter)

Default Property Values

For unassigned code points, the Unicode Standard defines default property values:

Property	Default for Cn
General Category	Cn (Not Assigned)
Canonical Combining Class	0
Bidi Class	depends on code point range
Decomposition	none
Case	no case
Name	(none — raises ValueError in Python)

import unicodedata

cp = "\u0378"
try:
    print(unicodedata.name(cp))
except ValueError:
    print("No name — unassigned code point")  # prints this
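
Several of the other defaults in the table can be observed the same way. This is a minimal sketch that assumes U+0378 is still unassigned in the Unicode data bundled with the running Python; the Bidi Class row is left out because it depends on the code point range.

```python
import unicodedata

cp = "\u0378"  # assumed unassigned; a later Unicode version could change this

print(unicodedata.category(cp))              # General Category
print(unicodedata.combining(cp))             # Canonical Combining Class
print(repr(unicodedata.decomposition(cp)))   # empty string means no decomposition
print(cp.upper() == cp == cp.lower())        # True when there are no case mappings
print(unicodedata.name(cp, "<no name>"))     # a default avoids the ValueError
```

Passing a default as the second argument to `unicodedata.name` is usually simpler than wrapping the call in `try`/`except` when a missing name is expected.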

Unassigned vs Reserved vs Noncharacter

These terms are often conflated:

Term	Meaning	Future assignment?
Unassigned	No character yet; Cn category	Yes — may be assigned
Reserved	Deliberately withheld for future use	Yes — intended for future use
Noncharacter	66 specific code points, permanent	No — never to be assigned
Private Use	PUA ranges for user-defined characters	No — permanently private

In practice, "unassigned" and "reserved" are often used interchangeably: every code point that is not assigned, private use, surrogate, or noncharacter is effectively reserved for future use.
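
Because noncharacters share the Cn category with ordinary unassigned code points, they have to be told apart by position. A minimal sketch, where `is_noncharacter` is a helper name chosen for this example rather than a standard library function:

```python
def is_noncharacter(cp: int) -> bool:
    """Return True for the 66 permanently reserved noncharacters."""
    in_fdd0_range = 0xFDD0 <= cp <= 0xFDEF                  # 32 code points
    last_two_of_plane = (cp & 0xFFFF) in (0xFFFE, 0xFFFF)   # 2 per plane, 17 planes
    return in_fdd0_range or last_two_of_plane

noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))  # 32 + 2 * 17 = 66
```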

Handling Unassigned Code Points

Software should handle unassigned code points gracefully. The Unicode Standard recommends:

Accept unassigned code points in input without raising errors (they may be assigned in future versions)
Pass through unassigned code points unchanged in text processing
Do not map unassigned code points to replacement characters except in specific conformance scenarios
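
As an illustration of the pass-through guidance above, the sketch below shows a cleanup step that filters on a specific category instead of rejecting everything it does not recognise. `strip_control_chars` is a hypothetical helper written for this example; the choice of which controls to keep is an assumption, not something the Unicode Standard prescribes.

```python
import unicodedata

def strip_control_chars(text: str) -> str:
    """Drop Cc control characters except tab, LF and CR; keep everything else."""
    keep = {"\t", "\n", "\r"}
    return "".join(
        ch for ch in text
        if ch in keep or unicodedata.category(ch) != "Cc"
    )

sample = "a\u0378b\x00c"  # U+0378 is assumed unassigned; U+0000 is a control
cleaned = strip_control_chars(sample)
print(cleaned == "a\u0378bc")  # the unassigned code point survives

# Encoding round-trips do not depend on assignment status
print(cleaned.encode("utf-8").decode("utf-8") == cleaned)
```

Filtering on what is known to be unwanted (here, Cc) rather than on what is known to be assigned keeps the function usable with text written for a newer Unicode version.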

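# Robust code point classifier
import unicodedata

def describe_code_point(cp: int) -> str:
    """Classify an integer code point by its Unicode status."""
    if not 0 <= cp <= 0x10FFFF:
        return "out of Unicode range"
    char = chr(cp)
    cat = unicodedata.category(char)
    if cat == "Cs":
        # Surrogates have their own category, so they never show up as Cn
        return "surrogate"
    if cat == "Cn":
        # Noncharacters share Cn: U+FDD0..U+FDEF and the last two code points of each plane
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFF) in (0xFFFE, 0xFFFF):
            return "noncharacter"
        return "unassigned"
    if cat == "Co":
        return "private use"
    # Control characters are assigned but have no name, hence the default
    return f"assigned ({unicodedata.name(char, 'unnamed')})"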

Distribution of Unassigned Code Points

Unassigned code points are not evenly distributed. Major unassigned regions include:

Planes 4–13 (U+40000–U+DFFFF): Entirely unassigned, 655,360 code points in total (this figure includes the 20 noncharacters at the ends of these planes)
Within Plane 2 and 3: Gaps between CJK extension ranges
Within the BMP: Scattered positions within named blocks
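
The uneven distribution can be seen by counting Cn code points per plane. This is a sketch using Python's `unicodedata`, so the counts reflect the Unicode version bundled with the interpreter (`unicodedata.unidata_version`) and include noncharacters, which also carry the Cn category.

```python
import unicodedata
from collections import Counter

cn_per_plane = Counter()
for cp in range(0x110000):
    if unicodedata.category(chr(cp)) == "Cn":
        cn_per_plane[cp >> 16] += 1  # plane number is the code point divided by 0x10000

for plane in range(17):
    print(f"Plane {plane:2d}: {cn_per_plane[plane]:6,} Cn code points")
```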

Quick Facts

Property	Value
General category	Cn (Not Assigned)
Approximate count (v16.0)	~819,000
Percentage of code space	~73.5%
Can become assigned?	Yes — in future Unicode versions
Default bidi class	AL, R, or L depending on range
Entirely unassigned planes	4–13
Should software reject them?	No — pass through gracefully

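Other Terms in Unicode Standard

CJK (Han, Kana, Hangul)

Chinese, Japanese, and Korean: a collective term for the unified ideograph blocks and related scripts in Unicode. The CJK Unified Ideographs include at least 20,992 characters.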

Han Unification

The process of mapping Chinese, Japanese, and Korean ideographs that share a …

Hangul Jamo

The individual consonant and vowel components (jamo) of the Korean Hangul writing …

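ISO 10646 / Universal Character Set

An international standard (ISO/IEC 10646) kept in sync with Unicode. It defines the same character repertoire and code points but does not include Unicode's additional algorithms and properties.

Unicode

A universal character encoding standard that assigns a unique number (code point) to every character in every writing system. Version 16.0 contains 154,998 assigned characters.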

Unicode Standard Annex (UAX)

Normative or informative documents that are integral parts of the Unicode Standard. …

Unicode Technical Report (UTR)

Informational documents published by the Unicode Consortium covering specific topics like security …

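Unicode Consortium

The non-profit organization that develops and maintains the Unicode Standard. Its members include many companies, such as Apple, Google, Microsoft, and Meta.

Unicode Scalar Value

Any code point except the surrogate code points (U+D800–U+DFFF). This is the set of valid values that can represent actual characters, 1,112,064 in total.

Unicode Version

A major release of the Unicode Standard that adds new characters, scripts, and features. The version referenced throughout this glossary is Unicode 16.0 (released September 2024).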
