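Reserved Code Point
A code point set aside for future standardization, as distinct from noncharacters (permanently reserved) and the Private Use Areas (user-assignable).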
What is a Reserved Code Point?
A reserved code point is a position in the Unicode code space that has not yet been assigned to any character and is not permanently designated for a specific purpose (like noncharacters or private use). The Unicode Consortium holds these positions in reserve for potential future character assignments. As new scripts, symbols, and characters are added in future Unicode versions, they are taken from the pool of reserved code points.
Reserved code points should be distinguished from three neighboring terms:
- Unassigned code points: Often used interchangeably with "reserved." Strictly, "unassigned" describes any code point without a character assignment (the Cn general category also covers noncharacters), while "reserved" refers to code points held for future assignment
- Noncharacters: 66 code points permanently reserved and never to be assigned characters
- Private Use Area: Permanently designated for user-defined characters
Current State
As of Unicode 16.0 (154,998 assigned characters), approximately 819,000 code points are unassigned, which is the large majority of the 1,114,112 code points in the code space. The Unicode Consortium has far more space than it currently needs:
| Category | Code points | Share of code space |
|---|---|---|
| Total code space | 1,114,112 | 100% |
| Assigned characters | 154,998 | ~13.9% |
| Private Use Area | 137,468 | ~12.3% |
| Surrogates | 2,048 | ~0.2% |
| Noncharacters | 66 | ~0.01% |
| Available (unassigned) | ~819,000 | ~73.5% |
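The breakdown above can be reproduced with a little arithmetic. The following is a minimal sketch, not an official computation: the structural counts (planes, Private Use ranges, surrogates, noncharacters) are derived from their code point ranges, while the assigned-character figure is simply the Unicode 16.0 number quoted in this entry. The exact remainder depends on how "assigned" is counted, so treat the result as approximate. All variable names are illustrative.

```python
# Sketch: derive the code space breakdown from range arithmetic.
PLANES = 17
PLANE_SIZE = 0x10000

total = PLANES * PLANE_SIZE                       # U+0000..U+10FFFF
assigned = 154_998                                # Unicode 16.0 figure quoted in this entry
# BMP Private Use Area plus planes 15 and 16 (each minus its two noncharacters)
private_use = (0xF8FF - 0xE000 + 1) + 2 * (PLANE_SIZE - 2)
surrogates = 0xDFFF - 0xD800 + 1
# U+FDD0..U+FDEF plus the last two code points of every plane
noncharacters = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES

unassigned = total - assigned - private_use - surrogates - noncharacters

rows = [
    ("Assigned characters", assigned),
    ("Private Use Area", private_use),
    ("Surrogates", surrogates),
    ("Noncharacters", noncharacters),
    ("Unassigned (remainder)", unassigned),
]
for name, count in rows:
    print(f"{name:24s} {count:>9,d}  {100 * count / total:6.2f}%")
```

With these inputs the remainder comes out at 819,532, which matches the rounded figure of roughly 819,000 used in this entry.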
Where Reserved Code Points Appear
Reserved code points are scattered throughout the code space, not concentrated in one region. Some patterns:
- Gaps within blocks: A block may have some code points assigned and others reserved (e.g., the Greek and Coptic block, U+0370–U+03FF, leaves U+0378 and U+0379 unassigned)
- Entire sub-ranges: Planes 4–13 (U+40000–U+DFFFF) contain no assigned characters at all; apart from the two noncharacters that end each plane, every code point there is reserved
- Within the BMP: Scattered positions within named blocks
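The distribution described above can be inspected with Python's standard `unicodedata` module. This is a rough sketch under two assumptions: the counts reflect whatever Unicode version the local Python build ships (see `unicodedata.unidata_version`), and category Cn is used as a proxy for "reserved" even though it also includes the two noncharacters at the end of each plane. The function name `count_cn_per_plane` is hypothetical.

```python
import unicodedata

PLANE_SIZE = 0x10000

def count_cn_per_plane() -> list[int]:
    """Count code points with General_Category Cn in each of the 17 planes."""
    counts = []
    for plane in range(17):
        start = plane * PLANE_SIZE
        counts.append(
            sum(
                1
                for cp in range(start, start + PLANE_SIZE)
                if unicodedata.category(chr(cp)) == "Cn"
            )
        )
    return counts

counts = count_cn_per_plane()
print(f"Unicode database version: {unicodedata.unidata_version}")
for plane, n in enumerate(counts):
    print(f"Plane {plane:2d}: {n:6d} of {PLANE_SIZE} code points are Cn")
```

Planes 4–13 should each report all 65,536 code points as Cn, while the Private Use planes 15 and 16 should report only their two trailing noncharacters.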
Handling Reserved Code Points
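Applications should handle reserved code points gracefully instead of failing on them. In Python, the General_Category property exposed by `unicodedata` identifies them. Note that noncharacters also report category Cn, so they are checked first:

    import unicodedata

    def classify_code_point(cp: int) -> str:
        # Noncharacters (U+FDD0..U+FDEF and the last two code points of each
        # plane) also have category Cn, so separate them out first.
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            return "noncharacter"
        category = unicodedata.category(chr(cp))
        # Cn = Unassigned (reserved/not yet assigned)
        if category == "Cn":
            return "unassigned/reserved"
        elif category == "Co":
            return "private use"
        elif category == "Cs":
            return "surrogate"
        else:
            return f"assigned ({category})"

    # Results depend on the Unicode version bundled with the Python build.
    print(classify_code_point(0x0041))  # assigned (Lu)
    print(classify_code_point(0xE001))  # private use
    print(classify_code_point(0xD800))  # surrogate
    print(classify_code_point(0x0378))  # unassigned/reserved
    print(classify_code_point(0xFFFE))  # noncharacter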
Stability Guarantee
A core Unicode stability policy states that reserved code points may become assigned in future versions, but:
- An assigned code point is never unassigned
- A code point is never reassigned to a different character
- The properties of reserved code points may change when they are assigned
This means software written today that skips or rejects reserved code points may need updating when those points are assigned in a future Unicode version.
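One way to reduce that maintenance burden is to flag currently unassigned code points without rejecting the text, and to record which Unicode version the check was made against. The sketch below illustrates that idea only; the function names `find_unassigned` and `check_text` are hypothetical, and passing the text through unchanged is an illustrative policy choice, not a requirement of the standard. As elsewhere, category Cn also matches noncharacters.

```python
import unicodedata

def find_unassigned(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every code point the local database reports as Cn."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cn"
    ]

def check_text(text: str) -> str:
    """Report unassigned code points, then pass the text through unchanged."""
    hits = find_unassigned(text)
    if hits:
        # Record the database version so the result can be re-evaluated
        # after an upgrade: a later Unicode version may assign these.
        print(
            f"{len(hits)} code point(s) unassigned as of Unicode "
            f"{unicodedata.unidata_version}: {hits}"
        )
    return text

result = check_text("A\u0378B")
```

Because the text is preserved, data containing a code point that is assigned in a later Unicode version survives intact even when processed by older software.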
The U+0378 Example
U+0378 is an example of a reserved code point within the Greek and Coptic block (U+0370–U+03FF). The block contains letters and symbols, but U+0378 and U+0379 have no assigned characters. They were skipped in the original Greek assignments and remain reserved pending any future need.
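The gaps in this block can be listed directly. The snippet below is a small sketch that scans U+0370–U+03FF with the standard `unicodedata` module. The exact list depends on the Unicode version bundled with the Python build, so no full output is shown; the variable names are illustrative.

```python
import unicodedata

GREEK_START, GREEK_END = 0x0370, 0x03FF  # Greek and Coptic block

# Collect every code point in the block that has no character assignment.
gaps = [
    f"U+{cp:04X}"
    for cp in range(GREEK_START, GREEK_END + 1)
    if unicodedata.category(chr(cp)) == "Cn"
]
print(gaps)

# Unassigned code points have no character name, so a default is returned.
print(unicodedata.name("\u0378", "<no name>"))
```

The resulting list should include U+0378 and U+0379 alongside a handful of other unassigned positions in the block.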
Quick Facts
| Property | Value |
|---|---|
| General category | Cn (Unassigned) |
| Approximate count | ~819,000 (Unicode 16.0) |
| Percentage of code space | ~73.5% |
| Can become assigned? | Yes — in future Unicode versions |
| Ever removed once assigned? | No — stability policy prohibits this |
| Entirely unassigned planes | Planes 4–13 |
| Can be used privately? | Not recommended — use PUA instead |