Unassigned Code Point
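A code point to which no character has been assigned in any version of Unicode. It has General Category Cn (Unassigned) and may be assigned in a future version.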
What is an Unassigned Code Point?
An unassigned code point is a position in the Unicode code space (U+0000–U+10FFFF) that has not yet been given a character assignment in the current version of the Unicode Standard. The code point exists in the address space but has no official character, name, or properties beyond its default values.
Unassigned code points constitute the majority of the Unicode code space — approximately 819,000 of the 1,114,112 total positions (about 73.5%) in Unicode 16.0. This large reserve ensures that Unicode can accommodate new scripts, symbols, and characters discovered or invented in the future.
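The proportion above can be checked roughly against the Unicode database bundled with Python. This is a minimal sketch, not an authoritative count: the result depends on which Unicode version your interpreter ships (reported by `unicodedata.unidata_version`), and the helper name `category_counts` is made up for illustration. Note that the Cn total includes the 66 noncharacters.

```python
import sys
import unicodedata
from collections import Counter

def category_counts() -> Counter:
    """Count code points per General Category over the whole code space."""
    return Counter(unicodedata.category(chr(cp)) for cp in range(sys.maxunicode + 1))

counts = category_counts()
total = sum(counts.values())   # size of the code space: 1,114,112
unassigned = counts["Cn"]      # reserved code points plus the 66 noncharacters

print(f"unicodedata uses Unicode {unicodedata.unidata_version}")
print(f"Cn: {unassigned:,} of {total:,} ({unassigned / total:.1%})")
```

An older interpreter reports a slightly larger Cn count, because code points assigned in later Unicode versions are still unassigned in its database.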
General Category: Cn
Unassigned code points have the General Category value Cn, described in the standard as "Other, not assigned" (long name: Unassigned). Cn sits alongside two other categories for code points that carry no standard character:
- Co (Private Use): Code points in the PUA, permanently set aside for user-defined characters
- Cs (Surrogate): The U+D800–U+DFFF range, permanently reserved for UTF-16 mechanics
- Cn (Unassigned): Code points with no assigned character, that is, the reserved code points plus the 66 noncharacters
import unicodedata
# Cn = "not assigned" (default category for unassigned)
print(unicodedata.category("\u0378")) # Cn — reserved/unassigned in Greek block
print(unicodedata.category("\uE001")) # Co — Private Use
print(unicodedata.category("\uD800")) # Cs — Surrogate
print(unicodedata.category("A")) # Lu — assigned (Uppercase Letter)
Default Property Values
For unassigned code points, the Unicode Standard defines default property values:
| Property | Default for Cn |
|---|---|
| General Category | Cn (Not Assigned) |
| Canonical Combining Class | 0 |
| Bidi Class | depends on code point range |
| Decomposition | none |
| Case | no case |
| Name | (none — raises ValueError in Python) |
import unicodedata
cp = "\u0378"
try:
    print(unicodedata.name(cp))
except ValueError:
    print("No name: unassigned code point")  # prints this
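The other defaults in the table can be inspected in the same way. The sketch below assumes U+0378 is still unassigned in the Unicode version bundled with your Python. One caveat, stated as an observation rather than a guarantee: `unicodedata.bidirectional` returns the value stored in Python's database and does not appear to apply the range-based Bidi_Class defaults, so that line carries no expected output.

```python
import unicodedata

ch = "\u0378"  # unassigned in the Greek and Coptic block

print(unicodedata.category(ch))             # Cn
print(unicodedata.combining(ch))            # 0
print(repr(unicodedata.decomposition(ch)))  # ''
print(ch.upper() == ch == ch.lower())       # True: no case mappings
print(unicodedata.name(ch, "<no name>"))    # <no name>
print(repr(unicodedata.bidirectional(ch)))  # stored value only, see the caveat above
```

Passing a default as the second argument to `unicodedata.name` avoids the `ValueError` without a `try` block.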
Unassigned vs Reserved vs Noncharacter
These terms are often conflated:
| Term | Meaning | Future assignment? |
|---|---|---|
| Unassigned | No character yet; Cn category | Yes — may be assigned |
| Reserved | Deliberately withheld for future use | Yes — intended for future use |
| Noncharacter | 66 specific code points, permanent | No — never to be assigned |
| Private Use | PUA ranges for user-defined characters | No — permanently private |
In practice, "unassigned" and "reserved" are often used interchangeably, since every Cn code point other than the 66 noncharacters is reserved for future assignment. Private-use and surrogate code points are not Cn at all: they have their own categories, Co and Cs.
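The distinction in the table can be made concrete with two small predicates. This is an illustrative sketch: the function names `is_noncharacter` and `is_reserved` are made up here, and `is_reserved` reflects whatever Unicode version your Python's `unicodedata` module ships.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def is_reserved(cp: int) -> bool:
    """True for Cn code points that may still be assigned (not noncharacters)."""
    return unicodedata.category(chr(cp)) == "Cn" and not is_noncharacter(cp)

noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))   # 66
print(is_reserved(0x0378))  # True
print(is_reserved(0xFFFE))  # False: noncharacter
print(is_reserved(0xE000))  # False: private use (Co)
```

The count of 66 comes from 32 code points in U+FDD0..U+FDEF plus 2 at the end of each of the 17 planes.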
Handling Unassigned Code Points
Software should handle unassigned code points gracefully. The Unicode Standard recommends:
- Accept unassigned code points in input without raising errors (they may be assigned in future versions)
- Pass through unassigned code points unchanged in text processing
- Do not map unassigned code points to replacement characters except in specific conformance scenarios
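# Robust code point classifier
import unicodedata

def describe_code_point(cp: int) -> str:
    """Classify a code point by assignment status."""
    if not 0 <= cp <= 0x10FFFF:
        return "out of Unicode range"
    char = chr(cp)
    cat = unicodedata.category(char)
    if cat == "Cs":
        # Surrogates have their own category, so they never reach the Cn branch
        return "surrogate"
    if cat == "Co":
        return "private use"
    if cat == "Cn":
        # Noncharacters are also Cn: U+FDD0..U+FDEF and the last two code points of each plane
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFF) in (0xFFFE, 0xFFFF):
            return "noncharacter"
        return "unassigned"
    # Control characters are assigned but have no name, hence the default
    return f"assigned ({unicodedata.name(char, 'unnamed')})"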
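The pass-through recommendations above can be illustrated with a small sanitizer. This is one possible policy, not something the standard prescribes: the hypothetical `sanitize` function replaces only lone surrogates, which cannot be encoded as UTF-8, with U+FFFD and leaves unassigned (Cn) code points untouched.

```python
import unicodedata

def sanitize(text: str) -> str:
    """Replace lone surrogates with U+FFFD; pass everything else through, including Cn."""
    return "".join(
        "\uFFFD" if unicodedata.category(ch) == "Cs" else ch
        for ch in text
    )

sample = "a\u0378b\ud800c"  # contains an unassigned code point and a lone surrogate
cleaned = sanitize(sample)

print([f"U+{ord(ch):04X}" for ch in cleaned])
# ['U+0061', 'U+0378', 'U+0062', 'U+FFFD', 'U+0063']

# The unassigned code point survives a UTF-8 round trip unchanged
assert cleaned.encode("utf-8").decode("utf-8") == cleaned
```

Keeping U+0378 intact means the text stays valid if a later Unicode version assigns a character there.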
Distribution of Unassigned Code Points
Unassigned code points are not evenly distributed. Major unassigned regions include:
- Planes 4–13 (U+40000–U+DFFFF): No characters assigned; all 655,360 code points are Cn (20 of them are noncharacters)
- Within Planes 2 and 3: Gaps between CJK extension ranges
- Within the BMP: Scattered positions within named blocks
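The uneven distribution can be seen by counting Cn code points per plane. A rough sketch, assuming Python's bundled `unicodedata` database; the helper name `unassigned_per_plane` is illustrative, and the counts for planes that contain assigned characters vary with the Unicode version.

```python
import unicodedata
from collections import Counter

def unassigned_per_plane() -> Counter:
    """Count Cn code points in each of the 17 planes."""
    per_plane = Counter()
    for cp in range(0x110000):
        if unicodedata.category(chr(cp)) == "Cn":
            per_plane[cp >> 16] += 1  # plane number is the code point divided by 0x10000
    return per_plane

per_plane = unassigned_per_plane()
for plane in range(17):
    print(f"plane {plane:2d}: {per_plane[plane]:6,d} Cn code points")
```

Planes 4 through 13 each report 65,536, while the private-use planes 15 and 16 report only 2 each: the two noncharacters at the end of the plane.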
Quick Facts
| Property | Value |
|---|---|
| General category | Cn (Not Assigned) |
| Approximate count (v16.0) | ~819,000 |
| Percentage of code space | ~73.5% |
| Can become assigned? | Yes — in future Unicode versions |
| Default bidi class | L by default; R, AL, ET, or BN in specific ranges |
| Entirely unassigned planes | 4–13 |
| Should software reject them? | No — pass through gracefully |