Unassigned Code Point

A code point not yet assigned a character in any Unicode version, categorized as Cn (Unassigned). May be assigned in future versions.


What is an Unassigned Code Point?

An unassigned code point is a position in the Unicode code space (U+0000–U+10FFFF) that has not yet been given a character assignment in the current version of the Unicode Standard. The code point exists in the address space but has no official character, name, or properties beyond its default values.

Unassigned code points constitute the majority of the Unicode code space — approximately 819,000 of the 1,114,112 total positions (about 73.5%) in Unicode 16.0. This large reserve ensures that Unicode can accommodate new scripts, symbols, and characters discovered or invented in the future.
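You can check these figures against whatever Unicode version your Python build ships. The sketch below assumes CPython's standard `unicodedata` module. That module's data version is reported by `unicodedata.unidata_version` and may be older than 16.0, so the count it prints can differ from the numbers quoted above. The sketch counts General Category Cn, which also includes the 66 noncharacters.

```python
import unicodedata

TOTAL = 0x110000  # U+0000..U+10FFFF: 1,114,112 code points

def count_cn() -> int:
    """Count code points whose General Category is Cn in this Python's UCD."""
    return sum(
        1 for cp in range(TOTAL)
        if unicodedata.category(chr(cp)) == "Cn"
    )

cn = count_cn()
print(f"Unicode {unicodedata.unidata_version}: {cn:,} Cn code points "
      f"({cn / TOTAL:.1%} of {TOTAL:,})")
```

The scan takes under a second on a typical machine because it only performs one category lookup per code point.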

General Category: Cn

Unassigned code points receive the General Category value Cn, whose long alias in the Unicode Character Database is "Unassigned". This is distinct from:

  • Co (Private Use): Code points in the Private Use Areas, permanently set aside for user-defined characters
  • Cs (Surrogate): The U+D800–U+DFFF range, permanently reserved for UTF-16 mechanics
  • Cn (Unassigned): Code points not yet used for any purpose, plus the 66 noncharacters, which also carry Cn

import unicodedata

# Cn = "not assigned" (default category for unassigned)
print(unicodedata.category("\u0378"))   # Cn — reserved/unassigned in Greek block
print(unicodedata.category("\uE001"))   # Co — Private Use
print(unicodedata.category("\uD800"))   # Cs — Surrogate
print(unicodedata.category("A"))        # Lu — assigned (Uppercase Letter)

Default Property Values

For unassigned code points, the Unicode Standard defines default property values:

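Property | Default for Cn
General Category | Cn (Unassigned)
Canonical Combining Class | 0
Bidi Class | Depends on code point range (L by default; R, AL, ET, or BN in certain ranges)
Decomposition | None
Case | No case mappings
Name | None (Python's unicodedata.name() raises ValueError unless a default is given)
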
import unicodedata

cp = "\u0378"
try:
    print(unicodedata.name(cp))
except ValueError:
    print("No name — unassigned code point")  # prints this
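The sketch below reads most of the defaults in the table above in one pass through Python's `unicodedata`, again using U+0378 as the sample. Bidi Class is deliberately left out, because this sketch does not verify how a given library reports its range-dependent defaults. The dictionary keys are only labels chosen for this example.

```python
import unicodedata

cp = "\u0378"  # unassigned position in the Greek and Coptic block

defaults = {
    "General Category": unicodedata.category(cp),            # "Cn"
    "Canonical Combining Class": unicodedata.combining(cp),  # 0
    "Decomposition": unicodedata.decomposition(cp),          # "" means none
    # No case mappings: upper- and lowercasing return the code point itself
    "Case": "no case" if cp.upper() == cp.lower() == cp else "cased",
    # Passing a default avoids the ValueError shown above
    "Name": unicodedata.name(cp, None),
}

for prop, value in defaults.items():
    print(f"{prop}: {value!r}")
```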

Unassigned vs Reserved vs Noncharacter

These terms are often conflated:

Term | Meaning | Future assignment?
Unassigned | No character yet; General Category Cn | Yes, may be assigned
Reserved | Withheld for future standardization | Yes, intended for future use
Noncharacter | 66 specific code points permanently reserved for internal use; also Cn | No, never to be assigned
Private Use | PUA ranges for user-defined characters | No, permanently private

In practice, "unassigned" and "reserved" are used interchangeably. The Unicode Standard's own glossary describes a reserved code point as one "also known as an unassigned code point". Every Cn code point except the 66 noncharacters is reserved for future assignment.
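The 66 noncharacters follow a fixed pattern. There are 32 contiguous code points at U+FDD0–U+FDEF, plus the last two code points (U+nFFFE and U+nFFFF) of each of the 17 planes. The short sketch below enumerates them and checks, against Python's `unicodedata`, that they share General Category Cn with unassigned code points.

```python
import unicodedata

def noncharacters() -> list[int]:
    """Enumerate the 66 noncharacter code points."""
    # 32 contiguous noncharacters in the BMP: U+FDD0..U+FDEF
    result = list(range(0xFDD0, 0xFDF0))
    # The last two code points of each of the 17 planes: U+nFFFE, U+nFFFF
    for plane in range(17):
        base = plane << 16
        result += [base + 0xFFFE, base + 0xFFFF]
    return result

nc = noncharacters()
print(len(nc))  # 66
print(all(unicodedata.category(chr(cp)) == "Cn" for cp in nc))  # True
```

Because noncharacters and reserved code points share a category, a category check alone cannot separate them. Code that must tell them apart needs an explicit range test like this one.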

Handling Unassigned Code Points

Software should handle unassigned code points gracefully. The Unicode Standard recommends:

  • Accept unassigned code points in input without raising errors (they may be assigned in future versions)
  • Pass through unassigned code points unchanged in text processing
  • Do not replace unassigned code points with U+FFFD REPLACEMENT CHARACTER unless a specific protocol or conformance requirement calls for it

# Robust code point classifier (results follow unicodedata.unidata_version)
import unicodedata

def describe_code_point(cp: int) -> str:
    if not 0 <= cp <= 0x10FFFF:
        return "out of Unicode range"
    char = chr(cp)
    cat = unicodedata.category(char)
    if cat == "Cn":
        # The 66 noncharacters also have General Category Cn
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFF) in (0xFFFE, 0xFFFF):
            return "noncharacter"
        return "unassigned"
    elif cat == "Cs":
        # Surrogates are category Cs, never Cn
        return "surrogate"
    elif cat == "Co":
        return "private use"
    else:
        return f"assigned ({unicodedata.name(char, 'unnamed')})"
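The handling guidance above can be illustrated with a hypothetical sanitizer. `strip_controls` is an invented name for this sketch, not a library function. It removes control characters (Cc) but deliberately leaves unassigned code points untouched. The sample uses U+0378 and U+50000, both unassigned. The sketch also confirms that these code points survive a UTF-8 round trip and NFC normalization unchanged.

```python
import unicodedata

def strip_controls(text: str) -> str:
    """Hypothetical sanitizer: drop Cc control characters except tab and
    newline, but pass every other code point, including Cn, through."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cc" or ch in "\t\n"
    )

sample = "ok\x07\u0378\U00050000\n"  # BEL (Cc) plus two unassigned code points
cleaned = strip_controls(sample)
print([f"U+{ord(ch):04X}" for ch in cleaned])

# Unassigned code points have no decomposition, so NFC leaves them alone,
# and UTF-8 can encode any scalar value, assigned or not.
print(unicodedata.normalize("NFC", cleaned) == cleaned)
print(cleaned.encode("utf-8").decode("utf-8") == cleaned)
```

The filter decides what to drop by naming the categories it rejects, rather than by keeping only "known" characters. This matters because an allow-list of assigned characters would silently delete text written with characters added in a later Unicode version.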

Distribution of Unassigned Code Points

Unassigned code points are not evenly distributed. Major unassigned regions include:

  • Planes 4–13 (U+40000–U+DFFFF): Entirely unassigned — 655,360 code points
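  • Within Planes 2 and 3: Gaps between and after the CJK extension ranges
  • Plane 14 (U+E0000–U+EFFFF): Nearly all unassigned apart from the Tags and Variation Selectors Supplement blocks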
  • Within the BMP: Scattered positions within named blocks
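To see this distribution directly, the sketch below tallies Cn code points plane by plane using Python's `unicodedata`, so the result reflects that module's `unidata_version`. The two noncharacters at the end of each plane are also Cn. As a result, Planes 4–13 report all 65,536 positions, and the private-use Planes 15 and 16 report only their two noncharacters.

```python
import unicodedata

PLANE_SIZE = 0x10000  # 65,536 code points per plane

def cn_per_plane() -> dict[int, int]:
    """Return {plane number: count of General Category Cn code points}."""
    counts = {}
    for plane in range(17):
        start = plane * PLANE_SIZE
        counts[plane] = sum(
            1 for cp in range(start, start + PLANE_SIZE)
            if unicodedata.category(chr(cp)) == "Cn"
        )
    return counts

counts = cn_per_plane()
for plane, n in counts.items():
    print(f"Plane {plane:2d}: {n:6,} of {PLANE_SIZE:,} are Cn")
```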

Quick Facts

Property | Value
General category | Cn (Unassigned)
Approximate count (Unicode 16.0) | ~819,000
Percentage of code space | ~73.5%
Can become assigned? | Yes, in future Unicode versions
Default Bidi Class | L by default; R, AL, ET, or BN in certain ranges
Entirely unassigned planes | 4–13
Should software reject them? | No; pass them through unchanged
