What is a Code Point?

A numerical value in the Unicode codespace (from U+0000 to U+10FFFF), written in the form U+XXXX. Not all code points are assigned to characters.
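
As a small illustration of the U+XXXX notation, the sketch below converts between characters and code point labels. The helper names to_label and from_label are invented for this example and are not part of any library.

```python
def to_label(char: str) -> str:
    """Format a single character's code point in U+XXXX notation."""
    # At least four upper-case hex digits; supplementary code points need five or six.
    return f"U+{ord(char):04X}"


def from_label(label: str) -> str:
    """Turn a label such as 'U+2603' back into the character it denotes."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"not a code point label: {label!r}")
    value = int(label[2:], 16)
    if not 0 <= value <= 0x10FFFF:
        raise ValueError(f"outside the Unicode codespace: {label!r}")
    return chr(value)


print(to_label("A"))         # U+0041
print(to_label("😀"))        # U+1F600
print(from_label("U+2603"))  # ☃
```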

What is a Noncharacter?

Code points permanently reserved for internal use (66 in total): U+FDD0–U+FDEF, plus U+nFFFE and U+nFFFF in each plane. They are valid in text but should not be exchanged externally.
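
The figure of 66 can be checked directly from the two ranges given above. This is a minimal sketch; is_noncharacter is a hypothetical helper written for this example.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the permanently reserved noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    if 0xFDD0 <= cp <= 0xFDEF:  # contiguous run of 32 code points in the BMP
        return True
    # Last two code points of every plane: U+nFFFE and U+nFFFF
    return (cp & 0xFFFE) == 0xFFFE


noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))  # 66 = 32 + 2 per plane * 17 planes
```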

What is an Unassigned Code Point?

A code point that has not yet been assigned to a character in any version of Unicode; its General Category is Cn (Unassigned). It may be assigned in a future version.

Unicode Standard

Assigned Character

A code point to which a character has been assigned in some version of Unicode. As of Unicode 16.0, 154,998 of the 1,114,112 possible code points are assigned.

2021-08-09 · Updated 2024-11-04

What is an Assigned Character?

An assigned character is a Unicode code point that has been given a formal designation in the Unicode Standard — it represents a specific character, symbol, or abstract entity with an official name, category, and set of properties. The Unicode Consortium adds new assigned characters in each version of the standard; once a code point is assigned, it is permanently assigned and the assignment is never revoked or changed.

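As of Unicode 16.0, 154,998 of the 1,114,112 code points in the codespace are assigned characters. That published total counts graphic and format characters; it does not include the 65 control codes, private-use code points, surrogates, or noncharacters.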
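
The size of the codespace and the share that is assigned follow from simple arithmetic. The sketch below uses only the figures quoted in the text; the constant names are chosen for this example.

```python
PLANES = 17                       # planes 0 through 16
CODE_POINTS_PER_PLANE = 0x10000   # 65,536 code points per plane
ASSIGNED_V16 = 154_998            # assigned-character count quoted for Unicode 16.0

total = PLANES * CODE_POINTS_PER_PLANE
share = ASSIGNED_V16 / total

print(f"{total:,}")    # 1,114,112
print(f"{share:.1%}")  # 13.9%
```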

What Makes a Code Point "Assigned"

A code point is assigned when the Unicode Consortium:

Gives it a normative character name (e.g., LATIN SMALL LETTER A, SNOWMAN, GRINNING FACE)
Assigns it a General Category (letter, number, punctuation, symbol, etc.)
Defines its relevant character properties in the Unicode Character Database

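The formal assignment appears in UnicodeData.txt, the primary UCD file. A code point is assigned if it has its own entry in that file or falls inside a range bracketed by a pair of <..., First> and <..., Last> entries, which are used for large blocks such as CJK ideographs and Hangul syllables. The same file also lists the surrogate and private-use ranges, which carry General Category Cs and Co; any code point not covered by the file defaults to Cn.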
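
To make the file format concrete, here is a rough sketch of reading semicolon-separated UnicodeData.txt records, including First/Last range pairs. The four sample lines are a tiny excerpt in the file's format, and parse_ranges and category_from_file are hypothetical helpers; a real tool would read the complete file for one specific Unicode version.

```python
SAMPLE = """\
0009;<control>;Cc;0;S;;;;;N;CHARACTER TABULATION;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DBF;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
"""


def parse_ranges(text: str) -> list[tuple[int, int, str]]:
    """Return (first, last, general_category) for each entry or range."""
    ranges = []
    pending_first = None
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        cp, name, category = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            pending_first = cp  # range opens; the next record closes it
        elif name.endswith(", Last>"):
            if pending_first is None:
                raise ValueError(f"Last without First at U+{cp:04X}")
            ranges.append((pending_first, cp, category))
            pending_first = None
        else:
            ranges.append((cp, cp, category))
    return ranges


def category_from_file(cp: int, ranges: list[tuple[int, int, str]]) -> str:
    """Look up the General Category, defaulting to Cn when cp is not listed."""
    for first, last, category in ranges:
        if first <= cp <= last:
            return category
    return "Cn"


ranges = parse_ranges(SAMPLE)
print(category_from_file(0x0041, ranges))  # Lu
print(category_from_file(0x3500, ranges))  # Lo (inside the First/Last range)
print(category_from_file(0x0378, ranges))  # Cn (no entry)
```

With only this excerpt loaded, every code point outside the four records reports Cn, so the last line is meaningful only because U+0378 is also absent from the real file.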

Categories of Assigned Characters

Assigned characters are not all printable glyphs. The Unicode Standard assigns code points to:

Category	Examples	General Category Code
Letters	A, a, α, あ, 字	Lu, Ll, Lo...
Numbers	0–9, ², ③	Nd, Nl, No
Punctuation	. , ! « »	Po, Ps, Pe...
Symbols	€, ©, ★, ☃, 😀	So, Sm, Sc, Sk
Marks	combining acute ◌́	Mn, Mc, Me
Separators	space, line separator	Zs, Zl, Zp
Control codes	U+0009 TAB, U+000A LF	Cc
Format characters	U+200C ZWNJ, U+FEFF BOM	Cf

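Even control characters like TAB, LF, and NULL (U+0000) are assigned characters: they have General Category Cc (Control). Their Name property is empty (UnicodeData.txt shows the placeholder <control>), so they are identified by formal name aliases such as CHARACTER TABULATION, LINE FEED, and NULL.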
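
The category codes in the table can be inspected with Python's standard unicodedata module. The sketch below prints one sample per kind of character; describe is a helper invented for this example, and the placeholder "<no name>" is an arbitrary default, not a Unicode value.

```python
import unicodedata


def describe(char: str) -> str:
    """Return code point, General Category, and name for one character."""
    category = unicodedata.category(char)
    # Control characters have no Name property, so supply a default.
    name = unicodedata.name(char, "<no name>")
    return f"U+{ord(char):04X}  {category}  {name}"


# "\u0301" is COMBINING ACUTE ACCENT; "\u2028" is LINE SEPARATOR.
samples = ["A", "あ", "0", "²", "!", "«", "€", "☃",
           "\u0301", " ", "\u2028", "\t", "\u200c", "\ufeff"]

for ch in samples:
    print(describe(ch))
```

Note that « reports Pi (Initial Punctuation), one of the punctuation subcategories behind the ellipsis in the table row.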

Checking Assignment Status

import unicodedata

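def is_assigned(char: str) -> bool:
    """Return True if the code point of `char` is assigned to a character."""
    # Cn = unassigned (reserved code points and noncharacters)
    # Cs = surrogate code points, which can never be assigned to a character
    # Co = private use: counted as assigned, with user-defined meaning
    return unicodedata.category(char) not in ("Cn", "Cs")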

# Assigned characters
print(is_assigned("A"))      # True — Lu (Uppercase Letter)
print(is_assigned("😀"))     # True — So (Other Symbol)
print(is_assigned("\t"))     # True — Cc (Control)
print(is_assigned("\uE001")) # True — Co (Private Use — assigned category, user-defined)

# Unassigned
print(is_assigned("\u0378")) # False — Cn (Unassigned)

# Get character name
print(unicodedata.name("A"))      # LATIN CAPITAL LETTER A
print(unicodedata.name("😀"))     # GRINNING FACE
print(unicodedata.name("\t", "<control>"))  # <control>: controls have no name, so name() needs a default to avoid ValueError
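
The same category lookup can be run over the whole codespace to tally code points by General Category. This is a sketch only: the results reflect whichever Unicode version the running Python ships (reported by unicodedata.unidata_version), so the unassigned count varies between interpreters.

```python
import unicodedata
from collections import Counter

# Tally every code point from U+0000 to U+10FFFF by General Category.
counts = Counter(unicodedata.category(chr(cp)) for cp in range(0x110000))

total = sum(counts.values())
# Everything that is not unassigned, surrogate, private use, or control.
graphic_and_format = total - counts["Cn"] - counts["Cs"] - counts["Co"] - counts["Cc"]

print("Unicode database version:", unicodedata.unidata_version)
print("Total code points:       ", total)             # 1114112
print("Control (Cc):            ", counts["Cc"])      # 65
print("Surrogate (Cs):          ", counts["Cs"])      # 2048
print("Private use (Co):        ", counts["Co"])      # 137468
print("Unassigned (Cn):         ", counts["Cn"])
print("Graphic and format:      ", graphic_and_format)
```

The control, surrogate, and private-use totals are fixed by the standard; the last two lines depend on the database version.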

Stability of Assignments

The Unicode Stability Policy guarantees that once a code point is assigned:

Its character name is permanent (corrections become formal aliases, not replacements)
Its General Category will not change in ways that break normalization or sorting
Its decomposition mapping will not change
The code point will never be unassigned or reassigned to a different character

This stability is essential for backward compatibility: the encoding stability policy has applied since Unicode 2.0, so text encoded under any version from 2.0 onward is still interpreted the same way by Unicode 16.0 implementations.
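
The name-permanence rule is visible from Python. U+01A2 keeps its original name, LATIN CAPITAL LETTER OI, while the corrected name LATIN CAPITAL LETTER GHA exists as a formal alias. The sketch assumes Python 3.3 or later, where unicodedata.lookup resolves name aliases.

```python
import unicodedata

ch = "\u01A2"

# The original name is never replaced...
print(unicodedata.name(ch))  # LATIN CAPITAL LETTER OI

# ...but the corrected name resolves to the same code point via its alias.
corrected = unicodedata.lookup("LATIN CAPITAL LETTER GHA")
print(corrected == ch)       # True
```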

Growth Over Versions

Version	Year	Assigned Characters
1.0	1991	7,129
3.0	1999	49,194
5.0	2006	99,024
8.0	2015	120,672
12.0	2019	137,928
15.0	2022	149,186
16.0	2024	154,998

Quick Facts

Property	Value
Total assigned (v16.0)	154,998
Percentage of code space	~13.9%
Earliest assigned	U+0000–U+007F (ASCII, Unicode 1.0)
General category for unassigned	Cn
Stability guarantee	Never unassigned or reassigned
Primary source file	UnicodeData.txt
Includes control characters?	Yes (U+0000–U+001F, U+007F, U+0080–U+009F)

Related Terms

Code point, Noncharacter, Unassigned code point
