Name Alias
Alternate names for characters, needed because Unicode character names cannot change under the stability policy. Used for corrections, abbreviations, and figments.
What Are Name Aliases?
A Name Alias is an alternate, officially recognized name for a Unicode character. While most assigned characters have a formal Name property (or a generated name such as CJK UNIFIED IDEOGRAPH-4E2D), some characters also have one or more aliases, for several reasons:
- Correction: The formal name contains a historical error that cannot be changed (Unicode names are immutable once published), so a corrected name is provided as a correction alias.
- Control code names: Characters in the C0 and C1 control ranges (U+0000–U+001F, U+007F–U+009F) have no formal Name at all (UnicodeData.txt lists them only as <control>). Their ISO 6429 names (NULL, LINE FEED, CARRIAGE RETURN, DELETE) are registered as control aliases, and their familiar short forms (NUL, LF, CR, DEL) as abbreviation aliases.
- Abbreviations: Widely used short names like ZWSP (for ZERO WIDTH SPACE) or BOM (for BYTE ORDER MARK).
- Figments: Labels that were documented for a few C1 control code points but never actually approved in any standard (for example, PADDING CHARACTER for U+0080).
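The correction and control cases can be observed with Python's unicodedata module. A minimal sketch, assuming Python 3.3 or later (when lookup() gained alias support); it relies on the formal name of U+FE18 containing the misspelling BRAKCET, which its correction alias fixes:

```python
import unicodedata

# Correction: the formal name of U+FE18 keeps its original misspelling ("BRAKCET")
formal = unicodedata.name("\uFE18")
print(formal)

# The correction alias, spelled "BRACKET", resolves to the same character
fixed = unicodedata.lookup(
    "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"
)
print(f"U+{ord(fixed):04X}")

# Control codes have no formal Name, so name() needs a default to avoid ValueError
print(unicodedata.name("\x00", None))

# Both the control alias (NULL) and the abbreviation alias (NUL) resolve to U+0000
assert unicodedata.lookup("NULL") == unicodedata.lookup("NUL") == "\x00"
```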
The BOM Case Study
One of the most instructive examples is U+FEFF:
- Formal name: ZERO WIDTH NO-BREAK SPACE
- Name alias (abbreviation): ZWNBSP
- Name alias (alternate): BYTE ORDER MARK
- Name alias (abbreviation): BOM
The name ZERO WIDTH NO-BREAK SPACE is historical and immutable. In practice, however, U+FEFF is used mostly as a byte order mark at the start of a text stream, where it indicates byte order in UTF-16 and UTF-32 (or serves as an encoding signature in UTF-8). Because the formal name cannot be changed, the alias BYTE ORDER MARK documents this common use.
```python
import unicodedata

# unicodedata.name() returns the formal name only
print(unicodedata.name("\uFEFF"))
# ZERO WIDTH NO-BREAK SPACE

# unicodedata.lookup() works with both formal names and aliases
bom_by_alias = unicodedata.lookup("BYTE ORDER MARK")
print(f"U+{ord(bom_by_alias):04X}")
# U+FEFF

# Control character aliases
nul = unicodedata.lookup("NUL")  # U+0000
cr = unicodedata.lookup("CARRIAGE RETURN")  # U+000D
lf = unicodedata.lookup("LINE FEED")  # U+000A
print(ord(nul), ord(cr), ord(lf))
# 0 13 10
```
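The alias BYTE ORDER MARK describes what U+FEFF does when it opens an encoded stream. As a sketch of that role, the code below shows the bytes U+FEFF produces in several encodings and guesses an encoding from a leading mark. The helper sniff_bom is hypothetical; the byte constants are the standard codecs.BOM_* values. UTF-32 marks are checked before UTF-16 ones because the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE).

```python
import codecs
from typing import Optional

# U+FEFF serializes to a different byte sequence in each encoding form
assert "\uFEFF".encode("utf-16-be") == codecs.BOM_UTF16_BE  # FE FF
assert "\uFEFF".encode("utf-16-le") == codecs.BOM_UTF16_LE  # FF FE
assert "\uFEFF".encode("utf-8") == codecs.BOM_UTF8          # EF BB BF

# Hypothetical helper: longer marks come first so UTF-32LE is not read as UTF-16LE
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding signaled by a leading byte order mark, or None."""
    for bom, encoding in _SIGNATURES:
        if data.startswith(bom):
            return encoding
    return None

sample = "\uFEFFhello".encode("utf-16-le")  # utf-16-le itself adds no BOM
print(sniff_bom(sample))
print(sniff_bom(b"plain ascii"))
```

This heuristic has a known blind spot: UTF-16LE text whose first character after the mark is U+0000 looks exactly like a UTF-32LE mark. For UTF-8 alone, Python's built-in utf-8-sig codec drops a leading BOM on decode.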
Alias Types
The Unicode Standard defines five alias types:
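| Type | Description | Example |
|---|---|---|
| correction | Fixes an error in a published name | U+FE18 → PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET (the formal name misspells it BRAKCET) |
| control | ISO 6429 or other common name for a C0/C1 control code | U+0009 → CHARACTER TABULATION |
| figment | Documented label for a C1 control code that was never approved in any standard | U+0080 → PADDING CHARACTER |
| alternate | Alternative widely used name | U+FEFF → BYTE ORDER MARK |
| abbreviation | Short form of a name | U+FEFF → BOM |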
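Python's unicodedata has no function that lists a character's aliases or their types, so tools that need them typically read NameAliases.txt from the Unicode Character Database. Each data line there has three semicolon-separated fields: code point, alias, and type. The sketch below parses a short excerpt written in that format; parse_name_aliases is a hypothetical name, and a real tool would load the full file instead of the embedded sample. It assumes Python 3.9+ for the built-in generic type hints.

```python
from collections import defaultdict

# Excerpt in the NameAliases.txt format: code point;alias;type
SAMPLE = """\
# comment lines and blank lines are ignored
0000;NULL;control
0000;NUL;abbreviation
0009;CHARACTER TABULATION;control
0009;TAB;abbreviation
0080;PADDING CHARACTER;figment
FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
FEFF;ZWNBSP;abbreviation
"""

def parse_name_aliases(text: str) -> dict[int, list[tuple[str, str]]]:
    """Group (alias, type) pairs by code point, preserving file order."""
    aliases: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        code, alias, alias_type = line.split(";")
        aliases[int(code, 16)].append((alias, alias_type))
    return dict(aliases)

table = parse_name_aliases(SAMPLE)
for alias, alias_type in table[0xFEFF]:
    print(f"U+FEFF  {alias_type:<12} {alias}")
```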
Quick Facts
| Property | Value |
|---|---|
| Unicode property name | Name_Alias |
| Short alias | Name_Alias (na1 is the short alias of a different property, Unicode_1_Name) |
| Python unicodedata.lookup() | Accepts aliases since Python 3.3 |
| Python unicodedata.name() | Returns the formal name only |
| Immutability of formal Name | Names cannot change; aliases provide corrections |
| Spec reference | Unicode Standard Annex #44, NameAliases.txt |
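Because name() never returns an alias, one way to recover the formal name behind an alias is to resolve it with lookup() and pass the character back to name(). A minimal sketch; formal_name_for is a hypothetical helper, and both it and the \N{...} string escape used at the end assume Python 3.3 or later, where aliases are accepted:

```python
import unicodedata
from typing import Optional

def formal_name_for(alias: str) -> Optional[str]:
    """Map a formal name or alias to the character's formal Name (None if it has none)."""
    char = unicodedata.lookup(alias)  # raises KeyError for unknown names
    return unicodedata.name(char, None)

print(formal_name_for("BOM"))  # alias of U+FEFF
print(formal_name_for("LF"))   # control code: no formal Name, so None

# String literals accept aliases as well, through \N{...} escapes
assert "\N{BYTE ORDER MARK}" == "\N{ZERO WIDTH NO-BREAK SPACE}" == "\uFEFF"
```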