Name Alias

Alternate names for characters. They exist because the Unicode stability policy forbids changing formal character names. Aliases cover corrections of erroneous names, abbreviations, and other alternate labels.

What Are Name Aliases?

A Name Alias is an alternate, officially recognized name for a Unicode character. Most assigned characters have a formal Name property value, either listed explicitly or derived by rule (such as CJK UNIFIED IDEOGRAPH-4E2D). Some characters also have one or more aliases, for several reasons:

  • Correction: The formal name contains a historical error that cannot be changed (Unicode names are immutable once published), so a corrected name is provided as a Correction alias.
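  • Control code names: Characters in the C0 and C1 control ranges (U+0000–U+001F, U+007F–U+009F) have no formal name. UnicodeData.txt lists them only as <control>. Their standard names, such as NULL, LINE FEED, and CARRIAGE RETURN, are registered as Control aliases. Familiar short forms like NUL, LF, CR, and DEL are registered as Abbreviation aliases.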
  • Abbreviations: Widely used short names like ZWSP (for ZERO WIDTH SPACE) or BOM (for BYTE ORDER MARK).
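  • Alternates: Widely used alternative names that are not abbreviations, such as BYTE ORDER MARK for U+FEFF.
  • Figments: Labels for a few C1 control code points that were documented in published sources but never actually approved in any standard, such as PADDING CHARACTER for U+0080.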
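Each of these alias categories can be checked against Python's bundled Unicode data. The sketch below assumes Python 3.3 or later, the version in which `unicodedata.lookup()` gained alias support. The alias strings come from NameAliases.txt, and exact results depend on the Unicode version your Python ships with. The `examples` dictionary is purely illustrative.

```python
import unicodedata

# name() returns the formal Name, including its known misspelling ("BRAKCET").
print(unicodedata.name("\uFE18"))
# PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET

# lookup() resolves aliases of every type (Python 3.3+).
# Illustrative mapping: alias type -> one alias of that type.
examples = {
    "correction": "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET",
    "control": "NULL",
    "alternate": "BYTE ORDER MARK",
    "abbreviation": "ZWSP",
    "figment": "PADDING CHARACTER",
}
for alias_type, alias in examples.items():
    ch = unicodedata.lookup(alias)
    print(f"{alias_type}: {alias} -> U+{ord(ch):04X}")
# Resolves to U+FE18, U+0000, U+FEFF, U+200B, U+0080 in that order

# C0/C1 controls have no formal Name: name() raises ValueError
# unless a default value is passed as the second argument.
print(unicodedata.name("\x00", "<no formal name>"))
# <no formal name>
```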

The BOM Case Study

One of the most instructive examples is U+FEFF:

  • Formal name: ZERO WIDTH NO-BREAK SPACE
  • Name alias (Abbreviation): ZWNBSP
  • Name alias (Alternate): BYTE ORDER MARK
  • Name alias (Abbreviation): BOM

ZERO WIDTH NO-BREAK SPACE is the formal name. It describes only one of the character's roles, and like every formal name it cannot be changed. Today the character's far more common role is as a byte order mark, which signals byte order at the start of UTF-16 or UTF-32 text. The alias BYTE ORDER MARK documents this use.

```python
import unicodedata

# unicodedata.name() returns the formal name only
print(unicodedata.name("\uFEFF"))
# ZERO WIDTH NO-BREAK SPACE

# unicodedata.lookup() accepts both formal names and aliases (Python 3.3+)
bom_by_alias = unicodedata.lookup("BYTE ORDER MARK")
print(f"U+{ord(bom_by_alias):04X}")
# U+FEFF

# Control character aliases: NUL is an abbreviation alias,
# CARRIAGE RETURN and LINE FEED are control aliases
nul = unicodedata.lookup("NUL")              # U+0000
cr = unicodedata.lookup("CARRIAGE RETURN")   # U+000D
lf = unicodedata.lookup("LINE FEED")         # U+000A
print(ord(nul), ord(cr), ord(lf))
# 0 13 10
```
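To make U+FEFF's byte-order role concrete, here is a minimal sketch that inspects the first bytes of a buffer for an encoded U+FEFF. The helper names `sniff_bom` and `decode_with_bom` are hypothetical. Checking the longer UTF-32 signatures before the UTF-16 ones is a common heuristic, not something the Unicode Standard mandates.

```python
import codecs

# Check longer signatures first: the UTF-32-LE BOM (FF FE 00 00)
# begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with an encoded U+FEFF, else None."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None


def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode data, letting a leading BOM pick the encoding, and strip the BOM."""
    found = sniff_bom(data)
    if found is None:
        return data.decode(fallback)
    encoding, length = found
    return data[length:].decode(encoding)


# U+FEFF serialized as little-endian UTF-16 is FF FE; as big-endian, FE FF.
print("\uFEFF".encode("utf-16-le").hex())  # fffe
print("\uFEFF".encode("utf-16-be").hex())  # feff

sample = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
print(sniff_bom(sample))        # ('utf-16-be', 2)
print(decode_with_bom(sample))  # hi
```

In practice, Python's built-in `utf-8-sig` and `utf-16` codecs already consume a leading BOM when decoding. Note one limitation of this heuristic: a UTF-16-LE stream that begins with a BOM followed by U+0000 has the same first four bytes as the UTF-32-LE signature, so the sketch would misclassify it.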

Alias Types

The Unicode Standard defines five alias types:

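| Type | Description | Example |
|---|---|---|
| correction | Corrects an error in a published formal name | U+FE18 → PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET (the formal name misspells it BRAKCET) |
| control | Name of a C0 or C1 control character, which has no formal name | U+0009 → HORIZONTAL TABULATION |
| figment | Documented label for a C1 control that was never approved in any standard | U+0080 → PADDING CHARACTER |
| alternate | Alternative widely used name | U+FEFF → BYTE ORDER MARK |
| abbreviation | Common short form | U+FEFF → BOM; U+0009 → TAB |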
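NameAliases.txt stores one alias per line as three semicolon-separated fields: the code point in hex, the alias, and its type. Below is a minimal parser sketch. It assumes that three-field layout and runs on a short hand-copied excerpt (`SAMPLE`) instead of the real file. `parse_name_aliases` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

# Abridged excerpt in the NameAliases.txt layout: code point;alias;type.
# Illustrative only; consult the current Unicode Character Database file.
SAMPLE = """\
# Comment lines start with '#'
0000;NULL;control
0000;NUL;abbreviation
0009;CHARACTER TABULATION;control
0009;HORIZONTAL TABULATION;control
0009;HT;abbreviation
0009;TAB;abbreviation
0080;PADDING CHARACTER;figment
0080;PAD;abbreviation
FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
FEFF;ZWNBSP;abbreviation
"""


def parse_name_aliases(text: str) -> dict[int, list[tuple[str, str]]]:
    """Map each code point to its (alias, type) pairs, preserving file order."""
    aliases: defaultdict[int, list[tuple[str, str]]] = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, alias_type = (field.strip() for field in line.split(";"))
        aliases[int(code, 16)].append((alias, alias_type))
    return dict(aliases)


table = parse_name_aliases(SAMPLE)
for alias, alias_type in table[0xFEFF]:
    print(f"U+FEFF  {alias_type:<12} {alias}")
```

Because a character can have several aliases of the same type (U+0009 has two control aliases here), the sketch keeps a list per code point instead of a single value.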

Quick Facts

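| Property | Value |
|---|---|
| Unicode property name | Name_Alias |
| Short alias | Name_Alias (no separate short form; na1 is the short alias of the different property Unicode_1_Name) |
| Python unicodedata.lookup() | Accepts aliases since Python 3.3 |
| Python unicodedata.name() | Returns the formal name only; raises ValueError, or returns a supplied default, for characters without one (such as controls) |
| Immutability | Formal names never change; aliases supply corrections |
| Spec reference | Unicode Standard Annex #44; NameAliases.txt in the Unicode Character Database |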
