What is the Unicode Stability Policy?

The guarantee that once a character is assigned, its code point and name will never change. Properties may be refined, but the assignment itself is permanent.

Properties

Name Alias

An alternate name for a character, needed because Unicode names cannot be changed under the stability policy. Used for corrections, abbreviations, and newly assigned names.

2022-04-19 · Updated 2024-12-09

What Are Name Aliases?

A Name Alias is an alternate, officially recognized name for a Unicode character. While most assigned characters have a formal Name property (or a derived name such as CJK UNIFIED IDEOGRAPH-4E2D), some characters have additional aliases for several reasons:

Correction: The formal name contains an error that cannot be fixed in place (Unicode names are immutable once published), so the corrected name is provided as a correction alias.
Control code names: Characters in the C0 and C1 control ranges (U+0000–U+001F, U+007F–U+009F) have no formal name; their ISO 6429 names (NULL, LINE FEED, CARRIAGE RETURN, DELETE) are registered as control aliases.
Alternates: Widely used alternate names for format characters, such as BYTE ORDER MARK for U+FEFF.
Abbreviations: Widely used short names like ZWSP (for ZERO WIDTH SPACE) or BOM (for BYTE ORDER MARK), as well as control code abbreviations such as NUL, LF, CR, and DEL.
Figments: Labels documented for a few C1 control code points (such as PADDING CHARACTER for U+0080) that were never actually approved in any standard.
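
As a rough illustration of how these alias categories are recorded, the sketch below parses a handful of entries written in the semicolon-separated layout of `NameAliases.txt` (code point, alias, type). The `SAMPLE` excerpt and the helper name `parse_name_aliases` are assumptions made for this example; a real tool would read the full data file instead.

```python
from collections import defaultdict

# A few entries in the NameAliases.txt layout: code point;alias;type
SAMPLE = """\
# comment lines and blank lines are skipped
0000;NULL;control
0000;NUL;abbreviation
0080;PADDING CHARACTER;figment
FE18;PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET;correction
FEFF;BYTE ORDER MARK;alternate
FEFF;BOM;abbreviation
FEFF;ZWNBSP;abbreviation
"""


def parse_name_aliases(text):
    """Map each code point to a list of (alias, type) pairs (hypothetical helper)."""
    aliases = defaultdict(list)
    for line in text.splitlines():
        # Drop trailing comments, then skip lines that are empty
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        code, alias, alias_type = (field.strip() for field in line.split(";"))
        aliases[int(code, 16)].append((alias, alias_type))
    return dict(aliases)


aliases = parse_name_aliases(SAMPLE)
for alias, alias_type in aliases[0xFEFF]:
    print(f"U+FEFF {alias_type:<12} {alias}")
```

A single code point can carry several aliases of different types, which is why the sketch stores a list per code point rather than a single value.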

The BOM Case Study

One of the most instructive examples is U+FEFF:

Formal name: ZERO WIDTH NO-BREAK SPACE
Name alias (Abbreviation): ZWNBSP
Name alias (Alternate): BYTE ORDER MARK
Name alias (Abbreviation): BOM

The name ZERO WIDTH NO-BREAK SPACE is the formal, immutable name. In practice the character is best known for a different role, indicating byte order at the start of UTF-16/UTF-32 streams, but Unicode names cannot be changed to reflect that. The alias BYTE ORDER MARK documents the actual common use.

import unicodedata

# unicodedata.name() returns the formal name only
print(unicodedata.name("\uFEFF"))
# ZERO WIDTH NO-BREAK SPACE

# unicodedata.lookup() works with both formal names and aliases
bom_by_alias = unicodedata.lookup("BYTE ORDER MARK")
print(f"U+{ord(bom_by_alias):04X}")
# U+FEFF

# Aliases for control characters (NUL is an abbreviation alias; the long names are control aliases)
nul = unicodedata.lookup("NUL")          # U+0000
cr  = unicodedata.lookup("CARRIAGE RETURN")  # U+000D
lf  = unicodedata.lookup("LINE FEED")    # U+000A
print(ord(nul), ord(cr), ord(lf))
# 0 13 10
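
To connect the alias to the byte order role described above, here is a minimal sketch using only the standard `codecs` module. It assumes nothing beyond the standard library; the variable names `BOM` and `data` are chosen for this example.

```python
import codecs

BOM = "\N{BYTE ORDER MARK}"  # the alias resolves to U+FEFF

# Encoding U+FEFF yields the byte signatures exposed by the codecs module
assert BOM.encode("utf-16-le") == codecs.BOM_UTF16_LE  # b"\xff\xfe"
assert BOM.encode("utf-16-be") == codecs.BOM_UTF16_BE  # b"\xfe\xff"
assert BOM.encode("utf-8") == codecs.BOM_UTF8          # b"\xef\xbb\xbf"

# "utf-8-sig" strips a leading BOM on decode; plain "utf-8" keeps it
data = codecs.BOM_UTF8 + b"hello"
print(repr(data.decode("utf-8-sig")))
# 'hello'
print(repr(data.decode("utf-8")))
# '\ufeffhello'
```

When a leading U+FEFF survives decoding, it behaves as an invisible character at the start of the text, which is a common source of confusing string comparisons.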

Alias Types

The Unicode Standard defines five alias types:

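Type	Description	Example
`correction`	Fixes a published name error	U+FE18 → `PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET` (formal name misspells it `BRAKCET`)
`control`	ISO 6429 or other common name for a C0/C1 control code	U+0009 → `CHARACTER TABULATION`
`figment`	Documented label for a C1 control code that was never approved in any standard	U+0080 → `PADDING CHARACTER`
`alternate`	Widely used alternate name	U+FEFF → `BYTE ORDER MARK`
`abbreviation`	Short form of the name	U+FEFF → `BOM`, U+0009 → `TAB`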
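
The difference between a formal name and its aliases can be observed from Python. The following is a small sketch, assuming a Python version whose `unicodedata.lookup()` accepts name aliases (3.3 or later); the variable names are illustrative only.

```python
import unicodedata

ch = "\uFE18"
formal = unicodedata.name(ch)
corrected = "PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET"

# name() reports the immutable formal name, misspelling included
print(formal)

# lookup() accepts the correction alias and resolves to the same character
print(unicodedata.lookup(corrected) == ch)

# Control codes have no formal name, so name() needs a default to avoid ValueError
print(unicodedata.name("\t", "<no formal name>"))

# Their control aliases still work with lookup()
print(unicodedata.lookup("CHARACTER TABULATION") == "\t")
```

Because `unicodedata.name()` never returns an alias, code that needs a readable label for control characters has to supply its own fallback.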

Quick Facts

Property	Value
Unicode property name	`Name_Alias`
Short alias	None; the short name is also `Name_Alias` (`na1` belongs to `Unicode_1_Name`)
Python `unicodedata.lookup()`	Supports aliases since Python 3.3
Python `unicodedata.name()`	Returns formal name only
Immutability of formal Name	Names cannot change; aliases provide corrections
Spec reference	Unicode Standard Annex #44, `NameAliases.txt`

Related Terms

Unicode Stability Policy

More in Properties

East Asian Width

Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, or …

Joining Type

Unicode property controlling how Arabic and Syriac characters connect to adjacent characters. …

Script Extensions

Unicode property listing all scripts that use a character, broader than the …

Grapheme Cluster

A user-perceived character: something that feels like a single unit. It may consist of multiple code points (a base plus combining marks, or an emoji ZWJ sequence). 👩‍💻 = …

Case Mapping

Rules for converting characters between uppercase, lowercase, and titlecase. They can be locale-dependent (the Turkish I problem) and one-to-many (ß → SS).

Decomposition

A mapping from a character to its constituent parts. Canonical decomposition preserves meaning (é → e + ́), while compatibility decomposition may change meaning …

Combining Class

A numeric value (0–254) that controls the ordering of combining marks during canonical decomposition. It determines which combining marks can be reordered.

Compatibility Equivalence

Two character sequences that have the same abstract content but may differ in appearance. Broader than canonical equivalence. Examples: ﬁ ≈ fi, ² ≈ 2

Canonical Equivalence

Two character sequences that have the same meaning and should be treated as equivalent. Example: é (U+00E9) ≡ e + ◌́ (U+0065 + U+0301)

Mirroring Property

Characters whose glyph should be mirrored horizontally in RTL contexts. Examples: ( → ), [ → ], { → }, …
