Age 属性
字符首次被分配时所在的Unicode版本,有助于判断各系统和软件版本的字符支持情况。
What Is the Age Property?
The Age property records the Unicode version in which a character was first assigned a code point. This property allows developers and researchers to determine which version of Unicode introduced a particular character—essential for compatibility testing, font planning, and detecting whether a string contains characters newer than a supported Unicode version.
Age values are version strings like 1.1, 2.0, 3.0, ..., 15.1. A code point that has never been assigned has an implied age of Unassigned. The property is immutable: a character's Age never changes once assigned, even if its properties (name, category) are later corrected.
Checking Age in Python
Python's unicodedata module does not expose Age directly, but the unicodedata.unidata_version string tells you which Unicode version the module implements. To check a character's introduction version you can use the unicodedata module's character lookup or a third-party library:
import unicodedata
import sys
# The Unicode version Python's unicodedata module implements
print(unicodedata.unidata_version) # e.g., "15.1.0"
print(sys.version)
# For Age lookups, use the 'unicodedata2' or 'unicodedataplus' package
# pip install unicodedataplus
try:
import unicodedataplus as udp
for char in ["A", "€", "😀", "\U0001F600"]:
age = udp.age(char)
name = unicodedata.name(char, "<unnamed>")
print(f" U+{ord(char):04X} Age={age:6} {name}")
except ImportError:
print("Install unicodedataplus for Age support")
# U+0041 Age=1.1 LATIN CAPITAL LETTER A
# U+20AC Age=2.1 EURO SIGN
# U+1F600 Age=6.1 GRINNING FACE
Why Age Matters
Emoji support timelines: Emoji characters have been added across many Unicode versions. Age tells you whether a character requires Unicode 6.0+ (face emoji), 8.0+ (skin tone modifiers), or 13.0+ (newer additions). Mobile operating systems typically lag Unicode releases by 12–24 months.
Legacy system compatibility: A database or file format limited to Unicode 3.0 cannot store CJK Extension B characters (added in 4.1) or any emoji. Age detection lets you validate input against a maximum supported version.
Font auditing: Font engineers use Age to prioritize glyph development—newer characters with high Age values may not yet be covered by commonly deployed fonts.
Notable Age Milestones
| Version | Year | Notable Additions |
|---|---|---|
| 1.1 | 1993 | Latin, Greek, Cyrillic, CJK core, Hebrew, Arabic |
| 2.1 | 1998 | Euro sign €, many combining marks |
| 4.1 | 2005 | CJK Extension B (42,711 characters) |
| 6.0 | 2010 | Emoji (first large batch, 722 characters) |
| 8.0 | 2015 | Skin tone modifiers |
| 13.0 | 2020 | 55 new emoji, Chorasmian, Yezidi scripts |
| 15.1 | 2023 | Latest release |
Quick Facts
| Property | Value |
|---|---|
| Unicode property name | Age |
| Type | Enumerated (version string) |
| Python built-in | No (use unicodedataplus or regex package) |
| Immutability | Age never changes after assignment |
| Unassigned value | Unassigned (implied) |
| Spec reference | Unicode Standard Annex #44, DerivedAge.txt |
相关术语
字符属性 中的更多内容
Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, or …
Unicode property controlling how Arabic and Syriac characters connect to adjacent characters. …
Unicode property listing all scripts that use a character, broader than the …
将每个码位归入30个类别(Lu、Ll、Nd、So等)之一的分类体系,分为7大类:字母、标记、数字、标点、符号、分隔符和其他。
具有相同抽象内容但外观可能不同的两个字符序列,比规范等价更宽泛,例如fi ≈ fi,² ≈ 2。
将字符映射为其组成部分的过程。规范分解保留语义(é → e + ◌́),兼容分解可能改变语义(fi → fi)。
命名的连续码位范围(如基本拉丁文 = U+0000–U+007F)。Unicode 16.0定义了336个区块,每个码位恰好属于一个区块。
决定字符在双向文本中(LTR、RTL、弱、中性)行为方式的属性,由Unicode双向算法用于确定显示顺序。
由于稳定性策略规定Unicode名称不可更改,因此提供字符的备用名称,用于更正、缩写和别名。
将字符在大写、小写和标题大小写之间转换的规则,可能因区域设置而异(土耳其语I问题),也存在一对多映射(ß → SS)。