Unicode Versions
The major versions of the Unicode Standard, each adding new characters, scripts, and features. The current version is Unicode 16.0 (September 2024).
What is a Unicode Version?
A Unicode version is a numbered, dated release of the Unicode Standard that may add new characters, scripts, blocks, emoji, or property values, and may update algorithms, documentation, or normalization tables. Each version is identified by a major.minor numbering scheme (e.g., 15.0, 15.1, 16.0), and versions are published on a roughly annual cycle.
Understanding Unicode versions matters for software developers because:
- An application may not support characters added in versions newer than its Unicode library
- Character properties can change between versions, although the stability policies limit which properties may change
- Emoji added in a new version may display as boxes ("tofu") on devices with older Unicode support
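The first point can be checked at runtime. Below is a minimal sketch, assuming Python's standard unicodedata module is the "Unicode library" in question. The helper name unassigned_code_points is hypothetical. It flags every code point whose general category is "Cn", meaning the running interpreter's database does not know it.

```python
import unicodedata

def unassigned_code_points(text: str) -> list[str]:
    """Return U+XXXX labels for code points that the local Unicode
    database treats as unassigned (general category "Cn").

    "Cn" covers both code points reserved for future versions and the
    permanent noncharacters (e.g. U+FFFF), so a hit means "unknown to this
    runtime", not necessarily "from a newer Unicode version".
    """
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Cn"
    ]

print(unicodedata.unidata_version)
print(unassigned_code_points("abc\uffff"))  # ['U+FFFF']
```

An application could run this over incoming text and log or reject anything it cannot classify reliably.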
Version History Highlights
| Version | Year | Notable Additions |
|---|---|---|
| 1.0 | 1991 | First release — 7,129 characters, 24 scripts |
| 2.0 | 1996 | Extended to 17 planes; surrogate pairs standardized |
| 3.0 | 1999 | 49,194 characters; Braille, Ethiopic, Cherokee |
| 4.0 | 2003 | Cypriot, Linear B, Shavian |
| 5.0 | 2006 | N'Ko, Balinese, Phoenician |
| 6.0 | 2010 | Emoji officially added (722 characters) |
| 7.0 | 2014 | Pictographic symbols; Linear A, Grantha scripts |
| 8.0 | 2015 | Skin tone modifiers for emoji |
| 10.0 | 2017 | Bitcoin sign (₿), 56 emoji |
| 12.0 | 2019 | 61 new emoji including gender-neutral options |
| 13.0 | 2020 | 55 new emoji; Chorasmian, Dives Akuru scripts |
| 14.0 | 2021 | 37 new emoji; Toto, Vithkuqi scripts |
| 15.0 | 2022 | 20 new emoji; Kawi, Nag Mundari |
| 15.1 | 2023 | Minor update; 627 new characters, including 622 CJK ideographs (Extension I) |
| 16.0 | 2024 | 154,998 total characters; Garay, Sunuwar scripts |
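The highlights table can double as a rough feature-detection map. If a character first assigned in a given release is assigned in the local database, the runtime supports at least that release. In Python this only cross-checks unicodedata.unidata_version, but the same probing idea carries over to runtimes that do not report a version.

Below is a minimal sketch. The probe characters and their versions were chosen for illustration, and only the skin-tone modifier and ₿ come from the table above. They should be verified against the Unicode Character Database file DerivedAge.txt before relying on them. PROBES and estimated_unicode_version are hypothetical names.

```python
import unicodedata

# Probe characters chosen for illustration. Each is assumed to have been
# first assigned in the listed version; verify against DerivedAge.txt.
PROBES = [
    ((8, 0), "\U0001F3FB"),   # EMOJI MODIFIER FITZPATRICK TYPE-1-2 (skin tone)
    ((10, 0), "\u20BF"),      # BITCOIN SIGN
    ((15, 0), "\U0001FAE8"),  # SHAKING FACE
    ((15, 1), "\U0002EBF0"),  # first ideograph of CJK Extension I
    ((16, 0), "\U0001FAE9"),  # FACE WITH BAGS UNDER EYES
]

def estimated_unicode_version(probes=PROBES):
    """Return the newest (major, minor) whose probe character is assigned
    in the local database, stopping at the first unassigned probe.
    Returns None if even the oldest probe is unassigned."""
    newest = None
    for version, ch in probes:
        if unicodedata.category(ch) == "Cn":  # unknown to this runtime
            break
        newest = version
    return newest

print(unicodedata.unidata_version, estimated_unicode_version())
```

The loop stops at the first unassigned probe rather than scanning the whole list. This way a single mislabeled probe cannot make the estimate jump ahead of the real version.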
Major vs Minor Versions
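Unicode version numbers formally have three fields, major.minor.update (Python reports "15.0.0", for example), but releases are usually cited as major.minor:
- Major versions (e.g., 15.0 → 16.0): Significant additions such as new scripts, new blocks, and new characters, together with their property and normalization data
- Minor versions (e.g., 15.0 → 15.1): Character additions or other normative changes, usually on a smaller scale. Unicode 15.1, for instance, added the CJK Unified Ideographs Extension I block. The Unicode Character Encoding Stability Policies apply to every release, major or minor, and restrict what existing data may change.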
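Version strings sort incorrectly as text: "9.0.0" compares greater than "15.0.0". Comparing them is easiest after splitting them into integer tuples.

Below is a minimal sketch, assuming version strings with one to three dot-separated numeric fields (Python's unicodedata.unidata_version uses three). parse_unicode_version and require_unicode are hypothetical helpers, not part of any library.

```python
import unicodedata

def parse_unicode_version(text: str) -> tuple[int, int, int]:
    """Split a version string such as "15.1.0" or "16.0" into
    (major, minor, update), padding missing fields with 0."""
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected version string: {text!r}")
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])

def require_unicode(minimum: str) -> None:
    """Raise RuntimeError if the interpreter's Unicode data is older than `minimum`."""
    local = parse_unicode_version(unicodedata.unidata_version)
    if local < parse_unicode_version(minimum):
        raise RuntimeError(
            f"need Unicode {minimum}+, runtime has {unicodedata.unidata_version}"
        )

print(parse_unicode_version("15.1"))  # (15, 1, 0)
```

Tuples compare field by field, so (15, 0, 0) < (15, 1, 0) < (16, 0, 0) holds without any special handling of major versus minor.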
Checking Unicode Version in Code
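```python
import sys
import unicodedata

# Python version and the version of its bundled Unicode database
print(sys.version)                  # full Python version string
print(unicodedata.unidata_version)  # e.g., "15.0.0" on Python 3.12

# Check whether a character is known to this Unicode database.
# name() returns the default when the code point has no name, and
# category() returns "Cn" for code points unassigned in this version.
name = unicodedata.name("😀", "UNKNOWN")
print(name)                         # "GRINNING FACE"
print(unicodedata.category("😀"))   # "So" (Other Symbol)
```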
```javascript
// ECMAScript has no API that reports a Unicode version, but Intl and
// RegExp property escapes (/\p{...}/u) depend on the engine's Unicode data.
// Node.js lists the versions of its bundled libraries in process.versions:
console.log(process.versions.icu);     // ICU library version
console.log(process.versions.unicode); // e.g., "15.0"; may be undefined in builds without ICU
```
Version Compatibility Issues
The "tofu" problem: Emoji added in Unicode 15.0 display as an empty box or a question mark on devices whose operating system and fonts predate Unicode 15.0 support. This is most common on older Android versions.
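Library lag: Python 3.11 uses Unicode 14.0, Python 3.12 uses Unicode 15.0, and Python 3.13 uses Unicode 15.1. If your application processes text containing characters from a version newer than your Python's Unicode database, those characters are treated as unassigned and therefore misclassified. unicodedata.category() reports them as "Cn", and methods such as str.isalpha() return False.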
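The misclassification is easy to observe directly. The sketch below asks the local database about U+2EBF0, assuming it is the first ideograph of CJK Unified Ideographs Extension I, first assigned in Unicode 15.1 (worth confirming in DerivedAge.txt). describe is a hypothetical helper. Before 15.1, expect category "Cn", no name, and False from the str predicates. From 15.1 on, expect category "Lo" and True.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Summarize how the running interpreter's Unicode data classifies one character."""
    return {
        "unidata_version": unicodedata.unidata_version,
        "category": unicodedata.category(ch),  # "Cn" if unknown to this version
        "name": unicodedata.name(ch, None),    # None if the database has no name
        "isalpha": ch.isalpha(),               # str methods share the same data
        "isidentifier": ch.isidentifier(),
    }

# Assumed: U+2EBF0 was first assigned in Unicode 15.1 (CJK Extension I).
print(describe("\U0002EBF0"))
```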
Normalization stability: Once a character's decomposition is defined, it cannot change. This means normalization form conversions are version-stable for all previously assigned characters.
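The guarantee can be illustrated with a precomposed letter and its decomposed equivalent. Below is a minimal sketch using only the standard unicodedata module (unicodedata.is_normalized requires Python 3.8 or later).

```python
import unicodedata

composed = "\u00e9"     # é as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT

# NFD splits the precomposed character; NFC recombines it. Because the
# decomposition of U+00E9 is frozen, these results are the same on every
# Unicode version the interpreter might ship with.
assert unicodedata.normalize("NFD", composed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == composed
assert composed != decomposed  # different code point sequences...
assert unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed)  # ...equal after NFC

print(unicodedata.is_normalized("NFC", decomposed))  # False
```

The guarantee covers assigned characters only. A string containing code points that are unassigned in your runtime may normalize differently once a later version assigns them.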
Quick Facts
| Property | Value |
|---|---|
| Current version | 16.0 (September 2024) |
| Total assigned (v16.0) | 154,998 characters |
| New scripts in v16.0 | Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar, Todhri, Tulu-Tigalari |
| Release cadence | Approximately annual |
| Version format | major.minor.update (e.g., 15.1.0), usually cited as major.minor |
| Stability guarantee | Assigned characters never removed or reassigned |
| Emoji count (v16.0) | ~3,790 (with all variations/sequences) |