Unicode Version
Major releases of the Unicode Standard, each adding new characters, scripts, and features. The current version is Unicode 16.0 (September 2024).
What is a Unicode Version?
A Unicode version is a numbered, dated release of the Unicode Standard that may add new characters, scripts, blocks, emoji, or property values, and may update algorithms, documentation, or normalization tables. Each version is usually cited by its major.minor number (e.g., 15.0, 15.1, 16.0); the full form carries a third update component (e.g., 15.1.0), which is what many libraries report. Versions are published on a roughly annual cycle.
Understanding Unicode versions matters for software developers because:
- An application may not support characters added in versions newer than its Unicode library
- Character properties can change between minor versions (though stability policies limit this)
- Emoji added in a new version will display as boxes ("tofu") on devices with older Unicode support
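The first point can be checked directly. The sketch below assumes Python's standard `unicodedata` module and uses a hypothetical helper, `unassigned_code_points`, to list the characters in a string that the bundled database does not know. It relies on the general category `Cn`, which covers unassigned code points (and also noncharacters).

```python
import unicodedata


def unassigned_code_points(text: str) -> list[str]:
    """Return U+XXXX labels for characters with general category Cn.

    Cn means "not assigned" in the Unicode database bundled with this
    interpreter; a newer Unicode version may have assigned the code point.
    """
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if unicodedata.category(ch) == "Cn"
    ]


# U+0378 is a reserved gap in the Greek block (unassigned as of Unicode 16.0).
print(unicodedata.unidata_version)
print(unassigned_code_points("A\u0378B"))  # ['U+0378']
```

A non-empty result does not mean the text is invalid. It means this interpreter cannot classify those characters, which is often a sign that the text was produced against a newer Unicode version.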
Version History Highlights
| Version | Year | Notable Additions |
|---|---|---|
| 1.0 | 1991 | First release — 7,129 characters, 24 scripts |
| 2.0 | 1996 | Extended to 17 planes; surrogate pairs standardized |
| 3.0 | 1999 | 49,194 characters; Braille, Ethiopic, Cherokee |
| 4.0 | 2003 | Cypriot, Linear B, Shavian |
| 5.0 | 2006 | NKo, Balinese, Phoenician |
| 6.0 | 2010 | Emoji officially added (722 characters) |
| 7.0 | 2014 | Pictographic symbols, ruble sign (₽) |
| 8.0 | 2015 | Skin tone modifiers for emoji |
| 10.0 | 2017 | Bitcoin sign (₿), 56 emoji |
| 12.0 | 2019 | 61 new emoji including gender-neutral options |
| 13.0 | 2020 | 55 new emoji; Chorasmian, Dives Akuru scripts |
| 14.0 | 2021 | 37 new emoji; Toto, Vithkuqi scripts |
| 15.0 | 2022 | 20 new emoji; Kawi, Nag Mundari |
| 15.1 | 2023 | Minor update; 627 new characters, 622 of them CJK ideographs |
| 16.0 | 2024 | 154,998 total characters; Garay, Sunuwar scripts |
Major vs Minor Versions
Unicode uses a two-level versioning scheme:
- Major versions (e.g., 15.0 → 16.0): Can add new scripts, new blocks, new characters, and may update normalization data
- Minor versions (e.g., 15.0 → 15.1): More limited in scope. They typically add a small set of characters, fix documentation errors, or update property values for already-assigned characters. The Unicode stability policies apply to every release, major or minor, and restrict what can change for characters that are already encoded.
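When code needs to compare versions, comparing the strings directly gives wrong answers ("9.0.0" sorts after "15.0.0"). The following is a minimal sketch, not an official API: `parse_unicode_version` and `release_kind` are hypothetical helper names, and the sketch assumes versions are written as dot-separated integers with up to three components.

```python
def parse_unicode_version(version: str) -> tuple[int, int, int]:
    """Turn "15.1" or "15.1.0" into a comparable (major, minor, update) tuple."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad missing components with zeros
    major, minor, update = parts[:3]
    return (major, minor, update)


def release_kind(old: str, new: str) -> str:
    """Classify the step from `old` to `new` by the first component that differs."""
    old_v, new_v = parse_unicode_version(old), parse_unicode_version(new)
    if new_v <= old_v:
        return "not newer"
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    return "update"


print(release_kind("15.0", "15.1"))  # minor
print(release_kind("15.1", "16.0"))  # major
```

Tuples compare element by element, so `parse_unicode_version("9.0.0") < parse_unicode_version("15.0.0")` holds even though the plain strings compare the other way.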
Checking Unicode Version in Code
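```python
import sys
import unicodedata

# Python interpreter version (this is not the Unicode version)
print(sys.version)

# Version of the Unicode Character Database bundled with this interpreter
print(unicodedata.unidata_version)  # e.g., "15.0.0"

# Look up a character's name; the default is returned when the bundled
# database has no name for it (for example, it is unassigned in that version)
name = unicodedata.name("😀", "UNKNOWN")
print(name)  # "GRINNING FACE"
```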
```javascript
// Browsers do not expose the engine's Unicode version directly;
// Intl.Collator and the RegExp /u flag use whatever version the engine ships.
// Node.js built with ICU reports both the ICU and Unicode versions:
console.log(process.versions.icu);     // e.g., "72.1"
console.log(process.versions.unicode); // e.g., "15.0"
```
Version Compatibility Issues
The "tofu" problem: Emoji added in Unicode 15.0 display as a box or question mark on devices that shipped before version 15.0 support was added to their OS. This is most common on older Android versions.
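Library lag: Python 3.12 uses Unicode 15.0; Python 3.13 uses Unicode 15.1. If your application processes text with characters from a version newer than your Python's Unicode library, those characters are reported as unassigned (general category Cn) and have no name, so code that depends on character properties may misclassify them.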
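The effect of library lag can be reproduced on a single machine. Python ships a frozen Unicode 3.2.0 database as `unicodedata.ucd_3_2_0`; the sketch below uses it as a stand-in for an outdated library and compares it with the current database. The helper name `categories` is hypothetical, and the expected output assumes Python 3.7 or later, whose database already includes the Bitcoin sign from Unicode 10.0.

```python
import unicodedata

# Frozen Unicode 3.2.0 database, used here to imitate an old library.
old_db = unicodedata.ucd_3_2_0


def categories(ch: str) -> tuple[str, str]:
    """Return (category in Unicode 3.2.0, category in the current database)."""
    return old_db.category(ch), unicodedata.category(ch)


print(old_db.unidata_version)   # 3.2.0
print(categories("A"))          # ('Lu', 'Lu')
# U+20BF BITCOIN SIGN was added in Unicode 10.0, long after 3.2.0.
print(categories("\u20bf"))     # ('Cn', 'Sc')
```

The old database sees the Bitcoin sign as unassigned (`Cn`), while the current one classifies it as a currency symbol (`Sc`). The same gap appears between any library and text written against a newer Unicode version.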
Normalization stability: Once a character's decomposition is defined, it cannot change. This means normalization form conversions are version-stable for all previously assigned characters.
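A small check of this guarantee, again as a sketch: it compares the canonical decomposition (NFD) of a few characters under Python's frozen Unicode 3.2.0 database (`unicodedata.ucd_3_2_0`) and under the current one. The helper name `nfd_is_stable` and the sample characters are illustrative choices; all three samples were already assigned in Unicode 3.2.

```python
import unicodedata

old_db = unicodedata.ucd_3_2_0  # frozen Unicode 3.2.0 database


def nfd_is_stable(ch: str) -> bool:
    """True if the NFD form of `ch` is the same in 3.2.0 and the current database."""
    return old_db.normalize("NFD", ch) == unicodedata.normalize("NFD", ch)


# é (U+00E9), Å (U+00C5), and the Hangul syllable 가 (U+AC00)
samples = ["\u00e9", "\u00c5", "\uac00"]
for ch in samples:
    print(f"U+{ord(ch):04X}", nfd_is_stable(ch))
```

This only samples a few characters, so it illustrates the policy rather than proving it; the guarantee itself comes from the Unicode stability policies.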
Quick Facts
| Property | Value |
|---|---|
| Current version | 16.0 (September 2024) |
| Total assigned (v16.0) | 154,998 characters |
| New scripts in v16.0 | Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar, Todhri, Tulu-Tigalari |
| Release cadence | Approximately annual |
| Version format | major.minor (e.g., 15.1) |
| Stability guarantee | Assigned characters never removed or reassigned |
| Emoji count (v16.0) | ~3,790 (with all variations/sequences) |