Unicode Versions Timeline
Unicode has released major versions regularly since 1.0 in 1991, with each release adding new characters, emoji, and scripts from around the world. This timeline covers every Unicode version, its key additions, and how the standard has grown to cover nearly 155,000 characters.
Every version of the Unicode Standard has expanded the range of human writing that computers can represent. From the modest 7,161 characters of Unicode 1.0 in 1991 to the 154,998 characters of Unicode 16.0 in 2024, each release tells a story about which communities gained digital representation and which technical problems were solved. This guide provides a comprehensive chronological reference of every major Unicode release.
How Unicode Versioning Works
The Unicode Consortium publishes the standard using a major.minor versioning scheme. Major versions (e.g., 15.0, 16.0) typically add new scripts, characters, and emoji. Minor versions (e.g., 15.1) are maintenance releases that add smaller batches of characters or fix errata without introducing major new features.
Each release includes:
- Character additions -- new code point assignments
- Script additions -- entirely new writing systems
- Property updates -- changes to character properties (category, bidirectional class, etc.)
- Algorithm updates -- revisions to normalization, collation, segmentation, and other algorithms
- Emoji additions -- new emoji characters and sequences (since Unicode 6.0)
- Stability guarantees -- once a code point is assigned, its core identity never changes
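The full release cycle typically takes 12 months, with beta review periods allowing implementers to prepare. Character totals quoted in this timeline follow commonly cited figures for each version; because counting conventions differ (for example, whether the 65 control codes are included), the difference between two consecutive totals does not always equal the number of characters added.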
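For implementers, the practical question is usually which version of the standard a given runtime supports. As a minimal sketch, assuming Python and its standard `unicodedata` module (the helper name `has_name` is hypothetical and chosen only for this illustration), the bundled Unicode Character Database version can be read and split into its components:

```python
import unicodedata

# Version of the Unicode Character Database bundled with this Python build.
# The exact value depends on the interpreter release.
ucd_version = unicodedata.unidata_version

# Split "major.minor.update" into integers so versions can be compared.
major, minor, update = (int(part) for part in ucd_version.split("."))


def has_name(ch: str) -> bool:
    """Return True if the bundled database knows a name for this character.

    Hypothetical helper: unnamed characters include unassigned code points,
    but also control codes, so this is only a rough availability check.
    """
    return unicodedata.name(ch, None) is not None


print(ucd_version, has_name("\u20ac"))  # Euro Sign, added in Unicode 2.1
```

A character added in a release newer than the bundled database will report no name, which is one quick way to notice that a runtime lags behind the current standard.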
The Early Years: Establishing the Foundation (1991 -- 1999)
Unicode 1.0 -- October 1991
| Metric | Value |
|---|---|
| Total Characters | 7,161 |
| Scripts | 24 |
| Planes Used | 1 (BMP only) |
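The first release of the Unicode Standard, published as the first volume of a two-volume printed book. It covered the major alphabetic and syllabic scripts of the modern world: Latin, Greek, Cyrillic, Arabic, Hebrew, Devanagari and other Indic scripts, Thai, and Lao. The massive CJK Unified Ideographs block (20,902 characters) is not part of this 7,161-character total; it followed in the second volume, reflected in Unicode 1.0.1.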
The original design assumed that 16 bits (65,536 code points) would be sufficient for all of the world's characters -- an assumption that would be revisited within five years.
Unicode 1.0.1 -- June 1992
| Metric | Value |
|---|---|
| Total Characters | 28,359 |
| Change | +21,198 characters |
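A major update, corresponding to the second printed volume, that added the 20,902 CJK Unified Ideographs along with CJK Compatibility Ideographs. The unified ideographs account for nearly all of the growth in this version. This was considered part of Unicode 1.0 in practice.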
Unicode 1.1 -- June 1993
| Metric | Value |
|---|---|
| Total Characters | 34,168 |
| Change | +5,809 characters |
Synchronized with ISO/IEC 10646-1:1993, completing the merger between the Unicode and ISO character set efforts. Added further Hangul syllables and removed the original Tibetan block, which returned in revised form in Unicode 2.0.
Unicode 2.0 -- July 1996
| Metric | Value |
|---|---|
| Total Characters | 38,950 |
| Change | +4,782 characters |
| Major Change | Surrogate mechanism; code space expanded to 17 planes (1,114,112 code points) |
A pivotal release. Recognized that 65,536 code points were not enough and introduced the surrogate pair mechanism for UTF-16, expanding the addressable space to 17 planes of 65,536 code points each (1,114,112 in total). Also completely replaced the original Hangul block with 11,172 algorithmically arranged Hangul syllables -- one of the very few times Unicode has broken backward compatibility.
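The figure of 11,172 follows from the structure of the Hangul syllable block: 19 leading consonants, 21 vowels, and 28 trailing positions (27 consonants plus "none"). The sketch below, assuming Python, illustrates the arithmetic arrangement; the function name `compose_hangul` is hypothetical, while the constants follow the Hangul syllable composition scheme described in the Unicode Standard.

```python
S_BASE = 0xAC00   # first precomposed Hangul syllable, U+AC00
L_COUNT = 19      # leading consonants
V_COUNT = 21      # vowels
T_COUNT = 28      # trailing consonants, including "no trailing consonant"

# Total number of precomposed syllables in the block.
SYLLABLE_COUNT = L_COUNT * V_COUNT * T_COUNT


def compose_hangul(l_index: int, v_index: int, t_index: int = 0) -> str:
    """Map (leading, vowel, trailing) indices to a precomposed syllable."""
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT
            and 0 <= t_index < T_COUNT):
        raise ValueError("index out of range for Hangul composition")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


print(SYLLABLE_COUNT)                        # 11172
print(hex(ord(compose_hangul(18, 0, 4))))    # 0xd55c
print(hex(ord(compose_hangul(18, 20, 27))))  # 0xd7a3, last syllable in the block
```

Because every syllable's position is computable from its components, implementations can decompose and recompose Hangul without a lookup table.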
UTF-16 was formalized as the primary encoding form (replacing the original UCS-2).
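The surrogate mechanism can be sketched in a few lines. This is an illustrative Python version of the standard UTF-16 arithmetic, not code from any particular library; the function names `to_surrogates` and `from_surrogates` are hypothetical.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))   # 0xd83d 0xde00
print(17 * 65_536)           # 1114112 code points across 17 planes
```

Two 10-bit halves give 2^20 supplementary code points, which together with the 65,536 code points of the BMP yields the 1,114,112 total.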
Unicode 2.1 -- May 1998
| Metric | Value |
|---|---|
| Total Characters | 38,952 |
| Change | +2 characters |
A minimal update that added the Euro Sign (U+20AC) -- one of the most requested characters of its era, needed for the launch of the European single currency in 1999. Also added Object Replacement Character (U+FFFC).
Unicode 3.0 -- September 1999
| Metric | Value |
|---|---|
| Total Characters | 49,259 |
| Change | +10,307 characters |
| New Scripts | Cherokee, Ethiopic, Khmer, Mongolian, Myanmar, Ogham, Runic, Sinhala, Thaana, Canadian Aboriginal Syllabics, Yi |
A major expansion focused on living scripts that had been missing from Unicode. Brought comprehensive coverage of South and Southeast Asian scripts. Also introduced the Unicode Character Database (UCD) as a machine-readable resource, which became the standard reference for implementers.
The Supplementary Planes Open Up (2001 -- 2005)
Unicode 3.1 -- March 2001
| Metric | Value |
|---|---|
| Total Characters | 94,205 |
| Change | +44,946 characters |
| New Scripts | Deseret, Gothic, Old Italic |
| Major Addition | CJK Unified Ideographs Extension B (42,711 characters) |
| Milestone | First characters assigned outside the BMP |
A landmark release that finally used the supplementary planes made possible by Unicode 2.0. CJK Extension B alone added nearly 43,000 rare and historic ideographs to Plane 2 (the Supplementary Ideographic Plane). Plane 1 (the Supplementary Multilingual Plane) received its first historic scripts and musical symbols.
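Since each plane holds 65,536 code points, the plane of any character is simply its code point divided by 0x10000. A small Python sketch (the function name `plane_of` is hypothetical):

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0-16) that contains a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return code_point >> 16   # same as integer division by 0x10000


print(plane_of(0x20AC))    # 0: Euro Sign, in the BMP
print(plane_of(0x10330))   # 1: a Gothic letter, Supplementary Multilingual Plane
print(plane_of(0x20000))   # 2: start of CJK Extension B, Supplementary Ideographic Plane
```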
Unicode 3.2 -- March 2002
| Metric | Value |
|---|---|
| Total Characters | 95,221 |
| Change | +1,016 characters |
| New Scripts | Buhid, Hanunoo, Tagalog, Tagbanwa |
Added four Philippine scripts and several important format characters, including Word Joiner (U+2060). Word Joiner took over the word-joining role previously played by U+FEFF ZERO WIDTH NO-BREAK SPACE, leaving U+FEFF to serve primarily as the Byte Order Mark.
Unicode 4.0 -- April 2003
| Metric | Value |
|---|---|
| Total Characters | 96,447 |
| Change | +1,226 characters |
| New Scripts | Cypriot, Limbu, Linear B, Osmanya, Shavian, Tai Le, Ugaritic |
Continued the push into ancient and historic scripts with Linear B (the oldest deciphered Greek writing system, dating to ~1450 BCE) and Cypriot (an ancient syllabary from Cyprus). Also added Osmanya, a script invented in the 1920s for the Somali language.
Unicode 4.1 -- March 2005
| Metric | Value |
|---|---|
| Total Characters | 97,720 |
| Change | +1,273 characters |
| New Scripts | Buginese, Coptic (separated from Greek), Glagolitic, Kharoshthi, New Tai Lue, Old Persian, Syloti Nagri, Tifinagh |
Notable for separating Coptic from the Greek script block (previously they shared code points) and adding Old Persian cuneiform, the writing system of the Achaemenid Empire.
Global Coverage Accelerates (2006 -- 2012)
Unicode 5.0 -- July 2006
| Metric | Value |
|---|---|
| Total Characters | 99,089 |
| Change | +1,369 characters |
| New Scripts | Balinese, Cuneiform, N'Ko, Phags-pa, Phoenician |
Crossed the 99,000 character threshold. Added Sumerian/Akkadian Cuneiform (one of the oldest writing systems, ~3400 BCE) and N'Ko (a modern script for Manding languages in West Africa, invented in 1949).
Unicode 5.1 -- April 2008
| Metric | Value |
|---|---|
| Total Characters | 100,713 |
| Change | +1,624 characters |
| New Scripts | Carian, Cham, Kayah Li, Lepcha, Lycian, Lydian, Ol Chiki, Rejang, Saurashtra, Sundanese, Vai |
| Milestone | Passed 100,000 characters |
Broke the 100,000-character barrier. Added eleven new scripts in a single release -- the most ever at that time -- including several ancient Anatolian scripts (Carian, Lycian, Lydian) and minority scripts from South and Southeast Asia.
Unicode 5.2 -- October 2009
| Metric | Value |
|---|---|
| Total Characters | 107,361 |
| Change | +6,648 characters |
| New Scripts | Avestan, Bamum, Egyptian Hieroglyphs, Imperial Aramaic, Inscriptional Pahlavi, Inscriptional Parthian, Javanese, Kaithi, Lisu, Meetei Mayek, Old South Arabian, Old Turkic, Samaritan, Tai Tham, Tai Viet |
A massive release with 15 new scripts. The standout addition was Egyptian Hieroglyphs (1,071 characters) -- encoding a 5,000-year-old writing system was a powerful symbol of Unicode's commitment to comprehensive coverage. Also added several Central Asian historic scripts and additional Southeast Asian scripts.
Unicode 6.0 -- October 2010
| Metric | Value |
|---|---|
| Total Characters | 109,449 |
| Change | +2,088 characters |
| New Scripts | Mandaic, Batak, Brahmi |
| Major Addition | 722 emoji characters |
| New Symbol | Indian Rupee Sign (U+20B9) |
The release that changed everything. Officially added emoji to the Unicode Standard, encoding 722 characters previously used on Japanese mobile phones. This brought Unicode into mainstream public awareness and triggered a cultural phenomenon. Also added the Indian Rupee Sign, requested by the Indian government after they adopted a new currency symbol in 2010.
Unicode 6.1 -- January 2012
| Metric | Value |
|---|---|
| Total Characters | 110,181 |
| Change | +732 characters |
| New Scripts | Chakma, Meroitic Cursive, Meroitic Hieroglyphs, Miao, Sharada, Sora Sompeng, Takri |
Added scripts from the ancient Sudanese kingdom of Meroe and several South Asian scripts.
The Emoji Explosion (2012 -- 2020)
Unicode 6.2 -- September 2012
| Metric | Value |
|---|---|
| Total Characters | 110,182 |
| Change | +1 character |
The smallest release in Unicode history to add a character. Its sole new character was the Turkish Lira Sign (U+20BA).
Unicode 6.3 -- September 2013
| Metric | Value |
|---|---|
| Total Characters | 110,187 |
| Change | +5 characters (plus property updates) |
A maintenance release that added only five characters, all of them bidirectional formatting characters, and updated character properties for improved Arabic and Hebrew text processing.
Unicode 7.0 -- June 2014
| Metric | Value |
|---|---|
| Total Characters | 113,021 |
| Change | +2,834 characters |
| New Scripts | Bassa Vah, Caucasian Albanian, Duployan, Elbasan, Grantha, Khojki, Khudawadi, Linear A, Mahajani, Manichaean, Mende Kikakui, Modi, Mro, Nabataean, Old North Arabian, Old Permic, Pahawh Hmong, Palmyrene, Pau Cin Hau, Psalter Pahlavi, Siddham, Tirhuta, Warang Citi |
| New Emoji | ~250 new emoji |
A major release with 23 new scripts -- the all-time record for a single version. Added the Russian Ruble Sign (U+20BD) and a large batch of new emoji. Also included Linear A, the undeciphered writing system of the Minoan civilization.
Unicode 8.0 -- June 2015
| Metric | Value |
|---|---|
| Total Characters | 120,737 |
| Change | +7,716 characters |
| New Scripts | Ahom, Anatolian Hieroglyphs, Hatran, Multani, Old Hungarian, SignWriting |
| Major Addition | Emoji skin tone modifiers (Fitzpatrick scale) |
| Major Addition | CJK Unified Ideographs Extension E (5,762 characters) |
| New Symbol | Lari Sign (Georgian currency, U+20BE) |
Introduced emoji skin tone modifiers, allowing five skin tone variants for human-form emoji. This was a significant step toward emoji diversity and representation. Also added a large CJK extension and Anatolian Hieroglyphs.
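Mechanically, a skin tone variant is a base emoji followed by one of the five modifier characters U+1F3FB through U+1F3FF. A minimal Python sketch, using the waving hand (U+1F44B) as an example base; the helper name `with_skin_tones` is hypothetical, and whether a sequence renders as a single glyph depends on the font and platform:

```python
import unicodedata

# The five emoji modifiers, U+1F3FB..U+1F3FF (Fitzpatrick types 1-2 through 6).
MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def with_skin_tones(base: str) -> list[str]:
    """Return the base emoji followed by each of the five modifiers."""
    return [base + modifier for modifier in MODIFIERS]


variants = with_skin_tones("\U0001F44B")   # WAVING HAND SIGN
for variant in variants:
    # Each variant is two code points: the base and its modifier.
    print(variant, [f"U+{ord(ch):04X}" for ch in variant],
          unicodedata.name(variant[1]))
```

Because the variants are sequences rather than new code points, five skin tones per emoji did not require five new character assignments per emoji.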
Unicode 9.0 -- June 2016
| Metric | Value |
|---|---|
| Total Characters | 128,172 |
| Change | +7,500 characters |
| New Scripts | Adlam, Bhaiksuki, Marchen, Newa, Osage, Tangut |
| New Emoji | 72 new emoji |
Added Adlam, a script invented in the late 1980s by two teenage brothers in Guinea for the Fulani language -- one of the youngest scripts ever encoded. Also added Tangut, a complex logographic script from medieval China with more than 6,000 characters. The Bitcoin Sign was proposed but not added in this version; it arrived in Unicode 10.0.
Unicode 10.0 -- June 2017
| Metric | Value |
|---|---|
| Total Characters | 136,755 |
| Change | +8,518 characters |
| New Scripts | Zanabazar Square, Soyombo, Nushu, Masaram Gondi |
| Major Addition | CJK Unified Ideographs Extension F (7,473 characters) |
| New Emoji | 56 new emoji |
| New Symbol | Bitcoin Sign (U+20BF) |
Added the Bitcoin Sign (U+20BF), reflecting the growing importance of cryptocurrency. Also encoded Nushu, a writing system historically used exclusively by women in Hunan province, China, and often described as the only known gender-specific script.
Unicode 11.0 -- June 2018
| Metric | Value |
|---|---|
| Total Characters | 137,439 |
| Change | +684 characters |
| New Scripts | Dogra, Gunjala Gondi, Hanifi Rohingya, Makasar, Medefaidrin, Old Sogdian, Sogdian |
| New Emoji | 157 new emoji |
| New Symbol | Copyleft Sign (U+1F12F) |
Added Hanifi Rohingya, giving digital representation to the Rohingya people of Myanmar. Also introduced many new emoji including superheroes, redheads, and additional skin tone combinations.
Unicode 12.0 -- March 2019
| Metric | Value |
|---|---|
| Total Characters | 137,993 |
| Change | +554 characters |
| New Scripts | Elymaic, Nandinagari, Nyiakeng Puachue Hmong, Wancho |
| New Emoji | 61 new emoji |
Continued expanding coverage of minority scripts. Added the Tamil supplement with historic Tamil characters and fractions.
Unicode 12.1 -- May 2019
| Metric | Value |
|---|---|
| Total Characters | 137,994 |
| Change | +1 character |
Added a single character: the Japanese square era name for Reiwa (U+32FF), needed for the new Japanese imperial era that began on May 1, 2019. The fastest turnaround in Unicode history, driven by the urgent need for the character before the era transition.
Unicode 13.0 -- March 2020
| Metric | Value |
|---|---|
| Total Characters | 143,859 |
| Change | +5,930 characters |
| New Scripts | Chorasmian, Dives Akuru, Khitan Small Script, Yezidi |
| Major Addition | CJK Unified Ideographs Extension G (4,939 characters) |
| New Emoji | 55 new emoji |
Added the Khitan Small Script, a partially deciphered writing system from the Khitan Empire (907--1125 CE), and another large CJK extension.
Modern Releases (2021 -- 2024)
Unicode 14.0 -- September 2021
| Metric | Value |
|---|---|
| Total Characters | 144,697 |
| Change | +838 characters |
| New Scripts | Cypro-Minoan, Old Uyghur, Tangsa, Toto, Vithkuqi |
| New Emoji | 37 new emoji |
| New Symbol | Som Sign (Kyrgyz currency, U+20C0) |
Added Cypro-Minoan, an undeciphered Bronze Age script from Cyprus, and Vithkuqi, a script created for the Albanian language in the first half of the 1800s. The release was delayed by six months, from March to September 2021, due to the COVID-19 pandemic's impact on the editorial process.
Unicode 15.0 -- September 2022
| Metric | Value |
|---|---|
| Total Characters | 149,186 |
| Change | +4,489 characters |
| New Scripts | Kawi, Nag Mundari |
| Major Addition | CJK Unified Ideographs Extension H (4,192 characters) |
| New Emoji | 31 new emoji (including shaking face, pink heart, moose, and others) |
Added Kawi, the historic script used in inscriptions across Southeast Asia (Indonesia, Malaysia, Philippines), connecting modern scripts like Javanese and Balinese to their ancestor. Also added Nag Mundari, a script created in the 1960s for the Mundari language of eastern India.
Unicode 15.1 -- September 2023
| Metric | Value |
|---|---|
| Total Characters | 149,813 |
| Change | +627 characters |
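| Major Addition | CJK Unified Ideographs Extension I (622 characters) |
| New Emoji | 0 new emoji characters (new emoji were sequences of existing characters) |
A smaller release focused on CJK Unified Ideographs Extension I (622 characters) and property updates. No new emoji characters were encoded; the 118 new emoji published alongside this version are sequences built from existing code points.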
Unicode 16.0 -- September 2024
| Metric | Value |
|---|---|
| Total Characters | 154,998 |
| Change | +5,185 characters |
| New Scripts | Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar, Todhri, Tulu-Tigalari |
| Major Addition | Egyptian Hieroglyphs Extended-A (3,995 characters) |
| New Emoji | 7 new emoji |
The latest release added seven new scripts, with a focus on South Asian and West African writing systems. Tulu-Tigalari is a historic script from southern India. Garay is used for the Wolof language in West Africa. The massive Egyptian Hieroglyphs Extended-A block more than quadrupled the number of encoded hieroglyphs.
Growth at a Glance
| Year | Version | Total Characters | Scripts (Cumulative) |
|---|---|---|---|
| 1991 | 1.0 | 7,161 | 24 |
| 1996 | 2.0 | 38,950 | 25 |
| 1999 | 3.0 | 49,259 | 38 |
| 2001 | 3.1 | 94,205 | 41 |
| 2006 | 5.0 | 99,089 | 57 |
| 2008 | 5.1 | 100,713 | 68 |
| 2010 | 6.0 | 109,449 | 74 |
| 2014 | 7.0 | 113,021 | 97 |
| 2016 | 9.0 | 128,172 | 135 |
| 2020 | 13.0 | 143,859 | 154 |
| 2022 | 15.0 | 149,186 | 161 |
| 2024 | 16.0 | 154,998 | 168 |
In 33 years, Unicode has grown from 7,161 characters in 24 scripts to nearly 155,000 characters in 168 scripts -- a 21x increase in character count and a 7x increase in script coverage.
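The growth multiples quoted above can be checked directly from the first and last rows of the table. A short Python sketch using only those figures:

```python
# First and last rows of the growth table above.
first = {"year": 1991, "characters": 7_161, "scripts": 24}
latest = {"year": 2024, "characters": 154_998, "scripts": 168}

years_elapsed = latest["year"] - first["year"]
character_ratio = latest["characters"] / first["characters"]
script_ratio = latest["scripts"] / first["scripts"]

print(f"{years_elapsed} years, "
      f"{character_ratio:.1f}x characters, "
      f"{script_ratio:.0f}x scripts")
# prints: 33 years, 21.6x characters, 7x scripts
```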
Patterns and Observations
The CJK Expansions
CJK Unified Ideographs are the single largest contributor to Unicode's character count. The extensions alone account for tens of thousands of characters:
| Extension | Version | Characters | Plane |
|---|---|---|---|
| Original CJK | 1.0.1 | 20,902 | BMP (Plane 0) |
| Extension A | 3.0 | 6,582 | BMP (Plane 0) |
| Extension B | 3.1 | 42,711 | Plane 2 |
| Extension C | 5.2 | 4,149 | Plane 2 |
| Extension D | 6.0 | 222 | Plane 2 |
| Extension E | 8.0 | 5,762 | Plane 2 |
| Extension F | 10.0 | 7,473 | Plane 2 |
| Extension G | 13.0 | 4,939 | Plane 3 |
| Extension H | 15.0 | 4,192 | Plane 3 |
| Extension I | 15.1 | 622 | Plane 2 |
| Total | | 97,554 | |
CJK ideographs alone account for roughly 63% of all assigned Unicode characters.
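The total and the percentage follow from the block sizes in the table. A brief Python sketch that sums them and compares the result with the 154,998-character total of Unicode 16.0:

```python
# Block sizes as listed in the table above.
cjk_blocks = {
    "Original CJK": 20_902,
    "Extension A": 6_582,
    "Extension B": 42_711,
    "Extension C": 4_149,
    "Extension D": 222,
    "Extension E": 5_762,
    "Extension F": 7_473,
    "Extension G": 4_939,
    "Extension H": 4_192,
    "Extension I": 622,
}

TOTAL_CHARACTERS_16_0 = 154_998   # Unicode 16.0 total quoted in this timeline

cjk_total = sum(cjk_blocks.values())
cjk_share = cjk_total / TOTAL_CHARACTERS_16_0

print(cjk_total)            # 97554
print(f"{cjk_share:.1%}")   # 62.9%
```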
The Emoji Timeline
| Version | Year | Emoji Added | Cumulative Emoji |
|---|---|---|---|
| 6.0 | 2010 | 722 | 722 |
| 7.0 | 2014 | ~250 | ~1,000 |
| 8.0 | 2015 | 41 + skin tones | ~1,100 |
| 9.0 | 2016 | 72 | ~1,200 |
| 10.0 | 2017 | 56 | ~1,300 |
| 11.0 | 2018 | 157 | ~1,500 |
| 12.0/12.1 | 2019 | 61 | ~1,600 |
| 13.0 | 2020 | 55 | ~1,700 |
| 14.0 | 2021 | 37 | ~1,750 |
| 15.0 | 2022 | 31 | ~1,800 |
| 16.0 | 2024 | 7 | ~1,800 |
The number of new emoji per release has been declining since 2018, reflecting both a maturing repertoire and deliberate restraint by the Emoji Subcommittee.
Encoding the Undeciphered
Unicode has encoded writing systems that have not yet been deciphered, and other undeciphered scripts remain candidates for future encoding:
| Script | Version | Era | Status |
|---|---|---|---|
| Linear A | 7.0 (2014) | Minoan, ~1800 BCE | Undeciphered |
| Cypro-Minoan | 14.0 (2021) | Bronze Age Cyprus, ~1550 BCE | Undeciphered |
| Proto-Elamite | Not yet encoded | ~3100 BCE | Undeciphered |
Encoding undeciphered scripts is controversial but enables digital scholarship and corpus analysis that may eventually lead to decipherment.
Key Takeaways
- Unicode has been released in over 30 versions since 1991, with major releases roughly once per year.
- The standard grew from 7,161 characters in 24 scripts (1991) to 154,998 characters in 168 scripts (2024).
- Unicode 2.0 (1996) was the most architecturally significant release, expanding the code space from 65,536 to over 1.1 million code points.
- Unicode 6.0 (2010) was the most culturally significant release, adding emoji and transforming public awareness of the standard.
- CJK ideographs account for roughly 63% of all encoded characters, spread across the original block and nine extensions.
- The pace of new emoji additions has slowed, while script encoding continues steadily as more communities advocate for digital representation of their writing systems.
- Unicode encodes scripts spanning from ~3400 BCE (Sumerian Cuneiform) to the late 1980s CE (Adlam), covering over 5,000 years of human writing.