Unicode Versions Timeline

Unicode has released major versions regularly since 1.0 in 1991, with each release adding new characters, emoji, and scripts from around the world. This timeline covers every Unicode version, its key additions, and how the standard has grown to cover nearly 155,000 characters.

Every version of the Unicode Standard has expanded the range of human writing that computers can represent. From the modest 7,161 characters of Unicode 1.0 in 1991 to the 154,998 characters of Unicode 16.0 in 2024, each release tells a story about which communities gained digital representation and which technical problems were solved. This guide provides a comprehensive chronological reference of every major Unicode release.

How Unicode Versioning Works

The Unicode Consortium publishes the standard using a major.minor versioning scheme. Major versions (e.g., 15.0, 16.0) typically add new scripts, characters, and emoji. Minor versions (e.g., 15.1) are maintenance releases that add smaller batches of characters or fix errata without introducing major new features.
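The version scheme described above is easy to work with in code. The sketch below is illustrative only: the helper names `parse_version`, `is_major_release`, and `runtime_supports` are hypothetical, and it assumes version strings have at most three dot-separated numeric fields. The one real API it relies on is Python's `unicodedata.unidata_version`, which reports the Unicode version of the character data bundled with the interpreter.

```python
import unicodedata

def parse_version(text):
    """Convert '15.1' or '15.1.0' into a (major, minor, update) tuple."""
    fields = [int(part) for part in text.split(".")]
    fields += [0] * (3 - len(fields))  # pad missing fields with zeros
    return tuple(fields)

def is_major_release(text):
    """True when the minor and update fields are both zero, e.g. '16.0'."""
    _, minor, update = parse_version(text)
    return minor == 0 and update == 0

def runtime_supports(minimum):
    """True when this interpreter's character data is at least `minimum`."""
    return parse_version(unicodedata.unidata_version) >= parse_version(minimum)

print(unicodedata.unidata_version)  # depends on the Python build
print(is_major_release("16.0"))     # True
print(is_major_release("15.1"))     # False
```

A check such as `runtime_supports("15.1")` can be useful before relying on characters or properties introduced in a particular release.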

Each release includes:

  • Character additions -- new code point assignments
  • Script additions -- entirely new writing systems
  • Property updates -- changes to character properties (category, bidirectional class, etc.)
  • Algorithm updates -- revisions to normalization, collation, segmentation, and other algorithms
  • Emoji additions -- new emoji characters and sequences (since Unicode 6.0)
  • Stability guarantees -- once a code point is assigned, its core identity never changes

The full release cycle typically takes 12 months, with beta review periods allowing implementers to prepare.
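Several of the items listed above (code point assignments, general category, bidirectional class) can be inspected directly. The following is a minimal sketch using Python's standard `unicodedata` module. The `describe` helper is a hypothetical name, and the results reflect whichever Unicode version the interpreter ships with.

```python
import unicodedata

def describe(char):
    """Collect a few core properties of a single character."""
    return {
        "code_point": f"U+{ord(char):04X}",
        "name": unicodedata.name(char, "<unnamed>"),
        "category": unicodedata.category(char),        # e.g. Lu, Sc, Lo
        "bidi_class": unicodedata.bidirectional(char),  # e.g. L, R, ET
    }

# Latin capital A, the Euro Sign, and Hebrew Alef
for ch in ("A", "\u20ac", "\u05d0"):
    print(describe(ch))
```

Because of the stability guarantees, the code point and name reported for an assigned character should be the same on any later version of the data.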

The Early Years: Establishing the Foundation (1991 -- 1999)

Unicode 1.0 -- October 1991

Metric Value
Total Characters 7,161
Scripts 24
Planes Used 1 (BMP only)

The first release of the Unicode Standard, whose first volume was published in October 1991. It covered the major alphabetic and syllabic scripts of the modern world: Latin, Greek, Cyrillic, Arabic, Hebrew, Devanagari and other Indic scripts, Thai, and Lao. The unified Han (CJK) ideographs were not yet part of this release; they arrived with the second volume in 1992.

The original design assumed that 16 bits (65,536 code points) would be sufficient for all of the world's characters -- an assumption that would be revisited within five years.

Unicode 1.0.1 -- June 1992

Metric Value
Total Characters 28,359
Change +21,198 characters

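A major update, released alongside the second volume of the printed standard, that added the initial set of 20,902 CJK Unified Ideographs, which accounts for nearly all of the increase, along with CJK Compatibility Ideographs. This was considered part of Unicode 1.0 in practice.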

Unicode 1.1 -- June 1993

Metric Value
Total Characters 34,168
Change +5,809 characters

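Synchronized with ISO/IEC 10646-1:1993, completing the merger between the Unicode and ISO character set efforts. Added 4,306 further Hangul syllables. The Tibetan block from Unicode 1.0 was removed in this version and re-encoded in a different form in Unicode 2.0.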

Unicode 2.0 -- July 1996

Metric Value
Total Characters 38,950
Change +4,782 characters
Major Change Surrogate mechanism; code space expanded to 17 planes (1,114,112 code points)

A pivotal release. Recognized that 65,536 code points were not enough and introduced the surrogate pair mechanism for UTF-16, expanding the addressable space to over one million code points. Also completely replaced the original Hangul block with 11,172 algorithmically composed Hangul syllables -- one of the very few times Unicode broke backward compatibility.
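The figure of 11,172 Hangul syllables follows from the arithmetic layout of the block: 19 leading consonants × 21 vowels × 28 trailing options (including "no trailing consonant"), starting at U+AC00. The sketch below illustrates that arithmetic. The function name `compose_hangul` and the index-based interface are assumptions made for this example; the standard library's NFC normalization is used only as a cross-check.

```python
import unicodedata

S_BASE = 0xAC00                           # first precomposed syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28    # leads, vowels, tails (0 = none)

def compose_hangul(lead, vowel, tail=0):
    """Map zero-based jamo indices to a precomposed Hangul syllable."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= tail < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)

print(L_COUNT * V_COUNT * T_COUNT)   # 11172
print(compose_hangul(18, 0, 4))      # 한 (U+D55C)

# Cross-check: NFC composes the three conjoining jamo to the same syllable.
print(unicodedata.normalize("NFC", "\u1112\u1161\u11ab") == compose_hangul(18, 0, 4))
```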

UTF-16 was formalized as the primary encoding form (replacing the original UCS-2).
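The surrogate mechanism can be sketched in a few lines. A code point above U+FFFF has 0x10000 subtracted from it, and the remaining 20 bits are split across a high surrogate (U+D800 to U+DBFF) and a low surrogate (U+DC00 to U+DFFF). The function names below are hypothetical, and the example is meant as an illustration of the arithmetic rather than a replacement for a real UTF-16 codec.

```python
def to_surrogates(code_point):
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

def from_surrogates(high, low):
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")             # D83D DE00
print(hex(from_surrogates(high, low)))     # 0x1f600

# 17 planes of 65,536 code points each give the full code space.
print(17 * 0x10000)                        # 1114112
```

The result agrees with Python's built-in codec: `"\U0001F600".encode("utf-16-be")` yields the bytes `D8 3D DE 00`.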

Unicode 2.1 -- May 1998

Metric Value
Total Characters 38,952
Change +2 characters

A minimal update that added the Euro Sign (U+20AC) -- one of the most requested characters of its era, needed for the launch of the European single currency in 1999. Also added Object Replacement Character (U+FFFC).

Unicode 3.0 -- September 1999

Metric Value
Total Characters 49,259
Change +10,307 characters
New Scripts Braille, Canadian Aboriginal Syllabics, Cherokee, Ethiopic, Khmer, Mongolian, Myanmar, Ogham, Runic, Sinhala, Syriac, Thaana, Yi

A major expansion focused on living scripts that had been missing from Unicode. Brought comprehensive coverage of South and Southeast Asian scripts. Also introduced the Unicode Character Database (UCD) as a machine-readable resource, which became the standard reference for implementers.

The Supplementary Planes Open Up (2001 -- 2005)

Unicode 3.1 -- March 2001

Metric Value
Total Characters 94,205
Change +44,946 characters
New Scripts Deseret, Gothic, Old Italic
Major Addition CJK Unified Ideographs Extension B (42,711 characters)
Milestone First characters assigned outside the BMP

A landmark release that finally used the supplementary planes made possible by Unicode 2.0. CJK Extension B alone added nearly 43,000 rare and historic ideographs to Plane 2 (the Supplementary Ideographic Plane). Plane 1 (the Supplementary Multilingual Plane) received its first historic scripts and musical symbols.
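Since each plane holds 65,536 (0x10000) code points, the plane of any code point is simply its value shifted right by 16 bits. A minimal sketch follows; the function name and the abbreviated plane labels are choices made for this example.

```python
PLANE_NAMES = {
    0: "Basic Multilingual Plane (BMP)",
    1: "Supplementary Multilingual Plane (SMP)",
    2: "Supplementary Ideographic Plane (SIP)",
}

def plane_of(code_point):
    """Return the plane number (0-16) that contains a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return code_point >> 16

for cp in (0x20AC, 0x10400, 0x20000):   # Euro Sign, Deseret, CJK Extension B
    plane = plane_of(cp)
    print(f"U+{cp:04X} -> plane {plane}: {PLANE_NAMES.get(plane, 'other')}")
```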

Unicode 3.2 -- March 2002

Metric Value
Total Characters 95,221
Change +1,016 characters
New Scripts Buhid, Hanunoo, Tagalog, Tagbanwa

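Added four Philippine scripts and introduced several important format characters, including WORD JOINER (U+2060). The new character took over the zero-width no-break function of U+FEFF ZERO WIDTH NO-BREAK SPACE, leaving U+FEFF, which had been in the standard since 1.0, to serve primarily as the Byte Order Mark.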
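The role of U+FEFF as a byte order mark is easiest to see in its encoded bytes: the same character serializes differently in each encoding form, which is what lets a reader detect the encoding and byte order. The sketch below uses the BOM constants from Python's standard `codecs` module. The `sniff_bom` helper is a hypothetical name, and real-world detection usually needs more than a BOM check.

```python
import codecs

# UTF-32 is checked first because its little-endian BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 little-endian BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return an encoding name if `data` starts with a known BOM, else None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

print("\ufeff".encode("utf-16-be").hex())   # feff
print("\ufeff".encode("utf-16-le").hex())   # fffe
print("\ufeff".encode("utf-8").hex())       # efbbbf
print(sniff_bom("\ufeffhello".encode("utf-16-le")))  # utf-16-le
```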

Unicode 4.0 -- April 2003

Metric Value
Total Characters 96,447
Change +1,226 characters
New Scripts Cypriot, Limbu, Linear B, Osmanya, Shavian, Tai Le, Ugaritic

Continued the push into ancient and historic scripts with Linear B (the oldest deciphered Greek writing system, dating to ~1450 BCE) and Cypriot (an ancient syllabary from Cyprus). Also added Osmanya, a script invented in the 1920s for the Somali language.

Unicode 4.1 -- March 2005

Metric Value
Total Characters 97,720
Change +1,273 characters
New Scripts Buginese, Coptic (separated from Greek), Glagolitic, Kharoshthi, New Tai Lue, Old Persian, Syloti Nagri, Tifinagh

Notable for separating Coptic from the Greek script block (previously they shared code points) and adding Old Persian cuneiform, the writing system of the Achaemenid Empire.

Global Coverage Accelerates (2006 -- 2012)

Unicode 5.0 -- July 2006

Metric Value
Total Characters 99,089
Change +1,369 characters
New Scripts Balinese, Cuneiform, N'Ko, Phags-pa, Phoenician

Crossed the 99,000 character threshold. Added Sumerian/Akkadian Cuneiform (one of the oldest writing systems, ~3400 BCE) and N'Ko (a modern script for Manding languages in West Africa, invented in 1949).

Unicode 5.1 -- April 2008

Metric Value
Total Characters 100,713
Change +1,624 characters
New Scripts Carian, Cham, Kayah Li, Lepcha, Lycian, Lydian, Ol Chiki, Rejang, Saurashtra, Sundanese, Vai
Milestone Passed 100,000 characters

Broke the 100,000-character barrier. Added eleven new scripts in a single release, including several ancient Anatolian scripts (Carian, Lycian, Lydian) and minority scripts from South and Southeast Asia.

Unicode 5.2 -- October 2009

Metric Value
Total Characters 107,361
Change +6,648 characters
New Scripts Avestan, Bamum, Egyptian Hieroglyphs, Imperial Aramaic, Inscriptional Pahlavi, Inscriptional Parthian, Javanese, Kaithi, Lisu, Meetei Mayek, Old South Arabian, Old Turkic, Samaritan, Tai Tham, Tai Viet

A massive release with 15 new scripts. The standout addition was Egyptian Hieroglyphs (1,071 characters) -- encoding a 5,000-year-old writing system was a powerful symbol of Unicode's commitment to comprehensive coverage. Also added several Central Asian historic scripts and additional Southeast Asian scripts.

Unicode 6.0 -- October 2010

Metric Value
Total Characters 109,449
Change +2,088 characters
New Scripts Mandaic, Batak, Brahmi
Major Addition 722 emoji characters
New Symbol Indian Rupee Sign (U+20B9)

The release that changed everything. Officially added emoji to the Unicode Standard, encoding 722 characters previously used on Japanese mobile phones. This brought Unicode into mainstream public awareness and triggered a cultural phenomenon. Also added the Indian Rupee Sign, requested by the Indian government after they adopted a new currency symbol in 2010.

Unicode 6.1 -- January 2012

Metric Value
Total Characters 110,181
Change +732 characters
New Scripts Chakma, Meroitic Cursive, Meroitic Hieroglyphs, Miao, Sharada, Sora Sompeng, Takri

Added scripts from the ancient Sudanese kingdom of Meroe and several South Asian scripts.

The Emoji Explosion (2012 -- 2020)

Unicode 6.2 -- September 2012

Metric Value
Total Characters 110,182
Change +1 character

One of the smallest releases in Unicode history: its only new character was the Turkish Lira Sign (U+20BA).

Unicode 6.3 -- September 2013

Metric Value
Total Characters 110,187
Change +5 characters

A maintenance release that added five bidirectional formatting characters (ARABIC LETTER MARK, U+061C, and the four directional isolate controls, U+2066 through U+2069) and updated character properties for improved Arabic and Hebrew text processing.

Unicode 7.0 -- June 2014

Metric Value
Total Characters 113,021
Change +2,834 characters
New Scripts Bassa Vah, Caucasian Albanian, Duployan, Elbasan, Grantha, Khojki, Khudawadi, Linear A, Mahajani, Manichaean, Mende Kikakui, Modi, Mro, Nabataean, Old North Arabian, Old Permic, Pahawh Hmong, Palmyrene, Pau Cin Hau, Psalter Pahlavi, Siddham, Tirhuta, Warang Citi
New Emoji ~250 new emoji

A major release with 23 new scripts -- the all-time record for a single version. Added the Russian Ruble Sign (U+20BD) and a large batch of new emoji. Also included Linear A, the undeciphered writing system of the Minoan civilization.

Unicode 8.0 -- June 2015

Metric Value
Total Characters 120,737
Change +7,716 characters
New Scripts Ahom, Anatolian Hieroglyphs, Hatran, Multani, Old Hungarian, SignWriting
Major Addition Emoji skin tone modifiers (Fitzpatrick scale)
Major Addition CJK Unified Ideographs Extension E (5,762 characters)
New Symbol Lari Sign (Georgian currency, U+20BE)

Introduced emoji skin tone modifiers, allowing five skin tone variants for human-form emoji. This was a significant step toward emoji diversity and representation. Also added a large CJK extension and Anatolian Hieroglyphs.
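Mechanically, a skin tone variant is a two-code-point sequence: a base emoji followed by one of the five modifier characters U+1F3FB through U+1F3FF. The sketch below builds such sequences. The helper name `with_skin_tone` is hypothetical, and whether the result renders as a single glyph depends on the font and platform.

```python
import unicodedata

# The five emoji modifiers, lightest to darkest.
MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]

def with_skin_tone(base, tone_index):
    """Append a skin tone modifier (index 0-4) to a base emoji."""
    return base + MODIFIERS[tone_index]

thumbs_up = "\U0001F44D"
for index, modifier in enumerate(MODIFIERS):
    sequence = with_skin_tone(thumbs_up, index)
    code_points = " ".join(f"U+{ord(c):04X}" for c in sequence)
    print(code_points, unicodedata.name(modifier))
```

Each variant is still two code points in storage, so `len(with_skin_tone(thumbs_up, 0))` is 2 even when it displays as one emoji.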

Unicode 9.0 -- June 2016

Metric Value
Total Characters 128,172
Change +7,500 characters
New Scripts Adlam, Bhaiksuki, Marchen, Newa, Osage, Tangut
New Emoji 72 new emoji

Added Adlam, a script devised in the late 1980s by two young brothers in Guinea for the Fulani language -- one of the youngest scripts ever encoded. Also added Tangut, a complex logographic script from medieval China with more than 6,000 characters. The Bitcoin Sign was proposed but not added in this version; it arrived in Unicode 10.0.

Unicode 10.0 -- June 2017

Metric Value
Total Characters 136,755
Change +8,518 characters
New Scripts Zanabazar Square, Soyombo, Nushu, Masaram Gondi
Major Addition CJK Unified Ideographs Extension F (7,473 characters)
New Emoji 56 new emoji
New Symbol Bitcoin Sign (U+20BF)

Added the Bitcoin Sign (U+20BF), reflecting the growing importance of cryptocurrency. Also encoded Nushu, a writing system historically used exclusively by women in Hunan province, China, and often described as the only known gender-specific script.

Unicode 11.0 -- June 2018

Metric Value
Total Characters 137,439
Change +684 characters
New Scripts Dogra, Gunjala Gondi, Hanifi Rohingya, Makasar, Medefaidrin, Old Sogdian, Sogdian
New Emoji 157 new emoji
New Symbol Copyleft Sign (U+1F12F)

Added Hanifi Rohingya, giving digital representation to the Rohingya people of Myanmar. Also introduced many new emoji including superheroes, redheads, and additional skin tone combinations.

Unicode 12.0 -- March 2019

Metric Value
Total Characters 137,993
Change +554 characters
New Scripts Elymaic, Nandinagari, Nyiakeng Puachue Hmong, Wancho
New Emoji 61 new emoji

Continued expanding coverage of minority scripts. Added the Tamil supplement with historic Tamil characters and fractions.

Unicode 12.1 -- May 2019

Metric Value
Total Characters 137,994
Change +1 character

Added a single character: the Japanese square era name for Reiwa (U+32FF), needed for the new Japanese imperial era that began on May 1, 2019. The fastest turnaround in Unicode history, driven by the urgent need for the character before the era transition.

Unicode 13.0 -- March 2020

Metric Value
Total Characters 143,859
Change +5,930 characters
New Scripts Chorasmian, Dives Akuru, Khitan Small Script, Yezidi
Major Addition CJK Unified Ideographs Extension G (4,939 characters)
New Emoji 55 new emoji

Added the Khitan Small Script, a partially deciphered writing system from the Khitan Empire (907--1125 CE), and another large CJK extension.

Modern Releases (2021 -- 2024)

Unicode 14.0 -- September 2021

Metric Value
Total Characters 144,697
Change +838 characters
New Scripts Cypro-Minoan, Old Uyghur, Tangsa, Toto, Vithkuqi
New Emoji 37 new emoji
New Symbol Som Sign (Kyrgyz currency, U+20C0)

Added Cypro-Minoan, an undeciphered Bronze Age script from Cyprus, and Vithkuqi, a script created for the Albanian language in the first half of the 1800s. The release was delayed from its then-usual March schedule to September due to the COVID-19 pandemic's impact on the editorial process.

Unicode 15.0 -- September 2022

Metric Value
Total Characters 149,186
Change +4,489 characters
New Scripts Kawi, Nag Mundari
Major Addition CJK Unified Ideographs Extension H (4,192 characters)
New Emoji 31 new emoji (including shaking face, pink heart, moose, and others)

Added Kawi, the historic script used in inscriptions across Southeast Asia (Indonesia, Malaysia, Philippines), connecting modern scripts like Javanese and Balinese to their ancestor. Also added Nag Mundari, a script created in the 1960s for the Mundari language of eastern India.

Unicode 15.1 -- September 2023

Metric Value
Total Characters 149,813
Change +627 characters
New Emoji 0 new emoji characters (118 new emoji defined as sequences)

A maintenance release focused on CJK ideograph additions (the 622 characters of CJK Unified Ideographs Extension I) and property updates. No new emoji code points were added; the 118 new emoji in the accompanying Emoji 15.1 were all defined as sequences of existing characters.

Unicode 16.0 -- September 2024

Metric Value
Total Characters 154,998
Change +5,185 characters
New Scripts Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar, Todhri, Tulu-Tigalari
Major Addition Egyptian Hieroglyphs Extended-A (3,995 characters)
New Emoji 7 new emoji

This release added seven new scripts, with a focus on South Asian and West African writing systems. Tulu-Tigalari is a historic script from southern India. Garay is used for the Wolof language in West Africa. The massive Egyptian Hieroglyphs Extended-A block more than quadrupled the number of encoded hieroglyphs.

Growth at a Glance

Year Version Total Characters Scripts (Cumulative)
1991 1.0 7,161 24
1996 2.0 38,950 25
1999 3.0 49,259 38
2001 3.1 94,205 41
2006 5.0 99,089 64
2008 5.1 100,713 75
2010 6.0 109,449 93
2014 7.0 113,021 123
2016 9.0 128,172 135
2020 13.0 143,859 154
2022 15.0 149,186 161
2024 16.0 154,998 168

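In 33 years, Unicode has grown from 7,161 characters in 24 scripts to nearly 155,000 characters in 168 scripts -- a more than 21-fold increase in character count and a 7x increase in script coverage. Published totals vary slightly with the counting convention (for example, whether the 65 control codes are included), so the difference between two versions' totals in this timeline does not always equal the stated change exactly.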
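The growth figures can be reproduced from the first and last rows of the table. This is a small arithmetic sketch using only the numbers quoted above; the variable names are illustrative.

```python
first = {"year": 1991, "characters": 7_161, "scripts": 24}
latest = {"year": 2024, "characters": 154_998, "scripts": 168}

years = latest["year"] - first["year"]
char_ratio = latest["characters"] / first["characters"]
script_ratio = latest["scripts"] / first["scripts"]
avg_per_year = (latest["characters"] - first["characters"]) / years

print(years)                                   # 33
print(f"{char_ratio:.1f}x characters")         # 21.6x characters
print(f"{script_ratio:.1f}x scripts")          # 7.0x scripts
print(f"{avg_per_year:,.0f} characters per year on average")
```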

Patterns and Observations

The CJK Expansions

CJK Unified Ideographs are the single largest contributor to Unicode's character count. The extensions alone account for tens of thousands of characters:

Extension Version Characters Plane
Original CJK 1.0.1 20,902 BMP (Plane 0)
Extension A 3.0 6,582 BMP (Plane 0)
Extension B 3.1 42,711 Plane 2
Extension C 5.2 4,149 Plane 2
Extension D 6.0 222 Plane 2
Extension E 8.0 5,762 Plane 2
Extension F 10.0 7,473 Plane 2
Extension G 13.0 4,939 Plane 3
Extension H 15.0 4,192 Plane 3
Extension I 15.1 622 Plane 2
Total ~97,554

CJK ideographs alone account for roughly 63% of all assigned Unicode characters.
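The total and the percentage can be checked by summing the block sizes from the table and dividing by the Unicode 16.0 character count. The sketch below uses only figures quoted in this article; the dictionary and variable names are illustrative.

```python
CJK_BLOCKS = {
    "Original CJK": 20_902,
    "Extension A": 6_582,
    "Extension B": 42_711,
    "Extension C": 4_149,
    "Extension D": 222,
    "Extension E": 5_762,
    "Extension F": 7_473,
    "Extension G": 4_939,
    "Extension H": 4_192,
    "Extension I": 622,
}
TOTAL_ASSIGNED = 154_998  # Unicode 16.0 total quoted in this article

cjk_total = sum(CJK_BLOCKS.values())
share = cjk_total / TOTAL_ASSIGNED

print(cjk_total)         # 97554
print(f"{share:.1%}")    # 62.9%
```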

The Emoji Timeline

Version Year Emoji Added Cumulative Emoji
6.0 2010 722 722
7.0 2014 ~250 ~1,000
8.0 2015 41 + skin tones ~1,100
9.0 2016 72 ~1,200
10.0 2017 56 ~1,300
11.0 2018 157 ~1,500
12.0/12.1 2019 61 ~1,600
13.0 2020 55 ~1,700
14.0 2021 37 ~1,750
15.0 2022 31 ~1,800
16.0 2024 7 ~1,800

The number of new emoji per release has been declining since 2018, reflecting both a maturing repertoire and deliberate restraint by the Emoji Subcommittee.

Encoding the Undeciphered

Unicode has encoded several writing systems that have not yet been fully deciphered:

Script Version Era Status
Linear A 7.0 (2014) Minoan, ~1800 BCE Undeciphered
Cypro-Minoan 14.0 (2021) Bronze Age Cyprus, ~1550 BCE Undeciphered
Proto-Elamite Not yet encoded ~3100 BCE Undeciphered

Encoding undeciphered scripts is controversial but enables digital scholarship and corpus analysis that may eventually lead to decipherment.

Key Takeaways

  • Unicode has been released in nearly 30 versions since 1991; since 2014, major releases have arrived roughly once per year.
  • The standard grew from 7,161 characters in 24 scripts (1991) to 154,998 characters in 168 scripts (2024).
  • Unicode 2.0 (1996) was the most architecturally significant release, expanding the code space from 65,536 to over 1.1 million code points.
  • Unicode 6.0 (2010) was the most culturally significant release, adding emoji and transforming public awareness of the standard.
  • CJK ideographs account for roughly 63% of all encoded characters, spread across the original block and nine extensions.
  • The pace of new emoji additions has slowed, while script encoding continues at a steady pace of a few scripts per release as more communities advocate for digital representation of their writing systems.
  • Unicode encodes scripts spanning from ~3400 BCE (Sumerian Cuneiform) to scripts devised in recent decades (such as Adlam, created in the late 1980s), covering over 5,000 years of human writing.
