Unicode Versions Timeline

Unicode has released major versions regularly since 1.0 in 1991, with each release adding thousands of new characters, emoji, and scripts from around the world. This timeline covers every Unicode version through 16.0, its key additions, and how the standard has grown to cover more than 150,000 characters.

Published 2022-01-10 · Updated 2025-11-18

Every version of the Unicode Standard has expanded the range of human writing that computers can represent. From the modest 7,161 characters of Unicode 1.0 in 1991 to the 154,998 characters of Unicode 16.0 in 2024, each release tells a story about which communities gained digital representation and which technical problems were solved. This guide provides a comprehensive chronological reference of every major Unicode release.

How Unicode Versioning Works

The Unicode Consortium publishes the standard using a major.minor versioning scheme. Major versions (e.g., 15.0, 16.0) typically add new scripts, characters, and emoji. Minor versions (e.g., 15.1) are maintenance releases that add smaller batches of characters or fix errata without introducing major new features.

Each release includes:

Character additions -- new code point assignments
Script additions -- entirely new writing systems
Property updates -- changes to character properties (category, bidirectional class, etc.)
Algorithm updates -- revisions to normalization, collation, segmentation, and other algorithms
Emoji additions -- new emoji characters and sequences (since Unicode 6.0)
Stability guarantees -- once a code point is assigned, its core identity never changes

The full release cycle typically takes 12 months, with beta review periods allowing implementers to prepare.
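As a rough illustration of why version numbers matter to implementers, the sketch below checks which version of the Unicode Character Database a Python interpreter ships with. `unicodedata.unidata_version` is part of the standard library; the helper names `parse_unicode_version` and `supports` are hypothetical and exist only for this example.

```python
import unicodedata


def parse_unicode_version(version: str) -> tuple:
    """Turn a dotted version such as '15.1' into a (major, minor, update) tuple."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad so '15.1' compares equal to '15.1.0'
    return tuple(parts[:3])


def supports(required: str, available: str = unicodedata.unidata_version) -> bool:
    """Return True if the available UCD version is at least the required one."""
    return parse_unicode_version(available) >= parse_unicode_version(required)


if __name__ == "__main__":
    print("UCD version in this interpreter:", unicodedata.unidata_version)
    print("Knows Unicode 6.0 characters:", supports("6.0"))
    print("Knows Unicode 16.0 characters:", supports("16.0"))
```

The result depends on the interpreter: an older runtime simply does not know the names or properties of characters assigned in later versions.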

The Early Years: Establishing the Foundation (1991 -- 1999)

Unicode 1.0 -- October 1991

Metric	Value
Total Characters	7,161
Scripts	24
Planes Used	1 (BMP only)

The first release of the Unicode Standard, published as the first volume of a two-volume printed book. It covered the major alphabetic and syllabic scripts of the modern world: Latin, Greek, Cyrillic, Arabic, Hebrew, Devanagari and other Indic scripts, Thai, and Lao. The large CJK Unified Ideographs block is not part of the 7,161-character total; it arrived with the second volume in 1992.

The original design assumed that 16 bits (65,536 code points) would be sufficient for all of the world's characters -- an assumption that would be revisited within five years.

Unicode 1.0.1 -- June 1992

Metric	Value
Total Characters	28,359
Change	+21,198 characters

A major update that added the initial set of 20,902 CJK Unified Ideographs, which accounts for nearly all of the increase. In practice it was treated as part of Unicode 1.0.

Unicode 1.1 -- June 1993

Metric	Value
Total Characters	34,168
Change	+5,809 characters

Synchronized with ISO/IEC 10646-1:1993, completing the merger between the Unicode and ISO character set efforts. Added 4,306 more Hangul syllables and removed the original Tibetan block, which returned in redesigned form in Unicode 2.0.

Unicode 2.0 -- July 1996

Metric	Value
Total Characters	38,950
Change	+4,782 characters
Major Change	Surrogate mechanism; code space expanded to 17 planes (1,114,112 code points)

A pivotal release. Recognized that 65,536 code points were not enough and introduced the surrogate pair mechanism for UTF-16, expanding the addressable space to over one million code points. Also completely replaced the original Hangul block with 11,172 algorithmically composed Hangul syllables -- the only time Unicode broke backward compatibility.
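The figure of 11,172 follows from the block's layout: every combination of 19 leading consonants, 21 vowels, and 28 trailing consonants (counting "no trailing consonant" as one option) has its own code point, starting at U+AC00. A minimal sketch of that arithmetic, with function names chosen for this example only:

```python
S_BASE = 0xAC00   # first precomposed Hangul syllable
L_COUNT = 19      # leading consonants
V_COUNT = 21      # vowels
T_COUNT = 28      # trailing consonants, including "none" at index 0
S_COUNT = L_COUNT * V_COUNT * T_COUNT  # 11,172 syllables in total


def compose_hangul(lead: int, vowel: int, trail: int = 0) -> str:
    """Map (lead, vowel, trail) indices to the precomposed syllable."""
    if not (0 <= lead < L_COUNT and 0 <= vowel < V_COUNT and 0 <= trail < T_COUNT):
        raise ValueError("index out of range")
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + trail)


def decompose_hangul(syllable: str) -> tuple:
    """Recover the (lead, vowel, trail) indices from a precomposed syllable."""
    s_index = ord(syllable) - S_BASE
    if not 0 <= s_index < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    lead, rest = divmod(s_index, V_COUNT * T_COUNT)
    vowel, trail = divmod(rest, T_COUNT)
    return lead, vowel, trail


if __name__ == "__main__":
    print(S_COUNT)                           # 11172
    print(hex(ord(compose_hangul(18, 0, 4))))  # 0xd55c
```

Because the mapping is pure arithmetic, implementations do not need a lookup table to compose or decompose these syllables.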

UTF-16 was formalized as the primary encoding form (replacing the original UCS-2).
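To make the surrogate mechanism concrete, the following sketch splits a supplementary code point into a high and low surrogate and reassembles it. The constants are those of UTF-16; the function names are illustrative, not taken from any library.

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536
CODE_SPACE = PLANES * CODE_POINTS_PER_PLANE  # 1,114,112


def to_surrogates(code_point: int) -> tuple:
    """Split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogates(0x1F600)
    print(f"U+1F600 -> {high:04X} {low:04X}")  # U+1F600 -> D83D DE00
    print(CODE_SPACE)                          # 1114112
```

Two 10-bit halves give 1,048,576 supplementary code points, which together with the 65,536 of the BMP make up the 17 planes.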

Unicode 2.1 -- May 1998

Metric	Value
Total Characters	38,952
Change	+2 characters

A minimal update that added the Euro Sign (U+20AC) -- one of the most requested characters of its era, needed for the launch of the European single currency in 1999. Also added Object Replacement Character (U+FFFC).

Unicode 3.0 -- September 1999

Metric	Value
Total Characters	49,259
Change	+10,307 characters
New Scripts	Braille, Cherokee, Ethiopic, Khmer, Mongolian, Myanmar, Ogham, Runic, Sinhala, Syriac, Thaana, Canadian Aboriginal Syllabics, Yi

A major expansion focused on living scripts that had been missing from Unicode. Brought comprehensive coverage of South and Southeast Asian scripts. Also introduced the Unicode Character Database (UCD) as a machine-readable resource, which became the standard reference for implementers.

The Supplementary Planes Open Up (2001 -- 2005)

Unicode 3.1 -- March 2001

Metric	Value
Total Characters	94,205
Change	+44,946 characters
New Scripts	Deseret, Gothic, Old Italic
Major Addition	CJK Unified Ideographs Extension B (42,711 characters)
Milestone	First characters assigned outside the BMP

A landmark release that finally used the supplementary planes made possible by Unicode 2.0. CJK Extension B alone added nearly 43,000 rare and historic ideographs to Plane 2 (the Supplementary Ideographic Plane). Plane 1 (the Supplementary Multilingual Plane) received its first historic scripts and musical symbols.
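The plane of any code point is simply its value divided by 65,536. A small sketch (the helper name `plane_of` is made up for this example) shows where a few characters mentioned in this timeline live:

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) that contains the code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return code_point >> 16  # each plane spans 0x10000 code points


if __name__ == "__main__":
    samples = {
        "Euro Sign": 0x20AC,                 # BMP
        "first Deseret letter": 0x10400,     # Supplementary Multilingual Plane
        "first Extension B ideograph": 0x20000,  # Supplementary Ideographic Plane
    }
    for label, cp in samples.items():
        print(f"U+{cp:04X} ({label}): plane {plane_of(cp)}")
```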

Unicode 3.2 -- March 2002

Metric	Value
Total Characters	95,221
Change	+1,016 characters
New Scripts	Buhid, Hanunoo, Tagalog, Tagbanwa

Added four Philippine scripts and several important format characters, including WORD JOINER (U+2060). The word joiner took over the zero-width no-break function of U+FEFF (ZERO WIDTH NO-BREAK SPACE), leaving U+FEFF to serve solely as the Byte Order Mark.

Unicode 4.0 -- April 2003

Metric	Value
Total Characters	96,447
Change	+1,226 characters
New Scripts	Cypriot, Limbu, Linear B, Osmanya, Shavian, Tai Le, Ugaritic

Continued the push into ancient and historic scripts with Linear B (the oldest deciphered Greek writing system, dating to ~1450 BCE) and Cypriot (an ancient syllabary from Cyprus). Also added Osmanya, a script invented in the 1920s for the Somali language.

Unicode 4.1 -- March 2005

Metric	Value
Total Characters	97,720
Change	+1,273 characters
New Scripts	Buginese, Coptic (separated from Greek), Glagolitic, Kharoshthi, New Tai Lue, Old Persian, Syloti Nagri, Tifinagh

Notable for separating Coptic from the Greek script block (previously they shared code points) and adding Old Persian cuneiform, the writing system of the Achaemenid Empire.

Global Coverage Accelerates (2006 -- 2012)

Unicode 5.0 -- July 2006

Metric	Value
Total Characters	99,089
Change	+1,369 characters
New Scripts	Balinese, Cuneiform, N'Ko, Phags-pa, Phoenician

Crossed the 99,000 character threshold. Added Sumerian/Akkadian Cuneiform (one of the oldest writing systems, ~3400 BCE) and N'Ko (a modern script for Manding languages in West Africa, invented in 1949).

Unicode 5.1 -- April 2008

Metric	Value
Total Characters	100,713
Change	+1,624 characters
New Scripts	Carian, Cham, Kayah Li, Lepcha, Lycian, Lydian, Ol Chiki, Rejang, Saurashtra, Sundanese, Vai
Milestone	Passed 100,000 characters

Broke the 100,000-character barrier. Added eleven new scripts in a single release, including several ancient Anatolian scripts (Carian, Lycian, Lydian) and minority scripts from South and Southeast Asia.

Unicode 5.2 -- October 2009

Metric	Value
Total Characters	107,361
Change	+6,648 characters
New Scripts	Avestan, Bamum, Egyptian Hieroglyphs, Imperial Aramaic, Inscriptional Pahlavi, Inscriptional Parthian, Javanese, Kaithi, Lisu, Meetei Mayek, Old South Arabian, Old Turkic, Samaritan, Tai Tham, Tai Viet

A massive release with 15 new scripts. The standout addition was Egyptian Hieroglyphs (1,071 characters) -- encoding a 5,000-year-old writing system was a powerful symbol of Unicode's commitment to comprehensive coverage. Also added several Central Asian historic scripts and additional Southeast Asian scripts.

Unicode 6.0 -- October 2010

Metric	Value
Total Characters	109,449
Change	+2,088 characters
New Scripts	Mandaic, Batak, Brahmi
Major Addition	722 emoji characters
New Symbol	Indian Rupee Sign (U+20B9)

The release that changed everything. Officially added emoji to the Unicode Standard, encoding 722 characters previously used on Japanese mobile phones. This brought Unicode into mainstream public awareness and triggered a cultural phenomenon. Also added the Indian Rupee Sign, requested by the Indian government after they adopted a new currency symbol in 2010.

Unicode 6.1 -- January 2012

Metric	Value
Total Characters	110,181
Change	+732 characters
New Scripts	Chakma, Meroitic Cursive, Meroitic Hieroglyphs, Miao, Sharada, Sora Sompeng, Takri

Added scripts from the ancient Sudanese kingdom of Meroe and several South Asian scripts.

The Emoji Explosion (2012 -- 2020)

Unicode 6.2 -- September 2012

Metric	Value
Total Characters	110,182
Change	+1 character

One of the smallest releases in Unicode history. Its sole addition was the Turkish Lira Sign (U+20BA).

Unicode 6.3 -- September 2013

Metric	Value
Total Characters	110,187
Change	+5 characters

A maintenance release that added only five characters, all bidirectional formatting controls: the Arabic Letter Mark (U+061C) and the four directional isolate characters (U+2066 to U+2069). It also updated character properties for improved Arabic and Hebrew text processing.

Unicode 7.0 -- June 2014

Metric	Value
Total Characters	113,021
Change	+2,834 characters
New Scripts	Bassa Vah, Caucasian Albanian, Duployan, Elbasan, Grantha, Khojki, Khudawadi, Linear A, Mahajani, Manichaean, Mende Kikakui, Modi, Mro, Nabataean, Old North Arabian, Old Permic, Pahawh Hmong, Palmyrene, Pau Cin Hau, Psalter Pahlavi, Siddham, Tirhuta, Warang Citi
New Emoji	~250 new emoji

A major release with 23 new scripts -- the all-time record for a single version. Added the Russian Ruble Sign (U+20BD) and a large batch of new emoji. Also included Linear A, the undeciphered writing system of the Minoan civilization.

Unicode 8.0 -- June 2015

Metric	Value
Total Characters	120,737
Change	+7,716 characters
New Scripts	Ahom, Anatolian Hieroglyphs, Hatran, Multani, Old Hungarian, SignWriting
Major Addition	Emoji skin tone modifiers (Fitzpatrick scale)
Major Addition	CJK Unified Ideographs Extension E (5,762 characters)
New Symbol	Lari Sign (Georgian currency, U+20BE)

Introduced emoji skin tone modifiers, allowing five skin tone variants for human-form emoji. This was a significant step toward emoji diversity and representation. Also added a large CJK extension and Anatolian Hieroglyphs.
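Mechanically, a skin tone variant is just the base emoji followed by one of the five modifier characters U+1F3FB to U+1F3FF. The sketch below builds the five variants of the waving hand (U+1F44B); the helper name `with_skin_tone` is hypothetical, and whether the pair renders as a single glyph depends on the font and platform.

```python
# The five emoji modifiers occupy U+1F3FB through U+1F3FF.
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]

WAVING_HAND = "\U0001F44B"


def with_skin_tone(base: str, tone_index: int) -> str:
    """Append the modifier at tone_index (0 to 4) to a base emoji."""
    return base + SKIN_TONE_MODIFIERS[tone_index]


if __name__ == "__main__":
    for index in range(len(SKIN_TONE_MODIFIERS)):
        variant = with_skin_tone(WAVING_HAND, index)
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in variant)
        print(variant, code_points)
```

Each variant is two code points long, which is one reason the number of emoji a user sees is larger than the number of emoji characters encoded.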

Unicode 9.0 -- June 2016

Metric	Value
Total Characters	128,237
Change	+7,500 characters
New Scripts	Adlam, Bhaiksuki, Marchen, Newa, Osage, Tangut
New Emoji	72 new emoji

Added Adlam, a script invented in the late 1980s by two teenage brothers in Guinea for the Fulani language, and one of the youngest scripts ever encoded. Also added Tangut, a complex logographic script from medieval China, with more than 6,000 ideographs. The Bitcoin Sign was proposed but not added in this version; it arrived in Unicode 10.0.

Unicode 10.0 -- June 2017

Metric	Value
Total Characters	136,755
Change	+8,518 characters
New Scripts	Zanabazar Square, Soyombo, Nushu, Masaram Gondi
Major Addition	CJK Unified Ideographs Extension F (7,473 characters)
New Emoji	56 new emoji
New Symbol	Bitcoin Sign (U+20BF)

Added the Bitcoin Sign (U+20BF), reflecting the growing importance of cryptocurrency. Also encoded Nushu, a writing system historically used exclusively by women in Hunan province, China -- making it the only known gender-specific script.

Unicode 11.0 -- June 2018

Metric	Value
Total Characters	137,439
Change	+684 characters
New Scripts	Dogra, Gunjala Gondi, Hanifi Rohingya, Makasar, Medefaidrin, Old Sogdian, Sogdian
New Emoji	157 new emoji
New Symbol	Copyleft Sign (U+1F12F)

Added Hanifi Rohingya, giving digital representation to the Rohingya people of Myanmar. Also introduced many new emoji including superheroes, redheads, and additional skin tone combinations.

Unicode 12.0 -- March 2019

Metric	Value
Total Characters	137,993
Change	+554 characters
New Scripts	Elymaic, Nandinagari, Nyiakeng Puachue Hmong, Wancho
New Emoji	61 new emoji

Continued expanding coverage of minority scripts. Added the Tamil supplement with historic Tamil characters and fractions.

Unicode 12.1 -- May 2019

Metric	Value
Total Characters	137,994
Change	+1 character

Added a single character: the Japanese square era name for Reiwa (U+32FF), needed for the new Japanese imperial era that began on May 1, 2019. The fastest turnaround in Unicode history, driven by the urgent need for the character before the era transition.

Unicode 13.0 -- March 2020

Metric	Value
Total Characters	143,859
Change	+5,930 characters
New Scripts	Chorasmian, Dives Akuru, Khitan Small Script, Yezidi
Major Addition	CJK Unified Ideographs Extension G (4,939 characters)
New Emoji	55 new emoji

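Added the Khitan Small Script, a partially deciphered writing system from the Khitan Empire (907--1125 CE), and another large CJK extension. The total from this version onward excludes the 65 control codes that are counted in the earlier totals, which is why it is 65 lower than 137,994 + 5,930.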

Modern Releases (2021 -- 2024)

Unicode 14.0 -- September 2021

Metric	Value
Total Characters	144,697
Change	+838 characters
New Scripts	Cypro-Minoan, Old Uyghur, Tangsa, Toto, Vithkuqi
New Emoji	37 new emoji
New Symbol	Som Sign (Kyrgyz currency, U+20C0)

Added Cypro-Minoan, an undeciphered Bronze Age script from Cyprus, and Vithkuqi, a script created for the Albanian language in the early 1800s. The release was delayed by six months, to September, due to the COVID-19 pandemic's impact on the editorial process.

Unicode 15.0 -- September 2022

Metric	Value
Total Characters	149,186
Change	+4,489 characters
New Scripts	Kawi, Nag Mundari
Major Addition	CJK Unified Ideographs Extension H (4,192 characters)
New Emoji	31 new emoji (including shaking face, pink heart, moose, and others)

Added Kawi, the historic script used in inscriptions across Southeast Asia (Indonesia, Malaysia, Philippines), connecting modern scripts like Javanese and Balinese to their ancestor. Also added Nag Mundari, a script created in the 1960s for the Mundari language of eastern India.

Unicode 15.1 -- September 2023

Metric	Value
Total Characters	149,813
Change	+627 characters
New Emoji	0 (maintenance release)

A maintenance release focused on CJK Unified Ideographs Extension I (622 characters) and property updates. No new emoji characters were encoded; the 118 emoji introduced alongside this version are all sequences built from existing characters.

Unicode 16.0 -- September 2024

Metric	Value
Total Characters	154,998
Change	+5,185 characters
New Scripts	Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar, Todhri, Tulu-Tigalari
Major Addition	Egyptian Hieroglyphs Extended-A (3,995 characters)
New Emoji	7 new emoji

This release added seven new scripts, with a focus on South Asian and West African writing systems. Tulu-Tigalari is a historic script from southern India. Garay is used for the Wolof language in West Africa. The massive Egyptian Hieroglyphs Extended-A block more than quadrupled the number of encoded hieroglyphs.

Growth at a Glance

Year	Version	Total Characters	Scripts (Cumulative)
1991	1.0	7,161	24
1996	2.0	38,950	25
1999	3.0	49,259	38
2001	3.1	94,205	41
2006	5.0	99,089	64
2008	5.1	100,713	75
2010	6.0	109,449	93
2014	7.0	113,021	123
2016	9.0	128,237	135
2020	13.0	143,859	154
2022	15.0	149,186	161
2024	16.0	154,998	168

In 33 years, Unicode has grown from 7,161 characters in 24 scripts to nearly 155,000 characters in 168 scripts -- a 21x increase in character count and a 7x increase in script coverage.
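The ratios quoted above can be checked directly from the first and last rows of the table. A short sketch, using only the figures given in this article:

```python
first = {"year": 1991, "characters": 7_161, "scripts": 24}     # Unicode 1.0
latest = {"year": 2024, "characters": 154_998, "scripts": 168}  # Unicode 16.0

years = latest["year"] - first["year"]
character_ratio = latest["characters"] / first["characters"]
script_ratio = latest["scripts"] / first["scripts"]

if __name__ == "__main__":
    print(f"Span: {years} years")                             # Span: 33 years
    print(f"Character growth: {character_ratio:.1f}x")        # Character growth: 21.6x
    print(f"Script growth: {script_ratio:.1f}x")              # Script growth: 7.0x
```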

Patterns and Observations

The CJK Expansions

CJK Unified Ideographs are the single largest contributor to Unicode's character count. The extensions alone account for tens of thousands of characters:

Extension	Version	Characters	Plane
Original CJK	1.0.1	20,902	BMP (Plane 0)
Extension A	3.0	6,582	BMP (Plane 0)
Extension B	3.1	42,711	Plane 2
Extension C	5.2	4,149	Plane 2
Extension D	6.0	222	Plane 2
Extension E	8.0	5,762	Plane 2
Extension F	10.0	7,473	Plane 2
Extension G	13.0	4,939	Plane 3
Extension H	15.0	4,192	Plane 3
Extension I	15.1	622	Plane 2
Total		97,554

CJK ideographs alone account for roughly 63% of all assigned Unicode characters.
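As a sanity check on the table, the sketch below sums the block sizes listed above and divides by the Unicode 16.0 total of 154,998. It uses only the numbers in this article, so it ignores any ideographs appended to existing blocks in later versions.

```python
CJK_BLOCK_SIZES = {
    "Original CJK": 20_902,
    "Extension A": 6_582,
    "Extension B": 42_711,
    "Extension C": 4_149,
    "Extension D": 222,
    "Extension E": 5_762,
    "Extension F": 7_473,
    "Extension G": 4_939,
    "Extension H": 4_192,
    "Extension I": 622,
}
TOTAL_CHARACTERS_16_0 = 154_998

cjk_total = sum(CJK_BLOCK_SIZES.values())
cjk_share = 100 * cjk_total / TOTAL_CHARACTERS_16_0

if __name__ == "__main__":
    print(f"CJK ideographs in the table: {cjk_total:,}")   # 97,554
    print(f"Share of all characters: {cjk_share:.1f}%")    # 62.9%
```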

The Emoji Timeline

Version	Year	Emoji Added	Cumulative Emoji
6.0	2010	722	722
7.0	2014	~250	~1,000
8.0	2015	41 + skin tones	~1,100
9.0	2016	72	~1,200
10.0	2017	56	~1,300
11.0	2018	157	~1,500
12.0/12.1	2019	61	~1,600
13.0	2020	55	~1,700
14.0	2021	37	~1,750
15.0	2022	31	~1,800
16.0	2024	7	~1,800

The number of new emoji per release has been declining since 2018, reflecting both a maturing repertoire and deliberate restraint by the Emoji Subcommittee.

Encoding the Undeciphered

Unicode has encoded several writing systems that have not yet been fully deciphered:

Script	Version	Era	Status
Linear A	7.0 (2014)	Minoan, ~1800 BCE	Undeciphered
Cypro-Minoan	14.0 (2021)	Bronze Age Cyprus, ~1550 BCE	Undeciphered
Proto-Elamite	Not yet encoded	~3100 BCE	Undeciphered

Encoding undeciphered scripts is controversial but enables digital scholarship and corpus analysis that may eventually lead to decipherment.

Key Takeaways

Unicode has been released in over 30 versions since 1991, with major releases roughly once per year.
The standard grew from 7,161 characters in 24 scripts (1991) to 154,998 characters in 168 scripts (2024).
Unicode 2.0 (1996) was the most architecturally significant release, expanding the code space from 65,536 to over 1.1 million code points.
Unicode 6.0 (2010) was the most culturally significant release, adding emoji and transforming public awareness of the standard.
CJK ideographs account for roughly 63% of all encoded characters, spread across the original block and nine extensions.
The pace of new emoji additions has slowed, while script encoding continues as more communities advocate for digital representation of their writing systems.
Unicode encodes scripts spanning from ~3400 BCE (Sumerian Cuneiform) to the late 20th century (Adlam, invented around 1989), covering over 5,000 years of human writing.
