Emoji Blocks Overview
Emoji in Unicode span multiple blocks across the Supplementary Multilingual Plane, including Emoticons, Miscellaneous Symbols and Pictographs, and Transport and Map Symbols, with sequences using ZWJ and variation selectors. This guide maps out where emoji live in Unicode, how sequences work, and how emoji are updated in each Unicode version.
Emoji have exploded from a single Japanese carrier's quirky addition into a global visual language encoded across multiple Unicode blocks. What started as 176 pictographs from NTT Docomo in 1999 now encompasses thousands of characters spread across several blocks, with new additions ratified annually by the Unicode Consortium. Understanding how emoji are organized in Unicode — and the complex mechanisms that make sequences and modifiers work — is essential for any developer handling modern text.
The Core Emoji Blocks
Emoji are not confined to a single block. They are distributed across several ranges in the Unicode codespace:
| Block | Range | Notable Contents |
|---|---|---|
| Miscellaneous Symbols | U+2600–U+26FF | ☀️ ☁️ ⛄ ♻️ |
| Dingbats | U+2700–U+27BF | ✂️ ✈️ ✉️ |
| Emoticons | U+1F600–U+1F64F | 😀 😂 🥺 🤔 |
| Misc. Symbols and Pictographs | U+1F300–U+1F5FF | 🌍 🎉 🔥 💎 |
| Transport and Map Symbols | U+1F680–U+1F6FF | 🚀 🚗 ✈️ 🛸 |
| Supplemental Symbols | U+1F900–U+1F9FF | 🤯 🦊 🧠 🧲 |
| Symbols and Pictographs Extended-A | U+1FA00–U+1FA6F | 🪄 🪐 🦷 |
Emoticons Block (U+1F600–U+1F64F)
This is the "classic" emoji block in most people's minds — the smiley faces. It begins with 😀 GRINNING FACE at U+1F600 and covers the full range of facial expressions, from joy to sadness, surprise to skepticism.
Notable characters: - 😀 U+1F600 — Grinning Face - 😂 U+1F602 — Face With Tears of Joy (consistently one of the most-used emoji globally) - 🥺 U+1F97A — Pleading Face (added Unicode 11.0, 2018) - 😶🌫️ — Face in Clouds (a ZWJ sequence, not a single code point)
Miscellaneous Symbols and Pictographs (U+1F300–U+1F5FF)
This large block covers nature, weather, food, activities, objects, and more. It is one of the densest and most varied emoji blocks:
- 🌍 U+1F30D — Earth Globe Europe-Africa
- 🎃 U+1F383 — Jack-O-Lantern
- 🔥 U+1F525 — Fire ("lit", trending, hot takes)
- 💬 U+1F4AC — Speech Bubble
- 🖥️ U+1F5A5 — Desktop Computer
Transport and Map Symbols (U+1F680–U+1F6FF)
Originally designed to represent transportation modes and map features, this block also contains many everyday objects:
- 🚀 U+1F680 — Rocket
- 🚗 U+1F697 — Automobile
- 🛸 U+1F6F8 — Flying Saucer (added Unicode 10.0)
- 🛑 U+1F6D1 — Stop Sign
- 🧳 U+1F9F3 — (actually in Supplemental Symbols)
Supplemental Symbols and Pictographs (U+1F900–U+1F9FF)
Added to accommodate the rapidly growing emoji vocabulary, this block covers animals, food, activities, and abstract concepts:
- 🤯 U+1F92F — Exploding Head
- 🦊 U+1F98A — Fox Face
- 🧠 U+1F9E0 — Brain
- 🧲 U+1F9F2 — Magnet
Emoji Presentation vs Text Presentation
Many characters in Unicode can appear either as emoji (colorful, graphical) or as plain text (monochrome, symbolic), depending on context. This is controlled by variation selectors:
- U+FE0F (Variation Selector-16): Forces emoji presentation
- U+FE0E (Variation Selector-15): Forces text presentation
For example: - ☀ U+2600 alone → text presentation (a simple black sun) - ☀️ U+2600 + U+FE0F → emoji presentation (colorful sun)
Developers must be aware that a "single emoji" may actually be two code points, affecting string length calculations.
Skin Tone Modifiers
Unicode 8.0 introduced five skin tone modifiers based on the Fitzpatrick scale:
| Modifier | Code Point | Tone |
|---|---|---|
| 🏻 | U+1F3FB | Light |
| 🏼 | U+1F3FC | Medium-Light |
| 🏽 | U+1F3FD | Medium |
| 🏾 | U+1F3FE | Medium-Dark |
| 🏿 | U+1F3FF | Dark |
These modifiers follow a base emoji (typically a person or hand gesture) to modify skin color. For example, 👋🏽 is U+1F44B followed by U+1F3FD. The string length is 2 code points despite appearing as a single visual unit.
ZWJ Sequences: Building Complex Emoji
Zero Width Joiner (U+200D, ZWJ) is used to create multi-character emoji sequences that render as a single glyph when supported. These sequences enable:
Family emoji: 👨👩👧👦 = Man + ZWJ + Woman + ZWJ + Girl + ZWJ + Boy (7 code points, 1 visual unit)
Profession emoji: 👩💻 = Woman + ZWJ + Laptop (3 code points)
Gender-neutral forms: Some platforms use ZWJ sequences to build gender variants from a base person emoji.
If a platform does not support a ZWJ sequence, it typically renders each component separately — so 👨👩👧 might display as three separate emoji.
Regional Indicator Symbols and Flag Emoji
Country flags are encoded as sequences of two Regional Indicator Symbols from the Enclosed Alphanumeric Supplement block (U+1F1E0–U+1F1FF). Each letter A–Z has a corresponding regional indicator:
- 🇺🇸 = U+1F1FA (Regional Indicator U) + U+1F1F8 (Regional Indicator S)
- 🇯🇵 = U+1F1EF (Regional Indicator J) + U+1F1F5 (Regional Indicator P)
Platforms that do not support flag rendering will display the two letters (US, JP) as fallback.
Emoji Version History and Compatibility
Each Unicode version adds new emoji. Key milestones:
| Unicode Version | Year | Notable Additions |
|---|---|---|
| 6.0 | 2010 | First official emoji standardization |
| 8.0 | 2015 | Skin tone modifiers |
| 11.0 | 2018 | Supervillain, lobster, infinity symbol |
| 13.0 | 2020 | Transgender flag, bubble tea |
| 15.0 | 2022 | Moose, phoenix, lime |
Older operating systems will display replacement boxes (□) for emoji added after the OS was released. This is a persistent cross-platform compatibility challenge.
Developer Considerations
Working with emoji in code requires care:
- String length:
len("😀")returns 1 in Python 3 (code points), but the UTF-16 encoding uses 2 units. In JavaScript,"😀".lengthreturns 2. - Grapheme clusters: A ZWJ family emoji might be 7+ code points but 1 grapheme cluster. Use a grapheme segmentation library for accurate character counts.
- Sorting: Emoji sort by code point value unless you use a locale-aware collator.
- Regular expressions: Match emoji with
\\p{Emoji}in Unicode-aware regex engines. - Database storage: Use UTF-8mb4 in MySQL (not plain UTF-8) to store supplementary plane characters including most emoji.
Lainnya di Block Explorer
The Basic Latin block (U+0000–U+007F) is the first Unicode block and covers …
The Latin-1 Supplement block (U+0080–U+00FF) extends ASCII with accented Latin characters for …
The General Punctuation block (U+2000–U+206F) contains typographic spaces, dashes, quotation marks, and …
The Mathematical Operators block (U+2200–U+22FF) contains 256 symbols covering set theory, logic, …
The Arrows block (U+2190–U+21FF) contains 112 arrow characters including simple directional arrows, …
The Dingbats block (U+2700–U+27BF) was created to encode the Zapf Dingbats typeface …
The Miscellaneous Symbols block (U+2600–U+26FF) is one of Unicode's most eclectic, containing …
The CJK Unified Ideographs block (U+4E00–U+9FFF) is one of the largest Unicode …
The Hangul Syllables block (U+AC00–U+D7A3) contains 11,172 precomposed Korean syllable blocks algorithmically …
The Currency Symbols block (U+20A0–U+20CF) contains dedicated Unicode characters for currencies that …
The Box Drawing block (U+2500–U+257F) and Block Elements block (U+2580–U+259F) provide characters …
The Enclosed Alphanumerics block (U+2460–U+24FF) contains circled numbers, parenthesized numbers and letters, …
The Geometric Shapes block (U+25A0–U+25FF) and related blocks contain squares, circles, triangles, …
The Musical Symbols block (U+1D100–U+1D1FF) is a Supplementary Multilingual Plane block containing …