# UnicodeFYI.com — Full Content Index

> The comprehensive Unicode character reference: 154,998 characters, 336 blocks, 168 scripts, collections, glossary, guides, and developer tools.

## URL Patterns

- /char/U+{hex}/ — Character detail (e.g., /char/U+2713/)
- /block/ — All Unicode blocks
- /block/{slug}/ — Block detail (e.g., /block/dingbats/)
- /script/ — All scripts
- /script/{slug}/ — Script detail (e.g., /script/latin/)
- /collection/ — All collections
- /collection/{slug}/ — Collection detail
- /glossary/ — Unicode glossary
- /glossary/{slug}/ — Term detail
- /guide/ — In-depth guides
- /guide/{slug}/ — Guide detail
- /series/ — Signature series index
- /series/{series_slug}/ — Series overview
- /series/{series_slug}/{chapter_slug}/ — Chapter detail
- /tool/ — Interactive tools
- /search/?q={query} — Character search

## Markdown Endpoints (.md)

Every content page has a `.md` variant that returns plain-text markdown, making the site LLM-friendly and easy to consume programmatically:

- /char/U+{hex}.md — Character detail as markdown
- /glossary/{slug}.md — Glossary term as markdown
- /guide/{slug}.md — Guide as markdown
- /series/{series_slug}/{chapter_slug}.md — Signature series chapter as markdown

Example: https://unicodefyi.com/glossary/utf-8.md

## Feeds

- /feed/rss/ — RSS 2.0 feed (latest 20 guides)
- /feed/atom/ — Atom 1.0 feed (latest 20 guides)

## Glossary Terms

- [ASCII](/glossary/ascii/) — American Standard Code for Information Interchange. 7-bit encoding covering 128
- [UTF-8](/glossary/utf-8/) — Variable-length Unicode encoding using 1–4 bytes per character. The dominant enc
- [UTF-16](/glossary/utf-16/) — Variable-length Unicode encoding using 2 or 4 bytes (1 or 2 code units of 16 bit
- [UTF-32](/glossary/utf-32/) — Fixed-length Unicode encoding using exactly 4 bytes per character. Simple but sp
- [UCS-2](/glossary/ucs-2/) — Obsolete fixed-length 2-byte encoding covering only the BMP (U+0000–U+FFFF). Pre
- [ISO 8859](/glossary/iso-8859/) — Family of 8-bit single-byte encodings for different language groups. ISO 8859-1
- [Windows-1252](/glossary/windows-1252/) — Microsoft's superset of ISO 8859-1, adding smart quotes, em dash, and euro sign
- [Shift JIS](/glossary/shift-jis/) — Japanese character encoding combining single-byte ASCII/JIS Roman with double-by
- [EUC-KR](/glossary/euc-kr/) — Korean character encoding based on KS X 1001, mapping Hangul syllables and Hanja
- [Big5](/glossary/big5/) — Traditional Chinese character encoding used primarily in Taiwan and Hong Kong, e
- [GB2312 / GB18030](/glossary/gb2312/) — Simplified Chinese character encoding family: GB2312 (6,763 characters) evolved
- [EBCDIC](/glossary/ebcdic/) — Extended Binary Coded Decimal Interchange Code. IBM mainframe encoding with non-
- [Character Encoding](/glossary/character-encoding/) — A system that maps characters to byte sequences for digital storage and transmis
- [IANA Charset](/glossary/iana-charset/) — Official registry of character encoding names maintained by IANA, used in HTTP C
- [Byte Order Mark (BOM)](/glossary/byte-order-mark/) — U+FEFF placed at the start of a text stream to indicate byte order and encoding.
- [Unicode](/glossary/unicode/) — Universal character encoding standard assigning a unique number (code point) to
- [Unicode Consortium](/glossary/unicode-consortium/) — Non-profit organization that develops and maintains the Unicode Standard. Member
- [Code Point](/glossary/code-point/) — A numerical value in the Unicode code space (U+0000 to U+10FFFF), written as U+X
- [Code Space](/glossary/code-space/) — The complete range of possible Unicode code points: U+0000 to U+10FFFF (1,114,11
- [Basic Multilingual Plane (BMP)](/glossary/bmp/) — Plane 0 (U+0000–U+FFFF), containing the most commonly used characters including
- [Supplementary Plane / Astral Plane](/glossary/supplementary-plane/) — Planes 1–16 (U+10000–U+10FFFF), containing emoji, historic scripts, CJK extensio
- [Private Use Area (PUA)](/glossary/private-use-area/) — Reserved ranges where organizations can assign their own characters: BMP PUA (U+
- [Unicode Version](/glossary/unicode-version/) — Major releases of the Unicode Standard, each adding new characters, scripts, and
- [Unicode Character Database (UCD)](/glossary/ucd/) — Machine-readable collection of data files defining all Unicode character propert
- [ISO 10646 / Universal Character Set](/glossary/iso-10646/) — International standard (ISO/IEC 10646) synchronized with Unicode, defining the s
- [Unicode Stability Policy](/glossary/unicode-stability-policy/) — Guarantee that once a character is assigned, its code point and name never chang
- [Assigned Character](/glossary/assigned-character/) — A code point that has been given a character designation in a Unicode version. A
- [Noncharacter](/glossary/noncharacter/) — Code points permanently reserved for internal use (66 total): U+FDD0–U+FDEF and
- [Surrogate](/glossary/surrogate/) — Code points U+D800–U+DFFF reserved exclusively for UTF-16 surrogate pairs. Not v
- [Code Unit](/glossary/code-unit/) — The minimal unit of encoding: an 8-bit byte in UTF-8, a 16-bit word in UTF-16, a
- [Plane](/glossary/plane/) — A contiguous block of 65,536 code points. Unicode has 17 planes (0–16): Plane 0
- [Unicode Scalar Value](/glossary/unicode-scalar-value/) — Any code point except surrogate code points (U+D800–U+DFFF). The valid set of va
- [Unassigned Code Point](/glossary/unassigned-code-point/) — A code point not yet assigned a character in any Unicode version, categorized as
- [Reserved Code Point](/glossary/reserved-code-point/) — A code point set aside for future standardization, distinct from noncharacters (
- [Abstract Character](/glossary/abstract-character/) — A unit of information used for organizing, controlling, or representing textual
- [Block](/glossary/block/) — A named contiguous range of code points (e.g., Basic Latin = U+0000–U+007F). Uni
- [Script](/glossary/script/) — The writing system a character belongs to (e.g., Latin, Cyrillic, Han). Unicode
- [General Category](/glossary/general-category/) — Classification of every code point into one of 30 categories (Lu, Ll, Nd, So, et
- [Bidirectional Category](/glossary/bidirectional-category/) — Property determining how a character behaves in bidirectional text (LTR, RTL, we
- [Combining Class](/glossary/combining-class/) — Numeric value (0–254) controlling the ordering of combining marks during canonic
- [Decomposition](/glossary/decomposition/) — The mapping of a character to its component parts. Canonical decomposition prese
- [Numeric Value](/glossary/numeric-value/) — The numeric interpretation of a character, if any: digit value (0–9), decimal va
- [Mirrored Property](/glossary/mirrored-property/) — Characters whose glyph should be horizontally mirrored in RTL context. Examples:
- [Age Property](/glossary/age-property/) — The Unicode version in which a character was first assigned. Useful for determin
- [Name Alias](/glossary/name-alias/) — Alternative names for characters, since Unicode names cannot change per the stab
- [Canonical Equivalence](/glossary/canonical-equivalence/) — Two character sequences that are semantically identical and should be treated as
- [Compatibility Equivalence](/glossary/compatibility-equivalence/) — Two character sequences with the same abstract content that may differ in appear
- [Default Ignorable](/glossary/default-ignorable/) — Characters that should have no visible effect and can be ignored by processes th
- [Extended Grapheme Cluster](/glossary/grapheme-cluster/) — The user-perceived 'character' — what feels like a single unit. May consist of m
- [Case Mapping](/glossary/case-mapping/) — The rules for converting characters between uppercase, lowercase, and titlecase.
- [Unicode Normalization](/glossary/normalization/) — Process of converting Unicode text to a standard canonical form. Four forms: NFC
- [NFC (Canonical Composition)](/glossary/nfc/) — Normalization Form C: decompose then recompose canonically, producing the shorte
- [NFD (Canonical Decomposition)](/glossary/nfd/) — Normalization Form D: fully decompose without recomposing. Used by the macOS HFS
- [NFKC (Compatibility Composition)](/glossary/nfkc/) — Normalization Form KC: compatibility decomposition then canonical composition. M
- [NFKD (Compatibility Decomposition)](/glossary/nfkd/) — Normalization Form KD: compatibility decomposition without recomposing. The most
- [Unicode Bidirectional Algorithm (UBA)](/glossary/bidirectional-algorithm/) — Algorithm determining display order of characters in mixed-direction text (e.g.,
- [Unicode Collation Algorithm (UCA)](/glossary/collation-algorithm/) — Standard algorithm for comparing and sorting Unicode strings using multi-level c
- [Unicode Line Breaking Algorithm](/glossary/line-breaking-algorithm/) — Rules for determining where text can wrap to the next line, considering characte
- [Unicode Text Segmentation](/glossary/text-segmentation/) — Algorithms for finding boundaries in text: grapheme cluster, word, and sentence
- [Word Boundary](/glossary/word-boundary/) — The position between words as determined by Unicode word break rules. Not a simp
- [Sentence Boundary](/glossary/sentence-boundary/) — The position between sentences per Unicode rules. More complex than splitting on
- [Composition Exclusion](/glossary/composition-exclusion/) — Characters excluded from canonical composition (NFC) to prevent non-starter deco
- [Glyph](/glossary/glyph/) — The visual representation of a character as rendered by a font. One character ma
- [Font](/glossary/font/) — A specific implementation of a typeface at a particular size, weight, and style.
- [Ligature](/glossary/ligature/) — Two or more characters joined into a single glyph. Can be typographic (fi → ﬁ vi
- [Diacritical Mark / Diacritic](/glossary/diacritical-mark/) — A mark added to a letter to change pronunciation or meaning. Can be precomposed
- [Whitespace Character](/glossary/whitespace/) — Characters that represent horizontal or vertical space but have no visible glyph
- [Zero Width Character](/glossary/zero-width-character/) — Characters with zero advance width — invisible in rendering but affecting text b
- [Non-Breaking Space](/glossary/non-breaking-space/) — U+00A0. A space that prevents line breaking at its position. HTML: `&nbsp;`. Used
- [Combining Character](/glossary/combining-character/) — A character that attaches to the preceding base character to modify it. General
- [Dash](/glossary/dash/) — Punctuation marks used to separate parts of a sentence or indicate ranges. Unico
- [Quotation Mark](/glossary/quotation-mark/) — Paired punctuation marks enclosing direct speech or quotations. Unicode includes
- [Ellipsis](/glossary/ellipsis/) — U+2026 HORIZONTAL ELLIPSIS (…). A single character replacing three periods, typo
- [Em / En (Typographic Units)](/glossary/em-en/) — Em: a width equal to the font size. En: half an em. Used to define em dash width
- [RTL (Right-to-Left)](/glossary/rtl/) — Text directionality where characters flow from right to left. Used by Arabic, He
- [Kerning](/glossary/kerning/) — Adjusting the spacing between specific character pairs for visual harmony (e.g.,
- [Small Caps](/glossary/small-caps/) — Uppercase letterforms at the height of lowercase letters. CSS: font-variant: sma
- [Input Method Editor (IME)](/glossary/ime/) — Software component enabling input of complex characters (CJK, Korean, etc.) usin
- [Dead Key](/glossary/dead-key/) — A key that produces no output immediately but modifies the next keystroke. Used
- [Compose Key](/glossary/compose-key/) — A key (usually Right Alt or custom-mapped) that starts a multi-key composition s
- [Character Map](/glossary/character-map/) — GUI utility for browsing and inserting Unicode characters. Windows: charmap.exe.
- [Alt Code](/glossary/alt-code/) — Windows input method using Alt + numpad digits to type characters by their code
- [Hex Input](/glossary/hex-input/) — Direct Unicode code point entry by typing the hex value. Mac: hold Option + hex
- [Unicode Input Method](/glossary/unicode-input-method/) — Any method for entering characters by their Unicode code point: hex input (Mac),
- [Character Picker](/glossary/character-picker/) — UI component (native or web-based) for browsing and selecting characters visuall
- [HTML Entity](/glossary/html-entity/) — A textual representation of a character in HTML. Three forms: named (`&amp;`), dec
- [Named Character Reference](/glossary/named-character-reference/) — HTML entity using a human-readable name: `&copy;` → ©, `&mdash;` → —. HTML5 defines
- [Numeric Character Reference](/glossary/numeric-character-reference/) — HTML entity using the Unicode code point number: decimal (`&#169;` → ©) or hexadec
- [CSS Content Property](/glossary/css-content-property/) — CSS property inserting generated content via ::before and ::after pseudo-element
- [Percent-Encoding (URL Encoding)](/glossary/url-encoding/) — Encoding non-ASCII and reserved characters in URLs by replacing each byte with %
- [Punycode](/glossary/punycode/) — ASCII-compatible encoding of Unicode domain names, converting internationalized
- [Internationalized Domain Name (IDN)](/glossary/idn/) — Domain names containing non-ASCII Unicode characters, internally stored as Punyc
- [Content-Type Charset](/glossary/content-type-charset/) — HTTP header parameter declaring the character encoding of a response (Content-Ty
- [Variation Selector](/glossary/variation-selector/) — Characters (U+FE00–U+FE0F, U+E0100–U+E01EF) that select a specific glyph variant
- [Emoji Presentation](/glossary/emoji-presentation/) — Rendering a character with a colorful emoji glyph, typically using Variation Sel
- [Word Joiner](/glossary/word-joiner/) — U+2060. A zero-width character that prevents line breaking. The modern replaceme
- [XML Character Reference](/glossary/xml-character-reference/) — XML's version of numeric character references: `&#10003;` or `&#x2713;`. XML has onl
- [String](/glossary/string/) — A sequence of characters in a programming language. Internal representation vari
- [Surrogate Pair](/glossary/surrogate-pair/) — Two 16-bit code units (a high surrogate U+D800–U+DBFF + low surrogate U+DC00–U+D
- [Unicode Escape Sequence](/glossary/unicode-escape-sequence/) — Syntax for representing Unicode characters in source code. Varies by language: \
- [Unicode Regular Expression](/glossary/unicode-regex/) — Regex patterns using Unicode properties: \p{L} (any letter), \p{Script=Greek} (G
- [String Length Ambiguity](/glossary/string-length/) — The 'length' of a Unicode string depends on the unit: code units (JavaScript .le
- [Mojibake](/glossary/mojibake/) — Garbled text resulting from decoding bytes with the wrong encoding. Japanese ter
- [Replacement Character](/glossary/replacement-character/) — U+FFFD (�). Displayed when a decoder encounters invalid byte sequences — the uni
- [Invisible Character](/glossary/invisible-character/) — Any character with no visible glyph: whitespace, zero-width characters, control
- [Encoding / Decoding](/glossary/encoding-decoding/) — Encoding converts characters to bytes (str.encode('utf-8')); decoding converts b
- [Null Character](/glossary/null-character/) — U+0000 (NUL). The first Unicode/ASCII character, used as a string terminator in
- [Homoglyph](/glossary/homoglyph/) — Characters from different scripts that look identical or very similar, such as L
- [Confusable](/glossary/confusable/) — Unicode's official term for character pairs that can be visually confused, defin
- [IDN Homograph Attack](/glossary/idn-homograph-attack/) — Using visually similar Unicode characters in domain names to impersonate legitim
- [Bidi Override Attack](/glossary/bidi-override/) — Using Unicode bidirectional override characters (U+202A–U+202E, U+2066–U+2069) t
- [Zero Width Joiner (ZWJ)](/glossary/zwj/) — U+200D. Requests that adjacent characters be joined. Critical for emoji sequence
- [Zero Width Non-Joiner (ZWNJ)](/glossary/zwnj/) — U+200C. Prevents joining of adjacent characters. Essential in Persian/Arabic for
- [Unicode Spoofing](/glossary/unicode-spoofing/) — Using Unicode features to deceive users: homoglyphs for fake domains, bidi overr
- [Mixed-Script Detection](/glossary/mixed-script-detection/) — Identifying text that mixes characters from different scripts (e.g., Latin + Cyr
- [Emoji](/glossary/emoji/) — Pictographic Unicode characters originating from Japanese mobile phones. Now 3,7
- [Emoji Modifier (Skin Tone)](/glossary/emoji-modifier/) — Fitzpatrick scale skin tone modifiers (U+1F3FB–U+1F3FF) that change the skin col
- [Emoji ZWJ Sequence](/glossary/emoji-zwj-sequence/) — Emoji constructed by joining multiple emoji with Zero Width Joiner (U+200D). 👨‍👩
- [Regional Indicator](/glossary/regional-indicator/) — 26 characters (U+1F1E6–U+1F1FF, 🇦–🇿) that combine in pairs to form country flag
- [Control Character](/glossary/control-character/) — Non-printing characters that control text processing. C0 (U+0000–U+001F): NUL, T
- [CJK](/glossary/cjk/) — Chinese, Japanese, and Korean — the collective term for the unified Han ideograp
- [Text Presentation](/glossary/text-presentation/) — Rendering a character with a plain monochrome text glyph rather than a colorful
- [Punctuation](/glossary/punctuation/) — Characters used to organize and clarify written language: periods, commas, dashe
- [Han Unification](/glossary/han-unification/) — The process of mapping Chinese, Japanese, and Korean ideographs that share a com
- [Hangul Jamo](/glossary/hangul-jamo/) — The individual consonant and vowel components (jamo) of the Korean Hangul writin
- [Unicode Technical Report (UTR)](/glossary/unicode-technical-report/) — Informational documents published by the Unicode Consortium covering specific to
- [Unicode Standard Annex (UAX)](/glossary/unicode-standard-annex/) — Normative or informative documents that are integral parts of the Unicode Standa
- [Base64](/glossary/base64/) — Binary-to-text encoding that represents binary data using 64 ASCII characters (A
- [ASCII Art](/glossary/ascii-art/) — Visual art created from text characters, originally limited to the 95 printable
- [East Asian Width](/glossary/east-asian-width/) — Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Hal
- [Script Extensions](/glossary/script-extensions/) — Unicode property listing all scripts that use a character, broader than the sing
- [Joining Type](/glossary/joining-type/) — Unicode property controlling how Arabic and Syriac characters connect to adjacen
- [Font Fallback](/glossary/font-fallback/) — The mechanism by which a rendering engine substitutes glyphs from a secondary fo
- [OpenType](/glossary/opentype/) — Modern font format developed by Microsoft and Adobe supporting up to 65,535 glyp
- [CSS unicode-range](/glossary/unicode-range-css/) — CSS @font-face descriptor specifying which Unicode code points a font should cov
- [Web Fonts](/glossary/web-fonts/) — Fonts downloaded by the browser to render text, declared via CSS @font-face. WOF
- [Case Folding](/glossary/case-folding/) — Mapping characters to a common case form for case-insensitive comparison. More c
- [Grapheme Cluster Boundary](/glossary/grapheme-break/) — Rules (UAX#29) for determining where one user-perceived character ends and anoth
- [String Comparison](/glossary/string-comparison/) — Comparing Unicode strings requires normalization (NFC/NFD) and optionally collat
- [JavaScript Intl API](/glossary/intl-api/) — ECMAScript Internationalization API providing locale-aware string comparison (Co
- [Unicode in CSS](/glossary/unicode-in-css/) — CSS supports Unicode via escape sequences (\2713 for ✓), the content property fo
- [CSS Text Direction](/glossary/text-direction-css/) — CSS properties (direction, writing-mode, unicode-bidi) controlling text layout d
- [Python Unicode](/glossary/python-unicode/) — Python 3 uses Unicode strings by default (str uses PEP 393's flexible Latin-1/UCS-2/UCS-4 storage internally). K
- [Java Unicode](/glossary/java-unicode/) — Java strings use UTF-16 internally. char is 16-bit (only BMP). For supplementary
- [Rust Unicode](/glossary/rust-unicode/) — Rust strings (str/String) are guaranteed valid UTF-8. char type represents a Uni
- [Character Palette](/glossary/character-palette/) — A system-level tool for browsing and inserting Unicode characters. macOS Charact
- [Bidi Text Attack](/glossary/bidi-attack/) — Exploiting Unicode bidirectional control characters to disguise malicious code o
- [Normalization Attack](/glossary/unicode-normalization-attack/) — Exploiting Unicode normalization to bypass security filters. Input validated bef
- [Emoji Sequences](/glossary/emoji-sequences/) — Multi-character emoji constructed by combining base emoji with modifiers, ZWJ ch
- [Emoji Skin Tone](/glossary/emoji-skin-tone/) — Five Fitzpatrick scale modifiers (U+1F3FB–U+1F3FF, 🏻–🏿) that change human emoji

## Guides

- [What is Unicode? A Complete Guide](/guide/what-is-unicode/)
- [UTF-8 Encoding Explained](/guide/utf-8-encoding-explained/)
- [UTF-8 vs UTF-16 vs UTF-32: When to Use Each](/guide/utf-8-vs-utf-16-vs-utf-32/)
- [What is a Unicode Code Point?](/guide/what-is-code-point/)
- [Unicode Planes and the BMP](/guide/unicode-planes-guide/)
- [Understanding Byte Order Mark (BOM)](/guide/byte-order-mark-guide/)
- [Surrogate Pairs Explained](/guide/surrogate-pairs-explained/)
- [ASCII to Unicode: The Evolution of Character Encoding](/guide/ascii-to-unicode/)
- [Unicode Normalization: NFC, NFD, NFKC, NFKD](/guide/unicode-normalization-guide/)
- [The Unicode Bidirectional Algorithm](/guide/unicode-bidirectional-algorithm/)
- [Unicode General Categories Explained](/guide/unicode-general-categories/)
- [Understanding Unicode Blocks](/guide/understanding-unicode-blocks/)
- [Unicode Scripts: How Writing Systems are Organized](/guide/unicode-scripts-guide/)
- [What are Combining Characters?](/guide/combining-characters-guide/)
- [Grapheme Clusters vs Code Points](/guide/grapheme-clusters-vs-code-points/)
- [Unicode Confusables: A Security Guide](/guide/unicode-confusables-guide/)
- [Zero Width Characters: What They Are and Why They Matter](/guide/zero-width-characters-guide/)
- [Unicode Whitespace Characters Guide](/guide/unicode-whitespace-guide/)
- [History of Unicode](/guide/history-of-unicode/)
- [Unicode Versions Timeline](/guide/unicode-versions-timeline/)
- [Unicode in Python](/guide/unicode-in-python/)
- [Unicode in JavaScript](/guide/unicode-in-javascript/)
- [Unicode in Java](/guide/unicode-in-java/)
- [Unicode in Go](/guide/unicode-in-go/)
- [Unicode in Rust](/guide/unicode-in-rust/)
- [Unicode in C/C++](/guide/unicode-in-c-cpp/)
- [Unicode in Ruby](/guide/unicode-in-ruby/)
- [Unicode in PHP](/guide/unicode-in-php/)
- [Unicode in Swift](/guide/unicode-in-swift/)
- [Unicode in HTML & CSS](/guide/unicode-in-html-css/)
- [Unicode in Regular Expressions](/guide/unicode-in-regular-expressions/)
- [Unicode in SQL](/guide/unicode-in-sql/)
- [Unicode in URLs](/guide/unicode-in-urls/)
- [Unicode Escape Sequences: Cross-Language Reference](/guide/unicode-escape-sequences-guide/)
- [How to Handle Unicode in APIs and JSON](/guide/unicode-in-json-api/)
- [Complete Arrow Symbols List](/guide/arrow-symbols-guide/)
- [All Check Mark and Tick Symbols](/guide/check-mark-symbols-guide/)
- [Star and Asterisk Symbols](/guide/star-symbols-guide/)
- [Heart Symbols Complete Guide](/guide/heart-symbols-guide/)
- [Currency Symbols Around the World](/guide/currency-symbols-guide/)
- [Mathematical Symbols and Operators](/guide/math-symbols-guide/)
- [Bracket and Parenthesis Symbols](/guide/bracket-symbols-guide/)
- [Bullet Point Symbols](/guide/bullet-point-symbols-guide/)
- [Line and Box Drawing Characters](/guide/box-drawing-guide/)
- [Musical Note Symbols](/guide/musical-note-symbols-guide/)
- [Fraction Symbols Guide](/guide/fraction-symbols-guide/)
- [Superscript and Subscript Characters](/guide/superscript-subscript-guide/)
- [Circle Symbols](/guide/circle-symbols-guide/)
- [Square and Rectangle Symbols](/guide/square-symbols-guide/)
- [Triangle Symbols](/guide/triangle-symbols-guide/)
- [Diamond Symbols](/guide/diamond-symbols-guide/)
- [Cross and X Mark Symbols](/guide/cross-x-mark-guide/)
- [Dash and Hyphen Symbols Guide](/guide/dash-hyphen-guide/)
- [Quotation Mark Symbols Complete Guide](/guide/quotation-marks-guide/)
- [Copyright, Trademark & Legal Symbols](/guide/legal-symbols-guide/)
- [Degree and Temperature Symbols](/guide/temperature-symbols-guide/)
- [Circled and Enclosed Number Symbols](/guide/enclosed-numbers-guide/)
- [Roman Numeral Symbols](/guide/roman-numerals-guide/)
- [Greek Alphabet Symbols for Math and Science](/guide/greek-alphabet-guide/)
- [Decorative Dingbats](/guide/dingbats-guide/)
- [Playing Card Symbols](/guide/playing-card-guide/)
- [Chess Piece Symbols](/guide/chess-symbols-guide/)
- [Zodiac and Astrological Symbols](/guide/zodiac-symbols-guide/)
- [Braille Pattern Characters](/guide/braille-patterns-guide/)
- [Geometric Shapes Complete Guide](/guide/geometric-shapes-guide/)
- [Letterlike Symbols](/guide/letterlike-symbols-guide/)
- [Technical Symbols Guide](/guide/technical-symbols-guide/)
- [Combining Characters and Diacritics Guide](/guide/diacritics-guide/)
- [Whitespace and Invisible Characters Guide](/guide/invisible-characters-guide/)
- [Warning and Hazard Signs](/guide/warning-symbols-guide/)
- [Weather Symbols Guide](/guide/weather-symbols-guide/)
- [Religious Symbols in Unicode](/guide/religious-symbols-guide/)
- [Gender and Identity Symbols](/guide/gender-symbols-guide/)
- [Keyboard Shortcut Symbols Guide](/guide/keyboard-symbols-guide/)
- [Symbols for Social Media Bios](/guide/social-media-symbols-guide/)
- [Basic Latin (ASCII) Block](/guide/basic-latin-block/)
- [Latin-1 Supplement Block](/guide/latin-1-supplement-block/)
- [General Punctuation Block](/guide/general-punctuation-block/)
- [Mathematical Operators Block](/guide/mathematical-operators-block/)
- [Arrows Block](/guide/arrows-block/)
- [Dingbats Block](/guide/dingbats-block/)
- [Miscellaneous Symbols Block](/guide/miscellaneous-symbols-block/)
- [CJK Unified Ideographs Overview](/guide/cjk-unified-ideographs/)
- [Hangul Block](/guide/hangul-block/)
- [Emoji Blocks Overview](/guide/emoji-blocks-guide/)
- [Currency Symbols Block](/guide/currency-block/)
- [Box Drawing & Block Elements Blocks](/guide/box-drawing-block-elements/)
- [Enclosed Alphanumerics Block](/guide/enclosed-alphanumerics-block/)
- [Geometric Shapes Blocks](/guide/geometric-shapes-block/)
- [Musical Symbols Block](/guide/musical-symbols-block/)
- [Arabic Script Deep Dive](/guide/arabic-script-guide/)
- [Devanagari Script Deep Dive](/guide/devanagari-script-guide/)
- [Greek and Coptic](/guide/greek-coptic-guide/)
- [Cyrillic Script](/guide/cyrillic-script-guide/)
- [Hebrew Script](/guide/hebrew-script-guide/)
- [Thai Script](/guide/thai-script-guide/)
- [Japanese Writing Systems](/guide/japanese-writing-guide/)
- [Korean Hangul System](/guide/hangul-system-guide/)
- [Bengali Script](/guide/bengali-script-guide/)
- [Tamil Script](/guide/tamil-script-guide/)
- [Armenian Script](/guide/armenian-script-guide/)
- [Georgian Script](/guide/georgian-script-guide/)
- [Ethiopic Script](/guide/ethiopic-script-guide/)
- [Dead Scripts in Unicode](/guide/dead-scripts-unicode/)
- [Writing Systems of the World](/guide/writing-systems-overview/)
- [How to Type Special Characters on Windows](/guide/type-special-chars-windows/)
- [How to Type Special Characters on Mac](/guide/type-special-chars-mac/)
- [How to Type Special Characters on Linux](/guide/type-special-chars-linux/)
- [Special Characters on Mobile (iOS/Android)](/guide/type-special-chars-mobile/)
- [How to Fix Mojibake (Garbled Text)](/guide/fix-mojibake-guide/)
- [Unicode in Databases](/guide/unicode-in-databases/)
- [Unicode in Filenames](/guide/unicode-in-filenames/)
- [Unicode in Email](/guide/unicode-in-email/)
- [Unicode in Domain Names (IDN)](/guide/unicode-domain-names/)
- [Unicode for Accessibility](/guide/unicode-accessibility/)
- [Unicode Text Direction: LTR vs RTL](/guide/unicode-text-direction/)
- [Unicode Fonts: How Characters Get Rendered](/guide/unicode-fonts-guide/)
- [How to Find Any Unicode Character](/guide/find-unicode-character/)
- [Unicode Copy and Paste Best Practices](/guide/unicode-copy-paste/)
- [How to Create Fancy Text with Unicode](/guide/fancy-text-guide/)
- [Unicode in Microsoft Word](/guide/unicode-in-word/)
- [Unicode in Google Docs & Sheets](/guide/unicode-in-google-docs/)
- [Unicode in Terminal / Command Line](/guide/unicode-in-terminal/)
- [Unicode in PDF Documents](/guide/unicode-in-pdf/)
- [Unicode in Excel](/guide/unicode-in-excel/)
- [Unicode in Social Media](/guide/unicode-social-media/)
- [Unicode in XML and JSON](/guide/unicode-in-xml-json/)
- [Unicode in Data Science and NLP](/guide/unicode-data-science/)
- [Unicode in QR Codes](/guide/unicode-in-qr-codes/)
- [Unicode in Passwords: Security Implications](/guide/unicode-in-passwords/)
- [The Birth of ASCII (1963)](/guide/birth-of-ascii/)
- [EBCDIC: IBM's Alternative](/guide/ebcdic-history/)
- [The Unicode Consortium: Who Decides?](/guide/unicode-consortium-guide/)
- [How New Characters Get Added to Unicode](/guide/unicode-proposal-process/)
- [The Emoji Proposal Process](/guide/emoji-proposal-process/)
- [CJK Unification: Controversy and Compromise](/guide/cjk-unification-controversy/)
- [The Mojibake Problem: A History](/guide/mojibake-history/)
- [Unicode Milestones](/guide/unicode-milestones/)
- [How Unicode Changed the Internet](/guide/unicode-changed-internet/)
- [Fun Unicode Facts and Easter Eggs](/guide/unicode-fun-facts/)
- [Unicode Security Overview](/guide/unicode-security-guide/)
- [IDN Homograph Attack Detection](/guide/idn-homograph-detection/)
- [Invisible Character Detection and Removal](/guide/invisible-char-detection/)
- [Unicode in Passwords and Authentication](/guide/unicode-authentication/)
- [Preventing Unicode-based Phishing](/guide/unicode-phishing-prevention/)
- [Unicode Collation: Sorting Text Correctly](/guide/unicode-collation-guide/)
- [ICU Library: International Components for Unicode](/guide/icu-library-guide/)
- [The Future of Unicode: What Comes After 16.0?](/guide/future-of-unicode/)
- [Unicode in Compilers and Programming Language Design](/guide/unicode-in-compilers/)
- [Unicode Normalization Performance: Benchmarks](/guide/normalization-performance/)

## Signature Series

### The Unicode Odyssey

- Ch. 1: [The Problem: Why We Need Unicode](/series/unicode-odyssey/the-problem/)
- Ch. 2: [The Solution: How Unicode Works](/series/unicode-odyssey/how-unicode-works/)
- Ch. 3: [Encoding the Codepoints: UTF-8, UTF-16, UTF-32](/series/unicode-odyssey/encoding-the-codepoints/)
- Ch. 4: [Characters Are Not What You Think](/series/unicode-odyssey/characters-are-not-what-you-think/)
- Ch. 5: [The World's Writing Systems in Unicode](/series/unicode-odyssey/writing-systems-in-unicode/)
- Ch. 6: [Unicode in Your Programming Language](/series/unicode-odyssey/unicode-in-your-language/)
- Ch. 7: [Normalization: When Equal Isn't Equal](/series/unicode-odyssey/normalization/)
- Ch. 8: [Security: The Dark Side of Unicode](/series/unicode-odyssey/security-dark-side/)
- Ch. 9: [Unicode on the Web: HTML, CSS, and Beyond](/series/unicode-odyssey/unicode-on-the-web/)
- Ch. 10: [The Future of Unicode](/series/unicode-odyssey/the-future/)

### Writing Systems of the World

- Ch. 1: [The Latin Alphabet: From Rome to the Internet](/series/writing-systems/latin-alphabet/)
- Ch. 2: [The Arabic Script: Right-to-Left and Beyond](/series/writing-systems/arabic-script/)
- Ch. 3: [Chinese Characters: 3,000 Years of Writing](/series/writing-systems/chinese-characters/)
- Ch. 4: [The Korean Hangul: An Alphabet Designed by a King](/series/writing-systems/korean-hangul/)
- Ch. 5: [Devanagari and the Indic Scripts](/series/writing-systems/devanagari-indic/)
- Ch. 6: [Japanese: Three Scripts in One](/series/writing-systems/japanese-three-scripts/)
- Ch. 7: [The Greek Alphabet: From Philosophy to Physics](/series/writing-systems/greek-alphabet/)
- Ch. 8: [Cyrillic: The Script That Spans Continents](/series/writing-systems/cyrillic-script/)
- Ch. 9: [Hebrew: Ancient Script in the Digital Age](/series/writing-systems/hebrew-script/)
- Ch. 10: [Thai, Khmer, and the Southeast Asian Scripts](/series/writing-systems/southeast-asian-scripts/)
- Ch. 11: [Ge'ez (Ethiopic): Africa's Ancient Writing System](/series/writing-systems/ethiopic-geez/)
- Ch. 12: [The Endangered Scripts: Preserving Languages Through Unicode](/series/writing-systems/endangered-scripts/)

### The Developer's Unicode Handbook

- Ch. 1: [String Length Is a Lie](/series/developers-handbook/string-length-is-a-lie/)
- Ch. 2: [The Encoding Minefield](/series/developers-handbook/encoding-minefield/)
- Ch. 3: [Comparison and Sorting](/series/developers-handbook/comparison-and-sorting/)
- Ch. 4: [Search That Actually Works](/series/developers-handbook/search-that-works/)
- Ch. 5: [Input Validation Done Right](/series/developers-handbook/input-validation/)
- Ch. 6: [Rendering Complex Scripts](/series/developers-handbook/rendering-complex-scripts/)
- Ch. 7: [Security Hardening](/series/developers-handbook/security-hardening/)
- Ch. 8: [Testing Unicode](/series/developers-handbook/testing-unicode/)

### The Encoding Wars

- Ch. 1: [Morse, Baudot, and the First Codes](/series/encoding-wars/morse-baudot-first-codes/)
- Ch. 2: [ASCII: 128 Characters That Changed the World](/series/encoding-wars/ascii-128-characters/)
- Ch. 3: [The Code Page Explosion](/series/encoding-wars/code-page-explosion/)
- Ch. 4: [The Unicode Vision](/series/encoding-wars/unicode-vision/)
- Ch. 5: [UTF-8: The Encoding That Won](/series/encoding-wars/utf-8-encoding-that-won/)
- Ch. 6: [Emoji: When Characters Became Culture](/series/encoding-wars/emoji-characters-became-culture/)
- Ch. 7: [Unicode Today and Tomorrow](/series/encoding-wars/unicode-today-tomorrow/)

### Unicode for the Modern Web

- Ch. 1: [HTML and Unicode: Entities, Escapes, and Encoding](/series/modern-web/html-and-unicode/)
- Ch. 2: [CSS and Unicode: Beyond content: ""](/series/modern-web/css-and-unicode/)
- Ch. 3: [JavaScript Strings: The UTF-16 Legacy](/series/modern-web/javascript-strings/)
- Ch. 4: [APIs and Unicode: JSON, URLs, and Headers](/series/modern-web/apis-and-unicode/)
- Ch. 5: [Databases and Unicode: Collation Matters](/series/modern-web/databases-and-unicode/)
- Ch. 6: [Fonts and Rendering: Making It Look Right](/series/modern-web/fonts-and-rendering/)
- Ch. 7: [Internationalization: i18n Best Practices](/series/modern-web/internationalization/)

## i18n

All pages are available in 15 languages. Prefix the URL with a language code:

- /ko/ (Korean), /ja/ (Japanese), /zh-hans/ (Chinese Simplified)
- /es/ (Spanish), /pt/ (Portuguese), /hi/ (Hindi), /ar/ (Arabic)
- /fr/ (French), /ru/ (Russian), /de/ (German), /tr/ (Turkish)
- /vi/ (Vietnamese), /id/ (Indonesian), /th/ (Thai)
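The character URL patterns and `.md` endpoints listed at the top of this index can be generated from a character programmatically. This is a minimal sketch, assuming the `{hex}` segment is uppercase hexadecimal zero-padded to at least four digits (as in the `/char/U+2713/` example); the helpers `code_point_label`, `char_url`, and `char_markdown_url` are hypothetical names, not part of any published client.

```python
BASE_URL = "https://unicodefyi.com"  # site root, taken from the .md example URL


def code_point_label(ch: str) -> str:
    """Return the U+XXXX label for a single code point (assumed URL format)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    # Uppercase hex, padded to at least 4 digits: U+0041, U+2713, U+1F600
    return f"U+{ord(ch):04X}"


def char_url(ch: str) -> str:
    """HTML character page, following the /char/U+{hex}/ pattern."""
    return f"{BASE_URL}/char/{code_point_label(ch)}/"


def char_markdown_url(ch: str) -> str:
    """Plain-text markdown variant, following the /char/U+{hex}.md pattern."""
    return f"{BASE_URL}/char/{code_point_label(ch)}.md"


print(char_url("✓"))            # https://unicodefyi.com/char/U+2713/
print(char_markdown_url("😀"))  # https://unicodefyi.com/char/U+1F600.md
```

The returned `.md` URL can then be fetched with any HTTP client; the sketch stops at URL construction so it runs without network access.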
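The String Length Ambiguity, Code Unit, and Surrogate Pair glossary entries describe how a single string has several valid "lengths". The sketch below, in Python (where `len()` counts code points), measures one string in all three units and splits a supplementary code point into its UTF-16 surrogate pair using the standard offset-and-split formula; the helper names are illustrative only.

```python
from __future__ import annotations


def lengths(s: str) -> dict[str, int]:
    """Measure the same string in code points, UTF-8 bytes, and UTF-16 code units."""
    return {
        "code_points": len(s),                           # Python str length counts code points
        "utf8_bytes": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,  # -le variant emits no BOM
    }


def surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points use surrogate pairs")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> U+DC00..U+DFFF
    return high, low


# Precomposed e-acute (U+00E9) followed by THUMBS UP SIGN (U+1F44D)
print(lengths("\u00e9\U0001F44D"))  # {'code_points': 2, 'utf8_bytes': 6, 'utf16_units': 3}
print([hex(u) for u in surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']
```

The same string is 2, 6, or 3 units long depending on what is counted, which is why JavaScript's `.length` (UTF-16 code units) and Python's `len()` (code points) disagree for emoji.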
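The Canonical Equivalence, NFC, NFKC, Case Folding, and String Comparison entries combine into one practical check. This sketch uses only Python's standard `unicodedata` module; the NFD, case-fold, NFD sequence in the hypothetical `caseless_equal` helper follows the canonical caseless matching definition in Section 3.13 of the Unicode Standard.

```python
import unicodedata


def caseless_equal(a: str, b: str) -> bool:
    """Canonical caseless match: NFD, then case-fold, then NFD again."""
    def key(s: str) -> str:
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return key(a) == key(b)


precomposed = "caf\u00e9"   # e-acute as one code point (U+00E9)
decomposed = "cafe\u0301"   # e + COMBINING ACUTE ACCENT (U+0301)

print(precomposed == decomposed)                                # False: different code points
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True: canonically equivalent
print(caseless_equal("Straße", "STRASSE"))                      # True: ß case-folds to "ss"
print(unicodedata.normalize("NFKC", "\ufb01"))                  # fi: compatibility mapping of the ﬁ ligature
```

NFC or NFD alone is enough for canonical equivalence; NFKC is shown separately because it also erases compatibility distinctions such as ligatures, which is appropriate for search keys but not for storing user text.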
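The Mojibake glossary entry describes garbled text produced by decoding bytes with the wrong encoding. The classic case, UTF-8 bytes read as Windows-1252, can be reproduced and, when no bytes were lost, reversed. A minimal sketch; the `fix_utf8_as_cp1252` helper is hypothetical and undoes only this one specific mistake.

```python
def fix_utf8_as_cp1252(garbled: str) -> str:
    """Undo text that was UTF-8 but was decoded as Windows-1252."""
    # Re-encode to recover the original bytes, then decode them correctly.
    return garbled.encode("cp1252").decode("utf-8")


original = "caf\u00e9"
# The UTF-8 bytes of é (C3 A9) read as Windows-1252 become two characters.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)                       # cafÃ©
print(fix_utf8_as_cp1252(garbled))   # café
```

The reversal raises `UnicodeEncodeError` or `UnicodeDecodeError` if the garbled text contains characters Windows-1252 cannot represent (Python's `cp1252` codec leaves a few byte values undefined) or if a lossy step replaced bytes with U+FFFD, so it is a targeted repair, not a general detector.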
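The Code Space, Surrogate, Noncharacter, and Unicode Scalar Value entries define overlapping ranges of code points. The sketch below writes those definitions as simple predicates and recomputes the counts they imply: 1,114,112 code points, 66 noncharacters, and 1,112,064 scalar values (all code points minus the 2,048 surrogates).

```python
MAX_CODE_POINT = 0x10FFFF


def is_surrogate(cp: int) -> bool:
    """U+D800..U+DFFF, reserved for UTF-16 surrogate pairs."""
    return 0xD800 <= cp <= 0xDFFF


def is_scalar_value(cp: int) -> bool:
    """Any code point except the surrogates."""
    return 0 <= cp <= MAX_CODE_POINT and not is_surrogate(cp)


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus the last two code points (xxFFFE, xxFFFF) of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


all_cps = range(MAX_CODE_POINT + 1)
print(len(all_cps))                                     # 1114112
print(sum(1 for cp in all_cps if is_noncharacter(cp)))  # 66
print(sum(1 for cp in all_cps if is_scalar_value(cp)))  # 1112064
```

The mask `cp & 0xFFFE` clears the plane bits and the lowest bit, so it matches exactly the two final code points of each of the 17 planes (34 in total), which together with the 32 in U+FDD0..U+FDEF gives 66.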
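The Percent-Encoding, Punycode, and IDN entries describe two different ways of fitting Unicode into ASCII-only contexts: URL paths and queries carry percent-encoded UTF-8 bytes, while domain labels use Punycode with an `xn--` prefix. Both are in Python's standard library, as sketched below. Note that the built-in `idna` codec implements the older IDNA 2003 rules; production code may prefer the third-party `idna` package, which implements IDNA 2008.

```python
from urllib.parse import quote, unquote

# Percent-encoding: each UTF-8 byte of a non-ASCII character becomes %XX.
path = quote("café")
print(path)            # caf%C3%A9
print(unquote(path))   # café

# Punycode: a whole Unicode label becomes ASCII; IDNA adds the "xn--" prefix.
print("bücher".encode("punycode"))        # b'bcher-kva'
print("bücher.example".encode("idna"))    # b'xn--bcher-kva.example'
```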
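The HTML Entity, Named Character Reference, Numeric Character Reference, and XML Character Reference entries describe several spellings of the same character. Python's standard `html` module decodes all of them; the `numeric_refs` helper below is a hypothetical illustration of producing the decimal and hexadecimal forms, which are valid in both HTML and XML.

```python
from __future__ import annotations

import html


def numeric_refs(ch: str) -> tuple[str, str]:
    """Decimal and hexadecimal numeric character references for one character."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"


print(numeric_refs("✓"))                            # ('&#10003;', '&#x2713;')
print(html.unescape("&copy; &#169; &#xA9; &amp;"))  # © © © &
print(html.escape("<a & b>"))                       # &lt;a &amp; b&gt;
```

Named references such as `&copy;` are HTML-only; XML predefines just five (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so numeric references are the portable choice.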
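The Zero Width Character, Invisible Character, and Bidi Override Attack entries all concern characters that change text without being visible. A minimal detection sketch, assuming that flagging General Category `Cf` (format characters) is a useful first pass; it does not cover invisible spaces (`Zs`) or control characters (`Cc`), and `find_format_chars` is a hypothetical helper name.

```python
import unicodedata


def find_format_chars(s: str) -> list:
    """Report invisible format characters (General Category Cf) with their positions."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(s)
        if unicodedata.category(ch) == "Cf"
    ]


print(find_format_chars("pay\u200bpal"))   # [(3, 'U+200B', 'ZERO WIDTH SPACE')]
print(find_format_chars("abc\u202edef"))   # [(3, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')]
```

Reporting rather than silently stripping is deliberate: ZWJ (U+200D) is required inside emoji ZWJ sequences and ZWNJ (U+200C) is meaningful in Persian and other scripts, so blanket removal of `Cf` characters can corrupt legitimate text.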