# UnicodeFYI.com — Full Content Index

> The comprehensive Unicode character reference: 154,998 characters, 336 blocks, 168 scripts, collections, glossary, guides, and developer tools.

## URL Patterns

- /char/U+{hex}/ — Character detail (e.g., /char/U+2713/)
- /block/ — All Unicode blocks
- /block/{slug}/ — Block detail (e.g., /block/dingbats/)
- /script/ — All scripts
- /script/{slug}/ — Script detail (e.g., /script/latin/)
- /collection/ — All collections
- /collection/{slug}/ — Collection detail
- /glossary/ — Unicode glossary
- /glossary/{slug}/ — Term detail
- /guide/ — In-depth guides
- /guide/{slug}/ — Guide detail
- /series/ — Signature series index
- /series/{series_slug}/ — Series overview
- /series/{series_slug}/{chapter_slug}/ — Chapter detail
- /tool/ — Interactive tools
- /search/?q={query} — Character search
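
A minimal sketch of how a client might build paths for the character and search patterns above. The function names are hypothetical. The uppercase, four-digit-minimum hex formatting is inferred from the `/char/U+2713/` example, and the query-string encoding is an assumption rather than documented site behavior.

```python
from urllib.parse import urlencode


def char_url(ch: str) -> str:
    """Build a character detail path for a single code point.

    Assumes uppercase hex padded to at least 4 digits, as in /char/U+2713/.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    return f"/char/U+{ord(ch):04X}/"


def search_url(query: str) -> str:
    """Build a search path; the query is form-encoded (an assumption)."""
    return "/search/?" + urlencode({"q": query})


print(char_url("✓"))             # /char/U+2713/
print(search_url("check mark"))  # /search/?q=check+mark
```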

## Markdown Endpoints (.md)

Every content page has a `.md` variant that returns plain-text markdown,
making the site LLM-friendly and easy to consume programmatically:

- /char/U+{hex}.md — Character detail as markdown
- /glossary/{slug}.md — Glossary term as markdown
- /guide/{slug}.md — Guide as markdown
- /series/{series}/{chapter}.md — Signature series chapter as markdown

Example: https://unicodefyi.com/glossary/utf-8.md
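
As a rough illustration, the mapping from a page path to its `.md` variant can be sketched as below. `md_url` and `MD_SEGMENTS` are hypothetical names. The sketch only covers the four patterns listed above and assumes the variant is formed by dropping the trailing slash and appending `.md`, as the example URL suggests.

```python
BASE_URL = "https://unicodefyi.com"

# Number of path segments expected after the section name, taken from the
# four documented patterns (series chapters need series + chapter).
MD_SEGMENTS = {"char": 1, "glossary": 1, "guide": 1, "series": 2}


def md_url(page_path: str) -> str:
    """Return the markdown URL for a content page path such as /glossary/utf-8/."""
    parts = [p for p in page_path.split("/") if p]
    if not parts or MD_SEGMENTS.get(parts[0]) != len(parts) - 1:
        raise ValueError(f"no documented .md variant for {page_path!r}")
    return f"{BASE_URL}/{'/'.join(parts)}.md"


print(md_url("/glossary/utf-8/"))  # https://unicodefyi.com/glossary/utf-8.md
```

Paths outside the listed patterns (for example `/block/dingbats/`) raise `ValueError` here, because no `.md` pattern is listed for them.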

## Feeds

- /feed/rss/ — RSS 2.0 feed (latest 20 guides)
- /feed/atom/ — Atom 1.0 feed (latest 20 guides)

## Glossary Terms

- [ASCII](/glossary/ascii/) — American Standard Code for Information Interchange. 7-bit encoding covering 128 
- [UTF-8](/glossary/utf-8/) — Variable-length Unicode encoding using 1–4 bytes per character. The dominant enc
- [UTF-16](/glossary/utf-16/) — Variable-length Unicode encoding using 2 or 4 bytes (1 or 2 code units of 16 bit
- [UTF-32](/glossary/utf-32/) — Fixed-length Unicode encoding using exactly 4 bytes per character. Simple but sp
- [UCS-2](/glossary/ucs-2/) — Obsolete fixed-length 2-byte encoding covering only the BMP (U+0000–U+FFFF). Pre
- [ISO 8859](/glossary/iso-8859/) — Family of 8-bit single-byte encodings for different language groups. ISO 8859-1 
- [Windows-1252](/glossary/windows-1252/) — Microsoft's superset of ISO 8859-1, adding smart quotes, em dash, and euro sign 
- [Shift JIS](/glossary/shift-jis/) — Japanese character encoding combining single-byte ASCII/JIS Roman with double-by
- [EUC-KR](/glossary/euc-kr/) — Korean character encoding based on KS X 1001, mapping Hangul syllables and Hanja
- [Big5](/glossary/big5/) — Traditional Chinese character encoding used primarily in Taiwan and Hong Kong, e
- [GB2312 / GB18030](/glossary/gb2312/) — Simplified Chinese character encoding family: GB2312 (6,763 characters) evolved 
- [EBCDIC](/glossary/ebcdic/) — Extended Binary Coded Decimal Interchange Code. IBM mainframe encoding with non-
- [Character Encoding](/glossary/character-encoding/) — A system that maps characters to byte sequences for digital storage and transmis
- [IANA Charset](/glossary/iana-charset/) — Official registry of character encoding names maintained by IANA, used in HTTP C
- [Byte Order Mark (BOM)](/glossary/byte-order-mark/) — U+FEFF placed at the start of a text stream to indicate byte order and encoding.
- [Unicode](/glossary/unicode/) — Universal character encoding standard assigning a unique number (code point) to 
- [Unicode Consortium](/glossary/unicode-consortium/) — Non-profit organization that develops and maintains the Unicode Standard. Member
- [Code Point](/glossary/code-point/) — A numerical value in the Unicode code space (U+0000 to U+10FFFF), written as U+X
- [Code Space](/glossary/code-space/) — The complete range of possible Unicode code points: U+0000 to U+10FFFF (1,114,11
- [Basic Multilingual Plane (BMP)](/glossary/bmp/) — Plane 0 (U+0000–U+FFFF), containing the most commonly used characters including 
- [Supplementary Plane / Astral Plane](/glossary/supplementary-plane/) — Planes 1–16 (U+10000–U+10FFFF), containing emoji, historic scripts, CJK extensio
- [Private Use Area (PUA)](/glossary/private-use-area/) — Reserved ranges where organizations can assign their own characters: BMP PUA (U+
- [Unicode Version](/glossary/unicode-version/) — Major releases of the Unicode Standard, each adding new characters, scripts, and
- [Unicode Character Database (UCD)](/glossary/ucd/) — Machine-readable collection of data files defining all Unicode character propert
- [ISO 10646 / Universal Character Set](/glossary/iso-10646/) — International standard (ISO/IEC 10646) synchronized with Unicode, defining the s
- [Unicode Stability Policy](/glossary/unicode-stability-policy/) — Guarantee that once a character is assigned, its code point and name never chang
- [Assigned Character](/glossary/assigned-character/) — A code point that has been given a character designation in a Unicode version. A
- [Noncharacter](/glossary/noncharacter/) — Code points permanently reserved for internal use (66 total): U+FDD0–U+FDEF and 
- [Surrogate](/glossary/surrogate/) — Code points U+D800–U+DFFF reserved exclusively for UTF-16 surrogate pairs. Not v
- [Code Unit](/glossary/code-unit/) — The minimal unit of encoding: an 8-bit byte in UTF-8, a 16-bit word in UTF-16, a
- [Plane](/glossary/plane/) — A contiguous block of 65,536 code points. Unicode has 17 planes (0–16): Plane 0 
- [Unicode Scalar Value](/glossary/unicode-scalar-value/) — Any code point except surrogate code points (U+D800–U+DFFF). The valid set of va
- [Unassigned Code Point](/glossary/unassigned-code-point/) — A code point not yet assigned a character in any Unicode version, categorized as
- [Reserved Code Point](/glossary/reserved-code-point/) — A code point set aside for future standardization, distinct from noncharacters (
- [Abstract Character](/glossary/abstract-character/) — A unit of information used for organizing, controlling, or representing textual 
- [Block](/glossary/block/) — A named contiguous range of code points (e.g., Basic Latin = U+0000–U+007F). Uni
- [Script](/glossary/script/) — The writing system a character belongs to (e.g., Latin, Cyrillic, Han). Unicode 
- [General Category](/glossary/general-category/) — Classification of every code point into one of 30 categories (Lu, Ll, Nd, So, et
- [Bidirectional Category](/glossary/bidirectional-category/) — Property determining how a character behaves in bidirectional text (LTR, RTL, we
- [Combining Class](/glossary/combining-class/) — Numeric value (0–254) controlling the ordering of combining marks during canonic
- [Decomposition](/glossary/decomposition/) — The mapping of a character to its component parts. Canonical decomposition prese
- [Numeric Value](/glossary/numeric-value/) — The numeric interpretation of a character, if any: digit value (0–9), decimal va
- [Mirrored Property](/glossary/mirrored-property/) — Characters whose glyph should be horizontally mirrored in RTL context. Examples:
- [Age Property](/glossary/age-property/) — The Unicode version in which a character was first assigned. Useful for determin
- [Name Alias](/glossary/name-alias/) — Alternative names for characters, since Unicode names cannot change per the stab
- [Canonical Equivalence](/glossary/canonical-equivalence/) — Two character sequences that are semantically identical and should be treated as
- [Compatibility Equivalence](/glossary/compatibility-equivalence/) — Two character sequences with the same abstract content that may differ in appear
- [Default Ignorable](/glossary/default-ignorable/) — Characters that should have no visible effect and can be ignored by processes th
- [Extended Grapheme Cluster](/glossary/grapheme-cluster/) — The user-perceived 'character' — what feels like a single unit. May consist of m
- [Case Mapping](/glossary/case-mapping/) — The rules for converting characters between uppercase, lowercase, and titlecase.
- [Unicode Normalization](/glossary/normalization/) — Process of converting Unicode text to a standard canonical form. Four forms: NFC
- [NFC (Canonical Composition)](/glossary/nfc/) — Normalization Form C: decompose then recompose canonically, producing the shorte
- [NFD (Canonical Decomposition)](/glossary/nfd/) — Normalization Form D: fully decompose without recomposing. Used by the macOS HFS
- [NFKC (Compatibility Composition)](/glossary/nfkc/) — Normalization Form KC: compatibility decomposition then canonical composition. M
- [NFKD (Compatibility Decomposition)](/glossary/nfkd/) — Normalization Form KD: compatibility decomposition without recomposing. The most
- [Unicode Bidirectional Algorithm (UBA)](/glossary/bidirectional-algorithm/) — Algorithm determining display order of characters in mixed-direction text (e.g.,
- [Unicode Collation Algorithm (UCA)](/glossary/collation-algorithm/) — Standard algorithm for comparing and sorting Unicode strings using multi-level c
- [Unicode Line Breaking Algorithm](/glossary/line-breaking-algorithm/) — Rules for determining where text can wrap to the next line, considering characte
- [Unicode Text Segmentation](/glossary/text-segmentation/) — Algorithms for finding boundaries in text: grapheme cluster, word, and sentence 
- [Word Boundary](/glossary/word-boundary/) — The position between words as determined by Unicode word break rules. Not a simp
- [Sentence Boundary](/glossary/sentence-boundary/) — The position between sentences per Unicode rules. More complex than splitting on
- [Composition Exclusion](/glossary/composition-exclusion/) — Characters excluded from canonical composition (NFC) to prevent non-starter deco
- [Glyph](/glossary/glyph/) — The visual representation of a character as rendered by a font. One character ma
- [Font](/glossary/font/) — A specific implementation of a typeface at a particular size, weight, and style.
- [Ligature](/glossary/ligature/) — Two or more characters joined into a single glyph. Can be typographic (fi → ﬁ vi
- [Diacritical Mark / Diacritic](/glossary/diacritical-mark/) — A mark added to a letter to change pronunciation or meaning. Can be precomposed 
- [Whitespace Character](/glossary/whitespace/) — Characters that represent horizontal or vertical space but have no visible glyph
- [Zero Width Character](/glossary/zero-width-character/) — Characters with zero advance width — invisible in rendering but affecting text b
- [Non-Breaking Space](/glossary/non-breaking-space/) — U+00A0. A space that prevents line breaking at its position. HTML: &nbsp;. Used 
- [Combining Character](/glossary/combining-character/) — A character that attaches to the preceding base character to modify it. General 
- [Dash](/glossary/dash/) — Punctuation marks used to separate parts of a sentence or indicate ranges. Unico
- [Quotation Mark](/glossary/quotation-mark/) — Paired punctuation marks enclosing direct speech or quotations. Unicode includes
- [Ellipsis](/glossary/ellipsis/) — U+2026 HORIZONTAL ELLIPSIS (…). A single character replacing three periods, typo
- [Em / En (Typographic Units)](/glossary/em-en/) — Em: a width equal to the font size. En: half an em. Used to define em dash width
- [RTL (Right-to-Left)](/glossary/rtl/) — Text directionality where characters flow from right to left. Used by Arabic, He
- [Kerning](/glossary/kerning/) — Adjusting the spacing between specific character pairs for visual harmony (e.g.,
- [Small Caps](/glossary/small-caps/) — Uppercase letterforms at the height of lowercase letters. CSS: font-variant: sma
- [Input Method Editor (IME)](/glossary/ime/) — Software component enabling input of complex characters (Chinese, Japanese, Korean, etc.) usin
- [Dead Key](/glossary/dead-key/) — A key that produces no output immediately but modifies the next keystroke. Used 
- [Compose Key](/glossary/compose-key/) — A key (usually Right Alt or custom-mapped) that starts a multi-key composition s
- [Character Map](/glossary/character-map/) — GUI utility for browsing and inserting Unicode characters. Windows: charmap.exe.
- [Alt Code](/glossary/alt-code/) — Windows input method using Alt + numpad digits to type characters by their code 
- [Hex Input](/glossary/hex-input/) — Direct Unicode code point entry by typing the hex value. Mac: hold Option + hex 
- [Unicode Input Method](/glossary/unicode-input-method/) — Any method for entering characters by their Unicode code point: hex input (Mac),
- [Character Picker](/glossary/character-picker/) — UI component (native or web-based) for browsing and selecting characters visuall
- [HTML Entity](/glossary/html-entity/) — A textual representation of a character in HTML. Three forms: named (&amp;), dec
- [Named Character Reference](/glossary/named-character-reference/) — HTML entity using a human-readable name: &copy; → ©, &mdash; → —. HTML5 defines 
- [Numeric Character Reference](/glossary/numeric-character-reference/) — HTML entity using the Unicode code point number: decimal (&#169; → ©) or hexadec
- [CSS Content Property](/glossary/css-content-property/) — CSS property inserting generated content via ::before and ::after pseudo-element
- [Percent-Encoding (URL Encoding)](/glossary/url-encoding/) — Encoding non-ASCII and reserved characters in URLs by replacing each byte with %
- [Punycode](/glossary/punycode/) — ASCII-compatible encoding of Unicode domain names, converting internationalized 
- [Internationalized Domain Name (IDN)](/glossary/idn/) — Domain names containing non-ASCII Unicode characters, internally stored as Punyc
- [Content-Type Charset](/glossary/content-type-charset/) — HTTP header parameter declaring the character encoding of a response (Content-Ty
- [Variation Selector](/glossary/variation-selector/) — Characters (U+FE00–U+FE0F, U+E0100–U+E01EF) that select a specific glyph variant
- [Emoji Presentation](/glossary/emoji-presentation/) — Rendering a character with a colorful emoji glyph, typically using Variation Sel
- [Word Joiner](/glossary/word-joiner/) — U+2060. A zero-width character that prevents line breaking. The modern replaceme
- [XML Character Reference](/glossary/xml-character-reference/) — XML's version of numeric character references: &#x2713; or &#10003;. XML has onl
- [String](/glossary/string/) — A sequence of characters in a programming language. Internal representation vari
- [Surrogate Pair](/glossary/surrogate-pair/) — Two 16-bit code units (a high surrogate U+D800–U+DBFF + low surrogate U+DC00–U+D
- [Unicode Escape Sequence](/glossary/unicode-escape-sequence/) — Syntax for representing Unicode characters in source code. Varies by language: \
- [Unicode Regular Expression](/glossary/unicode-regex/) — Regex patterns using Unicode properties: \p{L} (any letter), \p{Script=Greek} (G
- [String Length Ambiguity](/glossary/string-length/) — The 'length' of a Unicode string depends on the unit: code units (JavaScript .le
- [Mojibake](/glossary/mojibake/) — Garbled text resulting from decoding bytes with the wrong encoding. Japanese ter
- [Replacement Character](/glossary/replacement-character/) — U+FFFD (�). Displayed when a decoder encounters invalid byte sequences — the uni
- [Invisible Character](/glossary/invisible-character/) — Any character with no visible glyph: whitespace, zero-width characters, control 
- [Encoding / Decoding](/glossary/encoding-decoding/) — Encoding converts characters to bytes (str.encode('utf-8')); decoding converts b
- [Null Character](/glossary/null-character/) — U+0000 (NUL). The first Unicode/ASCII character, used as a string terminator in 
- [Homoglyph](/glossary/homoglyph/) — Characters from different scripts that look identical or very similar, such as L
- [Confusable](/glossary/confusable/) — Unicode's official term for character pairs that can be visually confused, defin
- [IDN Homograph Attack](/glossary/idn-homograph-attack/) — Using visually similar Unicode characters in domain names to impersonate legitim
- [Bidi Override Attack](/glossary/bidi-override/) — Using Unicode bidirectional override characters (U+202A–U+202E, U+2066–U+2069) t
- [Zero Width Joiner (ZWJ)](/glossary/zwj/) — U+200D. Requests that adjacent characters be joined. Critical for emoji sequence
- [Zero Width Non-Joiner (ZWNJ)](/glossary/zwnj/) — U+200C. Prevents joining of adjacent characters. Essential in Persian/Arabic for
- [Unicode Spoofing](/glossary/unicode-spoofing/) — Using Unicode features to deceive users: homoglyphs for fake domains, bidi overr
- [Mixed-Script Detection](/glossary/mixed-script-detection/) — Identifying text that mixes characters from different scripts (e.g., Latin + Cyr
- [Emoji](/glossary/emoji/) — Pictographic Unicode characters originating from Japanese mobile phones. Now 3,7
- [Emoji Modifier (Skin Tone)](/glossary/emoji-modifier/) — Fitzpatrick scale skin tone modifiers (U+1F3FB–U+1F3FF) that change the skin col
- [Emoji ZWJ Sequence](/glossary/emoji-zwj-sequence/) — Emoji constructed by joining multiple emoji with Zero Width Joiner (U+200D). 👨‍👩
- [Regional Indicator](/glossary/regional-indicator/) — 26 characters (U+1F1E6–U+1F1FF, 🇦–🇿) that combine in pairs to form country flag 
- [Control Character](/glossary/control-character/) — Non-printing characters that control text processing. C0 (U+0000–U+001F): NUL, T
- [CJK](/glossary/cjk/) — Chinese, Japanese, and Korean — the collective term for the unified Han ideograp
- [Text Presentation](/glossary/text-presentation/) — Rendering a character with a plain monochrome text glyph rather than a colorful 
- [Punctuation](/glossary/punctuation/) — Characters used to organize and clarify written language: periods, commas, dashe
- [Han Unification](/glossary/han-unification/) — The process of mapping Chinese, Japanese, and Korean ideographs that share a com
- [Hangul Jamo](/glossary/hangul-jamo/) — The individual consonant and vowel components (jamo) of the Korean Hangul writin
- [Unicode Technical Report (UTR)](/glossary/unicode-technical-report/) — Informational documents published by the Unicode Consortium covering specific to
- [Unicode Standard Annex (UAX)](/glossary/unicode-standard-annex/) — Normative or informative documents that are integral parts of the Unicode Standa
- [Base64](/glossary/base64/) — Binary-to-text encoding that represents binary data using 64 ASCII characters (A
- [ASCII Art](/glossary/ascii-art/) — Visual art created from text characters, originally limited to the 95 printable 
- [East Asian Width](/glossary/east-asian-width/) — Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Hal
- [Script Extensions](/glossary/script-extensions/) — Unicode property listing all scripts that use a character, broader than the sing
- [Joining Type](/glossary/joining-type/) — Unicode property controlling how Arabic and Syriac characters connect to adjacen
- [Font Fallback](/glossary/font-fallback/) — The mechanism by which a rendering engine substitutes glyphs from a secondary fo
- [OpenType](/glossary/opentype/) — Modern font format developed by Microsoft and Adobe supporting up to 65,535 glyp
- [CSS unicode-range](/glossary/unicode-range-css/) — CSS @font-face descriptor specifying which Unicode code points a font should cov
- [Web Fonts](/glossary/web-fonts/) — Fonts downloaded by the browser to render text, declared via CSS @font-face. WOF
- [Case Folding](/glossary/case-folding/) — Mapping characters to a common case form for case-insensitive comparison. More c
- [Grapheme Cluster Boundary](/glossary/grapheme-break/) — Rules (UAX#29) for determining where one user-perceived character ends and anoth
- [String Comparison](/glossary/string-comparison/) — Comparing Unicode strings requires normalization (NFC/NFD) and optionally collat
- [JavaScript Intl API](/glossary/intl-api/) — ECMAScript Internationalization API providing locale-aware string comparison (Co
- [Unicode in CSS](/glossary/unicode-in-css/) — CSS supports Unicode via escape sequences (\2713 for ✓), the content property fo
- [CSS Text Direction](/glossary/text-direction-css/) — CSS properties (direction, writing-mode, unicode-bidi) controlling text layout d
- [Python Unicode](/glossary/python-unicode/) — Python 3 uses Unicode strings by default (str is a sequence of code points; CPython stores it as Latin-1, UCS-2, or UCS-4 per PEP 393). K
- [Java Unicode](/glossary/java-unicode/) — Java strings use UTF-16 internally. char is 16-bit (only BMP). For supplementary
- [Rust Unicode](/glossary/rust-unicode/) — Rust strings (str/String) are guaranteed valid UTF-8. char type represents a Uni
- [Character Palette](/glossary/character-palette/) — A system-level tool for browsing and inserting Unicode characters. macOS Charact
- [Bidi Text Attack](/glossary/bidi-attack/) — Exploiting Unicode bidirectional control characters to disguise malicious code o
- [Normalization Attack](/glossary/unicode-normalization-attack/) — Exploiting Unicode normalization to bypass security filters. Input validated bef
- [Emoji Sequences](/glossary/emoji-sequences/) — Multi-character emoji constructed by combining base emoji with modifiers, ZWJ ch
- [Emoji Skin Tone](/glossary/emoji-skin-tone/) — Five Fitzpatrick scale modifiers (U+1F3FB–U+1F3FF, 🏻–🏿) that change human emoji 

## Guides

- [What is Unicode? A Complete Guide](/guide/what-is-unicode/)
- [UTF-8 Encoding Explained](/guide/utf-8-encoding-explained/)
- [UTF-8 vs UTF-16 vs UTF-32: When to Use Each](/guide/utf-8-vs-utf-16-vs-utf-32/)
- [What is a Unicode Code Point?](/guide/what-is-code-point/)
- [Unicode Planes and the BMP](/guide/unicode-planes-guide/)
- [Understanding Byte Order Mark (BOM)](/guide/byte-order-mark-guide/)
- [Surrogate Pairs Explained](/guide/surrogate-pairs-explained/)
- [ASCII to Unicode: The Evolution of Character Encoding](/guide/ascii-to-unicode/)
- [Unicode Normalization: NFC, NFD, NFKC, NFKD](/guide/unicode-normalization-guide/)
- [The Unicode Bidirectional Algorithm](/guide/unicode-bidirectional-algorithm/)
- [Unicode General Categories Explained](/guide/unicode-general-categories/)
- [Understanding Unicode Blocks](/guide/understanding-unicode-blocks/)
- [Unicode Scripts: How Writing Systems are Organized](/guide/unicode-scripts-guide/)
- [What are Combining Characters?](/guide/combining-characters-guide/)
- [Grapheme Clusters vs Code Points](/guide/grapheme-clusters-vs-code-points/)
- [Unicode Confusables: A Security Guide](/guide/unicode-confusables-guide/)
- [Zero Width Characters: What They Are and Why They Matter](/guide/zero-width-characters-guide/)
- [Unicode Whitespace Characters Guide](/guide/unicode-whitespace-guide/)
- [History of Unicode](/guide/history-of-unicode/)
- [Unicode Versions Timeline](/guide/unicode-versions-timeline/)
- [Unicode in Python](/guide/unicode-in-python/)
- [Unicode in JavaScript](/guide/unicode-in-javascript/)
- [Unicode in Java](/guide/unicode-in-java/)
- [Unicode in Go](/guide/unicode-in-go/)
- [Unicode in Rust](/guide/unicode-in-rust/)
- [Unicode in C/C++](/guide/unicode-in-c-cpp/)
- [Unicode in Ruby](/guide/unicode-in-ruby/)
- [Unicode in PHP](/guide/unicode-in-php/)
- [Unicode in Swift](/guide/unicode-in-swift/)
- [Unicode in HTML & CSS](/guide/unicode-in-html-css/)
- [Unicode in Regular Expressions](/guide/unicode-in-regular-expressions/)
- [Unicode in SQL](/guide/unicode-in-sql/)
- [Unicode in URLs](/guide/unicode-in-urls/)
- [Unicode Escape Sequences: Cross-Language Reference](/guide/unicode-escape-sequences-guide/)
- [How to Handle Unicode in APIs and JSON](/guide/unicode-in-json-api/)
- [Complete Arrow Symbols List](/guide/arrow-symbols-guide/)
- [All Check Mark and Tick Symbols](/guide/check-mark-symbols-guide/)
- [Star and Asterisk Symbols](/guide/star-symbols-guide/)
- [Heart Symbols Complete Guide](/guide/heart-symbols-guide/)
- [Currency Symbols Around the World](/guide/currency-symbols-guide/)
- [Mathematical Symbols and Operators](/guide/math-symbols-guide/)
- [Bracket and Parenthesis Symbols](/guide/bracket-symbols-guide/)
- [Bullet Point Symbols](/guide/bullet-point-symbols-guide/)
- [Line and Box Drawing Characters](/guide/box-drawing-guide/)
- [Musical Note Symbols](/guide/musical-note-symbols-guide/)
- [Fraction Symbols Guide](/guide/fraction-symbols-guide/)
- [Superscript and Subscript Characters](/guide/superscript-subscript-guide/)
- [Circle Symbols](/guide/circle-symbols-guide/)
- [Square and Rectangle Symbols](/guide/square-symbols-guide/)
- [Triangle Symbols](/guide/triangle-symbols-guide/)
- [Diamond Symbols](/guide/diamond-symbols-guide/)
- [Cross and X Mark Symbols](/guide/cross-x-mark-guide/)
- [Dash and Hyphen Symbols Guide](/guide/dash-hyphen-guide/)
- [Quotation Mark Symbols Complete Guide](/guide/quotation-marks-guide/)
- [Copyright, Trademark & Legal Symbols](/guide/legal-symbols-guide/)
- [Degree and Temperature Symbols](/guide/temperature-symbols-guide/)
- [Circled and Enclosed Number Symbols](/guide/enclosed-numbers-guide/)
- [Roman Numeral Symbols](/guide/roman-numerals-guide/)
- [Greek Alphabet Symbols for Math and Science](/guide/greek-alphabet-guide/)
- [Decorative Dingbats](/guide/dingbats-guide/)
- [Playing Card Symbols](/guide/playing-card-guide/)
- [Chess Piece Symbols](/guide/chess-symbols-guide/)
- [Zodiac and Astrological Symbols](/guide/zodiac-symbols-guide/)
- [Braille Pattern Characters](/guide/braille-patterns-guide/)
- [Geometric Shapes Complete Guide](/guide/geometric-shapes-guide/)
- [Letterlike Symbols](/guide/letterlike-symbols-guide/)
- [Technical Symbols Guide](/guide/technical-symbols-guide/)
- [Combining Characters and Diacritics Guide](/guide/diacritics-guide/)
- [Whitespace and Invisible Characters Guide](/guide/invisible-characters-guide/)
- [Warning and Hazard Signs](/guide/warning-symbols-guide/)
- [Weather Symbols Guide](/guide/weather-symbols-guide/)
- [Religious Symbols in Unicode](/guide/religious-symbols-guide/)
- [Gender and Identity Symbols](/guide/gender-symbols-guide/)
- [Keyboard Shortcut Symbols Guide](/guide/keyboard-symbols-guide/)
- [Symbols for Social Media Bios](/guide/social-media-symbols-guide/)
- [Basic Latin (ASCII) Block](/guide/basic-latin-block/)
- [Latin-1 Supplement Block](/guide/latin-1-supplement-block/)
- [General Punctuation Block](/guide/general-punctuation-block/)
- [Mathematical Operators Block](/guide/mathematical-operators-block/)
- [Arrows Block](/guide/arrows-block/)
- [Dingbats Block](/guide/dingbats-block/)
- [Miscellaneous Symbols Block](/guide/miscellaneous-symbols-block/)
- [CJK Unified Ideographs Overview](/guide/cjk-unified-ideographs/)
- [Hangul Block](/guide/hangul-block/)
- [Emoji Blocks Overview](/guide/emoji-blocks-guide/)
- [Currency Symbols Block](/guide/currency-block/)
- [Box Drawing & Block Elements Blocks](/guide/box-drawing-block-elements/)
- [Enclosed Alphanumerics Block](/guide/enclosed-alphanumerics-block/)
- [Geometric Shapes Blocks](/guide/geometric-shapes-block/)
- [Musical Symbols Block](/guide/musical-symbols-block/)
- [Arabic Script Deep Dive](/guide/arabic-script-guide/)
- [Devanagari Script Deep Dive](/guide/devanagari-script-guide/)
- [Greek and Coptic](/guide/greek-coptic-guide/)
- [Cyrillic Script](/guide/cyrillic-script-guide/)
- [Hebrew Script](/guide/hebrew-script-guide/)
- [Thai Script](/guide/thai-script-guide/)
- [Japanese Writing Systems](/guide/japanese-writing-guide/)
- [Korean Hangul System](/guide/hangul-system-guide/)
- [Bengali Script](/guide/bengali-script-guide/)
- [Tamil Script](/guide/tamil-script-guide/)
- [Armenian Script](/guide/armenian-script-guide/)
- [Georgian Script](/guide/georgian-script-guide/)
- [Ethiopic Script](/guide/ethiopic-script-guide/)
- [Dead Scripts in Unicode](/guide/dead-scripts-unicode/)
- [Writing Systems of the World](/guide/writing-systems-overview/)
- [How to Type Special Characters on Windows](/guide/type-special-chars-windows/)
- [How to Type Special Characters on Mac](/guide/type-special-chars-mac/)
- [How to Type Special Characters on Linux](/guide/type-special-chars-linux/)
- [Special Characters on Mobile (iOS/Android)](/guide/type-special-chars-mobile/)
- [How to Fix Mojibake (Garbled Text)](/guide/fix-mojibake-guide/)
- [Unicode in Databases](/guide/unicode-in-databases/)
- [Unicode in Filenames](/guide/unicode-in-filenames/)
- [Unicode in Email](/guide/unicode-in-email/)
- [Unicode in Domain Names (IDN)](/guide/unicode-domain-names/)
- [Unicode for Accessibility](/guide/unicode-accessibility/)
- [Unicode Text Direction: LTR vs RTL](/guide/unicode-text-direction/)
- [Unicode Fonts: How Characters Get Rendered](/guide/unicode-fonts-guide/)
- [How to Find Any Unicode Character](/guide/find-unicode-character/)
- [Unicode Copy and Paste Best Practices](/guide/unicode-copy-paste/)
- [How to Create Fancy Text with Unicode](/guide/fancy-text-guide/)
- [Unicode in Microsoft Word](/guide/unicode-in-word/)
- [Unicode in Google Docs & Sheets](/guide/unicode-in-google-docs/)
- [Unicode in Terminal / Command Line](/guide/unicode-in-terminal/)
- [Unicode in PDF Documents](/guide/unicode-in-pdf/)
- [Unicode in Excel](/guide/unicode-in-excel/)
- [Unicode in Social Media](/guide/unicode-social-media/)
- [Unicode in XML and JSON](/guide/unicode-in-xml-json/)
- [Unicode in Data Science and NLP](/guide/unicode-data-science/)
- [Unicode in QR Codes](/guide/unicode-in-qr-codes/)
- [Unicode in Passwords: Security Implications](/guide/unicode-in-passwords/)
- [The Birth of ASCII (1963)](/guide/birth-of-ascii/)
- [EBCDIC: IBM's Alternative](/guide/ebcdic-history/)
- [The Unicode Consortium: Who Decides?](/guide/unicode-consortium-guide/)
- [How New Characters Get Added to Unicode](/guide/unicode-proposal-process/)
- [The Emoji Proposal Process](/guide/emoji-proposal-process/)
- [CJK Unification: Controversy and Compromise](/guide/cjk-unification-controversy/)
- [The Mojibake Problem: A History](/guide/mojibake-history/)
- [Unicode Milestones](/guide/unicode-milestones/)
- [How Unicode Changed the Internet](/guide/unicode-changed-internet/)
- [Fun Unicode Facts and Easter Eggs](/guide/unicode-fun-facts/)
- [Unicode Security Overview](/guide/unicode-security-guide/)
- [IDN Homograph Attack Detection](/guide/idn-homograph-detection/)
- [Invisible Character Detection and Removal](/guide/invisible-char-detection/)
- [Unicode in Passwords and Authentication](/guide/unicode-authentication/)
- [Preventing Unicode-based Phishing](/guide/unicode-phishing-prevention/)
- [Unicode Collation: Sorting Text Correctly](/guide/unicode-collation-guide/)
- [ICU Library: International Components for Unicode](/guide/icu-library-guide/)
- [The Future of Unicode: What Comes After 16.0?](/guide/future-of-unicode/)
- [Unicode in Compilers and Programming Language Design](/guide/unicode-in-compilers/)
- [Unicode Normalization Performance: Benchmarks](/guide/normalization-performance/)

## Signature Series

### The Unicode Odyssey

- Ch. 1: [The Problem: Why We Need Unicode](/series/unicode-odyssey/the-problem/)
- Ch. 2: [The Solution: How Unicode Works](/series/unicode-odyssey/how-unicode-works/)
- Ch. 3: [Encoding the Codepoints: UTF-8, UTF-16, UTF-32](/series/unicode-odyssey/encoding-the-codepoints/)
- Ch. 4: [Characters Are Not What You Think](/series/unicode-odyssey/characters-are-not-what-you-think/)
- Ch. 5: [The World's Writing Systems in Unicode](/series/unicode-odyssey/writing-systems-in-unicode/)
- Ch. 6: [Unicode in Your Programming Language](/series/unicode-odyssey/unicode-in-your-language/)
- Ch. 7: [Normalization: When Equal Isn't Equal](/series/unicode-odyssey/normalization/)
- Ch. 8: [Security: The Dark Side of Unicode](/series/unicode-odyssey/security-dark-side/)
- Ch. 9: [Unicode on the Web: HTML, CSS, and Beyond](/series/unicode-odyssey/unicode-on-the-web/)
- Ch. 10: [The Future of Unicode](/series/unicode-odyssey/the-future/)

### Writing Systems of the World

- Ch. 1: [The Latin Alphabet: From Rome to the Internet](/series/writing-systems/latin-alphabet/)
- Ch. 2: [The Arabic Script: Right-to-Left and Beyond](/series/writing-systems/arabic-script/)
- Ch. 3: [Chinese Characters: 20,000 Years of Writing](/series/writing-systems/chinese-characters/)
- Ch. 4: [The Korean Hangul: An Alphabet Designed by a King](/series/writing-systems/korean-hangul/)
- Ch. 5: [Devanagari and the Indic Scripts](/series/writing-systems/devanagari-indic/)
- Ch. 6: [Japanese: Three Scripts in One](/series/writing-systems/japanese-three-scripts/)
- Ch. 7: [The Greek Alphabet: From Philosophy to Physics](/series/writing-systems/greek-alphabet/)
- Ch. 8: [Cyrillic: The Script That Spans Continents](/series/writing-systems/cyrillic-script/)
- Ch. 9: [Hebrew: Ancient Script in the Digital Age](/series/writing-systems/hebrew-script/)
- Ch. 10: [Thai, Khmer, and the Southeast Asian Scripts](/series/writing-systems/southeast-asian-scripts/)
- Ch. 11: [Ge'ez (Ethiopic): Africa's Ancient Writing System](/series/writing-systems/ethiopic-geez/)
- Ch. 12: [The Endangered Scripts: Preserving Languages Through Unicode](/series/writing-systems/endangered-scripts/)

### The Developer's Unicode Handbook

- Ch. 1: [String Length Is a Lie](/series/developers-handbook/string-length-is-a-lie/)
- Ch. 2: [The Encoding Minefield](/series/developers-handbook/encoding-minefield/)
- Ch. 3: [Comparison and Sorting](/series/developers-handbook/comparison-and-sorting/)
- Ch. 4: [Search That Actually Works](/series/developers-handbook/search-that-works/)
- Ch. 5: [Input Validation Done Right](/series/developers-handbook/input-validation/)
- Ch. 6: [Rendering Complex Scripts](/series/developers-handbook/rendering-complex-scripts/)
- Ch. 7: [Security Hardening](/series/developers-handbook/security-hardening/)
- Ch. 8: [Testing Unicode](/series/developers-handbook/testing-unicode/)

### The Encoding Wars

- Ch. 1: [Morse, Baudot, and the First Codes](/series/encoding-wars/morse-baudot-first-codes/)
- Ch. 2: [ASCII: 128 Characters That Changed the World](/series/encoding-wars/ascii-128-characters/)
- Ch. 3: [The Code Page Explosion](/series/encoding-wars/code-page-explosion/)
- Ch. 4: [The Unicode Vision](/series/encoding-wars/unicode-vision/)
- Ch. 5: [UTF-8: The Encoding That Won](/series/encoding-wars/utf-8-encoding-that-won/)
- Ch. 6: [Emoji: When Characters Became Culture](/series/encoding-wars/emoji-characters-became-culture/)
- Ch. 7: [Unicode Today and Tomorrow](/series/encoding-wars/unicode-today-tomorrow/)

### Unicode for the Modern Web

- Ch. 1: [HTML and Unicode: Entities, Escapes, and Encoding](/series/modern-web/html-and-unicode/)
- Ch. 2: [CSS and Unicode: Beyond content: ""](/series/modern-web/css-and-unicode/)
- Ch. 3: [JavaScript Strings: The UTF-16 Legacy](/series/modern-web/javascript-strings/)
- Ch. 4: [APIs and Unicode: JSON, URLs, and Headers](/series/modern-web/apis-and-unicode/)
- Ch. 5: [Databases and Unicode: Collation Matters](/series/modern-web/databases-and-unicode/)
- Ch. 6: [Fonts and Rendering: Making It Look Right](/series/modern-web/fonts-and-rendering/)
- Ch. 7: [Internationalization: i18n Best Practices](/series/modern-web/internationalization/)

## i18n

All pages are available in 15 languages. To select a language, prefix the URL path with its language code:
- /ko/ (Korean), /ja/ (Japanese), /zh-hans/ (Chinese Simplified)
- /es/ (Spanish), /pt/ (Portuguese), /hi/ (Hindi), /ar/ (Arabic)
- /fr/ (French), /ru/ (Russian), /de/ (German), /tr/ (Turkish)
- /vi/ (Vietnamese), /id/ (Indonesian), /th/ (Thai)