
General Punctuation Block

The General Punctuation block (U+2000–U+206F) contains typographic spaces, dashes, quotation marks, and various punctuation characters used in professional typography across many languages. This guide walks through the block's main character groups and explains where each one is used.


The General Punctuation block (U+2000–U+206F) contains 112 code points dedicated to typographic and formatting characters that go far beyond the basic punctuation in ASCII. The block includes a rich variety of space characters, dashes, quotation marks, and invisible formatting controls — tools that professional typesetters and internationalization engineers depend on daily.

Space Characters (U+2000–U+200B)

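Unicode defines many different space characters. For most of them, the standard gives a nominal width as a fraction of an em, a unit equal to the current font size: in a 12pt font, an em is 12pt. A few spaces are instead defined by the width of another glyph, such as a digit or a period, and the exact widths always depend on the font.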

Code point  Name                  Nominal width
U+2000      EN QUAD               1 en (= ½ em); canonically equivalent to U+2002
U+2001      EM QUAD               1 em; canonically equivalent to U+2003
U+2002      EN SPACE              1 en (= ½ em)
U+2003      EM SPACE              1 em
U+2004      THREE-PER-EM SPACE    ⅓ em
U+2005      FOUR-PER-EM SPACE     ¼ em
U+2006      SIX-PER-EM SPACE      ⅙ em
U+2007      FIGURE SPACE          Width of a digit in fonts with tabular figures
U+2008      PUNCTUATION SPACE     Width of a period
U+2009      THIN SPACE            ⅕ em (sometimes ⅙ em)
U+200A      HAIR SPACE            Narrower than a thin space
U+200B      ZERO WIDTH SPACE      Zero width; marks a word boundary or line-break opportunity
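Because an em equals the font size, the nominal widths above convert directly into points. The Python sketch below does that arithmetic with exact fractions and prints each character's official name via the standard library's unicodedata module. The EM_FRACTIONS table and the nominal_width_pt helper are hypothetical names for illustration, and real fonts may deviate from these nominal values.

```python
from fractions import Fraction
import unicodedata

# Hypothetical lookup table: nominal width of each space as a fraction of an em.
# FIGURE SPACE and PUNCTUATION SPACE are left out because their width comes
# from the font's digit and period glyphs; HAIR SPACE has no fixed fraction.
EM_FRACTIONS = {
    "\u2000": Fraction(1, 2),  # EN QUAD
    "\u2001": Fraction(1),     # EM QUAD
    "\u2002": Fraction(1, 2),  # EN SPACE
    "\u2003": Fraction(1),     # EM SPACE
    "\u2004": Fraction(1, 3),  # THREE-PER-EM SPACE
    "\u2005": Fraction(1, 4),  # FOUR-PER-EM SPACE
    "\u2006": Fraction(1, 6),  # SIX-PER-EM SPACE
    "\u2009": Fraction(1, 5),  # THIN SPACE (some fonts use 1/6 em)
    "\u200b": Fraction(0),     # ZERO WIDTH SPACE
}


def nominal_width_pt(space: str, font_size_pt: float) -> Fraction:
    """Nominal width of `space` in points; one em equals the font size."""
    return EM_FRACTIONS[space] * Fraction(font_size_pt)


if __name__ == "__main__":
    for ch in EM_FRACTIONS:
        # e.g. THREE-PER-EM SPACE at 12pt is 4 pt, THIN SPACE is 12/5 pt
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<20} {nominal_width_pt(ch, 12)} pt at 12pt")
```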

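The em space (U+2003) is used in high-quality typography for paragraph indentation and, in some styles, after colons. The thin space (U+2009) appears between a number and its unit in SI style ("100 kg", "37.5 °C") and, in French typography, before certain punctuation marks and inside guillemets (« Bonjour ! »). Because U+2009 allows a line break, U+202F NARROW NO-BREAK SPACE, also in this block, is often the better choice in both roles: it keeps a number with its unit and a mark with its word.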
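The French convention lends itself to automation. This Python sketch inserts U+202F NARROW NO-BREAK SPACE, the non-breaking counterpart of the thin space, before ; ! ? and inside guillemets, so a mark never wraps away from its word. The function name and the choice of marks are assumptions for illustration; the colon is deliberately left alone because house styles differ on its spacing.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def french_spacing(text: str) -> str:
    """Apply narrow no-break spaces around French high punctuation (a sketch)."""
    # Before ; ! ?: collapse any existing whitespace into one NNBSP. The
    # lookbehind skips a mark that directly follows another, as in "?!".
    text = re.sub(r"(?<=[^\s;!?])\s*([;!?])", NNBSP + r"\1", text)
    # Inside guillemets: exactly one NNBSP after « and before ».
    text = re.sub(r"«\s*", "«" + NNBSP, text)
    text = re.sub(r"\s*»", NNBSP + "»", text)
    return text


print(repr(french_spacing("« Bonjour ! »")))
```

Because Python's `\s` also matches U+202F, running the function twice gives the same result as running it once.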

The figure space (U+2007) is invaluable in financial tables: in fonts with tabular (fixed-width) numerals it is exactly as wide as a digit, so padding shorter numbers with it keeps digits and decimal points aligned in a column without switching to a monospace font. Unlike most of the spaces above, it is also non-breaking.
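Padding with the figure space is easy to automate. The sketch below right-aligns numeric strings by left-padding them with U+2007. It assumes every value has the same number of decimal places and no thousands separators or minus signs, since those glyphs need not be as wide as a digit; the function name align_figures is hypothetical.

```python
FIGURE_SPACE = "\u2007"  # as wide as a digit in fonts with tabular figures


def align_figures(values: list[str]) -> list[str]:
    """Right-align numeric strings by left-padding them with FIGURE SPACE.

    Assumes all values share the same number of decimal places, so once the
    integer parts are padded to equal width the decimal points line up.
    """
    width = max(len(v) for v in values)
    return [v.rjust(width, FIGURE_SPACE) for v in values]


for line in align_figures(["1.50", "123.45", "42.00"]):
    print(line)
```

The columns line up only if the rendering font actually uses tabular digits; with proportional digits the padding is as uneven as the digits themselves.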

Dashes and Hyphens

ASCII provides only a single hyphen-minus (U+002D -), but professional typography distinguishes several dashes, each with its own Unicode character:

Code point  Character  Name                 Use
U+2010      ‐          HYPHEN               Unambiguous hyphen (never read as a minus, unlike U+002D); allows a line break after it
U+2011      ‑          NON-BREAKING HYPHEN  Hyphen that prevents a line break at that point
U+2012      ‒          FIGURE DASH          Same width as a digit; used within numbers, such as phone numbers
U+2013      –          EN DASH              Ranges ("2019–2024"), spaced parenthetical phrases
U+2014      —          EM DASH              Strong parenthetical—like this—in American style
U+2015      ―          HORIZONTAL BAR       Quotation dash introducing dialogue in some European typographic styles

The en dash (U+2013) is used in English for number ranges ("pages 12–45"), for compound adjectives in which one element is itself an open compound ("post–World War II"), and for scores ("Arsenal 3–1 Chelsea"). The em dash (U+2014) sets off a parenthetical phrase or stands in for a colon. This use is particularly common in American English; British and many European styles prefer a spaced en dash ( – ) for the same purpose.
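Text typed on a keyboard usually contains a hyphen-minus where an en dash belongs. As a rough cleanup step, the Python sketch below replaces a hyphen-minus between two digits with U+2013. It is a heuristic, and the function name is hypothetical: it will also rewrite ISO dates, subtractions, and phone numbers (which call for FIGURE DASH instead), so its output needs review.

```python
import re

EN_DASH = "\u2013"


def dash_numeric_ranges(text: str) -> str:
    """Replace a hyphen-minus that sits between two digits with an EN DASH.

    Heuristic only: strings such as "2024-01-05" or "555-0100" are rewritten too.
    """
    return re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)


print(dash_numeric_ranges("See pages 12-45; final score 3-1."))
```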

Quotation Marks

Basic Latin's straight quotation marks (" and ') are typewriter artifacts. Professional typography uses curly quotes, which Unicode supplies:

Code points      Characters  Usage
U+2018 / U+2019  ‘ ’         Single quotes: primary in British English, secondary in American English; U+2019 doubles as the apostrophe, and U+2018 closes German/Czech secondary quotes (‚…‘)
U+201A / U+201B  ‚ ‛         Single low-9 (German/Czech secondary opening); single high-reversed-9
U+201C / U+201D  “ ”         Double quotes: primary in American English
U+201E / U+201F  „ ‟         Double low-9 (German/Polish opening); double high-reversed-9
U+2039 / U+203A  ‹ ›         Single angle quotation marks (French secondary)

Different languages use different conventions: German opens with a low mark and closes with a high one („Text“), French uses spaced guillemets (« texte »), American English uses curly double quotes (“text”), and Swedish and Finnish close with the same mark they open with (”text”).
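One practical consequence is that quotation marks belong in locale data rather than in hard-coded strings. The sketch below wraps text in a language's primary double quotes. The QUOTES table is a small, hypothetical selection, not a complete locale database; production code would draw on a source such as CLDR. The French entry uses « » (U+00AB/U+00BB, which live in the Latin-1 Supplement block) with U+202F NARROW NO-BREAK SPACE inside, following common usage in France.

```python
# Hypothetical table of primary (opening, closing) quotation marks per language.
QUOTES = {
    "en-US": ("\u201c", "\u201d"),           # “text”
    "de": ("\u201e", "\u201c"),              # „text“
    "pl": ("\u201e", "\u201d"),              # „text”
    "sv": ("\u201d", "\u201d"),              # ”text”, same mark on both sides
    "fr": ("\u00ab\u202f", "\u202f\u00bb"),  # « text » with narrow no-break spaces
}


def quote(text: str, lang: str) -> str:
    """Wrap text in the primary quotation marks for lang (KeyError if unknown)."""
    opening, closing = QUOTES[lang]
    return f"{opening}{text}{closing}"


for lang in QUOTES:
    print(lang, quote("Text", lang))
```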

Other Typographic Punctuation

  • U+2026 HORIZONTAL ELLIPSIS (…): a single character with font-designed dot spacing, often preferred over three periods (...); it has different metrics and line-breaking behavior, and NFKC normalization maps it back to three periods
  • U+2020 DAGGER (†) and U+2021 DOUBLE DAGGER (‡): footnote reference marks
  • U+2022 BULLET (•): the standard bullet for lists
  • U+2030 PER MILLE SIGN (‰): parts per thousand (0.1% = 1‰)
  • U+2031 PER TEN THOUSAND SIGN (‱): parts per ten thousand, used for basis points in finance (0.01% = 1‱)
  • U+2032 PRIME (′) and U+2033 DOUBLE PRIME (″): feet/minutes and inches/seconds in measurements
  • U+2038 CARET (‸): proofreader's insertion mark
  • U+203B REFERENCE MARK (※): widely used in East Asian text to mark notes or warnings
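The per mille and per ten thousand signs are simple scalings of a fraction: multiply by 1,000 or 10,000 and append the sign. A minimal sketch with hypothetical function names; the `:g` format drops trailing zeros and keeps six significant digits, which is an arbitrary presentation choice.

```python
PER_MILLE = "\u2030"         # ‰
PER_TEN_THOUSAND = "\u2031"  # ‱


def as_per_mille(fraction: float) -> str:
    """Format a fraction in parts per thousand, e.g. 0.001 -> '1‰'."""
    return f"{fraction * 1000:g}{PER_MILLE}"


def as_per_ten_thousand(fraction: float) -> str:
    """Format a fraction in parts per ten thousand (basis points), e.g. 0.0001 -> '1‱'."""
    return f"{fraction * 10000:g}{PER_TEN_THOUSAND}"


print(as_per_mille(0.0025), as_per_ten_thousand(0.0025))  # 2.5‰ 25‱
```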

Invisible Formatting Characters (U+200C–U+200F, U+2028–U+202F, U+2060–U+206F)

This block contains several zero-width and invisible characters that affect text rendering and directionality without adding visible glyphs:

  • U+200C ZERO WIDTH NON-JOINER (ZWNJ): prevents two adjacent characters from joining or forming a ligature. Essential in Persian orthography, and used in Arabic and Indic scripts wherever the default joining or conjunct behavior must be overridden.
  • U+200D ZERO WIDTH JOINER (ZWJ): requests the joining or ligature form of adjacent characters. It is also the basis of emoji ZWJ sequences: Man + ZWJ + Laptop = 👨‍💻
  • U+200E LEFT-TO-RIGHT MARK (LRM) and U+200F RIGHT-TO-LEFT MARK (RLM): invisible characters with a strong direction that influence the Unicode Bidirectional Algorithm. Used to fix the display of neutral characters, such as punctuation, in mixed LTR/RTL text.
  • U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR: Unicode's own unambiguous line and paragraph break characters, distinct from CR and LF. Before ECMAScript 2019, JavaScript treated them as line terminators that could not appear unescaped inside string literals, so valid JSON containing them caused syntax errors when embedded directly in JavaScript source, such as an inline HTML <script>.
  • U+2060 WORD JOINER: a zero-width character that prevents a line break between its neighbors without adding any space. Since Unicode 3.2 it replaces U+FEFF ZERO WIDTH NO-BREAK SPACE in this role, leaving U+FEFF to serve as the byte order mark.
  • U+2064 INVISIBLE PLUS: indicates an implied addition in mathematical notation, for example between the integer and fractional parts of a mixed number (3¼ meaning 3 + ¼), without rendering a visible sign.
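Because these characters are invisible, it helps to surface them when debugging. The Python sketch below uses the standard unicodedata module to tag every format character (general category Cf, which covers ZWNJ, ZWJ, LRM, RLM, WORD JOINER, and INVISIBLE PLUS) and the two separators (categories Zl and Zp) with its name. It also shows one assumed way to escape U+2028 and U+2029 when JSON is pasted into JavaScript source. The helper names reveal and script_safe_json are hypothetical.

```python
import json
import unicodedata

# General categories with no visible glyph: format controls (Cf) plus
# LINE SEPARATOR (Zl) and PARAGRAPH SEPARATOR (Zp).
INVISIBLE_CATEGORIES = {"Cf", "Zl", "Zp"}


def reveal(text: str) -> str:
    """Replace each invisible character with a visible <NAME> tag."""
    out = []
    for ch in text:
        if unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            out.append(f"<{unicodedata.name(ch, f'U+{ord(ch):04X}')}>")
        else:
            out.append(ch)
    return "".join(out)


def script_safe_json(obj) -> str:
    """Serialize obj to JSON, escaping U+2028/U+2029 for pre-ES2019 JavaScript."""
    # ensure_ascii=False keeps other non-ASCII text readable, but then
    # json.dumps leaves U+2028 and U+2029 unescaped, so escape them by hand.
    s = json.dumps(obj, ensure_ascii=False)
    return s.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


technologist = "\U0001F468\u200D\U0001F4BB"  # MAN + ZWJ + PERSONAL COMPUTER
print([f"U+{ord(c):04X}" for c in technologist])  # ['U+1F468', 'U+200D', 'U+1F4BB']
print(reveal(technologist))
print(script_safe_json({"note": "line one\u2028line two"}))
```

With the default `ensure_ascii=True`, Python's `json.dumps` already escapes every non-ASCII character, including these two; the manual replacement matters only when that option is turned off. Python's own `str.splitlines()` also treats U+2028 and U+2029 as line boundaries.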

Security Considerations

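Several characters in this block have been used in security attacks. Zero-width characters can make two strings look identical while their code points differ: useful for watermarking text, but dangerous when it enables spoofing. Bidirectional controls are riskier still. LRM and RLM change how mixed-direction text is displayed, and U+202E RIGHT-TO-LEFT OVERRIDE can make a filename stored as "report[U+202E]fdp.exe" display as "reportexe.pdf". Security-sensitive contexts such as usernames, domain names, and code identifiers should therefore restrict these characters. Normalization alone is not enough: NFKC folds the typographic spaces U+2000–U+200A into an ordinary space but leaves zero-width characters and bidirectional controls untouched, so those must be stripped or rejected explicitly.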
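A minimal identifier check makes the normalization point concrete. The sketch below applies NFKC and then rejects any remaining character in the invisible or whitespace categories. The function name and the exact set of rejected categories are assumptions of this example; a real policy (for instance, the IDNA rules for domain names) is considerably stricter.

```python
import unicodedata

# Assumed policy: after NFKC, reject format controls (Cf, which includes
# zero-width characters and bidirectional controls such as U+202E), other
# controls (Cc), any spaces (Zs), and line/paragraph separators (Zl, Zp).
REJECTED_CATEGORIES = {"Cf", "Cc", "Zs", "Zl", "Zp"}


def check_identifier(raw: str) -> str:
    """Return the NFKC-normalized identifier, or raise ValueError."""
    normalized = unicodedata.normalize("NFKC", raw)
    for ch in normalized:
        if unicodedata.category(ch) in REJECTED_CATEGORIES:
            name = unicodedata.name(ch, "unnamed")
            raise ValueError(f"U+{ord(ch):04X} {name} not allowed in {raw!r}")
    return normalized


for candidate in ["alice", "ali\u200bce", "report\u202efdp.exe"]:
    try:
        print("ok:", check_identifier(candidate))
    except ValueError as err:
        print("rejected:", err)
```

Printing `raw` with `!r` is deliberate: Python's `repr` escapes non-printable characters such as U+202E, so the error message cannot itself be visually reordered.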
