General Punctuation Block
The General Punctuation block (U+2000–U+206F) spans 112 code points dedicated to typographic and formatting characters that go well beyond the basic punctuation in ASCII: typographic spaces, dashes, quotation marks, assorted punctuation marks, and invisible formatting controls. Professional typesetters and internationalization engineers rely on these characters routinely. This guide walks through each character group in the block with usage context.
Space Characters (U+2000–U+200B)
Unicode defines many different space characters, each with a specific typographic width defined in terms of the em square. An "em" is a unit equal to the current font size; in a 12pt font, an em is 12pt.
| Code Point | Name | Width |
|---|---|---|
| U+2000 | EN QUAD | 1 en (= ½ em) |
| U+2001 | EM QUAD | 1 em |
| U+2002 | EN SPACE | 1 en (= ½ em) |
| U+2003 | EM SPACE | 1 em |
| U+2004 | THREE-PER-EM SPACE | ⅓ em |
| U+2005 | FOUR-PER-EM SPACE | ¼ em |
| U+2006 | SIX-PER-EM SPACE | ⅙ em |
| U+2007 | FIGURE SPACE | Same width as a digit (tabular figure spacing) |
| U+2008 | PUNCTUATION SPACE | Width of a period |
| U+2009 | THIN SPACE | ⅕ em (sometimes ⅙ em) |
| U+200A | HAIR SPACE | Narrower than thin space |
| U+200B | ZERO WIDTH SPACE | Zero width; marks word-break opportunity |
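The em space (U+2003) is used in high-quality typography for paragraph indentation and after colons in some styles. The thin space (U+2009) appears between a number and its unit in SI style ("100 kg", "37.5 °C") and, in French typography, before certain punctuation marks (« Bonjour ! »). Because U+2009 permits a line break, the non-breaking U+202F NARROW NO-BREAK SPACE from this same block is usually the safer choice in both cases.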
The figure space (U+2007) is invaluable in financial tables: it has exactly the same width as a digit in fonts with tabular (monospaced) numerals, so decimal points and digits align perfectly in columns without using monospace fonts.
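The alignment idea can be sketched in a few lines of Python. This is a minimal illustration, not a full table formatter: the helper name `pad_figures` and the sample amounts are made up for the example, and the result only lines up visually in a font whose digits and figure space share one advance width.

```python
import unicodedata

FIGURE_SPACE = "\u2007"  # same advance width as a digit in tabular-figure fonts


def pad_figures(value: str, width: int) -> str:
    """Left-pad a numeric string with FIGURE SPACE so the digits right-align."""
    return value.rjust(width, FIGURE_SPACE)


# Hypothetical amounts; no thousands separators, so every padded
# position stands in for a digit and not for punctuation.
amounts = ["1250.00", "99.50", "7.25"]
width = max(len(a) for a in amounts)
column = [pad_figures(a, width) for a in amounts]

for row in column:
    print(row)

print(unicodedata.name(FIGURE_SPACE))  # FIGURE SPACE
```

If the values contained thousands separators, the padding for those positions would more plausibly be U+2008 PUNCTUATION SPACE, since a comma or period is narrower than a digit.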
Dashes and Hyphens
ASCII provides only a single hyphen-minus (U+002D -), but professional typography distinguishes several distinct dash characters:
| Code Point | Character | Name | Use |
|---|---|---|---|
| U+2010 | ‐ | HYPHEN | Unambiguous hyphen for compound words; a line break is allowed after it |
| U+2011 | ‑ | NON-BREAKING HYPHEN | Hyphen that prevents line breaks |
| U+2012 | ‒ | FIGURE DASH | Same width as a digit; used in phone numbers |
| U+2013 | – | EN DASH | Ranges ("2019–2024"), parenthetical phrases |
| U+2014 | — | EM DASH | Strong parenthetical—like this—in American style |
| U+2015 | ― | HORIZONTAL BAR | Dialogue attribution in some European typographic styles |
The en dash (U+2013) is used in English for number ranges ("pages 12–45"), for compound adjectives with a complex element ("post–World War II"), and for scores ("Arsenal 3–1 Chelsea"). The em dash (U+2014) serves as a parenthetical delimiter or replaces a colon — particularly common in American English — while European styles more often use spaced en dashes.
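As a small illustration of how these distinctions show up in code, the sketch below builds a range with an en dash and, going the other way, folds the block's dash characters down to the ASCII hyphen-minus (for example before searching or comparing text). The function names `format_range` and `fold_dashes` are assumptions made for this example.

```python
EN_DASH = "\u2013"

# Map U+2010 (HYPHEN) through U+2015 (HORIZONTAL BAR) to ASCII hyphen-minus.
DASH_FOLD = {code_point: "-" for code_point in range(0x2010, 0x2016)}


def format_range(start: int, end: int) -> str:
    """Join two numbers with an en dash, as in a page or year range."""
    return f"{start}{EN_DASH}{end}"


def fold_dashes(text: str) -> str:
    """Replace every dash in U+2010..U+2015 with '-' for loose matching."""
    return text.translate(DASH_FOLD)


pages = format_range(12, 45)
print(pages)
print(fold_dashes(pages))  # 12-45
```

Folding is lossy, so it suits search keys and comparisons better than stored or displayed text.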
Quotation Marks
Basic Latin's straight quotation marks (" and ') are typewriter artifacts. Professional typography uses curly quotes, which Unicode supplies:
| Code Points | Characters | Style |
|---|---|---|
| U+2018 / U+2019 | ‘ ’ | Single: British English primary, American English secondary; U+2018 also closes Czech/Slovak secondary quotes |
| U+201A / U+201B | ‚ ‛ | Single: German/Czech opening; single high-reversed |
| U+201C / U+201D | “ ” | Double: American English primary, British English secondary |
| U+201E / U+201F | „ ‟ | Double: German/Polish opening; double high-reversed |
| U+2039 / U+203A | ‹ › | Single angle quotation marks (French secondary) |
Different languages use different conventions: German opens with low marks and closes with high („text“), French uses spaced guillemets (« texte »), English uses “curly” double quotes, and some Scandinavian languages, such as Swedish, close with the same mark used to open (”text”).
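The per-language conventions can be captured in a lookup table. The sketch below is illustrative only: the `QUOTE_STYLES` table and `quote` helper are hypothetical names, the table covers just the four conventions mentioned above, and real locale data (for example from CLDR) is considerably richer.

```python
# (opening, closing) primary quotation marks per language code.
QUOTE_STYLES = {
    "en": ("\u201C", "\u201D"),  # English: left and right double quotes
    "de": ("\u201E", "\u201C"),  # German: low-9 opening, high closing
    # French guillemets are U+00AB / U+00BB (Latin-1 Supplement, not this
    # block), padded here with U+00A0 NO-BREAK SPACE.
    "fr": ("\u00AB\u00A0", "\u00A0\u00BB"),
    "sv": ("\u201D", "\u201D"),  # Swedish: same mark opens and closes
}


def quote(text: str, lang: str = "en") -> str:
    """Wrap text in the primary quotation marks assumed for `lang`."""
    opening, closing = QUOTE_STYLES[lang]
    return f"{opening}{text}{closing}"


for lang in ("en", "de", "fr", "sv"):
    print(lang, quote("text", lang))
```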
Other Typographic Punctuation
- U+2026 HORIZONTAL ELLIPSIS …: preferred over three periods (...); has different metrics and wrapping behavior
- U+2020 DAGGER † and U+2021 DOUBLE DAGGER ‡: footnote reference marks
- U+2022 BULLET •: the standard bullet for lists
- U+2030 PER MILLE SIGN ‰: per thousand (0.1% = 1‰)
- U+2031 PER TEN THOUSAND SIGN ‱: basis points in finance
- U+2032 PRIME ′ and U+2033 DOUBLE PRIME ″: feet/minutes and inches/seconds in measurements
- U+2038 CARET ‸: proofreading caret
- U+203B REFERENCE MARK ※: widely used in East Asian text to mark references or warnings
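The arithmetic behind the per mille and per ten thousand signs is simple scaling, sketched below. The helper names are invented for this example, and the `:g` format is just one reasonable way to drop trailing zeros.

```python
PER_MILLE = "\u2030"         # per thousand
PER_TEN_THOUSAND = "\u2031"  # per ten thousand (one basis point each)


def to_per_mille(fraction: float) -> str:
    """Express a fraction (0.001 means 0.1%) in parts per thousand."""
    return f"{fraction * 1000:g}{PER_MILLE}"


def to_basis_points(fraction: float) -> str:
    """Express a fraction in parts per ten thousand."""
    return f"{fraction * 10000:g}{PER_TEN_THOUSAND}"


print(to_per_mille(0.001))      # 0.1% is one per mille
print(to_basis_points(0.0025))  # 0.25% is 25 basis points
```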
Invisible Formatting Characters (U+200C–U+200F, U+2028–U+202F, U+2060–U+206F)
This block contains several zero-width and invisible characters that affect text rendering and directionality without adding visible glyphs:
- U+200C ZERO WIDTH NON-JOINER (ZWNJ): prevents two adjacent characters from forming a ligature or joining form. Essential in Persian, Arabic, and Indic script contexts where the default joining behavior must be overridden.
- U+200D ZERO WIDTH JOINER (ZWJ): causes adjacent characters to use their joining/ligature forms. It is also the basis of emoji sequences: U+1F468 (man) + U+200D + U+1F4BB (laptop) renders as the single emoji 👨‍💻.
- U+200E LEFT-TO-RIGHT MARK (LRM) and U+200F RIGHT-TO-LEFT MARK (RLM): invisible characters that influence the Unicode Bidirectional Algorithm. Used to fix text directionality in mixed LTR/RTL contexts.
- U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR: Unicode's own line and paragraph break characters, distinct from CR and LF. Before ES2019, JavaScript treated both as line terminators, so they were not allowed unescaped inside string literals; JSON permits them, which meant valid JSON could cause a syntax error when embedded in JavaScript source.
- U+2060 WORD JOINER: behaves like a zero-width no-break space; prevents line breaking between two characters without adding any space. It is the preferred replacement for U+FEFF ZERO WIDTH NO-BREAK SPACE (the byte order mark) when that character is used as a word joiner.
- U+2064 INVISIBLE PLUS: used in mathematical notation to indicate an implicit addition (e.g., between the integer and fractional parts of a mixed number) without rendering a visible sign.
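Two of the behaviors in this list are easy to observe from Python's standard library. The sketch below assembles the man + ZWJ + laptop sequence by code point and shows how U+2028 and U+2029 are handled by `str.splitlines` and `json.dumps`; the variable names are arbitrary, and how the emoji is drawn depends on the font and renderer.

```python
import json
import unicodedata

ZWJ = "\u200D"
MAN = "\U0001F468"
LAPTOP = "\U0001F4BB"

# One visible emoji on supporting platforms, but three code points in memory.
technologist = MAN + ZWJ + LAPTOP
print(len(technologist))      # 3
print(unicodedata.name(ZWJ))  # ZERO WIDTH JOINER

# Python counts LINE SEPARATOR and PARAGRAPH SEPARATOR as line boundaries.
lines = "a\u2028b\u2029c".splitlines()
print(lines)                  # ['a', 'b', 'c']

# With the default ensure_ascii=True, json.dumps escapes U+2028, which
# keeps the output safe to paste into pre-ES2019 JavaScript source.
encoded = json.dumps("a\u2028b")
print(encoded)
```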
Security Considerations
Several characters in this block have been used in security attacks. Zero-width characters can produce strings that look identical but differ in their underlying code points, which is useful for watermarking but dangerous for spoofing. Directional controls are a further risk: LRM and RLM, and especially the embedding and override characters (U+202A–U+202E, for example U+202E RIGHT-TO-LEFT OVERRIDE) and the isolates (U+2066–U+2069), can make filenames, URLs, and source code display in a misleading order. Security-sensitive contexts such as usernames, domain names, and code identifiers should normalize or reject many General Punctuation characters.
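One way to act on this advice is to scan input for invisible characters from the block before accepting it. The following is a rough sketch under stated assumptions, not a complete defense: the function names are hypothetical, and the filter simply flags format characters (category Cf) plus the line and paragraph separators within U+2000–U+206F.

```python
import unicodedata

BLOCK_START, BLOCK_END = 0x2000, 0x206F
INVISIBLE_CATEGORIES = {"Cf", "Zl", "Zp"}  # format, line sep., paragraph sep.


def is_invisible(ch: str) -> bool:
    """True for invisible General Punctuation characters (by category)."""
    in_block = BLOCK_START <= ord(ch) <= BLOCK_END
    return in_block and unicodedata.category(ch) in INVISIBLE_CATEGORIES


def find_invisible(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each invisible character found."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if is_invisible(ch)
    ]


def strip_invisible(text: str) -> str:
    """Drop the flagged characters; suitable for identifiers, not prose."""
    return "".join(ch for ch in text if not is_invisible(ch))


print(find_invisible("pay\u200Bpal"))
print(find_invisible("report\u202Etxt.exe"))
print(strip_invisible("pay\u200Bpal"))  # paypal
```

Stripping is only appropriate for identifiers and similar fields: in ordinary text, removing ZWNJ or ZWJ would damage legitimate Persian, Indic, or emoji content, so rejecting or flagging the input may be the better choice there.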