Typography

Whitespace Character

Characters that represent horizontal or vertical space but have no visible glyph. Unicode gives 25 characters the White_Space property, 17 of them space separators (category Zs), with different widths and line-breaking behaviors.


What is Whitespace?

Whitespace refers to any character that represents horizontal or vertical blank space without producing a visible mark. The term covers a wide family of characters — from the ordinary space (U+0020) you type with the spacebar, to tab, newline, carriage return, and a rich set of Unicode-specific space characters with precise typographic widths.

In text processing, whitespace characters are fundamental delimiters. In typography, different whitespace characters carry specific semantic meanings about the size and nature of the blank space they represent.

The Unicode Whitespace Family

Unicode assigns the White_Space property to 25 characters. The most important are:

| Code point | Name | Width |
|---|---|---|
| U+0020 | Space | Standard word space |
| U+00A0 | No-Break Space | Standard (no line break) |
| U+2002 | En Space | 1 en (½ em) |
| U+2003 | Em Space | 1 em |
| U+2004 | Three-Per-Em Space | ⅓ em |
| U+2005 | Four-Per-Em Space | ¼ em |
| U+2006 | Six-Per-Em Space | ⅙ em |
| U+2007 | Figure Space | Width of a digit |
| U+2008 | Punctuation Space | Width of a period |
| U+2009 | Thin Space | ⅕ em |
| U+200A | Hair Space | 1/24 em (approx) |
| U+0009 | Character Tabulation (Tab, `\t`) | Variable (tab stop) |
| U+000A | Line Feed (`\n`) | Vertical |
| U+000D | Carriage Return (`\r`) | Vertical |
| U+3000 | Ideographic Space | 1 full-width em (CJK) |
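The space-separator rows of the table can be enumerated directly from the Unicode database that ships with Python. This is a minimal sketch: the variable name `space_separators` is illustrative, and the exact list depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata

# Collect every code point whose general category is Zs (space separator).
space_separators = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Zs"
]

# Print each one as a code point plus its official Unicode name.
for ch in space_separators:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Tab, line feed, and carriage return do not appear in this list because their general category is Cc (control), not Zs.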

Control Characters vs. Space Characters

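Whitespace splits into two broad categories:

- Space characters: primarily horizontal; they create gaps within a line.
- Control characters: include the tab and the line terminators (LF, CR, and the CRLF sequence), which break lines and create vertical space. Unicode adds two separator characters, LS and PS, that also act as line terminators.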
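One way to see this split in practice is to look at each character's Unicode general category. The sketch below uses a hypothetical helper, `whitespace_kind`, and labels of our own choosing; only `unicodedata.category` and `str.isspace` come from the standard library.

```python
import unicodedata

def whitespace_kind(ch: str) -> str:
    """Classify one character by its Unicode general category."""
    category = unicodedata.category(ch)
    if category == "Zs":
        return "space separator"
    if category == "Zl":
        return "line separator"
    if category == "Zp":
        return "paragraph separator"
    if category == "Cc" and ch.isspace():
        return "control character"
    return "not whitespace"

for ch in [" ", "\u00a0", "\t", "\n", "\r", "\u2028", "\u2029", "x"]:
    print(f"U+{ord(ch):04X}: {whitespace_kind(ch)}")
```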

Line separator (U+2028) and paragraph separator (U+2029) are Unicode-specific line terminators that some parsers recognize. The ECMAScript specification, for example, lists both as line terminators alongside LF and CR.
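Python is another case where the Unicode separators count as line boundaries. A small sketch with a made-up sample string shows the difference between `str.splitlines()` and splitting on `"\n"` alone:

```python
text = "first\nsecond\r\nthird\u2028fourth\u2029fifth"

# str.splitlines() recognizes LF, CR, CRLF, LS (U+2028), and PS (U+2029),
# among other line boundaries.
lines = text.splitlines()
print(lines)   # ['first', 'second', 'third', 'fourth', 'fifth']

# Splitting on "\n" alone leaves CR, LS, and PS inside the pieces.
naive = text.split("\n")
print(len(naive))   # 3
```

Code that counts lines or reports line numbers can disagree with an editor or another parser if the two do not share the same set of terminators.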

Whitespace in Programming

Whitespace handling varies significantly across languages and contexts:

# Python: indentation is syntax
if True:
    print("This indent matters")

# Python string whitespace
import re
text = "hello\u2003world"          # em space
re.split(r'\s+', text)             # ['hello', 'world'] — \s matches em space
text.split(' ')                    # ['hello\u2003world'] — only splits on U+0020

// JavaScript: \s in regex matches Unicode space characters, not just ASCII
/\s/.test('\u2003')   // true (em space)
/\s/.test('\u3000')   // true (ideographic space)

/* CSS white-space property controls whitespace rendering */
p { white-space: normal; }    /* collapse and wrap (default) */
p { white-space: pre; }       /* preserve all whitespace */
p { white-space: nowrap; }    /* no line wrapping */
p { white-space: pre-wrap; }  /* preserve, but allow wrapping */
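Because handling differs by context, text is often normalized before it is compared or tokenized. The following is a minimal sketch, assuming the goal is to reduce every run of Unicode whitespace to a single U+0020; `collapse_whitespace` is a hypothetical helper, not a standard function. It also shows how the `re.ASCII` flag narrows `\s` back to ASCII whitespace.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Replace each run of Unicode whitespace with one U+0020 and trim the ends."""
    # str.split() with no argument splits on any run of whitespace,
    # including em space, ideographic space, and no-break space.
    return " ".join(text.split())

sample = "hello\u2003\u3000world\t\n again\u00a0"
print(collapse_whitespace(sample))   # hello world again

# With re.ASCII, \s is limited to [ \t\n\r\f\v], so the em space is not a match.
ascii_parts = re.split(r"\s+", "hello\u2003world", flags=re.ASCII)
print(ascii_parts)                   # ['hello\u2003world']
```

Note that this helper also replaces no-break spaces, which may not be wanted where U+00A0 was placed deliberately to prevent a line break.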

Ideographic Space in CJK Typography

The ideographic space (U+3000) is a full-width space used in Chinese, Japanese, and Korean text. Its width matches a full CJK character (one em in a monospace CJK context). It is visually distinct from a standard space and matters for alignment in vertically set or grid-aligned CJK typography.
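The full-width nature of U+3000 is recorded in the Unicode database, and compatibility normalization folds it into an ordinary space. A short sketch using Python's `unicodedata` module follows; the sample string is made up for illustration.

```python
import unicodedata

ideographic_space = "\u3000"

print(unicodedata.name(ideographic_space))              # IDEOGRAPHIC SPACE
print(unicodedata.east_asian_width(ideographic_space))  # F (fullwidth)
print(unicodedata.east_asian_width(" "))                # Na (narrow)

# NFKC compatibility normalization maps U+3000 to U+0020.
folded = unicodedata.normalize("NFKC", "東京\u3000大阪")
print(folded == "東京 大阪")                             # True
```

Normalizing this way is convenient for search and comparison, but it discards the full-width spacing, so it is usually a poor fit for text that will be typeset on a CJK grid.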

Quick Facts

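| Property | Value |
|---|---|
| Standard space | U+0020 |
| Non-breaking space | U+00A0 |
| Thin space | U+2009 (⅕ em) |
| Hair space | U+200A (1/24 em) |
| Em space | U+2003 |
| Ideographic space (CJK) | U+3000 |
| Unicode line separator | U+2028 |
| Unicode paragraph separator | U+2029 |
| `\s` in regex engines | Engine-dependent: Unicode-aware in JavaScript and in Python 3 `str` patterns; ASCII-only by default in Java and Go |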
