Whitespace Character
Characters that represent horizontal or vertical space but have no visible glyph. Unicode assigns the White_Space property to 25 characters, which differ in width and line-breaking behavior.
What is Whitespace?
Whitespace refers to any character that represents horizontal or vertical blank space without producing a visible mark. The term covers a wide family of characters — from the ordinary space (U+0020) you type with the spacebar, to tab, newline, carriage return, and a rich set of Unicode-specific space characters with precise typographic widths.
In text processing, whitespace characters are fundamental delimiters. In typography, different whitespace characters carry specific semantic meanings about the size and nature of the blank space they represent.
The Unicode Whitespace Family
Unicode assigns the White_Space property to 25 characters, 17 of which are space separators (general category Zs). The most important are:
| Character | Unicode | Name | Width |
|---|---|---|---|
| | U+0020 | Space | Standard word space |
| | U+00A0 | No-Break Space | Standard (no line break) |
| | U+2002 | En Space | 1 en (½ em) |
| | U+2003 | Em Space | 1 em |
| | U+2004 | Three-Per-Em Space | ⅓ em |
| | U+2005 | Four-Per-Em Space | ¼ em |
| | U+2006 | Six-Per-Em Space | ⅙ em |
| | U+2007 | Figure Space | Width of a digit |
| | U+2008 | Punctuation Space | Width of a period |
| | U+2009 | Thin Space | ⅕ em |
| | U+200A | Hair Space | 1/24 em (approx) |
| \t | U+0009 | Character Tabulation (Tab) | Variable (tab stop) |
| \n | U+000A | Line Feed | Vertical |
| \r | U+000D | Carriage Return | Vertical |
| | U+3000 | Ideographic Space | 1 full-width em (CJK) |
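The space characters in this family can be listed programmatically. The following is a minimal sketch, assuming Python's standard `unicodedata` module; `space_separators` is a hypothetical helper name, and the exact result depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata


def space_separators():
    """Return (code point label, name) pairs for every character in category Zs."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # "Zs" is the Unicode general category "Separator, space".
        if unicodedata.category(ch) == "Zs":
            found.append((f"U+{cp:04X}", unicodedata.name(ch)))
    return found


for label, name in space_separators():
    print(label, name)
```

Tab, line feed, and carriage return do not appear in this list because they are control characters (category Cc), not space separators.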
Control Characters vs. Space Characters
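Whitespace splits into two broad categories:

- Space characters (general category Zs, plus the tab control character): create horizontal gaps within a line.
- Line terminators: the control characters LF and CR (and the two-character sequence CRLF), together with the Unicode separators LS and PS, break lines and create vertical space.
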
Line separator (U+2028, LS) and paragraph separator (U+2029, PS) are Unicode-specific line terminators that only some parsers recognize. ECMAScript, for example, treats both as line terminators in JavaScript source text, and \s in a JavaScript regular expression matches them.
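The difference between these categories, and the uneven support for LS and PS, can be seen in a short Python sketch. The sample strings and variable names here are illustrative assumptions, not part of any standard.

```python
import unicodedata

# General category of a few whitespace characters:
# Zs = space separator, Cc = control, Zl = line separator, Zp = paragraph separator.
samples = {
    "SPACE": "\u0020",
    "TAB": "\u0009",
    "LINE FEED": "\u000A",
    "LINE SEPARATOR": "\u2028",
    "PARAGRAPH SEPARATOR": "\u2029",
}
categories = {label: unicodedata.category(ch) for label, ch in samples.items()}
print(categories)

text = "one\u2028two\u2029three\r\nfour"

# str.splitlines() recognizes LS, PS, and CRLF as line boundaries.
lines_unicode = text.splitlines()

# Splitting on "\n" alone leaves LS, PS, and the stray "\r" inside the first piece.
lines_lf_only = text.split("\n")

print(lines_unicode)
print(lines_lf_only)
```

Code that splits only on LF will silently keep U+2028 and U+2029 inside a "line", which is one reason these two characters tend to surprise parsers.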
Whitespace in Programming
Whitespace handling varies significantly across languages and contexts:
```python
# Python: indentation is syntax
if True:
    print("This indent matters")

# Python string whitespace
import re

text = "hello\u2003world"  # em space
re.split(r'\s+', text)     # ['hello', 'world']: \s matches the em space
text.split(' ')            # ['hello\u2003world']: splits only on U+0020
```
```javascript
// JavaScript: \s matches Unicode space separators, tab, line terminators, and U+FEFF
/\s/.test('\u2003') // true (em space)
/\s/.test('\u3000') // true (ideographic space)
```
```css
/* CSS white-space property controls whitespace rendering */
p { white-space: normal; }   /* collapse whitespace and wrap (default) */
p { white-space: pre; }      /* preserve all whitespace, no wrapping */
p { white-space: nowrap; }   /* collapse whitespace, no wrapping */
p { white-space: pre-wrap; } /* preserve whitespace, allow wrapping */
```
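Because handling differs so much between contexts, a common defensive step is to normalize whitespace before comparing or storing text. The sketch below is one possible approach in Python; `collapse_whitespace` is a hypothetical helper name, and the sample string is made up for illustration.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with one U+0020 and trim both ends."""
    # str.split() with no argument splits on runs of any character
    # that Python's str.isspace() treats as whitespace.
    return " ".join(text.split())


messy = "\u3000hello\u2003\u00a0world\t\n"
print(repr(collapse_whitespace(messy)))

# For str patterns, \s is Unicode-aware by default in Python 3...
unicode_split = re.split(r"\s+", "hello\u2003world")

# ...but the re.ASCII flag restricts \s to the ASCII whitespace characters.
ascii_split = re.split(r"\s+", "hello\u2003world", flags=re.ASCII)

print(unicode_split)
print(ascii_split)
```

The same pattern giving two different results under `re.ASCII` mirrors what happens across regex engines: whether `\s` covers Unicode spaces depends on the engine and its flags.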
Ideographic Space in CJK Typography
The ideographic space (U+3000) is a full-width space used in Chinese, Japanese, and Korean text. Its width matches one full CJK character, that is, one em in a monospace CJK context. It is visibly wider than a standard space and matters for alignment in vertically set or grid-aligned CJK typography.
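When CJK text is processed programmatically, the ideographic space often needs special attention because it is not the same code point as U+0020. A minimal sketch using Python's `unicodedata` module follows; the sample title is an assumption chosen for illustration.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"

# East Asian Width class "F" means Fullwidth.
width_class = unicodedata.east_asian_width(IDEOGRAPHIC_SPACE)
print(width_class)

title = "東京\u3000タワー"

# NFKC compatibility normalization maps U+3000 to the ordinary space U+0020.
normalized = unicodedata.normalize("NFKC", title)
print(repr(normalized))

# NFC, by contrast, leaves the ideographic space untouched.
print(unicodedata.normalize("NFC", title) == title)
```

NFKC is convenient for search and comparison, but it discards the full-width spacing, so it is usually a poor fit for text that will be typeset on a CJK grid.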
Quick Facts
| Property | Value |
|---|---|
| Standard space | U+0020 |
| Non-breaking space | U+00A0 |
| Thin space | U+2009 (⅕ em) |
| Hair space | U+200A (1/24 em) |
| Em space | U+2003 |
| Ideographic space (CJK) | U+3000 |
| Unicode line separator | U+2028 |
| Unicode paragraph separator | U+2029 |
| \s in regex | Engine-dependent: Unicode-aware in Python 3 str patterns and JavaScript; ASCII-only by default in Java, Go, and PCRE |