Whitespace Character
Characters that represent horizontal or vertical space but have no visible glyph. Unicode defines 25 whitespace characters (17 of them in the Space Separator category) with different widths and line-breaking behaviors.
What is Whitespace?
Whitespace refers to any character that represents horizontal or vertical blank space without producing a visible mark. The term covers a wide family of characters — from the ordinary space (U+0020) you type with the spacebar, to tab, newline, carriage return, and a rich set of Unicode-specific space characters with precise typographic widths.
In text processing, whitespace characters are fundamental delimiters. In typography, different whitespace characters carry specific semantic meanings about the size and nature of the blank space they represent.
The Unicode Whitespace Family
Unicode assigns the White_Space property to 25 characters. The most important are:
| Character | Code point | Name | Width |
|---|---|---|---|
| | U+0020 | Space | Standard word space |
| | U+00A0 | No-Break Space | Standard (no line break) |
| | U+2002 | En Space | 1 en (½ em) |
| | U+2003 | Em Space | 1 em |
| | U+2004 | Three-Per-Em Space | ⅓ em |
| | U+2005 | Four-Per-Em Space | ¼ em |
| | U+2006 | Six-Per-Em Space | ⅙ em |
| | U+2007 | Figure Space | Width of a digit |
| | U+2008 | Punctuation Space | Width of a period |
| | U+2009 | Thin Space | ⅕ em |
| | U+200A | Hair Space | 1/24 em (approx) |
| `\t` | U+0009 | Character Tabulation (Tab) | Variable (tab stop) |
| `\n` | U+000A | Line Feed | Vertical |
| `\r` | U+000D | Carriage Return | Vertical |
| | U+3000 | Ideographic Space | 1 full-width em (CJK) |
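Because none of these characters has a visible glyph, it can help to inspect them programmatically. The following is a minimal sketch using Python's standard `unicodedata` module; the `CODE_POINTS` list and the `describe` helper are illustrative names, not part of any library.

```python
import unicodedata

# Code points taken from the table above.
CODE_POINTS = [0x0020, 0x00A0, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
               0x2007, 0x2008, 0x2009, 0x200A, 0x0009, 0x000A, 0x000D, 0x3000]

def describe(cp: int) -> tuple[str, str, str]:
    """Return (code point label, Unicode name, general category)."""
    ch = chr(cp)
    # Control characters have no formal name, so supply a fallback.
    name = unicodedata.name(ch, "<control>")
    return f"U+{cp:04X}", name, unicodedata.category(ch)

rows = [describe(cp) for cp in CODE_POINTS]
for label, name, category in rows:
    print(f"{label}  {category}  {name}")
```

The space characters report category `Zs` (Space Separator), while tab, line feed, and carriage return report `Cc` (Control).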
Control Characters vs. Space Characters
Whitespace splits into two categories:

- Space characters: primarily horizontal; they create gaps within a line.
- Control and separator characters: include the line terminators LF, CR (and the two-character sequence CRLF), LS, and PS, which break lines and create vertical space.
Line separator (U+2028) and paragraph separator (U+2029) are Unicode-specific line terminators that some parsers recognize. JavaScript, for example, classifies both as line terminators in source text.
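Whether a parser recognizes these terminators makes a practical difference. As a small sketch in Python (the sample string is made up for illustration), `str.splitlines()` treats LS and PS as line boundaries, whereas splitting on `"\n"` alone does not:

```python
text = "one\u2028two\u2029three\r\nfour\nfive"

# splitlines() treats LS, PS, CRLF, and LF all as line boundaries.
unicode_aware = text.splitlines()

# Splitting on "\n" alone leaves LS, PS, and the stray "\r" in place.
lf_only = text.split("\n")

print(unicode_aware)  # ['one', 'two', 'three', 'four', 'five']
print(lf_only)        # ['one\u2028two\u2029three\r', 'four', 'five']
```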
Whitespace in Programming
Whitespace handling varies significantly across languages and contexts:
```python
# Python: indentation is syntax
if True:
    print("This indent matters")

# Python string whitespace
import re

text = "hello\u2003world"  # em space
re.split(r'\s+', text)     # ['hello', 'world']: \s matches em space
text.split(' ')            # ['hello\u2003world']: only splits on U+0020
```
```javascript
// JavaScript: \s in regex matches ASCII whitespace and Unicode space separators
/\s/.test('\u2003') // true: em space
/\s/.test('\u3000') // true: ideographic space
```
```css
/* CSS white-space property controls whitespace rendering */
p { white-space: normal; }   /* collapse and wrap (default) */
p { white-space: pre; }      /* preserve all whitespace */
p { white-space: nowrap; }   /* no line wrapping */
p { white-space: pre-wrap; } /* preserve, but allow wrapping */
```
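A common task that follows from these differences is normalizing mixed whitespace before comparing or storing text. The sketch below uses a hypothetical helper, `collapse_whitespace`, built on `str.split()` with no argument. Note that this is broader than CSS collapsing: it also folds Unicode spaces such as the em space and the no-break space, which may or may not be what a given application wants.

```python
def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with one U+0020 and trim the ends."""
    # str.split() with no argument splits on runs of Unicode whitespace,
    # including U+00A0, U+2003, and U+3000.
    return " ".join(text.split())

sample = "  hello\u2003\tworld\u3000\n again "
print(collapse_whitespace(sample))  # hello world again
```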
Ideographic Space in CJK Typography
The ideographic space (U+3000) is a full-width space used in Chinese, Japanese, and Korean text. Its width matches a full CJK character (one em in a monospace CJK context). It is visually distinct from a standard space and matters for alignment in vertically set or grid-aligned CJK typography.
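The full-width nature of U+3000 is recorded in the Unicode character database. As a brief sketch with Python's `unicodedata` module (the sample string is illustrative), the East Asian Width property reports it as fullwidth, and NFKC compatibility normalization folds it into an ordinary space, which is worth knowing before normalizing text whose alignment depends on it:

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"

# "F" marks a fullwidth character in the East Asian Width property.
width_class = unicodedata.east_asian_width(IDEOGRAPHIC_SPACE)

# NFKC compatibility normalization folds U+3000 into U+0020.
folded = unicodedata.normalize("NFKC", "東京\u3000大阪")

print(width_class)            # F
print(folded == "東京 大阪")   # True
```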
Quick Facts
| Property | Value |
|---|---|
| Standard space | U+0020 |
| Non-breaking space | U+00A0 |
| Thin space | U+2009 (⅕ em) |
| Hair space | U+200A (1/24 em) |
| Em space | U+2003 |
| Ideographic space (CJK) | U+3000 |
| Unicode line separator | U+2028 |
| Unicode paragraph separator | U+2029 |
| `\s` in regex | Matches Unicode whitespace in Python 3 `str` patterns and JavaScript; ASCII-only by default in some engines (e.g. Java, Go) |
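What `\s` matches depends on the engine and its flags, so it is worth checking rather than assuming. As a minimal illustration in Python, the `re.ASCII` flag restricts `\s` to ASCII whitespace, which changes how a string containing an em space is split:

```python
import re

text = "hello\u2003world"  # words separated by an em space

# Default str patterns: \s covers Unicode whitespace.
unicode_split = re.split(r"\s+", text)

# With re.ASCII, \s is restricted to [ \t\n\r\f\v].
ascii_split = re.split(r"\s+", text, flags=re.ASCII)

print(unicode_split)  # ['hello', 'world']
print(ascii_split)    # ['hello\u2003world']
```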