Whitespace
Characters that represent horizontal or vertical blank space but have no visible glyph. Unicode defines 17 space-separator characters (general category Zs) alone, plus tabs and line terminators, each with its own width and line-breaking behavior.
What is Whitespace?
Whitespace refers to any character that represents horizontal or vertical blank space without producing a visible mark. The term covers a wide family of characters — from the ordinary space (U+0020) you type with the spacebar, to tab, newline, carriage return, and a rich set of Unicode-specific space characters with precise typographic widths.
In text processing, whitespace characters are fundamental delimiters. In typography, different whitespace characters carry specific semantic meanings about the size and nature of the blank space they represent.
The Unicode Whitespace Family
Unicode assigns the White_Space property to 25 characters. The most important are:
| Unicode | Name | Width |
|---|---|---|
| U+0020 | Space | Standard word space |
| U+00A0 | No-Break Space | Standard word space (forbids a line break) |
| U+2002 | En Space | 1 en (½ em) |
| U+2003 | Em Space | 1 em |
| U+2004 | Three-Per-Em Space | ⅓ em |
| U+2005 | Four-Per-Em Space | ¼ em |
| U+2006 | Six-Per-Em Space | ⅙ em |
| U+2007 | Figure Space | Width of a digit |
| U+2008 | Punctuation Space | Width of a period |
| U+2009 | Thin Space | ⅕ em |
| U+200A | Hair Space | About 1/24 em |
| U+0009 | Character Tabulation (Tab, `\t`) | Variable (advances to the next tab stop) |
| U+000A | Line Feed (`\n`) | Vertical (line break) |
| U+000D | Carriage Return (`\r`) | Vertical (line break) |
| U+3000 | Ideographic Space | 1 full-width em (CJK) |
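The names and general categories behind this table can be checked with Python's standard `unicodedata` module. The sketch below assumes only the standard library: it looks up each code point listed above and counts every character in category Zs (space separator). Control characters such as tab have no Unicode character name, so a placeholder is substituted; the helper name `describe` is illustrative, not part of any library.

```python
import sys
import unicodedata

# Code points from the table above
TABLE = [0x0020, 0x00A0, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006,
         0x2007, 0x2008, 0x2009, 0x200A, 0x0009, 0x000A, 0x000D, 0x3000]


def describe(cp: int) -> str:
    """Return 'U+XXXX  NAME  (category)' for one code point (hypothetical helper)."""
    ch = chr(cp)
    # Control characters (category Cc) have no name, so fall back to a placeholder
    name = unicodedata.name(ch, "<control>")
    return f"U+{cp:04X}  {name}  ({unicodedata.category(ch)})"


# Every code point whose general category is Zs (space separator)
SPACE_SEPARATORS = [cp for cp in range(sys.maxunicode + 1)
                    if unicodedata.category(chr(cp)) == "Zs"]

if __name__ == "__main__":
    for cp in TABLE:
        print(describe(cp))
    print(len(SPACE_SEPARATORS), "characters in category Zs")
```

The Zs scan walks the whole code space once, which takes a moment but keeps the sketch free of hard-coded lists.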
Control Characters vs. Space Characters
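Whitespace splits into two broad categories:

- Space characters: primarily horizontal; they create gaps within a line (general category Zs).
- Control characters and line terminators: tab (U+0009) is a horizontal control character, while LF, CR, the two-character CRLF sequence, LS (U+2028), and PS (U+2029) break lines and create vertical space. Strictly speaking, LS and PS are separators (categories Zl and Zp) rather than control characters (Cc).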
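One way to tell these groups apart in code is by Unicode general category. The following minimal sketch uses a hypothetical helper, `classify_whitespace`; the set of line terminators is an assumption mirroring LF, CR, LS, and PS (CRLF is a pair of characters, so it is handled one character at a time), and "whitespace control" relies on Python's own `str.isspace()` definition.

```python
import unicodedata
from typing import Optional

# Assumed line terminators: LF, CR, LS (U+2028), PS (U+2029)
LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}


def classify_whitespace(ch: str) -> Optional[str]:
    """Classify one character as 'space', 'line terminator', 'control', or None."""
    if ch in LINE_TERMINATORS:
        return "line terminator"
    category = unicodedata.category(ch)
    if category == "Zs":  # space separators: U+0020, U+00A0, U+2002, U+3000, ...
        return "space"
    if category == "Cc" and ch.isspace():  # remaining whitespace controls, e.g. tab
        return "control"
    return None  # not whitespace


for ch in ["\u0020", "\u2009", "\t", "\n", "\u2028", "x"]:
    print(f"U+{ord(ch):04X}", classify_whitespace(ch))
```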
Line separator (U+2028) and paragraph separator (U+2029) are Unicode-specific line terminators that some parsers recognize. JavaScript, for example, classifies both as line terminators: they end a `//` comment, and the regex `.` does not match them unless the `s` (dotAll) flag is set.
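Python's `str.splitlines()` is another parser that honors LS and PS, whereas splitting on `"\n"` does not. A short demonstration using only built-in string methods:

```python
# LS (U+2028), PS (U+2029), and a CRLF pair inside one string
text = "first\u2028second\u2029third\r\nfourth"

# splitlines() recognizes LF, CR, CRLF, LS, PS, and several other boundaries
by_lines = text.splitlines()

# split("\n") breaks only on LF, so LS, PS, and the CR of the CRLF pair survive
by_lf = text.split("\n")

print(by_lines)
print(by_lf)
```

Code that reads user-supplied text line by line can therefore behave differently depending on which of the two methods it uses.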
Whitespace in Programming
Whitespace handling varies significantly across languages and contexts:
```python
import re

# Python: indentation is syntax
if True:
    print("This indent matters")

# Python string whitespace
text = "hello\u2003world"  # em space
re.split(r"\s+", text)  # ['hello', 'world']: \s matches the em space
text.split(" ")         # ['hello\u2003world']: splits only on U+0020
```
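For whitespace-agnostic tokenizing without a regex, `str.split()` called with no argument splits on any run of characters for which `str.isspace()` is true and drops leading and trailing whitespace. The helper `normalize_spaces` below is a hypothetical name for a common idiom built on that behavior:

```python
def normalize_spaces(text: str) -> str:
    """Collapse every run of Unicode whitespace into a single U+0020 space."""
    # split() with no separator treats em spaces, NBSP, tabs, newlines, etc. as delimiters
    return " ".join(text.split())


sample = "\u3000hello\u2003world\u00a0again\n"
print(sample.split())
print(normalize_spaces(sample))
```

Note that this also splits on U+00A0, which may be undesirable when a no-break space is meant to keep two words together.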
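```javascript
// JavaScript: \s matches every space separator (Zs) plus tab, VT, FF,
// the line terminators (LF, CR, LS, PS), and U+FEFF; it does not match U+0085
/\s/.test('\u2003') // true: em space
/\s/.test('\u3000') // true: ideographic space
```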
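Python's `re` module defines `\s` for `str` patterns in terms of `str.isspace()`, so its set differs slightly from JavaScript's. The sketch below checks a few code points where the two engines disagree, or where a "space" is not whitespace at all: U+0085 (NEL, matched by Python but not by JavaScript), U+FEFF (matched by JavaScript but not by Python), and U+200B (a format character that neither engine matches).

```python
import re

WHITESPACE = re.compile(r"\s")

samples = {
    "U+2003 EM SPACE": "\u2003",
    "U+0085 NEXT LINE (NEL)": "\u0085",            # not matched by JavaScript's \s
    "U+FEFF ZERO WIDTH NO-BREAK SPACE": "\ufeff",  # matched by JavaScript's \s
    "U+200B ZERO WIDTH SPACE": "\u200b",           # category Cf, not whitespace
}

for label, ch in samples.items():
    matched = WHITESPACE.fullmatch(ch) is not None
    print(f"{label}: \\s matches={matched}, isspace={ch.isspace()}")
```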
```css
/* The CSS white-space property controls how whitespace is rendered */
p { white-space: normal; }   /* collapse whitespace and wrap lines (default) */
p { white-space: pre; }      /* preserve all whitespace; break only at newlines */
p { white-space: nowrap; }   /* collapse whitespace, but never wrap */
p { white-space: pre-wrap; } /* preserve whitespace, but allow wrapping */
```
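To make the collapsing step of `normal` concrete, here is a rough Python approximation. It assumes CSS's definition of document white space (U+0020, tab, and segment breaks), so other Unicode spaces such as U+00A0 and U+2003 are deliberately left untouched. The real algorithm in CSS Text Level 3 also applies segment-break transformation rules and trims spaces at line edges, which this sketch ignores; `collapse_like_css_normal` is a hypothetical name.

```python
import re

# CSS document white space: space, tab, and segment breaks (CR and LF treated alike here)
CSS_WHITESPACE_RUN = re.compile(r"[ \t\r\n]+")


def collapse_like_css_normal(text: str) -> str:
    """Approximate white-space: normal by collapsing runs of CSS white space.

    Simplification: segment-break transformation and trimming at line edges
    are not modelled; non-ASCII spaces pass through unchanged.
    """
    return CSS_WHITESPACE_RUN.sub(" ", text)


html_text = "Price:\t 10\u00a0kg\n\n  per\u2003box"
print(repr(collapse_like_css_normal(html_text)))
```

The surviving U+00A0 and U+2003 illustrate why typographic spaces are useful in HTML: they are not collapsed away.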
Ideographic Space in CJK Typography
The ideographic space (U+3000) is a full-width space used in Chinese, Japanese, and Korean text. Its width matches a full CJK character (one em in a monospace CJK context), making it noticeably wider than a standard space, and it matters for alignment in vertically set or grid-aligned CJK typography.
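Python's `unicodedata` module exposes the East Asian Width property that marks U+3000 as full-width, and NFKC normalization folds it (like most other typographic spaces) to a plain U+0020. That folding is convenient for search and comparison but destroys CJK alignment, so it is best applied to a working copy rather than to the text being displayed. A minimal check:

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"

# East_Asian_Width: 'F' (Fullwidth) for U+3000, 'Na' (Narrow) for U+0020
print(unicodedata.east_asian_width(IDEOGRAPHIC_SPACE))
print(unicodedata.east_asian_width(" "))

# NFKC maps compatibility spaces (U+3000, U+2003, U+00A0, ...) to U+0020
folded = unicodedata.normalize("NFKC", "日本語\u3000テキスト")
print(repr(folded))
```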
Quick Facts
| Property | Value |
|---|---|
| Standard space | U+0020 |
| Non-breaking space | U+00A0 |
| Thin space | U+2009 (⅕ em) |
| Hair space | U+200A (1/24 em) |
| Em space | U+2003 |
| Ideographic space (CJK) | U+3000 |
| Unicode line separator | U+2028 |
| Unicode paragraph separator | U+2029 |
| `\s` in regex | Engine-dependent: Unicode whitespace in Python 3 and JavaScript; ASCII-only by default in Java and PCRE |