Typography

Whitespace Character

Characters that represent horizontal or vertical space but have no visible glyph. Unicode defines 17 space separators (general category Zs) and assigns the White_Space property to 25 characters in total, which differ in width and line-breaking behavior.

What is Whitespace?

Whitespace refers to any character that represents horizontal or vertical blank space without producing a visible mark. The term covers a wide family of characters — from the ordinary space (U+0020) you type with the spacebar, to tab, newline, carriage return, and a rich set of Unicode-specific space characters with precise typographic widths.

In text processing, whitespace characters are fundamental delimiters. In typography, different whitespace characters carry specific semantic meanings about the size and nature of the blank space they represent.

The Unicode Whitespace Family

Unicode assigns the White_Space property to 25 characters. The most important are:

Code point  Name                             Width
U+0020      Space                            Standard word space
U+00A0      No-Break Space                   Standard (no line break)
U+2002      En Space                         1 en (½ em)
U+2003      Em Space                         1 em
U+2004      Three-Per-Em Space               ⅓ em
U+2005      Four-Per-Em Space                ¼ em
U+2006      Six-Per-Em Space                 ⅙ em
U+2007      Figure Space                     Width of a digit
U+2008      Punctuation Space                Width of a period
U+2009      Thin Space                       ⅕ em
U+200A      Hair Space                       1/24 em (approx.)
U+0009      Character Tabulation (Tab, \t)   Variable (advances to the next tab stop)
U+000A      Line Feed (\n)                   No width (moves to the next line)
U+000D      Carriage Return (\r)             No width (returns to the start of the line)
U+3000      Ideographic Space                1 full-width em (CJK)
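
The entries in the table can be cross-checked against the Unicode Character Database that ships with Python. The following is a minimal sketch using the standard unicodedata module; the names SAMPLE_SPACES, describe_spaces, and count_space_separators are illustrative helpers invented for this example, and the exact count depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Sample code points taken from the table (an illustrative selection).
SAMPLE_SPACES = [0x0020, 0x00A0, 0x2002, 0x2003, 0x2009, 0x200A, 0x3000]


def describe_spaces(code_points):
    """Return (label, Unicode name, general category) for each code point."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        rows.append((f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch)))
    return rows


def count_space_separators():
    """Count code points whose general category is Zs (space separator)."""
    return sum(
        1 for cp in range(0x110000)
        if unicodedata.category(chr(cp)) == "Zs"
    )


for label, name, category in describe_spaces(SAMPLE_SPACES):
    print(label, name, category)
```

Every sample character reports the general category Zs. Control characters such as tab and line feed have no formal name in this module, so unicodedata.name raises ValueError for them unless a default value is passed.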

Control Characters vs. Space Characters

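Whitespace splits into two categories:
- Space characters: primarily horizontal; they create gaps within a line.
- Control characters: include tab and the line terminators LF and CR (often paired as the CRLF sequence), which break lines and create vertical space. The Unicode separators LS and PS also break lines, although Unicode classes them as separators rather than controls.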

Line separator (U+2028) and paragraph separator (U+2029) are Unicode-specific line terminators that only some parsers recognize. The ECMAScript grammar, for example, lists both as line terminators alongside LF and CR, and JavaScript's \s matches them.
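
Whether a parser recognizes LS and PS matters in practice. As a small illustration in Python (chosen here only as an example language; the sample string is made up), str.splitlines() treats them as line boundaries, while splitting on "\n" alone does not:

```python
import unicodedata

# One sample string containing LF, CRLF, LS (U+2028), and PS (U+2029).
text = "one\ntwo\r\nthree\u2028four\u2029five"

# str.splitlines() recognizes LF, CR, CRLF, LS, and PS (among others).
lines = text.splitlines()
print(lines)

# Splitting on "\n" alone leaves CR, LS, and PS embedded in the pieces.
naive = text.split("\n")
print(naive)

# General categories: Cc = control, Zl = line separator, Zp = paragraph separator.
categories = {
    f"U+{ord(ch):04X}": unicodedata.category(ch)
    for ch in "\n\r\u2028\u2029"
}
print(categories)
```

The category lookup shows why LS and PS sit apart from LF and CR: the first two are controls (Cc), the last two are separators (Zl and Zp).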

Whitespace in Programming

Whitespace handling varies significantly across languages and contexts:

# Python: indentation is syntax
if True:
    print("This indent matters")

# Python string whitespace
import re
text = "hello\u2003world"          # em space
re.split(r'\s+', text)             # ['hello', 'world']: \s matches em space
text.split(' ')                    # ['hello\u2003world']: only splits on U+0020
text.split()                       # ['hello', 'world']: no argument splits on any whitespace

// JavaScript: \s in regex matches Unicode space separators and line terminators
/\s/.test('\u2003')   // true: em space
/\s/.test('\u3000')   // true: ideographic space

/* CSS white-space property controls whitespace rendering */
p { white-space: normal; }    /* collapse and wrap (default) */
p { white-space: pre; }       /* preserve all whitespace, no wrapping */
p { white-space: nowrap; }    /* collapse, but no line wrapping */
p { white-space: pre-wrap; }  /* preserve, but allow wrapping */
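
A common consequence of these differences is that input containing mixed Unicode spaces needs normalizing before comparison or tokenizing. The sketch below shows one possible approach in Python; normalize_whitespace is a hypothetical helper written for this example, and it relies on \s being Unicode-aware for str patterns in Python 3.

```python
import re


def normalize_whitespace(text):
    """Collapse each run of Unicode whitespace into one U+0020 and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


# Sample mixing em space, no-break space, tab, newline, and ideographic space.
sample = "  hello\u2003\u00a0world\t\n\u3000again  "

print(repr(normalize_whitespace(sample)))

# str.split() with no argument splits on the same kinds of whitespace.
print(sample.split())
```

Note that this also replaces no-break spaces, which discards their line-breaking intent; whether that is acceptable depends on the application.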

Ideographic Space in CJK Typography

The ideographic space (U+3000) is a full-width space used in Chinese, Japanese, and Korean text. Its width matches a full CJK character (one em in a monospace CJK context). It is visibly wider than a standard space and matters for alignment in vertically set or grid-aligned CJK typography.
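
The full-width nature of U+3000 is recorded in the Unicode East Asian Width property, which Python exposes through unicodedata. The following is a minimal sketch; the constant IDEOGRAPHIC_SPACE and the helper to_ascii_spaces are names assumed for this example only.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"

# East Asian Width values: "F" = fullwidth, "Na" = narrow.
ideographic_width = unicodedata.east_asian_width(IDEOGRAPHIC_SPACE)
ascii_width = unicodedata.east_asian_width(" ")
print(ideographic_width, ascii_width)


def to_ascii_spaces(text):
    """Replace each ideographic space with a single U+0020."""
    return text.replace(IDEOGRAPHIC_SPACE, " ")


print(to_ascii_spaces("日本\u3000語"))
```

Replacing ideographic spaces is only appropriate for tasks such as search or tokenizing; in displayed CJK text the substitution would break grid alignment.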

Quick Facts

Property                       Value
Standard space                 U+0020
Non-breaking space             U+00A0
Thin space                     U+2009 (⅕ em)
Hair space                     U+200A (1/24 em)
Em space                       U+2003
Ideographic space (CJK)        U+3000
Unicode line separator         U+2028
Unicode paragraph separator    U+2029
\s in regex engines            Unicode-aware in Python 3 (str patterns) and JavaScript; ASCII-only by default in Java, Go, and PCRE
