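Whitespace Character
A character that represents horizontal or vertical space but has no visible glyph. Unicode defines 25 whitespace code points, including 17 space separators (general category Zs) that differ in width and line-breaking behavior.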
What is Whitespace?
Whitespace refers to any character that represents horizontal or vertical blank space without producing a visible mark. The term covers a wide family of characters — from the ordinary space (U+0020) you type with the spacebar, to tab, newline, carriage return, and a rich set of Unicode-specific space characters with precise typographic widths.
In text processing, whitespace characters are fundamental delimiters. In typography, different whitespace characters carry specific semantic meanings about the size and nature of the blank space they represent.
The Unicode Whitespace Family
Unicode assigns the White_Space property to 25 code points. The most important are:
| Code point | Name | Width |
|---|---|---|
| U+0020 | Space | Standard word space |
| U+00A0 | No-Break Space | Standard (no line break) |
| U+2002 | En Space | 1 en (½ em) |
| U+2003 | Em Space | 1 em |
| U+2004 | Three-Per-Em Space | ⅓ em |
| U+2005 | Four-Per-Em Space | ¼ em |
| U+2006 | Six-Per-Em Space | ⅙ em |
| U+2007 | Figure Space | Width of a digit |
| U+2008 | Punctuation Space | Width of a period |
| U+2009 | Thin Space | ⅕ em |
| U+200A | Hair Space | 1/24 em (approx) |
| U+0009 (`\t`) | Character Tabulation (Tab) | Variable (tab stop) |
| U+000A (`\n`) | Line Feed | Vertical |
| U+000D (`\r`) | Carriage Return | Vertical |
| U+3000 | Ideographic Space | 1 full-width em (CJK) |
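The counts above can be checked against Python's bundled Unicode database. The sketch below is illustrative: `unicodedata` has no API for the White_Space property itself, so the 25 code points are transcribed by hand into a hypothetical `WHITE_SPACE` list, and the helper `category_counts` (also a name chosen for this example) groups them by general category with `unicodedata.category`.

```python
import unicodedata
from collections import Counter

# The 25 code points carrying the Unicode White_Space property, transcribed by
# hand for this sketch (unicodedata does not expose White_Space directly).
WHITE_SPACE = (
    list(range(0x0009, 0x000E))        # TAB, LF, VT, FF, CR
    + [0x0020, 0x0085, 0x00A0, 0x1680]  # SPACE, NEL, NO-BREAK SPACE, OGHAM SPACE MARK
    + list(range(0x2000, 0x200B))      # EN QUAD through HAIR SPACE
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def category_counts(code_points):
    """Count how many code points fall into each Unicode general category."""
    return Counter(unicodedata.category(chr(cp)) for cp in code_points)


counts = category_counts(WHITE_SPACE)
print(len(WHITE_SPACE), dict(counts))
```

On a current Python this reports 17 `Zs` space separators, 6 `Cc` controls (including tab, LF, and CR), and one each of `Zl` (line separator) and `Zp` (paragraph separator).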
Control Characters vs. Space Characters
Whitespace splits into two broad groups:
- Space characters: primarily horizontal; they create gaps within a line.
- Line terminators: the control characters LF and CR (and the two-character sequence CRLF), plus the Unicode separators LS and PS; they break lines and create vertical space.
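Line separator (U+2028) and paragraph separator (U+2029) are Unicode-specific line terminators that some parsers recognize and others ignore. JavaScript, for example, counts both as line terminators: they end a `//` comment, and the regex `.` does not match them unless the `s` flag is set.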
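Python's `str.splitlines()` is another parser that recognizes them: its documented set of line boundaries includes U+2028 and U+2029, whereas splitting on `"\n"` only sees LF. A minimal comparison:

```python
# str.splitlines() treats LS (U+2028), PS (U+2029), and CRLF as line boundaries;
# str.split("\n") splits on LF only.
text = "first\u2028second\u2029third\r\nfourth"

lines = text.splitlines()
parts = text.split("\n")

print(len(lines))  # 4 lines
print(len(parts))  # 2 parts: LS, PS, and the CR stay embedded in the first one
```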
Whitespace in Programming
Whitespace handling varies significantly across languages and contexts:
```python
import re

# Python: indentation is syntax
if True:
    print("This indent matters")

# Python string whitespace
text = "hello\u2003world"  # em space (U+2003)
re.split(r'\s+', text)     # ['hello', 'world']: \s matches em space in str patterns
text.split(' ')            # ['hello\u2003world']: only splits on U+0020
text.split()               # ['hello', 'world']: no-argument split() uses str.isspace()
```
```javascript
// JavaScript: \s in regex matches the Unicode space separators (Zs),
// plus tab, vertical tab, form feed, line terminators, and U+FEFF
/\s/.test('\u2003') // true: em space
/\s/.test('\u3000') // true: ideographic space
```
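Python's `re` module is Unicode-aware by default for `str` patterns too, but that is a default rather than a universal rule: the `re.ASCII` flag restricts `\s` to `[ \t\n\r\f\v]`. The sketch below contrasts the two modes on an em-space string.

```python
import re

text = "hello\u2003world"  # U+2003 EM SPACE between the words

unicode_split = re.split(r"\s+", text)                # default: Unicode-aware \s
ascii_split = re.split(r"\s+", text, flags=re.ASCII)  # \s limited to ASCII whitespace

print(len(unicode_split))  # 2 tokens
print(len(ascii_split))    # 1 token: the em space is not matched
```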
```css
/* CSS white-space property controls whitespace rendering */
p { white-space: normal; }   /* collapse and wrap (default) */
p { white-space: pre; }      /* preserve all whitespace */
p { white-space: nowrap; }   /* no line wrapping */
p { white-space: pre-wrap; } /* preserve, but allow wrapping */
```
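The collapsing done by `white-space: normal` only applies to a small set of characters, which is why a no-break space or an em space survives it. The helper below, `collapse_like_css_normal`, is hypothetical and deliberately simplified: it assumes the collapsible set is just space, tab, and line feed, and it ignores the finer segment-break and line-edge rules of the CSS Text specification.

```python
import re

# Hypothetical, simplified approximation of `white-space: normal` collapsing.
# Assumption: only U+0020, U+0009, and U+000A are collapsible; other Unicode
# spaces (NBSP, em space, ...) pass through unchanged.
_COLLAPSIBLE = re.compile(r"[ \t\n]+")


def collapse_like_css_normal(text: str) -> str:
    collapsed = _COLLAPSIBLE.sub(" ", text)
    # Trim leading/trailing U+0020 only; a bare strip() would also eat NBSP.
    return collapsed.strip(" ")


sample = "  one\t\ttwo\n three\u00a0\u00a0four\u2003five  "
result = collapse_like_css_normal(sample)
print(repr(result))
```

The two no-break spaces and the em space remain in `result`, while the runs of space, tab, and line feed each become a single space.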
Ideographic Space in CJK Typography
The ideographic space (U+3000) is a full-width space used in Chinese, Japanese, and Korean text. Its width matches one full CJK character (one em in a monospace CJK context). It is visibly wider than a standard space and matters for alignment in vertically set or grid-aligned CJK typography.
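Python's `unicodedata` module can confirm the full-width classification: the East Asian Width property of U+3000 is `F` (Fullwidth), while U+0020 is `Na` (Narrow). The sketch also shows a practical consequence for cleaning CJK text: because U+3000 counts as whitespace, a bare `strip()` removes full-width padding, while `strip(" ")` does not.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"

# East Asian Width: 'F' = Fullwidth, 'Na' = Narrow
print(unicodedata.name(IDEOGRAPHIC_SPACE))              # IDEOGRAPHIC SPACE
print(unicodedata.east_asian_width(IDEOGRAPHIC_SPACE))  # F
print(unicodedata.east_asian_width(" "))                # Na

# U+3000 counts as whitespace for str.isspace(), so a bare strip() removes it,
# while strip(" ") only targets U+0020 and leaves the full-width padding intact.
padded = IDEOGRAPHIC_SPACE + "日本語" + IDEOGRAPHIC_SPACE
trimmed_all = padded.strip()       # "日本語"
trimmed_ascii = padded.strip(" ")  # unchanged
```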
Quick Facts
| Property | Value |
|---|---|
| Standard space | U+0020 |
| Non-breaking space | U+00A0 |
| Thin space | U+2009 (⅕ em) |
| Hair space | U+200A (1/24 em) |
| Em space | U+2003 |
| Ideographic space (CJK) | U+3000 |
| Unicode line separator | U+2028 |
| Unicode paragraph separator | U+2029 |
| `\s` in regex | Engine-dependent: Unicode-aware in Python 3 (str patterns) and JavaScript; ASCII-only by default in Java, Go, and PCRE |