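Typography

Whitespace Character

A character that represents horizontal or vertical space but has no visible glyph. Unicode gives 25 characters the White_Space property, 17 of them space separators (category Zs), with differing widths and line-breaking behavior.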


What is Whitespace?

Whitespace refers to any character that represents horizontal or vertical blank space without producing a visible mark. The term covers a wide family of characters — from the ordinary space (U+0020) you type with the spacebar, to tab, newline, carriage return, and a rich set of Unicode-specific space characters with precise typographic widths.

In text processing, whitespace characters act as delimiters between tokens such as words, fields, and lines. In typography, each whitespace character specifies a particular width and line-breaking behavior for the blank space it represents.

The Unicode Whitespace Family

Unicode assigns the White_Space property to 25 characters. The most important are:

Code point  Name                             Width or effect
U+0020      Space                            Standard word space
U+00A0      No-Break Space                   Standard word space (no line break)
U+2002      En Space                         1 en (½ em)
U+2003      Em Space                         1 em
U+2004      Three-Per-Em Space               ⅓ em
U+2005      Four-Per-Em Space                ¼ em
U+2006      Six-Per-Em Space                 ⅙ em
U+2007      Figure Space                     Width of a digit
U+2008      Punctuation Space                Width of a period
U+2009      Thin Space                       ⅕ em
U+200A      Hair Space                       About 1/24 em
U+0009      Character Tabulation (Tab, \t)   Variable (advances to the next tab stop)
U+000A      Line Feed (\n)                   Line break (moves to the next line)
U+000D      Carriage Return (\r)             Line break (returns to the start of the line)
U+3000      Ideographic Space                1 em (full-width, CJK)
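The space-separator part of this family can be listed programmatically. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `space_separators` is an illustration, not part of any library, and the result depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata

def space_separators():
    """Return (code point, name) pairs for every character in category Zs."""
    return [
        (cp, unicodedata.name(chr(cp)))
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Zs"
    ]

# Print one line per space separator, e.g. "U+3000  IDEOGRAPHIC SPACE".
for cp, name in space_separators():
    print(f"U+{cp:04X}  {name}")
```

Tab, line feed, and carriage return do not appear in this listing because they are control characters (category Cc) rather than space separators.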

Control Characters vs. Space Characters

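Whitespace splits into two broad categories:

- Space characters: primarily horizontal; they create gaps within a line.
- Control characters: tab, plus the line terminators LF and CR (often paired as CRLF), which break lines and create vertical space. The Unicode line terminators LS and PS also break lines, although they are formally separators (categories Zl and Zp) rather than control characters.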
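One way to inspect this split is through each character's Unicode general category. The sketch below assumes Python's `unicodedata` module; `CATEGORY_LABELS` and `whitespace_kind` are hypothetical names chosen for illustration.

```python
import unicodedata

# Human-readable labels for the general categories relevant to whitespace.
CATEGORY_LABELS = {
    "Zs": "space separator",
    "Zl": "line separator",
    "Zp": "paragraph separator",
    "Cc": "control character",
}

def whitespace_kind(ch: str) -> str:
    """Describe a single character by its Unicode general category."""
    category = unicodedata.category(ch)
    return CATEGORY_LABELS.get(category, f"other ({category})")

for ch in [" ", "\u00a0", "\u2003", "\t", "\n", "\r", "\u2028", "\u2029"]:
    print(f"U+{ord(ch):04X}  {whitespace_kind(ch)}")
```

The general category alone does not say whether a character is whitespace: most control characters (category Cc) are not, so a real classifier would combine this with a whitespace test such as `str.isspace()`.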

Line separator (U+2028) and paragraph separator (U+2029) are Unicode-specific line terminators that some parsers recognize. ECMAScript (JavaScript), for example, lists both among its line terminator code points, alongside LF and CR.
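Because input may mix several of these terminators, text is often normalized to a single convention before further processing. The following is a minimal sketch, assuming the goal is to convert CRLF, CR, LS, and PS to LF; `normalize_newlines` is a hypothetical helper, not a standard function.

```python
import re

# CRLF is listed first so it is replaced as one unit rather than as CR then LF.
_LINE_TERMINATOR = re.compile("\r\n|[\n\r\u2028\u2029]")

def normalize_newlines(text: str) -> str:
    """Replace CRLF, CR, LS, and PS with a single LF each."""
    return _LINE_TERMINATOR.sub("\n", text)

sample = "one\r\ntwo\u2028three\u2029four\rfive"
print(normalize_newlines(sample).split("\n"))
# Python's str.splitlines() already treats U+2028 and U+2029 as line boundaries.
print(sample.splitlines())
```

Both calls yield the same five lines for this sample. `str.splitlines()` also splits on a few further characters, such as vertical tab and form feed, which this sketch leaves untouched.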

Whitespace in Programming

Whitespace handling varies significantly across languages and contexts:

# Python: indentation is syntax
if True:
    print("This indent matters")

# Python string whitespace
import re
text = "hello\u2003world"          # em space
re.split(r'\s+', text)             # ['hello', 'world']: \s matches the em space
text.split()                       # ['hello', 'world']: no argument splits on any whitespace
text.split(' ')                    # ['hello\u2003world']: only splits on U+0020

// JavaScript: \s in regex matches Unicode space separators and line terminators
/\s/.test('\u2003')   // true: em space
/\s/.test('\u3000')   // true: ideographic space
/\s/.test('\u0085')   // false: NEL has White_Space but is not in JavaScript's \s

/* CSS white-space property controls whitespace rendering */
p { white-space: normal; }    /* collapse and wrap (default) */
p { white-space: pre; }       /* preserve all whitespace, no wrapping */
p { white-space: nowrap; }    /* collapse, no line wrapping */
p { white-space: pre-wrap; }  /* preserve, but allow wrapping */
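A common text-processing counterpart to collapsing is to reduce every run of whitespace to a single U+0020. The sketch below relies on `str.split()` with no argument; `collapse_whitespace` is a hypothetical helper name. It is deliberately broader than CSS collapsing, which affects only spaces, tabs, and line breaks and leaves characters such as U+00A0 and U+2003 alone.

```python
def collapse_whitespace(text: str) -> str:
    """Replace each run of Unicode whitespace with one U+0020 and trim the ends."""
    # str.split() with no argument splits on runs of any whitespace
    # and discards leading and trailing whitespace.
    return " ".join(text.split())

messy = "  hello\u2003\u2003world\t\n\u3000again\u00a0 "
print(repr(collapse_whitespace(messy)))
```

Note that this also replaces no-break spaces, so it should be avoided where U+00A0 is carrying typographic intent.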

Ideographic Space in CJK Typography

The ideographic space (U+3000) is a full-width space used in Chinese, Japanese, and Korean text. Its width matches that of a full-width CJK character (one em), which keeps text aligned in grid-based and vertically set CJK typography. It is visibly wider than the standard space (U+0020).
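The full-width nature of U+3000 is recorded in its East Asian Width property, which can be read with Python's `unicodedata` module. The following is a small sketch; `to_ascii_spaces` is a hypothetical helper showing one way to replace ideographic spaces when ASCII spacing is wanted.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"

def to_ascii_spaces(text: str) -> str:
    """Replace each ideographic space with U+0020 and leave everything else alone."""
    return text.replace(IDEOGRAPHIC_SPACE, " ")

# East Asian Width: "F" means fullwidth, "Na" means narrow.
print(unicodedata.east_asian_width(IDEOGRAPHIC_SPACE))
print(unicodedata.east_asian_width(" "))
print(to_ascii_spaces("東京\u3000大阪"))
```

`unicodedata.normalize("NFKC", text)` also maps U+3000 to U+0020, but it rewrites many other compatibility characters as well, so a targeted replacement is the more conservative choice.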

Quick Facts

Property                      Value
Standard space                U+0020
Non-breaking space            U+00A0
Thin space                    U+2009 (⅕ em)
Hair space                    U+200A (about 1/24 em)
Em space                      U+2003
Ideographic space (CJK)       U+3000
Unicode line separator        U+2028
Unicode paragraph separator   U+2029
\s in regex engines           Engine-dependent: Unicode-aware in Python 3 (str patterns) and JavaScript; ASCII-only by default in Java and Go
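Whether `\s` is Unicode-aware depends on the engine and its flags. As an illustration, Python's `re` module matches Unicode whitespace for `str` patterns by default and restricts `\s` to ASCII whitespace under `re.ASCII`. The helper `matches_whitespace` below is hypothetical.

```python
import re

def matches_whitespace(ch: str, ascii_only: bool = False) -> bool:
    """Report whether a single character matches \\s, optionally in ASCII-only mode."""
    flags = re.ASCII if ascii_only else 0
    return re.fullmatch(r"\s", ch, flags) is not None

samples = {
    "space": " ",
    "tab": "\t",
    "no-break space": "\u00a0",
    "em space": "\u2003",
    "ideographic space": "\u3000",
}

for label, ch in samples.items():
    print(f"{label:18} unicode={matches_whitespace(ch)}  "
          f"ascii={matches_whitespace(ch, ascii_only=True)}")
```

When porting a pattern between engines, it is worth checking which of the two behaviors the target engine uses by default.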

Related Terms

More Terms in Typography

CSS unicode-range

CSS @font-face descriptor specifying which Unicode code points a font should cover. …

Em / En (Typographic Units)

Em: a width equal to the font size. En: half an em. They define the width of the em dash, the em space, the en space, and CSS units (1em, 0.5em).

Font Fallback

The mechanism by which a rendering engine substitutes glyphs from a secondary …

OpenType

Modern font format developed by Microsoft and Adobe supporting up to 65,535 …

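RTL (Right-to-Left)

A text direction in which characters flow from right to left. It is used by Arabic, Hebrew, Thaana, and other scripts, and correct display requires the bidirectional algorithm.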

Web Fonts

Fonts downloaded by the browser to render text, declared via CSS @font-face. …

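Kerning

Adjusting the spacing between specific character pairs (for example AV, To, LT) for visual harmony. It is a font feature rather than a Unicode concept, but it affects how Unicode text is rendered.

Glyph

The visual representation of a character as rendered by a font. A single character may map to several glyphs (as with contextual forms), and a single glyph may represent several characters (as with ligatures).

Small Caps

Capital letterforms drawn at roughly the height of lowercase letters. CSS: font-variant: small-caps. Unicode also encodes some small capital letters as distinct characters, mostly in the Phonetic Extensions block (for example ᴀ U+1D00 and ᴢ U+1D22).

Zero-Width Characters

Characters with zero advance width: invisible when rendered, but they affect text behavior. They include ZWSP (word-break opportunity), ZWJ (joiner), ZWNJ (non-joiner), and WJ (prevents line breaks).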