Typography

Zero-Width Character

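Characters with zero advance width: invisible when rendered, yet they affect how text behaves. Examples include ZWSP (break opportunity between words), ZWJ (joining), ZWNJ (non-joining), and WJ (no line break).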


What is a Zero-Width Character?

A zero-width character is a Unicode character that occupies no visual space when rendered: it produces no visible glyph and has zero advance width. Despite being invisible, zero-width characters have important semantic functions, controlling how text is broken, joined, or displayed. They are essential for correct rendering of the Arabic script (including Persian) and of Indic scripts, and they can also control ligature formation in other scripts, including Latin.

The three most important zero-width characters are the Zero-Width Joiner (ZWJ), Zero-Width Non-Joiner (ZWNJ), and Zero-Width Space (ZWSP).

The Main Zero-Width Characters

| Character | Unicode | Name | Purpose |
|---|---|---|---|
| ZWJ | U+200D | Zero Width Joiner | Forces joining/ligature between adjacent characters |
| ZWNJ | U+200C | Zero Width Non-Joiner | Prevents joining/ligature between adjacent characters |
| ZWSP | U+200B | Zero Width Space | Allows a line break without a visible space |
| WJ | U+2060 | Word Joiner | Prevents a line break (like NBSP, but zero-width) |
| SHY | U+00AD | Soft Hyphen | Invisible hyphenation hint; shows a hyphen only if the line breaks there |
| BOM | U+FEFF | Byte Order Mark / ZWNBSP | File encoding marker; zero-width non-break space in older usage |
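The table can be cross-checked with Python's standard `unicodedata` module. This minimal sketch prints each character's official name and general category; the `ZERO_WIDTH` dictionary is only an illustrative mapping, not a standard constant. All six are format characters (category `Cf`), and U+FEFF keeps its formal name ZERO WIDTH NO-BREAK SPACE even though today it is used mainly as a byte order mark.

```python
import unicodedata

# Illustrative mapping: abbreviation -> character, as listed in the table
ZERO_WIDTH = {
    "ZWJ": "\u200d",
    "ZWNJ": "\u200c",
    "ZWSP": "\u200b",
    "WJ": "\u2060",
    "SHY": "\u00ad",
    "BOM": "\ufeff",
}

for abbr, ch in ZERO_WIDTH.items():
    # name() gives the official Unicode character name;
    # category() gives the two-letter general category (Cf = format).
    print(f"{abbr:5} U+{ord(ch):04X} {unicodedata.name(ch):28} {unicodedata.category(ch)}")
```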

Zero Width Joiner (ZWJ)

The ZWJ (U+200D) tells the text shaping engine to use a joined or ligature form between the surrounding characters, even when they would not normally join.

The most famous modern use is emoji sequences. Many complex emoji are encoded as sequences of simpler emoji joined by ZWJ:

| Sequence | Result |
|---|---|
| 👨 + ZWJ + 💻 | 👨‍💻 (man technologist) |
| 👩 + ZWJ + ❤️ + ZWJ + 👩 | 👩‍❤️‍👩 (couple with heart) |
| 🏳️ + ZWJ + 🌈 | 🏳️‍🌈 (rainbow flag) |

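In Arabic, a ZWJ makes the adjacent letter take a joining form (initial, medial, or final) even when no letter actually joins it on that side; this is how letter shapes such as a medial form are displayed on their own. In Indic scripts it controls how consonant clusters are rendered: in Devanagari, for example, consonant + virama + ZWJ requests the half form of the consonant instead of a full conjunct.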
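Because an emoji ZWJ sequence is simply several code points in a row, it can be built and inspected as an ordinary string. A minimal Python sketch; `code_points` is a hypothetical helper. Whether the result displays as a single emoji depends on the font and platform; without support, the components appear side by side.

```python
ZWJ = "\u200d"

# 👨 (U+1F468) + ZWJ + 💻 (U+1F4BB) -> man technologist
technologist = "\U0001F468" + ZWJ + "\U0001F4BB"

# 🏳️ is U+1F3F3 followed by VARIATION SELECTOR-16 (U+FE0F); 🌈 is U+1F308
rainbow_flag = "\U0001F3F3\uFE0F" + ZWJ + "\U0001F308"

def code_points(s: str) -> list[str]:
    """List the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

print(code_points(technologist))   # ['U+1F468', 'U+200D', 'U+1F4BB']
print(len(technologist))           # 3: three code points, one visible emoji
print(code_points(rainbow_flag))   # ['U+1F3F3', 'U+FE0F', 'U+200D', 'U+1F308']

# Splitting on ZWJ recovers the component emoji
print(technologist.split(ZWJ))     # ['👨', '💻']
```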

Zero Width Non-Joiner (ZWNJ)

The ZWNJ (U+200C) has the opposite effect: it prevents joining that would otherwise occur. In the Arabic script, used for Arabic, Persian, and other languages, letters normally join to form cursive words; a ZWNJ between two letters stops them from connecting, so the letter before it takes its final or isolated form and the letter after it takes its initial or isolated form. In Indic scripts, ZWNJ prevents conjunct consonant formation.

Example: In Persian, the word "می‌روم" (I go) uses a ZWNJ after "می" to keep it visually separate from "روم" while still being one word (no space, but no joining).
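On screen, the ZWNJ spelling and the unseparated spelling differ only in whether two letters connect, which is easy to miss. This minimal Python sketch builds three variants from explicit code points (U+0645 MEEM, U+06CC FARSI YEH; U+0631 REH, U+0648 WAW, U+0645 MEEM) so the ZWNJ is visible in the source; the variable names are illustrative only.

```python
import unicodedata

ZWNJ = "\u200c"
mi = "\u0645\u06cc"           # می: MEEM + FARSI YEH
ravam = "\u0631\u0648\u0645"  # روم: REH + WAW + MEEM

# Three ways to put the two parts together
variants = {
    "ZWNJ": mi + ZWNJ + ravam,   # one word, the two parts not joined
    "no separator": mi + ravam,  # YEH joins onto the following REH
    "space": mi + " " + ravam,   # two separate words
}

for label, word in variants.items():
    # Listing the character names makes the invisible ZWNJ explicit
    names = [unicodedata.name(c) for c in word]
    print(f"{label:12} {len(word)} code points: {names}")
```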

Zero Width Space (ZWSP)

The ZWSP (U+200B) is invisible and zero-width, but it marks a position where a line break is permitted. It is used in scripts that don't use spaces to separate words — such as Thai, Lao, Khmer, and Tibetan — to give the text renderer a hint about where to break long lines.

It is also inserted into long URLs and technical strings displayed in HTML, so they can wrap without a visible space.
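A minimal sketch of the display-side technique, assuming a hypothetical helper `add_break_hints` and an arbitrary choice of break characters. The output should only be used as visible text: it is no longer the same string, so it must never serve as the actual link target.

```python
ZWSP = "\u200b"

def add_break_hints(text: str, after: str = "/.-_?&=") -> str:
    """Insert a ZWSP after each character found in `after`, giving the
    renderer optional wrap points without adding visible spaces."""
    return "".join(c + ZWSP if c in after else c for c in text)

url = "https://example.com/docs/unicode/zero-width-characters"
display = add_break_hints(url)  # use for the visible link text only

# The visible text is unchanged; only invisible wrap points were added.
print(display.replace(ZWSP, "") == url)  # True
```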

Security Concerns

Zero-width characters are invisible, which makes them exploitable:
- Text spoofing: inserting ZWJ/ZWNJ into usernames to create visually identical but technically different strings
- Hidden watermarks: embedding patterns of zero-width characters as steganographic markers
- Homograph attacks: combining them with lookalike characters

Sanitize user input by stripping unexpected zero-width characters from identifiers and URLs.
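A minimal sanitizing sketch in Python, assuming the identifier field should never legitimately contain any of the zero-width characters listed on this page; `strip_zero_width` is a hypothetical helper, not a library function. It deliberately targets identifiers only: applied to general text, it would break emoji ZWJ sequences and Persian or Indic spellings. Stripping also does nothing against homographs built from lookalike letters.

```python
import re

# SHY, ZWSP, ZWNJ, ZWJ, WJ, and BOM/ZWNBSP
ZERO_WIDTH_RE = re.compile("[\u00ad\u200b-\u200d\u2060\ufeff]")

def strip_zero_width(identifier: str) -> str:
    """Remove zero-width characters from an identifier such as a username."""
    return ZERO_WIDTH_RE.sub("", identifier)

# Two usernames that render identically but are different strings
real = "admin"
spoof = "ad\u200bmin"
print(real == spoof)                    # False
print(strip_zero_width(spoof) == real)  # True
```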

Quick Facts

| Property | Value |
|---|---|
| Zero Width Joiner | U+200D (forces joining/ligature) |
| Zero Width Non-Joiner | U+200C (prevents joining) |
| Zero Width Space | U+200B (invisible line-break opportunity) |
| Word Joiner | U+2060 (prevents line break, zero-width) |
| Soft Hyphen | U+00AD (visible only when the line breaks there) |
| ZWJ in emoji | Used in 1,000+ multi-person and multi-component emoji |
| Security risk | Can create invisible text or spoofed identifiers |
| Detection in Python | `'\u200d' in text`, or the regex `[\u00ad\u200b-\u200d\u2060\ufeff]` to match every character listed above |
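The detection approach above can be extended to report where each invisible character sits, which helps when logging suspicious input or showing a user why two strings differ. A minimal sketch; `find_zero_width` is a hypothetical name and the character set mirrors the tables on this page.

```python
import unicodedata

# The zero-width characters listed on this page
ZERO_WIDTH = {"\u00ad", "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def find_zero_width(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each zero-width character in text."""
    return [(i, unicodedata.name(c)) for i, c in enumerate(text) if c in ZERO_WIDTH]

print(find_zero_width("log\u200bin\u200d"))
# [(3, 'ZERO WIDTH SPACE'), (6, 'ZERO WIDTH JOINER')]
```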

Related Terms

More Terms in Typography

CSS unicode-range

CSS @font-face descriptor specifying which Unicode code points a font should cover. …

Em / En (typographic units)

Em: a width equal to the font size. En: half an em. These units define the width of the em dash, the em space and en space, and CSS lengths such as 1em and 0.5em.

Font Fallback

The mechanism by which a rendering engine substitutes glyphs from a secondary …

OpenType

Modern font format developed by Microsoft and Adobe supporting up to 65,535 …

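RTL (Right-to-Left)

A text direction in which characters flow from right to left. It is used by Arabic, Hebrew, Thaana, and other scripts, and correct display requires the Bidirectional Algorithm.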

Web Fonts

Fonts downloaded by the browser to render text, declared via CSS @font-face. …

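Kerning

Adjusting the spacing between specific letter pairs (for example AV, To, LT) for visual balance. It is a font feature rather than a Unicode concept, but it affects how Unicode text is rendered.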

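Glyph

The visual representation of a character as rendered by a font. One character may have several glyphs (for example, contextual forms), and one glyph may represent several characters (for example, a ligature).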

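Small Caps

Capital letterforms drawn at the height of lowercase letters. CSS: font-variant: small-caps. Unicode also encodes some true small-capital letters, such as ᴀ (U+1D00) and ᴢ (U+1D22), scattered across blocks such as IPA Extensions and Phonetic Extensions for phonetic notation.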

Dash

A punctuation mark used to set off part of a sentence or to indicate a range. Unicode defines the hyphen (‐), en dash (–), em dash (—), figure dash (‒), and others.