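Security

Zero Width Joiner (ZWJ)

U+200D. Requests that adjacent characters be joined. Essential to emoji sequences (👩 + ZWJ + 💻 = 👩‍💻). In Indic scripts it requests ligature formation. It is also used to hide text boundaries.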

What is ZWJ (Zero Width Joiner)?

ZWJ stands for Zero Width Joiner, encoded at U+200D. It is an invisible Unicode character with no visual representation of its own. Its purpose is to join adjacent characters in a way that signals to rendering software: "treat these as a combined unit." ZWJ has zero width — it takes up no space in the rendered output — but it influences how surrounding characters are shaped, ligated, or combined into a single graphical form.

ZWJ is used in two distinct but related contexts: script ligature control in complex scripts, and emoji sequence formation in modern Unicode.

ZWJ in Script Ligatures

In scripts like Arabic, Devanagari, and Sinhala, characters change shape depending on their position in a word and their neighboring characters. ZWJ instructs the rendering engine to use the "joining" or "connected" form of a character even when the natural context would produce a non-joining form.

For example, in Arabic, the letter heh (ه, U+0647) written on its own appears in its isolated form. Placing a ZWJ before it requests the final form (ـه), as if it were attached to a preceding letter; placing a ZWJ after it requests the initial form, as if a following letter were attached.
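As a rough illustration of the joining behavior described above, the sketch below builds a standalone heh with ZWJ placed before it, after it, and on both sides. The names `HEH`, `ZWJ`, and `variants` are illustrative only. The code merely constructs the code point sequences; whether the requested shapes actually appear depends on the font and shaping engine.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
HEH = "\u0647"   # ARABIC LETTER HEH

# Illustrative variants; the rendered shape depends on font and shaping engine.
variants = {
    "isolated form (no ZWJ)": HEH,
    "final form requested (ZWJ before)": ZWJ + HEH,
    "initial form requested (ZWJ after)": HEH + ZWJ,
    "medial form requested (ZWJ on both sides)": ZWJ + HEH + ZWJ,
}

for label, text in variants.items():
    # Print only the code points, since the visual result is renderer-dependent.
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{label}: {code_points}")

print(unicodedata.name(ZWJ))  # ZERO WIDTH JOINER
```

Pasting any of these strings into a text field with a shaping-capable Arabic font is a quick way to see what a given platform does with them.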

ZWJ in Emoji Sequences

In modern Unicode, ZWJ is best known as the mechanism for creating emoji ZWJ sequences — composite emoji formed by joining multiple base emoji with ZWJ characters between them. When a rendering platform supports the sequence, it displays a single combined image. When it does not, it falls back to displaying the individual emoji separately.

Well-known ZWJ sequences include:

  • Family emoji: 👨‍👩‍👧‍👦 is encoded as MAN + ZWJ + WOMAN + ZWJ + GIRL + ZWJ + BOY (U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466)
  • Profession emoji: 👨‍💻 is MAN + ZWJ + LAPTOP (U+1F468 U+200D U+1F4BB)
  • Gendered roles: 👮‍♀️ is POLICE OFFICER + ZWJ + FEMALE SIGN + VARIATION SELECTOR-16 (U+1F46E U+200D U+2640 U+FE0F)
  • Rainbow flag: 🏳️‍🌈 is WHITE FLAG + VARIATION SELECTOR-16 + ZWJ + RAINBOW (U+1F3F3 U+FE0F U+200D U+1F308)
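The sequences listed above can be assembled directly from their code points. The following is a minimal sketch; `join_with_zwj` and `code_points` are hypothetical helper names introduced here for illustration, not part of any library.

```python
ZWJ = "\u200d"    # ZERO WIDTH JOINER
VS16 = "\ufe0f"   # VARIATION SELECTOR-16 (requests emoji presentation)


def join_with_zwj(*parts: str) -> str:
    """Hypothetical helper: place a ZWJ between each pair of components."""
    return ZWJ.join(parts)


def code_points(text: str) -> str:
    """Hypothetical helper: format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


family = join_with_zwj("\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466")
technologist = join_with_zwj("\U0001F468", "\U0001F4BB")
police_woman = join_with_zwj("\U0001F46E", "\u2640" + VS16)
rainbow_flag = join_with_zwj("\U0001F3F3" + VS16, "\U0001F308")

print(code_points(family))
# U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466
print(code_points(rainbow_flag))
# U+1F3F3 U+FE0F U+200D U+1F308
```

Whether each resulting string displays as one image or as separate emoji still depends on the rendering platform, as described above.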

The number of possible ZWJ sequences grows with each Unicode release as new combinations are approved by the Unicode Emoji Subcommittee.

ZWJ and Grapheme Clusters

Unicode defines the concept of a grapheme cluster: the user-perceived unit of text, what a user thinks of as "one character." An emoji ZWJ sequence forms a single extended grapheme cluster, which is meant to be treated as one unit for cursor movement, text selection, and deletion. On many platforms, pressing backspace after a ZWJ sequence like 👨‍💻 deletes the entire combined emoji, not just the final code point, although deletion behavior varies between implementations.

This has implications for string processing in programming languages. In Python, len("👨‍💻") returns 3 (three code points: man, ZWJ, laptop), but the user-visible length is 1. Proper grapheme-cluster-aware libraries are needed for accurate text metrics.
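The gap between code point count and user-visible length can be sketched with the standard library alone. The function below, `count_clusters_simplified`, is a hypothetical and deliberately simplified approximation: it treats ZWJ, nonspacing and enclosing marks (which include variation selectors), and emoji skin tone modifiers as extending the current cluster. It is not an implementation of the full Unicode segmentation rules and will miscount other cases such as regional-indicator flags or Hangul syllables.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def count_clusters_simplified(text: str) -> int:
    """Approximate the user-perceived length of text (simplified, not UAX #29)."""
    count = 0
    after_zwj = False  # True when the previous code point was a ZWJ
    for ch in text:
        if ch == ZWJ:
            after_zwj = True
            continue  # ZWJ never starts a new cluster in this sketch
        extends = (
            # Nonspacing/enclosing marks, including variation selectors
            unicodedata.category(ch) in ("Mn", "Me")
            # Emoji skin tone modifiers U+1F3FB..U+1F3FF
            or "\U0001F3FB" <= ch <= "\U0001F3FF"
        )
        # Assumption: anything right after a ZWJ joins the previous cluster.
        if not (extends or (after_zwj and count > 0)):
            count += 1
        after_zwj = False
    return count


technologist = "\U0001F468\u200d\U0001F4BB"  # man + ZWJ + laptop
print(len(technologist))                        # 3
print(count_clusters_simplified(technologist))  # 1
```

For production use, a library that follows the Unicode text segmentation rules is the safer choice; for example, the third-party `regex` module supports the `\X` pattern for matching extended grapheme clusters.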

ZWJ as an Invisible Character (Security Note)

Because ZWJ is invisible and zero-width, it can be used to insert hidden content into text strings. Two strings that look identical to a human reader may have different ZWJ placements and therefore differ when compared byte-for-byte. This has been used in digital watermarking (to tag documents with invisible identifiers) and, maliciously, to bypass string-matching filters.
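A minimal sketch of how such hidden differences can be detected follows. The helpers `find_zwj` and `strip_zwj` are hypothetical names used for illustration. Note that removing every ZWJ is only appropriate for text where no emoji sequences or complex-script shaping are expected, because it also breaks legitimate uses.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def find_zwj(text: str) -> list[int]:
    """Hypothetical helper: return the code point indices where ZWJ occurs."""
    return [i for i, ch in enumerate(text) if ch == ZWJ]


def strip_zwj(text: str) -> str:
    """Hypothetical helper: remove every ZWJ.

    Caution: this also breaks emoji ZWJ sequences and script shaping.
    """
    return text.replace(ZWJ, "")


visible = "password"
tampered = "pass" + ZWJ + "word"  # looks the same when rendered

print(visible == tampered)              # False
print(find_zwj(tampered))               # [4]
print(strip_zwj(tampered) == visible)   # True
```

A filter that matches on the raw string would miss `tampered`, which is why inputs are often checked for invisible format characters before comparison.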

Quick Facts

Property            Value
Code point          U+200D
Name                ZERO WIDTH JOINER
Unicode category    Cf (Format character)
Visual width        Zero (completely invisible)
Primary modern use  Emoji ZWJ sequences
Script use          Arabic, Devanagari, Sinhala ligature control
Grapheme cluster    ZWJ sequences form a single extended grapheme cluster
Introduced          Unicode 1.1 (1993)

Related Terms

More Terms in Security

Bidi Text Attack

Exploiting Unicode bidirectional control characters to disguise malicious code or filenames. The …

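Bidi Override Attack

An attack that uses Unicode bidirectional control characters (U+202A to U+202E and U+2066 to U+2069) to disguise malicious filenames or code. For example, 'readme' followed by U+202E and 'fdp.exe' is displayed as 'readmeexe.pdf'.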

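IDN Homograph Attack

An attack that impersonates a legitimate site by using visually similar Unicode characters in a domain name. аpple.com (with a Cyrillic а) looks like apple.com. Browsers defend against this with rules that decide when to display the Punycode form.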

Normalization Attack

Exploiting Unicode normalization to bypass security filters. Input validated before normalization may …

Unicode Spoofing

Using Unicode features to deceive users: homoglyphs for fake domains, Bidi overrides for fake file extensions, and invisible characters for hidden text.

Zero Width Non-Joiner (ZWNJ)

U+200C. Prevents adjacent characters from joining. Required in Persian and Arabic for correct letter forms, and also used in Devanagari to prevent ligatures.

Homoglyph

Characters from different scripts that look identical or very similar, for example Latin 'a' and Cyrillic 'а'. Used in phishing, spoofing, and social engineering attacks.

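Confusables

Unicode's official term for visually confusable character pairs, defined in confusables.txt (part of the Unicode security data that accompanies UTS #39). A broader concept than homoglyphs: it also includes characters that are merely similar.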

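Mixed-Script Detection

Identifies text that mixes characters from different scripts (for example, Latin and Cyrillic). It is a primary defense against homoglyph attacks, and browsers use it to trigger Punycode display.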