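Zero Width Joiner (ZWJ)
U+200D. Requests that adjacent characters be joined. It is essential for emoji sequences (👩 + ZWJ + 💻 = 👩‍💻). In Indic scripts, it requests ligature formation. It can also be used to hide text boundaries.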
What is ZWJ (Zero Width Joiner)?
ZWJ stands for Zero Width Joiner, encoded at U+200D. It is an invisible Unicode character with no visual representation of its own. Its purpose is to join adjacent characters in a way that signals to rendering software: "treat these as a combined unit." ZWJ has zero width — it takes up no space in the rendered output — but it influences how surrounding characters are shaped, ligated, or combined into a single graphical form.
ZWJ is used in two distinct but related contexts: script ligature control in complex scripts, and emoji sequence formation in modern Unicode.
ZWJ in Script Ligatures
In scripts like Arabic, Devanagari, and Sinhala, characters change shape depending on their position in a word and their neighboring characters. ZWJ instructs the rendering engine to use the "joining" or "connected" form of a character even when the natural context would produce a non-joining form.
For example, the Arabic letter heh (ه, U+0647) written on its own normally appears in its isolated form. Placing a ZWJ before it requests the final form (ـه), as if the letter were attached to a preceding character. Placing a ZWJ after it requests the initial form, and a ZWJ on both sides requests the medial form.
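As a rough illustration of the joining request, the sketch below builds the four ZWJ placements around the Arabic letter heh. The dictionary name `forms` and its labels are illustrative choices, not part of any standard API, and the shape actually displayed depends on the font and rendering engine.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER
HEH = "\u0647"  # ARABIC LETTER HEH, a dual-joining letter

# Where the ZWJ sits decides which side of the letter is asked to join.
# These labels name the form being requested; rendering may differ.
forms = {
    "isolated": HEH,              # no joining requested
    "final": ZWJ + HEH,           # joined to a (virtual) preceding letter
    "initial": HEH + ZWJ,         # joined to a (virtual) following letter
    "medial": ZWJ + HEH + ZWJ,    # joined on both sides
}

for name, text in forms.items():
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{name:9} {text}  ({code_points})")
```

All four strings contain the same single visible letter; only the invisible ZWJ characters differ.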
ZWJ in Emoji Sequences
In modern Unicode, ZWJ is best known as the mechanism for creating emoji ZWJ sequences — composite emoji formed by joining multiple base emoji with ZWJ characters between them. When a rendering platform supports the sequence, it displays a single combined image. When it does not, it falls back to displaying the individual emoji separately.
Well-known ZWJ sequences include:
- Family emoji: 👨‍👩‍👧‍👦 is encoded as MAN + ZWJ + WOMAN + ZWJ + GIRL + ZWJ + BOY (U+1F468 U+200D U+1F469 U+200D U+1F467 U+200D U+1F466)
- Profession emoji: 👨‍💻 is MAN + ZWJ + LAPTOP (U+1F468 U+200D U+1F4BB)
- Gendered roles: 👮‍♀️ is POLICE OFFICER + ZWJ + FEMALE SIGN + VARIATION SELECTOR-16 (U+1F46E U+200D U+2640 U+FE0F)
- Rainbow flag: 🏳️‍🌈 is WHITE FLAG + VARIATION SELECTOR-16 + ZWJ + RAINBOW (U+1F3F3 U+FE0F U+200D U+1F308)
The number of possible ZWJ sequences grows with each Unicode release as new combinations are approved by the Unicode Emoji Subcommittee.
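The sequences listed above can be assembled directly from their code points. The following is a minimal sketch; the helper names `zwj_sequence` and `code_points` are hypothetical and introduced only for illustration. Whether the result displays as one glyph depends on the platform's emoji font.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER
VS16 = "\uFE0F"  # VARIATION SELECTOR-16 (requests emoji presentation)


def zwj_sequence(*parts: str) -> str:
    """Join emoji components with ZWJ between each pair (illustrative helper)."""
    return ZWJ.join(parts)


def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


family = zwj_sequence("\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466")
technologist = zwj_sequence("\U0001F468", "\U0001F4BB")
police_woman = zwj_sequence("\U0001F46E", "\u2640" + VS16)
rainbow_flag = zwj_sequence("\U0001F3F3" + VS16, "\U0001F308")

for emoji in (family, technologist, police_woman, rainbow_flag):
    print(emoji, code_points(emoji))
```

Printing the code points alongside each emoji makes it easy to check a string against the sequence it is supposed to encode, even when the terminal cannot render the combined glyph.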
ZWJ and Grapheme Clusters
Unicode defines the concept of a grapheme cluster: the user-perceived unit of text, what a user thinks of as "one character." An emoji ZWJ sequence forms a single extended grapheme cluster, which text editors are expected to treat as one unit for cursor movement and text selection. In editors that follow grapheme cluster boundaries, pressing backspace on a ZWJ sequence like 👨‍💻 deletes the entire combined emoji, not just the final code point, although some platforms remove one component at a time.
This has implications for string processing in programming languages. In Python, len("👨‍💻"), that is len("\U0001F468\u200D\U0001F4BB"), returns 3 (three code points: man, ZWJ, laptop), but the user-visible length is 1. Python's standard library does not segment grapheme clusters, so a grapheme-cluster-aware library is needed for accurate text metrics.
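To make the gap between code points and user-perceived characters concrete, here is a simplified, standard-library-only sketch. It is an approximation written for this example, not an implementation of the Unicode segmentation rules (UAX #29): the function `approx_grapheme_count` is hypothetical, and it treats any code point after a ZWJ, plus combining marks and skin-tone modifiers, as part of the preceding cluster. It will miscount cases such as regional-indicator flags and Hangul jamo.

```python
import unicodedata

ZWJ = "\u200D"


def approx_grapheme_count(text: str) -> int:
    """Rough count of user-perceived characters (NOT full UAX #29)."""
    count = 0
    join_next = False  # True right after a ZWJ: next code point stays in the cluster
    for ch in text:
        if ch == ZWJ:
            join_next = True
            continue
        is_extender = (
            # Combining marks; variation selectors U+FE00..U+FE0F are category Mn
            unicodedata.category(ch) in ("Mn", "Me", "Mc")
            # Emoji skin-tone modifiers
            or "\U0001F3FB" <= ch <= "\U0001F3FF"
        )
        if not join_next and not is_extender:
            count += 1  # this code point starts a new cluster
        join_next = False
    return count


technologist = "\U0001F468\u200D\U0001F4BB"  # man + ZWJ + laptop
print(len(technologist))                     # code points
print(approx_grapheme_count(technologist))   # approximate user-visible length
```

For production text metrics, a maintained library that tracks the current Unicode segmentation data is the safer choice; this sketch only shows why `len` alone is not enough.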
ZWJ as an Invisible Character (Security Note)
Because ZWJ is invisible and zero-width, it can be used to insert hidden content into text strings. Two strings that look identical to a human reader may have different ZWJ placements and therefore differ when compared byte-for-byte. This has been used in digital watermarking (to tag documents with invisible identifiers) and, maliciously, to bypass string-matching filters.
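One way to surface this kind of hidden difference is to scan for characters in the Unicode format category (Cf), which includes ZWJ. The sketch below is illustrative only: `find_format_chars` and `strip_format_chars` are hypothetical helper names, and stripping every Cf character is an assumption that suits identifiers or keywords but would break legitimate emoji ZWJ sequences and script shaping in ordinary text.

```python
import unicodedata


def find_format_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, name) for each invisible format character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


def strip_format_chars(text: str) -> str:
    """Drop all Cf characters. Destructive: also removes ZWJ from emoji sequences."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


plain = "password"
tagged = "pass\u200Dword"  # looks the same on screen, contains a hidden ZWJ

print(plain == tagged)                      # the raw strings differ
print(find_format_chars(tagged))            # where the hidden character sits
print(strip_format_chars(tagged) == plain)  # equal once format characters are removed
```

Reporting the hidden characters, rather than silently stripping them, is often the more cautious design, since it leaves the decision about legitimate uses to the caller.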
Quick Facts
| Property | Value |
|---|---|
| Code point | U+200D |
| Name | ZERO WIDTH JOINER |
| Unicode category | Cf (Format character) |
| Visual width | Zero — completely invisible |
| Primary modern use | Emoji ZWJ sequences |
| Script use | Arabic, Devanagari, Sinhala ligature control |
| Grapheme cluster | ZWJ sequences form a single extended grapheme cluster |
| Introduced | Unicode 1.1 (1993) |