Security

Zero Width Non-Joiner (ZWNJ)

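U+200C. Prevents adjacent characters from joining; required for correct letterforms in Persian and Arabic, and also used in Devanagari to block ligatures.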


What is ZWNJ (Zero Width Non-Joiner)?

ZWNJ stands for Zero Width Non-Joiner, encoded at U+200C. Like its counterpart ZWJ (Zero Width Joiner, U+200D), ZWNJ is an invisible formatting character with zero visual width. Where ZWJ instructs rendering engines to join or ligate adjacent characters, ZWNJ does the opposite: it prevents joining, ligature formation, or cursive connection between characters that would otherwise be combined by default.

ZWNJ is primarily used in scripts that employ cursive or ligature-forming typography — most notably Arabic, Persian (Farsi), Urdu, Devanagari, and other Brahmic scripts — as well as in general typography for ligature control in Latin script.

How ZWNJ Works in Arabic and Persian

In Arabic script, most letters connect to their neighbors in a cursive flow, changing their shape based on position. Persian (Farsi) is written in Arabic script with a few additional characters. In both languages, there are grammatical and typographic situations where a word boundary or morpheme boundary should visually break the cursive connection, even though the characters are part of the same word.

For example, the Persian word می‌رود (miravad, "goes") is written with a ZWNJ between می and رود to mark two morphemes, a prefix and a verb, without inserting a full space. The ZWNJ stops ی from connecting to the following ر, so ی takes its final form, and the result is a small visual gap often called a "half-space". The word still behaves as a single typographic unit for justification and line breaking. This usage is standard and required for correct Persian typography.
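A minimal Python sketch of how the half-space is encoded at the code-point level: the word is built from its two morphemes joined by U+200C and compared with the same letters written without it. The helper name `join_morphemes` is hypothetical, and the sketch assumes the Persian yeh U+06CC is used in می.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def join_morphemes(prefix: str, stem: str) -> str:
    """Join a Persian prefix and stem with a ZWNJ half-space (hypothetical helper)."""
    return prefix + ZWNJ + stem

prefix = "\u0645\u06cc"       # mi- : ARABIC LETTER MEEM + ARABIC LETTER FARSI YEH
stem = "\u0631\u0648\u062f"   # ravad : REH + WAW + DAL

word = join_morphemes(prefix, stem)
plain = prefix + stem         # same letters; yeh would connect to reh here

print(word == plain)          # False: they differ only by the invisible ZWNJ
print(len(word), len(plain))  # 6 5
print([unicodedata.name(c) for c in word[1:4]])
```

The last line prints the three code points around the morpheme boundary: the yeh, the ZWNJ, and the reh.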

How ZWNJ Works in Devanagari and Brahmic Scripts

In Devanagari (used for Hindi, Sanskrit, Marathi, and Nepali), ZWNJ prevents the formation of conjunct consonants. When a consonant, a virama (halant), and a second consonant appear in sequence, the rendering engine normally forms a conjunct ligature, a single combined glyph. Inserting ZWNJ after the virama blocks this: the first consonant is displayed with a visible virama, and the second keeps its full, independent form.

For example: क + ् + ष normally renders as the conjunct क्ष (kṣa). Inserting ZWNJ after the virama (क + ् + ZWNJ + ष) blocks the conjunct, so the sequence renders as क् followed by a full ष: the halant form of ka, then an independent ṣa.
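The two sequences can be spelled out with Python's standard `unicodedata` module. This sketch only shows the underlying code points; the actual glyphs depend on the font and shaping engine, which is assumed to support Devanagari conjuncts.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA (halant)
SSA = "\u0937"     # DEVANAGARI LETTER SSA
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

conjunct = KA + VIRAMA + SSA          # normally shaped as the conjunct kssa
explicit = KA + VIRAMA + ZWNJ + SSA   # virama stays visible; no conjunct

for label, text in (("conjunct", conjunct), ("explicit", explicit)):
    names = [unicodedata.name(ch) for ch in text]
    print(label, " + ".join(names))
```

The only difference between the two strings is the ZWNJ after the virama.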

ZWNJ in Latin Typography

In Latin script, ZWNJ can prevent automatic ligature formation. Typesetting systems and fonts (TeX, OpenType fonts with standard ligatures enabled) automatically ligate character pairs such as fi, fl, ff, ffi, and ffl. In some contexts, such as German compound words where the ligature would cross a morpheme boundary (the fl in Auflage, from Auf + lage), the ligature is typographically incorrect. Inserting ZWNJ between the two letters prevents the ligature in renderers that honor it.
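A minimal sketch of marking a morpheme boundary in a Latin word, using the German word Auflage (Auf + lage) as the example. The helper `block_ligature` and its `boundary` parameter are hypothetical; whether the fl ligature is actually suppressed depends on the renderer.

```python
ZWNJ = "\u200c"

def block_ligature(word: str, boundary: int) -> str:
    """Insert ZWNJ at a morpheme boundary (hypothetical helper).

    `boundary` is the index of the first character of the second morpheme.
    """
    if not 0 < boundary < len(word):
        raise ValueError("boundary must fall strictly inside the word")
    return word[:boundary] + ZWNJ + word[boundary:]

# "Auflage" = "Auf" + "lage": f and l belong to different morphemes.
marked = block_ligature("Auflage", 3)
print(repr(marked))                            # 'Auf\u200clage'
print(marked.replace(ZWNJ, "") == "Auflage")   # True
```

Because the marker is a separate code point, removing it recovers the original spelling exactly.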

ZWNJ and ZWJ Comparison

Property | ZWNJ (U+200C) | ZWJ (U+200D)
Full name | Zero Width Non-Joiner | Zero Width Joiner
Effect on joining | Prevents joining/ligature | Requests joining/ligature
Primary use | Persian half-space, Devanagari conjunct prevention | Emoji sequences, forcing joined forms in Arabic
Visual width | Zero | Zero
Unicode category | Cf (Format) | Cf (Format)
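The name and category rows above can be checked with Python's standard `unicodedata` module. The sketch also assembles a ZWJ emoji sequence (WOMAN + ZWJ + PERSONAL COMPUTER); whether the three code points display as a single glyph depends on the platform's emoji font.

```python
import unicodedata

for ch in ("\u200c", "\u200d"):
    # e.g. "U+200C ZERO WIDTH NON-JOINER Cf"
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# ZWJ in an emoji sequence: U+1F469 WOMAN + ZWJ + U+1F4BB PERSONAL COMPUTER
technologist = "\U0001F469\u200d\U0001F4BB"
print(len(technologist))  # 3 code points, typically displayed as one emoji
```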

Security and Data Considerations

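ZWNJ, like other invisible characters, can be inserted into text without leaving any visible trace. Two strings that look identical may differ because one contains ZWNJ characters. This matters for:

  • Password comparison: a password containing an embedded ZWNJ is a different string from one without it, even though the two look the same
  • Text search: a plain substring or equality comparison will not match text that differs only by ZWNJ; search engines and collation-based matching often ignore ZWNJ, but not every tool does
  • Data normalization: Unicode normalization (NFC, NFKC) does not remove ZWNJ, so applications processing user input should define an explicit policy for stripping or preserving it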
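A minimal sketch of one possible input policy in plain Python; `contains_zwnj` and `strip_zwnj` are hypothetical helper names, not a library API. It also shows that NFKC normalization leaves ZWNJ in place, so stripping it has to be a separate, deliberate step.

```python
import unicodedata

ZWNJ = "\u200c"

def contains_zwnj(text: str) -> bool:
    """Report whether the text contains any ZWNJ."""
    return ZWNJ in text

def strip_zwnj(text: str) -> str:
    """Remove every ZWNJ. Lossy for Persian, where ZWNJ is part of correct spelling."""
    return text.replace(ZWNJ, "")

a = "admin"
b = "ad\u200cmin"   # displays like "admin"

print(a == b)                                  # False
print(unicodedata.normalize("NFKC", b) == b)   # True: NFKC keeps the ZWNJ
print(strip_zwnj(b) == a)                      # True
print(contains_zwnj(b), contains_zwnj(a))      # True False
```

Which policy fits depends on the field: stripping or rejecting ZWNJ may be reasonable for identifiers such as usernames, but applying it to general text would corrupt Persian words like می‌رود.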

Quick Facts

Property | Value
Code point | U+200C
Name | ZERO WIDTH NON-JOINER
Unicode category | Cf (Format character)
Visual width | Zero (completely invisible)
Primary script use | Persian (Farsi), Arabic, Devanagari, other Brahmic scripts
Persian function | Half-space at a morpheme boundary (می‌رود)
Latin use | Prevents automatic fi/fl ligatures
Introduced | Unicode 1.1 (1993)

Related Terms

More in Security

Bidi Text Attack

Exploiting Unicode bidirectional control characters to disguise malicious code or filenames. The …

IDN Homograph Attack

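An attack that uses visually similar Unicode characters in domain names to impersonate legitimate websites: аpple.com (with a Cyrillic а) looks like apple.com. Browsers defend against it with Punycode display rules.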

Normalization Attack

Exploiting Unicode normalization to bypass security filters. Input validated before normalization may …

Unicode Spoofing

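Using Unicode features to deceive users: homoglyphs for fake domains, bidirectional overrides for disguised file extensions, and invisible characters for hidden text.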

Bidi Override Attack

An attack that uses Unicode bidirectional control characters (U+202A–U+202E, U+2066–U+2069) to disguise malicious filenames or code: 'readme‮fdp.exe' displays as 'readmeexe.pdf'.

Homoglyph

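Characters from different scripts that look identical or very similar, such as Latin 'a' and Cyrillic 'а'; used in phishing, spoofing, and social-engineering attacks.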

Confusables

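Unicode's official term for pairs of characters that may be visually confused, defined in confusables.txt (the UTS #39 security data files). Broader than homoglyphs, it includes characters that are merely similar rather than identical.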

Mixed-Script Detection

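Identifying text that mixes characters from different scripts (e.g., Latin + Cyrillic). It is the main defense against homograph attacks, and browsers use it to decide when to display Punycode.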

Zero Width Joiner (ZWJ)

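U+200D. Requests that adjacent characters join; key to emoji sequences (👩+ZWJ+💻=👩‍💻), requests ligature formation in Indic scripts, and can also be used to hide text boundaries.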