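IDN Homograph Attack
An attack that uses visually similar Unicode characters in domain names to impersonate legitimate websites. For example, аpple.com (with a Cyrillic а) looks like apple.com. Browsers defend against it with Punycode display rules.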
What is an IDN Homograph Attack?
An IDN homograph attack is a phishing technique that exploits the Internationalized Domain Names in Applications (IDNA) standard to register domain names that look visually identical to legitimate domains but are composed of different Unicode characters. The attack was formally described by Evgeniy Gabrilovich and Alex Gontmakher in a 2002 paper, though the vulnerability had been anticipated when the IDN standard was being developed.
The term combines two concepts: IDN (Internationalized Domain Names, which allow non-ASCII characters in domain names) and homograph (a word that looks like another word but has a different meaning; here the idea is applied to individual characters).
How IDNA Works
The Domain Name System (DNS) was designed for ASCII characters only. IDNA works around this by converting each non-ASCII label into an ASCII-Compatible Encoding (ACE): the label is encoded with Punycode and given the prefix xn--. For example:
- The domain münchen.de is stored in DNS as xn--mnchen-3ya.de
- The domain 中文.com is stored as xn--fiq228c.com
This allows speakers of all languages to register domains in their native scripts. Unfortunately, it also enables attackers to register domains that, when rendered as Unicode, are visually indistinguishable from existing domains.
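The conversion can be tried with Python's standard library. The sketch below is a minimal illustration, not a production validator. `to_ace` and `from_ace` are hypothetical helper names, and the built-in `"idna"` codec implements the older IDNA2003 rules (RFC 3490) rather than IDNA2008. For simple labels like these two, it should still print the same ACE forms listed above.

```python
def to_ace(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-Compatible Encoding (ACE) form.

    Hypothetical helper. Python's built-in "idna" codec follows IDNA2003
    (RFC 3490), not IDNA2008.
    """
    return domain.encode("idna").decode("ascii")


def from_ace(ace: str) -> str:
    """Convert an ACE domain name (labels starting with xn--) back to Unicode."""
    return ace.encode("ascii").decode("idna")


for name in ["münchen.de", "中文.com"]:
    # Each non-ASCII label is Punycode-encoded and prefixed with "xn--".
    print(f"{name} -> {to_ace(name)}")
```

For strict IDNA2008 processing, a dedicated library (for example the third-party `idna` package on PyPI) is a better fit than the built-in codec.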
Attack Mechanics
Consider the following real-world demonstration from 2017:
Security researcher Xudong Zheng registered аррlе.com — a domain where all five characters are Cyrillic lookalikes for the Latin letters a, p, p, l, e. The Punycode form is xn--80ak6aa92e.com. When Chrome and Firefox rendered this domain in their address bars (before the patch), users saw apple.com — indistinguishable from the genuine Apple website.
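The core of the trick is that two strings can render alike while containing different code points. The sketch below uses a hypothetical spoof (Cyrillic U+0430 in place of the Latin "a", not Zheng's exact domain) and a hypothetical helper, `differing_code_points`. It shows where the strings diverge and that DNS resolves a completely different ASCII name.

```python
import unicodedata

genuine = "apple.com"
# Hypothetical spoof for illustration: Cyrillic U+0430 replaces the Latin "a".
spoof = "\u0430pple.com"


def differing_code_points(a: str, b: str) -> list[tuple[str, str]]:
    """Return (code point + name) pairs for positions where two strings differ."""
    return [
        (f"U+{ord(x):04X} {unicodedata.name(x)}", f"U+{ord(y):04X} {unicodedata.name(y)}")
        for x, y in zip(a, b)
        if x != y
    ]


print(genuine == spoof)  # False: the strings differ even though they look alike
for pair in differing_code_points(genuine, spoof):
    print(pair)  # LATIN SMALL LETTER A vs CYRILLIC SMALL LETTER A
print(spoof.encode("idna"))  # the xn-- name that DNS actually resolves
```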
The attack succeeds because:
- IDNA2008 (the current standard) does not itself prohibit confusable labels, whether they mix scripts or use a single script; that policy is left to registries and applications
- Browsers historically displayed the Unicode form of domain names, not Punycode
- TLS certificates can be issued for IDN domains, so the padlock icon provides false assurance
Real-World Examples
- 2005: The first documented IDN phishing attempts targeting PayPal and eBay
- 2017: Xudong Zheng's xn--80ak6aa92e.com demonstration forced Chrome and Firefox to update their rendering policies
- Ongoing: Security researchers regularly discover registered IDN lookalikes for banking, social media, and government domains
Browser Defenses
Browsers apply various heuristics to decide whether to display a domain as Unicode or Punycode:
- Firefox: Displays Punycode for any domain containing characters from multiple scripts unless the domain's TLD operator has whitelisted it
- Chrome: Uses a script-mixing heuristic combined with a block-list of known confusable patterns
- Safari: Converts to Punycode for mixed-script domains
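These heuristics are not foolproof. In a whole-script attack, every character of a label comes from a single script (for example, all Cyrillic lookalikes). Such a label passes a mixed-script check and may still render as Unicode in some browsers.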
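A mixed-script display rule of the kind described above can be sketched in a few lines. This is a simplified sketch, not any browser's actual algorithm. Python's standard library does not expose the Unicode Script property, so the hypothetical helper `crude_script` approximates it from the first word of each character's name. `is_mixed_script` and `display_form` are likewise hypothetical. The sketch also shows the weakness just noted: a whole-Cyrillic lookalike passes the check.

```python
import unicodedata


def crude_script(ch: str) -> str:
    """Approximate a character's script from its Unicode name (hypothetical helper).

    Real implementations use the Script / Script_Extensions properties (UTS #39);
    here the first word of the name ("LATIN", "CYRILLIC", "CJK", ...) stands in.
    """
    if ch.isascii() and not ch.isalpha():
        return "Common"  # digits and hyphen are shared by every script
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def is_mixed_script(label: str) -> bool:
    """True if a single label draws letters from more than one script."""
    scripts = {crude_script(ch) for ch in label} - {"Common"}
    return len(scripts) > 1


def display_form(domain: str) -> str:
    """Show Unicode unless some label mixes scripts; then fall back to the ACE form."""
    if any(is_mixed_script(label) for label in domain.split(".")):
        return domain.encode("idna").decode("ascii")
    return domain


# Mixed Cyrillic + Latin is caught; the all-Cyrillic "сору.com" is not.
for domain in ["\u0430pple.com", "\u0441\u043e\u0440\u0443.com", "münchen.de"]:
    print(domain, "->", display_form(domain))
```

The check is applied per label, so a Cyrillic or CJK label under a Latin TLD such as .com is not flagged merely because the TLD is Latin.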
Registrar Defenses
Some domain registrars and TLD operators implement IDNA-aware screening:
- Prohibiting registration of domains that are confusable with existing popular domains
- Restricting IDN registrations to a single script per label
- Requiring additional verification for IDN registrations
The .com and .net TLDs managed by Verisign apply mixed-script restrictions. However, enforcement varies widely across registrars and TLDs.
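Screening a new registration against existing popular domains can be modeled on the "skeleton" comparison from Unicode TS #39. Two strings are confusable when they map to the same skeleton. The sketch below is simplified and assumes a tiny, hand-picked mapping table (`CONFUSABLE_PROTOTYPES`, hypothetical) in place of the full confusables.txt data. `skeleton` and `is_confusable` are hypothetical names as well. Unlike the mixed-script check, this approach also catches whole-script lookalikes.

```python
import unicodedata

# Hypothetical, hand-picked subset of Cyrillic -> Latin prototypes.
# A real screener would load the full confusables.txt table published with UTS #39.
CONFUSABLE_PROTOTYPES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def skeleton(s: str) -> str:
    """Simplified skeleton: NFD, map each character to its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(CONFUSABLE_PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def is_confusable(candidate: str, protected: str) -> bool:
    """True if a different string would be visually confused with a protected one."""
    return candidate != protected and skeleton(candidate) == skeleton(protected)


print(is_confusable("\u0430pple.com", "apple.com"))               # mixed-script lookalike
print(is_confusable("\u0441\u043e\u0440\u0443.com", "copy.com"))  # whole-script lookalike
```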
Quick Facts
| Property | Value |
|---|---|
| First described | Gabrilovich & Gontmakher, 2002 |
| Notable demonstration | Xudong Zheng, April 2017 |
| Encoding mechanism abused | IDNA / Punycode (RFC 3492) |
| Primary target | Web domain names for phishing |
| Browser mitigation | Punycode fallback for mixed-script domains |
| Certificate authority role | CAs issue certs for IDN domains — padlock does not indicate safety |
| Related standard | Unicode TR39 confusables, IDNA2008 (RFC 5891) |