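What is an Internationalized Domain Name (IDN)?

A domain name that contains non-ASCII Unicode characters. It is stored internally as Punycode (xn--...) but displayed to users as Unicode. Security concern: homograph attacks.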

Punycode is an ASCII-compatible encoding that converts Unicode domain names into ASCII strings carrying the xn-- prefix, for example münchen.de → xn--mnchen-3ya.de.

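What is a Script?

The writing system a character belongs to (for example Latin, Cyrillic, Han). Unicode 16.0 defines 168 scripts, and the Script property is important for security and mixed-script detection.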

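Security

IDN Homograph Attack

An attack that impersonates a legitimate site by using visually similar Unicode characters in a domain name. аpple.com (with a Cyrillic а) looks like apple.com. Browsers defend against this with rules for when to display Punycode.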

2024-10-14 · Updated 2025-09-22

What is an IDN Homograph Attack?

An IDN homograph attack is a phishing technique that exploits the Internationalized Domain Names in Applications (IDNA) standard to register domain names that look visually identical to legitimate domains but are composed of different Unicode characters. The attack was formally described by Evgeniy Gabrilovich and Alex Gontmakher in a 2002 paper, though the vulnerability had been anticipated when the IDN standard was being developed.

The term combines two concepts: IDN (Internationalized Domain Names, which allow non-ASCII characters in domain names) and homograph (a word that looks like another word but has a different meaning; here the idea is applied to individual characters).

How IDNA Works

The Domain Name System (DNS) was designed for ASCII characters only. IDNA extends this by converting each non-ASCII label into an ASCII-compatible encoding (ACE) form: the xn-- prefix followed by the Punycode encoding of the label. For example:

The domain münchen.de is stored in DNS as xn--mnchen-3ya.de
The domain 中文.com is stored as xn--fiq228c.com

This allows speakers of all languages to register domains in their native scripts. Unfortunately, it also enables attackers to register domains that, when rendered as Unicode, are visually indistinguishable from existing domains.
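The two conversions above can be reproduced with a minimal Python sketch. It assumes the standard library's built-in "idna" codec, which follows the older IDNA2003 rules rather than IDNA2008; the helper names to_ascii and to_unicode are illustrative, not part of any standard API.

```python
# Round-trip the example domains through Python's built-in "idna" codec.
# Assumption: this codec implements IDNA2003; it is used here only because
# it ships with the standard library.

def to_ascii(domain: str) -> str:
    """Return the ASCII-compatible (xn--) form of a Unicode domain name."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain: str) -> str:
    """Return the Unicode form of an ASCII-compatible domain name."""
    return domain.encode("ascii").decode("idna")

print(to_ascii("münchen.de"))           # xn--mnchen-3ya.de
print(to_ascii("中文.com"))              # xn--fiq228c.com
print(to_unicode("xn--mnchen-3ya.de"))  # münchen.de
```

Labels that are already ASCII, such as the "de" and "com" parts here, pass through unchanged.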

Attack Mechanics

Consider the following real-world demonstration from 2017:

Security researcher Xudong Zheng registered аррӏе.com, a domain in which all five letters are Cyrillic lookalikes (U+0430, U+0440, U+0440, U+04CF, U+0435) for the Latin letters a, p, p, l, e. The Punycode form is xn--80ak6aa92e.com. When Chrome and Firefox rendered this domain in their address bars (before the patch), users saw apple.com, indistinguishable from the genuine Apple website.

The attack succeeds because:

IDNA2008 (the current standard) does not prohibit mixing visually similar characters across scripts
Browsers historically displayed the Unicode form of domain names, not Punycode
TLS certificates can be issued for IDN domains, so the padlock icon provides false assurance
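To see what the 2017 demonstration label actually contains, the Punycode part can be decoded and each character named. This is a small sketch using Python's standard "punycode" codec and unicodedata module; only the label string itself comes from the text above.

```python
import unicodedata

# Decode the Punycode label (the part after "xn--") from the 2017 demonstration.
label = b"80ak6aa92e".decode("punycode")

for ch in label:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+0430  CYRILLIC SMALL LETTER A
# U+0440  CYRILLIC SMALL LETTER ER
# U+0440  CYRILLIC SMALL LETTER ER
# U+04CF  CYRILLIC SMALL LETTER PALOCHKA
# U+0435  CYRILLIC SMALL LETTER IE

# No character is ASCII, even though the rendered label resembles "apple".
print(label.isascii())  # False
```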

Real-World Examples

2005: The first documented IDN phishing attempts targeting PayPal and eBay
2017: Xudong Zheng's xn--80ak6aa92e.com demonstration forced Chrome and Firefox to update their rendering policies
Ongoing: Security researchers regularly discover registered IDN lookalikes for banking, social media, and government domains

Browser Defenses

Browsers apply various heuristics to decide whether to display a domain as Unicode or Punycode:

Firefox: Displays Punycode for any domain containing characters from multiple scripts unless the domain's TLD operator has whitelisted it
Chrome: Uses a script-mixing heuristic combined with a block-list of known confusable patterns
Safari: Converts to Punycode for mixed-script domains

These heuristics are not foolproof. Single-script attacks (all Cyrillic lookalikes) may still render as Unicode in some browsers.
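A script-mixing check, and its blind spot for single-script lookalikes, can be sketched roughly as follows. This is not any browser's actual algorithm. It assumes the first word of a character's Unicode name (such as "LATIN" or "CYRILLIC") is a good enough stand-in for the Script property, which Python's standard library does not expose; letter_scripts and is_mixed_script are hypothetical helper names.

```python
import unicodedata

def letter_scripts(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in a label.

    Assumption: the first word of the Unicode character name stands in
    for the real Script property.
    """
    scripts = set()
    for ch in label:
        if unicodedata.category(ch).startswith("L"):  # skip digits and hyphens
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts

def is_mixed_script(label: str) -> bool:
    """True if the letters of the label come from more than one script."""
    return len(letter_scripts(label)) > 1

mixed = "\u0430pple"                      # Cyrillic a followed by Latin "pple"
whole = "\u0430\u0440\u0440\u04cf\u0435"  # all-Cyrillic lookalike of "apple"

print(is_mixed_script("apple"))  # False
print(is_mixed_script(mixed))    # True
print(is_mixed_script(whole))    # False: a single-script spoof passes this check
```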

Registrar Defenses

Some domain registrars and TLD operators implement IDNA-aware screening:

Prohibiting registration of domains that are confusable with existing popular domains
Restricting IDN registrations to a single script per label
Requiring additional verification for IDN registrations

The .com and .net TLDs managed by Verisign apply mixed-script restrictions. However, enforcement varies widely across registrars and TLDs.
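One way a registry could screen for labels that are confusable with protected names is to compare "skeletons", that is, labels with lookalike characters folded to a common form. The sketch below is an illustration only: LOOKALIKES is a small hand-picked table rather than the full Unicode confusables data, and PROTECTED, skeleton, and collides_with_protected are hypothetical names, not any registrar's real policy.

```python
# Hypothetical, hand-picked subset of lookalike mappings for illustration.
# A real screen would load the full confusables data published by Unicode.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u04cf": "l",  # CYRILLIC SMALL LETTER PALOCHKA
}

PROTECTED = {"apple", "paypal", "ebay"}  # hypothetical protected labels

def skeleton(label: str) -> str:
    """Fold known lookalike characters to their Latin counterparts."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())

def collides_with_protected(label: str) -> bool:
    """True if the label is not itself protected but shares a protected skeleton."""
    lowered = label.lower()
    return lowered not in PROTECTED and skeleton(lowered) in PROTECTED

print(collides_with_protected("\u0430\u0440\u0440\u04cf\u0435"))  # True
print(collides_with_protected("apple"))                           # False
print(collides_with_protected("example"))                         # False
```

Unlike a mixed-script check, this comparison also catches labels written entirely in one non-Latin script.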

Quick Facts

Property	Value
First described	Gabrilovich & Gontmakher, 2002
Notable demonstration	Xudong Zheng, April 2017
Encoding mechanism abused	IDNA / Punycode (RFC 3492)
Primary target	Web domain names for phishing
Browser mitigation	Punycode fallback for mixed-script domains
Certificate authority role	CAs issue certs for IDN domains — padlock does not indicate safety
Related standard	Unicode TR39 confusables, IDNA2008 (RFC 5891)

More Security Terms

Bidi Text Attack

Exploiting Unicode bidirectional control characters to disguise malicious code or filenames. The …

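Bidi Override Attack

An attack that uses Unicode bidirectional control characters (U+202A to U+202E and U+2066 to U+2069) to disguise malicious filenames or code. For example, 'readme' + U+202E + 'fdp.exe' is displayed as 'readmeexe.pdf'.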
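A simple defensive check is to scan filenames or source text for the control characters in the ranges U+202A to U+202E and U+2066 to U+2069. The sketch below is a minimal illustration; BIDI_CONTROLS and find_bidi_controls are hypothetical names.

```python
# Bidirectional control characters: U+202A..U+202E and U+2066..U+2069.
BIDI_CONTROLS = {chr(cp) for cp in range(0x202A, 0x202F)} | {
    chr(cp) for cp in range(0x2066, 0x206A)
}

def find_bidi_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each bidi control character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]

filename = "readme\u202efdp.exe"  # U+202E reverses the display of "fdp.exe"
print(find_bidi_controls(filename))      # [(6, 'U+202E')]
print(find_bidi_controls("readme.pdf"))  # []
```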

Normalization Attack

Exploiting Unicode normalization to bypass security filters. Input validated before normalization may …

Unicode Spoofing

Using Unicode features to deceive users: homoglyphs for fake domains, Bidi overrides for fake file extensions, and invisible characters for hidden text.

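Zero Width Joiner (ZWJ)

U+200D. Requests that adjacent characters be joined. It is essential for emoji sequences (👩 + ZWJ + 💻 = 👩‍💻). In Indic scripts it requests ligature formation. It is also used to hide text boundaries.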
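The emoji sequence mentioned in this entry can be built and inspected directly. This short sketch only uses Python string operations; the variable names are illustrative.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

woman = "\U0001F469"   # WOMAN
laptop = "\U0001F4BB"  # PERSONAL COMPUTER
sequence = woman + ZWJ + laptop  # shown as a single glyph where supported

print([f"U+{ord(ch):04X}" for ch in sequence])  # ['U+1F469', 'U+200D', 'U+1F4BB']
print(len(sequence))                             # 3

# The joiner is invisible, so two strings can look alike yet compare unequal.
print("pass" + ZWJ + "word" == "password")       # False
```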

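Zero Width Non-Joiner (ZWNJ)

U+200C. Prevents adjacent characters from joining. It is required for correct letter shapes in Persian and Arabic, and is also used in Devanagari to prevent conjuncts.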

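Homoglyph

Characters from different scripts that look identical or very similar, for example Latin 'a' and Cyrillic 'а'. They are used in phishing, spoofing, and social engineering attacks.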

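Confusables

Unicode's official term for visually confusable character pairs, defined in confusables.txt (part of the Unicode security data published with UTS #39). It is a broader concept than homoglyphs and also covers characters that merely look similar.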

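Mixed-Script Detection

Identifies text that mixes characters from different scripts (for example Latin + Cyrillic). It is a primary defense against homograph attacks, and browsers use it to trigger Punycode display.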
