Web & HTML

Internationalized Domain Name (IDN)

A domain name that contains non-ASCII Unicode characters. It is stored internally as Punycode (xn--...) but displayed to users as Unicode. Security concern: homograph attacks.


What Are Internationalized Domain Names?

Internationalized Domain Names (IDNs) allow domain names to contain non-ASCII characters — letters from Arabic, Chinese, Cyrillic, Devanagari, Hebrew, Japanese, Korean, Thai, and hundreds of other scripts. Before IDNs, domain names were restricted to ASCII letters, digits, and hyphens (LDH characters).

IDNs let users register and navigate to domain names written entirely in their native scripts: 例え.jp, مثال.إختبار, пример.испытание. This dramatically improves accessibility for the billions of internet users whose primary scripts are not Latin.

How IDNs Work Technically

The DNS infrastructure only understands ASCII. IDNs bridge this gap using Punycode encoding: non-ASCII domain labels are converted to an ASCII-compatible encoding (ACE) prefixed with xn--. The Punycode-encoded forms travel through DNS; the original Unicode forms are displayed to users.

User sees:    例え.jp
DNS query:    xn--r8jz45g.jp

The conversion is usually handled in the application itself, by the browser's IDN processing layer or an IDNA library, and on some platforms by the operating system's resolver API. Applications present the human-readable Unicode form; the network sees only ASCII.
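
The label-by-label conversion can be sketched with Python's built-in punycode codec. This is a minimal illustration rather than a full IDNA implementation: the helper name to_ace is hypothetical, and it skips the normalization and validity checks that a real IDNA library performs.

```python
ACE_PREFIX = "xn--"

def to_ace(domain: str) -> str:
    """Convert each non-ASCII label to its xn-- form (no IDNA validation)."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            # Plain ASCII labels such as "jp" pass through unchanged.
            labels.append(label)
        else:
            # Punycode-encode the label, then add the ACE prefix.
            labels.append(ACE_PREFIX + label.encode("punycode").decode("ascii"))
    return ".".join(labels)

print(to_ace("例え.jp"))      # xn--r8jz45g.jp
print(to_ace("münchen.de"))  # xn--mnchen-3ya.de
```

Each label is encoded independently, which is why the ASCII TLD in these examples is left untouched.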

IDNA Standards

Two versions of the IDNA (Internationalizing Domain Names in Applications) protocol exist:

  • IDNA2003 (RFC 3490): The first standard. Uses the Nameprep profile of Stringprep for normalization and mapping.
  • IDNA2008 (RFC 5891/5892): Stricter. Disallows some characters that IDNA2003 permitted and drops mappings such as ß → ss, so ß is kept as a distinct character. More conservative and consistent.

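Most modern systems follow IDNA2008, often through the UTS #46 compatibility processing that browsers implement. The two versions disagree on a small set of characters (ß and the Greek final sigma ς among them), so the same Unicode name can encode to different ASCII forms depending on the version in use.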
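
The ß difference can be observed with the standard library alone. The sketch below assumes a made-up domain, faß.de, and uses the built-in idna codec (IDNA2003) next to a manual Punycode encoding of the label with ß preserved, which is the form an IDNA2008 encoder would be expected to produce.

```python
# IDNA2003 (standard-library codec): Nameprep folds ß to "ss" before encoding,
# so the result is plain ASCII with no xn-- prefix.
idna2003_form = "faß.de".encode("idna")

# IDNA2008 keeps ß as a distinct character, so the label needs Punycode.
# Encoded by hand here; no IDNA2008 validation is performed.
idna2008_label = "xn--" + "faß".encode("punycode").decode("ascii")

print(idna2003_form)    # b'fass.de'
print(idna2008_label)   # xn--fa-hia
```

The two results are different DNS names, which is the kind of compatibility issue a registry or application has to account for.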

Security Considerations

IDNs introduce homograph attacks: characters from different scripts can look identical to Latin letters. For example, Cyrillic а (U+0430) looks like Latin a (U+0061). A malicious domain pаypal.com might use Cyrillic а to impersonate paypal.com.

Browsers defend against this by:

  • Displaying the Punycode form (xn--...) when a domain mixes scripts or contains confusable characters.
  • Restricting which labels show as Unicode (typically: all characters from a single script, using registered scripts for the TLD).

# This URL shows as Punycode in Chrome (an all-Cyrillic label that imitates apple.com)
http://xn--80ak6aa92e.com/
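
A mixed-script check can be approximated in a few lines. Python's standard library does not expose the Unicode Script property, so this sketch assumes the first word of each character's Unicode name is a good enough stand-in; the function names are hypothetical, and real browsers apply far more detailed rules.

```python
import unicodedata

def scripts_in_label(label: str) -> set[str]:
    """Approximate each letter's script by the first word of its Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "LATIN SMALL LETTER P" -> "LATIN"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_mixed(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in_label(label)) > 1

genuine = "paypal"
spoofed = "p\u0430ypal"  # second letter is Cyrillic а (U+0430)

print(looks_mixed(genuine))  # False
print(looks_mixed(spoofed))  # True
```

This heuristic would also flag legitimate combinations such as Japanese labels that mix kanji and hiragana, and it cannot catch a lookalike label written entirely in one script, so it is only a starting point.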

Using IDNs in Python

import idna  # pip install idna

# Encode: Unicode → Punycode
idna.encode("例え.jp")              # b"xn--r8jz45g.jp"
idna.encode("münchen.de")          # b"xn--mnchen-3ya.de"
idna.encode("пример.испытание")    # b"xn--e1afmkfd.xn--80akhbyknj4f"

# Decode: Punycode → Unicode
idna.decode("xn--r8jz45g.jp")      # "例え.jp"
idna.decode("xn--mnchen-3ya.de")   # "münchen.de"

# Standard library (limited to IDNA2003)
"例え.jp".encode("idna")           # b"xn--r8jz45g.jp"
b"xn--r8jz45g.jp".decode("idna")   # "例え.jp"

IDN Email Addresses

Email also supports international addresses (EAI — Email Address Internationalization, RFC 6531). A full internationalized email address can use Unicode in both the local part and the domain:

用户@例子.广告    (Chinese)
उपयोगकर्ता@उदाहरण.भारत    (Hindi)

Support for EAI in mail clients and servers is still growing.
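
Only the domain part of such an address has an ASCII-compatible form; the local part stays Unicode and needs a mail path that supports EAI. A minimal sketch of converting the domain part, assuming the standard-library idna codec (IDNA2003) is acceptable and using the hypothetical helper name domain_to_ace:

```python
def domain_to_ace(address: str) -> str:
    """Return the address with its domain part converted to xn-- form."""
    # Split on the last "@" so the local part is left untouched.
    local, _, domain = address.rpartition("@")
    ace_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ace_domain}"

print(domain_to_ace("postmaster@münchen.de"))  # postmaster@xn--mnchen-3ya.de
print(domain_to_ace("用户@例子.广告"))          # local part remains 用户
```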

Quick Facts

Property                      Value
Full name                     Internationalizing Domain Names in Applications (IDNA)
DNS representation            Punycode-encoded ASCII (xn--...)
User-visible form             Unicode characters in native script
Current standard              IDNA2008 (RFC 5891/5892)
Max label length (encoded)    63 ASCII characters
Homograph attacks             Mitigated by browser mixed-script detection
Python library                idna (pip install idna) for IDNA2008
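
The 63-character limit applies to the encoded label, so a Unicode label that looks short can still be too long for DNS. The sketch below checks this with the standard-library idna codec (IDNA2003); the function name encoded_labels_fit is hypothetical.

```python
MAX_LABEL_LENGTH = 63  # limit on each encoded label, in ASCII characters

def encoded_labels_fit(domain: str) -> bool:
    """Return True if every label fits in 63 characters after encoding."""
    try:
        ace = domain.encode("idna")
    except UnicodeError:
        # The codec rejects empty or over-long labels and invalid input.
        return False
    return all(len(label) <= MAX_LABEL_LENGTH for label in ace.split(b"."))

print(encoded_labels_fit("例え.jp"))           # True
print(encoded_labels_fit("え" * 60 + ".jp"))  # False: 60 characters exceed 63 once encoded
```

The second label fails because the xn-- prefix plus at least one Punycode digit per character already adds up to 64 characters.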

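Related Terms

Other Terms in Web & HTML

Content-Type Charset

An HTTP header parameter that declares the character encoding of the response (Content-Type: text/html; charset=utf-8). It takes precedence over encoding declarations inside the document.

CSS content Property

A CSS property that inserts generated content on the ::before and ::after pseudo-elements, optionally using Unicode escapes: content: '\2713' inserts ✓.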

CSS Text Direction

CSS properties (direction, writing-mode, unicode-bidi) controlling text layout direction. Works with Unicode …

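HTML Entities

A way of writing characters in HTML as text. Three forms: named (&amp;), decimal (&#38;), and hexadecimal (&#x26;). Required for characters that conflict with HTML syntax.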

JavaScript Intl API

ECMAScript Internationalization API providing locale-aware string comparison (Collator), number formatting (NumberFormat), date …

Punycode

An ASCII-compatible encoding that converts Unicode domain names into ASCII strings carrying the xn-- prefix. münchen.de → xn--mnchen-3ya.de.

Unicode in CSS

CSS supports Unicode via escape sequences (\2713 for ✓), the content property …

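XML Character References

The XML version of numeric character references: &#10003; or &#x2713; (both ✓). XML has only five predefined named entities (&amp; &lt; &gt; &quot; &apos;), whereas HTML5 has 2,231.

Text Presentation

Rendering a character as a monochrome text glyph instead of the default emoji presentation, usually by appending Variation Selector-15 (U+FE0E).

Percent-Encoding (URL Encoding)

Encodes non-ASCII and reserved characters in a URL by replacing each byte with %XX. The character is first converted to UTF-8, then each byte is percent-encoded: é → %C3%A9.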