Internationalized Domain Name (IDN)
A domain name that contains non-ASCII Unicode characters. It is stored internally as Punycode (xn--...) but displayed to users as Unicode. Security threat: homograph attacks.
What Are Internationalized Domain Names?
Internationalized Domain Names (IDNs) allow domain names to contain non-ASCII characters — letters from Arabic, Chinese, Cyrillic, Devanagari, Hebrew, Japanese, Korean, Thai, and hundreds of other scripts. Before IDNs, domain names were restricted to ASCII letters, digits, and hyphens (LDH characters).
IDNs let users register and navigate to domain names written entirely in their native scripts: 例え.jp, مثال.إختبار, пример.испытание. This dramatically improves accessibility for the billions of internet users whose primary scripts are not Latin.
How IDNs Work Technically
The DNS infrastructure only understands ASCII. IDNs bridge this gap using Punycode encoding: non-ASCII domain labels are converted to an ASCII-compatible encoding (ACE) prefixed with xn--. The Punycode-encoded forms travel through DNS; the original Unicode forms are displayed to users.
User sees: 例え.jp
DNS query: xn--r8jz45g.jp
The conversion is normally performed on the application side, for example by the browser's IDN processing layer or a library the application calls, before the name is handed to the DNS resolver. Applications interact with the human-readable Unicode form; the network sees only ASCII.
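The label-by-label mapping described above can be sketched with the `punycode` codec from Python's standard library. The helper name `to_ace` is hypothetical, and this sketch deliberately skips the normalization and validation steps that a real IDNA implementation performs, so treat it as an illustration of the encoding step only.

```python
def to_ace(domain: str) -> str:
    """Convert each non-ASCII label to its xn-- form (no IDNA validation)."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            # Pure-ASCII labels pass through unchanged.
            labels.append(label)
        else:
            # Raw Punycode plus the ACE prefix gives the form sent to DNS.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


print(to_ace("例え.jp"))     # xn--r8jz45g.jp
print(to_ace("münchen.de"))  # xn--mnchen-3ya.de
```

Only the labels that contain non-ASCII characters change; `jp` and `de` are left as they are.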
IDNA Standards
Two versions of the IDNA (Internationalizing Domain Names in Applications) protocol exist:
- IDNA2003 (RFC 3490): First standard. Uses NAMEPREP profile of Stringprep for normalization.
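- IDNA2008 (RFC 5891/5892): Stricter. Drops the built-in character mappings that IDNA2003 performed (such as ß→ss), so ß is now a valid character in its own right. More conservative and consistent.
Most modern systems use IDNA2008, and browsers commonly apply the UTS #46 compatibility processing alongside it. The two versions disagree on a small set of characters (for example ß and the Greek final sigma ς), so the same Unicode name can map to different ASCII forms depending on which version is used.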
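The ß difference can be observed with the standard library alone. This is a minimal sketch, assuming the built-in `idna` codec (which implements IDNA2003) and using the raw `punycode` codec only to show what the label looks like when ß is kept rather than mapped; it is not a full IDNA2008 implementation.

```python
# IDNA2003 (stdlib "idna" codec): Nameprep maps ß to "ss" before encoding,
# so the result is plain ASCII with no xn-- prefix at all.
idna2003_form = "straße.de".encode("idna")
print(idna2003_form)  # b'strasse.de'

# Keeping ß as a distinct character, as IDNA2008 does, the label must be
# Punycode-encoded. The raw codec shows the encoded label body.
kept_label = "xn--" + "straße".encode("punycode").decode("ascii")
print(kept_label)  # xn--strae-oqa
```

The two results name different DNS entries, which is the kind of compatibility issue mentioned above.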
Security Considerations
IDNs introduce homograph attacks: characters from different scripts can look identical to Latin letters. For example, Cyrillic а (U+0430) looks like Latin a (U+0061). A malicious domain pаypal.com might use Cyrillic а to impersonate paypal.com.
Browsers defend against this by:
- Displaying the Punycode form (xn--...) when a domain mixes scripts or contains confusable characters.
- Restricting which labels show as Unicode (typically: all characters from a single script, using registered scripts for the TLD).
# This URL shows as Punycode in Chrome (an all-Cyrillic label that imitates "apple")
http://xn--80ak6aa92e.com/
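A rough idea of mixed-script detection can be sketched as follows. The helper names `scripts_in_label` and `looks_mixed` are hypothetical, and the script is approximated from the first word of each character's Unicode name, which is an assumption made for brevity. Real browsers rely on the Unicode script property and confusables data, and they allow legitimate combinations (such as kanji with hiragana) that this heuristic would flag.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the scripts in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():  # ignore digits and hyphens
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(label: str) -> bool:
    """True when letters from more than one (approximate) script appear."""
    return len(scripts_in_label(label)) > 1


genuine = "paypal"
spoofed = "p\u0430ypal"  # second letter is Cyrillic а (U+0430)

print(looks_mixed(genuine))       # False
print(looks_mixed(spoofed))       # True
print(scripts_in_label(spoofed))  # contains 'LATIN' and 'CYRILLIC'
```

A check like this does not catch labels written entirely in one look-alike script, which is why browsers also compare against known confusable characters.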
Using IDNs in Python
import idna # pip install idna
# Encode: Unicode → Punycode
idna.encode("例え.jp") # b"xn--r8jz45g.jp"
idna.encode("münchen.de") # b"xn--mnchen-3ya.de"
idna.encode("пример.испытание") # b"xn--e1afmkfd.xn--80akhbyknj4f"
# Decode: Punycode → Unicode
idna.decode("xn--r8jz45g.jp") # "例え.jp"
idna.decode("xn--mnchen-3ya.de") # "münchen.de"
# Standard library (limited to IDNA2003)
"例え.jp".encode("idna") # b"xn--r8jz45g.jp"
b"xn--r8jz45g.jp".decode("idna") # "例え.jp"
IDN Email Addresses
Email also supports international addresses (EAI — Email Address Internationalization, RFC 6531). A full internationalized email address can use Unicode in both the local part and the domain:
用户@例子.广告 (Chinese)
उपयोगकर्ता@उदाहरण.भारत (Hindi)
Support for EAI in mail clients and servers is still growing.
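When only the domain part needs to be made ASCII-safe, the address can be split at the last `@` and the domain converted on its own. This is a minimal sketch with a hypothetical helper name, `domain_to_ace`, using the standard-library `idna` codec (IDNA2003). The local part is left untouched, so delivering the result still assumes a mail path that supports internationalized local parts.

```python
def domain_to_ace(address: str) -> str:
    """Return the address with its domain part in ASCII-compatible form."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Only the domain is converted; the Unicode local part is kept as-is.
    ace_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ace_domain}"


converted = domain_to_ace("用户@例子.广告")
print(converted)

# Decoding the domain part again recovers the Unicode form.
print(converted.rpartition("@")[2].encode("ascii").decode("idna"))  # 例子.广告
```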
Quick Facts
| Property | Value |
|---|---|
| Full name | Internationalized Domain Names in Applications (IDNA) |
| DNS representation | Punycode-encoded ASCII (xn--...) |
| User-visible form | Unicode characters in native script |
| Current standard | IDNA2008 (RFC 5891/5892) |
| Max label length (encoded) | 63 ASCII characters |
| Homograph attacks | Mitigated by browser mixed-script detection |
| Python library | idna (pip install idna) for IDNA2008 |
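The 63-character limit in the table applies to the encoded form, so a label that looks short in Unicode can still be too long once it becomes `xn--...`. A minimal sketch of that check, assuming the standard-library `punycode` codec and hypothetical helper names (`encoded_labels`, `fits_dns_limit`), with no other IDNA validation:

```python
MAX_LABEL_LEN = 63  # limit on each encoded label, from the table above


def encoded_labels(domain: str) -> list[str]:
    """Return each label in the ASCII form that would be sent to DNS."""
    result = []
    for label in domain.split("."):
        if label.isascii():
            result.append(label)
        else:
            result.append("xn--" + label.encode("punycode").decode("ascii"))
    return result


def fits_dns_limit(domain: str) -> bool:
    """True when every encoded label is between 1 and 63 characters long."""
    return all(1 <= len(label) <= MAX_LABEL_LEN
               for label in encoded_labels(domain))


print(fits_dns_limit("例え.jp"))           # True
print(fits_dns_limit("a" * 64 + ".com"))  # False
```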