Punycode

An ASCII-compatible encoding that converts Unicode domain names into ASCII strings carrying the xn-- prefix, for example münchen.de → xn--mnchen-3ya.de.

What Is Punycode?

Punycode is an ASCII-compatible encoding algorithm for Unicode strings, defined in RFC 3492. It converts a Unicode string, which may contain characters from any script, into a string that uses only ASCII letters, digits, and hyphens. Punycode is the mechanism that lets Internationalized Domain Names (IDNs) work within the ASCII-only DNS infrastructure.

A Punycode-encoded IDN label is prefixed with xn-- to mark it as an ACE (ASCII Compatible Encoding) label. The full domain münchen.de becomes xn--mnchen-3ya.de in Punycode.

The Algorithm

Punycode represents a Unicode string in two parts, separated by a hyphen:

  1. Basic code points (ASCII characters): copied verbatim before the final hyphen.
  2. Non-basic code points: encoded as delta values (variable-length integers) appended after the hyphen.

münchen → mnchen (basic code points) + ü (non-basic, U+00FC)

xn--mnchen-3ya
    ^      ^
    |      3ya: encodes the value of ü and the position where it is inserted
    mnchen: the basic code points, with ü removed

Each delta is written as a generalized variable-length integer: a base-36 number (digits a-z and 0-9) whose per-digit thresholds shift with an adaptive bias. A single delta encodes both the code point to insert and the position to insert it at, which keeps the output compact.
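The steps above can be sketched in Python. This is a minimal illustration of the encoding procedure using the parameter values given in RFC 3492, not a production implementation: it performs no overflow checks or input validation, and the function names (`adapt`, `threshold`, `encode_varint`, `punycode_encode`) are chosen here for readability.

```python
# Parameter values from RFC 3492, section 5.
BASE, TMIN, TMAX = 36, 1, 26
SKEW, DAMP = 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"


def adapt(delta: int, num_points: int, first_time: bool) -> int:
    """Bias adaptation: tune the thresholds to the size of recent deltas."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def threshold(k: int, bias: int) -> int:
    """Threshold for the digit at position k: k - bias clamped to TMIN..TMAX."""
    return max(TMIN, min(TMAX, k - bias))


def encode_varint(q: int, bias: int) -> str:
    """Write one delta as a generalized variable-length integer."""
    out = []
    k = BASE
    while True:
        t = threshold(k, bias)
        if q < t:
            break
        out.append(DIGITS[t + (q - t) % (BASE - t)])
        q = (q - t) // (BASE - t)
        k += BASE
    out.append(DIGITS[q])  # final digit, always below its threshold
    return "".join(out)


def punycode_encode(text: str) -> str:
    """Encode text as bare Punycode (no xn-- prefix)."""
    basic = [c for c in text if ord(c) < INITIAL_N]
    out = "".join(basic)
    handled = num_basic = len(basic)
    if num_basic:
        out += "-"  # delimiter between basic code points and deltas
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while handled < len(text):
        # Next code point to insert: the smallest one not yet handled.
        m = min(ord(c) for c in text if ord(c) >= n)
        delta += (m - n) * (handled + 1)
        n = m
        for c in text:
            if ord(c) < n:
                delta += 1
            elif ord(c) == n:
                out += encode_varint(delta, bias)
                bias = adapt(delta, handled + 1, handled == num_basic)
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return out


print(punycode_encode("münchen"))  # mnchen-3ya
```

For real use, Python's built-in "punycode" codec does the same job; the sketch only makes the delta and bias steps visible.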

Examples

# Python standard library
"münchen".encode("punycode")         # b"mnchen-3ya"
"例え.jp".split(".")[0].encode("punycode")  # b"r8jz45g"

# With xn-- prefix (the built-in "idna" codec implements IDNA 2003, RFC 3490)
"münchen.de".encode("idna")          # b"xn--mnchen-3ya.de"
"例え.jp".encode("idna")             # b"xn--r8jz45g.jp"

# Decoding
b"mnchen-3ya".decode("punycode")     # "münchen"
b"xn--mnchen-3ya.de".decode("idna")  # "münchen.de"

Common Punycode Examples

Unicode Domain      Punycode
münchen.de          xn--mnchen-3ya.de
例え.jp             xn--r8jz45g.jp
中文.com            xn--fiq228c.com
한글.한국           xn--bj0bj06e.xn--3e0b707e
مثال.إختبار         xn--mgbh0fb.xn--kgbechtv
пример.испытание    xn--e1afmkfd.xn--80akhbyknj4f
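Rows like these can be reproduced label by label with Python's built-in "punycode" codec. The helper below is a sketch: `to_ace` is a name introduced here, and it skips the normalization and validation that a full IDNA implementation performs before encoding.

```python
def to_ace(domain: str) -> str:
    """Encode each non-ASCII label with Punycode and add the xn-- prefix.

    Illustration only: real IDNA processing also normalizes and validates
    each label first, which this sketch does not do.
    """
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)  # ASCII labels such as "de" pass through
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


for name in ["münchen.de", "例え.jp", "中文.com", "한글.한국"]:
    print(name, "->", to_ace(name))
```

Each label is encoded independently, which is why a domain with a non-ASCII top-level label carries two xn-- prefixes.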

Punycode in Browsers

Modern browsers display the Unicode form of an IDN label only when it passes their homograph safety checks, for example when all of its characters come from a single script or from an allowed combination of scripts. The exact rules differ between browsers. Labels that mix scripts in suspicious ways or imitate well-known names are shown in Punycode form instead, which makes the spoof visible:

User types:    https://münchen.de/
Browser shows: https://münchen.de/     (safe: all Latin)
DNS query:     xn--mnchen-3ya.de

User types:    https://pаypal.com/     (Cyrillic а)
Browser shows: https://xn--pypal-4ve.com/  (suspicious: mixed script)
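A rough version of the mixed-script check can be sketched with the standard library. This is a heuristic under a stated assumption: it treats the first word of each character's Unicode name as its script, which is not the official Unicode Script property, and real browsers apply considerably richer rules. The function names are introduced here for illustration.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the set of scripts used by the letters in a label.

    Assumption: the first word of the Unicode character name ("LATIN",
    "CYRILLIC", ...) is used as a stand-in for the script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts


def looks_mixed_script(label: str) -> bool:
    """True when the letters of a label appear to span several scripts."""
    return len(scripts_in(label)) > 1


print(looks_mixed_script("münchen"))       # False: Latin only
print(looks_mixed_script("p\u0430ypal"))   # True: Latin plus Cyrillic U+0430
```

A check like this would not catch a label written entirely in one lookalike script, which is one reason browsers combine several signals.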

Punycode for Non-DNS Uses

Although designed for DNS, Punycode can encode any Unicode string. It also appears in email systems (IDNA applied to the domain part of an address) and in IRI (Internationalized Resource Identifier) processing. It is used only for host names, however: URL paths and query strings carry non-ASCII text as percent-encoded UTF-8.
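The split between the two encodings can be illustrated with the standard library. In this sketch, `to_ascii_url` is a hypothetical helper, the host goes through Python's built-in "idna" codec, and the path goes through `urllib.parse.quote`; the example host and path are made up.

```python
from urllib.parse import quote, urlunsplit


def to_ascii_url(host: str, path: str) -> str:
    """Build an ASCII-only https URL from a Unicode host and path."""
    ascii_host = host.encode("idna").decode("ascii")  # Punycode with xn-- prefix
    ascii_path = quote(path)                          # percent-encoded UTF-8
    return urlunsplit(("https", ascii_host, ascii_path, "", ""))


print(to_ascii_url("münchen.de", "/straße"))
# https://xn--mnchen-3ya.de/stra%C3%9Fe
```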

Limitations

  • Not encryption or compression: Punycode is purely a reversible encoding for ASCII transport.
  • Label length: each DNS label is limited to 63 octets, and for an IDN label that limit applies to the encoded form, including the xn-- prefix.
  • Readability: xn--fiq228c.com is meaningless to humans, which is why browsers display the Unicode form of an IDN whenever it is safe to do so.
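The label length limit is easy to check before a name is used. The sketch below assumes the limit applies to the encoded label with its xn-- prefix; `encode_label` and `MAX_LABEL_LENGTH` are names introduced here, and normalization is again left out.

```python
MAX_LABEL_LENGTH = 63  # DNS limit for a single label, in octets


def encode_label(label: str) -> str:
    """Return the ASCII form of one label, or raise ValueError if too long."""
    if label.isascii():
        ace = label
    else:
        ace = "xn--" + label.encode("punycode").decode("ascii")
    if len(ace) > MAX_LABEL_LENGTH:
        raise ValueError(
            f"label is {len(ace)} characters after encoding; "
            f"the limit is {MAX_LABEL_LENGTH}"
        )
    return ace


print(encode_label("münchen"))  # xn--mnchen-3ya
```

Because each non-ASCII character needs at least one encoded digit in addition to the four-character prefix, the limit is measured on the encoded form rather than on the number of visible characters.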

Quick Facts

Property               Value
RFC                    RFC 3492
Purpose                ASCII-compatible encoding for Unicode strings in DNS
Prefix for IDN labels  xn--
Character set          a–z, 0–9, -
Encoding style         Basic ASCII copied; non-basic encoded as delta integers
Python codec           "punycode" (bare) or "idna" (with xn-- prefix)
Max encoded label      63 ASCII characters, including the xn-- prefix
