Numeric Character Reference
An HTML entity written with a Unicode code point number: decimal (&#169; → ©) or hexadecimal (&#xA9; → ©). Unlike named references, it works for every Unicode character.
What Are Numeric Character References?
Numeric character references (NCRs) are HTML escape sequences that represent any Unicode character by its code point number. They take two forms:
- Decimal: &#N; where N is a base-10 integer, e.g., &#169; for ©
- Hexadecimal: &#xH; where H is a base-16 integer, e.g., &#xA9; for ©
Both forms refer to the Unicode scalar value of the character. Since Unicode covers over 1.1 million code points, NCRs can represent virtually any character ever assigned — from basic Latin letters to rare CJK ideographs and emoji — using only ASCII characters in the source.
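The ASCII-only property can be seen with Python's built-in `xmlcharrefreplace` error handler, which emits decimal NCRs. This is a minimal sketch; the sample string and variable names are illustrative.

```python
import html

text = "Café costs €5 😀"

# The "xmlcharrefreplace" error handler swaps every character that
# ASCII cannot encode for its decimal NCR.
ascii_html = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(ascii_html)                          # Caf&#233; costs &#8364;5 &#128512;
print(ascii_html.isascii())                # True
print(html.unescape(ascii_html) == text)   # True
```

The handler only touches non-ASCII characters, so markup-significant characters such as < and & still need separate escaping.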
Decimal vs. Hexadecimal
Decimal NCRs (&#N;) are straightforward for readers who know the decimal code points of common characters (65 = 'A', 169 = '©'). Hexadecimal NCRs (&#xH;) align with how Unicode code points are conventionally written: U+00A9 maps directly to &#xA9;. When working with Unicode documentation or character tables that list code points in hex, the hex form is easier to use without mental conversion.
<!-- These are identical -->
&#65; = &#x41; = A
&#169; = &#xA9; = ©
&#8364; = &#x20AC; = €
&#128512; = &#x1F600; = 😀
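When a character table lists code points in U+XXXX notation, the conversion to either NCR form is mechanical. The sketch below assumes a hypothetical helper name, `ncrs_from_u_notation`, and accepts only the plain `U+` prefix.

```python
def ncrs_from_u_notation(label: str) -> tuple[str, str]:
    """Return (decimal NCR, hex NCR) for a label such as 'U+00A9'."""
    if not label.upper().startswith("U+"):
        raise ValueError(f"expected U+XXXX notation, got {label!r}")
    code_point = int(label[2:], 16)  # the digits after "U+" are hexadecimal
    return f"&#{code_point};", f"&#x{code_point:X};"


print(ncrs_from_u_notation("U+00A9"))   # ('&#169;', '&#xA9;')
print(ncrs_from_u_notation("U+1F600"))  # ('&#128512;', '&#x1F600;')
```

The hex form drops the leading zeros of the U+ label, which is why U+00A9 becomes &#xA9;.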
Valid Range
Valid code points for NCRs are 1–55295 (U+0001–U+D7FF) and 57344–1114111 (U+E000–U+10FFFF). The surrogate range U+D800–U+DFFF is invalid and must not be encoded, and U+0000 (NULL) is also excluded. HTML parsers replace a reference to U+0000, to a surrogate, or to a value above U+10FFFF with the replacement character U+FFFD. References to other disallowed code points (such as most U+0001–U+001F control characters) are parse errors and should be avoided as well.
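The two ranges translate directly into a range check. This is a sketch with a hypothetical function name; it tests only the ranges given above and does not separately flag control characters or noncharacters.

```python
def is_valid_ncr_code_point(cp: int) -> bool:
    """True if cp lies in U+0001-U+D7FF or U+E000-U+10FFFF."""
    return 0x0001 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


for cp in (0x0000, 0x00A9, 0xD800, 0xE000, 0x10FFFF, 0x110000):
    print(f"U+{cp:04X}: {is_valid_ncr_code_point(cp)}")
```

A generator could run a check like this before emitting an NCR, so that surrogates and out-of-range values never reach the output.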
Supplementary Characters
NCRs fully support Unicode supplementary characters (code points above U+FFFF). In UTF-16 these require surrogate pairs, but in HTML you write a single NCR:
<!-- U+1F4A9 PILE OF POO — supplementary character -->
&#128169; <!-- decimal -->
&#x1F4A9; <!-- hex -->
This is one advantage of NCRs in older environments that handle text as UTF-16: the author writes the code point itself and never has to work out the surrogate pair.
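The contrast with UTF-16 can be made concrete by printing the code units for U+1F4A9 next to its NCR. This is a small sketch; the variable names are illustrative.

```python
poo = "\U0001F4A9"

# UTF-16 needs two 16-bit code units (a surrogate pair) for this character.
utf16 = poo.encode("utf-16-be")
code_units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
print(code_units)             # ['D83D', 'DCA9']

# The NCR uses the code point itself, not the surrogates.
hex_ncr = f"&#x{ord(poo):X};"
print(hex_ncr)                # &#x1F4A9;
```

Writing the two surrogates as separate NCRs (&#xD83D;&#xDCA9;) would be invalid, since surrogate code points are outside the allowed range.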
Practical Use
<!-- Escaping in content -->
<p>The formula is a &#60; b &#x3C; c</p>
<!-- &#60; and &#x3C; both render as < -->
<!-- Characters outside keyboard reach -->
<p>The currency symbol is &#8377; (Indian Rupee)</p>
<!-- In HTML attributes -->
<input placeholder="Enter &#10084; here">
<!-- Emoji -->
<title>Unicode Guide &#x1F4DA;</title>
# Python: convert character to NCR
char = "©"
f"&#{ord(char)};" # "&#169;"
f"&#x{ord(char):X};" # "&#xA9;"
# Python: decode NCR
import html
html.unescape("&#169;") # "©"
html.unescape("&#xA9;") # "©"
NCRs vs. Named References vs. Direct Characters
| Approach | Example | Readability | Coverage |
|---|---|---|---|
| Named reference | &copy; | High (for known names) | 2,231 names (fewer distinct characters) |
| Decimal NCR | &#169; | Medium | All Unicode |
| Hex NCR | &#xA9; | Medium (for Unicode users) | All Unicode |
| Direct UTF-8 | © | Highest | All Unicode |
In modern UTF-8 documents, direct characters are preferred. NCRs remain valuable in legacy ASCII environments and when generating HTML programmatically.
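The coverage difference matters when HTML is generated programmatically: a named reference exists for only some characters, while an NCR is always available. The sketch below prefers a name and falls back to a hex NCR. The helper name `reference_for` is hypothetical, and `html.entities.codepoint2name` covers only the HTML 4.01 names, not the full HTML5 list.

```python
from html.entities import codepoint2name


def reference_for(char: str) -> str:
    """Return a named reference when one is known, else a hex NCR."""
    cp = ord(char)
    name = codepoint2name.get(cp)
    return f"&{name};" if name else f"&#x{cp:X};"


print(reference_for("©"))  # &copy;
print(reference_for("€"))  # &euro;
print(reference_for("₹"))  # &#x20B9;
```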
Quick Facts
| Property | Value |
|---|---|
| Decimal syntax | &#N; (N is base-10 code point) |
| Hex syntax | &#xH; or &#XH; (H is base-16 code point) |
| Valid range | U+0001–U+D7FF and U+E000–U+10FFFF |
| Covers all Unicode | Yes — any assigned code point |
| Surrogates allowed | No — invalid in HTML |
| Case of hex digits | Case-insensitive: &#xA9; = &#xa9; |
| Trailing semicolon | Required; optional only in certain legacy contexts |
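The syntax rules in the table fit into one regular expression. The decoder below is a simplified sketch, not a full HTML parser: it requires the trailing semicolon, accepts either case for the `x` and the hex digits, and (as an assumption of this sketch) maps U+0000, surrogates, and out-of-range values to U+FFFD. The names `NCR_PATTERN` and `decode_ncrs` are illustrative.

```python
import re

# &#N; (decimal) or &#xH; / &#XH; (hex), semicolon required.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_ncrs(text: str) -> str:
    """Replace each NCR in text with the character it refers to."""
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Invalid code points become the replacement character.
        if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            return "\uFFFD"
        return chr(cp)

    return NCR_PATTERN.sub(replace, text)


print(decode_ncrs("&#169; &#xA9; &#Xa9;"))  # © © ©
print(decode_ncrs("&#169"))                 # &#169 (no semicolon, left as-is)
```

For real documents, `html.unescape` from the standard library is the safer choice, since it also handles named references and the legacy cases this sketch ignores.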