Numeric Character Reference
An HTML character reference that uses the Unicode code point number: decimal (&#169; → ©) or hexadecimal (&#xA9; → ©). Works for any Unicode character, unlike named references.
What Are Numeric Character References?
Numeric character references (NCRs) are HTML escape sequences that represent any Unicode character by its code point number. They take two forms:
- Decimal: &#N; where N is a base-10 integer, e.g., &#169; for ©
- Hexadecimal: &#xH; where H is a base-16 integer, e.g., &#xA9; for ©
Both forms refer to the Unicode scalar value of the character. Since Unicode covers over 1.1 million code points, NCRs can represent virtually any character ever assigned — from basic Latin letters to rare CJK ideographs and emoji — using only ASCII characters in the source.
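As a minimal sketch of the ASCII-only point above, Python's built-in `xmlcharrefreplace` error handler writes a decimal NCR for every character the target encoding cannot represent. The helper name `to_ascii_with_ncrs` and the sample string are illustrative assumptions, not part of the original text.

```python
def to_ascii_with_ncrs(text: str) -> str:
    """Return text with every non-ASCII character written as a decimal NCR."""
    # The "xmlcharrefreplace" error handler substitutes &#N; for each
    # character that ASCII cannot encode; ASCII characters pass through.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


sample = "café © 😀"
ascii_only = to_ascii_with_ncrs(sample)
print(ascii_only)  # caf&#233; &#169; &#128512;
```

Note that this only handles non-ASCII characters; markup-significant ASCII characters such as < and & are left untouched and still need separate escaping.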
Decimal vs. Hexadecimal
Decimal NCRs (&#N;) are straightforward for readers who know the decimal code points of common characters (65 = 'A', 169 = '©'). Hexadecimal NCRs (&#xH;) align with how Unicode code points are conventionally written: U+00A9 maps directly to &#xA9;. When working with Unicode documentation or character tables that list code points in hex, the hex form is easier to use without mental conversion.
<!-- These are identical -->
&#65; = &#x41; = A
&#169; = &#xA9; = ©
&#8364; = &#x20AC; = €
&#128512; = &#x1F600; = 😀
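To make the equivalence concrete, the sketch below reads a single NCR and returns the code point number it names, so the decimal and hex spellings can be compared directly. The pattern `NCR_PATTERN` and the function `ncr_to_code_point` are hypothetical names introduced for illustration, and the pattern deliberately accepts only the strict form with a trailing semicolon.

```python
import re

# Matches exactly one numeric character reference: &#N; or &#xH; / &#XH;
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def ncr_to_code_point(ref: str) -> int:
    """Return the code point number named by a single NCR string."""
    match = NCR_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a numeric character reference: {ref!r}")
    hex_digits, dec_digits = match.groups()
    # Hex digits are read in base 16, decimal digits in base 10.
    return int(hex_digits, 16) if hex_digits else int(dec_digits)


for decimal_ref, hex_ref in [("&#65;", "&#x41;"), ("&#8364;", "&#x20AC;")]:
    # Both spellings resolve to the same number, hence the same character.
    assert ncr_to_code_point(decimal_ref) == ncr_to_code_point(hex_ref)
    print(decimal_ref, hex_ref, chr(ncr_to_code_point(hex_ref)))
```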
Valid Range
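Valid code points for NCRs are: 1–55295 (U+0001–U+D7FF) and 57344–1114111 (U+E000–U+10FFFF). The surrogate range U+D800–U+DFFF is invalid and must not be encoded. U+0000 (NULL) is also excluded. HTML parsers substitute the replacement character U+FFFD for references to NULL, to surrogates, and to values above U+10FFFF. References to most control characters (such as U+0001–U+0008) are also treated as parse errors and should be avoided, even though they fall inside the numeric range.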
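The numeric ranges listed above translate into a short range check. This is a minimal sketch that tests only those two ranges; it does not attempt to flag control characters or noncharacters, and the names `is_encodable_code_point` and `to_ncr` are assumptions made for the example.

```python
def is_encodable_code_point(cp: int) -> bool:
    """True if cp lies in U+0001-U+D7FF or U+E000-U+10FFFF."""
    # Excludes U+0000, the surrogate block U+D800-U+DFFF,
    # and anything beyond the end of Unicode at U+10FFFF.
    return 0x0001 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def to_ncr(cp: int) -> str:
    """Format cp as a hex NCR, rejecting code points outside the ranges."""
    if not is_encodable_code_point(cp):
        raise ValueError(f"U+{cp:04X} cannot be written as an NCR")
    return f"&#x{cp:X};"


print(to_ncr(0x20AC))                    # &#x20AC;
print(is_encodable_code_point(0xD800))   # False (surrogate)
print(is_encodable_code_point(0x110000)) # False (beyond U+10FFFF)
```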
Supplementary Characters
NCRs fully support Unicode supplementary characters (code points above U+FFFF). In UTF-16 these require surrogate pairs, but in HTML you write a single NCR:
<!-- U+1F4A9 PILE OF POO, a supplementary character -->
&#128169; <!-- decimal -->
&#x1F4A9; <!-- hex -->
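This is one advantage of NCRs over hand-encoding UTF-16 in older environments: the reference names the code point itself, so the author never has to compute the surrogate pair.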
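For comparison, the following sketch computes the UTF-16 surrogate pair that a supplementary code point would otherwise require, next to the single NCR that replaces it. The function name `utf16_surrogate_pair` is a hypothetical helper; the arithmetic is the standard UTF-16 encoding rule.

```python
def utf16_surrogate_pair(cp: int) -> tuple[int, int]:
    """Return the (high, low) UTF-16 code units for a supplementary code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use a surrogate pair")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


cp = 0x1F4A9
high, low = utf16_surrogate_pair(cp)
print(f"UTF-16 code units: {high:04X} {low:04X}")  # D83D DCA9
print(f"Single NCR:        &#x{cp:X};")            # &#x1F4A9;
```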
Practical Use
<!-- Escaping in content -->
<p>The formula is a &#60; b &#x3C; c</p>
<!-- &#60; and &#x3C; both render as < -->
<!-- Characters outside keyboard reach -->
<p>The currency symbol is &#8377; (Indian Rupee)</p>
<!-- In HTML attributes -->
<input placeholder="Enter &#10084; here">
<!-- Emoji -->
<title>Unicode Guide &#128218;</title>
# Python: convert character to NCR
char = "©"
f"&#{ord(char)};" # "&#169;"
f"&#x{ord(char):X};" # "&#xA9;"
# Python: decode NCR
import html
html.unescape("&#169;") # "©"
html.unescape("&#xA9;") # "©"
NCRs vs. Named References vs. Direct Characters
| Approach | Example | Readability | Coverage |
|---|---|---|---|
| Named reference | &copy; | High (for known names) | 2,231 defined names |
| Decimal NCR | &#169; | Medium | All Unicode |
| Hex NCR | &#xA9; | Medium (for Unicode users) | All Unicode |
| Direct UTF-8 | © | Highest | All Unicode |
In modern UTF-8 documents, direct characters are preferred. NCRs remain valuable in legacy ASCII environments and when generating HTML programmatically.
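When HTML is generated programmatically for an ASCII-only target, one possible approach is to escape the markup-significant characters first and then write every remaining non-ASCII character as a hex NCR. The sketch below assumes that workflow; `escape_for_ascii_html` is an illustrative name, while `html.escape` and `html.unescape` are Python standard library functions.

```python
import html


def escape_for_ascii_html(text: str) -> str:
    """Escape text for insertion into an ASCII-only HTML document."""
    # html.escape handles the markup-significant characters (&, <, >, quotes).
    escaped = html.escape(text, quote=True)
    # Every remaining non-ASCII character becomes a hex NCR.
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in escaped
    )


original = "Price < 100 ₹ & rising"
fragment = escape_for_ascii_html(original)
print(fragment)  # Price &lt; 100 &#x20B9; &amp; rising

# Decoding the fragment recovers the original text.
assert html.unescape(fragment) == original
```

Escaping before converting matters here: doing it in the other order would turn the & of each generated NCR into &amp; and break the references.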
Quick Facts
| Property | Value |
|---|---|
| Decimal syntax | &#N; (N is base-10 code point) |
| Hex syntax | &#xH; or &#XH; (H is base-16 code point) |
| Valid range | U+0001–U+D7FF and U+E000–U+10FFFF |
| Covers all Unicode | Yes — any assigned code point |
| Surrogates allowed | No — invalid in HTML |
| Case of hex digits | Case-insensitive: &#xA9; = &#xa9; |
| Trailing semicolon | Required; parsers usually recover from a missing one, but it is a parse error |