Numeric Character Reference
HTML entity that uses the Unicode code point number: decimal (&#169; → ©) or hexadecimal (&#xA9; → ©). Works for any Unicode character, unlike named references.
What Are Numeric Character References?
Numeric character references (NCRs) are HTML escape sequences that represent any Unicode character by its code point number. They take two forms:
- Decimal: &#N; where N is a base-10 integer, e.g., &#169; for ©
- Hexadecimal: &#xH; where H is a base-16 integer, e.g., &#xA9; for ©
Both forms refer to the Unicode scalar value of the character. Since Unicode covers over 1.1 million code points, NCRs can represent virtually any character ever assigned — from basic Latin letters to rare CJK ideographs and emoji — using only ASCII characters in the source.
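To make the two syntaxes concrete, here is a minimal sketch of a decoder for well-formed NCRs. The names NCR_PATTERN and decode_ncrs are hypothetical, and the sketch skips the error handling a real HTML parser performs (missing semicolons, out-of-range values). For real input, Python's html.unescape is the more robust choice.

```python
import re

# Matches &#N; (decimal) or &#xH; / &#XH; (hexadecimal).
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def decode_ncrs(text: str) -> str:
    """Replace each well-formed NCR in text with the character it names."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        # Exactly one of the two groups is set, depending on the form used.
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code_point)  # raises ValueError above U+10FFFF
    return NCR_PATTERN.sub(replace, text)

print(decode_ncrs("&#169; and &#xA9; and &#XA9;"))
```

All three references in the usage line decode to the same copyright sign, since they name the same code point.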
Decimal vs. Hexadecimal
Decimal NCRs (&#N;) are straightforward for readers who know the decimal code points of common characters (65 = 'A', 169 = '©'). Hexadecimal NCRs (&#xH;) align with how Unicode code points are conventionally written: U+00A9 maps directly to &#xA9;. When working with Unicode documentation or character tables that list code points in hex, the hex form can be used without mental conversion.
<!-- These are identical -->
&#65; = &#x41; = A
&#169; = &#xA9; = ©
&#8364; = &#x20AC; = €
&#128512; = &#x1F600; = 😀
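The link between U+XXXX notation and the two NCR forms can be sketched in a few lines of Python. The helper name ncrs_from_notation is an assumption made for this illustration, not part of any library.

```python
def ncrs_from_notation(notation: str) -> tuple:
    """Turn a 'U+00A9'-style code point into (decimal NCR, hex NCR)."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"expected U+XXXX notation, got {notation!r}")
    code_point = int(notation[2:], 16)  # the digits after U+ are hexadecimal
    return f"&#{code_point};", f"&#x{code_point:X};"

for notation in ("U+0041", "U+00A9", "U+20AC", "U+1F600"):
    decimal, hexadecimal = ncrs_from_notation(notation)
    print(notation, decimal, hexadecimal)
```

The hex form only drops the leading zeros of the U+ notation, while the decimal form needs a base conversion.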
Valid Range
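Valid code points for NCRs are 1–55295 (U+0001–U+D7FF) and 57344–1114111 (U+E000–U+10FFFF). The surrogate range U+D800–U+DFFF is invalid and must not be encoded, and U+0000 (NULL) is also excluded. HTML parsers replace references to U+0000, to surrogates, and to values above U+10FFFF with the replacement character U+FFFD. References to most control characters (such as U+0001–U+0008) and to noncharacters are also treated as parse errors, so they should be avoided even though they fall inside the numeric range.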
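As a rough sketch, the numeric range above can be checked before emitting an NCR. The function name is_encodable_code_point is hypothetical, and it tests only the two ranges given here, not the additional control-character and noncharacter restrictions. The last three lines show how Python's html.unescape treats references outside those ranges.

```python
import html

def is_encodable_code_point(code_point: int) -> bool:
    """True if code_point lies in U+0001..U+D7FF or U+E000..U+10FFFF."""
    return 0x1 <= code_point <= 0xD7FF or 0xE000 <= code_point <= 0x10FFFF

print(is_encodable_code_point(0xA9))      # True
print(is_encodable_code_point(0xD800))    # False: surrogate
print(is_encodable_code_point(0x110000))  # False: beyond U+10FFFF

# html.unescape substitutes U+FFFD for NULL, surrogates, and out-of-range values.
print(html.unescape("&#0;") == "\ufffd")
print(html.unescape("&#xD800;") == "\ufffd")
print(html.unescape("&#x110000;") == "\ufffd")
```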
Supplementary Characters
NCRs fully support Unicode supplementary characters (code points above U+FFFF). In UTF-16 these require surrogate pairs, but in HTML you write a single NCR:
<!-- U+1F4A9 PILE OF POO (supplementary character) -->
&#128169; <!-- decimal -->
&#x1F4A9; <!-- hex -->
This is one advantage of NCRs over raw UTF-16 encoding in old environments.
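To illustrate the contrast, the sketch below computes the UTF-16 surrogate pair for a supplementary code point using the standard UTF-16 arithmetic, then prints the single NCR that HTML needs instead. The helper name utf16_surrogate_pair is an assumption for this example.

```python
def utf16_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits of the 20-bit offset
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits of the offset
    return high, low

code_point = 0x1F4A9
high, low = utf16_surrogate_pair(code_point)
print(f"UTF-16 code units: {high:04X} {low:04X}")
print(f"Single NCR: &#x{code_point:X};")
```

Writing the two surrogates as separate NCRs would not work, because surrogate values are outside the valid range for NCRs.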
Practical Use
<!-- Escaping in content -->
<p>The formula is a &#60; b &#x3C; c</p>
<!-- &#60; and &#x3C; both render as < -->
<!-- Characters outside keyboard reach -->
<p>The currency symbol is &#8377; (Indian Rupee)</p>
<!-- In HTML attributes -->
<input placeholder="Enter &#10084; here">
<!-- Emoji -->
<title>Unicode Guide &#128218;</title>
# Python: convert character to NCR
char = "©"
f"&#{ord(char)};" # "&#169;"
f"&#x{ord(char):X};" # "&#xA9;"
# Python: decode NCR
import html
html.unescape("&#169;") # "©"
html.unescape("&#xA9;") # "©"
NCRs vs. Named References vs. Direct Characters
| Approach | Example | Readability | Coverage |
|---|---|---|---|
| Named reference | &copy; | High (for known names) | 2,231 defined names |
| Decimal NCR | &#169; | Medium | All Unicode |
| Hex NCR | &#xA9; | Medium (for Unicode users) | All Unicode |
| Direct UTF-8 | © | Highest | All Unicode |
In modern UTF-8 documents, direct characters are preferred. NCRs remain valuable in legacy ASCII environments and when generating HTML programmatically.
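For programmatic generation aimed at an ASCII-only target, one possible approach in Python combines html.escape with the built-in xmlcharrefreplace error handler, which replaces each unencodable character with a decimal NCR. The wrapper name to_ascii_html is hypothetical; treat this as a sketch rather than a complete sanitizer.

```python
import html

def to_ascii_html(text: str) -> str:
    """Escape markup characters, then turn non-ASCII characters into decimal NCRs."""
    escaped = html.escape(text)  # handles &, <, > and both quote characters
    # Characters that ASCII cannot encode are written as &#N; references.
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_ascii_html("Price: 5 € < 10 ₹ ©"))
```

The result contains only ASCII characters, and html.unescape restores the original text.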
Quick Facts
| Property | Value |
|---|---|
| Decimal syntax | &#N; (N is base-10 code point) |
| Hex syntax | &#xH; or &#XH; (H is base-16 code point) |
| Valid range | U+0001–U+D7FF and U+E000–U+10FFFF |
| Covers all Unicode | Yes — any assigned code point |
| Surrogates allowed | No — invalid in HTML |
| Case of hex digits | Case-insensitive: &#xA9; = &#xa9; |
| Trailing semicolon | Required; optional only in certain legacy contexts |