数値文字参照
Unicodeコードポイント番号を使うHTMLエンティティ:十進数(© → ©)または16進数(© → ©)。名前付き参照と異なり、あらゆるUnicode文字に使えます。
What Are Numeric Character References?
Numeric character references (NCRs) are HTML escape sequences that represent any Unicode character by its code point number. They take two forms:
- Decimal:
&#N;where N is a base-10 integer — e.g.,©for © - Hexadecimal:
&#xH;where H is a base-16 integer — e.g.,©for ©
Both forms refer to the Unicode scalar value of the character. Since Unicode covers over 1.1 million code points, NCRs can represent virtually any character ever assigned — from basic Latin letters to rare CJK ideographs and emoji — using only ASCII characters in the source.
Decimal vs. Hexadecimal
Decimal NCRs (&#N;) are straightforward for readers who know the decimal code points of common characters (65 = 'A', 169 = '©'). Hexadecimal NCRs (&#xH;) align with how Unicode code points are conventionally written — U+00A9 maps directly to ©. When working with Unicode documentation or character tables that list code points in hex, the hex form is easier to use without mental conversion.
<!-- These are identical -->
A = A = A
© = © = ©
€ = € = €
😀 = 😀 = 😀
Valid Range
Valid code points for NCRs are: 1–55295 (U+0001–U+D7FF) and 57344–1114111 (U+E000–U+10FFFF). The surrogate range U+D800–U+DFFF is invalid and must not be encoded. U+0000 (NULL) is also excluded. Browsers may render other disallowed code points (such as U+0001–U+001F control characters) as the replacement character U+FFFD.
Supplementary Characters
NCRs fully support Unicode supplementary characters (code points above U+FFFF). In UTF-16 these require surrogate pairs, but in HTML you write a single NCR:
<!-- U+1F4A9 PILE OF POO — supplementary character -->
💩 <!-- decimal -->
💩 <!-- hex -->
This is one advantage of NCRs over raw UTF-16 encoding in old environments.
Practical Use
<!-- Escaping in content -->
<p>The formula is a < b < c</p>
<!-- < and < both render as < -->
<!-- Characters outside keyboard reach -->
<p>The currency symbol is ₹ (Indian Rupee)</p>
<!-- In HTML attributes -->
<input placeholder="Enter ❤ here">
<!-- Emoji -->
<title>Unicode Guide 📚</title>
# Python: convert character to NCR
char = "©"
f"&#{ord(char)};" # "©"
f"&#x{ord(char):X};" # "©"
# Python: decode NCR
import html
html.unescape("©") # "©"
html.unescape("©") # "©"
NCRs vs. Named References vs. Direct Characters
| Approach | Example | Readability | Coverage |
|---|---|---|---|
| Named reference | © |
High (for known names) | 2,231 characters |
| Decimal NCR | © |
Medium | All Unicode |
| Hex NCR | © |
Medium (for Unicode users) | All Unicode |
| Direct UTF-8 | © |
Highest | All Unicode |
In modern UTF-8 documents, direct characters are preferred. NCRs remain valuable in legacy ASCII environments and when generating HTML programmatically.
Quick Facts
| Property | Value |
|---|---|
| Decimal syntax | &#N; (N is base-10 code point) |
| Hex syntax | &#xH; or &#XH; (H is base-16 code point) |
| Valid range | U+0001–U+D7FF and U+E000–U+10FFFF |
| Covers all Unicode | Yes — any assigned code point |
| Surrogates allowed | No — invalid in HTML |
| Case of hex digits | Case-insensitive: © = © |
| Trailing semicolon | Required; optional only in certain legacy contexts |
関連用語
Web & HTML のその他の用語
レスポンスの文字エンコーディングを宣言するHTTPヘッダーパラメータ(Content-Type: text/html; charset=utf-8)。ドキュメント内のエンコーディング宣言より優先されます。
::beforeおよび::after疑似要素でUnicodeエスケープを使って生成コンテンツを挿入するCSSプロパティ:content: '\2713'は✓を挿入します。
CSS properties (direction, writing-mode, unicode-bidi) controlling text layout direction. Works with Unicode …
HTMLで文字をテキスト表現する方式。3つの形式:名前(&)・十進数(&)・16進数(&)。HTMLの構文と衝突する文字に必須です。
ECMAScript Internationalization API providing locale-aware string comparison (Collator), number formatting (NumberFormat), date …
Unicode ドメイン名をxn--プレフィックス付きのASCII文字列に変換するASCII互換エンコーディング。münchen.de → xn--mnchen-3ya.de。
CSS supports Unicode via escape sequences (\2713 for ✓), the content property …
XMLバージョンの数値文字参照:✓または✓。XMLには名前付きエンティティが5個(& < > " ')しかありませんが、HTML5は2,231個あります。
デフォルトの絵文字表示の代わりに、通常は異体字セレクター15(U+FE0E)を使って文字をモノクロのテキストグリフでレンダリングすること。
URLの非ASCII文字と予約文字を各バイトを%XXで置き換えてエンコードします。まずUTF-8に変換し、各バイトをパーセントエンコードします:é → %C3%A9。