Numeric Character Reference
An HTML entity that uses a Unicode code point number: decimal (&#169; → ©) or hexadecimal (&#xA9; → ©). Works with any Unicode character, unlike named references.
What Are Numeric Character References?
Numeric character references (NCRs) are HTML escape sequences that represent any Unicode character by its code point number. They take two forms:
- Decimal: &#N; where N is a base-10 integer, e.g., &#169; for ©
- Hexadecimal: &#xH; where H is a base-16 integer, e.g., &#xA9; for ©
Both forms refer to the Unicode scalar value of the character. Since Unicode covers over 1.1 million code points, NCRs can represent virtually any character ever assigned — from basic Latin letters to rare CJK ideographs and emoji — using only ASCII characters in the source.
Decimal vs. Hexadecimal
Decimal NCRs (&#N;) are straightforward for readers who know the decimal code points of common characters (65 = 'A', 169 = '©'). Hexadecimal NCRs (&#xH;) align with how Unicode code points are conventionally written: U+00A9 maps directly to &#xA9;. When working with Unicode documentation or character tables that list code points in hex, the hex form is easier to use without mental conversion.
<!-- These are identical -->
&#65; = &#x41; = A
&#169; = &#xA9; = ©
&#8364; = &#x20AC; = €
&#128512; = &#x1F600; = 😀
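The mapping between the U+XXXX notation and the two NCR forms can be sketched in Python. The helper name `ncr_forms` is hypothetical and chosen only for this illustration; it assumes the input is a well-formed "U+" string.

```python
def ncr_forms(u_notation: str) -> tuple[str, str]:
    """Return (decimal NCR, hex NCR) for a code point written as 'U+XXXX'."""
    if not u_notation.upper().startswith("U+"):
        raise ValueError(f"expected a 'U+' prefix, got {u_notation!r}")
    code_point = int(u_notation[2:], 16)  # the digits after 'U+' are hexadecimal
    return f"&#{code_point};", f"&#x{code_point:X};"


print(ncr_forms("U+00A9"))   # ('&#169;', '&#xA9;')
print(ncr_forms("U+20AC"))   # ('&#8364;', '&#x20AC;')
print(ncr_forms("U+1F600"))  # ('&#128512;', '&#x1F600;')
```

The hex form reuses the digits from the U+ notation (minus leading zeros), while the decimal form needs a base conversion.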
Valid Range
Valid code points for NCRs are 1–55295 (U+0001–U+D7FF) and 57344–1114111 (U+E000–U+10FFFF). The surrogate range U+D800–U+DFFF is invalid and must not be encoded, and U+0000 (NULL) is also excluded. An HTML parser replaces references to U+0000, to surrogates, and to values above U+10FFFF with the replacement character U+FFFD. References to other disallowed code points, such as most control characters in U+0001–U+001F, are parse errors even where a browser still accepts them.
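As a rough sketch, the two ranges above translate into a simple check. The function name `is_valid_ncr_code_point` is hypothetical, and the check covers only the ranges stated here; it does not flag control characters or noncharacters.

```python
import html


def is_valid_ncr_code_point(cp: int) -> bool:
    """True if cp lies in U+0001-U+D7FF or U+E000-U+10FFFF."""
    return 0x0001 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


print(is_valid_ncr_code_point(0xA9))      # True
print(is_valid_ncr_code_point(0xD800))    # False (surrogate)
print(is_valid_ncr_code_point(0))         # False (NULL)
print(is_valid_ncr_code_point(0x110000))  # False (beyond the Unicode range)

# Python's html.unescape substitutes U+FFFD for a surrogate reference.
print(html.unescape("&#xD800;") == "\ufffd")  # True
```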
Supplementary Characters
NCRs fully support Unicode supplementary characters (code points above U+FFFF). In UTF-16 these require surrogate pairs, but in HTML you write a single NCR:
<!-- U+1F4A9 PILE OF POO, a supplementary character -->
&#128169; <!-- decimal -->
&#x1F4A9; <!-- hex -->
This is one advantage of NCRs over raw UTF-16 encoding in old environments.
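To make the contrast concrete, the following sketch computes the UTF-16 surrogate pair for U+1F4A9 and prints it next to the single NCR. The helper name `utf16_surrogates` is an assumption made for this example; the arithmetic is the standard UTF-16 encoding rule.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


cp = 0x1F4A9
high, low = utf16_surrogates(cp)
print(f"UTF-16: {high:04X} {low:04X}")  # UTF-16: D83D DCA9
print(f"NCR:    &#x{cp:X};")            # NCR:    &#x1F4A9;
```

The NCR names the code point itself, so the author never has to write the two surrogate halves.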
Practical Use
<!-- Escaping in content -->
<p>The formula is a &#60; b &#x3C; c</p>
<!-- &#60; and &#x3C; both render as < -->
<!-- Characters outside keyboard reach -->
<p>The currency symbol is &#8377; (Indian Rupee)</p>
<!-- In HTML attributes -->
<input placeholder="Enter &#10084; here">
<!-- Emoji -->
<title>Unicode Guide &#128218;</title>
# Python: convert a character to an NCR
char = "©"
f"&#{ord(char)};" # "&#169;"
f"&#x{ord(char):X};" # "&#xA9;"
# Python: decode an NCR (html.unescape also decodes named references)
import html
html.unescape("&#169;") # "©"
html.unescape("&#xA9;") # "©"
NCRs vs. Named References vs. Direct Characters
| Approach | Example | Readability | Coverage |
|---|---|---|---|
| Named reference | &copy; | High (for known names) | 2,231 names |
| Decimal NCR | &#169; | Medium | All Unicode |
| Hex NCR | &#xA9; | Medium (for Unicode users) | All Unicode |
| Direct UTF-8 | © | Highest | All Unicode |
In modern UTF-8 documents, direct characters are preferred. NCRs remain valuable in legacy ASCII environments and when generating HTML programmatically.
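For the programmatic case, one possible approach in Python is the built-in `xmlcharrefreplace` error handler, which replaces every character the target encoding cannot represent with a decimal NCR. The wrapper name `to_ascii_html` is hypothetical; this is a minimal sketch that assumes the input is plain text to be escaped, not existing markup.

```python
import html


def to_ascii_html(text: str) -> str:
    """Escape markup characters, then turn non-ASCII characters into decimal NCRs."""
    escaped = html.escape(text, quote=True)  # handles &, <, >, " and '
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


result = to_ascii_html("Price: 5 € < 10 ₹ 📚")
print(result)  # Price: 5 &#8364; &lt; 10 &#8377; &#128218;
print(html.unescape(result) == "Price: 5 € < 10 ₹ 📚")  # True
```

The output contains only ASCII bytes, so it survives transports and editors that cannot be trusted with UTF-8.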
Quick Facts
| Property | Value |
|---|---|
| Decimal syntax | &#N; (N is base-10 code point) |
| Hex syntax | &#xH; or &#XH; (H is base-16 code point) |
| Valid range | U+0001–U+D7FF and U+E000–U+10FFFF |
| Covers all Unicode | Yes — any assigned code point |
| Surrogates allowed | No — invalid in HTML |
| Case of hex digits | Case-insensitive: &#xA9; = &#xa9; |
| Trailing semicolon | Required; optional only in certain legacy contexts |
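The case rules in the table can be checked quickly with Python's standard `html.unescape`. This is only an illustrative check against one decoder, not a statement about every HTML parser.

```python
import html

# Decimal form, then hex form with each combination of x/X and digit case.
variants = ["&#169;", "&#xA9;", "&#xa9;", "&#XA9;", "&#Xa9;"]
decoded = [html.unescape(v) for v in variants]

print(decoded)                                # ['©', '©', '©', '©', '©']
print(all(ch == "\u00a9" for ch in decoded))  # True
```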
Related Terms
More in Web & HTML
CSS properties (direction, writing-mode, unicode-bidi) controlling text layout direction. Works with Unicode …
Domain names containing non-ASCII Unicode characters, stored internally as Punycode …
ECMAScript Internationalization API providing locale-aware string comparison (Collator), number formatting (NumberFormat), date …
ASCII-compatible encoding for Unicode domain names, converting internationalized labels into strings …
CSS supports Unicode via escape sequences (\2713 for ✓), the content property …
Encoding of non-ASCII and reserved characters in URLs by replacing each byte with …
CSS property for inserting generated content via the ::before and ::after pseudo-elements using Unicode …
Displaying a character as a colored emoji, typically using Variation Selector 16 (U+FE0F). Some …
Displaying a character as a monochrome text glyph instead of a colored emoji, typically …
Textual representation of a character in HTML. Three forms: named (&amp;), decimal (&#38;), hex …