Block
A named, contiguous range of code points (e.g. Basic Latin = U+0000–U+007F). Unicode 16.0 defines 338 blocks, and blocks never overlap, so a code point belongs to at most one block.
What Is a Unicode Block?
A Unicode block is a named, contiguous range of code points assigned to a related group of characters. As of Unicode 16.0, the Unicode Standard defines 338 blocks within the code point space (U+0000 to U+10FFFF). Every block starts at a multiple of 16 and contains a multiple of 16 code points. Blocks never overlap, and code points that fall outside every block have the Block value No_Block. Block boundaries are stable in practice and rarely change between Unicode versions, though new blocks are regularly allocated in previously unassigned ranges.
Each block has a distinctive name that broadly describes its contents: Basic Latin (U+0000–U+007F), Greek and Coptic (U+0370–U+03FF), or CJK Unified Ideographs (U+4E00–U+9FFF). The name is informational rather than prescriptive, so a block may contain characters from multiple scripts, or even unassigned code points.
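The size and alignment rules above can be checked with a few lines of arithmetic. The following is a minimal sketch over a hand-copied excerpt of block ranges; `SAMPLE_BLOCKS`, `block_size`, and `is_aligned` are illustrative names, not part of any library.

```python
# Hand-copied excerpt of block ranges (start, end, name); not the full list.
SAMPLE_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xFE00, 0xFE0F, "Variation Selectors"),
]


def block_size(start: int, end: int) -> int:
    """Number of code points in an inclusive range."""
    return end - start + 1


def is_aligned(start: int, end: int) -> bool:
    """True if the range starts at a code point ending in 0 and ends at one ending in F."""
    return start % 16 == 0 and end % 16 == 15


for start, end, name in SAMPLE_BLOCKS:
    size = block_size(start, end)
    print(f"{name}: {size} code points, aligned={is_aligned(start, end)}")
```

For this excerpt the sizes come out as 128, 144, 20,992, and 16 code points, and every range is aligned to a 16-code-point boundary.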
Blocks vs. Scripts
Blocks are purely positional: a character's block is determined by its numeric value alone. Scripts, by contrast, reflect linguistic or cultural affiliation. The Latin Extended Additional block (U+1E00–U+1EFF) contains only Latin characters, but the Letterlike Symbols block (U+2100–U+214F) mixes script values: symbols such as ℂ (DOUBLE-STRUCK CAPITAL C) and ℓ (SCRIPT SMALL L) have the script Common, K (KELVIN SIGN) is Latin, and Ω (OHM SIGN) is Greek. A single block can span multiple scripts, and one script can span multiple blocks.
```python
import unicodedata

# Python's unicodedata module does not expose the Block property directly.
# To get a block, compare ord(char) against the ranges in Blocks.txt,
# or use a third-party package such as unicodeblock.
# This loop prints only the code point, the character, and its name.
for char in ["A", "α", "中", "😀"]:
    name = unicodedata.name(char, "<unnamed>")
    cp = ord(char)
    print(f"U+{cp:04X} {char} {name}")

# Printed lines, with the block added by hand after each arrow:
# U+0041 A LATIN CAPITAL LETTER A → Basic Latin
# U+03B1 α GREEK SMALL LETTER ALPHA → Greek and Coptic
# U+4E2D 中 CJK UNIFIED IDEOGRAPH-4E2D → CJK Unified Ideographs
# U+1F600 😀 GRINNING FACE → Emoticons
```
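Because blocks are sorted, non-overlapping ranges, the block of a character can be found with a binary search over the range starts. The sketch below assumes a small hand-copied excerpt of Blocks.txt; `BLOCKS`, `STARTS`, and `block_of` are hypothetical names, and a real table would hold every block.

```python
from bisect import bisect_right
from typing import Optional

# Excerpt of block ranges sorted by start (assumption: copied by hand,
# not the full Blocks.txt table).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
STARTS = [start for start, _, _ in BLOCKS]


def block_of(char: str) -> Optional[str]:
    """Return the block name for a single character, or None if the
    code point is not covered by this excerpt."""
    cp = ord(char)
    # Index of the last range whose start is <= cp.
    i = bisect_right(STARTS, cp) - 1
    if i >= 0:
        _, end, name = BLOCKS[i]
        if cp <= end:
            return name
    return None


for ch in ["A", "α", "中", "😀", "ℂ"]:
    print(f"U+{ord(ch):04X} {block_of(ch)}")
```

With the complete table loaded, a `None` result would correspond to the property value No_Block; with this excerpt it only means the range was not copied in.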
Why Blocks Matter
Block ranges show up in several places: CSS @font-face unicode-range descriptors, which take raw code-point ranges that often follow block boundaries; regular expressions, where Perl and Java accept \p{Block=CJK_Unified_Ideographs} (PCRE2 does not support block properties); and font subsetting tools. Knowing a character's block helps font engineers decide which code-point ranges to include in a subset, reducing file size while preserving coverage.
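Python's built-in `re` module has no block property escape, so one workaround is to build the character class from the block's numeric range. The sketch below also formats the same range as a CSS unicode-range value; `css_unicode_range` and `block_regex` are illustrative helper names, and the block ranges are typed in by hand.

```python
import re


def css_unicode_range(start: int, end: int) -> str:
    """Format an inclusive code-point range as a CSS unicode-range value."""
    return f"U+{start:04X}-{end:04X}"


def block_regex(start: int, end: int) -> re.Pattern:
    """Compile a character class matching one code point in the range.

    \\UXXXXXXXX escapes keep the pattern valid even when an endpoint is a
    regex metacharacter.
    """
    return re.compile(f"[\\U{start:08X}-\\U{end:08X}]")


# Greek and Coptic (U+0370-U+03FF) as a CSS descriptor value.
print(css_unicode_range(0x0370, 0x03FF))

# CJK Unified Ideographs (U+4E00-U+9FFF) as a regex character class.
cjk = block_regex(0x4E00, 0x9FFF)
print(cjk.findall("abc中文def"))
```

The third-party `regex` package may offer property-based syntax instead; this sketch deliberately stays within the standard library.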
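Block ranges can also serve as a coarse heuristic in rendering pipelines, for example when a font fallback mechanism picks a font according to the range a code point falls in. Shaping engines such as HarfBuzz, by contrast, generally infer the shaping script from the Script property rather than from block membership when no explicit script tag is available.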
Quick Facts
| Property | Value |
|---|---|
| Unicode property name | Block |
| Short property alias | blk |
| Number of defined blocks (Unicode 16.0) | 338 |
| Smallest block | 16 code points (e.g. Variation Selectors, U+FE00–U+FE0F) |
| Largest block | Supplementary Private Use Area-A and -B, 65,536 code points each; largest block of encoded characters is CJK Unified Ideographs Extension B (U+20000–U+2A6DF, 42,720 code points) |
| Python access | No built-in; look up ord(ch) against the ranges in Blocks.txt, or use the unicodeblock package |
| Regex syntax | \p{Block=Basic_Latin} (Perl, Java) |
| Spec reference | Unicode Standard Annex #44, Blocks.txt |
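
Blocks.txt uses a simple line format, `start..end; name`, with `#` introducing comments. A minimal parsing sketch follows, run against an inline excerpt rather than the real file; `SAMPLE` and `parse_blocks` are hypothetical names.

```python
# Inline excerpt in the Blocks.txt line format (assumption: three ranges
# copied by hand; the real file lists every block).
SAMPLE = """\
# @missing: 0000..10FFFF; No_Block

0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
1F600..1F64F; Emoticons
"""


def parse_blocks(text: str) -> list[tuple[int, int, str]]:
    """Parse Blocks.txt-style text into (start, end, name) tuples."""
    blocks = []
    for line in text.splitlines():
        # Drop comments and surrounding whitespace; skip empty lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        range_part, name = line.split(";", 1)
        start_hex, end_hex = range_part.strip().split("..")
        blocks.append((int(start_hex, 16), int(end_hex, 16), name.strip()))
    return blocks


for start, end, name in parse_blocks(SAMPLE):
    print(f"U+{start:04X}..U+{end:04X} {name}")
```

The resulting list is already sorted by start, so it can feed a range lookup directly.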
Related Terms
More in Properties
Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, or …
Unicode property controlling how Arabic and Syriac characters connect to adjacent characters. …
Unicode property listing all scripts that use a character, broader than the …
Alternative names for characters, since Unicode names cannot be changed under the policy of …
Converting a character into its components; canonical decomposition preserves meaning (é → e …
Two character sequences that are semantically identical and must be treated as equal; example: é (U+00E9) ≡ …
Classification of every code point into one of 30 categories (Lu, Ll, Nd, …
The numeric interpretation of a character, if any: digit value (0–9), decimal value, or …
Rules for converting characters between uppercase, lowercase, and titlecase; may depend on …
Two character sequences that have the same abstract content but may differ in appearance; …