What is نظام الكتابة?

نظام الكتابة الذي ينتمي إليه الحرف (مثل Latin أو Cyrillic أو Han)؛ يُعرّف Unicode 16.0 ما مجموعه 168 نظام كتابة، وخاصية Script أساسية للأمان والكشف عن النصوص المختلطة.

What is الفئة العامة?

تصنيف كل نقطة رمز إلى واحدة من 30 فئة (Lu, Ll, Nd, So، إلخ) مجمّعة في 7 فئات رئيسية: Letter, Mark, Number, Punctuation, Symbol, Separator, Other.

What is Unicode Character Database (UCD)?

مجموعة ملفات بيانات قابلة للقراءة آليًا تحدد جميع خصائص أحرف Unicode، بما في ذلك UnicodeData.txt وBlocks.txt وScripts.txt وغيرها الكثير.

الخصائص

كتلة

نطاق متصل مسمّى من نقاط الرموز (مثلاً Basic Latin = U+0000–U+007F)؛ يُعرّف Unicode 16.0 ما مجموعه 336 كتلة، وكل نقطة رمز تنتمي إلى كتلة واحدة بالضبط.

2022-01-10 · Updated 2024-07-29

What Is a Unicode Block?

A Unicode block is a named, contiguous range of code points assigned to a related group of characters. The Unicode Standard divides the entire code point space (U+0000 to U+10FFFF) into 336 blocks, each a multiple of 16 code points in size. Block boundaries are fixed—they never change between Unicode versions—though new blocks may be allocated in previously unassigned ranges.

Each block has a distinctive name that broadly describes its contents: Basic Latin (U+0000–U+007F), Greek and Coptic (U+0370–U+03FF), or CJK Unified Ideographs (U+4E00–U+9FFF). The name is informational rather than prescriptive, so a block may contain characters from multiple scripts, or even unassigned code points.

Blocks vs. Scripts

Blocks are purely positional: a character belongs to exactly one block based on its numeric value. Scripts, by contrast, reflect linguistic or cultural affiliation. The Latin Extended Additional block (U+1E00–U+1EFF) contains Latin characters, but the Letterlike Symbols block (U+2100–U+214F) holds characters from many scripts such as ℂ (DOUBLE-STRUCK CAPITAL C) and ℓ (SCRIPT SMALL L). A single block can span multiple scripts, and one script can span multiple blocks.

import unicodedata

# Look up the block of a character using the Unicode data utilities
# Python's unicodedata module does not expose block directly,
# but you can derive it from the character name prefix or use the
# 'unicode_data' third-party package.

for char in ["A", "α", "中", "😀"]:
    name = unicodedata.name(char, "<unnamed>")
    cp = ord(char)
    print(f"U+{cp:04X}  {char}  {name}")

# U+0041  A  LATIN CAPITAL LETTER A       → Basic Latin
# U+03B1  α  GREEK SMALL LETTER ALPHA     → Greek and Coptic
# U+4E2D  中 CJK UNIFIED IDEOGRAPH-4E2D   → CJK Unified Ideographs
# U+1F600 😀 GRINNING FACE                → Emoticons

Why Blocks Matter

Blocks appear in Unicode Character Ranges used by CSS (@font-face unicode-range), regular expressions (\p{Block=CJK_Unified_Ideographs} in Perl, PCRE, or Java), and font subsetting tools. Knowing a character's block helps font engineers decide which code-point ranges to include in a subset, reducing file size while preserving coverage.

Block assignments also guide rendering engines. For example, shaping engines like HarfBuzz use block membership as one heuristic when selecting a shaping script when no explicit script tag is available.

Quick Facts

Property	Value
Unicode property name	`Block`
Short property alias	`blk`
Number of defined blocks (Unicode 15.1)	336
Smallest block	16 code points (many Supplement blocks)
Largest block	CJK Unified Ideographs, 20,902 assigned (U+4E00–U+9FFF, 8,192 range)
Python access	No built-in; use `unicodedata.name()` + range lookup or `unicodeblock` package
Regex syntax	`\p{Block=Basic_Latin}` (PCRE/Perl/Java)
Spec reference	Unicode Standard Annex #44, `Blocks.txt`

المصطلحات ذات الصلة

نظام الكتابة الفئة العامة Unicode Character Database (UCD)