블록
명명된 연속 코드 포인트 범위(예: 기본 라틴 = U+0000~U+007F). Unicode 16.0은 336개 블록을 정의하며, 모든 코드 포인트는 정확히 하나의 블록에 속합니다.
What Is a Unicode Block?
A Unicode block is a named, contiguous range of code points assigned to a related group of characters. The Unicode Standard divides the entire code point space (U+0000 to U+10FFFF) into 336 blocks, each a multiple of 16 code points in size. Block boundaries are fixed—they never change between Unicode versions—though new blocks may be allocated in previously unassigned ranges.
Each block has a distinctive name that broadly describes its contents: Basic Latin (U+0000–U+007F), Greek and Coptic (U+0370–U+03FF), or CJK Unified Ideographs (U+4E00–U+9FFF). The name is informational rather than prescriptive, so a block may contain characters from multiple scripts, or even unassigned code points.
Blocks vs. Scripts
Blocks are purely positional: a character belongs to exactly one block based on its numeric value. Scripts, by contrast, reflect linguistic or cultural affiliation. The Latin Extended Additional block (U+1E00–U+1EFF) contains Latin characters, but the Letterlike Symbols block (U+2100–U+214F) holds characters from many scripts such as ℂ (DOUBLE-STRUCK CAPITAL C) and ℓ (SCRIPT SMALL L). A single block can span multiple scripts, and one script can span multiple blocks.
import unicodedata
# Look up the block of a character using the Unicode data utilities
# Python's unicodedata module does not expose block directly,
# but you can derive it from the character name prefix or use the
# 'unicode_data' third-party package.
for char in ["A", "α", "中", "😀"]:
name = unicodedata.name(char, "<unnamed>")
cp = ord(char)
print(f"U+{cp:04X} {char} {name}")
# U+0041 A LATIN CAPITAL LETTER A → Basic Latin
# U+03B1 α GREEK SMALL LETTER ALPHA → Greek and Coptic
# U+4E2D 中 CJK UNIFIED IDEOGRAPH-4E2D → CJK Unified Ideographs
# U+1F600 😀 GRINNING FACE → Emoticons
Why Blocks Matter
Blocks appear in Unicode Character Ranges used by CSS (@font-face unicode-range), regular expressions (\p{Block=CJK_Unified_Ideographs} in Perl, PCRE, or Java), and font subsetting tools. Knowing a character's block helps font engineers decide which code-point ranges to include in a subset, reducing file size while preserving coverage.
Block assignments also guide rendering engines. For example, shaping engines like HarfBuzz use block membership as one heuristic when selecting a shaping script when no explicit script tag is available.
Quick Facts
| Property | Value |
|---|---|
| Unicode property name | Block |
| Short property alias | blk |
| Number of defined blocks (Unicode 15.1) | 336 |
| Smallest block | 16 code points (many Supplement blocks) |
| Largest block | CJK Unified Ideographs, 20,902 assigned (U+4E00–U+9FFF, 8,192 range) |
| Python access | No built-in; use unicodedata.name() + range lookup or unicodeblock package |
| Regex syntax | \p{Block=Basic_Latin} (PCRE/Perl/Java) |
| Spec reference | Unicode Standard Annex #44, Blocks.txt |
관련 용어
속성의 더 많은 용어
Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, or …
Unicode property controlling how Arabic and Syriac characters connect to adjacent characters. …
Unicode property listing all scripts that use a character, broader than the …
정규 분해 과정에서 결합 기호의 순서를 제어하는 수치 값(0~254)으로, 어떤 결합 기호를 …
마침표, 쉼표, 대시, 따옴표 등 문어를 구성하고 명료하게 하는 데 사용되는 문자. …
지원하지 않는 프로세스에서 눈에 보이는 효과 없이 무시할 수 있는 문자로, 이형 …
문자가 처음 할당된 유니코드 버전. 시스템 및 소프트웨어 버전 간의 문자 지원 …
문자를 대문자, 소문자, 제목 대문자로 변환하는 규칙. 로케일에 따라 달라질 수 있으며(터키어 …
RTL 문맥에서 글리프를 수평으로 반전해야 하는 문자. 예: ( → ), [ …
문자가 속한 문자 체계(예: 라틴, 키릴, 한자). Unicode 16.0은 168개의 문자 체계를 …