Block
A named, contiguous range of code points (for example, Basic Latin = U+0000–U+007F). Unicode 16.0 defines 336 blocks; each code point belongs to at most one block, and code points outside every block have the value No_Block.
What Is a Unicode Block?
A Unicode block is a named, contiguous range of code points allocated to a related group of characters. The Unicode Standard defines 336 blocks within the code point space (U+0000 to U+10FFFF). Every block starts at a multiple of 16 and spans a multiple of 16 code points. Blocks do not cover the whole space: code points outside every block have the Block value No_Block. Block boundaries are fixed: they do not change between Unicode versions, though new blocks may be allocated in previously unassigned ranges.
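The structural rules in this paragraph can be checked mechanically. The sketch below assumes a short run of consecutive entries hand-copied from Blocks.txt (not the full file); the names `EXCERPT`, `is_aligned`, and `gaps` are illustrative. It confirms the 16-code-point alignment and reports the stretch between two neighboring blocks that belongs to no block.

```python
# A run of consecutive entries hand-copied from Blocks.txt (excerpt only).
EXCERPT = [
    (0x2E80, 0x2EFF, "CJK Radicals Supplement"),
    (0x2F00, 0x2FDF, "Kangxi Radicals"),
    (0x2FF0, 0x2FFF, "Ideographic Description Characters"),
    (0x3000, 0x303F, "CJK Symbols and Punctuation"),
]


def is_aligned(start, end):
    """True if the range starts on a multiple of 16 and spans a multiple of 16 code points."""
    return start % 16 == 0 and (end - start + 1) % 16 == 0


def gaps(blocks):
    """Yield (start, end) ranges between consecutive blocks that no block covers."""
    for (_, prev_end, _), (next_start, _, _) in zip(blocks, blocks[1:]):
        if next_start > prev_end + 1:
            yield (prev_end + 1, next_start - 1)


for start, end, name in EXCERPT:
    print(f"U+{start:04X}..U+{end:04X} {name}: aligned={is_aligned(start, end)}")

for start, end in gaps(EXCERPT):
    print(f"No_Block: U+{start:04X}..U+{end:04X}")
```

Because the excerpt is a consecutive run, the single gap it finds (U+2FE0..U+2FEF) is real; run over the full file, the same check would also surface much larger No_Block stretches, such as planes 4 through 13 (U+40000–U+DFFFF).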
Each block has a distinctive name that broadly describes its contents: Basic Latin (U+0000–U+007F), Greek and Coptic (U+0370–U+03FF), or CJK Unified Ideographs (U+4E00–U+9FFF). The name is informational rather than prescriptive, so a block may contain characters from multiple scripts, or even unassigned code points.
Blocks vs. Scripts
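Blocks are purely positional: a code point belongs to at most one block, determined solely by its numeric value. Scripts, by contrast, reflect linguistic or writing-system affiliation and are recorded in a separate property (Script). The Latin Extended Additional block (U+1E00–U+1EFF) contains Latin characters, but the Letterlike Symbols block (U+2100–U+214F) mixes scripts: ℂ (DOUBLE-STRUCK CAPITAL C) and ℓ (SCRIPT SMALL L) have Script=Common, OHM SIGN (U+2126) has Script=Greek, and KELVIN SIGN (U+212A) has Script=Latin. A single block can span multiple scripts, and one script can span multiple blocks: Latin letters appear in Basic Latin, Latin-1 Supplement, Latin Extended-A, and several other blocks.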
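Because Python's unicodedata exposes neither Block nor Script, the sketch below assumes hand-copied data: a few Script values from Scripts.txt and four block ranges from Blocks.txt (the names `SAMPLES`, `BLOCKS`, and `block_of` are illustrative). Grouping the same characters both ways shows one script spread over several blocks and one block holding several scripts.

```python
from collections import defaultdict

# Block ranges hand-copied from Blocks.txt (excerpt only).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
    (0x2100, 0x214F, "Letterlike Symbols"),
]

# (character, Script value) pairs hand-copied from Scripts.txt for illustration.
SAMPLES = [
    ("A", "Latin"),        # U+0041 LATIN CAPITAL LETTER A
    ("\u00e9", "Latin"),   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    ("\u1e01", "Latin"),   # U+1E01 LATIN SMALL LETTER A WITH RING BELOW
    ("\u2102", "Common"),  # U+2102 DOUBLE-STRUCK CAPITAL C
    ("\u2126", "Greek"),   # U+2126 OHM SIGN
    ("\u212a", "Latin"),   # U+212A KELVIN SIGN
]


def block_of(char):
    """Return the excerpt block containing char, or None if it is not listed."""
    cp = ord(char)
    return next((name for start, end, name in BLOCKS if start <= cp <= end), None)


blocks_per_script = defaultdict(set)
scripts_per_block = defaultdict(set)
for char, script in SAMPLES:
    block = block_of(char)
    blocks_per_script[script].add(block)
    scripts_per_block[block].add(script)

print(sorted(blocks_per_script["Latin"]))
# ['Basic Latin', 'Latin Extended Additional', 'Latin-1 Supplement', 'Letterlike Symbols']
print(sorted(scripts_per_block["Letterlike Symbols"]))
# ['Common', 'Greek', 'Latin']
```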
```python
import unicodedata

# unicodedata does not expose the Block property, so this sketch looks
# characters up in a small, hand-copied excerpt of Blocks.txt.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
]

def block_of(char):
    """Return the block name from the excerpt above, or None if it is not listed."""
    cp = ord(char)
    return next((name for start, end, name in BLOCKS if start <= cp <= end), None)

for char in ["A", "α", "中", "😀"]:
    name = unicodedata.name(char, "<unnamed>")
    print(f"U+{ord(char):04X} {char} {name} → {block_of(char)}")
# U+0041 A LATIN CAPITAL LETTER A → Basic Latin
# U+03B1 α GREEK SMALL LETTER ALPHA → Greek and Coptic
# U+4E2D 中 CJK UNIFIED IDEOGRAPH-4E2D → CJK Unified Ideographs
# U+1F600 😀 GRINNING FACE → Emoticons
```
Why Blocks Matter
Block ranges are a convenient unit for CSS @font-face unicode-range descriptors, for regular expressions (\p{Block=CJK_Unified_Ideographs} in Perl, Java, or ICU), and for font subsetting tools. Knowing a character's block helps font engineers decide which code-point ranges to include in a subset, reducing file size while preserving coverage.
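As a rough illustration of subsetting by block, the sketch below assumes a hand-copied excerpt of Blocks.txt and hypothetical helpers (`blocks_used`, `unicode_range`, `regex_class`). It finds which blocks a sample text touches, formats them as a CSS unicode-range value, and builds a Python re character class that behaves like \p{Block=...} for those ranges, since the standard re module does not support block properties.

```python
import re

# Hand-copied excerpt of Blocks.txt; a real tool would load the full file.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def blocks_used(text):
    """Return the excerpt blocks containing at least one character of text."""
    return [b for b in BLOCKS if any(b[0] <= ord(ch) <= b[1] for ch in text)]


def unicode_range(blocks):
    """Format (start, end, name) ranges as a CSS @font-face unicode-range value."""
    return ", ".join(f"U+{start:04X}-{end:04X}" for start, end, _ in blocks)


def regex_class(blocks):
    """Build an re character class matching any code point in the given ranges."""
    return "[" + "".join(f"\\U{start:08X}-\\U{end:08X}" for start, end, _ in blocks) + "]"


text = "Hello, αβγ 中文"
used = blocks_used(text)
print(unicode_range(used))
# U+0000-007F, U+0370-03FF, U+4E00-9FFF

cjk = re.compile(regex_class([b for b in BLOCKS if b[2] == "CJK Unified Ideographs"]))
print(cjk.findall(text))
# ['中', '文']
```

Keeping whole blocks is a deliberate trade-off in this sketch: the subset stays valid for other characters from the same blocks, at the cost of a larger file than a per-code-point subset.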
Block membership is only a coarse signal for rendering. When no explicit script is set, HarfBuzz's hb_buffer_guess_segment_properties() infers the segment's script from the characters' Script property rather than from their block, because a block such as Letterlike Symbols mixes scripts.
Quick Facts
| Property | Value |
|---|---|
| Unicode property name | Block |
| Short property alias | blk |
| Number of defined blocks (Unicode 16.0) | 336 |
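| Smallest block | 16 code points (e.g., Variation Selectors, U+FE00–U+FE0F) |
| Largest block | Supplementary Private Use Area-A and -B, 65,536 code points each (U+F0000–U+FFFFF and U+100000–U+10FFFF); CJK Unified Ideographs spans 20,992 (U+4E00–U+9FFF) |
| Python access | No built-in in unicodedata; use a range lookup over Blocks.txt or the third-party unicodeblock package |
| Regex syntax | \p{Block=Basic_Latin} (Perl, Java, ICU); Java also accepts \p{InBasicLatin} |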
| Spec reference | Unicode Standard Annex #44, Blocks.txt |
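Blocks.txt data lines have the form `0000..007F; Basic Latin`. A minimal loader, sketched below under the assumption that the file text is already in memory (the helper names and the `SAMPLE` excerpt are illustrative), parses those lines and uses bisect for an O(log n) lookup that falls back to No_Block.

```python
import bisect

# A few lines in Blocks.txt format, hand-copied for illustration (not the full file).
SAMPLE = """\
# Comment lines and trailing comments are ignored.
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
2F00..2FDF; Kangxi Radicals
2FF0..2FFF; Ideographic Description Characters
4E00..9FFF; CJK Unified Ideographs
F0000..FFFFF; Supplementary Private Use Area-A
100000..10FFFF; Supplementary Private Use Area-B
"""


def parse_blocks(text):
    """Parse Blocks.txt-style text into a sorted list of (start, end, name) tuples."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        cps, name = line.split(";", 1)
        start, end = (int(part, 16) for part in cps.split(".."))
        blocks.append((start, end, name.strip()))
    return sorted(blocks)


def block_of(cp, blocks, starts):
    """Binary-search for the block containing cp; 'No_Block' if none does.

    The No_Block answer is only trustworthy when blocks comes from the full file.
    """
    i = bisect.bisect_right(starts, cp) - 1
    if i >= 0 and cp <= blocks[i][1]:
        return blocks[i][2]
    return "No_Block"


BLOCKS = parse_blocks(SAMPLE)
STARTS = [start for start, _, _ in BLOCKS]

for cp in (0x00E9, 0x2FE5, 0x6587, 0x10FFFD):
    print(f"U+{cp:04X} → {block_of(cp, BLOCKS, STARTS)}")
# U+00E9 → Latin-1 Supplement
# U+2FE5 → No_Block
# U+6587 → CJK Unified Ideographs
# U+10FFFD → Supplementary Private Use Area-B

largest = max(BLOCKS, key=lambda b: b[1] - b[0] + 1)
print(largest[2], largest[1] - largest[0] + 1)
# Supplementary Private Use Area-A 65536
```

With the real file, the input would come from something like `parse_blocks(open("Blocks.txt", encoding="utf-8").read())`; the local path is an assumption.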