Combining Class
A numeric value (0–254) that controls the order of combining marks during canonical decomposition and determines which combining marks can be reordered.
What Is the Canonical Combining Class?
The Canonical Combining Class (CCC) is an integer property (range 0–254; 240 is the highest value currently assigned) defined for every Unicode character. It specifies how combining marks, that is, characters that attach to a preceding base character, are reordered relative to one another during Unicode Normalization. Most base characters and non-combining characters have CCC = 0 (Not_Reordered). Combining diacritical marks carry non-zero values that determine their canonical order.
The core rule is the Canonical Ordering Algorithm: within a run of adjacent combining marks that all have non-zero CCC values, the marks are sorted by ascending CCC, so the mark with the lower value ends up closer to the base character in the normalized form. Two marks with equal CCC values are never swapped, so their relative order is preserved, and a character with CCC = 0 acts as a barrier that no mark moves across.
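The ordering rule can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: `canonical_order` is a hypothetical helper name, and it assumes the input is already fully decomposed (real NFD decomposes first and then reorders).

```python
import unicodedata


def canonical_order(text: str) -> str:
    """Sort each run of non-zero-CCC marks by CCC (input assumed decomposed)."""
    chars = list(text)
    start = 0
    while start < len(chars):
        # Characters with CCC = 0 are starters: they never move.
        if unicodedata.combining(chars[start]) == 0:
            start += 1
            continue
        # Find the end of this run of combining marks.
        end = start
        while end < len(chars) and unicodedata.combining(chars[end]) != 0:
            end += 1
        # sorted() is stable, so marks with equal CCC keep their original order.
        chars[start:end] = sorted(chars[start:end], key=unicodedata.combining)
        start = end
    return "".join(chars)


sample = "a\u0300\u0331"  # a + grave (CCC=230) + macron below (CCC=220)
print([f"U+{ord(c):04X}" for c in canonical_order(sample)])
# ['U+0061', 'U+0331', 'U+0300']
```

Using a stable sort is the design choice that matters here: it is what keeps marks of the same class in their original order.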
CCC in Practice
Consider the letter a with two diacritics: a combining cedilla (CCC=202) and a combining ogonek (also CCC=202). Because they share the same CCC, normalization keeps them in the order in which they appear. By contrast, an above mark such as combining breve (CCC=230) and a below mark such as combining macron below (CCC=220) are sorted by value during normalization, so the CCC=220 mark precedes the CCC=230 mark in NFD.
import unicodedata
marks = [
    ("\u0300", "COMBINING GRAVE ACCENT"),           # CCC=230
    ("\u0327", "COMBINING CEDILLA"),                # CCC=202
    ("\u0328", "COMBINING OGONEK"),                 # CCC=202
    ("\u0331", "COMBINING MACRON BELOW"),           # CCC=220
    ("\u0952", "DEVANAGARI STRESS SIGN ANUDATTA"),  # CCC=220
]
for char, name in marks:
    ccc = unicodedata.combining(char)
    print(f" CCC={ccc:3} {name}")
# CCC=230 COMBINING GRAVE ACCENT
# CCC=202 COMBINING CEDILLA
# CCC=202 COMBINING OGONEK
# CCC=220 COMBINING MACRON BELOW
# CCC=220 DEVANAGARI STRESS SIGN ANUDATTA
# Normalization puts the sequence into canonical order:
text = "a\u0328\u0300" # a + ogonek (CCC=202) + grave (CCC=230)
nfd = unicodedata.normalize("NFD", text)
# NFD preserves order here because 202 < 230, ogonek stays first
print([f"U+{ord(c):04X}" for c in nfd])
# ['U+0061', 'U+0328', 'U+0300']
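For contrast, here is a short sketch of a case where NFD does reorder the marks, next to the equal-class case where it does not. The helper name `codepoints` is made up for this illustration; the class values are read from `unicodedata` at run time rather than assumed.

```python
import unicodedata


def codepoints(text):
    """Format a string as a list of U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in text]


breve, macron_below = "\u0306", "\u0331"
print(unicodedata.combining(breve), unicodedata.combining(macron_below))
# 230 220

# Different classes: the below mark (220) is moved in front of the above mark (230).
print(codepoints(unicodedata.normalize("NFD", "a" + breve + macron_below)))
# ['U+0061', 'U+0331', 'U+0306']

# Equal classes (cedilla and ogonek are both 202): the input order is kept.
cedilla, ogonek = "\u0327", "\u0328"
print(codepoints(unicodedata.normalize("NFD", "a" + cedilla + ogonek)))
# ['U+0061', 'U+0327', 'U+0328']
print(codepoints(unicodedata.normalize("NFD", "a" + ogonek + cedilla)))
# ['U+0061', 'U+0328', 'U+0327']
```

Because the two equal-class sequences stay distinct after normalization, they are not canonically equivalent to each other.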
Named CCC Values
CCC values have names defined in the standard: 0 (Not_Reordered), 1 (Overlay), 6 (Han_Reading), 7 (Nukta), 8 (Kana_Voicing), and 9 (Virama). Values 10–199 are fixed-position classes named after their number (for example CCC10), and only some of them are in use. Values 200–240 describe where a mark sits relative to its base, such as Attached_Below (CCC=202), Below (CCC=220), Above (CCC=230), and Double_Below (CCC=233).
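Python's `unicodedata` module returns only the numeric class, so a name lookup has to be supplied by hand. The sketch below assumes a small, deliberately incomplete table: `CCC_NAMES` and `ccc_name` are hypothetical names for this example, and the table lists only a handful of the named values.

```python
import unicodedata

# Partial, hand-written table for illustration only; not a complete list.
CCC_NAMES = {
    0: "Not_Reordered",
    1: "Overlay",
    6: "Han_Reading",
    7: "Nukta",
    8: "Kana_Voicing",
    9: "Virama",
    202: "Attached_Below",
    220: "Below",
    230: "Above",
    233: "Double_Below",
    234: "Double_Above",
    240: "Iota_Subscript",
}


def ccc_name(char: str) -> str:
    """Return a readable label for the combining class of one character."""
    value = unicodedata.combining(char)
    if value in CCC_NAMES:
        return CCC_NAMES[value]
    if 10 <= value <= 199:
        # Fixed-position classes are named after their number.
        return f"CCC{value}"
    # Anything missing from the partial table falls back to the bare number.
    return str(value)


for char in ["a", "\u0300", "\u0331", "\u093C", "\u094D", "\u05B0"]:
    print(f"U+{ord(char):04X} {ccc_name(char)}")
```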
Quick Facts
| Property | Value |
|---|---|
| Unicode property name | Canonical_Combining_Class |
| Short alias | ccc |
| Range | 0–254 (highest assigned value is 240; not all values used) |
| Value 0 | Base characters, non-combining |
| Python function | unicodedata.combining(char) → integer |
| Key use | NFD/NFC canonical ordering during normalization |
| Spec reference | Unicode Standard Section 3.11, UAX #15 |
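As a sketch of the key use listed in the table, canonical ordering is what lets two strings with the same marks in a different order compare as equal after normalization. `canonically_equal` is a hypothetical helper name, and `unicodedata.is_normalized` is assumed to be available (Python 3.8 or later).

```python
import unicodedata


def canonically_equal(left: str, right: str) -> bool:
    """Compare two strings after putting both into NFD."""
    return unicodedata.normalize("NFD", left) == unicodedata.normalize("NFD", right)


above_first = "a\u0300\u0331"  # a + grave (CCC=230) + macron below (CCC=220)
below_first = "a\u0331\u0300"  # same marks, already in canonical order

print(above_first == below_first)                   # False
print(canonically_equal(above_first, below_first))  # True

# Only the second string is already in canonical order.
print(unicodedata.is_normalized("NFD", above_first))  # False
print(unicodedata.is_normalized("NFD", below_first))  # True
```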