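What is a combining character?

A character that attaches to the preceding base character and modifies it. Combining marks have the General_Category values Mn (Nonspacing_Mark), Mc (Spacing_Mark), or Me (Enclosing_Mark). Example: ◌́ (U+0301 COMBINING ACUTE ACCENT).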
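Python's standard `unicodedata` module shows the three categories directly. The sketch below assumes Python 3. Besides U+0301, the sample characters U+0903 (DEVANAGARI SIGN VISARGA, Mc) and U+20DD (COMBINING ENCLOSING CIRCLE, Me) are illustrative choices that do not come from the text.

```python
import unicodedata

# One sample mark per combining category (illustrative choices)
samples = ["\u0301", "\u0903", "\u20dd"]

for mark in samples:
    # unicodedata.category() returns the two-letter General_Category value
    print(f"U+{ord(mark):04X}  {unicodedata.category(mark)}  {unicodedata.name(mark)}")

# U+0301  Mn  COMBINING ACUTE ACCENT
# U+0903  Mc  DEVANAGARI SIGN VISARGA
# U+20DD  Me  COMBINING ENCLOSING CIRCLE

# A combining mark is its own code point, stored after its base character
e_acute = "e\u0301"
print(len(e_acute))  # 2 code points, normally displayed as a single é
```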

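What is Unicode normalization?

The process of converting Unicode text into a standard normal form, so that equivalent sequences share a single representation. There are four forms: NFC (canonical composition), NFD (canonical decomposition), NFKC (compatibility composition), and NFKD (compatibility decomposition).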
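The four forms differ most clearly on a string that mixes a precomposed letter with a compatibility character. A minimal sketch using Python's standard `unicodedata.normalize`; the sample string (é followed by the ligature ﬁ, U+FB01) is an illustrative choice.

```python
import unicodedata

# "é" as one precomposed code point, followed by the "fi" ligature U+FB01
text = "\u00e9\ufb01"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    result = unicodedata.normalize(form, text)
    print(form, [f"U+{ord(c):04X}" for c in result])

# NFC ['U+00E9', 'U+FB01']
# NFD ['U+0065', 'U+0301', 'U+FB01']
# NFKC ['U+00E9', 'U+0066', 'U+0069']
# NFKD ['U+0065', 'U+0301', 'U+0066', 'U+0069']
```

The canonical forms (NFC, NFD) keep the ligature because ﬁ has only a compatibility decomposition; the K forms replace it with plain f + i.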

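What is canonical equivalence?

The relationship between two character sequences that represent the same abstract text and must be treated as identical. Example: é (U+00E9) ≡ e + ◌́ (U+0065 + U+0301).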
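Two strings are canonically equivalent exactly when their NFD forms are identical, which gives a simple test. In this sketch, `canonically_equivalent` is a hypothetical helper written for illustration, not a standard-library function.

```python
import unicodedata

def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare the NFD forms of both strings."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u00e9"  # é as a single code point
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT

print(precomposed == decomposed)                        # False: different code points
print(canonically_equivalent(precomposed, decomposed))  # True

# Compatibility equivalents are not canonically equivalent
print(canonically_equivalent("\ufb01", "fi"))           # False
```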

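Properties

Combining Class

A numeric value (0–254) that controls the order of combining marks during canonical decomposition; it determines which combining marks may be reordered relative to each other.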

2022-02-21 · Updated 2024-06-11

What Is the Canonical Combining Class?

The Canonical Combining Class (CCC) is an integer property assigned to every Unicode code point. Its possible values range from 0 to 254, although the largest value actually assigned is 240. It specifies how combining marks (characters that attach to a preceding base character) are reordered relative to one another during Unicode Normalization. Base characters and other non-combining characters have CCC = 0 (Not_Reordered), as do many combining marks themselves, such as spacing and enclosing marks. Most nonspacing diacritics carry non-zero values, which determine their stacking order.
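CCC = 0 is not limited to non-combining characters, which a base letter and one mark from each combining category make visible. A sketch assuming Python 3's `unicodedata`; the sample code points are illustrative choices.

```python
import unicodedata

# A base letter, then a nonspacing, a spacing, and an enclosing mark
chars = ["a", "\u0301", "\u0903", "\u20dd"]

for ch in chars:
    # unicodedata.combining() returns the CCC as an int (0 if not reordered)
    print(f"U+{ord(ch):04X}  {unicodedata.category(ch)}  CCC={unicodedata.combining(ch)}")

# U+0061  Ll  CCC=0
# U+0301  Mn  CCC=230
# U+0903  Mc  CCC=0
# U+20DD  Me  CCC=0
```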

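The core rule is the Canonical Ordering Algorithm, applied after decomposition in every normalization form: whenever two adjacent characters A and B satisfy CCC(A) > CCC(B) > 0, they are swapped, and this repeats until no such pair remains. As a result, the mark with the lower value ends up closer to the base character. Marks with equal CCC values are never swapped, so their relative order is preserved, and a character with CCC = 0 is never moved, so marks are never reordered across it.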
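The swap rule amounts to a stable bubble sort over each run of non-zero CCC marks. The following is a minimal sketch, assuming the input is already fully decomposed (real NFD performs canonical decomposition first); `canonical_order` is a hypothetical name used only for illustration.

```python
import unicodedata

def canonical_order(s: str) -> str:
    """Hypothetical sketch of canonical ordering on a decomposed string.

    Swaps adjacent characters A, B whenever ccc(A) > ccc(B) > 0 and
    repeats until no such pair remains. CCC 0 characters never move.
    """
    chars = list(s)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(chars) - 1):
            a = unicodedata.combining(chars[i])
            b = unicodedata.combining(chars[i + 1])
            if a > b > 0:
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                swapped = True
    return "".join(chars)

# a + grave (230) + macron below (220) + cedilla (202)
text = "a\u0300\u0331\u0327"
ordered = canonical_order(text)
print([f"U+{ord(c):04X}" for c in ordered])
# ['U+0061', 'U+0327', 'U+0331', 'U+0300']

# No character here decomposes, so the result matches Python's NFD
print(ordered == unicodedata.normalize("NFD", text))  # True
```

Because the condition is strict (`a > b`), marks with equal CCC values are never exchanged, which is what keeps their written order stable.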

CCC in Practice

Consider the letter a with two diacritics: a cedilla (CCC=202) and an ogonek (also CCC=202). Because they share the same CCC, normalization keeps them in whatever order they were written, so a + cedilla + ogonek and a + ogonek + cedilla are not canonically equivalent. In contrast, an above mark such as the combining breve (CCC=230) and a below mark such as the combining macron below (CCC=220) are sorted by value during normalization, so NFD places the CCC=220 mark before the CCC=230 mark.

```python
import unicodedata

marks = [
    ("\u0300", "COMBINING GRAVE ACCENT"),          # CCC=230
    ("\u0327", "COMBINING CEDILLA"),               # CCC=202
    ("\u0328", "COMBINING OGONEK"),                # CCC=202
    ("\u0331", "COMBINING MACRON BELOW"),          # CCC=220
    ("\u0952", "DEVANAGARI STRESS SIGN ANUDATTA"), # CCC=220
]

# unicodedata.combining() returns a character's CCC as an int
for char, name in marks:
    ccc = unicodedata.combining(char)
    print(f"CCC={ccc:3}  {name}")

# CCC=230  COMBINING GRAVE ACCENT
# CCC=202  COMBINING CEDILLA
# CCC=202  COMBINING OGONEK
# CCC=220  COMBINING MACRON BELOW
# CCC=220  DEVANAGARI STRESS SIGN ANUDATTA

# Normalization puts combining marks into canonical (ascending CCC) order.
# Already ordered: ogonek (202) precedes grave (230), so NFD leaves it as is
text = "a\u0328\u0300"
nfd = unicodedata.normalize("NFD", text)
print([f"U+{ord(c):04X}" for c in nfd])
# ['U+0061', 'U+0328', 'U+0300']

# Out of order: grave (230) before ogonek (202), so NFD swaps them
reordered = unicodedata.normalize("NFD", "a\u0300\u0328")
print([f"U+{ord(c):04X}" for c in reordered])
# ['U+0061', 'U+0328', 'U+0300']
```
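The cedilla/ogonek and breve/macron-below cases described above can be checked the same way. A sketch using only `unicodedata.normalize`; `code_points` is a small hypothetical helper for printing.

```python
import unicodedata

def code_points(s: str) -> list:
    """Hypothetical helper: list the code points of s as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in s]

# Equal CCC (202 and 202): NFD never swaps them, so both orders survive
cedilla_first = unicodedata.normalize("NFD", "a\u0327\u0328")
ogonek_first = unicodedata.normalize("NFD", "a\u0328\u0327")
print(code_points(cedilla_first))     # ['U+0061', 'U+0327', 'U+0328']
print(code_points(ogonek_first))      # ['U+0061', 'U+0328', 'U+0327']
print(cedilla_first == ogonek_first)  # False: not canonically equivalent

# Different CCC: macron below (220) precedes breve (230) in NFD
breve_first = unicodedata.normalize("NFD", "a\u0306\u0331")
print(code_points(breve_first))       # ['U+0061', 'U+0331', 'U+0306']
print(breve_first == unicodedata.normalize("NFD", "a\u0331\u0306"))  # True
```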

Named CCC Values

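Every CCC value in use has a name defined in the standard. The low values are 0 (Not_Reordered), 1 (Overlay), 6 (Han_Reading), 7 (Nukta), 8 (Kana_Voicing), and 9 (Virama). Values 10–199 are fixed-position classes used by specific scripts such as Hebrew, Arabic, Thai, and Tibetan; only some of them are assigned, and each is named after its number (CCC10, CCC11, and so on). Values 200–240 describe positions relative to the base character, such as Attached_Below (CCC=202), Below (CCC=220), Above (CCC=230), Double_Below (CCC=233), and Iota_Subscript (CCC=240).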
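Python's `unicodedata.combining()` returns only the number, not the name. The sketch below pairs sample characters with a hand-copied subset of the value names listed above; the `CCC_NAMES` table and the choice of sample characters are assumptions made for illustration, not a library API.

```python
import unicodedata

# Hand-copied subset of CCC value names (not provided by unicodedata)
CCC_NAMES = {
    0: "Not_Reordered",
    1: "Overlay",
    7: "Nukta",
    8: "Kana_Voicing",
    9: "Virama",
    10: "CCC10",
    202: "Attached_Below",
    220: "Below",
    230: "Above",
    233: "Double_Below",
    240: "Iota_Subscript",
}

samples = [
    "a",       # LATIN SMALL LETTER A
    "\u0334",  # COMBINING TILDE OVERLAY
    "\u093c",  # DEVANAGARI SIGN NUKTA
    "\u3099",  # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
    "\u094d",  # DEVANAGARI SIGN VIRAMA
    "\u05b0",  # HEBREW POINT SHEVA
    "\u0327",  # COMBINING CEDILLA
    "\u0331",  # COMBINING MACRON BELOW
    "\u0301",  # COMBINING ACUTE ACCENT
    "\u035c",  # COMBINING DOUBLE BREVE BELOW
    "\u0345",  # COMBINING GREEK YPOGEGRAMMENI
]

for ch in samples:
    ccc = unicodedata.combining(ch)
    # Fall back to "?" for any value missing from the partial table
    print(f"{ccc:3}  {CCC_NAMES.get(ccc, '?'):14}  {unicodedata.name(ch)}")
```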

Quick Facts

Property	Value
Unicode property name	`Canonical_Combining_Class`
Short alias	`ccc`
Range	0–254 (largest assigned value is 240; many values unused)
Value 0	Not_Reordered: base characters, non-combining characters, and many spacing or enclosing marks
Python function	`unicodedata.combining(char)` → integer
Key use	NFD/NFC canonical ordering during normalization
Spec reference	Unicode Standard Section 3.11, UAX #15