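Typography

Combining Character

A character that attaches to the preceding base character and modifies it. General Category: Mn (Non-spacing Mark), Mc (Spacing Mark), or Me (Enclosing Mark). Example: ◌́ (U+0301 COMBINING ACUTE ACCENT).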


What is a Combining Character?

A combining character is a Unicode character that is not meant to stand on its own: it attaches to and modifies the preceding character (called the base character). Combining characters implement diacritical marks, tone marks, vowel signs, and other modifier symbols that, in Unicode's model, are logically separate from the letter they modify.

The key insight is that Unicode separates the identity of a character from its rendering. A letter with a diacritic can be represented either as a single precomposed code point (é = U+00E9) or as a base letter followed by a combining mark (e + ◌́ = U+0065 U+0301). The two sequences are canonically equivalent: they represent the same abstract character and should render identically.
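The equivalence can be inspected with Python's standard unicodedata module. The snippet below is a minimal sketch; the variable names are illustrative only.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# List the code points and official character names in each form.
for label, text in (("precomposed", precomposed), ("decomposed", decomposed)):
    names = [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]
    print(label, names)

# NFD splits the precomposed letter into base letter + combining mark.
nfd = unicodedata.normalize("NFD", precomposed)
print(nfd == decomposed)  # True
```

The two strings differ code point by code point, yet normalizing either one to the same form makes them compare equal.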

How Combining Characters Work

Combining characters have a General Category of Mn (Non-spacing Mark), Mc (Spacing Mark), or Me (Enclosing Mark). Non-spacing marks are the most common — they occupy zero advance width and position themselves relative to the base character's glyph using the font's anchor points.
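As a rough illustration, the General Category of any character can be looked up with unicodedata.category. The helper name mark_category and the three sample characters are choices made for this sketch, not part of any standard API.

```python
import unicodedata

def mark_category(ch):
    """Return 'Mn', 'Mc' or 'Me' for a combining mark, else None."""
    cat = unicodedata.category(ch)
    return cat if cat in ("Mn", "Mc", "Me") else None

samples = [
    "\u0301",  # COMBINING ACUTE ACCENT
    "\u0903",  # DEVANAGARI SIGN VISARGA
    "\u20dd",  # COMBINING ENCLOSING CIRCLE
    "e",       # an ordinary base letter
]

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mark_category(ch)}")
```

The first three samples should report Mn, Mc, and Me respectively, while the base letter reports None.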

When a text shaping engine encounters a base character followed by one or more combining marks, it:
1. Retrieves the base glyph from the font
2. Positions each combining glyph at the appropriate anchor (top, bottom, left, right)
3. Renders them as a single grapheme cluster

Multiple combining characters can stack on a single base:

a + ◌̂ + ◌̄ = â̄  (a with circumflex and macron)

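Unicode defines a canonical ordering for combining marks based on their Canonical Combining Class (ccc), a value from 0 to 254. In normalized text, adjacent marks with non-zero classes are sorted in ascending order, so marks with lower values (e.g., nuktas, class 7) come before below-base marks (class 220), which come before above-base accents (class 230). Marks that share a class keep their original relative order, because swapping them could change how they stack.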
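The reordering can be observed with unicodedata.combining and NFD normalization. This is a small sketch; the base letter q is chosen only because it has no precomposed accented forms, so the marks stay visible as separate code points.

```python
import unicodedata

acute = "\u0301"       # COMBINING ACUTE ACCENT (above the base)
circumflex = "\u0302"  # COMBINING CIRCUMFLEX ACCENT (above the base)
dot_below = "\u0323"   # COMBINING DOT BELOW (below the base)

print(unicodedata.combining(acute))      # 230
print(unicodedata.combining(dot_below))  # 220

# Marks with different classes are sorted, so both inputs normalize alike.
a = unicodedata.normalize("NFD", "q" + acute + dot_below)
b = unicodedata.normalize("NFD", "q" + dot_below + acute)
print(a == b)  # True

# Marks with the same class keep their order, so these stay distinct.
c = unicodedata.normalize("NFD", "q" + acute + circumflex)
d = unicodedata.normalize("NFD", "q" + circumflex + acute)
print(c == d)  # False
```

This is why comparing text after normalization is reliable for marks in different positions, while the order of two above-base accents remains significant.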

Important Combining Character Ranges

Range            Name                                       Contents
U+0300–U+036F    Combining Diacritical Marks                Accents, umlauts, tildes
U+0591–U+05C7    Hebrew Cantillation/Vowels                 Nikud, cantillation marks
U+064B–U+065F    Arabic Diacritics                          Harakat (vowel marks)
U+1AB0–U+1AFF    Combining Diacritical Marks Extended       Extended phonetic use
U+1DC0–U+1DFF    Combining Diacritical Marks Supplement     Additional marks
U+20D0–U+20FF    Combining Diacritical Marks for Symbols    Used with math symbols
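A range is not the same as a list of marks: some ranges contain unassigned code points or non-combining characters (the Hebrew range, for instance, includes punctuation such as U+05BE). The sketch below counts how many code points in each range are actually marks. The RANGES dictionary and count_marks helper are hypothetical names, and the counts depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Ranges copied from the table above (inclusive code point bounds).
RANGES = {
    "Combining Diacritical Marks": (0x0300, 0x036F),
    "Hebrew Cantillation/Vowels": (0x0591, 0x05C7),
    "Arabic Diacritics": (0x064B, 0x065F),
    "Combining Diacritical Marks Extended": (0x1AB0, 0x1AFF),
    "Combining Diacritical Marks Supplement": (0x1DC0, 0x1DFF),
    "Combining Diacritical Marks for Symbols": (0x20D0, 0x20FF),
}

def count_marks(first, last):
    """Return (marks, total) for the inclusive range first..last."""
    total = last - first + 1
    marks = sum(
        1
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)).startswith("M")
    )
    return marks, total

for name, (first, last) in RANGES.items():
    marks, total = count_marks(first, last)
    print(f"U+{first:04X}-U+{last:04X} {name}: {marks} of {total} are marks")
```

For reliable detection, testing the General Category of each character is safer than hard-coding ranges.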

Grapheme Clusters

A base character plus all its combining marks form a grapheme cluster — the unit that users perceive as a single character. Programming languages must account for this:

import unicodedata

# "e" + combining acute = é
s = "e\u0301"
print(len(s))           # 2 (two code points)
print(s)                # é (looks like 1 character)

# Precomposed é
s2 = "\u00e9"
print(len(s2))          # 1 (one code point)
print(s == s2)          # False! Different code points

# Normalize to compare (unicodedata is already imported above)
print(unicodedata.normalize("NFC", s) == s2)  # True

JavaScript's Intl.Segmenter and Swift's String.count work in grapheme clusters; many other APIs count smaller units instead. Python's len() counts code points, and JavaScript's String.length counts UTF-16 code units.
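Python's standard library has no grapheme segmenter, but the base-plus-marks case can be approximated in a few lines. The function simple_clusters below is a hypothetical helper written for this page, and it is a deliberate simplification: it only attaches characters with a mark category (Mn, Mc, Me) to the preceding character and ignores the other rules of full grapheme segmentation, such as emoji ZWJ sequences and Hangul syllables.

```python
import unicodedata

def simple_clusters(text):
    """Group each character with the combining marks that follow it.

    Simplified sketch: only General Category M* extends a cluster.
    A mark at the very start of the text begins its own cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # attach the mark to the current cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

word = "cafe\u0301"                   # "café" with a decomposed é
print(len(word))                      # 5 code points
print(len(simple_clusters(word)))     # 4 user-perceived characters
```

For production use, a complete implementation of the segmentation rules (for example the \X pattern in the third-party regex module) is the safer choice.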

Quick Facts

Property                      Value
Unicode category              Mn (Non-spacing Mark), Mc (Spacing Mark), Me (Enclosing Mark)
Main combining block          U+0300–U+036F (112 characters)
Combining class range         0 (not reordered; includes base characters) to 254
Max stacking                  No hard limit; in practice fonts handle 2–4 layers
Normalization                 NFC = precomposed preferred; NFD = fully decomposed
Grapheme cluster API          Python: regex module (third-party); JS: Intl.Segmenter; Swift: String
Visual indicator in charts    Often shown as ◌ (dotted circle) placeholder

Related Terms

Other Terms in Typography

CSS unicode-range

CSS @font-face descriptor specifying which Unicode code points a font should cover. …

Em / En (Typographic Units)

Em: a width equal to the font size. En: half an em. They are used to define the em dash width, the em space, the en space, and CSS units (1em, 0.5em).

Font Fallback

The mechanism by which a rendering engine substitutes glyphs from a secondary …

OpenType

Modern font format developed by Microsoft and Adobe supporting up to 65,535 …

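RTL (Right-to-Left)

A text direction in which characters flow from right to left. It is used by Arabic, Hebrew, Thaana, and other scripts, and correct display requires the bidirectional algorithm.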

Web Fonts

Fonts downloaded by the browser to render text, declared via CSS @font-face. …

Kerning

Adjusting the spacing between specific character pairs (e.g., AV, To, LT) for visual harmony. It is a font feature rather than a Unicode concept, but it affects how Unicode text is rendered.

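Glyph

The visual representation of a character as rendered by a font. One character may have several glyphs (contextual forms), and one glyph may represent several characters (ligatures).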

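Small Caps

Uppercase letterforms drawn at the height of lowercase letters. CSS: font-variant: small-caps. Unicode also encodes actual small-capital letters (ᴀ–ᴢ) in its phonetic and Latin extension blocks.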

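Zero-Width Characters

Characters with zero advance width: they are invisible when rendered but affect text behavior. Examples include ZWSP (word separator), ZWJ (joiner), ZWNJ (non-joiner), and WJ (prevents a line break).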