Combining Character
A character that attaches to the preceding base character to modify it. General Category: Mn (nonspacing), Mc (spacing combining), Me (enclosing). Example: ◌́ (U+0301 COMBINING ACUTE ACCENT).
What is a Combining Character?
A combining character is a Unicode character that is not meant to stand on its own: it attaches to and modifies the preceding character (called the base character). Combining characters encode diacritical marks, tone marks, vowel signs, and other modifier symbols that, in Unicode's model, are logically separate from the letter they modify.
The key insight is that Unicode separates what a reader sees as one character from how that character is encoded. A letter with a diacritic can be represented either as a single precomposed code point (é = U+00E9) or as a base letter followed by a combining mark (e + ◌́ = U+0065 U+0301). Unicode defines the two sequences as canonically equivalent: they represent the same abstract character and should render identically.
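Python's standard unicodedata module exposes the decomposition mapping behind this equivalence. A minimal sketch that inspects the mapping for U+00E9 and round-trips between the two forms:

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

# The Unicode Character Database records how U+00E9 splits apart.
print(unicodedata.decomposition(precomposed))  # 0065 0301

# NFD decomposes and NFC recomposes, so each form converts to the other.
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```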
How Combining Characters Work
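Combining characters have a General Category of Mn (Non-spacing Mark), Mc (Spacing Mark), or Me (Enclosing Mark). Non-spacing marks are the most common: they have zero advance width, and the shaping engine positions them relative to the base glyph using anchor points defined in the font. Spacing marks, such as many Indic vowel signs, take up horizontal space of their own; enclosing marks, such as U+20DD COMBINING ENCLOSING CIRCLE, surround the base.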
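The three categories can be checked with unicodedata.category. This sketch uses one sample per category (the acute accent, Devanagari visarga, and the enclosing circle, chosen here for illustration); the helper name is_combining_mark is hypothetical:

```python
import unicodedata

def is_combining_mark(ch: str) -> bool:
    """True if ch has General Category Mn, Mc, or Me."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")

# One sample per category: acute accent, Devanagari visarga, enclosing circle.
for ch in ("\u0301", "\u0903", "\u20dd"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
# U+0301 COMBINING ACUTE ACCENT: Mn
# U+0903 DEVANAGARI SIGN VISARGA: Mc
# U+20DD COMBINING ENCLOSING CIRCLE: Me
```

Testing the category is more reliable than unicodedata.combining(), which returns the canonical combining class; that value is 0 for U+0903 even though it is a mark.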
When a text shaping engine encounters a base character followed by one or more combining marks, it:
1. Retrieves the base glyph from the font.
2. Positions each combining glyph at the appropriate anchor (top, bottom, left, right).
3. Renders them together as a single grapheme cluster.
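Step 2 can be modeled with anchors: each mark glyph is shifted so that its own attaching anchor lands on the matching anchor of the glyph beneath it, either the base (mark-to-base) or the previous mark (mark-to-mark). The toy sketch below assumes hypothetical glyph names, coordinates, and an "_top" naming convention for a mark's attaching anchor; in a real OpenType font this data lives in GPOS lookups and is applied by a shaper such as HarfBuzz.

```python
from dataclasses import dataclass, field

@dataclass
class Glyph:
    name: str
    advance: int  # horizontal advance in font units (0 for non-spacing marks)
    anchors: dict[str, tuple[int, int]] = field(default_factory=dict)

def attach_marks(base: Glyph, marks: list[Glyph], anchor: str = "top") -> list[tuple[str, int, int]]:
    """Return (glyph name, x offset, y offset) for a base and its stacked marks."""
    placements = [(base.name, 0, 0)]
    # Absolute position of the current attachment point, starting on the base.
    ax, ay = base.anchors[anchor]
    for mark in marks:
        mx, my = mark.anchors["_" + anchor]  # the mark's own attaching anchor
        x, y = ax - mx, ay - my              # shift so the two anchors coincide
        placements.append((mark.name, x, y))
        # The next mark attaches to this mark's top anchor (mark-to-mark).
        tx, ty = mark.anchors[anchor]
        ax, ay = x + tx, y + ty
    return placements

# Hypothetical metrics, not taken from any real font.
a = Glyph("a", 500, {"top": (250, 500)})
circumflex = Glyph("circumflexcomb", 0, {"_top": (-120, 480), "top": (-120, 620)})
macron = Glyph("macroncomb", 0, {"_top": (-120, 480), "top": (-120, 560)})

print(attach_marks(a, [circumflex, macron]))
# [('a', 0, 0), ('circumflexcomb', 370, 20), ('macroncomb', 370, 160)]
```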
Multiple combining characters can stack on a single base:
a + ◌̂ (U+0302) + ◌̄ (U+0304) = â̄ (a with circumflex and macron)
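The stack above can be inspected code point by code point. A short sketch, which also shows that NFC composes only as far as a precomposed character exists:

```python
import unicodedata

s = "a\u0302\u0304"  # a + COMBINING CIRCUMFLEX ACCENT + COMBINING MACRON

for ch in s:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFC composes a + circumflex into U+00E2, but no precomposed character
# carries both marks, so the macron remains a separate combining character.
nfc = unicodedata.normalize("NFC", s)
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+00E2', 'U+0304']
```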
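Unicode assigns every character a Canonical_Combining_Class (ccc) value. Class 0 means the character is never reordered; it covers base letters as well as many spacing and enclosing marks. In normalized text, each run of adjacent marks with nonzero class is sorted by class, so marks with lower values (e.g., below-base nuktas, class 7) appear before marks with higher values (e.g., above-base accents, class 230). Marks with equal class keep their original order, since their relative order changes how they render.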
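The reordering rule can be sketched as a stable sort over each run of nonzero-class characters, with class-0 characters acting as fixed barriers. The helper canonical_order is hypothetical and performs no decomposition, so it only matches NFD for input that contains no precomposed characters:

```python
import unicodedata

def canonical_order(s: str) -> str:
    """Stable-sort each run of characters whose combining class is nonzero."""
    out: list[str] = []
    run: list[str] = []  # pending marks with ccc > 0
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter ends the run; sorted() is stable, so equal classes keep order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

typed = "a\u0301\u0323"  # a + acute (ccc 230) + dot below (ccc 220)
print(unicodedata.combining("\u0301"), unicodedata.combining("\u0323"))  # 230 220
print(canonical_order(typed) == "a\u0323\u0301")                         # True
print(canonical_order(typed) == unicodedata.normalize("NFD", typed))     # True
```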
Important Combining Character Ranges
| Range | Name | Contents |
|---|---|---|
| U+0300–U+036F | Combining Diacritical Marks | Accents, umlauts, tildes |
| U+0591–U+05C7 | Hebrew Cantillation/Vowels | Nikud, cantillation marks |
| U+064B–U+065F | Arabic Diacritics | Harakat (vowel marks) |
| U+1AB0–U+1AFF | Combining Diacritical Marks Extended | Extended phonetic use |
| U+1DC0–U+1DFF | Combining Diacritical Marks Supplement | Additional marks |
| U+20D0–U+20FF | Combining Diacritical Marks for Symbols | Used with math symbols |
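Whether a code point in these ranges is actually a mark depends on its General Category, which can be checked against Python's bundled Unicode data. A sketch with a hypothetical helper marks_in_range:

```python
import unicodedata

def marks_in_range(start: int, end: int) -> list[str]:
    """Return characters in start..end (inclusive) with category Mn, Mc, or Me."""
    return [
        chr(cp)
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) in ("Mn", "Mc", "Me")
    ]

print(unicodedata.unidata_version)          # Unicode version of Python's data
print(len(marks_in_range(0x0300, 0x036F)))  # 112: every code point is a mark
# The Hebrew row also contains punctuation, e.g. U+05BE HEBREW PUNCTUATION MAQAF.
print("\u05be" in marks_in_range(0x0591, 0x05C7))  # False
```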
Grapheme Clusters
A base character plus all of its combining marks forms a grapheme cluster, the unit that users perceive as a single character. Code that counts, slices, reverses, or compares strings must account for this:
```python
import unicodedata

# "e" + combining acute = é
s = "e\u0301"
print(len(s))  # 2 (two code points)
print(s)       # é (looks like 1 character)

# Precomposed é
s2 = "\u00e9"
print(len(s2))  # 1 (one code point)
print(s == s2)  # False! Different code points

# Normalize to compare
print(unicodedata.normalize("NFC", s) == s2)  # True
```
JavaScript's Intl.Segmenter and Swift's String.count handle grapheme clusters correctly; many other APIs count code points (Python's len()) or UTF-16 code units (JavaScript's String.length, Java's String.length()) instead.
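Python's standard library has no grapheme segmentation. A rough approximation attaches every combining mark to the character before it; the helper naive_clusters is hypothetical and ignores the rest of UAX #29 (Hangul syllables, emoji ZWJ sequences, regional indicators):

```python
import unicodedata

def naive_clusters(s: str) -> list[str]:
    """Group each character with the combining marks (Mn/Mc/Me) that follow it."""
    clusters: list[str] = []
    for ch in s:
        if clusters and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            clusters[-1] += ch   # attach the mark to the preceding cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

word = "cafe\u0301"
print(len(word))                  # 5 code points
print(len(naive_clusters(word)))  # 4 user-perceived characters
```

For real grapheme segmentation in Python, the third-party regex module's \X pattern follows UAX #29.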
Quick Facts
| Property | Value |
|---|---|
| Unicode category | Mn (Non-spacing Mark), Mc (Spacing Mark), Me (Enclosing Mark) |
| Main combining block | U+0300–U+036F (112 characters) |
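| Combining class range | 0 (not reordered: base characters and some marks) to 240, the highest value currently assigned |
| Max stacking | No limit in the standard (UAX #15's Stream-Safe Text Format caps runs at 30 non-starters); how many stacked marks render well depends on the font |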
| Normalization | NFC = precomposed preferred; NFD = fully decomposed |
| Grapheme cluster API | Python: third-party regex module (\X); JS: Intl.Segmenter; Swift: String (counts Characters) |
| Visual indicator in charts | Often shown as ◌ (dotted circle) placeholder |