Unicode for the Modern Web · Chapter 6
Fonts and Rendering: Making It Look Right
This chapter covers font fallback chains, system-ui, variable fonts, and missing glyph detection: the pieces you need to render Unicode characters correctly in web browsers.
A character exists in your database, travels correctly through your API, arrives in the browser's JavaScript engine with the right code points — and then the browser draws a small square. That square is "tofu": the visual placeholder rendered when the font has no glyph for a character. Understanding how browsers select fonts, how fonts encode Unicode coverage, and how to build robust fallback chains is the last mile of Unicode correctness: making it look right.
Unicode Coverage in Fonts
A font file contains glyphs — visual representations of characters — plus a cmap table that maps Unicode code points to glyph indices. No font covers all of Unicode's 149,813 assigned characters (as of Unicode 15.1). Even the comprehensive Noto project — Google's effort to eliminate tofu — distributes coverage across dozens of separate font files.
Typical coverage for common fonts:
| Font | Coverage |
|---|---|
| Arial | ~2,700 characters |
| Times New Roman | ~3,300 characters |
| DejaVu Sans | ~6,250 characters |
| Noto Sans (single file) | ~2,000–3,500 characters |
| All Noto Sans files combined | ~110,000+ characters |
You can inspect a font's Unicode coverage with tools:
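from fontTools.ttLib import TTFont
# Helvetica.ttc is a TrueType Collection, so select one face by index
font = TTFont('/System/Library/Fonts/Helvetica.ttc', fontNumber=0)
cmap = font.getBestCmap()  # dict: code point -> glyph name
print(f"Coverage: {len(cmap)} code points")
# Check a specific character
print(0x1F600 in cmap) # Is 😀 covered?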
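The same lookup extends naturally to missing glyph detection. The following is a minimal sketch, not a complete solution: `find_missing` and `ascii_cmap` are hypothetical names, and the sample mapping stands in for a real cmap so the example runs without a font file. With fontTools you would pass the dict returned by `getBestCmap()` instead.

```python
def find_missing(text, cmap):
    """Return the distinct characters of `text` that have no entry in `cmap`.

    `cmap` is any mapping keyed by integer code point, such as the dict
    returned by fontTools' getBestCmap().
    """
    missing = []
    for ch in text:
        if ord(ch) not in cmap and ch not in missing:
            missing.append(ch)
    return missing


# Stand-in for a real cmap: printable ASCII only (glyph names are made up).
ascii_cmap = {cp: f"glyph{cp:04X}" for cp in range(0x20, 0x7F)}

for ch in find_missing("Hi 😀 नमस्ते", ascii_cmap):
    print(f"U+{ord(ch):04X} is not covered")
```

A cmap hit is necessary but not always sufficient: sequences such as ZWJ emoji or conjuncts also depend on the font's substitution rules, which this check does not look at.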
The Font Fallback Chain
When a browser renders text, it walks through the font stack for each character, using the first font in the list that has a glyph for that character:
body {
font-family:
'Inter', /* 1. Primary web font — Latin, some symbols */
system-ui, /* 2. Platform UI font — decent multilingual coverage */
'Apple Color Emoji', /* 3. macOS/iOS emoji (color, SBIX) */
'Segoe UI Emoji', /* 4. Windows emoji (color, COLR/CPAL) */
'Noto Color Emoji', /* 5. Cross-platform fallback emoji */
'Noto Sans', /* 6. Broad Latin, Cyrillic, and Greek fallback */
sans-serif; /* 7. Browser last resort */
}
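The browser checks each font for each character (not each word or element). A sentence mixing Latin and Arabic might use Inter for the Latin portions and a different font for the Arabic portions, such as the platform's Arabic font or Noto Sans Arabic if you add it to the stack. If nothing in the list covers a character, the browser searches other installed system fonts before giving up.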
Tofu (□) appears when neither the fonts in the chain nor any installed fallback font has a glyph for the character. Common causes:
- CJK characters without a CJK font in the stack
- Newly assigned emoji not yet in any installed font
- Private Use Area (PUA) characters without the proprietary font loaded
- Rare scripts (Tirhuta, Nüshu, Hanifi Rohingya) with no system font
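To make the per-character walk concrete, here is a rough simulation. It is a sketch under simplifying assumptions: the coverage sets are invented for illustration (real ones would come from each font's cmap), `pick_font` and `font_runs` are hypothetical helpers, and real browsers work on grapheme clusters and consult system fonts as well.

```python
from itertools import groupby


def pick_font(ch, stack):
    """Return the name of the first font in `stack` that covers `ch`, else None."""
    for name, coverage in stack:
        if ord(ch) in coverage:
            return name
    return None  # nothing matched: this is where tofu would appear


def font_runs(text, stack):
    """Split `text` into (font_name, substring) runs, choosing per character."""
    return [
        (name, "".join(group))
        for name, group in groupby(text, key=lambda ch: pick_font(ch, stack))
    ]


# Invented coverage sets, in font-family order.
stack = [
    ("Inter", set(range(0x20, 0x250))),             # Basic Latin to Latin Extended-B
    ("Noto Sans Arabic", set(range(0x600, 0x700))),
]

# "Hello", an Arabic word, then U+2603 SNOWMAN (covered by neither font).
sample = "Hello \u0645\u0631\u062d\u0628\u0627 \u2603"
for name, run in font_runs(sample, stack):
    print(name or "TOFU", [f"U+{ord(c):04X}" for c in run])
```

Note how the space after the Arabic word goes back to the first font that covers it, which is why mixed-script text often alternates fonts mid-sentence.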
@font-face and unicode-range Subsetting
Loading the entire Noto Sans collection upfront is impractical — it totals hundreds of megabytes. The solution is unicode-range in @font-face, which triggers conditional font loading:
/* Only download this font if the page contains Devanagari */
@font-face {
font-family: 'NotoSans';
font-style: normal;
font-weight: 400;
src: url('/fonts/noto-sans-devanagari.woff2') format('woff2');
unicode-range: U+0900-097F, /* Devanagari */
U+1CD0-1CFF, /* Vedic Extensions */
U+A8E0-A8FF; /* Devanagari Extended */
font-display: swap;
}
/* Latin subset: downloaded whenever the page uses any of these code points, which is nearly always */
@font-face {
font-family: 'NotoSans';
font-style: normal;
font-weight: 400;
src: url('/fonts/noto-sans-latin.woff2') format('woff2');
unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC,
U+02C6, U+02DA, U+02DC, U+2000-206F, U+2074,
U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215,
U+FEFF, U+FFFD;
font-display: swap;
}
This is how Google Fonts serves its families. When you embed a Google Fonts URL, the CSS response typically contains many @font-face blocks (often 20–40, and more for large CJK families), each covering a Unicode subset declared through unicode-range. Only the subsets needed by the page's text are downloaded.
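The decision the browser makes for each @font-face block can be approximated in a few lines. This is a sketch, assuming only the two simplest unicode-range forms (single code points and intervals); wildcard ranges such as U+4?? are deliberately left out, and `parse_unicode_range` and `subset_needed` are hypothetical helper names.

```python
def parse_unicode_range(value):
    """Parse a CSS unicode-range value into a list of (start, end) tuples.

    Handles single code points (U+20AC) and intervals (U+0900-097F).
    """
    ranges = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        body = part[2:]  # drop the leading "U+"
        start, _, end = body.partition("-")
        ranges.append((int(start, 16), int(end or start, 16)))
    return ranges


def subset_needed(text, ranges):
    """True if any character of `text` falls inside one of `ranges`."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges)


devanagari = parse_unicode_range("U+0900-097F, U+1CD0-1CFF, U+A8E0-A8FF")

print(subset_needed("Hello", devanagari))    # Latin-only page
print(subset_needed("नमस्ते", devanagari))     # page containing Devanagari
```

The same check is handy in a build step, for example to warn when a page contains text that none of your declared subsets cover.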
Variable Fonts
Variable fonts (OpenType 1.8, 2016) pack multiple weights, widths, and styles into a single font file using interpolatable design axes:
@font-face {
font-family: 'InterVariable';
src: url('/fonts/Inter-Variable.woff2') format('woff2');
font-weight: 100 900; /* entire weight range from one file */
}
h1 {
font-family: 'InterVariable';
font-weight: 750; /* any value from 100 to 900; this already drives the 'wght' axis */
font-variation-settings: 'slnt' -5; /* set other axes here; repeating 'wght' would be redundant */
}
For Unicode coverage, variable fonts generally cover the same code points as their static counterparts — the variability is about design axes, not character coverage. But one variable font file replacing 6–12 static weight/style files significantly reduces HTTP requests.
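Behind a value like font-weight: 750 sits a small piece of arithmetic: the renderer maps the requested axis value onto a normalized scale from -1 to 1 before applying the font's variation deltas. The sketch below shows the default normalization described in the OpenType specification; it ignores the optional avar table, and the axis limits used are assumed values for illustration rather than those of any particular font.

```python
def normalize_axis(value, minimum, default, maximum):
    """Map a user-space axis value to the normalized range [-1.0, 1.0]."""
    value = max(minimum, min(maximum, value))  # clamp to the axis range
    if value < default:
        return (value - default) / (default - minimum)
    if value > default:
        return (value - default) / (maximum - default)
    return 0.0


# Assumed 'wght' axis: minimum 100, default 400, maximum 900.
for weight in (100, 400, 750, 900, 1000):
    print(weight, normalize_axis(weight, 100, 400, 900))
```

Values outside the declared range are clamped, which mirrors why the font-weight descriptor in @font-face should state the real range of the file.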
Color Fonts for Emoji
Emoji rendering uses four competing color font formats, all embedded in standard .ttf/.otf/.woff2 containers:
| Format | Table | Used by | Notes |
|---|---|---|---|
| COLR/CPAL | COLR + CPAL | Microsoft (Windows), Google (Android/Chrome) | Vector, compact, v1 supports gradients |
| SVG | SVG | Mozilla Firefox (legacy), Adobe | SVG glyphs, large files |
| SBIX | sbix | Apple (macOS, iOS) | PNG bitmaps at multiple sizes |
| CBDT/CBLC | CBDT + CBLC | Google (older Android) | PNG bitmaps, like SBIX |
Modern emoji fonts often include multiple tables for cross-platform compatibility, and the browser/OS picks the format it supports. COLR v1 (OpenType 1.9) is the newest vector format; it shipped in Chrome 98 and Firefox 107, while Safari support has lagged, so check current compatibility data before relying on it.
/* Listing color emoji fonts first makes the browser prefer their
color glyphs for every character those fonts cover */
.emoji {
font-family: 'Apple Color Emoji', 'Segoe UI Emoji', 'Noto Color Emoji';
}
/* Force a specific emoji size: vector formats scale cleanly, bitmap formats (sbix, CBDT) are resampled */
.large-emoji {
font-size: 3rem;
line-height: 1;
}
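Because each color format is identified by the tables present in the file, you can tell which formats a font carries from its table tags alone. A minimal sketch follows; `color_formats` is a hypothetical helper and the tag list is made up. With fontTools, `TTFont(path).keys()` should give you the real list of tags to pass in.

```python
# Table tags that mark each color format. Note the trailing space in "SVG ":
# OpenType table tags are always four characters long.
COLOR_TABLES = {
    "COLR/CPAL": {"COLR", "CPAL"},
    "SVG": {"SVG "},
    "sbix": {"sbix"},
    "CBDT/CBLC": {"CBDT", "CBLC"},
}


def color_formats(table_tags):
    """Return the color formats whose required tables are all present."""
    tags = set(table_tags)
    return [name for name, required in COLOR_TABLES.items() if required <= tags]


# Made-up tag list for a font that ships both a COLR and an SVG flavor.
print(color_formats(["cmap", "glyf", "COLR", "CPAL", "SVG "]))
```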
OpenType Features
OpenType features are identified by four-character tags. Each one switches on typographic behavior (glyph substitutions or positioning adjustments) stored in the font's GSUB and GPOS tables:
body {
/* Enable contextual alternates and standard ligatures */
font-feature-settings: 'calt' 1, 'liga' 1;
/* Or use the higher-level property */
font-variant-ligatures: common-ligatures contextual;
}
/* Old-style figures in body text */
.body-text {
font-variant-numeric: oldstyle-nums proportional-nums;
}
/* Tabular numbers for data tables */
.data-table {
font-variant-numeric: tabular-nums;
}
/* Small caps */
.small-caps {
font-variant-caps: small-caps;
}
Key features for Unicode correctness:
- kern: kerning adjustments between specific pairs
- mark/mkmk: positioning combining marks (accents) correctly over base letters and over other marks
- curs: cursive attachment, which positions each connected glyph relative to its neighbor (for example in Nastaliq-style Arabic)
- init/medi/fina/isol: Arabic letter positional forms
- rtla/rtlm: right-to-left glyph alternates and mirrored forms
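To see where mark positioning matters in your own content, you can scan text for base characters that carry combining marks. This is an illustrative sketch using only the standard library; `marked_clusters` is a hypothetical helper, and it reports what the text contains, not what a given font supports.

```python
import unicodedata

MARK_CATEGORIES = {"Mn", "Mc", "Me"}  # nonspacing, spacing, enclosing marks


def marked_clusters(text):
    """Return (base, [marks]) for every base character followed by marks."""
    clusters = []
    base, marks = None, []
    for ch in text:
        if unicodedata.category(ch) in MARK_CATEGORIES:
            if base is not None:  # ignore a mark with nothing to attach to
                marks.append(ch)
            continue
        if marks:
            clusters.append((base, marks))
        base, marks = ch, []
    if marks:
        clusters.append((base, marks))
    return clusters


# "e" + COMBINING ACUTE, then "a" + COMBINING DIAERESIS + COMBINING MACRON
for base, marks in marked_clusters("e\u0301 a\u0308\u0304"):
    print(base, [f"U+{ord(m):04X}" for m in marks])
```

Clusters with a single mark typically rely on mark, while stacked marks also rely on mkmk. A precomposed character such as U+00E9 produces no cluster here, since the font can draw it as one glyph.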
font-display Strategy
Web fonts create a Flash of Invisible Text (FOIT) or Flash of Unstyled Text (FOUT). The font-display descriptor controls the trade-off:
@font-face {
font-family: 'Inter';
src: url('/fonts/Inter.woff2') format('woff2');
font-display: swap; /* show fallback immediately, swap when loaded */
}
Values:
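- block: invisible text for up to about 3s, then the fallback appears and is swapped out whenever the font arrives (old default, worst UX)
- swap: show fallback immediately, swap when loaded (CLS risk)
- fallback: invisible for about 100ms, then fallback; the web font can still swap in for about 3s, after which the fallback stays
- optional: about 100ms block, then use the font only if it is already available, otherwise stay on the fallback (best for performance)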
For body text on content sites: swap. For headlines where the font is distinctive: optional, which in practice shows the web font on repeat visits once it is cached.
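The four values differ only in the length of two timers, a block period followed by a swap period. The sketch below models that with the approximate durations quoted above; actual timings are up to each browser, and `PERIODS` and `outcome` are hypothetical names.

```python
import math

# (block period, swap period) in milliseconds. Approximate values only;
# browsers are free to choose their own timings.
PERIODS = {
    "block": (3000, math.inf),
    "swap": (0, math.inf),
    "fallback": (100, 3000),
    "optional": (100, 0),
}


def outcome(display, load_ms):
    """Describe what a reader sees if the font finishes loading at `load_ms`."""
    block, swap = PERIODS[display]
    if load_ms <= block:
        return "web font, no fallback shown"
    if load_ms <= block + swap:
        return "fallback first, then swapped to web font"
    return "fallback kept"


# Compare the strategies for a font that takes 1.5 seconds to arrive.
for display in PERIODS:
    print(f"{display:9} -> {outcome(display, 1500)}")
```

Running the loop with a few different load times is a quick way to see why swap risks layout shift on slow connections while optional never does.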
Noto Fonts: The Universal Fallback
The Noto project (the name means "no tofu") is Google's effort to create fonts covering all of Unicode. Each Noto font file covers one or several scripts:
- Noto Sans: Latin, Cyrillic, Greek, and more
- Noto Sans CJK SC/TC/JP/KR: Simplified/Traditional Chinese, Japanese, Korean
- Noto Sans Arabic: Arabic
- Noto Sans Hebrew: Hebrew
- Noto Sans Devanagari: Devanagari (Hindi, Marathi, Sanskrit)
- Noto Color Emoji: Color emoji (CBDT and COLR v1)
- Noto Serif *: Serif variants for many scripts
For a web application that handles user-generated content in unknown languages, including Noto as a fallback via Google Fonts or self-hosting greatly reduces the chance of tofu, provided you request a family for each script you expect to see:
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans:wght@400;700&family=Noto+Sans+SC&family=Noto+Color+Emoji&display=swap" rel="stylesheet">
The conditional unicode-range loading in the Google Fonts response ensures users only download the Noto subset files that contain characters actually present on the page.
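If you would rather request only the families a given piece of content needs, the stylesheet URL can be assembled from the text itself. The following is a rough sketch under loud assumptions: the range-to-family table is a hypothetical, heavily abbreviated mapping (sending every CJK ideograph to Noto Sans SC is a simplification), and `families_for` and `css2_url` are illustrative helpers, not part of any Google Fonts tooling.

```python
# Hypothetical, abbreviated mapping from a code point range to a Noto family.
SCRIPT_FAMILIES = [
    ((0x0590, 0x05FF), "Noto Sans Hebrew"),
    ((0x0600, 0x06FF), "Noto Sans Arabic"),
    ((0x0900, 0x097F), "Noto Sans Devanagari"),
    ((0x4E00, 0x9FFF), "Noto Sans SC"),  # simplification: all CJK ideographs
]


def families_for(text):
    """Return Noto Sans plus one family per script range found in `text`."""
    families = ["Noto Sans"]
    for (lo, hi), family in SCRIPT_FAMILIES:
        if any(lo <= ord(ch) <= hi for ch in text):
            families.append(family)
    return families


def css2_url(families, display="swap"):
    """Build a Google Fonts css2 URL in the same shape as the link above."""
    params = [f"family={name.replace(' ', '+')}" for name in families]
    params.append(f"display={display}")
    return "https://fonts.googleapis.com/css2?" + "&".join(params)


print(css2_url(families_for("Hello नमस्ते")))
```

For user-generated content this would run server-side per page or per post; a static list of families is simpler whenever the set of expected scripts is known in advance.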