Unicode Fonts: How Characters Get Rendered
No single font contains glyphs for all 154,000+ assigned Unicode characters. Even the most comprehensive font families cover only a fraction of the Unicode repertoire. When a browser or operating system encounters a character that the current font does not support, it falls back to another font — or displays the dreaded "tofu" rectangle (□). This guide explains how font coverage works, how fallback chains operate, which font families offer the broadest Unicode support, and practical strategies for ensuring your text renders correctly across scripts and platforms.
What Is Font Coverage?
A font's coverage is the set of Unicode code points for which it contains glyphs. No font tries to cover everything — they are designed for specific scripts and use cases:
| Font | Approximate Coverage | Primary Scripts |
|---|---|---|
| Arial | ~3,000 glyphs | Latin, Greek, Cyrillic |
| Times New Roman | ~3,200 glyphs | Latin, Greek, Cyrillic |
| Noto Sans | ~65,000 glyphs (full family) | 150+ scripts |
| Noto Serif | ~40,000 glyphs (full family) | Major scripts |
| Code2000 | ~60,000 glyphs | Broad coverage (unmaintained) |
| Unifont | ~77,000 glyphs | BMP bitmap coverage |
| Last Resort | 1,114,112 (placeholder) | All code points (boxes only) |
The gap between a typical system font (~3,000 glyphs) and the full Unicode repertoire (~154,000 assigned characters) means that font fallback is not an edge case — it is the normal state of affairs for any multilingual content.
How Font Fallback Works
When a rendering engine (browser, OS text layout) needs to display a character, it follows a fallback chain:
1. Try the specified font (e.g., "Helvetica Neue")
   → Glyph found? Display it.
   → No glyph? Continue.
2. Try each remaining font in the CSS font-family list
   font-family: "Helvetica Neue", Arial, sans-serif
   → Try Arial...
   → Try system sans-serif...
3. Try the system fallback font list
   (OS-specific, script-aware)
4. Display the .notdef glyph (tofu □; some fonts draw it as a box containing a question mark)
Browser Font Matching
Modern browsers implement the CSS Fonts specification (CSS Fonts Level 4), which defines a sophisticated font matching algorithm:
- Exact match: Check the first font in the font-family list
- Character-by-character fallback: For each character not covered, try subsequent fonts in the list
- System font fallback: Consult the OS font database for a font that covers the missing character
- Last resort: Display .notdef, usually a rectangle or question mark
This means a single paragraph can silently use three or four different fonts, with the browser switching between them character by character. The visual result may show subtle differences in weight, size, and baseline alignment.
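The character-by-character behavior can be illustrated with a small simulation. This is a minimal sketch, not how any browser is implemented: the font names and coverage ranges in `FONT_STACK` are hypothetical stand-ins for real cmap data, and real engines also consider script, language, and clustering when they split text into runs.

```python
from itertools import groupby

# Hypothetical coverage data: font name -> set of supported code points.
# Real coverage would come from each font's cmap table.
FONT_STACK = [
    ("Helvetica Neue", set(range(0x20, 0x250))),      # Basic Latin through Latin Extended-B
    ("Noto Sans Arabic", set(range(0x600, 0x700))),   # Arabic block
    ("Noto Sans CJK", set(range(0x4E00, 0xA000))),    # CJK Unified Ideographs
]

NOTDEF = ".notdef"


def pick_font(char, stack=FONT_STACK):
    """Return the first font in the stack that covers char, else .notdef."""
    code_point = ord(char)
    for name, coverage in stack:
        if code_point in coverage:
            return name
    return NOTDEF


def font_runs(text, stack=FONT_STACK):
    """Split text into consecutive runs that resolve to the same font."""
    return [
        (font, "".join(chars))
        for font, chars in groupby(text, key=lambda ch: pick_font(ch, stack))
    ]
```

With this toy stack, `font_runs("Hi 世界")` yields one run for "Hi " in the first font and one run for "世界" in the CJK font, while a character covered by none of the three fonts lands in a `.notdef` run.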
The Noto Font Project
Google's Noto font family is the most ambitious attempt to eliminate tofu (the name "Noto" stands for "No Tofu"). It provides fonts for over 150 scripts:
Noto Font Architecture
| Component | Coverage | Size |
|---|---|---|
| Noto Sans | Latin, Cyrillic, Greek | ~400 KB |
| Noto Sans CJK | Chinese, Japanese, Korean | ~16 MB per variant |
| Noto Sans Arabic | Arabic, Urdu, Persian | ~200 KB |
| Noto Sans Devanagari | Hindi, Sanskrit, Marathi | ~150 KB |
| Noto Sans Hebrew | Hebrew, Yiddish | ~100 KB |
| Noto Sans Thai | Thai | ~100 KB |
| Noto Color Emoji | Emoji (color) | ~10 MB |
| Noto Sans Symbols | Symbols, technical | ~500 KB |
| Full family | 150+ scripts | ~1.1 GB |
Using Noto in Web Projects
You do not need to load the entire 1.1 GB family. Load only the scripts your content requires:
/* Load only what you need from Google Fonts */
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans:wght@400;700&display=swap');
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+Arabic:wght@400;700&display=swap');
body {
  font-family: 'Noto Sans', 'Noto Sans Arabic', sans-serif;
}
Google Fonts serves Noto fonts with unicode-range subsetting, so the browser only downloads the character subsets it actually needs.
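If the list of required families is decided at build time or on the server, the stylesheet URLs can be generated rather than hand-written. The helper below is a hypothetical sketch that simply reproduces the URL pattern used in the `@import` lines above; `google_fonts_css_url` is not part of any Google library.

```python
def google_fonts_css_url(family, weights=(400, 700), display="swap"):
    """Build a css2 stylesheet URL in the same shape as the @import examples."""
    family_param = family.replace(" ", "+")                  # "Noto Sans" -> "Noto+Sans"
    weight_param = ";".join(str(w) for w in sorted(weights))  # ascending weights, e.g. "400;700"
    return (
        f"https://fonts.googleapis.com/css2?family={family_param}"
        f":wght@{weight_param}&display={display}"
    )


urls = [google_fonts_css_url(name) for name in ("Noto Sans", "Noto Sans Arabic")]
```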
The unicode-range Descriptor
The CSS @font-face rule supports a unicode-range descriptor that tells the browser
which characters a font file covers:
/* Latin subset */
@font-face {
  font-family: 'MyFont';
  src: url('myfont-latin.woff2') format('woff2');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153;
}

/* Cyrillic subset */
@font-face {
  font-family: 'MyFont';
  src: url('myfont-cyrillic.woff2') format('woff2');
  unicode-range: U+0400-04FF;
}

/* CJK subset */
@font-face {
  font-family: 'MyFont';
  src: url('myfont-cjk.woff2') format('woff2');
  unicode-range: U+4E00-9FFF;
}
The browser will only download myfont-cyrillic.woff2 if the page actually contains
Cyrillic characters. This is how Google Fonts achieves efficient loading for multilingual
sites.
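To see which subset files a given page would trigger, the same matching can be approximated offline. The sketch below parses `unicode-range` values and checks them against a string; the `FACES` table mirrors the three `@font-face` rules above, and the function names are illustrative assumptions rather than any browser API.

```python
def parse_unicode_range(descriptor):
    """Parse a value like 'U+0000-00FF, U+0131' into (start, end) tuples."""
    ranges = []
    for part in descriptor.split(","):
        part = part.strip()
        if not part:
            continue
        body = part[2:]  # drop the leading 'U+'
        if "-" in body:
            start, end = body.split("-")
            ranges.append((int(start, 16), int(end, 16)))
        elif "?" in body:
            # Wildcard form: U+4?? covers U+400 through U+4FF
            ranges.append((int(body.replace("?", "0"), 16),
                           int(body.replace("?", "F"), 16)))
        else:
            ranges.append((int(body, 16), int(body, 16)))
    return ranges


# Mirrors the @font-face rules above: file name -> unicode-range value
FACES = {
    "myfont-latin.woff2": "U+0000-00FF, U+0131, U+0152-0153",
    "myfont-cyrillic.woff2": "U+0400-04FF",
    "myfont-cjk.woff2": "U+4E00-9FFF",
}


def files_needed(text, faces=FACES):
    """Return the font files whose declared range contains a character of text."""
    needed = []
    for filename, descriptor in faces.items():
        ranges = parse_unicode_range(descriptor)
        if any(start <= ord(ch) <= end for ch in text for start, end in ranges):
            needed.append(filename)
    return needed
```

For example, `files_needed("Привет")` returns only the Cyrillic file, and a Latin-only string returns only the Latin file.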
Platform-Specific Font Stacks
Each operating system ships different default fonts with different Unicode coverage:
Windows
font-family:
  'Segoe UI',         /* Latin, Cyrillic, Greek */
  'Segoe UI Emoji',   /* Emoji */
  'Microsoft YaHei',  /* Chinese (Simplified) */
  'Meiryo',           /* Japanese */
  'Malgun Gothic',    /* Korean */
  sans-serif;
macOS / iOS
font-family:
  -apple-system,          /* San Francisco (Latin, Cyrillic, Greek) */
  'Apple Color Emoji',    /* Emoji */
  'PingFang SC',          /* Chinese (Simplified) */
  'Hiragino Sans',        /* Japanese */
  'Apple SD Gothic Neo',  /* Korean */
  sans-serif;
Linux
font-family:
  'Noto Sans',         /* Most scripts (if installed) */
  'Noto Color Emoji',  /* Emoji */
  'DejaVu Sans',       /* Latin, Greek, Cyrillic (common fallback) */
  sans-serif;
Cross-Platform Stack
A robust cross-platform font stack for multilingual content:
font-family:
  system-ui,
  -apple-system,
  'Segoe UI',
  Roboto,
  'Noto Sans',
  'Liberation Sans',
  sans-serif,
  'Apple Color Emoji',
  'Segoe UI Emoji',
  'Noto Color Emoji';
Diagnosing Missing Glyphs
The "Tofu" Problem
When a character has no glyph in any available font, renderers display a fallback symbol:
| Renderer | Display | Meaning |
|---|---|---|
| Most browsers | □ (empty rectangle) | Missing glyph ("tofu") |
| Any renderer | � (U+FFFD replacement character) | Text decoding error, not a missing glyph |
| Windows | Empty box or dotted box | Font missing glyph |
| macOS | Last Resort font (labeled box) | Font missing glyph |
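The replacement character is worth separating from tofu because it is already present in the text before any font is consulted. A minimal sketch of how it typically arises, assuming bytes that were written as Latin-1 but decoded as UTF-8 (the helper name is illustrative):

```python
REPLACEMENT_CHARACTER = "\ufffd"


def has_decoding_damage(text):
    """True if text already contains U+FFFD, which no font change can repair."""
    return REPLACEMENT_CHARACTER in text


raw = "café".encode("latin-1")                    # b'caf\xe9' is not valid UTF-8
decoded = raw.decode("utf-8", errors="replace")   # the invalid byte becomes U+FFFD
```

If `has_decoding_damage` returns true, the fix belongs in the encoding pipeline rather than in the font stack.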
Detecting Missing Glyphs Programmatically
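In JavaScript, you can estimate whether a loaded font supports a character by comparing text widths:
function fontSupportsChar(fontFamily, char) {
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  const testSize = '72px';

  // Measure with the target font first and the generic serif family as fallback
  ctx.font = `${testSize} "${fontFamily}", serif`;
  const targetMetrics = ctx.measureText(char);

  // Measure with the generic fallback alone (unquoted, so it is parsed as a keyword)
  ctx.font = `${testSize} serif`;
  const fallbackMetrics = ctx.measureText(char);

  // Heuristic: differing widths suggest the target font supplied the glyph.
  // Equal widths are inconclusive, because two fonts can share an advance width,
  // and the target font must already be loaded for the comparison to mean anything.
  return targetMetrics.width !== fallbackMetrics.width;
}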
Using Python's fontTools
from fontTools.ttLib import TTFont


def check_coverage(font_path: str, code_points: list[int]) -> dict[int, bool]:
    """Check which code points a font supports."""
    font = TTFont(font_path)
    cmap = font.getBestCmap()  # maps code point -> glyph name
    return {cp: cp in cmap for cp in code_points}


# Example
result = check_coverage("NotoSans-Regular.ttf", [0x0041, 0x4E2D, 0x1F600])
# {0x0041: True, 0x4E2D: False, 0x1F600: False}
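The same cmap lookup can be turned around to report which characters of a real string would fall back. This is a sketch under the assumption that `cmap` is any mapping keyed by code point, such as the dictionary returned by `getBestCmap()`; the tiny `sample_cmap` below is made-up data so the example runs without a font file.

```python
def missing_characters(text, cmap):
    """Map each distinct uncovered character in text to its U+XXXX label."""
    missing = {}
    for ch in text:
        if ord(ch) not in cmap and ch not in missing:
            missing[ch] = f"U+{ord(ch):04X}"
    return missing


# Stand-in for font.getBestCmap(): code point -> glyph name
sample_cmap = {0x48: "H", 0x69: "i", 0x20: "space"}

report = missing_characters("Hi 世界", sample_cmap)
```

Here `report` contains entries for 世 (U+4E16) and 界 (U+754C), the two characters the sample cmap does not cover.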
Strategies for Maximum Coverage
1. Progressive Font Loading
Load a small base font immediately and add script-specific fonts as needed:
<!-- Critical: Latin base font (small, fast) -->
<link rel="preload" href="/fonts/base-latin.woff2" as="font" crossorigin>
<!-- Deferred: Additional scripts loaded on demand -->
<link rel="stylesheet" href="/fonts/extended-scripts.css" media="print" onload="this.media='all'">
2. Font Display Strategy
Use font-display to control rendering behavior during font loading:
@font-face {
font-family: 'MyFont';
src: url('myfont.woff2') format('woff2');
font-display: swap; /* Show fallback immediately, swap when loaded */
}
| Value | Behavior |
|---|---|
| auto | Browser decides (usually block) |
| block | Hide text briefly, then show with font |
| swap | Show fallback immediately, swap when loaded |
| fallback | Short block, then fallback; swap only if loaded quickly |
| optional | Very short block; may skip custom font entirely |
3. Server-Side Script Detection
Detect which scripts a page contains and serve only the necessary font files:
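import unicodedata


def detect_scripts(text: str) -> set[str]:
    """Return the set of Unicode scripts present in text."""
    # unicodedata.script() is not available in every Python version.
    # Fail loudly here rather than silently returning an empty set.
    script_of = getattr(unicodedata, "script", None)
    if script_of is None:
        raise RuntimeError("unicodedata.script() is not available in this Python version")
    scripts = set()
    for char in text:
        try:
            script = script_of(char)
        except ValueError:
            continue
        if script not in ("Common", "Inherited"):
            scripts.add(script)
    return scripts


# Usage
scripts = detect_scripts("Hello 世界 مرحبا")
# {'Latin', 'Han', 'Arabic'}
# → Load Noto Sans, Noto Sans CJK, Noto Sans Arabic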
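On interpreters where `unicodedata` offers no script lookup, a rough approximation is to take the first word of each letter's Unicode character name. This is a heuristic sketch, not the Script property: the labels are name prefixes such as `LATIN` or `CJK`, and the `SCRIPT_FONTS` mapping to the families from the Noto table is an assumption made for illustration.

```python
import unicodedata

# Assumed mapping from character-name prefix to a Noto family named in this guide
SCRIPT_FONTS = {
    "LATIN": "Noto Sans",
    "GREEK": "Noto Sans",
    "CYRILLIC": "Noto Sans",
    "CJK": "Noto Sans CJK",
    "ARABIC": "Noto Sans Arabic",
    "DEVANAGARI": "Noto Sans Devanagari",
    "HEBREW": "Noto Sans Hebrew",
    "THAI": "Noto Sans Thai",
}


def approximate_scripts(text):
    """Collect the first word of each letter's Unicode name as a script label."""
    labels = set()
    for ch in text:
        if not ch.isalpha():          # skip spaces, digits, punctuation, marks
            continue
        name = unicodedata.name(ch, "")
        if name:
            labels.add(name.split()[0])
    return labels


def fonts_for(text):
    """Return the sorted font families suggested for the scripts found in text."""
    labels = approximate_scripts(text)
    return sorted({SCRIPT_FONTS[label] for label in labels if label in SCRIPT_FONTS})
```

For "Hello 世界 مرحبا" this yields the labels `LATIN`, `CJK`, and `ARABIC`. Labels with no entry in the mapping are ignored by `fonts_for`, so a production version would need a default family for them.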
Key Takeaways
- No single font covers all of Unicode — font fallback is the norm, not the exception
- Noto is the most comprehensive family at 150+ scripts, but the full set is over 1 GB
- Use unicode-range in @font-face to load only the subsets your content needs
- Platform font stacks differ significantly, so always include generic family keywords (sans-serif, serif, monospace) as the final fallback
- Tofu (empty rectangles) means the rendering engine exhausted all fallback options
- Test with real multilingual content, because Latin-only testing will never reveal fallback issues
- Progressive loading with font-display: swap gives the best balance of performance and visual completeness