Unicode Fonts: How Characters Get Rendered

A font file contains glyphs for only a subset of Unicode characters, so characters outside that subset either fall back to other fonts or render as blank boxes. This guide explains how Unicode font rendering works, how to choose fonts with broad Unicode coverage, and how to use font stacks and Unicode ranges in CSS.

No single font contains glyphs for all 154,000+ assigned Unicode characters. Even the most comprehensive font families cover only a fraction of the Unicode repertoire. When a browser or operating system encounters a character that the current font does not support, it falls back to another font — or displays the dreaded "tofu" rectangle (□). This guide explains how font coverage works, how fallback chains operate, which font families offer the broadest Unicode support, and practical strategies for ensuring your text renders correctly across scripts and platforms.

What Is Font Coverage?

A font's coverage is the set of Unicode code points for which it contains glyphs. Most fonts do not try to cover everything; they are designed for specific scripts and use cases:

| Font | Approximate Coverage | Primary Scripts |
|---|---|---|
| Arial | ~3,000 glyphs | Latin, Greek, Cyrillic |
| Times New Roman | ~3,200 glyphs | Latin, Greek, Cyrillic |
| Noto Sans | ~65,000 glyphs (full family) | 150+ scripts |
| Noto Serif | ~40,000 glyphs (full family) | Major scripts |
| Code2000 | ~60,000 glyphs | Broad coverage (unmaintained) |
| Unifont | ~77,000 glyphs | BMP bitmap coverage |
| Last Resort | 1,114,112 (placeholder) | All code points (boxes only) |

The gap between a typical system font (~3,000 glyphs) and the full Unicode repertoire (~154,000 assigned characters) means that font fallback is not an edge case — it is the normal state of affairs for any multilingual content.

How Font Fallback Works

When a rendering engine (browser, OS text layout) needs to display a character, it follows a fallback chain:

1. Try the specified font (e.g., "Helvetica Neue")
   → Glyph found? Display it.
   → No glyph? Continue.

2. Try each font in the CSS font-family list
   font-family: "Helvetica Neue", Arial, sans-serif
   → Try Arial...
   → Try system sans-serif...

3. Try the system fallback font list
   (OS-specific, script-aware)

4. Display the .notdef glyph (tofu □)

Browser Font Matching

Modern browsers implement the CSS Fonts specification (CSS Fonts Level 4), which defines a sophisticated font matching algorithm:

  1. Exact match: Check the first font in the font-family list
  2. Character-by-character fallback: For each character not covered, try subsequent fonts in the list
  3. System font fallback: Consult the OS font database for a font that covers the missing character
  4. Last resort: Display .notdef — usually a rectangle or question mark

This means a single paragraph can silently use three or four different fonts, with the browser switching between them character by character. The visual result may show subtle differences in weight, size, and baseline alignment.
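The character-by-character switching described above can be sketched in a few lines of Python. This is a simplified model, not how any particular browser implements matching: the `FONTS` coverage sets, the "System CJK" font name, and the helper names are all hypothetical, and real engines also consider script, language, and shaping when choosing a fallback.

```python
from itertools import groupby

# Hypothetical coverage data: font name -> set of supported code points.
# Real coverage would come from each font's cmap table.
FONTS = {
    "Helvetica Neue": set(range(0x20, 0x250)),                    # Latin
    "Arial": set(range(0x20, 0x250)) | set(range(0x400, 0x500)),  # Latin + Cyrillic
    "System CJK": set(range(0x4E00, 0xA000)),                     # CJK Unified Ideographs
}

def pick_font(char, stack, system_fallbacks):
    """Return the first font that covers char, or None when nothing does."""
    for name in list(stack) + list(system_fallbacks):
        if ord(char) in FONTS.get(name, ()):
            return name
    return None  # no glyph anywhere: the renderer would show .notdef

def font_runs(text, stack, system_fallbacks=()):
    """Split text into (font, substring) runs, one run per font switch."""
    picks = ((pick_font(c, stack, system_fallbacks), c) for c in text)
    return [(font, "".join(c for _, c in group))
            for font, group in groupby(picks, key=lambda p: p[0])]

runs = font_runs("Hi Привет 世界 😀", ["Helvetica Neue", "Arial"], ["System CJK"])
for font, chunk in runs:
    print(f"{font or 'tofu'}: {chunk!r}")
```

In this model the sample string splits into six runs across three fonts, and the emoji ends as tofu because none of the assumed fonts covers it.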

The Noto Font Project

Google's Noto font family is the most ambitious attempt to eliminate tofu (the name "Noto" stands for "No Tofu"). It provides fonts for over 150 scripts:

Noto Font Architecture

| Component | Coverage | Size |
|---|---|---|
| Noto Sans | Latin, Cyrillic, Greek | ~400 KB |
| Noto Sans CJK | Chinese, Japanese, Korean | ~16 MB per variant |
| Noto Sans Arabic | Arabic, Urdu, Persian | ~200 KB |
| Noto Sans Devanagari | Hindi, Sanskrit, Marathi | ~150 KB |
| Noto Sans Hebrew | Hebrew, Yiddish | ~100 KB |
| Noto Sans Thai | Thai | ~100 KB |
| Noto Color Emoji | Emoji (color) | ~10 MB |
| Noto Sans Symbols | Symbols, technical | ~500 KB |
| Full family | 150+ scripts | ~1.1 GB |

Using Noto in Web Projects

You do not need to load the entire 1.1 GB family. Load only the scripts your content requires:

/* Load only what you need from Google Fonts */
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans:wght@400;700&display=swap');
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+Arabic:wght@400;700&display=swap');

body {
  font-family: 'Noto Sans', 'Noto Sans Arabic', sans-serif;
}

Google Fonts serves Noto fonts with unicode-range subsetting, so the browser only downloads the character subsets it actually needs.

The unicode-range Descriptor

The CSS @font-face rule supports a unicode-range descriptor that tells the browser which characters a font file covers:

/* Latin subset */
@font-face {
  font-family: 'MyFont';
  src: url('myfont-latin.woff2') format('woff2');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153;
}

/* Cyrillic subset */
@font-face {
  font-family: 'MyFont';
  src: url('myfont-cyrillic.woff2') format('woff2');
  unicode-range: U+0400-04FF;
}

/* CJK subset */
@font-face {
  font-family: 'MyFont';
  src: url('myfont-cjk.woff2') format('woff2');
  unicode-range: U+4E00-9FFF;
}

The browser will only download myfont-cyrillic.woff2 if the page actually contains Cyrillic characters. This is how Google Fonts achieves efficient loading for multilingual sites.
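To reason about which subset files a given page would trigger, the matching rule can be modeled in Python. The sketch below is an approximation under stated assumptions: `parse_unicode_range`, `files_needed`, and the `FACES` table are hypothetical names that mirror the three @font-face rules above, and a real browser additionally requires that the font family is actually used to render the matching text.

```python
def parse_unicode_range(descriptor):
    """Parse a unicode-range value into a list of (start, end) tuples."""
    ranges = []
    for part in descriptor.split(","):
        token = part.strip().upper().removeprefix("U+")
        if "?" in token:      # wildcard form, e.g. U+4??
            start, end = token.replace("?", "0"), token.replace("?", "F")
        elif "-" in token:    # interval form, e.g. U+0400-04FF
            start, end = token.split("-")
        else:                 # single code point, e.g. U+0131
            start = end = token
        ranges.append((int(start, 16), int(end, 16)))
    return ranges

# Hypothetical table mirroring the @font-face rules above.
FACES = {
    "myfont-latin.woff2": "U+0000-00FF, U+0131, U+0152-0153",
    "myfont-cyrillic.woff2": "U+0400-04FF",
    "myfont-cjk.woff2": "U+4E00-9FFF",
}

def files_needed(text, faces=FACES):
    """Return the font files whose unicode-range matches a character in text."""
    code_points = {ord(c) for c in text}
    needed = []
    for url, descriptor in faces.items():
        ranges = parse_unicode_range(descriptor)
        if any(lo <= cp <= hi for cp in code_points for lo, hi in ranges):
            needed.append(url)
    return needed

print(files_needed("Hello, мир"))
# ['myfont-latin.woff2', 'myfont-cyrillic.woff2']
```

A page containing only "Hello" would match the Latin file alone, which is the saving that subsetting is meant to deliver.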

Platform-Specific Font Stacks

Each operating system ships different default fonts with different Unicode coverage:

Windows

font-family:
  'Segoe UI',          /* Latin, Cyrillic, Greek */
  'Segoe UI Emoji',    /* Emoji */
  'Microsoft YaHei',   /* Chinese (Simplified) */
  'Meiryo',            /* Japanese */
  'Malgun Gothic',     /* Korean */
  sans-serif;

macOS / iOS

font-family:
  -apple-system,       /* San Francisco (Latin, Cyrillic, Greek) */
  'Apple Color Emoji',  /* Emoji */
  'PingFang SC',       /* Chinese (Simplified) */
  'Hiragino Sans',     /* Japanese */
  'Apple SD Gothic Neo', /* Korean */
  sans-serif;

Linux

font-family:
  'Noto Sans',         /* Most scripts (if installed) */
  'Noto Color Emoji',  /* Emoji */
  'DejaVu Sans',       /* Latin, Greek, Cyrillic (common fallback) */
  sans-serif;

Cross-Platform Stack

A robust cross-platform font stack for multilingual content:

font-family:
  system-ui,
  -apple-system,
  'Segoe UI',
  Roboto,
  'Noto Sans',
  'Liberation Sans',
  sans-serif,
  'Apple Color Emoji',
  'Segoe UI Emoji',
  'Noto Color Emoji';

Diagnosing Missing Glyphs

The "Tofu" Problem

When a character has no glyph in any available font, renderers display a fallback symbol:

| Renderer | Display | Meaning |
|---|---|---|
| Most browsers | □ (empty rectangle) | Missing glyph ("tofu") |
| Some browsers | � (replacement character) | Decoding error |
| Windows | Empty box or dotted box | Font missing glyph |
| macOS | Last Resort font (labeled box) | Font missing glyph |
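The distinction between tofu and the replacement character matters when debugging: U+FFFD is part of the text data itself, while tofu exists only at render time. A minimal Python sketch of that triage follows; the `diagnose` helper and its return strings are hypothetical, and it assumes you have access to the raw bytes that produced the text.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def diagnose(raw: bytes, encoding: str = "utf-8") -> str:
    """Classify raw bytes as a decoding problem or a possible font problem."""
    text = raw.decode(encoding, errors="replace")
    if REPLACEMENT in text:
        # The decoder inserted U+FFFD (or the source already contained it):
        # the data is damaged, and changing fonts will not repair it.
        return "decoding error"
    # The bytes decoded cleanly, so any boxes on screen come from font coverage.
    return "check font coverage"

good = "世界".encode("utf-8")
print(diagnose(good))       # check font coverage
print(diagnose(good[:-1]))  # decoding error (final byte sequence is truncated)
```

If the check reports a decoding error, fix the encoding first; only clean text is worth testing against font coverage.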

Detecting Missing Glyphs Programmatically

In JavaScript, you can detect whether a font supports a character:

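function fontSupportsChar(fontFamily, char) {
  const canvas = document.createElement('canvas');
  const ctx = canvas.getContext('2d');
  const testSize = '72px';

  // Heuristic: compare against two generic fallbacks so that a coincidental
  // width match with one of them does not produce a false negative.
  // For web fonts, call this only after the font has finished loading.
  return ['serif', 'monospace'].some((generic) => {
    // Width with the generic fallback alone (generic keywords must stay unquoted)
    ctx.font = `${testSize} ${generic}`;
    const fallbackWidth = ctx.measureText(char).width;

    // Width with the target font first, then the same generic fallback
    ctx.font = `${testSize} "${fontFamily}", ${generic}`;
    const targetWidth = ctx.measureText(char).width;

    // A different width means the target font supplied the glyph
    return targetWidth !== fallbackWidth;
  });
}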

Using Python's fontTools

from fontTools.ttLib import TTFont

def check_coverage(font_path: str, code_points: list[int]) -> dict[int, bool]:
    """Check which code points a font supports."""
    font = TTFont(font_path)
    cmap = font.getBestCmap()
    return {cp: cp in cmap for cp in code_points}

# Example
result = check_coverage("NotoSans-Regular.ttf", [0x0041, 0x4E2D, 0x1F600])
# {65: True, 20013: False, 128512: False}  (keys print as decimal code points)
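For auditing real content rather than hand-picked code points, it can help to list exactly which characters in a text a font lacks. The sketch below assumes the supported set is built from a cmap, for example `set(TTFont(path).getBestCmap())`; the `missing_characters` helper is hypothetical, and the example uses a stand-in set so that it runs without a font file.

```python
import unicodedata

def missing_characters(text: str, supported: set[int]) -> list[str]:
    """List the distinct characters in text that are absent from supported."""
    report = []
    for char in sorted(set(text)):
        # Whitespace and line breaks are skipped: they need no visible glyph.
        if ord(char) not in supported and not char.isspace():
            name = unicodedata.name(char, "<unnamed>")
            report.append(f"U+{ord(char):04X} {name}")
    return report

# Stand-in for a real cmap: printable ASCII only.
ascii_only = set(range(0x20, 0x7F))
print(missing_characters("Hello 世界", ascii_only))
# ['U+4E16 CJK UNIFIED IDEOGRAPH-4E16', 'U+754C CJK UNIFIED IDEOGRAPH-754C']
```

Running this over a representative sample of production text gives a concrete list of the fallback fonts a page will depend on.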

Strategies for Maximum Coverage

1. Progressive Font Loading

Load a small base font immediately and add script-specific fonts as needed:

<!-- Critical: Latin base font (small, fast) -->
<link rel="preload" href="/fonts/base-latin.woff2" as="font" crossorigin>

<!-- Deferred: Additional scripts loaded on demand -->
<link rel="stylesheet" href="/fonts/extended-scripts.css" media="print" onload="this.media='all'">

2. Font Display Strategy

Use font-display to control rendering behavior during font loading:

@font-face {
  font-family: 'MyFont';
  src: url('myfont.woff2') format('woff2');
  font-display: swap;  /* Show fallback immediately, swap when loaded */
}

| Value | Behavior |
|---|---|
| auto | Browser decides (usually block) |
| block | Hide text briefly, then show with font |
| swap | Show fallback immediately, swap when loaded |
| fallback | Short block, then fallback; swap only if loaded quickly |
| optional | Very short block; may skip custom font entirely |
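The differences between these values come down to two timers: a block period (text is invisible) followed by a swap period (fallback text is shown but can still be replaced). The Python model below is a rough sketch that assumes the durations recommended by the CSS Fonts specification (about 3 seconds for a short period, 100 ms or less for a minimal one); browsers are free to choose other values, and the `PERIODS` table and `outcome` function are hypothetical names. The `auto` value is omitted because its behavior is browser-defined.

```python
INF = float("inf")

# (block period, swap period) in milliseconds, using recommended durations.
PERIODS = {
    "block":    (3000, INF),   # short block period, infinite swap period
    "swap":     (100, INF),    # minimal block period, infinite swap period
    "fallback": (100, 3000),   # minimal block period, short swap period
    "optional": (100, 0),      # minimal block period, no swap period
}

def outcome(font_display: str, load_ms: float) -> str:
    """Describe what a reader sees if the font finishes loading at load_ms."""
    block, swap = PERIODS[font_display]
    if load_ms <= block:
        return "invisible text, then custom font"
    if load_ms <= block + swap:
        return "fallback font, then swap to custom font"
    return "fallback font for the rest of the page view"

for value in PERIODS:
    print(f"{value:9} font arrives at 1500 ms: {outcome(value, 1500)}")
```

With a font that arrives after 1.5 seconds, `block` keeps text hidden until then, `swap` and `fallback` show fallback text and replace it, and `optional` stays on the fallback.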

3. Server-Side Script Detection

Detect which scripts a page contains and serve only the necessary font files:

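try:
    from unicodedata import script  # available only if your Python version provides it
except ImportError:
    # Assumption: the third-party unicodedataplus package is installed
    # (pip install unicodedataplus); it adds script() to the stdlib API.
    from unicodedataplus import script

def detect_scripts(text: str) -> set[str]:
    """Return the set of Unicode scripts present in text."""
    scripts = set()
    for char in text:
        name = script(char)
        # Skip characters shared across scripts (punctuation, digits, combining marks)
        if name not in ("Common", "Inherited", "Unknown"):
            scripts.add(name)
    return scripts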

# Usage
scripts = detect_scripts("Hello 世界 مرحبا")
# {'Latin', 'Han', 'Arabic'}
# → Load Noto Sans, Noto Sans CJK, Noto Sans Arabic
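The final step, turning detected scripts into a font list, can be sketched as a lookup table. The mapping below is illustrative only: it uses the family names from the Noto table in this guide, the `SCRIPT_FONTS`, `fonts_for_scripts`, and `font_family_css` names are hypothetical, and the exact family names offered by a given font host may differ (CJK in particular is often split into per-language families).

```python
# Illustrative mapping from Unicode script names to Noto families.
SCRIPT_FONTS = {
    "Latin": "Noto Sans",
    "Greek": "Noto Sans",
    "Cyrillic": "Noto Sans",
    "Han": "Noto Sans CJK",
    "Hiragana": "Noto Sans CJK",
    "Katakana": "Noto Sans CJK",
    "Hangul": "Noto Sans CJK",
    "Arabic": "Noto Sans Arabic",
    "Devanagari": "Noto Sans Devanagari",
    "Hebrew": "Noto Sans Hebrew",
    "Thai": "Noto Sans Thai",
}

def fonts_for_scripts(scripts):
    """Return (families to load, scripts without a mapping), both sorted."""
    families = sorted({SCRIPT_FONTS[s] for s in scripts if s in SCRIPT_FONTS})
    unmapped = sorted(s for s in scripts if s not in SCRIPT_FONTS)
    return families, unmapped

def font_family_css(families):
    """Build a CSS font-family value that ends in a generic keyword."""
    return ", ".join([f"'{name}'" for name in families] + ["sans-serif"])

families, unmapped = fonts_for_scripts({"Latin", "Han", "Arabic", "Tibetan"})
print(font_family_css(families))
# 'Noto Sans', 'Noto Sans Arabic', 'Noto Sans CJK', sans-serif
print(unmapped)
# ['Tibetan']
```

Returning the unmapped scripts separately makes gaps visible, so a script with no configured font is logged instead of silently falling through to tofu.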

Key Takeaways

  1. No single font covers all of Unicode — font fallback is the norm, not the exception
  2. Noto is the most comprehensive family at 150+ scripts, but the full set is over 1 GB
  3. Use unicode-range in @font-face to load only the subsets your content needs
  4. Platform font stacks differ significantly — always include generic family keywords (sans-serif, serif, monospace) as the final fallback
  5. Tofu (empty rectangles) means the rendering engine exhausted all fallback options
  6. Test with real multilingual content — Latin-only testing will never reveal fallback issues
  7. Progressive loading with font-display: swap gives the best balance of performance and visual completeness
