
Unicode in Terminal / Command Line

Modern terminals support Unicode and UTF-8, but correctly displaying all Unicode characters requires a compatible terminal emulator, the right locale settings, and a font with adequate Unicode coverage. This guide covers how to configure your terminal for Unicode, insert special characters from the command line, and debug character display issues.

The terminal (or command line) is where developers, system administrators, and power users spend much of their time. Getting Unicode to work properly in the terminal involves a chain of components that must all agree on encoding: the shell, the terminal emulator, the locale settings, the font, and the applications running inside. When any link in this chain breaks, you get garbled text, missing characters, or mysterious question marks. This guide walks through every layer of terminal Unicode support and how to configure them correctly.

The Encoding Chain

Displaying a Unicode character in the terminal requires cooperation between multiple layers:

Application (e.g., Python, vim)
    |
    v
Shell (bash, zsh, fish)
    |
    v
Terminal Emulator (iTerm2, Windows Terminal, GNOME Terminal)
    |
    v
Font Rendering Engine
    |
    v
Display

Between them, these layers must do four things:

  1. Encode text as UTF-8 (or another Unicode encoding).
  2. Transmit the bytes unchanged.
  3. Decode the bytes back to code points.
  4. Render the code points as visible glyphs.
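
The encode, transmit, and decode steps can be sketched in a few lines of Python. This is only an illustration of the happy path for one sample string (the string and variable names are assumptions for the example); rendering is the font's job and is not modeled.

```python
# A minimal sketch of the encode -> transmit -> decode steps for one string.
text = "caf\u00e9 \u4e2d\u6587"          # "café 中文"

# 1. The application encodes code points as UTF-8 bytes.
wire_bytes = text.encode("utf-8")

# 2. The bytes travel unchanged to the terminal emulator.
received = bytes(wire_bytes)

# 3. The terminal decodes the bytes back to code points.
decoded = received.decode("utf-8")

# 4. Rendering (glyph lookup) is not modeled; list the code points instead.
code_points = [f"U+{ord(ch):04X}" for ch in decoded]

print(len(text), "code points ->", len(wire_bytes), "bytes")
print(code_points)
```

Here 7 code points become 12 bytes, which is why every layer has to agree on the encoding: the byte count alone says nothing about how many characters were sent.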

Locale Configuration

The locale tells the system what character encoding and language conventions to use. On Unix-like systems (Linux, macOS), the locale is set through environment variables:

Variable | Purpose | Example
LANG | Default locale for all categories | en_US.UTF-8
LC_ALL | Override for all locale categories | en_US.UTF-8
LC_CTYPE | Character classification and encoding | en_US.UTF-8
LC_MESSAGES | Language for system messages | en_US.UTF-8

Checking your locale

locale

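This prints all locale variables. The critical one is LC_CTYPE, which determines the character encoding. If its value does not end in .UTF-8 (some systems print it as .utf8), programs that follow the locale will treat text as a legacy encoding or plain ASCII, and non-ASCII characters will be mangled.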
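
The same check can be scripted. The sketch below assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG) and falls back to "C" when nothing is set, which is the common default but is ultimately up to the system. The function names are made up for this example.

```python
import os

def effective_ctype(env):
    """Return the locale name that governs character encoding.

    Assumed precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    Empty values are treated as unset.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"  # common default when nothing is set

def is_utf8_locale(locale_name):
    """True if the codeset part (after the dot) names UTF-8."""
    # Strip an optional modifier such as "@euro", then take the codeset.
    base = locale_name.split("@", 1)[0]
    _, dot, codeset = base.partition(".")
    return bool(dot) and codeset.replace("-", "").lower() == "utf8"

if __name__ == "__main__":
    current = effective_ctype(os.environ)
    print(current, "->", "UTF-8" if is_utf8_locale(current) else "not UTF-8")
```

Passing the environment in as a dictionary keeps the functions easy to try with made-up values, for example effective_ctype({"LANG": "en_US.UTF-8", "LC_ALL": "C"}), which returns "C" because LC_ALL wins.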

Setting UTF-8 locale

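On Linux, add the following to your ~/.bashrc or ~/.zshrc. LANG alone is usually enough; LC_ALL overrides every other locale variable, so keep that line only if another setting is forcing a non-UTF-8 locale:

export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8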

On macOS, the default locale is usually UTF-8 already. Verify with:

echo $LANG
# Should output: en_US.UTF-8 (or similar .UTF-8 locale)

Generating missing locales (Linux)

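If en_US.UTF-8 is not listed by locale -a, generate it. The commands below apply to Debian- and Ubuntu-based systems; other distributions use different tools: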

sudo locale-gen en_US.UTF-8
sudo update-locale LANG=en_US.UTF-8

Terminal Emulator Configuration

The terminal emulator is the application that displays the grid of characters. Modern terminal emulators have excellent Unicode support, but older ones may need configuration.

Terminal | Unicode Support | Configuration
iTerm2 (macOS) | Excellent | Preferences > Profiles > Terminal > Unicode
Windows Terminal | Excellent | Settings > Profiles > Appearance
GNOME Terminal | Excellent | Set > Character Encoding > UTF-8
Alacritty | Excellent | Built-in UTF-8, no configuration needed
Kitty | Excellent | Native Unicode rendering, grapheme clusters
WezTerm | Excellent | Full Unicode and emoji support
PuTTY | Good | Connection > Data > set UTF-8
cmd.exe (Windows) | Limited | chcp 65001 for UTF-8

Windows cmd.exe and PowerShell

Historically, the Windows console was limited to the system's active code page (e.g., 437 for US English, 932 for Japanese). To enable UTF-8:

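REM In cmd.exe: switch the active code page to UTF-8
chcp 65001

# In PowerShell: decode console output, and encode text piped to external programs, as UTF-8
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8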

Windows Terminal (the modern replacement) renders Unicode far better than the legacy console host. Programs running inside it can still inherit a legacy code page, so the settings above may still be needed.

Font Selection

Even with correct encoding and locale, characters will appear as empty boxes or question marks if the terminal font does not contain the necessary glyphs.

Font | Coverage | Notes
JetBrains Mono NF | Latin, Cyrillic, Greek + Nerd Font icons | Popular for development
Fira Code | Latin, Cyrillic, Greek + ligatures | Programming ligatures
Noto Sans Mono | Latin, Cyrillic, Greek, symbols; other scripts via companion Noto fonts | Broadest coverage as a family
Cascadia Code | Latin, Cyrillic, Greek | Microsoft's modern mono font
DejaVu Sans Mono | Latin, Cyrillic, Greek, many symbols | Good general coverage
Hack | Latin + extended symbols | Clean and readable

Nerd Fonts

Nerd Fonts are patched versions of popular programming fonts that add thousands of icons (from Font Awesome, Devicons, Powerline, etc.) into the Private Use Area (PUA) of Unicode. Many terminal themes (Oh My Zsh, Starship, Powerlevel10k) require a Nerd Font for icons to display.
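
To see whether a prompt or theme string depends on Private Use Area icons, you can check each character's general category. The sketch below uses Python's unicodedata module. The sample prompt string and function names are assumptions for illustration; U+E0B0 is the Powerline separator glyph.

```python
import unicodedata

def is_private_use(ch):
    """True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"

def private_use_chars(text):
    """List the code points in text that fall in a Private Use Area."""
    return [f"U+{ord(ch):04X}" for ch in text if is_private_use(ch)]

# U+E0B0 is the Powerline right-pointing separator used by many prompt themes.
prompt = "main \ue0b0 ~/src"
print(private_use_chars(prompt))
```

Any code point this reports has no standard glyph, so it displays correctly only if the terminal font (or a fallback font) was patched to include it.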

Font fallback in terminals

Most modern terminals support font fallback — if the primary font lacks a glyph, the terminal checks secondary fonts:

Terminal | Fallback Support
iTerm2 | Automatic + configurable non-ASCII font
Windows Terminal | System font fallback chain
Kitty | symbol_map directive for custom fallback
Alacritty | No built-in fallback (relies on fontconfig)

In iTerm2, you can set a separate Non-ASCII Font under Profiles > Text. This is useful for pairing a Latin programming font with a CJK or symbol font.
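
Conceptually, fallback is a first-match search through an ordered font list. The sketch below models that idea only. The font names and coverage ranges are hypothetical, and real terminals query the font files or the system font service rather than hard-coded ranges.

```python
def pick_font(ch, fonts):
    """Return the name of the first font whose coverage includes ch, else None.

    fonts is an ordered list of (name, covers) pairs, primary font first;
    covers is a callable reporting whether that font has a glyph for ch.
    """
    for name, covers in fonts:
        if covers(ch):
            return name
    return None  # no font has the glyph, so the terminal draws an empty box

# Hypothetical coverage rules, for illustration only.
FONTS = [
    ("primary-latin", lambda ch: ord(ch) < 0x0250),
    ("fallback-cjk", lambda ch: 0x4E00 <= ord(ch) <= 0x9FFF),
]

assignments = [(f"U+{ord(ch):04X}", pick_font(ch, FONTS))
               for ch in "A\u4e2d\U0001F600"]
print(assignments)
```

In this toy setup the Latin letter uses the primary font, the CJK ideograph falls through to the second font, and the emoji matches nothing, which corresponds to the empty-box symptom described earlier.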

Character Width: Half-Width vs Full-Width

CJK characters (Chinese, Japanese, Korean) and certain symbols occupy two columns in the terminal grid, while Latin characters occupy one. The distinction comes from the East Asian Width property, defined in Unicode Standard Annex #11.

Width | Columns | Examples
Narrow (Na) | 1 | A, B, 1, $ (ASCII letters, digits, punctuation)
Wide (W) | 2 | 漢, 字, ア, 가 (CJK ideographs, katakana, hangul)
Fullwidth (F) | 2 | Ａ, Ｂ (fullwidth Latin letters)
Halfwidth (H) | 1 | ｱ, ｲ (halfwidth katakana)
Ambiguous (A) | 1 or 2 | ①, ②, basic Greek and Cyrillic letters (depends on context)

The Ambiguous Width Problem

Characters with Ambiguous width are the source of many alignment issues. In East Asian contexts, they are traditionally displayed as wide (2 columns), but in Western contexts as narrow (1 column). Terminals must choose, and this often causes misalignment in table-formatted output.

Most terminals default to treating Ambiguous-width characters as narrow. iTerm2 offers an "Ambiguous characters are double-width" option under Profiles > Text.
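
The effect of that choice can be approximated with Python's unicodedata module. The sketch below is a simplification, not how any particular terminal computes widths: it ignores control characters, zero-width joiners, and emoji sequences, and the ambiguous_wide parameter is an assumption standing in for the terminal setting.

```python
import unicodedata

def char_width(ch, ambiguous_wide=False):
    """Approximate the number of terminal columns one code point occupies."""
    if unicodedata.combining(ch):
        return 0                      # combining marks share the previous cell
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2                      # Wide and Fullwidth
    if eaw == "A":
        return 2 if ambiguous_wide else 1
    return 1                          # Na, H, and N (neutral)

def display_width(text, ambiguous_wide=False):
    """Sum the approximate column widths of all code points in text."""
    return sum(char_width(ch, ambiguous_wide) for ch in text)

sample = "\u2460 \u6f22\u5b57"        # "① 漢字"
print(display_width(sample), display_width(sample, ambiguous_wide=True))
```

The same string measures 6 columns under the narrow convention and 7 under the wide one, so an application and a terminal that disagree on the setting will drift apart by one column per ambiguous character.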

Emoji in the Terminal

Emoji are increasingly common in terminal output (git status messages, log formatters, prompt themes). However, emoji present special challenges:

Challenge | Details
Width | Most emoji are wide (2 columns) but some terminals miscalculate
Color vs monochrome | Some terminals render color emoji, others show monochrome
Sequences | Family emoji (ZWJ sequences) may render as multiple characters
Skin tones | Some terminals do not support modifier sequences
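
Sequences are hard because a single visible emoji can be several code points. The sketch below builds a family emoji and a skin-tone variant from their parts and reports how many code points and UTF-8 bytes each contains. The describe helper is a made-up name for this example.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Family: man + ZWJ + woman + ZWJ + girl. One glyph on screen when supported.
family = "\U0001F468" + ZWJ + "\U0001F469" + ZWJ + "\U0001F467"

# Waving hand followed by the medium skin tone modifier (U+1F3FD).
wave = "\U0001F44B\U0001F3FD"

def describe(seq):
    """Return (code point count, UTF-8 byte count, code point list)."""
    points = [f"U+{ord(ch):04X}" for ch in seq]
    return len(seq), len(seq.encode("utf-8")), points

print(describe(family))
print(describe(wave))
```

The family emoji is 5 code points and 18 bytes. A terminal that does not implement ZWJ joining draws the three people separately, and one that does must still decide how many columns the joined glyph occupies.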

Terminal emoji support

Terminal | Color Emoji | ZWJ Sequences
iTerm2 | Yes | Partial
Windows Terminal | Yes | Yes
Kitty | Yes (via protocol) | Yes
WezTerm | Yes | Yes
GNOME Terminal | Partial | Partial
Alacritty | Monochrome only | No

Common Problems and Fixes

Problem: Characters show as diamonds with question marks

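Cause: The terminal is decoding bytes as UTF-8, but the data is in another encoding such as Latin-1. Each invalid byte sequence is shown as the replacement character U+FFFD (�). The reverse mismatch, UTF-8 data decoded as Latin-1, produces mojibake such as Ã© instead.
Fix: Set the terminal encoding to UTF-8, ensure LANG is set to a .UTF-8 locale, and convert legacy-encoded files to UTF-8 (for example with iconv).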
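
The two directions of an encoding mismatch produce different symptoms, which helps with diagnosis. The sketch below reproduces both with Python's codecs, using a sample word chosen for illustration. The output is printed with ascii() so that it reads the same in any terminal.

```python
text = "caf\u00e9"                       # "café"

# Case 1: Latin-1 bytes read by a UTF-8 decoder.
# The lone 0xE9 byte is not valid UTF-8, so it is replaced with U+FFFD.
latin1_bytes = text.encode("latin-1")    # b'caf\xe9'
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Case 2: UTF-8 bytes read by a Latin-1 decoder.
# Every byte is valid Latin-1, so nothing fails; the result is mojibake.
utf8_bytes = text.encode("utf-8")        # b'caf\xc3\xa9'
as_latin1 = utf8_bytes.decode("latin-1")

print(ascii(as_utf8))     # 'caf\ufffd'
print(ascii(as_latin1))   # 'caf\xc3\xa9'
```

Replacement characters therefore point to non-UTF-8 data reaching a UTF-8 terminal, while pairs of accented Latin letters point to UTF-8 data reaching a terminal configured for a legacy encoding.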

Problem: Characters show as empty boxes

Cause: The font does not contain glyphs for those characters.
Fix: Switch to a font with broader coverage (Noto Sans Mono) or configure font fallback.

Problem: CJK characters misalign columns in tables

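Cause: The application and terminal disagree on character widths.
Fix: Compute display widths rather than character counts. In Python, unicodedata.east_asian_width() returns the width category of a character (such as W or Na), which you map to columns yourself; the wcwidth library returns column counts directly.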
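
As a rough illustration of the fix, the sketch below pads table cells by display columns instead of character count, using only the standard library. The pad helper and the sample rows are assumptions for the example; for production use, a dedicated library such as wcwidth handles more edge cases.

```python
import unicodedata

def display_width(text):
    """Columns occupied by text, counting Wide/Fullwidth characters as 2."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                  # combining marks add no columns
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def pad(text, columns):
    """Left-justify text to a given number of terminal columns."""
    return text + " " * max(0, columns - display_width(text))

# Sample rows: (romanized name, native name), i.e. 東京 and 서울.
rows = [("Tokyo", "\u6771\u4eac"), ("Seoul", "\uc11c\uc6b8")]
lines = [f"{pad(native, 8)}|{pad(name, 8)}|" for name, native in rows]
```

Python's str.ljust(8) pads the two-character native names to 8 characters, which is 10 columns on screen, whereas pad produces exactly 8 columns, so the separators line up when the lines are printed.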

Problem: SSH session shows garbled text

Cause: The remote server locale does not match the local terminal encoding.
Fix: Ensure both local and remote systems use UTF-8:

# On remote server
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8

Also ensure your SSH client forwards the locale: check that SendEnv LANG LC_* is in your SSH config and AcceptEnv LANG LC_* is in the server's sshd_config.

Verifying Unicode Support

A quick diagnostic you can run in bash or zsh (the \x escapes are not supported by every sh implementation's printf):

# Print UTF-8 byte sequences for several scripts
printf 'ASCII:    Hello\n'
printf 'Latin:    cafe\xcc\x81\n'
printf 'Greek:    \xce\x91\xce\xbb\xcf\x86\xce\xb1\n'
printf 'CJK:      \xe4\xb8\xad\xe6\x96\x87\n'
printf 'Emoji:    \xf0\x9f\x98\x80\n'
printf 'Arrows:   \xe2\x86\x90 \xe2\x86\x91 \xe2\x86\x92 \xe2\x86\x93\n'

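If a line shows replacement characters (�) or mojibake, the encoding is misconfigured; if it shows empty boxes, the font lacks the glyphs. In the Latin line, the accent is a separate combining character (U+0301) and should appear over the final e.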
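
If you are unsure what a given line is supposed to display, you can decode the same bytes and list the code points. The sketch below mirrors the byte sequences from the diagnostic; the dictionary and function names are made up for this example.

```python
import unicodedata

# The same byte sequences the printf diagnostic sends, keyed by label.
SAMPLES = {
    "Latin": b"cafe\xcc\x81",
    "Greek": b"\xce\x91\xce\xbb\xcf\x86\xce\xb1",
    "CJK": b"\xe4\xb8\xad\xe6\x96\x87",
    "Emoji": b"\xf0\x9f\x98\x80",
    "Arrows": b"\xe2\x86\x90 \xe2\x86\x91 \xe2\x86\x92 \xe2\x86\x93",
}

def code_points(raw):
    """Decode UTF-8 bytes and list the resulting code points."""
    return [f"U+{ord(ch):04X}" for ch in raw.decode("utf-8")]

for label, raw in SAMPLES.items():
    print(f"{label:7}", " ".join(code_points(raw)))

# The Latin sample uses e + U+0301; NFC composes it into the single U+00E9.
composed = unicodedata.normalize("NFC", SAMPLES["Latin"].decode("utf-8"))
```

The Latin line is the only one that exercises combining characters: it decodes to five code points but should occupy four columns, so it also tests the terminal's width handling.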

Key Takeaways

  • Unicode in the terminal depends on a chain: application encoding, shell, terminal emulator, and font must all support UTF-8.
  • Set LANG=en_US.UTF-8 (or equivalent) in your shell profile — this is the single most important configuration for terminal Unicode support.
  • Choose a terminal font with broad Unicode coverage (Noto Sans Mono, JetBrains Mono NF) and configure font fallback for scripts your primary font does not cover.
  • East Asian Width causes column alignment issues — use libraries like wcwidth to calculate display widths correctly.
  • Modern terminals (iTerm2, Windows Terminal, Kitty, WezTerm) have excellent Unicode and emoji support. Upgrade from legacy terminals (cmd.exe, older xterm) if possible.
