Unicode in Terminal / Command Line
Modern terminals support Unicode and UTF-8, but correctly displaying all Unicode characters requires a compatible terminal emulator, the right locale settings, and a font with adequate Unicode coverage. This guide covers how to configure your terminal for Unicode, insert special characters from the command line, and debug character display issues.
The terminal (or command line) is where developers, system administrators, and power users spend much of their time. Getting Unicode to work properly in the terminal involves a chain of components that must all agree on encoding: the shell, the terminal emulator, the locale settings, the font, and the applications running inside. When any link in this chain breaks, you get garbled text, missing characters, or mysterious question marks. This guide walks through every layer of terminal Unicode support and how to configure them correctly.
The Encoding Chain
Displaying a Unicode character in the terminal requires cooperation between multiple layers:
Application (e.g., Python, vim)
|
v
Shell (bash, zsh, fish)
|
v
Terminal Emulator (iTerm2, Windows Terminal, GNOME Terminal)
|
v
Font Rendering Engine
|
v
Display
For a character to display correctly, the layers together must:
1. Encode text as UTF-8 (or another Unicode encoding)
2. Transmit the bytes without altering them
3. Decode the bytes back to code points
4. Render the code points as visible glyphs
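The four steps can be traced in a few lines of Python. This is a minimal sketch: the sample string is arbitrary, and step 4 (drawing glyphs) happens in the terminal and font, so the code can only list the code points the font has to cover.

```python
import unicodedata

text = "Ж漢"  # arbitrary sample: a Cyrillic letter and a CJK ideograph

# Step 1: the application encodes the text as UTF-8 bytes.
data = text.encode("utf-8")
print(data)  # b'\xd0\x96\xe6\xbc\xa2'

# Step 2: the bytes travel through the rest of the chain unchanged (here: copied as-is).
received = bytes(data)

# Step 3: the terminal decodes the bytes back into code points.
decoded = received.decode("utf-8")
print(decoded == text)  # True

# Step 4: the font must provide a glyph for every one of these code points.
for ch in decoded:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0416 CYRILLIC CAPITAL LETTER ZHE
# U+6F22 CJK UNIFIED IDEOGRAPH-6F22
```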
Locale Configuration
The locale tells the system what character encoding and language conventions to use. On Unix-like systems (Linux, macOS), the locale is set through environment variables:
| Variable | Purpose | Example |
|---|---|---|
| LANG | Default locale for all categories | en_US.UTF-8 |
| LC_ALL | Override for all locale categories | en_US.UTF-8 |
| LC_CTYPE | Character classification and encoding | en_US.UTF-8 |
| LC_MESSAGES | Language for system messages | en_US.UTF-8 |
Checking your locale
locale
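This prints all locale variables. The critical one is LC_CTYPE, which determines the character encoding. If its value does not end in .UTF-8 (or .utf8), programs assume a non-Unicode encoding (plain ASCII in the default C/POSIX locale) and non-ASCII text gets mangled.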
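To see which variable actually decides the encoding, you can apply the POSIX precedence rule yourself: a non-empty LC_ALL wins, then LC_CTYPE, then LANG. The Python sketch below does exactly that; effective_ctype and is_utf8_locale are hypothetical helpers, not a standard API, and the check only inspects the codeset part of the name rather than asking the C library whether the locale is installed.

```python
import os
from typing import Mapping, Optional

def effective_ctype(env: Mapping[str, str]) -> Optional[str]:
    """Return the locale name that controls character encoding.

    POSIX precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG.
    None means all are unset, so programs use the "C"/"POSIX" locale.
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # empty strings count as unset
            return value
    return None

def is_utf8_locale(name: Optional[str]) -> bool:
    """True if the codeset is UTF-8, e.g. en_US.UTF-8, C.utf8, or a bare UTF-8 name."""
    if not name:
        return False
    # Drop any @modifier, then keep the text after the last dot (the codeset).
    codeset = name.split("@", 1)[0].rsplit(".", 1)[-1]
    return codeset.replace("-", "").lower() == "utf8"

if __name__ == "__main__":
    ctype = effective_ctype(os.environ)
    print(f"effective LC_CTYPE: {ctype!r}  UTF-8: {is_utf8_locale(ctype)}")
```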
Setting UTF-8 locale
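On Linux, add to your ~/.bashrc or ~/.zshrc:
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
Setting LANG alone is usually sufficient. LC_ALL overrides LANG and every other LC_* variable, so export it only when you want one locale to apply to everything.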
On macOS, the default locale is usually UTF-8 already. Verify with:
echo $LANG
# Should output: en_US.UTF-8 (or similar .UTF-8 locale)
Generating missing locales (Linux)
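If en_US.UTF-8 does not appear in the output of locale -a, generate it. On Ubuntu:
sudo locale-gen en_US.UTF-8
sudo update-locale LANG=en_US.UTF-8
On Debian and Arch Linux, uncomment the en_US.UTF-8 line in /etc/locale.gen and run sudo locale-gen instead.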
Terminal Emulator Configuration
The terminal emulator is the application that displays the grid of characters. Modern terminal emulators have excellent Unicode support, but older ones may need configuration.
| Terminal | Unicode Support | Configuration |
|---|---|---|
| iTerm2 (macOS) | Excellent | Preferences > Profiles > Terminal > Unicode |
| Windows Terminal | Excellent | Settings > Profiles > Appearance |
| GNOME Terminal | Excellent | Preferences > Profile > Compatibility > Encoding: UTF-8 |
| Alacritty | Excellent | Built-in UTF-8, no configuration needed |
| Kitty | Excellent | Native Unicode rendering, grapheme clusters |
| WezTerm | Excellent | Full Unicode and emoji support |
| PuTTY | Good | Window > Translation > Remote character set: UTF-8 |
| cmd.exe (Windows) | Limited | chcp 65001 for UTF-8 |
Windows cmd.exe and PowerShell
Historically, the Windows console was limited to the system's active code page (e.g., 437 for US English, 932 for Japanese). To enable UTF-8:
REM In cmd.exe
chcp 65001
# In PowerShell
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$OutputEncoding = [System.Text.Encoding]::UTF8
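In PowerShell, [Console]::OutputEncoding sets the console's output encoding (PowerShell also uses it to decode output from external programs), while $OutputEncoding sets the encoding used when piping text to external programs. Windows Terminal (the modern replacement) renders Unicode and emoji far better than the legacy console host, but console programs running inside it still use the active code page unless you change it as shown above.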
Font Selection
Even with correct encoding and locale, characters will appear as empty boxes or question marks if the terminal font does not contain the necessary glyphs.
Recommended terminal fonts
| Font | Coverage | Notes |
|---|---|---|
| JetBrains Mono NF | Latin, Cyrillic, Greek + Nerd Font icons | Popular for development |
| Fira Code | Latin, Cyrillic, Greek + ligatures | Programming ligatures |
| Noto Sans Mono | Latin, Cyrillic, Greek (other Noto fonts cover most remaining scripts) | Pairs well with other Noto fonts as fallback |
| Cascadia Code | Latin, Cyrillic, Greek | Microsoft's modern mono font |
| DejaVu Sans Mono | Latin, Cyrillic, Greek, many symbols | Good general coverage |
| Hack | Latin + extended symbols | Clean and readable |
Nerd Fonts
Nerd Fonts are patched versions of popular programming fonts that add thousands of icons (from Font Awesome, Devicons, Powerline, etc.) into the Private Use Area (PUA) of Unicode. Many terminal themes (Oh My Zsh, Starship, Powerlevel10k) require a Nerd Font for icons to display.
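Because these icons live in the Private Use Area, they have no standard meaning, and any font that lacks them shows a box instead. A quick way to spot them in a prompt string is to look for the Unicode general category Co (private use) with Python's unicodedata module. The helper below is a hypothetical sketch; U+E0A0 is the branch symbol used by Powerline-style prompts.

```python
import unicodedata

def private_use_chars(text: str) -> list[str]:
    """Return the Private Use Area characters in text as 'U+XXXX' labels.

    PUA code points have general category "Co" and no standard glyph; only a
    font that defines them (such as a Nerd Font) can draw them.
    """
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Co"]

prompt = "\ue0a0 main"            # U+E0A0: the Powerline "branch" symbol
print(private_use_chars(prompt))  # ['U+E0A0']
```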
Font fallback in terminals
Most modern terminals support font fallback — if the primary font lacks a glyph, the terminal checks secondary fonts:
| Terminal | Fallback Support |
|---|---|
| iTerm2 | Automatic + configurable non-ASCII font |
| Windows Terminal | System font fallback chain |
| Kitty | symbol_map directive for custom fallback |
| Alacritty | No built-in fallback (relies on fontconfig) |
In iTerm2, you can set a separate Non-ASCII Font under Profiles > Text. This is useful for pairing a Latin programming font with a CJK or symbol font.
Character Width: Half-Width vs Full-Width
CJK characters (Chinese, Japanese, Korean) and certain symbols occupy two columns in the terminal grid, while Latin characters occupy one. Unicode records this in the East_Asian_Width character property, defined in Unicode Standard Annex #11 (UAX #11).
| Width | Columns | Examples |
|---|---|---|
| Narrow (Na) | 1 | A, B, 1, $ (ASCII) |
| Neutral (N) | 1 | Characters not found in legacy East Asian character sets, e.g. ع, क, א |
| Wide (W) | 2 | 漢, 字, ア, 가 (CJK ideographs, katakana, hangul) |
| Fullwidth (F) | 2 | Ａ, Ｂ (fullwidth Latin letters) |
| Halfwidth (H) | 1 | ｱ, ｲ (halfwidth katakana) |
| Ambiguous (A) | 1 or 2 | ①, ②, α, Ж (including many Greek and Cyrillic letters; width depends on context) |
The Ambiguous Width Problem
Characters with Ambiguous width are the source of many alignment issues. In East Asian contexts, they are traditionally displayed as wide (2 columns), but in Western contexts as narrow (1 column). Terminals must choose, and this often causes misalignment in table-formatted output.
Most terminals default to treating Ambiguous-width characters as narrow. iTerm2 offers an "Ambiguous characters are double-width" option under Profiles > Text.
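The practical effect is that the same string can occupy two different widths depending on the terminal's choice. This Python sketch uses the standard unicodedata module to find the Ambiguous characters in a string and report both possible column counts. The two helpers are hypothetical and simplified (no combining marks, control characters, or emoji sequences); a library such as wcwidth handles more cases.

```python
import unicodedata

def ambiguous_chars(text: str) -> list[str]:
    """Characters whose East_Asian_Width property is Ambiguous ("A")."""
    return [ch for ch in text if unicodedata.east_asian_width(ch) == "A"]

def width_range(text: str) -> tuple[int, int]:
    """(columns if Ambiguous is narrow, columns if Ambiguous is wide).

    Simplified: Wide/Fullwidth count 2, everything else counts 1.
    """
    narrow = sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)
    return narrow, narrow + len(ambiguous_chars(text))

sample = "① α Ж a 漢"
print(ambiguous_chars(sample))  # ['①', 'α', 'Ж']
print(width_range(sample))      # (10, 13)
```

A table column sized on one terminal with the first number will be three columns too narrow on a terminal that uses the second.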
Emoji in the Terminal
Emoji are increasingly common in terminal output (git status messages, log formatters, prompt themes). However, emoji present special challenges:
| Challenge | Details |
|---|---|
| Width | Most emoji are wide (2 columns) but some terminals miscalculate |
| Color vs monochrome | Some terminals render color emoji, others show monochrome |
| Sequences | Family emoji (ZWJ sequences) may render as multiple characters |
| Skin tones | Some terminals do not support modifier sequences |
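The Sequences and Skin tones rows stem from the fact that one visible emoji can be several code points. Listing them with Python's unicodedata module shows what a terminal that does not understand the sequence ends up drawing separately. The codepoints helper is a hypothetical illustration.

```python
import unicodedata

def codepoints(s: str) -> list[str]:
    """List each code point as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]

family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # 👨‍👩‍👧 man, ZWJ, woman, ZWJ, girl
thumbs = "\U0001F44D\U0001F3FD"                         # 👍🏽 thumbs up + skin tone modifier

for line in codepoints(family) + codepoints(thumbs):
    print(line)
print(len(family), len(thumbs))  # 5 2 (code points, not glyphs on screen)
```

A terminal without ZWJ support draws the family as three separate faces; one without modifier support shows the thumbs up followed by a separate color swatch.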
Terminal emoji support
| Terminal | Color Emoji | ZWJ Sequences |
|---|---|---|
| iTerm2 | Yes | Partial |
| Windows Terminal | Yes | Yes |
| Kitty | Yes (via protocol) | Yes |
| WezTerm | Yes | Yes |
| GNOME Terminal | Partial | Partial |
| Alacritty | Monochrome only | No |
Common Problems and Fixes
Problem: Characters show as diamonds with question marks
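Cause: The terminal (or program) is decoding bytes as UTF-8, but the data is in a legacy encoding such as Latin-1, so each invalid byte sequence is replaced with U+FFFD REPLACEMENT CHARACTER (�). The reverse mistake, UTF-8 data decoded as Latin-1, produces mojibake such as cafÃ© instead.
Fix: Set the terminal encoding to UTF-8, ensure LANG is set to a .UTF-8 locale, and convert legacy-encoded data to UTF-8 (for example, iconv -f ISO-8859-1 -t UTF-8 file.txt).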
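Both failure modes are easy to reproduce in Python, which helps identify which one you are looking at. A minimal sketch with an arbitrary sample word:

```python
text = "café"  # arbitrary sample

# Case 1: Latin-1 bytes read by a UTF-8 decoder. 0xE9 starts a multi-byte
# sequence that never completes, so the decoder substitutes U+FFFD (the diamond).
latin1_bytes = text.encode("latin-1")                 # b'caf\xe9'
shown_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")
print(shown_as_utf8)                                  # caf�

# Case 2: UTF-8 bytes read by a Latin-1 decoder. Every byte is a valid
# Latin-1 character, so there is no error, just mojibake.
utf8_bytes = text.encode("utf-8")                     # b'caf\xc3\xa9'
shown_as_latin1 = utf8_bytes.decode("latin-1")
print(shown_as_latin1)                                # cafÃ©

# Repair: decode with the encoding the data really uses, then re-encode as UTF-8
# (the same conversion iconv -f ISO-8859-1 -t UTF-8 performs).
repaired = latin1_bytes.decode("latin-1").encode("utf-8")
print(repaired == utf8_bytes)                         # True
```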
Problem: Characters show as empty boxes
Cause: The font does not contain glyphs for those characters.
Fix: Switch to a font that covers the script, or configure font fallback to one (the Noto font family covers most scripts).
Problem: CJK characters misalign columns in tables
Cause: The application and terminal disagree on character widths.
Fix: Calculate display widths with Python's unicodedata.east_asian_width() function or the wcwidth library instead of counting characters.
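A minimal Python sketch of that fix, using only the standard unicodedata module: compute each string's width in terminal columns and pad by that width rather than by len(). The helpers are hypothetical and ignore control characters, ZWJ sequences, and the Ambiguous-width setting; the wcwidth library's wcswidth() handles more of these cases.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Terminal columns for one code point (simplified sketch)."""
    if unicodedata.category(ch) in ("Mn", "Me"):  # combining marks draw over the previous char
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1  # Ambiguous treated as narrow, the common terminal default

def cell_width(text: str) -> int:
    return sum(char_width(ch) for ch in text)

def pad(text: str, width: int) -> str:
    """Left-align text in `width` columns; str.ljust would count code points instead."""
    return text + " " * max(0, width - cell_width(text))

rows = [("apple", "1"), ("漢字", "2"), ("cafe\u0301", "3")]
col = max(cell_width(name) for name, _ in rows)
for name, qty in rows:
    print(f"{pad(name, col)} | {qty}")
```

With str.ljust, "漢字" would get two extra spaces (it counts two code points but fills four columns), which is exactly the misalignment described above.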
Problem: SSH session shows garbled text
Cause: The remote server locale does not match the local terminal encoding.
Fix: Ensure both local and remote systems use UTF-8:
# On remote server
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
Also ensure your SSH client forwards the locale: check that SendEnv LANG LC_* is in your SSH config and AcceptEnv LANG LC_* is in the server's sshd_config.
Verifying Unicode Support
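A quick diagnostic you can run in bash or zsh, whose built-in printf understands \x escapes (a strictly POSIX printf, such as dash's, may print them literally):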
# Print various Unicode characters
printf 'ASCII: Hello\n'
printf 'Latin: cafe\xcc\x81\n'
printf 'Greek: \xce\x91\xce\xbb\xcf\x86\xce\xb1\n'
printf 'CJK: \xe4\xb8\xad\xe6\x96\x87\n'
printf 'Emoji: \xf0\x9f\x98\x80\n'
printf 'Arrows: \xe2\x86\x90 \xe2\x86\x91 \xe2\x86\x92 \xe2\x86\x93\n'
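The Latin line spells café as e followed by U+0301 COMBINING ACUTE ACCENT, so it also checks that the terminal places accents on the preceding letter. A line showing replacement characters (�) points to an encoding problem; empty boxes point to a font that lacks those glyphs.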
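If printf behaves differently in your shell, the same check can be run from Python. This sketch (SAMPLES, printf_escapes, and report are hypothetical names) prints each sample with its code points and derives the printf escape strings from the text, which also shows what each \x sequence in the test encodes. A UnicodeEncodeError from print usually means Python's output encoding is not UTF-8, which points at the locale settings rather than the font.

```python
import sys

# The same samples as the printf test, written as code points instead of raw bytes.
SAMPLES = [
    ("ASCII", "Hello"),
    ("Latin", "cafe\u0301"),                  # "e" + U+0301 COMBINING ACUTE ACCENT
    ("Greek", "\u0391\u03bb\u03c6\u03b1"),    # Αλφα
    ("CJK", "\u4e2d\u6587"),                  # 中文
    ("Emoji", "\U0001F600"),                  # 😀
    ("Arrows", "\u2190 \u2191 \u2192 \u2193"),
]

def printf_escapes(text: str) -> str:
    """Render the UTF-8 bytes of text as printf-style \\xHH escapes (ASCII kept as-is)."""
    return "".join(chr(b) if b < 0x80 else f"\\x{b:02x}" for b in text.encode("utf-8"))

def report() -> None:
    print(f"stdout encoding: {sys.stdout.encoding}")
    for label, text in SAMPLES:
        codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
        print(f"{label}: {text}  [{codepoints}]")

if __name__ == "__main__":
    report()
```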
Key Takeaways
- Unicode in the terminal depends on a chain: the application, locale settings, shell, terminal emulator, and font must all agree on UTF-8.
- Set LANG=en_US.UTF-8 (or an equivalent UTF-8 locale) in your shell profile. This is the single most important setting for terminal Unicode support.
- Choose a terminal font that covers the scripts you use (for example, JetBrains Mono NF or Noto Sans Mono) and configure font fallback for scripts your primary font does not cover.
- East Asian Width causes column alignment issues; use unicodedata.east_asian_width() or the wcwidth library to calculate display widths correctly.
- Modern terminals (iTerm2, Windows Terminal, Kitty, WezTerm) have excellent Unicode and emoji support. Upgrade from legacy terminals (cmd.exe, older xterm) if possible.