East Asian Width
Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, or Neutral. Wide characters (CJK ideographs, katakana) occupy two columns in terminal emulators.
What is East Asian Width?
East Asian Width is a Unicode character property, defined in UAX #11 (East Asian Width), that classifies characters by the display width they should occupy in fixed-width (monospaced) rendering environments, particularly traditional East Asian terminals and text layouts. The property answers the question: "Does this character occupy one column or two when displayed in a terminal or monospaced layout?"
The property was introduced because East Asian scripts — Chinese, Japanese, Korean — were historically displayed at twice the width of ASCII characters on fixed-pitch terminals. Mixing ASCII and CJK text in a single terminal line required a consistent model for how much horizontal space each character would consume.
The Six Width Categories
| Category | Property Value | Description | Examples |
|---|---|---|---|
| Narrow | Na | ASCII and other characters that have an explicit fullwidth or wide counterpart; one column | A, a, 1, @ |
| Wide | W | Most CJK ideographs, kana, and Hangul syllables; two columns | 漢, 가, ア |
| Fullwidth | F | ASCII-range characters in their fullwidth CJK form; two columns | A (U+FF21), 1 (U+FF11), ! (U+FF01) |
| Halfwidth | H | Katakana and Hangul in their halfwidth (legacy) form; one column | ア (U+FF71), ヲ (U+FF66) |
| Ambiguous | A | Characters that are narrow in Western contexts but wide in some East Asian contexts | ®, ☆, α |
| Neutral | N | Characters that do not occur in legacy East Asian encodings; typically narrow | ©, «, Hebrew and Arabic letters |
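The property values in the table can be checked directly against the Unicode Character Database that ships with Python. The following is a minimal sketch; the `SAMPLES` mapping and the `classify` helper are hypothetical names chosen for illustration, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Hypothetical sample set: one character per East Asian Width value.
SAMPLES = {
    "A": "Na",       # LATIN CAPITAL LETTER A
    "\u6f22": "W",   # CJK ideograph U+6F22
    "\uff21": "F",   # FULLWIDTH LATIN CAPITAL LETTER A
    "\uff71": "H",   # HALFWIDTH KATAKANA LETTER A
    "\u03b1": "A",   # GREEK SMALL LETTER ALPHA
    "\u00a9": "N",   # COPYRIGHT SIGN
}

def classify(text: str) -> list[tuple[str, str]]:
    """Pair each character in text with its East Asian Width value."""
    return [(ch, unicodedata.east_asian_width(ch)) for ch in text]

# Print the code point and the property value reported by unicodedata.
for ch in SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.east_asian_width(ch)}")
```

Note that the stdlib reports Narrow as `"Na"`, which distinguishes it from Neutral (`"N"`).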
Terminal Implications
In a terminal emulator, the renderer must know the East Asian Width of every character to correctly advance the cursor. If a Wide or Fullwidth character is assumed to be narrow, subsequent characters will overwrite existing content, causing display corruption.
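The POSIX function wcwidth() (declared in <wchar.h>) returns the number of column positions a wide character occupies. Common implementations return 0 for combining characters, 1 for narrow characters, and 2 for wide characters, and POSIX specifies -1 for non-printable characters. Modern C libraries and terminal emulators typically derive their width tables from UAX #11 data.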
    # Python: get East Asian Width property
    import unicodedata

    def display_width(char: str) -> int:
        """Return the terminal display width of a single character."""
        eaw = unicodedata.east_asian_width(char)
        return 2 if eaw in ("W", "F") else 1

    # Examples
    display_width("A")       # → 1 (Narrow)
    display_width("漢")      # → 2 (Wide)
    display_width("\uff21")  # → 2 (Fullwidth A, U+FF21)
    display_width("\uff71")  # → 1 (Halfwidth ア, U+FF71)
The third-party wcwidth package for Python provides an implementation of wcwidth() whose width tables are updated to follow Unicode releases.
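To see how per-character widths affect cursor advance and column alignment, the sketch below sums widths over a whole string and pads to a target column. The names `char_width`, `string_width`, and `pad_to_columns` are hypothetical, and the logic is a simplification: it treats only nonspacing and enclosing marks as zero-width and does not handle control characters, which wcwidth() reports as -1.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Columns for one code point: 0 for combining marks, 2 for W/F, else 1."""
    if unicodedata.category(ch) in ("Mn", "Me"):
        return 0  # combining marks do not advance the cursor
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def string_width(text: str) -> int:
    """Total number of terminal columns the string is expected to occupy."""
    return sum(char_width(ch) for ch in text)

def pad_to_columns(text: str, columns: int) -> str:
    """Right-pad with spaces so the text fills the given number of columns."""
    return text + " " * max(0, columns - string_width(text))

# Two CJK ideographs fill four columns; "cafe" plus a combining acute also fills four.
rows = [("name", "qty"), ("\u6f22\u5b57", "3"), ("cafe\u0301", "12")]
for left, right in rows:
    print(pad_to_columns(left, 8) + right)
```

Padding by `len(text)` instead of `string_width(text)` would push the second column two cells to the right on the CJK row, which is the kind of misalignment described above.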
The Ambiguous Category
Characters with Ambiguous (A) width are the most problematic in practice. Their width is context-dependent:
- In an East Asian context (a terminal set to a CJK locale), they display as Wide (2 columns)
- In a Western context, they display as Narrow (1 column)
This affects many common symbols: the degree sign (°), the registered sign (®), Greek letters (α, β), and many box-drawing characters. Terminal emulators that serve mixed-locale user bases must make a policy choice, and mismatches between the application and terminal settings cause visible misalignment.
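One way to make that policy choice explicit is to pass it as a parameter. The sketch below is an assumption about how an application might do this, not a standard API: the `ambiguous_wide` flag is a hypothetical name, and the application is responsible for setting it to match the terminal's configuration.

```python
import unicodedata

def display_width(ch: str, ambiguous_wide: bool = False) -> int:
    """Width of one character under an explicit policy for Ambiguous (A)."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A":
        # Policy decision: wide in East Asian contexts, narrow otherwise.
        return 2 if ambiguous_wide else 1
    return 1

degree = "\u00b0"  # DEGREE SIGN, East Asian Width "A"
print(display_width(degree))                       # Western policy
print(display_width(degree, ambiguous_wide=True))  # East Asian policy
```

Characters outside the Ambiguous category are unaffected by the flag, so ASCII and CJK ideographs keep the same width under either policy.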
Quick Facts
| Property | Value |
|---|---|
| Defined in | UAX #11 — East Asian Width |
| Number of categories | 6 (Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, Neutral) |
| Two-column characters | Wide (W) and Fullwidth (F) |
| Most problematic category | Ambiguous (A) — context-dependent |
| POSIX function | wcwidth(): typically returns 0, 1, or 2 (-1 for non-printable characters) |
| Python stdlib | unicodedata.east_asian_width(char) |
| Python package | wcwidth (conformant, kept updated) |