Properties

East Asian Width

Unicode property (UAX #11) classifying characters as Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, or Neutral. Wide and Fullwidth characters (CJK ideographs, katakana, fullwidth ASCII forms) occupy two columns in terminal emulators.

What is East Asian Width?

East Asian Width is a Unicode character property, defined in UAX #11 ("East Asian Width"), that classifies characters by the display width they should occupy in fixed-width (monospaced) rendering environments, particularly traditional East Asian terminals and text layouts. The property answers the question: "Does this character occupy one column or two when displayed in a terminal or monospaced layout?"

The property was introduced because East Asian scripts — Chinese, Japanese, Korean — were historically displayed at twice the width of ASCII characters on fixed-pitch terminals. Mixing ASCII and CJK text in a single terminal line required a consistent model for how much horizontal space each character would consume.

The Six Width Categories

Category | Property Value | Description | Examples
--- | --- | --- | ---
Narrow | Na | ASCII and a small set of other characters that have a fullwidth counterpart; one column | A, a, 1, @
Wide | W | CJK ideographs, hiragana, katakana, Hangul syllables; two columns | 漢, 가, ア
Fullwidth | F | ASCII-range characters in their fullwidth CJK form; two columns | Ａ, １, ！
Halfwidth | H | Katakana and Hangul in their halfwidth (legacy) form; one column | ｱ, ｦ
Ambiguous | A | Characters that are narrow in Western contexts but wide in some East Asian contexts | ®, ☆, α
Neutral | N | Characters that do not occur in legacy East Asian character sets; treated as narrow in practice | Hebrew, Arabic, and Devanagari letters; ©
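The categories can be checked directly with Python's standard unicodedata module. The sketch below looks up one sample character per category; the SAMPLES mapping and the classify() helper are names chosen here for illustration, and the results reflect the Unicode version bundled with the running interpreter.

```python
import unicodedata

# One illustrative character per category. Escapes are used so the exact
# code points survive copy/paste and normalization.
SAMPLES = {
    "Na": "A",        # U+0041 LATIN CAPITAL LETTER A
    "W": "\u6f22",    # U+6F22 CJK ideograph
    "F": "\uff21",    # U+FF21 FULLWIDTH LATIN CAPITAL LETTER A
    "H": "\uff71",    # U+FF71 HALFWIDTH KATAKANA LETTER A
    "A": "\u03b1",    # U+03B1 GREEK SMALL LETTER ALPHA
    "N": "\u05d0",    # U+05D0 HEBREW LETTER ALEF
}

def classify(text: str) -> list[str]:
    """Return the East Asian Width property value of each character."""
    return [unicodedata.east_asian_width(ch) for ch in text]

for expected, ch in SAMPLES.items():
    actual = unicodedata.east_asian_width(ch)
    print(f"U+{ord(ch):04X} expected={expected} actual={actual}")
```

Note that the property value for Narrow is "Na", while "N" denotes Neutral.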

Terminal Implications

In a terminal emulator, the renderer must know the East Asian Width of every character to correctly advance the cursor. If a Wide or Fullwidth character is assumed to be narrow, subsequent characters will overwrite existing content, causing display corruption.

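The POSIX function wcwidth() (declared in <wchar.h>) returns the number of columns a wide character occupies: 0 for the null character and, in typical implementations, for combining characters; 1 for narrow characters; 2 for wide characters; and -1 for non-printable characters. Its results come from the C library's locale data, which is generally derived from UAX #11, and many terminal emulators ship their own width tables built from the same data.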

# Python: get East Asian Width property
import unicodedata

def display_width(char: str) -> int:
    # Return the approximate terminal display width of a single character.
    # Combining marks and control characters are not handled here.
    eaw = unicodedata.east_asian_width(char)
    return 2 if eaw in ("W", "F") else 1

# Examples
display_width("A")   # → 1 (Narrow)
display_width("漢")  # → 2 (Wide)
display_width("Ａ")  # → 2 (Fullwidth, U+FF21)
display_width("ｱ")   # → 1 (Halfwidth, U+FF71)

The third-party wcwidth Python package provides wcwidth() and wcswidth() functions backed by width tables that track recent Unicode releases.
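To illustrate why cursor advancement matters, here is a minimal standard-library sketch that measures a whole string and pads it to a fixed number of columns. The helper names (char_width, text_width, pad_to) are hypothetical, and the zero-width rule is a simplification: it treats nonspacing marks, enclosing marks, and format characters as width 0 and ignores control characters and emoji sequences, which a real terminal library handles with more care.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Columns occupied by one character (simplified model)."""
    # Assumption: nonspacing/enclosing marks and format characters
    # do not advance the cursor.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def text_width(text: str) -> int:
    """Total columns occupied by a string."""
    return sum(char_width(ch) for ch in text)

def pad_to(text: str, columns: int) -> str:
    """Right-pad with spaces so the text fills `columns` terminal cells."""
    return text + " " * max(0, columns - text_width(text))

# Each label should end at the same column in a monospaced terminal.
for label in ("ASCII", "\u6f22\u5b57", "cafe\u0301"):
    print(pad_to(label, 8) + "|")
```

Padding with len() instead of text_width() would leave the CJK row two columns too long, which is the kind of misalignment described above.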

The Ambiguous Category

Characters with Ambiguous (A) width are the most problematic in practice. Their width is context-dependent:

  • In an East Asian context (a terminal set to a CJK locale), they display as Wide (2 columns)
  • In a Western context, they display as Narrow (1 column)

This affects many common symbols: the degree sign (°), the registered sign (®), Greek letters (α, β), and many box-drawing characters. Terminal emulators that serve mixed-locale user bases must make a policy choice, and mismatches between the application and terminal settings cause visible misalignment.
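One way to model that policy choice is an explicit flag. The sketch below is an assumption about how an application might expose it, not a description of any particular terminal: the function name display_width_with_policy and the ambiguous_wide parameter are hypothetical.

```python
import unicodedata

def display_width_with_policy(ch: str, ambiguous_wide: bool = False) -> int:
    """Width of one character under a chosen Ambiguous-width policy."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2  # always two columns
    if eaw == "A" and ambiguous_wide:
        return 2  # East Asian policy: Ambiguous is treated as wide
    return 1      # Na, H, N, and Ambiguous under the Western policy

degree = "\u00b0"  # DEGREE SIGN, East Asian Width "A"
print(display_width_with_policy(degree))
print(display_width_with_policy(degree, ambiguous_wide=True))
```

The application and the terminal need to agree on the same setting; if one side assumes wide and the other narrow, every Ambiguous character shifts the rest of the line by one column.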

Quick Facts

Property | Value
--- | ---
Defined in | UAX #11, East Asian Width
Number of categories | 6 (Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, Neutral)
Two-column characters | Wide (W) and Fullwidth (F)
Most problematic category | Ambiguous (A), context-dependent
POSIX function | wcwidth(), returns 0, 1, or 2 (or -1 for non-printable characters)
Python stdlib | unicodedata.east_asian_width(char)
Python package | wcwidth (third-party, tracks recent Unicode releases)

Related Terms

More Terms in Properties

Age Property

The Unicode version in which a character was first assigned. Useful for determining character support across systems and software versions.

Joining Type

Unicode property controlling how Arabic and Syriac characters connect to adjacent characters. …

Script Extensions

Unicode property listing all scripts that use a character, broader than the …

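Case Mapping

Rules for converting characters to uppercase, lowercase, and titlecase. Mappings can be locale-dependent (the Turkish I problem) and can be one-to-many (ß → SS).

Script

The writing system a character belongs to (e.g., Latin, Cyrillic, Han). Unicode 16.0 defines 168 scripts, and the Script property is important for security and mixed-script detection.

Default Ignorable Code Point

Characters that a process which does not support them can ignore with no visible effect, including variation selectors, zero-width characters, and language tags.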

Block

A named, contiguous range of code points (e.g., Basic Latin = U+0000 to U+007F). Unicode 16.0 defines 338 blocks. Blocks never overlap, so a code point belongs to at most one block.

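Mirrored Property

Characters whose glyphs should be mirrored horizontally in right-to-left contexts. Examples: ( → ), [ → ], { → }, « → ».

General Category

A scheme that assigns every code point to one of 30 categories (Lu, Ll, Nd, So, and so on), grouped into seven major classes: Letter, Mark, Number, Punctuation, Symbol, Separator, and Other.

Compatibility Equivalence

Two character sequences that represent the same abstract content but may differ in appearance. A broader notion than canonical equivalence. Examples: ﬁ (U+FB01) ≈ fi, ² ≈ 2.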