Chinese, Japanese, and Korean: a collective term for the unified Han ideograph blocks and related scripts in Unicode. The CJK Unified Ideographs block alone spans 20,992 code points.

What is General Category?

A classification of every code point into one of 30 categories (Lu, Ll, Nd, So, etc.), grouped into 7 major classes: Letter, Mark, Number, Punctuation, Symbol, Separator, and Other.

What is Unicode Standard Annex (UAX)?

Normative or informative documents that are integral parts of the Unicode Standard. UAX#9 (Bidi Algorithm), UAX#11 (East Asian Width), UAX#15 (Normalization Forms) are key examples.

Properties

East Asian Width

Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, or Neutral. Wide characters (CJK ideographs, katakana) occupy two columns in terminal emulators.

What is East Asian Width?

East Asian Width is a Unicode character property defined in UAX #11 (East Asian Width). It classifies characters according to the display width they should occupy in fixed-width (monospaced) rendering environments, particularly traditional East Asian terminals and text layouts. The property answers the question: "Does this character occupy one column or two columns when displayed in a terminal or monospaced layout?"

The property was introduced because East Asian scripts — Chinese, Japanese, Korean — were historically displayed at twice the width of ASCII characters on fixed-pitch terminals. Mixing ASCII and CJK text in a single terminal line required a consistent model for how much horizontal space each character would consume.

The Six Width Categories

Category	Property Value	Description	Examples
Narrow	Na	ASCII and a small set of other characters that have fullwidth counterparts; one column	A, a, 1, @
Wide	W	Most CJK ideographs, Hangul syllables, hiragana and katakana; two columns	漢, 가, ア
Fullwidth	F	ASCII-range characters in their fullwidth CJK form; two columns	Ａ, １, ！
Halfwidth	H	Katakana and Hangul in their halfwidth (legacy) form; one column	ｱ, ｦ
Ambiguous	A	Characters that are narrow in Western contexts but wide in some East Asian contexts	°, ☆, α
Neutral	N	Characters that do not occur in legacy East Asian character sets; typically rendered narrow	©, א, ก
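
The property values in the table can be inspected with Python's standard unicodedata module. The following is a minimal sketch: the EAW_NAMES mapping and the describe() helper are illustrative names introduced here, and the values returned depend on the Unicode version bundled with the Python build.

```python
import unicodedata

# Long names for the short property values returned by unicodedata
# (illustrative mapping, following the UAX #11 value names).
EAW_NAMES = {
    "Na": "Narrow",
    "W": "Wide",
    "F": "Fullwidth",
    "H": "Halfwidth",
    "A": "Ambiguous",
    "N": "Neutral",
}

def describe(char: str) -> str:
    """Return a line such as 'U+6F22 W (Wide)' for a single character."""
    value = unicodedata.east_asian_width(char)
    return f"U+{ord(char):04X} {value} ({EAW_NAMES[value]})"

# One sample per category; "\u05D0" is HEBREW LETTER ALEF.
for sample in ["A", "漢", "Ａ", "ｱ", "°", "\u05D0"]:
    print(describe(sample))
```

Note that the short value for Narrow is "Na", while "N" on its own means Neutral.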

Terminal Implications

In a terminal emulator, the renderer must know the East Asian Width of every character to correctly advance the cursor. If a Wide or Fullwidth character is assumed to be narrow, subsequent characters will overwrite existing content, causing display corruption.

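The POSIX function wcwidth() (declared in <wchar.h>) returns the number of columns a wide character occupies. Typical implementations return 0 for combining characters, 1 for narrow characters, 2 for wide characters, and -1 for non-printable characters. Most C libraries and terminal emulators derive their width tables from UAX #11 data.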

# Python: get East Asian Width property
import unicodedata

def display_width(char: str) -> int:
    """Return the approximate terminal display width of a single character."""
    if unicodedata.combining(char):
        return 0  # combining marks occupy no column of their own
    eaw = unicodedata.east_asian_width(char)
    # Ambiguous (A) characters are treated as narrow here.
    return 2 if eaw in ("W", "F") else 1

# Examples
display_width("A")   # → 1 (Narrow)
display_width("漢")  # → 2 (Wide)
display_width("Ａ")  # → 2 (Fullwidth)
display_width("ｱ")  # → 1 (Halfwidth)

The third-party wcwidth package for Python provides a more complete implementation, including zero-width and non-printable characters, and is updated as new Unicode versions are released.
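
Per-character widths are usually needed in aggregate, for example to align table columns that mix ASCII and CJK text. The sketch below uses only the standard library. The helper names char_width, text_width, and pad_to_columns are illustrative, and the approach is an approximation: it does not handle control characters, emoji sequences, or Ambiguous-width policy.

```python
import unicodedata

def char_width(char: str) -> int:
    """Approximate column width of one character (0, 1, or 2)."""
    if unicodedata.combining(char):
        return 0  # combining marks stack on the previous character
    return 2 if unicodedata.east_asian_width(char) in ("W", "F") else 1

def text_width(text: str) -> int:
    """Sum of per-character widths for a whole string."""
    return sum(char_width(ch) for ch in text)

def pad_to_columns(text: str, columns: int) -> str:
    """Right-pad with spaces so the text occupies at least `columns` cells."""
    return text + " " * max(0, columns - text_width(text))

# Each row ends at the same column in a terminal that follows UAX #11.
for name in ["Tokyo", "東京", "ﾄｳｷｮｳ"]:
    print(pad_to_columns(name, 10) + "|")
```

Padding by len(text) instead of text_width(text) would leave the row containing 東京 two columns too wide, which is the misalignment described above.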

The Ambiguous Category

Characters with Ambiguous (A) width are the most problematic in practice. Their width is context-dependent:

In an East Asian context (a terminal set to a CJK locale), they display as Wide (2 columns)
In a Western context, they display as Narrow (1 column)

This affects many common symbols: the degree sign (°), the registered sign (®), Greek letters (α, β), and many box-drawing characters. (The copyright sign © is classified as Neutral, not Ambiguous.) Terminal emulators that serve mixed-locale user bases must make a policy choice, and mismatches between the application and terminal settings cause visible misalignment.
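
One way to model this policy choice is to make the treatment of Ambiguous characters an explicit parameter. The following is a minimal sketch, not how any particular terminal implements it; the ambiguous_wide flag and the sample string are assumptions made for illustration.

```python
import unicodedata

def char_width(char: str, ambiguous_wide: bool = False) -> int:
    """Column width of one character under a chosen Ambiguous-width policy."""
    eaw = unicodedata.east_asian_width(char)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A":
        # Policy decision: wide in an East Asian context, narrow otherwise.
        return 2 if ambiguous_wide else 1
    return 1

line = "25°C ±α"  # contains three Ambiguous characters: °, ±, α
western = sum(char_width(ch) for ch in line)
east_asian = sum(char_width(ch, ambiguous_wide=True) for ch in line)
print(western, east_asian)
```

The two totals differ by one column for each Ambiguous character in the string, which is the amount by which an application and a terminal drift apart when they disagree on the policy.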

Quick Facts

Property	Value
Defined in	UAX #11 — East Asian Width
Number of categories	6 (Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, Neutral)
Two-column characters	Wide (W) and Fullwidth (F)
Most problematic category	Ambiguous (A) — context-dependent
POSIX function	`wcwidth()`, which typically returns 0, 1, or 2 (or -1 for non-printable characters)
Python stdlib	`unicodedata.east_asian_width(char)`
Python package	`wcwidth` (conformant, kept updated)

Related Terms

CJK, General Category, Unicode Standard Annex (UAX)

More in Properties

Joining Type

Unicode property controlling how Arabic and Syriac characters connect to adjacent characters. …

Script Extensions

Unicode property listing all scripts that use a character, broader than the …

Grapheme Cluster

A user-perceived character: something that feels like a single unit but may consist of multiple code points (a base plus combining marks, or an emoji ZWJ sequence). 👩‍💻 = …

Case Mapping

Rules for converting characters between uppercase, lowercase, and titlecase. Mappings can be locale-dependent (the Turkish I problem) and one-to-many (ß → SS).

Decomposition

A mapping from a character to its constituent parts. Canonical decomposition preserves meaning (é → e + ́), while compatibility decomposition may change meaning …

Combining Class

A numeric value (0–254) that controls the ordering of combining marks during canonical decomposition. It determines which combining marks can be reordered.

Compatibility Equivalence

Two character sequences that represent the same abstract content but may differ in appearance. Broader than canonical equivalence. Examples: ﬁ ≈ fi, ² ≈ 2

Canonical Equivalence

Two character sequences that have the same meaning and should be treated as equivalent. Example: é (U+00E9) ≡ e + ◌́ (U+0065 + U+0301)

Mirroring Property

Characters whose glyphs should be mirrored horizontally in RTL contexts. Examples: ( → ), [ → ], { → }, …

Age Property

The Unicode version in which a character was first assigned. Useful for checking character support across different systems and software versions.
