What is a zero-width character?

A character with zero width that is invisible when rendered but affects how text behaves. Examples include ZWSP (line-break opportunity), ZWJ (joiner), ZWNJ (non-joiner), and WJ (prevents a break).

What is a variation selector?

A character (U+FE00–U+FE0F, U+E0100–U+E01EF) that selects a specific glyph variant of the preceding character. VS15 (U+FE0E) = text presentation; VS16 (U+FE0F) = emoji presentation.
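As a small illustration of the VS15/VS16 distinction, the sketch below appends each selector to U+2764 HEAVY BLACK HEART. The variable names are illustrative only, and whether the two sequences actually look different depends on the font and renderer.

```python
import unicodedata

HEART = "\u2764"   # HEAVY BLACK HEART
VS15 = "\uFE0E"    # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"    # VARIATION SELECTOR-16: requests emoji presentation

text_heart = HEART + VS15
emoji_heart = HEART + VS16

# Each sequence is two code points: the base character plus one selector.
print(len(text_heart), len(emoji_heart))
print(unicodedata.name(VS15))
print(unicodedata.name(VS16))

# The selector is part of the string, so the sequences compare unequal
# even though they share the same base character.
print(text_heart == emoji_heart)
```

Because the selector is an ordinary code point in the string, comparisons and searches that should treat both forms alike need to remove or normalize it first.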

Properties

Default Ignorable

Characters that have no visible effect and can be ignored by processes that do not support them. Includes variation selectors, zero-width characters, and language tags.

2022-05-30 · Updated 2024-08-12

What Are Default Ignorable Code Points?

A Default Ignorable Code Point is a character that, when a renderer does not support it, should produce no visible glyph and no advance width. These characters carry invisible semantic information (joining behavior, direction control, variation selection) without disturbing the visual flow of text on systems that do not implement them.

The rule is: if a process does not recognize or support a default ignorable character, it should render nothing for it rather than display a replacement box (□) or a question mark. The character remains in the underlying text; only its display is suppressed. This allows documents using advanced Unicode features to degrade gracefully on older or simpler systems.
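The rule can be sketched as a display-time filter. The example below is a minimal illustration, not a complete implementation: `IGNORABLE_RANGES`, `is_ignorable`, and `strip_ignorable` are hypothetical names, and the range list is a hand-picked subset of the property. The authoritative list is in `DerivedCoreProperties.txt`.

```python
# Partial list for illustration only; the full Default_Ignorable_Code_Point
# set is defined in DerivedCoreProperties.txt.
IGNORABLE_RANGES = [
    (0x00AD, 0x00AD),    # SOFT HYPHEN
    (0x034F, 0x034F),    # COMBINING GRAPHEME JOINER
    (0x200B, 0x200F),    # ZWSP, ZWNJ, ZWJ, LRM, RLM
    (0x2060, 0x2064),    # WORD JOINER and invisible math operators
    (0xFE00, 0xFE0F),    # Variation Selectors 1-16
    (0xE0000, 0xE0FFF),  # Tags, Variation Selectors Supplement, reserved
]


def is_ignorable(ch: str) -> bool:
    """Return True if ch falls in one of the ranges listed above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IGNORABLE_RANGES)


def strip_ignorable(text: str) -> str:
    """Drop listed characters, as a fallback display step might."""
    return "".join(ch for ch in text if not is_ignorable(ch))


sample = "co\u00ADoperate\u200B now\uFE0F"
print(strip_ignorable(sample))   # cooperate now
```

A filter like this suits a display fallback or a search key. It should not be applied to stored text, since that would discard joining and presentation information that other systems rely on.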

Important Default Ignorable Characters

Code Point	Name	Use
U+00AD	SOFT HYPHEN (SHY)	Line-break hint; invisible unless break occurs
U+034F	COMBINING GRAPHEME JOINER	Blocks canonical reordering of combining marks across it
U+200B	ZERO WIDTH SPACE	Line-break opportunity with no width
U+200C	ZERO WIDTH NON-JOINER (ZWNJ)	Prevents cursive joining in Arabic/Persian
U+200D	ZERO WIDTH JOINER (ZWJ)	Forces cursive joining; used in emoji sequences
U+2060	WORD JOINER	Prohibits a line break at its position; like NBSP but with no width
U+2061–U+2064	Function Application, etc.	Mathematical invisible operators
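U+FE00–U+FE0F	Variation Selectors 1–16	Select a glyph variant; VS15 and VS16 select text vs. emoji presentation
U+E0000–U+E007F	Tags	Language tags (now deprecated); tag characters also appear in emoji tag sequences
U+E0100–U+E01EF	Variation Selectors 17–256	Ideographic variation sequences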

# ZWJ is used to combine emoji into sequences
family_emoji = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
# MAN + ZWJ + WOMAN + ZWJ + GIRL = 👨‍👩‍👧

print(len(family_emoji))           # 5 code points (including 2 ZWJ)
print(family_emoji)                # Renders as single family emoji on supported systems

# ZWNJ prevents cursive joining between Arabic letters
# ك + ZWNJ + ا → kaf and alef do NOT join
# ك + ا           → normal: join into ـكا

# Soft hyphen: invisible but marks a valid break point
word = "antidis\u00ADestablishment\u00ADarianism"
print(word)       # Most renderers show no hyphen unless the line breaks at a SHY
print(len(word))  # 30 code points: 28 letters plus 2 SHY

Testing for Default Ignorable

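The Unicode property Default_Ignorable_Code_Point (DI) is a derived property, listed in DerivedCoreProperties.txt. Characters with DI=Yes form a set that includes not just control and format characters but also reserved (unassigned) code points, such as U+FFF0–U+FFF8 in the Specials block and the unassigned parts of U+E0000–U+E0FFF around the Tags block.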

# Using the 'regex' package for property-based matching
import regex
di_pattern = regex.compile(r'\p{Default_Ignorable_Code_Point}')

test = "Hello\u200BWorld"   # contains ZWSP
matches = di_pattern.findall(test)
print(f"Found {len(matches)} default ignorable character(s)")
# Found 1 default ignorable character(s)
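Because the property is derived, it does not line up with any single General_Category, so a check such as "category is Cf" is only an approximation. The sketch below uses the standard `unicodedata` module to show the mismatch. The `KNOWN_DI` values are hand-copied from `DerivedCoreProperties.txt` as an assumption of this example; `unicodedata` does not expose the property itself.

```python
import unicodedata

# Hand-copied expectations (an assumption of this sketch, taken from
# DerivedCoreProperties.txt); they are not computed by unicodedata.
KNOWN_DI = {
    "\u200D": True,       # ZERO WIDTH JOINER
    "\u034F": True,       # COMBINING GRAPHEME JOINER
    "\u3164": True,       # HANGUL FILLER
    "\U000E0080": True,   # unassigned code point in the reserved range
    "\u0600": False,      # ARABIC NUMBER SIGN
}

# General_Category as reported by the standard library.
categories = {ch: unicodedata.category(ch) for ch in KNOWN_DI}

for ch, is_di in KNOWN_DI.items():
    print(f"U+{ord(ch):04X}  General_Category={categories[ch]}  DI={is_di}")
```

ZWJ and ARABIC NUMBER SIGN are both category Cf, yet only ZWJ is default ignorable. COMBINING GRAPHEME JOINER (Mn), HANGUL FILLER (Lo), and the unassigned code point (Cn) are default ignorable despite not being format characters.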

Quick Facts

Property	Value
Unicode property name	`Default_Ignorable_Code_Point`
Short alias	`DI`
Type	Boolean
Expected renderer behavior	Produce no glyph, no width
Key characters	ZWJ (U+200D), ZWNJ (U+200C), VS1–VS16, SHY
Python built-in	No direct support; use `regex` package
Spec reference	Unicode Standard Section 5.21, `DerivedCoreProperties.txt`

Related Terms

Zero-Width Character, Variation Selector

More in Properties

East Asian Width

Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, or …

Joining Type

Unicode property controlling how Arabic and Syriac characters connect to adjacent characters. …

Script Extensions

Unicode property listing all scripts that use a character, broader than the …

Grapheme Cluster

A user-perceived character: what feels like a single unit. May consist of multiple code points (base + combining marks, or an emoji ZWJ sequence). 👩‍💻 = …

Case Mapping

Rules for converting characters between uppercase, lowercase, and titlecase. May depend on locale (the Turkish I problem) and may be one-to-many (ß → SS).

Decomposition

A mapping from a character to its constituent parts. Canonical decomposition preserves meaning (é → e + ́), while compatibility decomposition may change meaning …

Combining Class

A numeric value (0–254) that controls the ordering of combining marks during canonical decomposition. It determines which combining marks can be reordered.

Compatibility Equivalence

Two character sequences that have the same abstract content but may differ in appearance. Broader than canonical equivalence. Examples: ﬁ ≈ fi, ² ≈ 2

Canonical Equivalence

Two character sequences that have the same meaning and should be treated as equivalent. Example: é (U+00E9) ≡ e + ◌́ (U+0065 + U+0301)

Mirroring Property

Characters whose shape should be mirrored horizontally in RTL contexts. Examples: ( → ), [ → ], { → }, …
