What is General Category?

A classification of every code point into one of 30 categories (Lu, Ll, Nd, So, etc.), grouped into 7 major classes: Letter, Mark, Number, Punctuation, Symbol, Separator, and Other.

What is a Dash?

A punctuation mark used to separate parts of a sentence or to indicate a range. Unicode defines several dash characters, including hyphen (‐), en dash (–), em dash (—), figure dash (‒), and others.

What is a Quotation Mark?

Paired punctuation marks that enclose direct speech or quoted text. Unicode includes straight quotes (" "), curly quotes (“ ”), guillemets (« »), CJK corner brackets (「」), and language-specific styles.

Properties

Punctuation

Characters used to organize and clarify written language, including periods, commas, dashes, quotation marks, and others. The Unicode General Category P covers all punctuation marks.

2022-07-30 · Updated 2024-12-19

What Is Unicode Punctuation?

Unicode organizes punctuation characters under the top-level General Category P (Punctuation), subdivided into seven specific categories. Punctuation encompasses marks used to structure written language: paired delimiters, dashes, connectors, and miscellaneous marks that vary widely across the world's writing systems.

Unlike Latin-centric definitions of punctuation, Unicode's coverage includes CJK ideographic punctuation, Arabic punctuation such as the Arabic comma (U+060C), the Ethiopic wordspace (U+1361), and hundreds of other script-specific marks.

The Seven Punctuation Subcategories

Code	Name	Examples
Pc	Connector Punctuation	_ (LOW LINE), ‿ (UNDERTIE)
Pd	Dash Punctuation	- (HYPHEN-MINUS), – (EN DASH), — (EM DASH), ― (HORIZONTAL BAR)
Ps	Open Punctuation	( [ { ⟨ 「《
Pe	Close Punctuation	) ] } ⟩ 」》
Pi	Initial Punctuation	“ (LEFT DOUBLE QUOTATION MARK), « (LEFT-POINTING DOUBLE ANGLE QUOTATION MARK)
Pf	Final Punctuation	” (RIGHT DOUBLE QUOTATION MARK), » (RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK)
Po	Other Punctuation	! . , : ; ? @ # % & * / \ (and many more)

import unicodedata

punctuation_samples = [
    ("_",  "Pc - connector"),
    ("-",  "Pd - hyphen-minus"),
    ("–",  "Pd - en dash U+2013"),
    ("(",  "Ps - open paren"),
    (")",  "Pe - close paren"),
    ("「", "Ps - CJK left corner bracket"),
    ("」", "Pe - CJK right corner bracket"),
    ("\u201C", "Pi - left double quotation"),
    ("\u201D", "Pf - right double quotation"),
    ("«",  "Pi - left-pointing angle quotation"),
    ("»",  "Pf - right-pointing angle quotation"),
    ("。", "Po - CJK ideographic full stop"),
    ("，", "Po - fullwidth comma"),
    ("!",  "Po - exclamation mark"),
]

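# Print each sample's code point, the character itself, and the General
# Category reported by unicodedata, next to the expected label.
for char, label in punctuation_samples:
    gc = unicodedata.category(char)
    print(f"  U+{ord(char):04X}  {char}  {gc}  {label}")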
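
The same lookup can be turned around to sort the punctuation found in arbitrary text into its subcategories. The following is a minimal sketch; `group_punctuation` and the sample string are hypothetical names introduced only for illustration.

```python
import unicodedata
from collections import defaultdict


def group_punctuation(text: str) -> dict[str, list[str]]:
    """Group the punctuation characters in `text` by General Category.

    Hypothetical helper for illustration; characters whose category does
    not start with "P" are ignored.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for char in text:
        gc = unicodedata.category(char)
        if gc.startswith("P"):
            groups[gc].append(char)
    return dict(groups)


sample = "«Hello» (world)_「テスト」 - done!"
for gc, chars in sorted(group_punctuation(sample).items()):
    print(gc, " ".join(chars))
```

Grouping on the two-letter code keeps opening and closing delimiters apart (Ps vs Pe, Pi vs Pf), which is useful when checking that paired marks are balanced.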

CJK and Fullwidth Punctuation

East Asian writing uses a distinct set of punctuation marks designed for fullwidth layout, where each mark occupies the same square cell as an ideograph:

。 (U+3002) IDEOGRAPHIC FULL STOP — sentence terminator in Chinese and Japanese
、 (U+3001) IDEOGRAPHIC COMMA — list separator
「」 (U+300C, U+300D) — Japanese corner brackets for quotation
『』 (U+300E, U+300F) — double corner brackets for nested quotation
· (U+00B7) vs ・ (U+30FB) — Latin middle dot vs Katakana middle dot

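Fullwidth forms of the ASCII characters (U+FF01–U+FF5E, which include the fullwidth punctuation marks) are compatibility equivalents of their ASCII counterparts; NFKC normalization maps them back to ASCII. Ideographic marks such as 。 (U+3002) have no such mapping and are left unchanged.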
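
As a rough illustration of that folding, the sketch below applies `unicodedata.normalize("NFKC", ...)` to a few fullwidth marks. The helper name `fold_fullwidth` is an assumption made for this example, and NFKC also rewrites many other compatibility characters (ligatures, superscripts, and so on), so it may be too aggressive for some pipelines.

```python
import unicodedata


def fold_fullwidth(text: str) -> str:
    """Apply NFKC, which maps fullwidth ASCII variants to plain ASCII.

    Hypothetical helper; note that NFKC affects every compatibility
    character in the text, not only fullwidth punctuation.
    """
    return unicodedata.normalize("NFKC", text)


fullwidth = "\uFF01\uFF08\uFF09\uFF0C"  # fullwidth ! ( ) ,
print(fold_fullwidth(fullwidth))

# U+3002 IDEOGRAPHIC FULL STOP has no compatibility decomposition,
# so it passes through NFKC unchanged.
print(fold_fullwidth("\u3002") == "\u3002")
```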

Distinguishing Similar Punctuation

Many punctuation characters look similar but have different code points, names, and semantic roles:

Character	Code Point	Name
-	U+002D	HYPHEN-MINUS
‐	U+2010	HYPHEN
–	U+2013	EN DASH
—	U+2014	EM DASH
―	U+2015	HORIZONTAL BAR
'	U+0027	APOSTROPHE
‘	U+2018	LEFT SINGLE QUOTATION MARK
’	U+2019	RIGHT SINGLE QUOTATION MARK

Smart quote substitution in word processors converts ASCII ' and " to their Pi/Pf counterparts. This can cause problems in code contexts where the apostrophe is syntactically significant.
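
One way to undo such substitution before text reaches a parser or shell is a small translation table. This is only a sketch: the `SMART_QUOTES` mapping and the `straighten_quotes` helper are hypothetical, and the table covers just the four most common curly quotes rather than every Pi/Pf character.

```python
import unicodedata

# Hypothetical mapping from common curly quotes to their ASCII forms.
SMART_QUOTES = {
    0x2018: "'",  # LEFT SINGLE QUOTATION MARK (Pi)
    0x2019: "'",  # RIGHT SINGLE QUOTATION MARK (Pf)
    0x201C: '"',  # LEFT DOUBLE QUOTATION MARK (Pi)
    0x201D: '"',  # RIGHT DOUBLE QUOTATION MARK (Pf)
}


def straighten_quotes(text: str) -> str:
    """Replace the curly quotes listed in SMART_QUOTES with ASCII quotes."""
    return text.translate(SMART_QUOTES)


pasted = "print(\u201Cit\u2019s here\u201D)"
print(straighten_quotes(pasted))

# Show why the originals differ from ASCII: distinct names and categories.
for cp in SMART_QUOTES:
    ch = chr(cp)
    print(f"U+{cp:04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}")
```

Whether U+2019 should become an apostrophe depends on context; in prose it is often the preferred apostrophe, so a blanket replacement is best limited to code and configuration input.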

Quick Facts

Property	Value
General Category group	P (Punctuation)
Subcategories	Pc, Pd, Ps, Pe, Pi, Pf, Po (7)
Python check	`unicodedata.category(char).startswith("P")`
Regex	`\p{Punctuation}` or `\p{P}` (PCRE; in Python, the third-party `regex` module, since the built-in `re` does not support `\p{...}`)
CJK punctuation block	U+3000–U+303F (CJK Symbols and Punctuation)
Fullwidth punctuation	within U+FF01–U+FF60 (compatibility equivalents; the range also holds fullwidth digits, letters, and symbols)
Spec reference	Unicode Standard Chapter 6
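
The category check in the table can be wrapped in a small predicate. The sketch below (with the hypothetical helper `is_punctuation`) also shows a common surprise: several characters in Python's `string.punctuation` belong to the Symbol (S) categories in Unicode, not to P.

```python
import string
import unicodedata


def is_punctuation(char: str) -> bool:
    """Return True if `char` has a General Category starting with "P"."""
    return unicodedata.category(char).startswith("P")


# ASCII characters that Python lists as punctuation but Unicode
# classifies as symbols (Sc, Sm, or Sk).
ascii_non_p = [c for c in string.punctuation if not is_punctuation(c)]
for c in ascii_non_p:
    print(c, unicodedata.category(c))
```

Code that strips "punctuation" therefore behaves differently depending on which definition it uses; filtering on category P leaves characters such as `$` and `+` in place.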

Related Terms

General Category, Dash, Quotation Mark

More in Properties

East Asian Width

Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, or …

Joining Type

Unicode property controlling how Arabic and Syriac characters connect to adjacent characters. …

Script Extensions

Unicode property listing all scripts that use a character, broader than the …

Grapheme Cluster

A user-perceived character: something that feels like a single unit but may consist of multiple code points (a base plus combining marks, or an emoji ZWJ sequence). 👩‍💻 = …

Case Mapping

Rules for converting characters between uppercase, lowercase, and titlecase. Mappings can be locale-dependent (the Turkish I problem) and one-to-many (ß → SS).

Decomposition

The mapping of a character to its constituent parts. Canonical decomposition preserves meaning (é → e + ́), while compatibility decomposition may change meaning …

Combining Class

A numeric value (0–254) that controls the ordering of combining marks during canonical decomposition. It determines which combining marks can be reordered.

Compatibility Equivalence

Two character sequences that have the same abstract content but may differ in appearance. Broader than canonical equivalence. Examples: ﬁ ≈ fi, ² ≈ 2

Canonical Equivalence

Two character sequences that have the same meaning and should be treated as equivalent. Example: é (U+00E9) ≡ e + ◌́ (U+0065 + U+0301)

Mirroring Property

Characters whose glyphs should be mirrored horizontally in RTL contexts. Examples: ( → ), [ → ], { → }, …
