What is a Diacritic?

A mark added to a letter to change its pronunciation or meaning. It can be encoded as a precomposed character (é, U+00E9) or as a combining sequence (e + ◌́, U+0065 U+0301). Examples include accents, umlauts, cedillas, and tildes.
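Python's standard `unicodedata` module exposes the canonical decomposition mapping that links the two encodings. The minimal sketch below assumes only the standard library:

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
# The decomposition mapping lists the code points that é decomposes into
print(unicodedata.decomposition(precomposed))  # 0065 0301
# NFD applies that mapping, yielding base letter + combining mark
decomposed = unicodedata.normalize("NFD", precomposed)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0065', 'U+0301']
print(unicodedata.name(decomposed[1]))  # COMBINING ACUTE ACCENT
```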

What is a Grapheme Cluster?

A user-perceived character: something that reads as a single unit but may consist of several code points (a base plus combining marks, or an emoji ZWJ sequence). Example: 👩‍💻 (U+1F469 U+200D U+1F4BB) = 3 code points, 1 grapheme.
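Listing the code points of the emoji example shows where the count of three comes from: the middle one is the invisible ZERO WIDTH JOINER that fuses the two emoji. This sketch uses only the standard library; note that Python's `len` counts code points, not graphemes.

```python
import unicodedata

# WOMAN + ZERO WIDTH JOINER + PERSONAL COMPUTER
technologist = "\U0001F469\u200D\U0001F4BB"
print(len(technologist))  # 3 code points
for ch in technologist:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+1F469 WOMAN
# U+200D ZERO WIDTH JOINER
# U+1F4BB PERSONAL COMPUTER
```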

What is a Combining Class?

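A numeric value (0–254) assigned to every code point that governs how combining marks are ordered during normalization (the canonical ordering step that follows decomposition). Adjacent marks are reordered only when both classes are nonzero and differ.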
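In Python, `unicodedata.combining()` returns this value as an integer. A short sketch comparing a base letter with three marks that sit in different positions:

```python
import unicodedata

for ch in ["a", "\u093C", "\u0323", "\u0301"]:
    # combining() returns the canonical combining class; 0 means "not reordered"
    print(f"U+{ord(ch):04X}", unicodedata.combining(ch), unicodedata.name(ch))
# U+0061 0 LATIN SMALL LETTER A
# U+093C 7 DEVANAGARI SIGN NUKTA
# U+0323 220 COMBINING DOT BELOW
# U+0301 230 COMBINING ACUTE ACCENT
```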

Typography

Combining Character

A character that attaches to the preceding base character to modify it. General Categories: Mn (nonspacing), Mc (spacing combining), Me (enclosing). Example: ◌́ (U+0301 COMBINING ACUTE ACCENT).

2023-04-10 · Updated 2024-07-04

What is a Combining Character?

A combining character is a Unicode character that has no independent visual form of its own — instead, it attaches to and modifies the preceding character (called the base character). Combining characters implement diacritical marks, tone marks, vowel signs, and other modifier symbols that, in Unicode's model, are logically separate from the letter they modify.

The key insight is that Unicode separates the identity of a character from the code point sequence that encodes it. A letter with a diacritic can be represented either as a single precomposed code point (é = U+00E9) or as a base letter followed by a combining mark (e + ◌́ = U+0065 U+0301). The two sequences are canonically equivalent: they represent the same abstract character and should render identically, but they remain different code point sequences until normalized.

How Combining Characters Work

Combining characters have a General Category of Mn (Nonspacing Mark), Mc (Spacing Mark), or Me (Enclosing Mark). Nonspacing marks are the most common: they occupy zero advance width, and the shaping engine positions them relative to the base glyph using anchor points defined in the font.
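Because all three mark categories begin with "M", a category check is a simple way to test whether a code point is a combining character. A minimal sketch; `is_combining` is a hypothetical helper name, not a standard-library function:

```python
import unicodedata

def is_combining(ch: str) -> bool:
    """True if ch has General Category Mn, Mc, or Me (all start with "M")."""
    return unicodedata.category(ch).startswith("M")

# A base letter for contrast, then one example per mark category
for ch in ["e", "\u0301", "\u0903", "\u20DD"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_combining(ch), unicodedata.name(ch))
# U+0065 Ll False LATIN SMALL LETTER E
# U+0301 Mn True COMBINING ACUTE ACCENT
# U+0903 Mc True DEVANAGARI SIGN VISARGA
# U+20DD Me True COMBINING ENCLOSING CIRCLE
```

The category and the combining class are separate properties: the visarga (U+0903) is a mark, yet its combining class is 0.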

When a text shaping engine encounters a base character followed by one or more combining marks, it:
1. Retrieves the base glyph from the font
2. Positions each combining glyph at the appropriate anchor (top, bottom, left, right)
3. Renders the result as a single grapheme cluster

Multiple combining characters can stack on a single base:

a + ◌̂ + ◌̄ = â̄  (a with circumflex and macron)
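The stacked example can be built directly from code points. This sketch also shows what NFC does with it, assuming (as is the case in current Unicode) that there is no precomposed "a with circumflex and macron":

```python
import unicodedata

# a + COMBINING CIRCUMFLEX ACCENT + COMBINING MACRON
stacked = "a\u0302\u0304"
print(len(stacked))  # 3 code points, rendered as one stacked letter

# NFC composes a + circumflex into U+00E2 (â), but no precomposed
# "a with circumflex and macron" exists, so the macron stays combining
nfc = unicodedata.normalize("NFC", stacked)
print([f"U+{ord(c):04X}" for c in nfc])  # ['U+00E2', 'U+0304']
```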

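Unicode's canonical ordering algorithm sorts adjacent combining marks by their Canonical Combining Class (0–254). Marks with lower values, such as nuktas (class 7) or below-base marks (class 220), are placed before marks with higher values, such as above-base accents (class 230), in normalized text. Marks with class 0 are never reordered, and marks with the same class keep their relative order because that order is visually significant.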
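A short sketch of the ordering rule using Python's `unicodedata.normalize`: the same two marks typed in opposite orders become identical after normalization, while two marks of equal class stay distinct.

```python
import unicodedata

above_first = "e\u0301\u0323"  # acute (class 230), then dot below (class 220)
below_first = "e\u0323\u0301"  # dot below (class 220), then acute (class 230)
print(above_first == below_first)  # False: different code point order

# NFD sorts adjacent marks by class, so both become e + dot below + acute
print(unicodedata.normalize("NFD", above_first) == below_first)  # True
print(unicodedata.normalize("NFC", above_first) == unicodedata.normalize("NFC", below_first))  # True

# Circumflex and macron are both class 230, so their order is preserved
print(unicodedata.normalize("NFD", "a\u0302\u0304") == unicodedata.normalize("NFD", "a\u0304\u0302"))  # False
```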

Important Combining Character Ranges

Range	Name	Contents
U+0300–U+036F	Combining Diacritical Marks	Accents, umlauts, tildes
U+0591–U+05C7	Hebrew Cantillation/Vowels	Nikud, cantillation marks
U+064B–U+065F	Arabic Diacritics	Harakat (vowel marks)
U+1AB0–U+1AFF	Combining Diacritical Marks Extended	Extended phonetic use
U+1DC0–U+1DFF	Combining Diacritical Marks Supplement	Additional marks
U+20D0–U+20FF	Combining Diacritical Marks for Symbols	Used with math symbols
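The ranges in the table can be checked against the character database. The sketch below counts how many code points in each range are marks; `count_marks` is a hypothetical helper, and the counts depend on the Unicode version bundled with your Python. Unassigned code points and non-mark characters (for example U+05BE HEBREW PUNCTUATION MAQAF in the Hebrew range) are not counted.

```python
import unicodedata

# Ranges copied from the table (start, end inclusive)
RANGES = {
    "Combining Diacritical Marks": (0x0300, 0x036F),
    "Hebrew Cantillation/Vowels": (0x0591, 0x05C7),
    "Arabic Diacritics": (0x064B, 0x065F),
    "Combining Diacritical Marks Extended": (0x1AB0, 0x1AFF),
    "Combining Diacritical Marks Supplement": (0x1DC0, 0x1DFF),
    "Combining Diacritical Marks for Symbols": (0x20D0, 0x20FF),
}

def count_marks(start: int, end: int) -> int:
    """Count code points in [start, end] whose General Category is Mn, Mc, or Me."""
    return sum(unicodedata.category(chr(cp)).startswith("M") for cp in range(start, end + 1))

for name, (start, end) in RANGES.items():
    total = end - start + 1
    print(f"U+{start:04X}..U+{end:04X} {name}: {count_marks(start, end)} of {total} are marks")
```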

Grapheme Clusters

A base character plus all its combining marks form a grapheme cluster, the unit that users perceive as a single character. Code that counts, slices, or reverses strings must account for this:

import unicodedata

# "e" + combining acute = é
s = "e\u0301"
print(len(s))           # 2 (two code points)
print(s)                # é (looks like 1 character)

# Precomposed é
s2 = "\u00e9"
print(len(s2))          # 1 (one code point)
print(s == s2)          # False! Different code points

# Normalize to NFC (precomposed form) before comparing
print(unicodedata.normalize("NFC", s) == s2)  # True

JavaScript's Intl.Segmenter and Swift's String.count handle grapheme clusters correctly; many other APIs count code points (Python's len) or UTF-16 code units (JavaScript's String.prototype.length, Java's String.length()) instead.
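Python's standard library has no grapheme segmenter; full UAX #29 segmentation needs something like the third-party `regex` module. The sketch below is a deliberately simplified approximation covering only the base-plus-marks case described here: it starts a new cluster at every code point that is not a combining mark. It does not handle ZWJ emoji sequences, Hangul syllables, or flag sequences. The names `simple_clusters` and `truncate_clusters` are hypothetical.

```python
import unicodedata

def simple_clusters(text: str) -> list[str]:
    """Split text into base + trailing combining marks (Mn/Mc/Me).

    Simplified approximation of grapheme clusters, not full UAX #29.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # attach the mark to the preceding base
        else:
            clusters.append(ch)  # a non-mark starts a new cluster
    return clusters

def truncate_clusters(text: str, limit: int) -> str:
    """Keep at most `limit` clusters so no mark is cut off from its base."""
    return "".join(simple_clusters(text)[:limit])

word = "cafe\u0301"  # "café" spelled with a combining acute
print(len(word))                   # 5 code points
print(len(simple_clusters(word)))  # 4 clusters
print(word[:4])                    # cafe  (naive slicing drops the accent)
print(truncate_clusters(word, 4))  # café  (accent kept with its base)
```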

Quick Facts

Property	Value
Unicode category	Mn (Non-spacing Mark), Mc (Spacing Mark), Me (Enclosing Mark)
Main combining block	U+0300–U+036F (112 characters)
Combining class range	0 (Not_Reordered: base characters and some marks) to 254
Max stacking	No hard limit in Unicode (UAX #15's Stream-Safe Text Format caps a run at 30 non-starters); most fonts position only 2–4 stacked marks cleanly
Normalization	NFC = precomposed preferred; NFD = fully decomposed
Grapheme cluster API	Python: third-party `regex` module (`\X`); JS: `Intl.Segmenter`; Swift: `String`
Visual indicator in charts	Often shown as ◌ (dotted circle) placeholder

Related Terms

Diacritic · Grapheme Cluster · Combining Class

More in Typography

CSS unicode-range

CSS @font-face descriptor specifying which Unicode code points a font should cover. …

Em / En (typographic units)

Em: a width equal to the font size. En: half an em. Used to define the widths of the em dash, em space, en space …

Font Fallback

The mechanism by which a rendering engine substitutes glyphs from a secondary …

OpenType

Modern font format developed by Microsoft and Adobe supporting up to 65,535 …

RTL (Right-to-Left)

The text direction in which characters flow from right to left, used for Arabic, Hebrew, Thaana, and other scripts. Correct display requires the Unicode Bidirectional Algorithm.

Web Fonts

Fonts downloaded by the browser to render text, declared via CSS @font-face. …

Kerning

Adjusting the spacing between specific character pairs for visual balance (e.g., AV, To, LT). Kerning is a font feature, not a Unicode concept, but it affects how Unicode text is rendered.

Ellipsis

U+2026 HORIZONTAL ELLIPSIS (…): a single character that replaces three periods. It is typographically correct and counts as 1 character instead of 3.

Non-Breaking Space

U+00A0: a space that prevents a line break at that position. HTML: &nbsp;. Used between numbers and units (100 km), in names (Mr. Smith), and after abbreviations.

Small Caps

Capital letterforms drawn at the height of lowercase letters. CSS: font-variant: small-caps. Unicode also encodes true small-capital letters (ᴀ–ᴢ), mostly in the Phonetic Extensions block.
