What is a Combining Character?

A character that attaches to the preceding base character to modify it. General categories: Mn (nonspacing), Mc (spacing combining), Me (enclosing). Example: ◌́ (U+0301 Combining Acute Accent).
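As a small illustration of the three general categories, the sketch below looks them up with Python's standard `unicodedata` module. The three sample code points are chosen here for illustration and are not taken from the entry above, apart from U+0301.

```python
import unicodedata

# One sample code point per mark category (samples chosen for illustration).
samples = {
    "\u0301": "COMBINING ACUTE ACCENT",
    "\u0903": "DEVANAGARI SIGN VISARGA",
    "\u20dd": "COMBINING ENCLOSING CIRCLE",
}

# Map each sample to its two-letter general category.
categories = {ch: unicodedata.category(ch) for ch in samples}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {categories[ch]}")
```

All three categories start with `M`, so `unicodedata.category(ch).startswith("M")` is a common way to test whether a code point is a combining mark of any kind.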

What is Canonical Equivalence?

Two code point sequences that represent the same text with identical meaning and should be treated as the same. Example: é (U+00E9) ≡ e + ◌́ (U+0065 + U+0301).

What is Normalization?

The process of converting Unicode text to a standard canonical form. Four forms: NFC (composed), NFD (decomposed), NFKC (compatibility composed), NFKD (compatibility decomposed).
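To see how the four forms differ, the following sketch normalizes one short string with Python's `unicodedata.normalize`. The sample string is an assumption chosen for illustration: it mixes a compatibility ligature (ﬁ, U+FB01) with a precomposed é (U+00E9).

```python
import unicodedata

# "ﬁancé" written with the ﬁ ligature (U+FB01) and precomposed é (U+00E9).
sample = "\ufb01anc\u00e9"

# Apply each of the four normalization forms to the same input.
forms = {
    name: unicodedata.normalize(name, sample)
    for name in ("NFC", "NFD", "NFKC", "NFKD")
}

for name, text in forms.items():
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{name}: {len(text)} code points: {code_points}")
```

The canonical forms (NFC, NFD) keep the ligature intact, while the compatibility forms (NFKC, NFKD) replace it with plain `f` + `i`. That replacement is lossy, so the compatibility forms are usually reserved for searching and comparison rather than storage.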

Typography

Diacritical Mark

A mark added to a letter to change its pronunciation or meaning. It can be precomposed (é U+00E9) or combining (e + ◌́ U+0065 + U+0301). Examples include the grave accent, umlaut, cedilla, and tilde.

2023-02-20 · Updated 2024-08-07

What is a Diacritical Mark?

A diacritical mark (also called a diacritic) is a small sign added to a letter to modify its pronunciation, indicate stress, distinguish words that would otherwise be spelled identically, or mark grammatical features. Diacritical marks are integral to many languages written in the Latin, Greek, Cyrillic, Arabic, and Hebrew scripts, among others.

Common examples in Latin-script languages include the acute accent (é), grave accent (è), circumflex (ê), umlaut (ü), tilde (ñ), cedilla (ç), and the ring above (å). These are not decorations — they represent distinct sounds and often change the meaning of a word entirely.

Precomposed vs. Combining Forms

Unicode encodes diacritical characters in two ways:

Precomposed characters are single code points that combine a base letter and its diacritic. For example, é is U+00E9 (a single code point). These exist for compatibility with legacy encodings and convenience.

Combining characters are separate diacritical marks (for Latin-script text, mostly from the U+0300–U+036F block) that attach to the preceding base character. The same é can be represented as U+0065 (e) followed by U+0301 (combining acute accent).

Both representations are canonically equivalent — Unicode Normalization Form C (NFC) prefers precomposed forms, while NFD decomposes them into base + combining sequences.

Diacritic	Precomposed	Base + Combining
é (e acute)	U+00E9	U+0065 + U+0301
ü (u umlaut)	U+00FC	U+0075 + U+0308
ñ (n tilde)	U+00F1	U+006E + U+0303
ç (c cedilla)	U+00E7	U+0063 + U+0327
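The rows in the table can be checked with a short Python sketch. It uses the standard `unicodedata` module; `canonically_equal` is a hypothetical helper name introduced here for illustration.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
combining = "e\u0301"    # e followed by the combining acute accent

# A plain comparison sees two different code point sequences.
print(precomposed == combining)                                # False
# After normalization the two representations match.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == combining)  # True


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The (precomposed, base + combining) pairs from the table.
table_rows = [
    ("\u00e9", "e\u0301"),
    ("\u00fc", "u\u0308"),
    ("\u00f1", "n\u0303"),
    ("\u00e7", "c\u0327"),
]
print(all(canonically_equal(p, c) for p, c in table_rows))     # True
```

Normalizing both sides before comparing is the safer default whenever strings come from different sources, such as user input compared against stored data.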

Common Diacritical Marks

Mark	Name	Example	Used In
´	Acute accent	é, á, ó	French, Spanish, Portuguese, many others
`	Grave accent	è, à, ù	French, Italian
^	Circumflex	ê, â, ô	French, Romanian
¨	Diaeresis/Umlaut	ü, ö, ä	German, French, Swedish
~	Tilde	ñ, ã, õ	Spanish, Portuguese
¸	Cedilla	ç, ş	French, Portuguese, Turkish
˚	Ring above	å, ů	Swedish, Norwegian, Czech
ˇ	Caron (háček)	č, š, ž	Czech, Slovak, Slovenian

Typing Diacritical Marks

macOS: Hold a key to see a popover (e.g., hold e to choose é, è, ê). Or use Option key combos: Option+E then E = é.

Windows: Use Alt codes, the Character Map app, or configure a locale keyboard layout.

HTML entities:

&eacute;   <!-- é -->
&Uuml;     <!-- Ü -->
&ntilde;   <!-- ñ -->
&ccedil;   <!-- ç -->

Unicode escape:

"\u00e9"  # é in Python
"\u00fc"  # ü
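Python also accepts named escapes and `chr()` for the same characters. The sketch below shows the equivalent spellings and is only meant to illustrate the options; the variable names are illustrative.

```python
import unicodedata

by_hex = "\u00e9"                                # four-digit hex escape
by_name = "\N{LATIN SMALL LETTER E WITH ACUTE}"  # escape by Unicode character name
by_chr = chr(0x00E9)                             # built from the code point number

# All three spellings produce the same one-character string.
print(by_hex == by_name == by_chr)               # True

# The combining form is written as two escapes and has length 2.
combining_seq = "e\N{COMBINING ACUTE ACCENT}"
print(len(by_hex), len(combining_seq))           # 1 2

# unicodedata.name() goes the other way, from character to name.
print(unicodedata.name(by_hex))                  # LATIN SMALL LETTER E WITH ACUTE
```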

Quick Facts

Property	Value
Unicode block (combining)	Combining Diacritical Marks: U+0300–U+036F (112 characters)
Unicode block (extended)	Combining Diacritical Marks Extended: U+1AB0–U+1AFF
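Precomposed Latin ranges	Latin-1 Supplement letters U+00C0–U+00FF; also Latin Extended-A (U+0100–U+017F) and Latin Extended Additional (U+1E00–U+1EFF)
Normalization preference	NFC (precomposed) for storage and interchange; NFD when marks need to be processed separately from base letters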
Languages with most diacritics	Vietnamese (5 tone marks + vowel marks), Czech, Polish
Nonspacing diacritics	Nonspacing combining marks (Mn) attach to the base character without adding width of their own
Stacking	Multiple combining marks can stack on one base character
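The stacking and NFD rows can be illustrated with a short Python sketch. `count_marks` and `strip_marks` are hypothetical helper names introduced here, and removing marks is shown only as one example of processing decomposed text; it changes meaning in many languages and is not a general-purpose transliteration.

```python
import unicodedata


def count_marks(text: str) -> int:
    """Count combining marks (categories Mn, Mc, Me) after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return sum(1 for ch in decomposed if unicodedata.category(ch).startswith("M"))


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed if not unicodedata.category(ch).startswith("M")
    )


# Vietnamese ệ (U+1EC7) stacks two marks on one base letter:
# a dot below (U+0323) and a circumflex (U+0302).
print(count_marks("\u1ec7"))       # 2
print(strip_marks("Vi\u1ec7t"))    # Viet
print(strip_marks("Caf\u00e9"))    # Cafe
```

Letters that have no canonical decomposition, such as đ (U+0111) or ø (U+00F8), pass through `strip_marks` unchanged, because their strokes are part of the letter rather than combining marks.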