Properties

Decomposition

A mapping from a character to its constituent parts. Canonical decomposition preserves meaning (é → e + ◌́); compatibility decomposition may change it (ﬁ → fi).

What Is a Decomposition Mapping?

A decomposition mapping tells you how a Unicode character can be broken down into a sequence of simpler characters. There are two kinds:

  • Canonical decomposition: the character is identical in meaning and rendering to its decomposed sequence. For example, U+00E9 LATIN SMALL LETTER E WITH ACUTE (é) canonically decomposes to U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT.
  • Compatibility decomposition: the character is only compatible (semantically similar, possibly different in appearance) with its decomposed sequence. For example, U+FB01 LATIN SMALL LIGATURE FI (ﬁ) compatibility-decomposes to U+0066 f + U+0069 i, and U+00B2 SUPERSCRIPT TWO (²) compatibility-decomposes to U+0032 2.
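The difference between the two kinds shows up when comparing strings. The following is a minimal sketch, not a definitive implementation; the helper names `canonically_equivalent` and `compatibility_equivalent` are hypothetical and only wrap the standard `unicodedata.normalize` call.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b share the same NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b share the same NFKD form."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


precomposed = "\u00E9"   # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT
ligature = "\uFB01"      # LATIN SMALL LIGATURE FI

print(precomposed == decomposed)                         # False: different code points
print(canonically_equivalent(precomposed, decomposed))   # True
print(canonically_equivalent(ligature, "fi"))            # False: only a compatibility mapping
print(compatibility_equivalent(ligature, "fi"))          # True
```

A plain `==` compares code points, so canonically equivalent strings only compare equal once both sides are brought to the same normalization form.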

Normalization Forms

The four Unicode Normalization Forms are defined in terms of decomposition and canonical composition:

Form   Decomposition   Composition
NFD    Canonical       No
NFC    Canonical       Yes (canonical)
NFKD   Compatibility   No
NFKC   Compatibility   Yes (canonical)

import unicodedata

samples = [
    ("\u00E9", "é  e+acute"),        # canonical
    ("\u00C5", "Å  A+ring"),         # canonical
    ("\uFB01", "ﬁ  fi ligature"),    # compatibility
    ("\u00B2", "²  superscript 2"),  # compatibility
    ("\u2126", "Ω  OHM SIGN"),       # canonical → U+03A9 GREEK CAPITAL LETTER OMEGA
]

for char, label in samples:
    raw = unicodedata.decomposition(char)
    nfd = unicodedata.normalize("NFD", char)
    nfkd = unicodedata.normalize("NFKD", char)
    nfc = unicodedata.normalize("NFC", nfd)
    print(f"  {label}")
    print(f"    decomposition() raw : {raw!r}")
    print(f"    NFD  : {[f'U+{ord(c):04X}' for c in nfd]}")
    print(f"    NFKD : {[f'U+{ord(c):04X}' for c in nfkd]}")
    print(f"    NFC  : {[f'U+{ord(c):04X}' for c in nfc]}")

The unicodedata.decomposition() function returns the raw decomposition field from UnicodeData.txt as a string, or an empty string if the character has no decomposition mapping. A leading tag in angle brackets, such as <compat>, <font>, <circle>, or <wide>, indicates a compatibility decomposition; no tag means canonical.
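The raw string can be split into its tag and code points. This is a small sketch under the assumption that the field always has the shape "optional <tag>, then space-separated hex code points"; the function name `parse_decomposition` and the labels "none" and "canonical" are hypothetical, chosen here for illustration.

```python
import unicodedata
from typing import List, Tuple


def parse_decomposition(char: str) -> Tuple[str, List[int]]:
    """Split unicodedata.decomposition(char) into (kind, code points).

    kind is "none" when there is no mapping, "canonical" when the raw
    string has no tag, and otherwise the tag name without angle brackets.
    """
    raw = unicodedata.decomposition(char)
    if not raw:
        return "none", []
    fields = raw.split()
    kind = "canonical"
    if fields[0].startswith("<"):
        kind = fields[0].strip("<>")   # e.g. "<compat>" -> "compat"
        fields = fields[1:]
    return kind, [int(field, 16) for field in fields]


for ch in ("\u00E9", "\uFB01", "\u00B2", "A"):
    kind, code_points = parse_decomposition(ch)
    print(f"U+{ord(ch):04X}", kind, [f"U+{cp:04X}" for cp in code_points])
```

Note that the mapping is a single step: the code points it returns may themselves have decompositions, which is why full normalization should go through `unicodedata.normalize` rather than this field alone.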

Practical Implications

  • Search and indexing: NFKC normalization lets you match ﬁle (written with the U+FB01 ligature) against file, or ² against 2. Search systems commonly apply NFKC, or a similar folding, before indexing.
  • Security: Normalization collapses canonically equivalent characters. U+2126 Ω (OHM SIGN) and U+03A9 Ω (GREEK CAPITAL LETTER OMEGA) look identical and are canonically equivalent, so an application that compares usernames should normalize both sides first.
  • Identifiers: Python 3 uses NFKC for identifier normalization (PEP 3131).
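The search and username cases can be sketched as a single comparison key. This is an illustrative assumption rather than how any particular search engine works: the hypothetical `match_key` helper applies NFKC, then case folding, then NFKC again in case folding produced a non-normalized result.

```python
import unicodedata


def match_key(text: str) -> str:
    """Hypothetical comparison key: NFKC, case-fold, then NFKC again."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return unicodedata.normalize("NFKC", folded)


# Ligature vs. plain letters, ignoring case.
print(match_key("\uFB01le") == match_key("File"))

# Superscript two vs. the digit 2.
print(match_key("x\u00B2") == match_key("X2"))

# OHM SIGN vs. GREEK CAPITAL LETTER OMEGA.
print(match_key("\u2126") == match_key("\u03A9"))
```

All three comparisons print True. A key like this only handles characters related by decomposition and case; look-alikes from different scripts (for example Latin "a" and Cyrillic "а") are not unified by normalization and need a separate confusables check.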

Quick Facts

Property                 Value
Unicode property name    Decomposition_Mapping
Short alias              dm
Types                    Canonical, Compatibility (16 tags: <compat>, <font>, <circle>, etc.)
Python function          unicodedata.decomposition(char) → raw string
Normalization function   unicodedata.normalize(form, string)
Forms                    NFD, NFC, NFKD, NFKC
Spec reference           Unicode Standard Annex #15 (UAX #15)

Related Terms

More in Properties

East Asian Width

Unicode property (UAX#11) classifying characters as Narrow, Wide, Fullwidth, Halfwidth, Ambiguous, or …

Joining Type

Unicode property controlling how Arabic and Syriac characters connect to adjacent characters. …

Script Extensions

Unicode property listing all scripts that use a character, broader than the …

Case Mapping

Rules for converting characters between uppercase, lowercase, and titlecase. …

Grapheme Cluster

A user-perceived 'character', the thing that as a single unit …

Default Ignorable

Characters that should have no visible effect and that, by processes which …

Mirroring Property

Characters whose glyph is horizontally mirrored in an RTL context …

Bidirectional Class

Property for a character's behavior in bidirectional text (LTR, RTL, weak, neutral) …

Name Alias

Alternative names for characters, because under the Unicode name stability policy …

Block

A named contiguous range of code points (e.g., Basic Latin = U+0000–U+007F). …