
Devanagari Script Deep Dive

Devanagari is an abugida script used to write Hindi, Sanskrit, Marathi, and many other South Asian languages, with complex conjunct consonants and vowel diacritics that challenge text rendering engines. This guide explores the Devanagari Unicode block, how the script works, and how to render it correctly in software.


Devanagari is one of the world's most important writing systems, used by over 600 million people across South Asia. It is the primary script for Hindi, the most widely spoken language in India, as well as Sanskrit, Marathi, Nepali, and dozens of other languages. Unlike the Latin alphabet, Devanagari is an abugida — a writing system where each consonant carries an inherent vowel that can be modified or suppressed by diacritical marks. This design creates a rich system of vowel signs (matras), conjunct consonants, and stacking behaviors that make Devanagari one of the most technically demanding scripts for Unicode and text rendering engines.

History and Significance

Devanagari evolved from the Brahmi script, the ancestor of virtually all South Asian and Southeast Asian writing systems. The name "Devanagari" comes from Sanskrit: deva (divine) and nagari (city), suggesting "script of the divine city." The modern form of Devanagari stabilized around the 10th–12th centuries CE.

The script gained special significance as the writing system of Sanskrit, the classical language of Hindu philosophy, science, and literature. Today it is used for:

Language   Speakers    Country
Hindi      600M+       India
Marathi    83M+        India (Maharashtra)
Nepali     25M+        Nepal
Sanskrit   Liturgical  India (classical)
Bhojpuri   50M+        India, Nepal
Maithili   34M+        India, Nepal
Konkani    7M+         India (Goa)
Dogri      3M+         India (Jammu)
Bodo       1.5M+       India (Assam)

The Devanagari Writing System

Vowels (Svar)

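The table below lists the 11 vowels used in modern Hindi; traditional charts often count 13 by adding the anusvara (अं) and visarga (अः) forms. Each vowel has an independent form (used at the start of a word or syllable) and a dependent form (matra) used after a consonant: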

Independent  Matra       Name       Sound  Unicode
अ            (inherent)  A          /a/    U+0905
आ            ा           AA         /aː/   U+0906 / U+093E
इ            ि           I          /i/    U+0907 / U+093F
ई            ी           II         /iː/   U+0908 / U+0940
उ            ु           U          /u/    U+0909 / U+0941
ऊ            ू           UU         /uː/   U+090A / U+0942
ऋ            ृ           VOCALIC R  /ri/   U+090B / U+0943
ए            े           E          /eː/   U+090F / U+0947
ऐ            ै           AI         /ai/   U+0910 / U+0948
ओ            ो           O          /oː/   U+0913 / U+094B
औ            ौ           AU         /au/   U+0914 / U+094C

The vowel sign for short I (ि, U+093F) is notable because it is displayed to the left of the consonant it modifies, even though it is stored after the consonant in the character stream. This reordering is handled by the text shaping engine, not the Unicode encoding.
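The stored order can be checked directly. The short sketch below uses only the standard `unicodedata` module to show that the consonant precedes the matra in memory, whatever a shaping engine later draws; the variable names are illustrative.

```python
import unicodedata

# "ki": consonant KA followed by the short-I vowel sign
syllable = "\u0915\u093F"  # कि

# Logical (stored) order: consonant first, matra second, even though
# a shaping engine draws the matra to the left of the consonant.
names = [unicodedata.name(ch) for ch in syllable]
print(names)
# ['DEVANAGARI LETTER KA', 'DEVANAGARI VOWEL SIGN I']

# The matra is a spacing combining mark (category Mc), not a letter.
matra_category = unicodedata.category(syllable[1])
print(matra_category)  # Mc
```

Code that processes strings (searching, sorting, cursor movement) therefore works in logical order and should not try to mirror the visual order.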

Consonants (Vyanjan)

Devanagari has 33 base consonants, plus additional characters for sounds borrowed from Arabic, Persian, and English. Each consonant inherently carries the vowel /a/:

क (ka)  ख (kha)  ग (ga)  घ (gha)  ङ (nga)
च (cha) छ (chha) ज (ja)  झ (jha)  ञ (nya)
ट (ta)  ठ (tha)  ड (da)  ढ (dha)  ण (na)
त (ta)  थ (tha)  द (da)  ध (dha)  न (na)
प (pa)  फ (pha)  ब (ba)  भ (bha)  म (ma)
य (ya)  र (ra)   ल (la)  व (va)
श (sha) ष (sha)  स (sa)  ह (ha)

Consonants with a nukta (dot below, U+093C) represent sounds borrowed from other languages:

Base + Nukta  Sound  Unicode
क़            /q/    U+0958
ख़            /x/    U+0959
ग़            /ɣ/    U+095A
ज़            /z/    U+095B
ड़            /ɽ/    U+095C
ढ़            /ɽʰ/   U+095D
फ़            /f/    U+095E

Unicode Blocks for Devanagari

Block                  Range              Size (code points)  Purpose
Devanagari             U+0900–U+097F      128                 Core letters, matras, digits
Devanagari Extended    U+A8E0–U+A8FF      32                  Vedic extensions
Devanagari Extended-A  U+11B00–U+11B5F    96                  Additional signs
Vedic Extensions       U+1CD0–U+1CFF      48                  Vedic tone marks

The core block (U+0900–U+097F) covers all characters needed for modern Hindi, Marathi, Nepali, and classical Sanskrit text; Vedic texts additionally draw on the extension blocks.
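As a rough illustration, the ranges in the table can be turned into a lookup that reports which block a character belongs to. This is a minimal sketch: the table name `DEVANAGARI_BLOCKS` and the helper `devanagari_block` are hypothetical, not part of any library.

```python
from typing import Optional

# Block ranges copied from the table above (inclusive bounds).
DEVANAGARI_BLOCKS = [
    ("Devanagari", 0x0900, 0x097F),
    ("Vedic Extensions", 0x1CD0, 0x1CFF),
    ("Devanagari Extended", 0xA8E0, 0xA8FF),
    ("Devanagari Extended-A", 0x11B00, 0x11B5F),
]

def devanagari_block(ch: str) -> Optional[str]:
    """Return the name of the Devanagari-related block containing ch, or None."""
    cp = ord(ch)
    for name, start, end in DEVANAGARI_BLOCKS:
        if start <= cp <= end:
            return name
    return None

print(devanagari_block("\u0915"))  # Devanagari
print(devanagari_block("\u1CD0"))  # Vedic Extensions
print(devanagari_block("A"))       # None
```

A block check only says where a code point lives; it does not say whether that code point is assigned or what script property it carries.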

The Virama and Conjunct Consonants

The most complex aspect of Devanagari encoding is how consonant clusters are represented. When two or more consonants occur together without a vowel between them, they form a conjunct (samyuktakshar). In traditional typography, these are rendered as ligatures — merged or stacked forms.

The Halant (Virama)

The halant (्, U+094D) — also called the virama — is the key mechanism. It "kills" the inherent vowel of the preceding consonant, signaling that the consonant should combine with the next character:

क + ् + त = क्त  (kta — conjunct of ka + ta)

In Unicode encoding:

U+0915 DEVANAGARI LETTER KA
U+094D DEVANAGARI SIGN VIRAMA
U+0924 DEVANAGARI LETTER TA

The rendering engine sees this sequence and produces the conjunct form क्त instead of displaying the halant visually.
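The consonant + virama + consonant pattern is easy to detect in stored text. The following sketch finds such sequences with a regular expression; it is deliberately simplified (it assumes only the core consonants U+0915–U+0939 and ignores nukta forms), and the names `CONJUNCT` and `find_conjuncts` are illustrative.

```python
import re
from typing import List

VIRAMA = "\u094D"
# Core consonants KA..HA (U+0915-U+0939); nukta and extended letters
# are ignored in this simplified sketch.
CONSONANT = "[\u0915-\u0939]"

# One or more "consonant + virama" pairs followed by a final consonant.
CONJUNCT = re.compile(f"(?:{CONSONANT}{VIRAMA})+{CONSONANT}")

def find_conjuncts(text: str) -> List[str]:
    """Return every consonant cluster in text that is joined by viramas."""
    return CONJUNCT.findall(text)

kta = "\u0915" + VIRAMA + "\u0924"  # KA + VIRAMA + TA
print(len(kta))              # 3 code points for one visual conjunct
print(find_conjuncts(kta))   # ['क्त']

namaste = "\u0928\u092E\u0938\u094D\u0924\u0947"  # नमस्ते
print(find_conjuncts(namaste))  # ['स्त']
```

Whether a matched sequence is actually drawn as a ligature, a half form, or with a visible halant depends on the font, so this only identifies candidates.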

Common Conjuncts

Conjunct  Letters      Encoding               Pronunciation
क्ष        क + ् + ष     U+0915 U+094D U+0937   ksha
त्र        त + ् + र     U+0924 U+094D U+0930   tra
ज्ञ        ज + ् + ञ     U+091C U+094D U+091E   gya/jnya
श्र        श + ् + र     U+0936 U+094D U+0930   shra
द्ध        द + ् + ध     U+0926 U+094D U+0927   ddha
क्त        क + ् + त     U+0915 U+094D U+0924   kta

Some conjuncts involve three or even four consonants:

स + ् + त + ् + र = स्त्र (stra)
U+0938 U+094D U+0924 U+094D U+0930

The Special Role of Ra (र)

The consonant Ra has three special combining behaviors:

  1. Reph (Ra + Halant before a consonant): Ra appears as a hook above the following consonant cluster:

     र + ् + म = र्म  (rma: Ra appears as reph above Ma)
     U+0930 U+094D U+092E

  2. Ra-matra, also called rakar (consonant + Halant + Ra): Ra appears as a diagonal stroke below the preceding consonant:

     प + ् + र = प्र  (pra: Ra appears below Pa)
     U+092A U+094D U+0930

  3. Eyelash Ra: a half form of Ra used in Marathi and Nepali. It is encoded as ऱ (U+0931, RRA) followed by the halant, or as Ra + Halant + ZWJ (U+0930 U+094D U+200D).
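The first two behaviors depend only on where Ra sits relative to the halant, so they can be told apart from the code point sequence alone. The sketch below is a simplified classifier under stated assumptions: it considers only the core consonants U+0915–U+0939, skips the eyelash form, and ignores edge cases such as Ra + Halant + Ra. The function names are hypothetical.

```python
from typing import List, Tuple

RA = "\u0930"
VIRAMA = "\u094D"

def is_consonant(ch: str) -> bool:
    # Simplified: core consonants KA..HA only (U+0915-U+0939).
    return "\u0915" <= ch <= "\u0939"

def ra_forms(text: str) -> List[Tuple[int, str]]:
    """List (index, form) pairs for each RA that would take a special shape."""
    found = []
    for i, ch in enumerate(text):
        if ch != RA:
            continue
        nxt = text[i + 1 : i + 3]          # the two code points after RA
        prev = text[max(i - 2, 0) : i]     # up to two code points before RA
        if len(nxt) == 2 and nxt[0] == VIRAMA and is_consonant(nxt[1]):
            found.append((i, "reph"))           # RA + VIRAMA + consonant
        elif len(prev) == 2 and is_consonant(prev[0]) and prev[1] == VIRAMA:
            found.append((i, "below-base ra"))  # consonant + VIRAMA + RA
    return found

print(ra_forms("\u0930\u094D\u092E"))  # [(0, 'reph')]
print(ra_forms("\u092A\u094D\u0930"))  # [(2, 'below-base ra')]
print(ra_forms("\u0930\u093E\u092E"))  # []
```

A real shaping engine makes this decision through the font's OpenType features, so the classifier is only a way to reason about input text, not a substitute for shaping.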

The Headline (Shirorekha)

A distinctive visual feature of Devanagari is the horizontal line that runs along the top of each word, connecting all the letters. This is called the shirorekha (शिरोरेखा). In Unicode, the shirorekha is not encoded as a separate character — it is a font-level feature. The font draws horizontal strokes at the top of each glyph, and connected letters share a continuous line.

When a halant is visible (not forming a conjunct), it breaks the shirorekha at that point, visually indicating the dead consonant.

Devanagari Digits

Devanagari has its own set of digits, though Western (Arabic) digits are also commonly used in Hindi:

Devanagari  Western  Unicode
०           0        U+0966
१           1        U+0967
२           2        U+0968
३           3        U+0969
४           4        U+096A
५           5        U+096B
६           6        U+096C
७           7        U+096D
८           8        U+096E
९           9        U+096F
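Converting between the two digit sets is a common task. A minimal Python sketch follows; it relies on the fact that the built-in `int()` accepts any Unicode decimal digit, while the reverse direction uses an explicit translation table. The names `TO_DEVANAGARI` and `to_devanagari_digits` are illustrative.

```python
# Python's int() accepts any Unicode decimal digit (category Nd),
# so Devanagari digits parse without a lookup table.
parsed = int("\u0967\u0968\u0969")  # १२३
print(parsed)  # 123

# Going the other way needs an explicit mapping from 0-9 to U+0966-U+096F.
TO_DEVANAGARI = str.maketrans(
    "0123456789",
    "\u0966\u0967\u0968\u0969\u096A\u096B\u096C\u096D\u096E\u096F",
)

def to_devanagari_digits(value: int) -> str:
    """Render a non-negative integer using Devanagari digits."""
    return str(value).translate(TO_DEVANAGARI)

print(to_devanagari_digits(2024))  # २०२४
```

Locale-aware number formatting (grouping separators, lakh/crore grouping) is a separate concern and is not covered by this digit substitution.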

Working with Devanagari in Code

Python

import unicodedata

# Devanagari text: "namaste" in Hindi
text = "\u0928\u092E\u0938\u094D\u0924\u0947"  # नमस्ते
print(len(text))  # 6 code points

# Inspect each code point
for char in text:
    print(f"U+{ord(char):04X} {unicodedata.name(char)} "
          f"cat={unicodedata.category(char)}")
# U+0928 DEVANAGARI LETTER NA cat=Lo
# U+092E DEVANAGARI LETTER MA cat=Lo
# U+0938 DEVANAGARI LETTER SA cat=Lo
# U+094D DEVANAGARI SIGN VIRAMA cat=Mn
# U+0924 DEVANAGARI LETTER TA cat=Lo
# U+0947 DEVANAGARI VOWEL SIGN E cat=Mn

# Grapheme clusters (visual units) differ from code points:
# "namaste" is stored as 6 code points but reads as 3 units (न, म, स्ते).
# Segmenters older than Unicode 15.1 split the conjunct and report 4 (न, म, स्, ते).
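The standard library has no grapheme segmenter, so the sketch below approximates akshara (orthographic syllable) boundaries with a regular expression that keeps virama-joined consonants together. It is a simplification, not an implementation of the Unicode segmentation rules: the character classes cover only the core Devanagari block, and the names `AKSHARA` and `aksharas` are hypothetical.

```python
import re
from typing import List

C = "[\u0915-\u0939\u0958-\u095F]"  # consonants, incl. precomposed nukta forms
N = "\u093C?"                        # optional nukta
H = "\u094D"                         # virama (halant)
V = "[\u0904-\u0914]"                # independent vowels
M = "[\u093E-\u094C]"                # dependent vowel signs (matras)
X = "[\u0900-\u0903]"                # candrabindu, anusvara, visarga

AKSHARA = re.compile(
    f"(?:{C}{N}{H})*{C}{N}(?:{H}|{M})?{X}*"  # consonant cluster + optional matra
    f"|{V}{X}*"                              # independent vowel
    f"|.",                                   # anything else: one code point
    re.DOTALL,
)

def aksharas(text: str) -> List[str]:
    """Split text into approximate aksharas (simplified rules)."""
    return [m.group(0) for m in AKSHARA.finditer(text)]

namaste = "\u0928\u092E\u0938\u094D\u0924\u0947"  # नमस्ते
units = aksharas(namaste)
print(units)       # ['न', 'म', 'स्ते']
print(len(units))  # 3
```

For production use, a library that tracks the current Unicode segmentation data is a safer choice than a hand-written pattern like this one.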

JavaScript

// Grapheme segmentation for Devanagari
const text = "\u0928\u092E\u0938\u094D\u0924\u0947"; // नमस्ते

// Code point count
console.log([...text].length); // 6

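// Grapheme cluster count
const segmenter = new Intl.Segmenter("hi", { granularity: "grapheme" });
const graphemes = [...segmenter.segment(text)];
// 3 (न, म, स्ते) in engines that follow Unicode 15.1+ segmentation rules;
// older engines split the conjunct and report 4 (न, म, स्, ते)
console.log(graphemes.length);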

// Regex: match Devanagari script
const devaPattern = /\p{Script=Devanagari}/u;
console.log(devaPattern.test(text)); // true

Normalization

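Devanagari has some characters that can be represented in multiple ways. For example, each nukta consonant has both a single-code-point (precomposed) form and a two-code-point sequence:

U+0958 (क़) = U+0915 (क) + U+093C (़)

The precomposed forms U+0958–U+095F are composition exclusions, so both NFC and NFD convert them to the base + nukta sequence; NFC never produces U+0958. Normalize Devanagari text to a single form (NFC is the usual choice) before comparison, storage, or searching: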

import unicodedata

# These look identical but may differ in encoding
a = "\u0958"           # Precomposed QA
b = "\u0915\u093C"     # KA + NUKTA

print(a == b)  # False — different code point sequences!
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # True
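One detail worth checking is what NFC actually returns for a precomposed nukta letter. The sketch below shows that U+0958, being excluded from composition, normalizes to the two-code-point sequence, and wraps the comparison in a small helper; `same_text` is an illustrative name, not a library function.

```python
import unicodedata

def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

qa = "\u0958"  # precomposed DEVANAGARI LETTER QA

# U+0958 is excluded from composition, so NFC yields the sequence
# KA + NUKTA rather than the single precomposed code point.
nfc = unicodedata.normalize("NFC", qa)
code_points = [f"U+{ord(c):04X}" for c in nfc]
print(code_points)  # ['U+0915', 'U+093C']

print(same_text(qa, "\u0915\u093C"))  # True
```

The practical consequence is that normalized text never contains U+0958–U+095F, so searches and lookups on normalized data should use the decomposed sequences.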

Summary

Devanagari is a sophisticated abugida whose Unicode encoding reflects the script's inherent complexity. The key points for developers are:

  1. The virama (U+094D) is the glue — it joins consonants into conjuncts and is the most important character to handle correctly
  2. Grapheme clusters ≠ code points — a single visible akshar (syllable unit) may consist of multiple code points
  3. Matra reordering is a rendering concern — the short-I matra (ि) is stored after the consonant but displayed before it
  4. Normalize to NFC before comparing or searching Devanagari text
  5. Test with real Hindi text that contains conjuncts, matras, and nukta characters — ASCII-range testing is never sufficient
  6. Use proper shaping engines — HarfBuzz, CoreText, and DirectWrite all handle Devanagari correctly; custom rendering code almost certainly does not
