
Devanagari Script Deep Dive

Devanagari is an abugida script used to write Hindi, Sanskrit, Marathi, and many other South Asian languages, with complex conjunct consonants and vowel diacritics that challenge text rendering engines. This guide explores the Devanagari Unicode block, how the script works, and how to render it correctly in software.

Published 2023-05-18 · Updated 2024-11-25

Devanagari is one of the world's most important writing systems, used by over 600 million people across South Asia. It is the primary script for Hindi, the most widely spoken language in India, as well as Sanskrit, Marathi, Nepali, and dozens of other languages. Unlike the Latin alphabet, Devanagari is an abugida — a writing system where each consonant carries an inherent vowel that can be modified or suppressed by diacritical marks. This design creates a rich system of vowel signs (matras), conjunct consonants, and stacking behaviors that make Devanagari one of the most technically demanding scripts for Unicode and text rendering engines.

History and Significance

Devanagari evolved from the Brahmi script, the ancestor of virtually all South Asian and Southeast Asian writing systems. The name "Devanagari" comes from Sanskrit: deva (divine) and nagari (city), suggesting "script of the divine city." The modern form of Devanagari stabilized around the 10th–12th centuries CE.

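The script gained special significance as the principal modern script for Sanskrit, the classical language of Hindu philosophy, science, and literature, although Sanskrit has historically been written in many other Brahmic scripts as well. Today Devanagari is used for: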

Language	Speakers (approx.)	Region
Hindi	600M+	India
Marathi	83M+	India (Maharashtra)
Nepali	25M+	Nepal
Sanskrit	Liturgical	India (classical)
Bhojpuri	50M+	India, Nepal
Maithili	34M+	India, Nepal
Konkani	7M+	India (Goa)
Dogri	3M+	India (Jammu)
Bodo	1.5M+	India (Assam)

The Devanagari Writing System

Vowels (Svar)

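Hindi grammars traditionally list 13 vowels (svar), counting अं and अः (a vowel plus anusvara or visarga). The 11 vowels below each have an independent form, used at the start of a word or syllable, and all except अ have a dependent form (matra) used after a consonant: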

Independent	Matra	Name	Sound	Unicode
अ	(inherent)	A	/a/	U+0905
आ	ा	AA	/aː/	U+0906 / U+093E
इ	ि	I	/i/	U+0907 / U+093F
ई	ी	II	/iː/	U+0908 / U+0940
उ	ु	U	/u/	U+0909 / U+0941
ऊ	ू	UU	/uː/	U+090A / U+0942
ऋ	ृ	VOCALIC R	/r̩/ (Hindi /rɪ/)	U+090B / U+0943
ए	े	E	/eː/	U+090F / U+0947
ऐ	ै	AI	/ai/	U+0910 / U+0948
ओ	ो	O	/oː/	U+0913 / U+094B
औ	ौ	AU	/au/	U+0914 / U+094C
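Because Unicode names each matra after its independent vowel (LETTER AA pairs with VOWEL SIGN AA), the Matra column above can be recovered programmatically. The helper `matra_for` below is a hypothetical sketch that assumes this naming pattern holds; it is not a library function.

```python
import unicodedata
from typing import Optional


def matra_for(vowel: str) -> Optional[str]:
    """Return the dependent vowel sign (matra) for an independent vowel.

    Assumption: Unicode names pair "DEVANAGARI LETTER X" with
    "DEVANAGARI VOWEL SIGN X". Returns None when no such sign exists.
    """
    name = unicodedata.name(vowel)
    try:
        return unicodedata.lookup(name.replace("LETTER", "VOWEL SIGN"))
    except KeyError:
        # अ has no matra: its vowel is the one inherent in every consonant
        return None


for v in "अआइईउऊऋएऐओऔ":
    sign = matra_for(v)
    print(v, "(inherent)" if sign is None else f"U+{ord(sign):04X}")
```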

The vowel sign for short I (ि, U+093F) is notable because it is displayed to the left of the consonant it modifies, even though it is stored after the consonant in the character stream. This reordering is handled by the text shaping engine, not the Unicode encoding.
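A quick way to confirm that the reordering is purely visual is to inspect the stored code points of a word containing ि. The word किताब (kitāb, "book") is just an illustrative choice.

```python
word = "\u0915\u093F\u0924\u093E\u092C"  # किताब: KA, VOWEL SIGN I, TA, VOWEL SIGN AA, BA

# Logical (storage) order: the short-I matra comes after KA,
# even though it is drawn to the left of KA on screen.
print([f"U+{ord(c):04X}" for c in word])
# ['U+0915', 'U+093F', 'U+0924', 'U+093E', 'U+092C']

# Consequence: slicing by code point can separate a consonant from its matra.
print(word[:1])  # क, without its vowel sign
```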

Consonants (Vyanjan)

The standard Hindi inventory has 33 base consonants, plus nukta letters, mostly for sounds borrowed from Persian, Arabic, and English. Each consonant carries the inherent vowel /a/ (realized as [ə] in Hindi):

क (ka)  ख (kha)  ग (ga)  घ (gha)  ङ (nga)
च (cha) छ (chha) ज (ja)  झ (jha)  ञ (nya)
ट (ṭa)  ठ (ṭha)  ड (ḍa)  ढ (ḍha)  ण (ṇa)
त (ta)  थ (tha)  द (da)  ध (dha)  न (na)
प (pa)  फ (pha)  ब (ba)  भ (bha)  म (ma)
य (ya)  र (ra)   ल (la)  व (va)
श (śa)  ष (ṣa)   स (sa)  ह (ha)

(Transliterations use a dot below for retroflex consonants, as in ṭa and ṣa, and ś for श.)
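In the Unicode Devanagari block these letters occupy the contiguous range U+0915 (KA) to U+0939 (HA), in the same order as the grid. That range also contains three precomposed nukta letters (U+0929, U+0931, U+0934) and ळ (U+0933 LLA, used in Marathi), so the sketch below filters those out to arrive at the 33 letters shown. The exclusion of LLA is an assumption made to match this Hindi grid.

```python
import unicodedata

# Assumption: LLA (ळ) is excluded to match the Hindi grid above.
EXCLUDED = {"\u0933"}


def base_consonants():
    """List the Hindi base consonants in code point order (sketch)."""
    letters = []
    for cp in range(0x0915, 0x093A):  # KA through HA inclusive
        ch = chr(cp)
        if unicodedata.decomposition(ch):  # skip precomposed nukta letters
            continue
        if ch in EXCLUDED:
            continue
        letters.append(ch)
    return letters


consonants = base_consonants()
print(len(consonants))      # 33
print(" ".join(consonants))
```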

Consonants with a nukta (dot below, U+093C) mostly represent sounds borrowed from Persian, Arabic, and English; ड़ and ढ़ are exceptions that represent native Hindi retroflex flaps:

Base	+ Nukta	Sound	Unicode
क	क़	/q/	U+0958
ख	ख़	/x/	U+0959
ग	ग़	/ɣ/	U+095A
ज	ज़	/z/	U+095B
ड	ड़	/ɽ/	U+095C
ढ	ढ़	/ɽʰ/	U+095D
फ	फ़	/f/	U+095E
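The Unicode Character Database records each precomposed letter's canonical decomposition, so the Base and Nukta columns can be derived from the code point alone. `split_nukta` below is a hypothetical helper built on the standard `unicodedata.decomposition` function.

```python
import unicodedata

NUKTA = "\u093C"


def split_nukta(letter):
    """Split a precomposed nukta letter into (base consonant, nukta).

    Raises ValueError for letters that do not decompose to base + nukta.
    """
    parts = unicodedata.decomposition(letter).split()  # e.g. ["0915", "093C"]
    if len(parts) != 2 or chr(int(parts[1], 16)) != NUKTA:
        raise ValueError(f"U+{ord(letter):04X} is not a precomposed nukta letter")
    return chr(int(parts[0], 16)), NUKTA


for cp in range(0x0958, 0x095F):  # the seven letters in the table above
    base, nukta = split_nukta(chr(cp))
    print(f"U+{cp:04X} = U+{ord(base):04X} + U+{ord(nukta):04X}")
```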

Unicode Blocks for Devanagari

Block	Range	Code points	Purpose
Devanagari	U+0900–U+097F	128	Core letters, matras, digits
Devanagari Extended	U+A8E0–U+A8FF	32	Vedic extensions
Devanagari Extended-A	U+11B00–U+11B5F	96	Additional signs
Vedic Extensions	U+1CD0–U+1CFF	48	Vedic tone marks

The core block (U+0900–U+097F) covers all characters needed for modern Hindi, Marathi, and Nepali, and for classical Sanskrit; Vedic texts with accent and tone marks also draw on the extension blocks.
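A range check against the table is enough to tell which of these blocks a character belongs to. `devanagari_block` is a hypothetical helper, and block membership is not the same as the Unicode Script property: the danda (U+0964), for example, sits in the Devanagari block but is classified as Common because many Indic scripts share it.

```python
# Hypothetical helper: block ranges copied from the table above.
DEVANAGARI_BLOCKS = [
    ("Devanagari", 0x0900, 0x097F),
    ("Vedic Extensions", 0x1CD0, 0x1CFF),
    ("Devanagari Extended", 0xA8E0, 0xA8FF),
    ("Devanagari Extended-A", 0x11B00, 0x11B5F),
]


def devanagari_block(ch):
    """Return the name of the Devanagari-related block containing ch, or None."""
    cp = ord(ch)
    for name, start, end in DEVANAGARI_BLOCKS:
        if start <= cp <= end:
            return name
    return None


for ch in ["\u0915", "\u0966", "\u1CD0", "\uA8E0", "A"]:
    print(f"U+{ord(ch):04X}", devanagari_block(ch))
```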

The Virama and Conjunct Consonants

The most complex aspect of Devanagari encoding is how consonant clusters are represented. When two or more consonants occur together without a vowel between them, they form a conjunct (samyuktakshar). In traditional typography, these are rendered as ligatures — merged or stacked forms.

The Halant (Virama)

The halant (्, U+094D) — also called the virama — is the key mechanism. It "kills" the inherent vowel of the preceding consonant, signaling that the consonant should combine with the next character:

क + ् + त = क्त  (kta — conjunct of ka + ta)

In Unicode encoding:

U+0915 DEVANAGARI LETTER KA
U+094D DEVANAGARI SIGN VIRAMA
U+0924 DEVANAGARI LETTER TA

The rendering engine sees this sequence and produces the conjunct form क्त instead of displaying the halant visually.
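The virama's default behaviour can be overridden. The Unicode Standard specifies that a zero width non-joiner (ZWNJ, U+200C) after the virama requests an explicit, visible halant, while a zero width joiner (ZWJ, U+200D) requests a half form; whether a particular font honours these requests varies. The sketch below only builds the three sequences and shows that they are distinct strings.

```python
KA, VIRAMA, TA = "\u0915", "\u094D", "\u0924"
ZWNJ, ZWJ = "\u200C", "\u200D"  # zero width non-joiner / zero width joiner

# Default: the font chooses a conjunct (ligature or stacked) form.
conjunct = KA + VIRAMA + TA

# ZWNJ after the virama requests a visible halant on KA.
explicit_halant = KA + VIRAMA + ZWNJ + TA

# ZWJ after the virama requests the half form of KA.
half_form = KA + VIRAMA + ZWJ + TA

# All three are different strings, even if a font draws two of them alike.
for label, s in [("conjunct", conjunct), ("explicit halant", explicit_halant),
                 ("half form", half_form)]:
    print(f"{label:16} {' '.join(f'U+{ord(c):04X}' for c in s)}")
```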

Common Conjuncts

Conjunct	Letters	Encoding	Pronunciation
क्ष	क + ् + ष	U+0915 U+094D U+0937	ksha
त्र	त + ् + र	U+0924 U+094D U+0930	tra
ज्ञ	ज + ् + ञ	U+091C U+094D U+091E	gya/jnya
श्र	श + ् + र	U+0936 U+094D U+0930	shra
द्ध	द + ् + ध	U+0926 U+094D U+0927	ddha
क्त	क + ् + त	U+0915 U+094D U+0924	kta
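Since every two-consonant conjunct is stored as consonant, virama, consonant, the Encoding column can be checked mechanically. `code_points` is a small hypothetical helper; the dictionary simply restates the table.

```python
def code_points(s):
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Encodings copied from the table above; each assert checks one row.
TABLE = {
    "क्ष": "U+0915 U+094D U+0937",
    "त्र": "U+0924 U+094D U+0930",
    "ज्ञ": "U+091C U+094D U+091E",
    "श्र": "U+0936 U+094D U+0930",
    "द्ध": "U+0926 U+094D U+0927",
    "क्त": "U+0915 U+094D U+0924",
}

for conjunct, expected in TABLE.items():
    assert code_points(conjunct) == expected, conjunct
    print(conjunct, code_points(conjunct))
```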

Some conjuncts involve three or even four consonants:

स + ् + त + ् + र = स्त्र (stra)
U+0938 U+094D U+0924 U+094D U+0930
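Longer clusters follow the same pattern: a virama between each pair of consonants. The hypothetical `make_conjunct` helper below builds such a sequence and rejects anything that is not a single Devanagari consonant; a matra can then be appended to the whole cluster.

```python
VIRAMA = "\u094D"


def is_consonant(ch):
    """True for Devanagari consonant letters, including precomposed nukta forms."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def make_conjunct(*consonants):
    """Join two or more consonants with a virama between each pair."""
    if len(consonants) < 2 or not all(len(c) == 1 and is_consonant(c) for c in consonants):
        raise ValueError("expected two or more single Devanagari consonants")
    return VIRAMA.join(consonants)


stra = make_conjunct("स", "त", "र")
print(" ".join(f"U+{ord(c):04X}" for c in stra))
# U+0938 U+094D U+0924 U+094D U+0930

# Adding a matra after the cluster gives स्त्री (strī, "woman").
print(stra + "\u0940")
```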

The Special Role of Ra (र)

The consonant Ra has three special forms:

Reph (Ra + Halant before a consonant): Ra appears as a hook above the following consonant cluster:
र + ् + म = र्म (rma; Ra appears as reph above Ma)
U+0930 U+094D U+092E

Ra-matra (consonant + Halant + Ra): Ra appears as a diagonal stroke below the preceding consonant:
प + ् + र = प्र (pra; Ra appears below Pa)
U+092A U+094D U+0930

Eyelash Ra: a special form used in Marathi (ऱ, U+0931).
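Because reph and ra-matra differ only in where Ra sits relative to the virama, a scan of the stored sequence can tell them apart. `find_ra_forms` is a simplified, hypothetical sketch: it ignores ZWJ/ZWNJ, the eyelash form, and the full syllable rules a shaping engine applies.

```python
RA, VIRAMA = "\u0930", "\u094D"


def is_consonant(ch):
    """True for Devanagari consonant letters, including precomposed nukta forms."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def find_ra_forms(text):
    """Return (index, form) pairs for each Ra that takes a special form."""
    forms = []
    for i, ch in enumerate(text):
        if ch != RA:
            continue
        after_virama = i > 0 and text[i - 1] == VIRAMA
        before_consonant = (i + 2 < len(text) and text[i + 1] == VIRAMA
                            and is_consonant(text[i + 2]))
        if after_virama:
            forms.append((i, "ra-matra"))  # consonant + virama + Ra
        elif before_consonant:
            forms.append((i, "reph"))      # Ra + virama + consonant
    return forms


print(find_ra_forms("\u0915\u0930\u094D\u092E"))  # कर्म (karma): [(1, 'reph')]
print(find_ra_forms("\u092A\u094D\u0930"))        # प्र: [(2, 'ra-matra')]
```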

The Headline (Shirorekha)

A distinctive visual feature of Devanagari is the horizontal line that runs along the top of each word, connecting all the letters. This is called the shirorekha (शिरोरेखा). In Unicode, the shirorekha is not encoded as a separate character — it is a font-level feature. The font draws horizontal strokes at the top of each glyph, and connected letters share a continuous line.

When a halant is visible (not forming a conjunct), it breaks the shirorekha at that point, visually indicating the dead consonant.

Devanagari Digits

Devanagari has its own set of digits, though Western (Arabic) digits are also commonly used in Hindi:

Devanagari	Western	Unicode
०	0	U+0966
१	1	U+0967
२	2	U+0968
३	3	U+0969
४	4	U+096A
५	5	U+096B
६	6	U+096C
७	7	U+096D
८	8	U+096E
९	9	U+096F
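Because U+0966 to U+096F are contiguous and ordered 0 to 9, converting between the two digit sets is straightforward. Python's built-in `int()` already accepts any Unicode decimal digits, so only the outbound direction needs a helper; `to_devanagari_digits` is a hypothetical name.

```python
import unicodedata

# U+0966..U+096F are contiguous and ordered 0-9.
DEVANAGARI_DIGITS = "".join(chr(cp) for cp in range(0x0966, 0x0970))
TO_DEVANAGARI = str.maketrans("0123456789", DEVANAGARI_DIGITS)


def to_devanagari_digits(n):
    """Render an integer using Devanagari digits."""
    return str(n).translate(TO_DEVANAGARI)


print(to_devanagari_digits(2024))       # २०२४
print(int("\u0968\u0966\u0968\u096A"))  # 2024: int() accepts any Unicode decimal digit
print(unicodedata.digit("\u096B"))      # 5
```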

Working with Devanagari in Code

Python

import unicodedata

# Devanagari text: "namaste" in Hindi
text = "\u0928\u092E\u0938\u094D\u0924\u0947"  # नमस्ते
print(len(text))  # 6 code points

# Inspect each code point
for char in text:
    print(f"U+{ord(char):04X} {unicodedata.name(char):<27} "
          f"cat={unicodedata.category(char)}")
# U+0928 DEVANAGARI LETTER NA        cat=Lo
# U+092E DEVANAGARI LETTER MA        cat=Lo
# U+0938 DEVANAGARI LETTER SA        cat=Lo
# U+094D DEVANAGARI SIGN VIRAMA      cat=Mn
# U+0924 DEVANAGARI LETTER TA        cat=Lo
# U+0947 DEVANAGARI VOWEL SIGN E     cat=Mn

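# Grapheme clusters (user-perceived characters) differ from code points.
# Under Unicode 15.1+ segmentation rules (UAX #29, rule GB9c), "namaste"
# is 3 clusters: न, म, स्ते (conjunct plus matra). Older rules break after
# the virama and report 4: न, म, स्, ते. Either way, it is stored as 6 code points.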
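Python's standard library has no grapheme segmenter. As a rough illustration of the Unicode 15.1 behaviour, the sketch below groups a character with any following combining marks and with further consonants joined by a virama, loosely modelled on UAX #29 rule GB9c. `simple_akshara_split` is a simplified, hypothetical helper: it ignores ZWJ/ZWNJ and every other script, so production code should rely on a full UAX #29 implementation such as ICU.

```python
import unicodedata

VIRAMA = "\u094D"


def is_consonant(ch):
    """True for Devanagari consonant letters, including precomposed nukta forms."""
    return "\u0915" <= ch <= "\u0939" or "\u0958" <= ch <= "\u095F"


def simple_akshara_split(text):
    """Split Devanagari text into approximate aksharas (simplified sketch).

    A cluster grows while the next character is a combining mark
    (category Mn or Mc: matras, virama, nukta, anusvara, visarga) or is a
    consonant directly after a virama.
    """
    clusters = []
    i = 0
    while i < len(text):
        j = i + 1
        while j < len(text):
            ch = text[j]
            if unicodedata.category(ch) in ("Mn", "Mc"):
                j += 1  # attach the mark to the current cluster
            elif text[j - 1] == VIRAMA and is_consonant(ch):
                j += 1  # virama + consonant continues the conjunct
            else:
                break
        clusters.append(text[i:j])
        i = j
    return clusters


print(simple_akshara_split("\u0928\u092E\u0938\u094D\u0924\u0947"))
# ['न', 'म', 'स्ते']
```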

JavaScript

// Grapheme segmentation for Devanagari
const text = "\u0928\u092E\u0938\u094D\u0924\u0947"; // नमस्ते

// Code point count
console.log([...text].length); // 6

// Grapheme cluster count
const segmenter = new Intl.Segmenter("hi", { granularity: "grapheme" });
const graphemes = [...segmenter.segment(text)];
console.log(graphemes.length); // 3 under Unicode 15.1+ rules (e.g. ICU 74+); older engines report 4

// Regex: match Devanagari script
const devaPattern = /\p{Script=Devanagari}/u;
console.log(devaPattern.test(text)); // true

Normalization

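Devanagari has some characters that can be represented in multiple ways. For example, each nukta consonant has a precomposed code point that is canonically equivalent to its base consonant followed by the nukta:

U+0958 (क़) ≡ U+0915 (क) + U+093C (़)

U+0958–U+095F are composition exclusions, so NFC and NFD both produce the two-code-point sequence; NFC never creates the precomposed form. Always normalize Devanagari text (to NFC, applied consistently) before comparison, storage, or searching: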

import unicodedata

# These look identical but may differ in encoding
a = "\u0958"           # Precomposed QA
b = "\u0915\u093C"     # KA + NUKTA

print(a == b)  # False — different code point sequences!
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # True
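The comparison above works because NFC rewrites U+0958–U+095F as base + nukta and never recomposes them, whereas the other precomposed nukta letters in the block (U+0929, U+0931, U+0934) are not excluded and do recompose. The sketch below shows both cases plus a hypothetical `contains_normalized` search helper that normalizes both sides before matching.

```python
import unicodedata


def nfc(s):
    return unicodedata.normalize("NFC", s)


# Composition exclusion: NFC turns precomposed QA into KA + NUKTA.
print(nfc("\u0958") == "\u0915\u093C")  # True

# RRA is not excluded: NFC composes RA + NUKTA into U+0931.
print(nfc("\u0930\u093C") == "\u0931")  # True


def contains_normalized(haystack, needle):
    """Substring search that ignores precomposed vs decomposed nukta forms."""
    return nfc(needle) in nfc(haystack)


word = "\u091C\u093C\u0930\u0942\u0930"     # ज़रूर (zarūr), typed with a separate nukta
print(contains_normalized(word, "\u095B"))  # True: precomposed ZA still matches
```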

Summary

Devanagari is a sophisticated abugida whose Unicode encoding reflects the script's inherent complexity. The key points for developers are:

The virama (U+094D) is the glue — it joins consonants into conjuncts and is the most important character to handle correctly
Grapheme clusters ≠ code points — a single visible akshar (syllable unit) may consist of multiple code points
Matra reordering is a rendering concern — the short-I matra (ि) is stored after the consonant but displayed before it
Normalize to NFC before comparing or searching Devanagari text
Test with real Hindi text that contains conjuncts, matras, and nukta characters — ASCII-range testing is never sufficient
Use proper shaping engines: HarfBuzz, CoreText, and DirectWrite implement Devanagari shaping (matra reordering, conjuncts, reph), while hand-written rendering code rarely gets these right