Writing Systems of the World · Chapter 5

Devanagari and the Indic Scripts

The Brahmic script family encompasses dozens of South Asian writing systems. This chapter explores Devanagari's complex conjuncts, the virama mechanism, and the challenges of rendering Indic scripts.

~4000 words · ~16 minute read

When the Unicode Consortium first tackled the writing systems of South Asia, it encountered a family of scripts so intricate, so internally consistent in their deep architecture, and yet so varied in their surface forms that encoding them required conceptual frameworks with no precedent in Latin-based computing. The Brahmic scripts, descendants of the ancient Brahmi script used in the Indian subcontinent from at least the 3rd century BCE, include Devanagari, Bengali, Tamil, Telugu, Kannada, Malayalam, Odia, Gurmukhi, Gujarati, Sinhala, Myanmar, Thai, Khmer, Tibetan, and dozens more. Together they serve as the writing systems for well over a billion people and hundreds of languages. Devanagari, the most widely used among them, is the starting point for understanding the entire family.

The Brahmic Architecture

Every Brahmic script inherits a set of structural features from its common ancestor, the Brahmi script. Understanding these features is essential to understanding why Indic text rendering is among the most complex in all of Unicode.

Consonants carry an inherent vowel. Every consonant in Devanagari includes an inherent "a" vowel. The character क is not just "k" — it is "ka." To write just "k" without a following vowel, you must add the virama (halant): क् (U+0915 + U+094D).
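The sequence above can be inspected directly. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` is an illustrative choice, not part of any library.

```python
import unicodedata

KA = "\u0915"      # DEVANAGARI LETTER KA, pronounced "ka" (inherent vowel included)
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA, suppresses the inherent vowel

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

bare_k = KA + VIRAMA  # "k" with no vowel: two code points, one visible unit
for code_point, name in describe(bare_k):
    print(code_point, name)
```

Note that the vowel-less "k" is the longer sequence: removing the inherent vowel takes an extra code point rather than one fewer.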

Vowel modification. When a consonant is followed by a vowel other than the inherent "a," a matra (vowel sign) is attached to the consonant. The matras are combining characters that appear above, below, before, or after the base consonant:
- कि (ki) = क + ि (U+093F, matra before the consonant in visual display)
- की (kī) = क + ी (U+0940, matra after the consonant)
- कु (ku) = क + ु (U+0941, matra below the consonant)
- के (ke) = क + े (U+0947, matra above the consonant)

Reordering. The matra for /i/ (U+093F ि) visually appears before the consonant it follows in logical (stored) order. This means that rendering must reorder characters for correct visual display — a source of endless complexity for early font systems.
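The gap between stored order and display order can be illustrated with a toy function. This is only a sketch: it assumes its input is already a single syllable, and the name `visual_order` is hypothetical. Real shaping engines reorder glyphs, not characters, and handle many more cases.

```python
I_MATRA = "\u093F"  # DEVANAGARI VOWEL SIGN I, drawn to the left of its consonant

def visual_order(cluster):
    """Toy reordering: move the i-matra to the front of a single syllable.

    Assumes `cluster` is one syllable (consonant or conjunct plus matra).
    """
    if I_MATRA in cluster:
        return I_MATRA + cluster.replace(I_MATRA, "", 1)
    return cluster

def code_points(text):
    return [f"U+{ord(ch):04X}" for ch in text]

ki = "\u0915\u093F"  # stored (logical) order: KA, then VOWEL SIGN I
print("logical:", code_points(ki))
print("visual: ", code_points(visual_order(ki)))
```

With a conjunct such as स्थि (SA, virama, THA, i-matra), the matra moves in front of the whole cluster, which is why reordering cannot be done one character at a time.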

Conjuncts. When two or more consonants occur consecutively without an intervening vowel, they combine into a conjunct (संयुक्त व्यंजन). The shape of a conjunct varies widely and depends partly on the font:
- Some are "stacked" vertically: ट् + ट = ट्ट (ṭṭa)
- Some use "half-forms," in which the first consonant loses its vertical stroke: न् + त = न्त (nta) and, in many modern fonts, न् + न = न्न (nna)
- Some are ligatures with unique shapes: क् + ष = क्ष (kṣa), क् + त = क्त (kta)

The virama (U+094D) is the key to conjunct formation. Logically, a consonant cluster is stored as: consonant + virama + consonant. The rendering engine then decides whether to display this as a visible virama (in simplified rendering) or as a conjunct form (in full OpenType rendering).
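The stored consonant + virama + consonant pattern can be recognized with a regular expression. The sketch below is a deliberately simplified cluster splitter, assuming plain Devanagari text without ZWJ/ZWNJ, Vedic marks, or other edge cases; the names `CLUSTER` and `clusters` are illustrative, and this is not the segmentation algorithm any shaping engine actually uses.

```python
import re

CONSONANT = "[\u0915-\u0939\u0958-\u095F]"
NUKTA = "\u093C"
VIRAMA = "\u094D"
MATRA = "[\u093E-\u094C]"        # dependent vowel signs
SIGN = "[\u0900-\u0903]"         # chandrabindu, anusvara, visarga
INDEP_VOWEL = "[\u0904-\u0914]"

# consonant (virama consonant)* then an optional final virama or matra,
# or an independent vowel; either may be followed by trailing signs.
CLUSTER = re.compile(
    f"(?:{CONSONANT}{NUKTA}?(?:{VIRAMA}{CONSONANT}{NUKTA}?)*"
    f"(?:{VIRAMA}|{MATRA})?|{INDEP_VOWEL}){SIGN}*"
)

def clusters(text):
    """Split text into simplified orthographic clusters."""
    return CLUSTER.findall(text)

hindi = "\u0939\u093F\u0928\u094D\u0926\u0940"  # हिन्दी, six code points
for cluster in clusters(hindi):
    print(cluster, [f"U+{ord(ch):04X}" for ch in cluster])
```

Here six stored code points yield two clusters, the second of which (न्दी) contains the conjunct. Whether that conjunct is drawn with a half-form, a stack, or a visible virama is decided later by the font and shaping engine.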

Devanagari in Unicode

The Devanagari block occupies U+0900–U+097F (128 code points). Key characters:

| Category | Range | Examples |
|---|---|---|
| Various signs | U+0900–U+0903 | ँ ं ः (chandrabindu, anusvara, visarga) |
| Independent vowels | U+0904–U+0914 | अ आ इ ई उ ऊ ए ऐ ओ औ |
| Consonants | U+0915–U+0939 | क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल व श ष स ह |
| Vowel signs (matras) | U+093E–U+094C | ि ी ु ू ृ े ै ो ौ |
| Virama | U+094D | ् |
| Additional consonants | U+0958–U+095F | क़ ख़ ग़ ज़ ड़ ढ़ फ़ य़ |
| Devanagari Extended | U+A8E0–U+A8FF | Vedic extensions |

The Devanagari Extended block and the Vedic Extensions block (U+1CD0–U+1CFF) encode additional characters needed for Vedic Sanskrit — the ancient liturgical language with a complex system of tone marks.
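The main categories in the Devanagari block can be turned into a small lookup. This is a sketch whose ranges follow the Unicode code chart for U+0900–U+097F; the labels and the function name `devanagari_category` are illustrative. Code points in the block that fall outside the listed ranges, such as the nukta (U+093C), the avagraha (U+093D), and the digits, are reported as "other Devanagari".

```python
# (start, end, label) ranges within the Devanagari block, per the code chart.
DEVANAGARI_CATEGORIES = [
    (0x0900, 0x0903, "sign"),
    (0x0904, 0x0914, "independent vowel"),
    (0x0915, 0x0939, "consonant"),
    (0x093E, 0x094C, "vowel sign"),
    (0x094D, 0x094D, "virama"),
    (0x0958, 0x095F, "additional consonant"),
]

def devanagari_category(ch):
    """Return a coarse category label, or None outside the Devanagari block."""
    cp = ord(ch)
    for start, end, label in DEVANAGARI_CATEGORIES:
        if start <= cp <= end:
            return label
    if 0x0900 <= cp <= 0x097F:
        return "other Devanagari"
    return None

word = "\u0939\u093F\u0928\u094D\u0926\u0940"  # हिन्दी
labels = [devanagari_category(ch) for ch in word]
print(labels)
```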

The Brahmic Family Tree

Brahmi, the ancient ancestor, gave rise to two major branches:

Northern Branch → Gupta script → Siddham, Sharada, Proto-Bengali, etc. → Modern scripts:
- Devanagari (Hindi, Sanskrit, Marathi, Nepali, Maithili, Bodo, Dogri, Konkani, Sindhi)
- Bengali-Assamese (Bengali, Assamese, Bishnupriya)
- Gurmukhi (Punjabi)
- Gujarati (Gujarati, Kutchi)
- Odia (Odia)
- Modi, Tirhuta, Sylheti Nagri, and other regional scripts

Southern Branch → Grantha, Pallava scripts → Modern scripts:
- Tamil (Tamil)
- Malayalam (Malayalam)
- Kannada (Kannada, Tulu, Kodava)
- Telugu (Telugu, Gondi)
- Sinhala (Sinhala)

Each of these has its own Unicode block, and while they share the fundamental Brahmic architecture (inherent vowel, virama, matras, conjuncts), the specific characters, conjunct shapes, and rendering rules differ substantially.

Tamil: The Exception

Tamil script is unusual within the Brahmic family. Tamil has fewer consonants than Sanskrit-derived scripts (reflecting the simpler consonant inventory of classical Tamil), and its rendering model, while still Brahmic, is simpler than Devanagari's: it uses very few conjuncts and little stacking, and a consonant without a vowel is normally written with a visible virama, the dot called the pulli. Tamil Brahmi inscriptions from the 3rd century BCE are among the earliest evidence of the Brahmic script family.

OpenType and Complex Text Layout

Correct rendering of Devanagari requires OpenType features — specifically the GSUB (Glyph Substitution) and GPOS (Glyph Positioning) tables in a font. Without OpenType support, Devanagari text will render as a sequence of unjoined components rather than proper conjuncts and ligature forms.

Critical OpenType features for Devanagari:
- akhn (Akhands): Required ligatures such as क्ष and ज्ञ, substituted before other processing
- rphf (Reph Form): The above-base form of RA (reph ◌र्)
- blwf (Below-base Forms): Forms of consonants that attach below the base consonant
- half (Half Forms): Half-forms of consonants that precede the base consonant
- pres (Pre-base Substitutions): Pre-base conjuncts and matra combinations
- abvs (Above-base Substitutions): Combining marks above the base
- blws (Below-base Substitutions): Combining marks below the base
- psts (Post-base Substitutions): Matra and other post-base glyph combinations

Uniscribe and DirectWrite on Windows, HarfBuzz on Linux and Android, and Core Text on Apple platforms all implement these shaping rules. However, the quality of implementation varies, particularly for less commonly used Indic scripts and archaic forms.

Unicode Indic Blocks

| Script | Block Range | Languages |
|---|---|---|
| Devanagari | U+0900–U+097F | Hindi, Sanskrit, Nepali, Marathi |
| Bengali | U+0980–U+09FF | Bengali, Assamese |
| Gurmukhi | U+0A00–U+0A7F | Punjabi |
| Gujarati | U+0A80–U+0AFF | Gujarati |
| Odia | U+0B00–U+0B7F | Odia |
| Tamil | U+0B80–U+0BFF | Tamil |
| Telugu | U+0C00–U+0C7F | Telugu |
| Kannada | U+0C80–U+0CFF | Kannada |
| Malayalam | U+0D00–U+0D7F | Malayalam |
| Sinhala | U+0D80–U+0DFF | Sinhala |
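Because each script occupies a contiguous 128-code-point block, a rough script detector needs only a range lookup. The sketch below uses the block ranges listed in this section; the names `INDIC_BLOCKS`, `script_of`, and `scripts_in` are illustrative. A production system would more likely rely on the Unicode Script property, which also covers extension blocks.

```python
# (start, end, script) for the ten blocks listed in this section.
INDIC_BLOCKS = [
    (0x0900, 0x097F, "Devanagari"),
    (0x0980, 0x09FF, "Bengali"),
    (0x0A00, 0x0A7F, "Gurmukhi"),
    (0x0A80, 0x0AFF, "Gujarati"),
    (0x0B00, 0x0B7F, "Odia"),
    (0x0B80, 0x0BFF, "Tamil"),
    (0x0C00, 0x0C7F, "Telugu"),
    (0x0C80, 0x0CFF, "Kannada"),
    (0x0D00, 0x0D7F, "Malayalam"),
    (0x0D80, 0x0DFF, "Sinhala"),
]

def script_of(ch):
    """Return the script whose block contains ch, or None if not listed."""
    cp = ord(ch)
    for start, end, script in INDIC_BLOCKS:
        if start <= cp <= end:
            return script
    return None

def scripts_in(text):
    """Return the listed scripts found in text, in order of first appearance."""
    found = []
    for ch in text:
        script = script_of(ch)
        if script is not None and script not in found:
            found.append(script)
    return found

mixed = "\u0939\u093F\u0928\u094D\u0926\u0940 \u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # हिन्दी தமிழ்
print(scripts_in(mixed))
```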

Unicode 15.0 also includes blocks for Khmer, Myanmar, Chakma, and many other Brahmic-derived scripts used across South and Southeast Asia, including scripts used for minority languages.

The Digital Divide in Indic Computing

Despite India's enormous tech industry, high-quality Indic computing has historically lagged behind Latin-script computing. Several barriers have contributed:

Font quality: High-quality OpenType fonts for all Indic scripts with complete conjunct rendering were long unavailable. Google's Noto project was partly motivated by closing this gap — the "no tofu" (no missing character boxes) goal is particularly relevant for the complexity of Indic scripts.

Keyboard standardization: No single keyboard layout dominates for any Indic script. Users may use transliteration (typing romanized text that gets converted), InScript (a standardized layout placed on hardware keyboards), Phonetic, or other layouts.

Legacy encodings: Before Unicode, numerous incompatible encodings existed for each Indic script. ISCII (Indian Script Code for Information Interchange) attempted standardization but was never widely adopted outside India. Legacy content in old encodings still exists and requires migration.

The ongoing work of Unicode's Script Ad Hoc Group and of dedicated volunteers from India, Nepal, Sri Lanka, and the broader South Asian diaspora continues to expand and refine Indic encoding, so that the richness of South Asia's literary traditions is fully represented in the universal character set.