📜 Script Stories

Bengali Script

Bengali is an abugida script used by over 300 million people to write Bengali and Assamese. It features complex conjunct consonant forms and vowel diacritics that require OpenType shaping to render correctly. This guide explores the Bengali Unicode block, the script's history and structure, and software considerations for Bengali text.

Published 2023-09-11 · Updated 2024-09-30

Bengali script (also called Bangla script) is one of the most widely used writing systems in the world, serving over 300 million people as the primary script for Bengali (the official language of Bangladesh and the Indian state of West Bengal) and Assamese. Descended from the ancient Brahmi script through Siddham and proto-Bengali forms, the modern Bengali script took shape around the 11th century CE. In Unicode, Bengali presents significant rendering challenges due to its complex conjunct consonants, vowel sign placement, and contextual shaping requirements. This guide explores the script's structure, its Unicode encoding, and the technical considerations for working with Bengali text.

History and Background

Bengali script belongs to the Eastern Nagari family of scripts, sharing ancestry with Assamese, Maithili, and Tirhuta scripts. Its evolution can be traced through several stages:

Period	Form	Notable Feature
3rd century BCE	Brahmi	Ancestor of nearly all South/Southeast Asian scripts
5th–6th century	Siddham/Gupta	Rounded letterforms emerge
11th century	Proto-Bengali	Distinctive Bengali characteristics appear
15th century	Bengali-Assamese	Modern form solidifies
19th century	Standardized Bengali	Print standardization under British rule

The script achieved its modern printed form through the work of Ishwar Chandra Vidyasagar and the Serampore Mission Press in the 19th century. The characteristic matra (the horizontal headline connecting letters, similar to Devanagari's shirorekha) became a standard typographic feature in printed Bengali.

Script Structure

Bengali is an abugida (alphasyllabary) where each consonant letter carries an inherent vowel /a/ (or /o/ in Bengali pronunciation). Other vowels are indicated by diacritical marks (vowel signs) added to the consonant.

Vowels

Bengali has 11 vowel letters (independent forms used at the start of words or syllables) and corresponding vowel signs (dependent forms attached to consonants):

Vowel	Independent	Vowel Sign	Position	Unicode (Independent)
a	অ	(inherent)	—	U+0985
aa	আ	া	Right	U+0986
i	ই	ি	Left	U+0987
ii	ঈ	ী	Right	U+0988
u	উ	ু	Below	U+0989
uu	ঊ	ূ	Below	U+098A
ri	ঋ	ৃ	Below	U+098B
e	এ	ে	Left	U+098F
ai	ঐ	ৈ	Left	U+0990
o	ও	ো	Left + Right	U+0993
au	ঔ	ৌ	Left + Right	U+0994

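Note the vowel signs ো (o) and ৌ (au): these are composite, appearing both to the left and to the right of the consonant simultaneously. Unicode encodes each as a single code point (U+09CB for ো, U+09CC for ৌ) with a canonical decomposition into two parts: ে (U+09C7, left part) + া (U+09BE, right part) for ো, and ে (U+09C7) + the AU length mark ৗ (U+09D7) for ৌ. Either way, the parts form a single visual unit around the consonant.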
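The two-part structure of these vowel signs can be inspected with Python's standard `unicodedata` module. The following is a minimal sketch; the helper name `split_vowel_parts` is made up for illustration.

```python
import unicodedata

def split_vowel_parts(sign: str) -> list:
    """Return the code points a vowel sign decomposes into under NFD."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFD", sign)]

print(split_vowel_parts("\u09CB"))  # BENGALI VOWEL SIGN O
print(split_vowel_parts("\u09CC"))  # BENGALI VOWEL SIGN AU

# NFC recombines the two parts into the single precomposed code point.
recomposed = unicodedata.normalize("NFC", "\u09C7\u09BE")
print(f"U+{ord(recomposed):04X}")
```

Normalizing text to a single form (NFC or NFD) before comparing or searching avoids mismatches between the precomposed and decomposed spellings.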

Consonants

Bengali has 35 consonant letters in the basic set:

Range	Letters	Examples
Velars	ক খ গ ঘ ঙ	ka, kha, ga, gha, nga
Palatals	চ ছ জ ঝ ঞ	cha, chha, ja, jha, nya
Retroflexes	ট ঠ ড ঢ ণ	tta, ttha, dda, ddha, nna
Dentals	ত থ দ ধ ন	ta, tha, da, dha, na
Labials	প ফ ব ভ ম	pa, pha, ba, bha, ma
Semi-vowels	য র ল	ya, ra, la
Sibilants/Fricatives	শ ষ স হ	sha, ssa, sa, ha
Additional	ড় ঢ় য়	rra, rrha, yya

Each consonant carries the inherent vowel /a/ (pronounced /o/ in standard Bengali). To suppress the inherent vowel (creating a "dead" consonant), the hasanta (virama, U+09CD) is used.

Conjunct Consonants (যুক্তাক্ষর)

One of Bengali script's most complex features is its system of conjunct consonants (juktakkhor) — ligatures formed when two or more consonants occur together without an intervening vowel. Rather than writing each consonant with a hasanta between them, Bengali typically merges them into a combined form:

ক + ্ + ষ → ক্ষ  (ksha — a single conjunct glyph)
স + ্ + ত → স্ত  (sta)
ন + ্ + ত → ন্ত  (nta)

Bengali has hundreds of conjunct forms, many of which look nothing like their component letters. This makes Bengali one of the most demanding scripts for font design — a comprehensive Bengali font must include glyphs for all common conjuncts, mapped through OpenType GSUB (Glyph Substitution) tables.

Some notable conjunct examples:

Components	Conjunct	Transliteration	Notes
ক + ্ + ত	ক্ত	kta	Common
ক + ্ + ষ	ক্ষ	ksha	Looks very different from components
জ + ্ + ঞ	জ্ঞ	gya/jnya	Completely reshaped
ঙ + ্ + ক	ঙ্ক	nka	NG + KA
হ + ্ + ন	হ্ন	hna	Subjoined form
ত + ্ + র	ত্র	tra	R takes a special below form
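At the encoding level a conjunct is nothing more than consonants joined by hasanta; the merged shape comes from the font. A minimal sketch of that idea, where `make_conjunct` is a hypothetical helper rather than part of any library:

```python
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA

def make_conjunct(*consonants: str) -> str:
    """Join consonants with hasanta so a shaping engine can form a conjunct."""
    return HASANTA.join(consonants)

ksha = make_conjunct("\u0995", "\u09B7")  # ka + ssa
print(ksha, [f"U+{ord(c):04X}" for c in ksha])

# The stored text is still three code points, however it is drawn.
print(len(ksha))
```

Because the stored form is a plain sequence, string length, slicing, and search all operate on the components rather than on the visible conjunct.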

The Reph and Ya-phala

Two consonants have special combining behavior that appears throughout Bengali text:

Reph: When র (ra) appears before another consonant with a hasanta, it takes a special form called reph — a small hook above the following consonant cluster: র + ্ + ক → র্ক (rka, with reph above ক)
Ya-phala: When য (ya) appears after a consonant with a hasanta, it takes a post-base form called ya-phala (a wavy vertical stroke attached to the right of the consonant): ক + ্ + য → ক্য (kya, with ya-phala to the right of ক)
Ra-phala: Similarly, র after a hasanta becomes a subscript diagonal stroke: ক + ্ + র → ক্র (kra, with ra-phala below ক)
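The three patterns above depend only on where র or য sits relative to the hasanta, so they can be spotted with a simple scan. This is a rough heuristic sketch: the function names are invented for illustration, and it ignores joiner characters and other cases a real shaping engine handles.

```python
HASANTA = "\u09CD"
RA = "\u09B0"
YA = "\u09AF"
NUKTA_CONSONANTS = {"\u09DC", "\u09DD", "\u09DF"}

def is_consonant(ch: str) -> bool:
    return "\u0995" <= ch <= "\u09B9" or ch in NUKTA_CONSONANTS

def special_forms(text: str) -> list:
    """Return (index, form) pairs for consonant + hasanta + consonant patterns."""
    found = []
    for i in range(len(text) - 2):
        first, mid, last = text[i], text[i + 1], text[i + 2]
        if mid != HASANTA or not (is_consonant(first) and is_consonant(last)):
            continue
        starts_cluster = i == 0 or text[i - 1] != HASANTA
        if first == RA and starts_cluster:
            found.append((i, "reph"))       # ra before another consonant
        elif last == YA:
            found.append((i, "ya-phala"))   # ya after a consonant
        elif last == RA:
            found.append((i, "ra-phala"))   # ra after a consonant
    return found

for sample in ("\u09B0\u09CD\u0995", "\u0995\u09CD\u09AF", "\u0995\u09CD\u09B0"):
    print(sample, special_forms(sample))
```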

The Unicode Bengali Block

Block	Range	Characters
Bengali	U+0980 – U+09FF	96 assigned

The block is organized as follows:

Range	Content	Count
U+0980	Anji	1
U+0981 – U+0983	Chandrabindu, anusvara, visarga	3
U+0985 – U+0994	Independent vowels	12
U+0995 – U+09B9	Consonants	32
U+09BC – U+09BD	Nukta, avagraha	2
U+09BE – U+09CC	Vowel signs (dependent)	11
U+09CD	Hasanta (virama)	1
U+09CE	Khanda Ta	1
U+09D7	AU length mark	1
U+09DC – U+09DF	Additional consonants (nukta forms)	3
U+09E0 – U+09E3	Vocalic letters and signs	4
U+09E6 – U+09EF	Bengali digits	10
U+09F0 – U+09FA	Additional letters and signs (Assamese ra and wa, currency, etc.)	11
U+09FB – U+09FE	Ganda mark, Vedic anusvara, abbreviation sign, sandhi mark	4
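The layout of the block can be cross-checked against the Unicode data that ships with Python. This sketch tallies assigned code points by general category; the exact totals depend on the Unicode version bundled with the interpreter, and `bengali_block_census` is an illustrative name.

```python
import unicodedata
from collections import Counter

def bengali_block_census() -> Counter:
    """Count assigned code points in U+0980..U+09FF by general category."""
    counts = Counter()
    for cp in range(0x0980, 0x0A00):
        category = unicodedata.category(chr(cp))
        if category != "Cn":  # Cn marks unassigned code points
            counts[category] += 1
    return counts

census = bengali_block_census()
print("assigned:", sum(census.values()))
print(dict(census))  # e.g. Lo = letters, Mn/Mc = marks, Nd = digits
```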

Bengali Digits

Bengali has its own numeral system, though Arabic (Western) numerals are increasingly common:

Bengali	Value	Code Point
০	0	U+09E6
১	1	U+09E7
২	2	U+09E8
৩	3	U+09E9
৪	4	U+09EA
৫	5	U+09EB
৬	6	U+09EC
৭	7	U+09ED
৮	8	U+09EE
৯	9	U+09EF
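Since the Bengali digits occupy ten consecutive code points, converting between the two numeral systems is a simple translation. A small sketch using only the standard library; `to_bengali_digits` is a hypothetical helper name.

```python
# Build the ten Bengali digits from their code points, U+09E6..U+09EF.
BENGALI_DIGITS = "".join(chr(0x09E6 + d) for d in range(10))
WESTERN_TO_BENGALI = str.maketrans("0123456789", BENGALI_DIGITS)

def to_bengali_digits(value) -> str:
    """Render a number (or numeric string) with Bengali digits."""
    return str(value).translate(WESTERN_TO_BENGALI)

print(to_bengali_digits(2024))

# Python's int() already accepts Unicode decimal digits, so parsing
# in the other direction needs no lookup table.
print(int(to_bengali_digits(123)) + 1)
```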

Special Characters

Chandrabindu (U+0981): Nasalization mark (ঁ)
Anusvara (U+0982): Nasal sound marker (ং)
Visarga (U+0983): Aspiration marker (ঃ)
Hasanta/Virama (U+09CD): Suppresses inherent vowel, triggers conjunct formation
Khanda Ta (U+09CE): A special form of ত without inherent vowel, used at the end of a syllable (often word-finally)
Bengali Rupee Sign (U+09F3): ৳
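The formal Unicode names and general categories of these characters can be looked up directly, which is a quick way to confirm that a code point is a combining mark rather than a letter. A minimal sketch with an illustrative `describe` helper:

```python
import unicodedata

def describe(ch: str) -> tuple:
    """Return (code point, general category, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

for ch in ("\u0981", "\u0982", "\u0983", "\u09CD", "\u09CE", "\u09F3"):
    print(*describe(ch))
```

Categories starting with `M` are combining marks that attach to a preceding base, while `Lo` is a letter and `Sc` is a currency symbol.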

Text Rendering Pipeline

Rendering Bengali text correctly requires a sophisticated shaping engine. The process involves multiple steps:

1. Character Reordering

Left-position vowel signs (ি, ে, ৈ) are stored after their consonant in Unicode (logical order) but rendered before it (visual order). The rendering engine must reorder these:

Stored:   ক (U+0995) + ি (U+09BF)
Rendered: কি  (the ি appears to the left of ক)
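The reordering step can be illustrated with a toy function that moves left-position vowel signs in front of their consonant cluster. This is only a sketch of the idea, not how HarfBuzz or any other engine is implemented; it ignores reph and many other cases, and its output is a visual-order sequence that should never be stored as text.

```python
import unicodedata

HASANTA = "\u09CD"
NUKTA = "\u09BC"
LEFT_SIGNS = {"\u09BF", "\u09C7", "\u09C8"}  # vowel signs i, e, ai

def is_consonant(ch: str) -> bool:
    return "\u0995" <= ch <= "\u09B9"

def visual_order(text: str) -> str:
    """Toy logical-to-visual reordering of left-position vowel signs."""
    out = []
    cluster_start = None  # index in `out` where the current cluster begins
    # NFD splits two-part signs (o, au) so their left half can move on its own.
    for ch in unicodedata.normalize("NFD", text):
        if is_consonant(ch):
            # A consonant continues a cluster only right after a hasanta.
            if cluster_start is None or out[-1] != HASANTA:
                cluster_start = len(out)
            out.append(ch)
        elif ch in (HASANTA, NUKTA):
            out.append(ch)  # stays inside the current cluster
        elif ch in LEFT_SIGNS and cluster_start is not None:
            out.insert(cluster_start, ch)  # move the sign before the cluster
            cluster_start = None
        else:
            out.append(ch)
            cluster_start = None
    return "".join(out)

for sample in ("\u0995\u09BF", "\u09B8\u09CD\u09A4\u09BF", "\u0995\u09CB"):
    print([f"U+{ord(c):04X}" for c in visual_order(sample)])
```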

2. Conjunct Formation

When the engine encounters a consonant + hasanta + consonant sequence, it checks the font's GSUB table for a matching conjunct glyph:

Input:  ক (U+0995) + ্ (U+09CD) + ষ (U+09B7)
Lookup: GSUB table → conjunct glyph for ক্ষ
Output: Single conjunct glyph ক্ষ

If no conjunct glyph exists in the font, the hasanta is displayed explicitly.
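The lookup-and-fallback behavior can be mimicked with a dictionary standing in for a font's ligature rules. Everything here is an assumption for illustration: the glyph names, the `TOY_LIGATURES` table, and the `substitute` function do not correspond to a real font or to the actual GSUB data format.

```python
# Hypothetical ligature table: code point sequence -> made-up glyph name.
TOY_LIGATURES = {
    "\u0995\u09CD\u09B7": "k_ssa",  # ka + hasanta + ssa
    "\u09B8\u09CD\u09A4": "s_ta",   # sa + hasanta + ta
    "\u09A8\u09CD\u09A4": "n_ta",   # na + hasanta + ta
}

def substitute(text: str, ligatures=TOY_LIGATURES) -> list:
    """Greedy longest-match substitution with a per-character fallback."""
    glyphs = []
    longest = max((len(key) for key in ligatures), default=0)
    i = 0
    while i < len(text):
        for size in range(longest, 1, -1):
            key = text[i:i + size]
            if key in ligatures:
                glyphs.append(ligatures[key])
                i += len(key)
                break
        else:
            # No rule matched: emit the character itself, so a hasanta
            # without a matching conjunct stays visible.
            glyphs.append(f"uni{ord(text[i]):04X}")
            i += 1
    return glyphs

print(substitute("\u0995\u09CD\u09B7"))      # rule found: one glyph
print(substitute("\u0995\u09CD\u09B7", {}))  # empty table: explicit hasanta
```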

3. Mark Positioning

Above-marks (chandrabindu, reph) and below-marks (vowel signs ু, ূ, ra-phala) are positioned using the font's GPOS (Glyph Positioning) table.

Common Rendering Issues

Problem	Cause	Solution
Conjuncts show as base + hasanta	Font lacks GSUB rules	Use a complete Bengali font
Vowel ি appears after consonant	Shaping engine not active	Enable HarfBuzz/Uniscribe
Marks overlap	Missing GPOS data	Use a quality font (Noto Sans Bengali, SolaimanLipi)
Reph misplaced	Complex cluster not handled	Update rendering engine

Working with Bengali in Code

Python

import unicodedata

# Bengali character properties
char = "\u0995"  # ক (ka)
print(unicodedata.name(char))      # BENGALI LETTER KA
print(unicodedata.category(char))  # Lo (Letter, other)

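# Check whether a single character falls in the Bengali block
def is_bengali(ch: str) -> bool:
    # Guard against multi-character strings, which would compare lexicographically
    return len(ch) == 1 and "\u0980" <= ch <= "\u09FF"

# Iterate over a Bengali string: vowel signs and marks are separate code points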
text = "বাংলা"  # "Bangla"
for i, ch in enumerate(text):
    print(f"  [{i}] U+{ord(ch):04X} {unicodedata.name(ch, '?')}")

JavaScript

// Regex for Bengali block
const bengaliPattern = /[\u0980-\u09FF]/;

function containsBengali(text) {
  return bengaliPattern.test(text);
}

// Grapheme clusters — important for Bengali
// "কি" is 2 code points but 1 visual unit
const segmenter = new Intl.Segmenter("bn", { granularity: "grapheme" });
const segments = [...segmenter.segment("বাংলা")];
console.log(segments.length); // Visual grapheme count

Sorting Bengali Text

Bengali sorting follows the traditional script order (vowels first, then consonants in systematic phonological order). ICU provides a Bengali-aware collator:

import icu

collator = icu.Collator.createInstance(icu.Locale("bn_BD"))
words = ["বাংলাদেশ", "আমার", "সোনার"]
sorted_words = sorted(words, key=collator.getSortKey)
print(sorted_words)  # Bengali dictionary order

Bengali vs. Assamese

Assamese uses the same script with minor differences:

Feature	Bengali	Assamese
র (ra)	Standard form	Different form (ৰ, U+09F0)
ৱ (wa)	Not used	Used (U+09F1)
Unicode block	Shared (U+0980–U+09FF)	Same block
Collation	bn locale	as locale

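Because the two languages share one Unicode block, most Bengali and Assamese letters are encoded with the same code points. The exceptions are the Assamese letters ৰ (U+09F0) and ৱ (U+09F1), which have their own code points; the remaining differences, such as collation and preferred glyph shapes, are handled by locale settings and font selection.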
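The distinct code points for ra and wa give a cheap hint about which language a run of text is in. The sketch below is a heuristic only, with an invented function name; short strings may contain neither letter, and real language identification should rely on metadata or a proper detector.

```python
ASSAMESE_ONLY = {"\u09F0", "\u09F1"}  # Assamese ra and wa
BENGALI_RA = "\u09B0"

def guess_language(text: str) -> str:
    """Guess 'as' or 'bn' from the form of ra/wa used, else 'unknown'."""
    if any(ch in ASSAMESE_ONLY for ch in text):
        return "as"
    if BENGALI_RA in text:
        return "bn"
    return "unknown"

print(guess_language("\u09F0\u09BE\u09AE"))  # uses Assamese ra
print(guess_language("\u09B0\u09BE\u09AE"))  # uses Bengali ra
print(guess_language("\u0995\u09BF"))        # neither letter present
```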

Key Takeaways

Bengali script is an abugida used by 300+ million people for Bengali and Assamese, encoded in the Unicode Bengali block (U+0980–U+09FF, 96 characters).
Conjunct consonants (juktakkhor) — ligatures of 2+ consonants — are the script's most complex feature, requiring extensive OpenType GSUB tables in fonts.
Vowel signs can appear to the left, to the right, below, or split around the consonant, and the rendering engine must reorder left-position vowels from logical to visual order.
The hasanta (virama, U+09CD) is the key combining character — it suppresses the inherent vowel and triggers conjunct formation between consonants.
Reph (র above), ya-phala (য to the right), and ra-phala (র below) are special combining forms that appear throughout Bengali text.
Use quality fonts with full OpenType Bengali support (Noto Sans Bengali, SolaimanLipi) and modern rendering engines (HarfBuzz) to ensure correct display.