📜 Script Stories

Bengali Script

Bengali is an abugida script with over 300 million speakers, used for Bengali and Assamese, featuring complex conjunct consonant forms and vowel diacritics that require OpenType rendering. This guide explores the Bengali Unicode block, the script's history and structure, and software considerations for Bengali text.

·

Bengali script (also called Bangla script) is one of the most widely used writing systems in the world, serving over 300 million people as the primary script for Bengali (the official language of Bangladesh and the Indian state of West Bengal) and Assamese. Descended from the ancient Brahmi script through Siddham and proto-Bengali forms, the modern Bengali script took shape around the 11th century CE. In Unicode, Bengali presents significant rendering challenges due to its complex conjunct consonants, vowel sign placement, and contextual shaping requirements. This guide explores the script's structure, its Unicode encoding, and the technical considerations for working with Bengali text.

History and Background

Bengali script belongs to the Eastern Nagari family of scripts, sharing ancestry with Assamese, Maithili, and Tirhuta scripts. Its evolution can be traced through several stages:

Period Form Notable Feature
3rd century BCE Brahmi Ancestor of nearly all South/Southeast Asian scripts
5th–6th century Siddham/Gupta Rounded letterforms emerge
11th century Proto-Bengali Distinctive Bengali characteristics appear
15th century Bengali-Assamese Modern form solidifies
19th century Standardized Bengali Print standardization under British Raj

The script achieved its modern printed form through the work of Ishwar Chandra Vidyasagar and the Serampore Mission Press in the 19th century. The characteristic matra (the horizontal headline connecting letters, similar to Devanagari's shirorekha) became a standard typographic feature in printed Bengali.

Script Structure

Bengali is an abugida (alphasyllabary) where each consonant letter carries an inherent vowel /a/ (or /o/ in Bengali pronunciation). Other vowels are indicated by diacritical marks (vowel signs) added to the consonant.

Vowels

Bengali has 11 vowel letters (independent forms used at the start of words or syllables) and corresponding vowel signs (dependent forms attached to consonants):

Vowel Independent Vowel Sign Position Unicode (Independent)
a (inherent) U+0985
aa Right U+0986
i ি Left U+0987
ii Right U+0988
u Below U+0989
uu Below U+098A
ri Below U+098B
e Left U+098F
ai Left U+0990
o Left + Right U+0993
au Left + Right U+0994

Note the vowel signs for ো (o) and ৌ (au) — these are composite, appearing both to the left and right of the consonant simultaneously. In Unicode, these are encoded as two-part vowel signs: ে (U+09CB left part) + া (right part) for ো, forming a single visual unit around the consonant.

Consonants

Bengali has 35 consonant letters in the basic set:

Range Letters Examples
Velars ক খ গ ঘ ঙ ka, kha, ga, gha, nga
Palatals চ ছ জ ঝ ঞ cha, chha, ja, jha, nya
Retroflexes ট ঠ ড ঢ ণ tta, ttha, dda, ddha, nna
Dentals ত থ দ ধ ন ta, tha, da, dha, na
Labials প ফ ব ভ ম pa, pha, ba, bha, ma
Semi-vowels য র ল ya, ra, la
Sibilants/Fricatives শ ষ স হ sha, ssa, sa, ha
Additional ড় ঢ় য় rra, rrha, yya

Each consonant carries the inherent vowel /a/ (pronounced /o/ in standard Bengali). To suppress the inherent vowel (creating a "dead" consonant), the hasanta (virama, U+09CD) is used.

Conjunct Consonants (যুক্তাক্ষর)

One of Bengali script's most complex features is its system of conjunct consonants (juktakkhor) — ligatures formed when two or more consonants occur together without an intervening vowel. Rather than writing each consonant with a hasanta between them, Bengali typically merges them into a combined form:

ক + ্ + ষ → ক্ষ  (ksha — a single conjunct glyph)
স + ্ + ত → স্ত  (sta)
ন + ্ + ত → ন্ত  (nta)

Bengali has hundreds of conjunct forms, many of which look nothing like their component letters. This makes Bengali one of the most demanding scripts for font design — a comprehensive Bengali font must include glyphs for all common conjuncts, mapped through OpenType GSUB (Glyph Substitution) tables.

Some notable conjunct examples:

Components Conjunct Transliteration Notes
ক + ্ + ত ক্ত kta Common
ক + ্ + ষ ক্ষ ksha Looks very different from components
জ + ্ + ঞ জ্ঞ gya/jnya Completely reshaped
ঙ + ্ + ক ঙ্ক nka NG + KA
হ + ্ + ন হ্ন hna Subjoined form
ত + ্ + র ত্র tra R takes a special below form

The Reph and Ya-phala

Two consonants have special combining behavior that appears throughout Bengali text:

  • Reph: When র (ra) appears before another consonant with a hasanta, it takes a special form called reph — a small hook above the following consonant cluster: র + ্ + ক → র্ক (rka, with reph above ক)

  • Ya-phala: When য (ya) appears after a consonant with a hasanta, it takes a subscript form called ya-phala (a curved stroke below the consonant): ক + ্ + য → ক্য (kya, with ya-phala below ক)

  • Ra-phala: Similarly, র after a hasanta becomes a subscript diagonal stroke: ক + ্ + র → ক্র (kra, with ra-phala below ক)

The Unicode Bengali Block

Block Range Characters
Bengali U+0980 – U+09FF 96 assigned

The block is organized as follows:

Range Content Count
U+0981 – U+0983 Chandrabindu, anusvara, visarga 3
U+0985 – U+0994 Independent vowels 14
U+0995 – U+09B9 Consonants 35
U+09BE – U+09CC Vowel signs (dependent) 12
U+09CD Hasanta (virama) 1
U+09CE Khanda Ta 1
U+09D7 AU length mark 1
U+09DC – U+09DF Additional consonants (nukta forms) 4
U+09E0 – U+09E3 Vocalic letters and signs 4
U+09E6 – U+09EF Bengali digits 10
U+09F0 – U+09FA Additional signs (currency, etc.) 11

Bengali Digits

Bengali has its own numeral system, though Arabic (Western) numerals are increasingly common:

Bengali Value Code Point
0 U+09E6
1 U+09E7
2 U+09E8
3 U+09E9
4 U+09EA
5 U+09EB
6 U+09EC
7 U+09ED
8 U+09EE
9 U+09EF

Special Characters

  • Chandrabindu (U+0981): Nasalization mark (ँ)
  • Anusvara (U+0982): Nasal sound marker (ং)
  • Visarga (U+0983): Aspiration marker (ঃ)
  • Hasanta/Virama (U+09CD): Suppresses inherent vowel, triggers conjunct formation
  • Khanda Ta (U+09CE): A special form of ত without inherent vowel, used word-finally
  • Bengali Rupee Sign (U+09F3): ৳

Text Rendering Pipeline

Rendering Bengali text correctly requires a sophisticated shaping engine. The process involves multiple steps:

1. Character Reordering

Left-position vowel signs (ি, ে, ৈ) are stored after their consonant in Unicode (logical order) but rendered before it (visual order). The rendering engine must reorder these:

Stored:   ক (U+0995) + ি (U+09BF)
Rendered: কি  (the ি appears to the left of ক)

2. Conjunct Formation

When the engine encounters a consonant + hasanta + consonant sequence, it checks the font's GSUB table for a matching conjunct glyph:

Input:  ক (U+0995) + ্ (U+09CD) + ষ (U+09B7)
Lookup: GSUB table → conjunct glyph for ক্ষ
Output: Single conjunct glyph ক্ষ

If no conjunct glyph exists in the font, the hasanta is displayed explicitly.

3. Mark Positioning

Above-marks (chandrabindu, reph) and below-marks (vowel signs ু, ূ, ra-phala) are positioned using the font's GPOS (Glyph Positioning) table.

Common Rendering Issues

Problem Cause Solution
Conjuncts show as base + hasanta Font lacks GSUB rules Use a complete Bengali font
Vowel ি appears after consonant Shaping engine not active Enable HarfBuzz/Uniscribe
Marks overlap Missing GPOS data Use a quality font (Noto Sans Bengali, SolaimanLipi)
Reph misplaced Complex cluster not handled Update rendering engine

Working with Bengali in Code

Python

import unicodedata

# Bengali character properties
char = "\u0995"  # ক (ka)
print(unicodedata.name(char))      # BENGALI LETTER KA
print(unicodedata.category(char))  # Lo (Letter, other)

# Check if a character is Bengali
def is_bengali(ch: str) -> bool:
    return "\u0980" <= ch <= "\u09FF"

# Iterate over a Bengali string — conjuncts are multiple code points
text = "বাংলা"  # "Bangla"
for i, ch in enumerate(text):
    print(f"  [{i}] U+{ord(ch):04X} {unicodedata.name(ch, '?')}")

JavaScript

// Regex for Bengali block
const bengaliPattern = /[\u0980-\u09FF]/;

function containsBengali(text) {
  return bengaliPattern.test(text);
}

// Grapheme clusters — important for Bengali
// "কি" is 2 code points but 1 visual unit
const segmenter = new Intl.Segmenter("bn", { granularity: "grapheme" });
const segments = [...segmenter.segment("বাংলা")];
console.log(segments.length); // Visual grapheme count

Sorting Bengali Text

Bengali sorting follows the traditional script order (vowels first, then consonants in systematic phonological order). ICU provides a Bengali-aware collator:

import icu

collator = icu.Collator.createInstance(icu.Locale("bn_BD"))
words = ["বাংলাদেশ", "আমার", "সোনার"]
sorted_words = sorted(words, key=collator.getSortKey)
print(sorted_words)  # Bengali dictionary order

Bengali vs. Assamese

Assamese uses the same script with minor differences:

Feature Bengali Assamese
র (ra) Standard form Different form (ৰ, U+09F0)
ৱ (wa) Not used Used (U+09F1)
Unicode block Shared (U+0980–U+09FF) Same block
Collation bn locale as locale

The shared Unicode block means Bengali and Assamese text are encoded identically at the character level, with the distinction handled by font selection and locale settings.

Key Takeaways

  • Bengali script is an abugida used by 300+ million people for Bengali and Assamese, encoded in the Unicode Bengali block (U+0980–U+09FF, 96 characters).
  • Conjunct consonants (juktakkhor) — ligatures of 2+ consonants — are the script's most complex feature, requiring extensive OpenType GSUB tables in fonts.
  • Vowel signs can appear left, right, above, below, or split around the consonant, and the rendering engine must reorder left-position vowels from logical to visual order.
  • The hasanta (virama, U+09CD) is the key combining character — it suppresses the inherent vowel and triggers conjunct formation between consonants.
  • Reph (র above), ya-phala (য below), and ra-phala (র below) are special combining forms that appear throughout Bengali text.
  • Use quality fonts with full OpenType Bengali support (Noto Sans Bengali, SolaimanLipi) and modern rendering engines (HarfBuzz) to ensure correct display.

Thêm trong Script Stories