Bengali Script
Bengali is an abugida script with over 300 million users, used to write Bengali and Assamese. Its complex conjunct consonant forms and vowel diacritics require OpenType shaping to render correctly. This guide covers the Bengali Unicode block, the script's history and structure, and software considerations for Bengali text.
Bengali script (also called Bangla script) is one of the most widely used writing systems in the world, serving over 300 million people as the primary script for Bengali (the official language of Bangladesh and the Indian state of West Bengal) and Assamese. It descends from the ancient Brahmi script through Siddham and proto-Bengali forms: recognizably Bengali letterforms emerged around the 11th century CE, and the modern shape settled over the following centuries. In Unicode, Bengali presents significant rendering challenges due to its complex conjunct consonants, vowel sign placement, and contextual shaping requirements. This guide explores the script's structure, its Unicode encoding, and the technical considerations for working with Bengali text.
History and Background
Bengali script belongs to the Eastern Nagari family of scripts, sharing ancestry with Assamese, Maithili, and Tirhuta scripts. Its evolution can be traced through several stages:
| Period | Form | Notable Feature |
|---|---|---|
| 3rd century BCE | Brahmi | Ancestor of nearly all South/Southeast Asian scripts |
| 5th–6th century | Siddham/Gupta | Rounded letterforms emerge |
| 11th century | Proto-Bengali | Distinctive Bengali characteristics appear |
| 15th century | Bengali-Assamese | Modern form solidifies |
| 19th century | Standardized Bengali | Print standardization under British Raj |
The script achieved its modern printed form through the work of Ishwar Chandra Vidyasagar and the Serampore Mission Press in the 19th century. The characteristic matra (the horizontal headline connecting letters, similar to Devanagari's shirorekha) became a standard typographic feature in printed Bengali.
Script Structure
Bengali is an abugida (alphasyllabary) where each consonant letter carries an inherent vowel /a/ (or /o/ in Bengali pronunciation). Other vowels are indicated by diacritical marks (vowel signs) added to the consonant.
Vowels
Bengali has 11 vowel letters (independent forms used at the start of words or syllables) and corresponding vowel signs (dependent forms attached to consonants):
| Vowel | Independent | Vowel Sign | Position | Unicode (Independent) |
|---|---|---|---|---|
| a | অ | (inherent) | — | U+0985 |
| aa | আ | া | Right | U+0986 |
| i | ই | ি | Left | U+0987 |
| ii | ঈ | ী | Right | U+0988 |
| u | উ | ু | Below | U+0989 |
| uu | ঊ | ূ | Below | U+098A |
| ri | ঋ | ৃ | Below | U+098B |
| e | এ | ে | Left | U+098F |
| ai | ঐ | ৈ | Left | U+0990 |
| o | ও | ো | Left + Right | U+0993 |
| au | ঔ | ৌ | Left + Right | U+0994 |
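Note the vowel signs for ো (o) and ৌ (au): these are composite, appearing both to the left and right of the consonant simultaneously. In Unicode, each has its own code point (ো is U+09CB, ৌ is U+09CC) and also a canonically equivalent two-part sequence: ে (U+09C7, left part) + া (U+09BE, right part) for ো, and ে (U+09C7) + the AU length mark (U+09D7) for ৌ. Either way, the result is a single visual unit around the consonant.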
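The two-part encoding can be inspected with Python's standard `unicodedata` module. This is a minimal sketch; the helper name `decompose` is made up for this illustration.

```python
import unicodedata

def decompose(sign: str) -> list[str]:
    """Return the NFD (canonically decomposed) code points of a character."""
    return [f"U+{ord(part):04X}" for part in unicodedata.normalize("NFD", sign)]

for sign in ("\u09CB", "\u09CC"):  # vowel signs O and AU
    print(unicodedata.name(sign), "->", decompose(sign))
    # NFC recombines the two parts into the single precomposed sign
    assert unicodedata.normalize("NFC", unicodedata.normalize("NFD", sign)) == sign
```

Text may arrive in either form, so normalizing both sides (for example to NFC) before comparing or searching helps avoid mismatches.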
Consonants
Bengali has 35 consonant letters in the basic set:
| Range | Letters | Examples |
|---|---|---|
| Velars | ক খ গ ঘ ঙ | ka, kha, ga, gha, nga |
| Palatals | চ ছ জ ঝ ঞ | cha, chha, ja, jha, nya |
| Retroflexes | ট ঠ ড ঢ ণ | tta, ttha, dda, ddha, nna |
| Dentals | ত থ দ ধ ন | ta, tha, da, dha, na |
| Labials | প ফ ব ভ ম | pa, pha, ba, bha, ma |
| Semi-vowels | য র ল | ya, ra, la |
| Sibilants/Fricatives | শ ষ স হ | sha, ssa, sa, ha |
| Additional | ড় ঢ় য় | rra, rrha, yya |
Each consonant carries the inherent vowel /a/ (pronounced /o/ in standard Bengali). To suppress the inherent vowel (creating a "dead" consonant), the hasanta (virama, U+09CD) is used.
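The three additional letters in the last table row (ড়, ঢ়, য়) each have two canonically equivalent encodings: a precomposed code point, or a base letter followed by the nukta (U+09BC). The sketch below uses Python's standard `unicodedata` module to show that NFC normalization yields the base-plus-nukta sequence. The helper name `same_text` is hypothetical.

```python
import unicodedata

PRECOMPOSED = ["\u09DC", "\u09DD", "\u09DF"]  # RRA, RHA, YYA

def same_text(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

for ch in PRECOMPOSED:
    nfc = unicodedata.normalize("NFC", ch)
    print(unicodedata.name(ch), "->", [f"U+{ord(c):04X}" for c in nfc])

# A raw comparison of the two encodings fails; a normalized one succeeds
assert "\u09DC" != "\u09A1\u09BC"
assert same_text("\u09DC", "\u09A1\u09BC")
```

Because both encodings occur in real text, search and deduplication code is usually safer operating on normalized strings.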
Conjunct Consonants (যুক্তাক্ষর)
One of Bengali script's most complex features is its system of conjunct consonants (juktakkhor) — ligatures formed when two or more consonants occur together without an intervening vowel. Rather than writing each consonant with a hasanta between them, Bengali typically merges them into a combined form:
ক + ্ + ষ → ক্ষ (ksha — a single conjunct glyph)
স + ্ + ত → স্ত (sta)
ন + ্ + ত → ন্ত (nta)
Bengali has hundreds of conjunct forms, many of which look nothing like their component letters. This makes Bengali one of the most demanding scripts for font design — a comprehensive Bengali font must include glyphs for all common conjuncts, mapped through OpenType GSUB (Glyph Substitution) tables.
Some notable conjunct examples:
| Components | Conjunct | Transliteration | Notes |
|---|---|---|---|
| ক + ্ + ত | ক্ত | kta | Common |
| ক + ্ + ষ | ক্ষ | ksha | Looks very different from components |
| জ + ্ + ঞ | জ্ঞ | gya/jnya | Completely reshaped |
| ঙ + ্ + ক | ঙ্ক | nka | NG + KA |
| হ + ্ + ন | হ্ন | hna | Subjoined form |
| ত + ্ + র | ত্র | tra | R takes a special below form |
The Reph and Ya-phala
Two consonants have special combining behavior that appears throughout Bengali text:
- Reph: When র (ra) appears before another consonant with a hasanta, it takes a special form called reph, a small hook above the following consonant cluster: র + ্ + ক → র্ক (rka, with reph above ক)
- Ya-phala: When য (ya) appears after a consonant with a hasanta, it takes a subscript form called ya-phala (a curved stroke below the consonant): ক + ্ + য → ক্য (kya, with ya-phala below ক)
- Ra-phala: Similarly, র after a hasanta becomes a subscript diagonal stroke: ক + ্ + র → ক্র (kra, with ra-phala below ক)
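The three patterns above can be recognized from the code point sequence alone. The following is a rough sketch, not how a shaping engine is implemented: the function name `special_forms` is hypothetical, and it assumes the input is a single consonant cluster without joiner characters.

```python
RA, YA, HASANTA = "\u09B0", "\u09AF", "\u09CD"

def special_forms(cluster: str) -> list[str]:
    """Name the special combining forms suggested by a cluster's code points."""
    forms = []
    rest = cluster
    # Reph: RA + HASANTA at the start, followed by at least one more character
    if cluster.startswith(RA + HASANTA) and len(cluster) > 2:
        forms.append("reph")
        rest = cluster[2:]  # look for phala forms only after the reph
    if HASANTA + YA in rest:
        forms.append("ya-phala")
    if HASANTA + RA in rest:
        forms.append("ra-phala")
    return forms

print(special_forms("\u09B0\u09CD\u0995"))  # RA, HASANTA, KA
print(special_forms("\u0995\u09CD\u09AF"))  # KA, HASANTA, YA
print(special_forms("\u0995\u09CD\u09B0"))  # KA, HASANTA, RA
```

Stripping the leading reph before checking for phala forms means that RA + HASANTA + YA is treated as a reph over য, which is one simplifying choice among the possible readings of that sequence.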
The Unicode Bengali Block
| Block | Range | Characters |
|---|---|---|
| Bengali | U+0980 – U+09FF | 96 assigned |
The block is organized as follows:
| Range | Content | Count |
|---|---|---|
| U+0981 – U+0983 | Chandrabindu, anusvara, visarga | 3 |
| U+0985 – U+0994 | Independent vowels | 12 |
| U+0995 – U+09B9 | Consonants | 32 |
| U+09BE – U+09CC | Vowel signs (dependent) | 11 |
| U+09CD | Hasanta (virama) | 1 |
| U+09CE | Khanda Ta | 1 |
| U+09D7 | AU length mark | 1 |
| U+09DC – U+09DF | Additional consonants (nukta forms; U+09DE is unassigned) | 3 |
| U+09E0 – U+09E3 | Vocalic letters and signs | 4 |
| U+09E6 – U+09EF | Bengali digits | 10 |
| U+09F0 – U+09FA | Assamese letters, currency signs, and other additions | 11 |
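Counts like these can be checked against the Unicode data bundled with Python. In this minimal sketch, the helper name `assigned` is hypothetical, and the total for the whole block depends on the Unicode version your Python build ships with.

```python
import unicodedata

def assigned(start: int, end: int) -> list[str]:
    """Characters in [start, end] that have a Unicode name (i.e. are assigned)."""
    return [
        chr(cp)
        for cp in range(start, end + 1)
        if unicodedata.name(chr(cp), None) is not None
    ]

print("Unicode data version:", unicodedata.unidata_version)
print("Whole block:", len(assigned(0x0980, 0x09FF)))
print("Independent vowels:", len(assigned(0x0985, 0x0994)))
print("Consonants:", len(assigned(0x0995, 0x09B9)))
print("Vowel signs:", len(assigned(0x09BE, 0x09CC)))
print("Nukta forms:", len(assigned(0x09DC, 0x09DF)))
```

Ranges in the block contain gaps, so subtracting the endpoints overstates the number of characters; counting named code points avoids that.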
Bengali Digits
Bengali has its own numeral system, though Arabic (Western) numerals are increasingly common:
| Bengali | Value | Code Point |
|---|---|---|
| ০ | 0 | U+09E6 |
| ১ | 1 | U+09E7 |
| ২ | 2 | U+09E8 |
| ৩ | 3 | U+09E9 |
| ৪ | 4 | U+09EA |
| ৫ | 5 | U+09EB |
| ৬ | 6 | U+09EC |
| ৭ | 7 | U+09ED |
| ৮ | 8 | U+09EE |
| ৯ | 9 | U+09EF |
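Since the ten digits occupy consecutive code points, converting between Bengali and Western digits is a simple mapping. A minimal sketch, with the hypothetical helper name `to_bengali_digits`:

```python
# Build the ten Bengali digits from their code points (U+09E6..U+09EF)
BENGALI_DIGITS = "".join(chr(0x09E6 + d) for d in range(10))
TO_BENGALI = str.maketrans("0123456789", BENGALI_DIGITS)

def to_bengali_digits(text: str) -> str:
    """Replace Western digits in text with Bengali digits."""
    return text.translate(TO_BENGALI)

bengali_year = to_bengali_digits("2024")
print(bengali_year)

# Python's int() accepts any Unicode decimal digits, including Bengali ones
print(int(bengali_year))
```

Only the digit shapes are converted here; locale-specific number formatting such as digit grouping is a separate concern.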
Special Characters
- Chandrabindu (U+0981): Nasalization mark (ঁ)
- Anusvara (U+0982): Nasal sound marker (ং)
- Visarga (U+0983): Aspiration marker (ঃ)
- Hasanta/Virama (U+09CD): Suppresses inherent vowel, triggers conjunct formation
- Khanda Ta (U+09CE): A special form of ত without inherent vowel (ৎ), used at the end of a syllable or word
- Bengali Rupee Sign (U+09F3): ৳
Text Rendering Pipeline
Rendering Bengali text correctly requires a sophisticated shaping engine. The process involves multiple steps:
1. Character Reordering
Left-position vowel signs (ি, ে, ৈ) are stored after their consonant in Unicode (logical order) but rendered before it (visual order). The rendering engine must reorder these:
Stored: ক (U+0995) + ি (U+09BF)
Rendered: কি (the ি appears to the left of ক)
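The reordering step can be illustrated with a toy function that moves a left-position vowel sign in front of its consonant cluster. This is only a sketch under simplifying assumptions: real shaping engines reorder glyphs rather than characters, the name `visual_order` is hypothetical, and the two-part signs ো and ৌ are not handled.

```python
HASANTA = "\u09CD"
LEFT_VOWEL_SIGNS = {"\u09BF", "\u09C7", "\u09C8"}  # i, e, ai

def is_consonant(ch: str) -> bool:
    # Rough check: the main consonant range of the Bengali block
    return "\u0995" <= ch <= "\u09B9"

def visual_order(text: str) -> list[str]:
    """Return characters in display order, with left vowel signs moved
    before the consonant cluster they follow in logical order."""
    out: list[str] = []
    for ch in text:
        if ch not in LEFT_VOWEL_SIGNS:
            out.append(ch)
            continue
        # Walk back over consonant (hasanta consonant)* to the cluster start
        i = len(out)
        while i > 0 and is_consonant(out[i - 1]):
            i -= 1
            if i >= 2 and out[i - 1] == HASANTA and is_consonant(out[i - 2]):
                i -= 1  # step over the hasanta and keep walking back
            else:
                break
        out.insert(i, ch)
    return out

# KA + vowel sign I is stored consonant-first but displayed sign-first
print([f"U+{ord(c):04X}" for c in visual_order("\u0995\u09BF")])
```

For a conjunct such as স্তি, the sign moves in front of the whole cluster, not just the last consonant.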
2. Conjunct Formation
When the engine encounters a consonant + hasanta + consonant sequence, it checks the font's GSUB table for a matching conjunct glyph:
Input: ক (U+0995) + ্ (U+09CD) + ষ (U+09B7)
Lookup: GSUB table → conjunct glyph for ক্ষ
Output: Single conjunct glyph ক্ষ
If no conjunct glyph exists in the font, the hasanta is displayed explicitly.
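The lookup and its fallback can be mimicked with a small table. This sketch is not the GSUB format or any real engine's API: the table `TOY_LIGATURES`, the glyph names, and the function `shape` are all invented for illustration, and only two-consonant conjuncts are covered.

```python
HASANTA = "\u09CD"

# Hypothetical stand-in for a font's ligature rules: (first, second) -> glyph name
TOY_LIGATURES = {
    ("\u0995", "\u09B7"): "ka_ssa.conjunct",
    ("\u09B8", "\u09A4"): "sa_ta.conjunct",
}

def shape(text: str) -> list[str]:
    """Replace consonant + hasanta + consonant with a conjunct glyph when
    the toy font has one; otherwise emit each character separately."""
    glyphs: list[str] = []
    i = 0
    while i < len(text):
        pair = (text[i], text[i + 2]) if i + 2 < len(text) else None
        if pair in TOY_LIGATURES and text[i + 1] == HASANTA:
            glyphs.append(TOY_LIGATURES[pair])
            i += 3
        else:
            glyphs.append(f"U+{ord(text[i]):04X}")
            i += 1
    return glyphs

print(shape("\u0995\u09CD\u09B7"))  # rule present: one conjunct glyph
print(shape("\u0995\u09CD\u09A4"))  # rule missing: hasanta stays visible
```

The second call shows the fallback case, where the sequence comes out as three separate glyphs including an explicit hasanta.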
3. Mark Positioning
Above-marks (chandrabindu, reph) and below-marks (vowel signs ু, ূ, ra-phala) are positioned using the font's GPOS (Glyph Positioning) table.
Common Rendering Issues
| Problem | Cause | Solution |
|---|---|---|
| Conjuncts show as base + hasanta | Font lacks GSUB rules | Use a complete Bengali font |
| Vowel ি appears after consonant | Shaping engine not active | Enable HarfBuzz/Uniscribe |
| Marks overlap | Missing GPOS data | Use a quality font (Noto Sans Bengali, SolaimanLipi) |
| Reph misplaced | Complex cluster not handled | Update rendering engine |
Working with Bengali in Code
Python
import unicodedata

# Bengali character properties
char = "\u0995"  # ক (ka)
print(unicodedata.name(char))      # BENGALI LETTER KA
print(unicodedata.category(char))  # Lo (Letter, other)

# Check whether a single character falls in the Bengali block
def is_bengali(ch: str) -> bool:
    return len(ch) == 1 and "\u0980" <= ch <= "\u09FF"

# Iterate over a Bengali string: each vowel sign and mark is its own code point
text = "বাংলা"  # "Bangla"
for i, ch in enumerate(text):
    print(f"  [{i}] U+{ord(ch):04X} {unicodedata.name(ch, '?')}")
JavaScript
// Regex for Bengali block
const bengaliPattern = /[\u0980-\u09FF]/;

function containsBengali(text) {
  return bengaliPattern.test(text);
}

// Grapheme clusters matter for Bengali:
// "কি" is 2 code points but 1 visual unit
const segmenter = new Intl.Segmenter("bn", { granularity: "grapheme" });
const segments = [...segmenter.segment("বাংলা")];
console.log(segments.length); // Visual grapheme count
Sorting Bengali Text
Bengali sorting follows the traditional script order (vowels first, then consonants in systematic phonological order). ICU provides a Bengali-aware collator, available in Python through the PyICU package (imported as icu):
import icu
collator = icu.Collator.createInstance(icu.Locale("bn_BD"))
words = ["বাংলাদেশ", "আমার", "সোনার"]
sorted_words = sorted(words, key=collator.getSortKey)
print(sorted_words) # Bengali dictionary order
Bengali vs. Assamese
Assamese uses the same script with minor differences:
| Feature | Bengali | Assamese |
|---|---|---|
| র (ra) | Standard form | Different form (ৰ, U+09F0) |
| ৱ (wa) | Not used | Used (U+09F1) |
| Unicode block | Shared (U+0980–U+09FF) | Same block |
| Collation | bn locale | as locale |
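Because the block is shared, letters common to both languages use the same code points. The Assamese-specific letters ৰ (U+09F0) and ৱ (U+09F1) have their own code points, while the remaining differences, such as collation and preferred glyph shapes, are handled by locale settings and font selection.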
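One practical consequence is that the two Assamese-specific letters in the table can serve as a weak hint about a text's language. A minimal sketch, with the hypothetical helper name `has_assamese_letters`:

```python
ASSAMESE_ONLY = {"\u09F0", "\u09F1"}  # Assamese ra and wa

def has_assamese_letters(text: str) -> bool:
    """True if the text contains a letter used in Assamese but not Bengali."""
    return any(ch in ASSAMESE_ONLY for ch in text)

print(has_assamese_letters("\u09F0\u09BE\u09AE"))  # starts with Assamese ra
print(has_assamese_letters("\u09B0\u09BE\u09AE"))  # starts with Bengali ra
```

A positive result suggests Assamese, but a negative one proves nothing, since short Assamese strings may not contain either letter. Explicit language metadata remains the more reliable signal.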
Key Takeaways
- Bengali script is an abugida used by 300+ million people for Bengali and Assamese, encoded in the Unicode Bengali block (U+0980–U+09FF, 96 characters).
- Conjunct consonants (juktakkhor) — ligatures of 2+ consonants — are the script's most complex feature, requiring extensive OpenType GSUB tables in fonts.
- Vowel signs can appear to the left of, to the right of, or below the consonant, or split around it, and the rendering engine must reorder left-position vowels from logical to visual order.
- The hasanta (virama, U+09CD) is the key combining character — it suppresses the inherent vowel and triggers conjunct formation between consonants.
- Reph (র above), ya-phala (য below), and ra-phala (র below) are special combining forms that appear throughout Bengali text.
- Use quality fonts with full OpenType Bengali support (Noto Sans Bengali, SolaimanLipi) and modern rendering engines (HarfBuzz) to ensure correct display.