Script Stories
Writing system deep dives with history and culture
15 руководств в этой серии
Arabic is the third most widely used writing system in the world, written right-to-left with letters that change shape depending on their position in a word — a complexity that Unicode handles with contextual shaping. This guide explores the Arabic script in Unicode, covering the Arabic block, presentation forms, bidirectional handling, and software support.
Devanagari is an abugida script used to write Hindi, Sanskrit, Marathi, and many other South Asian languages, with complex conjunct consonants and vowel diacritics that challenge text rendering engines. This guide explores the Devanagari Unicode block, how the script works, and how to render it correctly in software.
Greek is one of the oldest alphabetic writing systems and gave Unicode many of its mathematical symbols, with the Greek and Coptic block serving both modern Greek text and ancient Coptic liturgical use. This guide explores the Greek and Coptic Unicode block, the history of the script, and how Greek letters are used in mathematics and science.
Cyrillic is used to write Russian, Ukrainian, Bulgarian, Serbian, and over 50 other languages, making it one of the most widely used scripts in Unicode with characters spread across several Cyrillic blocks. This guide explores the Cyrillic script's history, its Unicode representation, and the challenges of supporting the full range of Cyrillic-using languages.
Hebrew is an abjad script written right-to-left, used for Biblical Hebrew, Modern Hebrew, and Yiddish, with optional vowel diacritics called niqqud that are encoded as combining characters. This guide covers the Hebrew Unicode block, how the bidirectional algorithm handles Hebrew text, and the history of this ancient script.
Thai is an abugida script with no spaces between words, complex vowel placement above and below consonants, and tone marks that affect meaning — all of which require sophisticated Unicode rendering. This guide explores the Thai Unicode block, how Thai text is encoded and segmented, and the challenges of Thai natural language processing.
Japanese is unique in using three scripts simultaneously — Hiragana, Katakana, and Kanji (CJK ideographs) — alongside Latin text, making it one of the most complex writing systems to support in Unicode. This guide explains how each Japanese script is encoded in Unicode, how they interact, and what developers need to know about Japanese text handling.
Hangul was invented in 1443 by King Sejong as a scientific alphabet where syllable blocks are algorithmically composed from individual jamo (consonants and vowels), a structure Unicode mirrors with both jamo and precomposed syllable encodings. This guide tells the story of Hangul, explains its unique Unicode encoding, and covers Korean text processing.
Bengali is an abugida script with over 300 million speakers, used for Bengali and Assamese, featuring complex conjunct consonant forms and vowel diacritics that require OpenType rendering. This guide explores the Bengali Unicode block, the script's history and structure, and software considerations for Bengali text.
Tamil is one of the oldest living writing systems, with a literary tradition spanning over 2,000 years, and its script encodes a relatively small set of characters that combine to form a large syllabic inventory. This guide explores the Tamil Unicode block, its classical and modern character sets, and considerations for Tamil text in digital applications.
The Armenian alphabet was created in 405 AD by the monk Mesrop Mashtots and has remained largely unchanged for over 1,600 years, used exclusively to write the Armenian language. This guide explores the Armenian Unicode block, the script's unique structure including its capital and lowercase forms, and the history of this ancient alphabet.
Georgian has three distinct historical scripts — Mkhedruli, Asomtavruli, and Nuskhuri — all encoded in Unicode, with modern Georgian using Mkhedruli for everyday text and the others reserved for ecclesiastical purposes. This guide tells the story of Georgian script, explores its Unicode blocks, and explains how the three scripts relate to each other.
The Ethiopic script (Ge'ez) is an abugida used to write Amharic, Tigrinya, Oromo, and many other languages of the Horn of Africa, with Unicode's Ethiopic block containing over 500 characters. This guide explores the history and structure of Ethiopic script, its Unicode encoding, and the challenges of digital Ethiopic text.
Unicode encodes dozens of historic and extinct scripts — from Cuneiform and Egyptian Hieroglyphs to Linear B and Gothic — preserving them digitally for scholars and researchers. This guide surveys the historic scripts in Unicode, explains why they were included, and describes the academic and cultural communities that depend on them.
There are hundreds of writing systems in use around the world today, from alphabets and syllabaries to abjads and abugidas, and Unicode aims to encode all of them in a single standard. This overview explains the major types of writing systems, how they are classified, and which ones are currently supported or missing from Unicode.