The Endangered Scripts: Preserving Languages Through Unicode — Writing Systems of the World

In the great archive of human writing, Unicode is engaged in a race against time. Every year, somewhere in the world, the last speaker of a language dies. When languages die, their writing systems — if they had them — risk disappearing too. Some scripts are in imminent danger; others were nearly lost before being rescued by dedicated communities and the painstaking process of Unicode encoding. The story of endangered and minority scripts in Unicode is a story of linguists, activists, community scholars, and international cooperation working together to ensure that the full range of human graphical ingenuity is preserved in the universal character set.

The Threat Landscape

Of the approximately 7,000 living languages in the world, over half are expected to go extinct by the end of this century. Most of these languages have never been written at all, or only recently acquired writing systems. But several hundred have indigenous or traditional scripts that developed over centuries — scripts used in manuscripts, inscriptions, literature, and religious practice. For these, extinction of the language risks extinction of the script.

The threats are varied:
- Colonial displacement: Indigenous scripts were suppressed and replaced by Latin (in the Americas, Africa, the Pacific) or Arabic/Perso-Arabic (in Central Asia)
- Prestige languages: Major national languages with well-resourced digital tools outcompete minority language scripts for practical use
- Literacy policy: Many countries teach national language literacy using a single script, leaving minority script literacy to informal transmission
- Digital barriers: If a script isn't in Unicode, it effectively doesn't exist in the digital world: no fonts, no keyboards, no text processing

Tifinagh: The Berber Script

The Berber (Amazigh) peoples of North Africa, spanning Morocco, Algeria, Tunisia, Libya, Mali, Niger, and Egypt, have one of the world's oldest writing traditions. Ancient Libyan inscriptions, dating to at least the 3rd century BCE, use a script ancestral to Tifinagh. Tifinagh itself is still used by the Tuareg of the Sahara and is now being revived as a prestige script for the broader Berber identity movement.

The Neo-Tifinagh variant, first developed in the 1960s by the Paris-based Berber cultural organization Académie Berbère and later standardized in Morocco by IRCAM (Institut Royal de la Culture Amazighe), was encoded in Unicode 4.1 (2005) in the Tifinagh block (U+2D30–U+2D7F, 59 characters).

Unicode inclusion went hand in hand with concrete political and cultural changes: Morocco adopted Tifinagh as the official script for the Amazigh language in 2003, its 2011 constitution made Amazigh an official language, and Tifinagh is now taught in Moroccan schools. Without Unicode encoding, digital implementation at national scale would have been far more difficult. The script's inclusion in Unicode was itself part of the advocacy for official recognition, a demonstration that the script was technically viable for modern digital use.

Cherokee: A Syllabary Created by One Man

The Cherokee syllabary is one of the most remarkable feats of writing system creation in recorded history. Around 1821, Sequoyah (also known as George Guess, c. 1775–1843), a Cherokee silversmith who was himself illiterate, completed an 85-character syllabary for the Cherokee language. Sequoyah had no formal education in linguistics; he apparently created the syllabary through years of careful observation of spoken Cherokee and systematic experimentation with characters.

Within months of its completion, literacy in the syllabary was spreading rapidly through the Cherokee Nation. In 1828, the Cherokee Phoenix newspaper began publication in both Cherokee and English. The syllabary was used for religious texts, legal documents, and personal correspondence.

After the Trail of Tears (1838–1839), which forcibly removed the Cherokee from their eastern homeland, literacy continued — but decades of forced assimilation policies in U.S. government boarding schools suppressed the language. By the late 20th century, Cherokee was endangered.

Unicode encodes Cherokee in two blocks:
- Cherokee (U+13A0–U+13FF): The original 85 characters, encoded in Unicode 3.0 (1999)
- Cherokee Supplement (U+AB70–U+ABBF): Lowercase forms added in Unicode 8.0 (2015) to enable mixed-case typography for modern usage

The lowercase addition was driven by the Cherokee Nation's language revitalization efforts, which identified the lack of case distinction as a barrier to modern digital communication. Today, the Eastern Band of Cherokee Indians and the Cherokee Nation in Oklahoma both operate language revitalization programs, and online courses and digital materials in the syllabary are widely available.
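The relationship between the two Cherokee blocks can be observed with ordinary string case conversion. The following is a minimal sketch, assuming a Python interpreter whose Unicode database is at version 8.0 or later (Python 3.5 and up); the sample word and the helper name `in_range` are illustrative choices, not part of any Cherokee-specific API.

```python
# Sketch: Cherokee case mapping with Python's built-in str methods.
# Assumes the interpreter's Unicode data is version 8.0 or later.

def in_range(ch, first, last):
    """Return True if the code point of ch lies in [first, last]."""
    return first <= ord(ch) <= last

# The word "Tsalagi" (Cherokee) written with letters from the main block.
word_upper = "\u13E3\u13B3\u13A9"

# Lowercasing moves each letter into the Cherokee Supplement block.
word_lower = word_upper.lower()

for up, low in zip(word_upper, word_lower):
    print(f"U+{ord(up):04X} -> U+{ord(low):04X}")

# Every original letter is in U+13A0..U+13FF; every lowercase form is in
# U+AB70..U+ABBF; uppercasing again restores the original string.
all_main = all(in_range(c, 0x13A0, 0x13FF) for c in word_upper)
all_supplement = all(in_range(c, 0xAB70, 0xABBF) for c in word_lower)
round_trip = word_lower.upper() == word_upper
```

Text written before Unicode 8.0 uses only the main block, so it is treated as uppercase by software that knows about the supplement.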

Unified Canadian Aboriginal Syllabics

In the 1840s, Methodist missionary James Evans developed a syllabic writing system for Cree. Adapting shorthand principles, he built it from simple geometric shapes such as a square, a triangle, and a wedge, each rotated in 90-degree steps so that its orientation indicates which of four vowels the syllable contains.

Evans' syllabics spread across the Canadian Arctic and Subarctic, adapted for Ojibwe, Inuktitut, Naskapi, and dozens of other First Nations languages. Today, Inuktitut is the most widely used language written in syllabics, spoken by the Inuit peoples of Nunavut, Nunatsiavut, and Nunavik in Canada.

Unicode encodes the Unified Canadian Aboriginal Syllabics block (U+1400–U+167F, 640 characters), a large and complex block accommodating the variants used by different language communities. The block includes:
- Core vowel-syllable characters for different consonants
- Final consonant marks
- Language-specific extended characters for Inuktitut, Ojibwe, Cree variants, etc.

A UCAS Extended block (U+18B0–U+18FF) adds further characters needed for Eastern/Western Ojibwe and other variants.

N'Ko: The Script That Arrived by Mail

N'Ko (ߒߞߏ) is a script invented in 1949 by Solomana Kante, a Guinean intellectual who reportedly created it after receiving a letter claiming that Africans had no indigenous writing. In response, Kante, working without formal linguistic training, devised a right-to-left alphabet for the Manding language family (including Maninka, Mandinka, Bambara, and Dyula).

What makes N'Ko remarkable is its speed of adoption and its community-driven digital presence. Although the script dates only from 1949, N'Ko has an active community of writers, publishers, and educators. The Internet Sacred Text Archive contains Quranic translations in N'Ko, an N'Ko-language Wikipedia exists, and keyboards are available.

N'Ko was encoded in Unicode 5.0 (2006) in the N'Ko block (U+07C0–U+07FF, 59 characters). Its right-to-left directionality, like Arabic and Hebrew, requires bidirectional rendering support — and because N'Ko is not as well-supported as Arabic in shaping engines, applications sometimes fall back to incorrect left-to-right rendering.
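The directionality described above is recorded per character in the Unicode Character Database, and a renderer's bidirectional logic starts from that property. The following is a small sketch using Python's standard `unicodedata` module; the helper names `is_rtl` and `has_rtl` are hypothetical, and the check covers only the strong right-to-left classes, not the full bidirectional algorithm.

```python
import unicodedata

# Strong right-to-left bidirectional classes:
# "R" (e.g. Hebrew, N'Ko) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}

def is_rtl(ch):
    """Return True if ch has a strong right-to-left bidi class."""
    return unicodedata.bidirectional(ch) in RTL_CLASSES

def has_rtl(text):
    """Return True if any character in text is strongly right-to-left."""
    return any(is_rtl(ch) for ch in text)

samples = {
    "N'Ko letter A": "\u07CA",
    "Hebrew letter alef": "\u05D0",
    "Arabic letter alef": "\u0627",
    "Latin capital A": "A",
}

for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X} bidi class {unicodedata.bidirectional(ch)}")
```

An application could use a check like `has_rtl` to decide whether a string needs bidirectional layout at all. Correct ordering and shaping still depend on the rendering engine and font.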

Vai: West Africa's Rediscovered Script

The Vai syllabary was created around 1833 by Momolu Duwalu Bukele in what is now Liberia. Vai is a tonal Mande language, and the syllabary — developed without knowledge of other writing systems — encodes syllable sounds without representing tones explicitly (tones being inferred from context, as in many African language writing systems).

Vai literacy was vigorous in the 19th and early 20th centuries, used for personal correspondence, traditional religious texts, and record-keeping. Colonialism and the prestige of English-medium education suppressed Vai literacy significantly.

Unicode encoded Vai in Unicode 5.1 (2008) in the Vai block (U+A500–U+A63F, 300 characters). The relatively large block reflects the syllabary's size — Vai has many distinct syllable types for its consonant × vowel combinations.

Osage: The Newest Endangered Script Success Story

Osage (𐓂𐓣𐓤𐓘𐓞𐓙) is a writing system created in 2006 by Herman Mongrain Lookout for the Osage Nation (based in Oklahoma, USA) and revised between 2012 and 2014. Lookout designed a distinct script rather than continuing to write Osage in the Latin alphabet, creating letterforms that work well in both print and digital contexts.

The script was encoded in Unicode 9.0 (2016), only a few years after its revision was completed, in the Osage block (U+104B0–U+104FF, 72 characters). The fast encoding reflects improved Unicode processes and the strong advocacy of the Osage Nation.

Osage is an example of how Unicode can actively support language revitalization, not just preserve historical scripts: by encoding Osage quickly, Unicode enabled font development, keyboard design, and digital material creation that are now supporting Osage language classes for tribal youth.

Adlam: Writing for the Fulani

Adlam (𞤀𞤣𞤤𞤢𞤥) was developed in the 1980s by Ibrahima Barry and Abdoulaye Barry, two brothers from Guinea who were teenagers when they created the script for the Fulani (Fula/Fulbe) people. Fulani is spoken by over 40 million people across West Africa — one of the continent's largest language communities.

Adlam is a right-to-left alphabet that captures the phonology of Fulani, including sounds absent from the Latin alphabet. It has seen remarkable adoption: as of the 2010s, millions of Fulani people use Adlam for WhatsApp messages, social media posts, and community publications.

Unicode encoded Adlam in Unicode 9.0 (2016) in the Adlam block (U+1E900–U+1E95F, 88 characters). Meta (Facebook/WhatsApp) and Microsoft have added Adlam keyboard support, accelerating its digital adoption.
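Several of the block ranges quoted in the sections above can be collected into a small lookup table. The sketch below is illustrative only: the `BLOCKS` table covers a subset of the blocks mentioned in this article, and the function names are hypothetical. It also separates two numbers that are easy to confuse: the size of a block (code points reserved) and the count of characters actually assigned in it, which depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Inclusive code point ranges, as quoted in this article (a subset).
BLOCKS = {
    "Tifinagh": (0x2D30, 0x2D7F),
    "Cherokee": (0x13A0, 0x13FF),
    "Cherokee Supplement": (0xAB70, 0xABBF),
    "Unified Canadian Aboriginal Syllabics": (0x1400, 0x167F),
    "N'Ko": (0x07C0, 0x07FF),
    "Vai": (0xA500, 0xA63F),
    "Adlam": (0x1E900, 0x1E95F),
}

def block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, (first, last) in BLOCKS.items():
        if first <= cp <= last:
            return name
    return None

def block_size(name):
    """Number of code points reserved for the block."""
    first, last = BLOCKS[name]
    return last - first + 1

def assigned_count(name):
    """Number of code points in the block that this Python build
    treats as assigned (general category other than 'Cn')."""
    first, last = BLOCKS[name]
    return sum(
        1
        for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    )

for name in BLOCKS:
    print(f"{name}: {block_size(name)} reserved, "
          f"{assigned_count(name)} assigned "
          f"(Unicode {unicodedata.unidata_version})")
```

The character counts given in this article correspond to assigned characters, so they can be smaller than the reserved size of the block. The Tifinagh range, for example, spans 80 code points.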

The Script Encoding Initiative

The primary academic driver behind endangered script encoding is the Script Encoding Initiative (SEI), housed at the University of California, Berkeley, and directed by Deborah Anderson. Since 2002, SEI has funded and coordinated the encoding proposals that brought dozens of scripts into Unicode.

Scripts encoded through SEI support include: Pau Cin Hau, Sora Sompeng, Mro, Warang Citi, Khojki, Khudawadi, Mahajani, Tirhuta, Siddham, and many more. SEI's process involves working with communities, documenting scripts from manuscripts and living practitioners, preparing formal Unicode proposals, and shepherding proposals through the Unicode Technical Committee review process.

The process is painstaking: a typical proposal takes 2–5 years from initial contact to Unicode inclusion. But each encoded script represents a permanent digital infrastructure commitment: once a script is in Unicode, fonts, keyboards, OCR research, and digital text become possible in ways they simply were not before.

The endangered scripts section of Unicode is ultimately a testament to humanity's graphical creativity and diversity — and to the conviction, embedded in the Unicode project from its founding, that all human writing deserves a home in the universal character set.