Cyrillic: The Script That Spans Continents — Writing Systems of the World

From the cobblestoned streets of Moscow to the steppe cities of Kazakhstan, from the Orthodox churches of Serbia to the mountain villages of Bulgaria, one script serves as the common writing system for a vast sweep of Eurasia: Cyrillic. Named after Saint Cyril, a 9th-century Byzantine scholar who may or may not have personally designed it, the Cyrillic alphabet today serves over a dozen official languages across Russia, Eastern Europe, the Balkans, and Central Asia, and approximately 250 million people use it as their primary script. In Unicode, Cyrillic takes more care to handle correctly than a simple alphabet might suggest, partly because of its sheer breadth of languages and variants, and partly because of the security implications of its visual similarity to the Latin alphabet.

The Saints and the Script

In 862 CE, the Byzantine emperor Michael III sent two scholars — brothers Cyril (born Constantine) and Methodius — to the Great Moravian Empire (roughly modern Czech Republic and Slovakia) to introduce Christianity in the local Slavic language. To do this, they needed to translate the Bible and liturgical texts, which in turn required a writing system for Old Church Slavonic.

The script Cyril is said to have created was actually Glagolitic — a highly distinctive, curvilinear alphabet with no clear parallels to existing scripts. After Cyril's death in 869, his disciples in Bulgaria developed Cyrillic based primarily on the Greek alphabet, with additional letters created for sounds in Slavic languages that Greek lacked. This script, named in Cyril's honor, spread rapidly through the Orthodox Slavic world and ultimately supplanted Glagolitic almost everywhere.

The Cyrillic alphabet's Greek foundations are immediately apparent: А, В, Г, Д, Е, З, И, К, Л, М, Н, О, П, Р, С, Т, У, Ф, Х — many of these letters are visually identical or near-identical to their Greek counterparts. Letters like Б, Ж, Ц, Ч, Ш, Щ, Ъ, Ы, Ь, Э, Ю, Я were created for Slavic sounds absent from Greek.

Cyrillic Across Languages

As Cyrillic spread through the Russian Empire and later the Soviet Union, it was adapted for dozens of non-Slavic languages. The Soviet language policy of the 1930s–1940s systematically cyrillicized the writing systems of Central Asian and Caucasian peoples, often replacing existing Arabic, Latin, or indigenous scripts. The result was an explosion of Cyrillic variants:

Language	Unique Cyrillic Letters	Notes
Russian	— (standard Cyrillic)	33 letters
Ukrainian	Ї, І, Є, Ґ	Four letters absent from Russian Cyrillic
Belarusian	Ў	Short U for /w/ sound
Bulgarian	—	Ъ serves as a full vowel (/ɤ/); no Ы, Э, or Ё
Serbian	Ђ, Ј, Љ, Њ, Ћ, Џ	One letter per phoneme, following Vuk Karadžić's 19th-century reform
Macedonian	Ѓ, Ѕ, Ј, Љ, Њ, Ќ, Џ	Distinct from Serbian Cyrillic
Mongolian	Ө, Ү	Russian alphabet plus two vowel letters; Cyrillic in use since the 1940s
Kazakh	9 unique letters	Transitioning to Latin
Bashkir, Tatar	Multiple unique	Tatar includes Arabic-influenced sounds
Chuvash	Ӑ, Ӗ, Ҫ, Ӳ	Distinct Turkic sounds

The Unicode Cyrillic block (U+0400–U+04FF) and its extensions handle this diversity:

Block	Range	Count	Content
Cyrillic	U+0400–U+04FF	256	Modern + most extended Cyrillic
Cyrillic Supplement	U+0500–U+052F	48	Languages of the Russian Federation
Cyrillic Extended-A	U+2DE0–U+2DFF	32	Old Church Slavonic, historical
Cyrillic Extended-B	U+A640–U+A69F	96	Old Cyrillic, extended Old Slavic
Cyrillic Extended-C	U+1C80–U+1C8F	9	Historical glyph variants of small letters (9 assigned as of Unicode 9.0; the block spans 16 code points)
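As a rough illustration of how the ranges in this table can be used, the Python sketch below maps a character to the Cyrillic block that contains it. The names `CYRILLIC_BLOCKS`, `cyrillic_block`, and `block_span` are hypothetical helpers introduced for this example, and the ranges are simply copied from the table rather than read from the Unicode Character Database.

```python
# Inclusive code point ranges, copied from the block table above.
# Hypothetical helper data for illustration, not an official Unicode API.
CYRILLIC_BLOCKS = [
    ("Cyrillic", 0x0400, 0x04FF),
    ("Cyrillic Supplement", 0x0500, 0x052F),
    ("Cyrillic Extended-C", 0x1C80, 0x1C8F),
    ("Cyrillic Extended-A", 0x2DE0, 0x2DFF),
    ("Cyrillic Extended-B", 0xA640, 0xA69F),
]


def cyrillic_block(ch):
    """Return the name of the Cyrillic block containing ch, or None."""
    cp = ord(ch)
    for name, start, end in CYRILLIC_BLOCKS:
        if start <= cp <= end:
            return name
    return None


def block_span(name):
    """Return how many code points the named block spans (assigned or not)."""
    for block_name, start, end in CYRILLIC_BLOCKS:
        if block_name == name:
            return end - start + 1
    raise KeyError(name)


if __name__ == "__main__":
    # One sample character per case: main block, Supplement, Extended-B, Latin.
    for ch in "\u042f\u0500\ua640z":
        print(f"U+{ord(ch):04X}", cyrillic_block(ch))
```

Note that `block_span` counts code points in the range, which can exceed the number of characters actually assigned in a given Unicode version.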

Russian Orthographic Reform

Pre-revolutionary Russian orthography (before the 1918 reform) used four additional letters that were abolished by Soviet decree: Ѣ (yat, U+0462), Ѳ (fita, U+0472), І (decimal i, U+0406), and Ѵ (izhitsa, U+0474). These appear in historical texts, pre-revolutionary reprints, and Church Slavonic documents. Unicode encodes all four, in both capital and small forms, in the main Cyrillic block (U+0400–U+04FF), so digitized pre-1918 Russian texts can be represented faithfully.
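One practical use of these code points is flagging text that may follow the old orthography. The sketch below is a minimal heuristic in Python; `PRE_REFORM` and `pre_reform_letters` are hypothetical names introduced here. It assumes the input is known to be Russian, because І and і remain in everyday use in Ukrainian, Belarusian, and Kazakh, so their presence alone proves nothing about the era of a text.

```python
import unicodedata

# Capital and small forms of yat, fita, decimal i, and izhitsa.
PRE_REFORM = set("\u0462\u0463\u0472\u0473\u0406\u0456\u0474\u0475")


def pre_reform_letters(text):
    """Return the distinct pre-1918 letters found in text, sorted by code point."""
    return sorted(set(text) & PRE_REFORM)


if __name__ == "__main__":
    sample = "хл\u0463бъ"  # old spelling of the word for bread, with small yat
    for ch in pre_reform_letters(sample):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```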

The Confusables Problem

The visual overlap between Cyrillic and Latin letters is even more extensive than the Greek-Latin overlap, creating significant security concerns:

Cyrillic	Unicode	Latin	Unicode	Appearance
а	U+0430	a	U+0061	Identical (lowercase)
е	U+0435	e	U+0065	Identical
о	U+043E	o	U+006F	Identical
р	U+0440	p	U+0070	Identical
с	U+0441	c	U+0063	Identical
у	U+0443	y	U+0079	Identical
х	U+0445	x	U+0078	Identical
В	U+0412	B	U+0042	Identical (uppercase)
Е	U+0415	E	U+0045	Identical
М	U+041C	M	U+004D	Identical
Н	U+041D	H	U+0048	Identical
О	U+041E	O	U+004F	Identical
Р	U+0420	P	U+0050	Identical
С	U+0421	C	U+0043	Identical
Т	U+0422	T	U+0054	Identical
Х	U+0425	X	U+0058	Identical

This overlap is not coincidental — both scripts ultimately derive from the same Greek ancestor. But it creates real security vulnerabilities:

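IDN Homograph Attacks: A domain like рaypal.com (Cyrillic р instead of Latin p) is visually indistinguishable from paypal.com in many fonts. In 2017, a security researcher registered a lookalike of apple.com spelled entirely with Cyrillic characters (ACE form xn--80ak6aa92e.com) and demonstrated how convincing such an attack could be. Because the whole label used a single script, a mixed-script check alone would not have flagged it.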

Mitigations: ICANN's guidelines restrict mixing scripts in a single domain label. Modern browsers display punycode (the ACE encoding of IDN labels) when mixed-script domains are detected. Unicode's confusables data (confusables.txt, published with Unicode Technical Standard #39) documents character pairs and groups that are visually similar, providing a reference for security-sensitive applications.
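The checks described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any browser or registry implements them: `scripts_in_label`, `is_mixed_script`, `latin_skeleton`, and `to_ace` are hypothetical helpers, the script is guessed from the first word of each character's Unicode name, and the lookalike map is a hand-copied subset of the lowercase rows in the table above rather than the real confusables.txt. The `to_ace` helper uses Python's built-in `punycode` codec and skips the normalization steps that full IDNA processing performs.

```python
import unicodedata


def scripts_in_label(label):
    """Collect rough script names for the letters in one domain label.

    Heuristic: the first word of the Unicode character name
    (for example "CYRILLIC" or "LATIN"). Digits and hyphens are skipped.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label):
    """Return True if the label's letters come from more than one script."""
    return len(scripts_in_label(label)) > 1


# Cyrillic a, e, o, er, es, u, ha mapped to their Latin lookalikes.
# Hand-copied from this article's table; NOT Unicode's confusables.txt.
LOOKALIKES = str.maketrans(
    "\u0430\u0435\u043e\u0440\u0441\u0443\u0445", "aeopcyx"
)


def latin_skeleton(label):
    """Replace known Cyrillic lookalikes with the Latin letters they resemble."""
    return label.translate(LOOKALIKES)


def to_ace(label):
    """Return the xn-- form of a non-ASCII label (no IDNA normalization)."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    spoof = "\u0440aypal"  # Cyrillic er followed by Latin "aypal"
    print(sorted(scripts_in_label(spoof)), is_mixed_script(spoof))
    print(latin_skeleton(spoof), to_ace(spoof))
```

A label written entirely in Cyrillic passes `is_mixed_script`, which is why a skeleton comparison against known brand names is a useful second check.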

Cyrillic for Non-Slavic Languages

Some of the most phonologically interesting Cyrillic letters were created for non-Slavic Central Asian and Siberian languages:

Ғ (U+0492): Used in Kazakh, Uzbek, Tajik — a voiced uvular fricative
Қ (U+049A): Voiceless uvular stop, common in Turkic languages
Ң (U+04A2): Velar nasal, for -ng- sounds
Ү (U+04AE): Close front rounded vowel /y/ in Turkic languages such as Kazakh and Tatar
Ӑ (U+04D0): A with breve, a reduced vowel in Chuvash
Ӡ (U+04E0): Abkhazian dze, used in Abkhaz
Ꚑ (U+A690): Tsse, one of the Cyrillic Extended-B letters for older Abkhaz orthography
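These letters behave differently under Unicode normalization, which matters when comparing or searching text. As a small sketch using Python's standard `unicodedata` module (the `describe` helper is a hypothetical name for this example), the code below prints each letter's official name and its NFD form. Letters built with a breve decompose into a base letter plus a combining mark, while letters with a stroke or descender are atomic.

```python
import unicodedata

# Capital forms of five letters from the list above.
LETTERS = ["\u0492", "\u049a", "\u04a2", "\u04ae", "\u04d0"]


def describe(ch):
    """Return (code point label, Unicode name, NFD code point labels) for ch."""
    nfd = unicodedata.normalize("NFD", ch)
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        [f"U+{ord(c):04X}" for c in nfd],
    )


if __name__ == "__main__":
    for ch in LETTERS:
        label, name, parts = describe(ch)
        print(label, name, "->", " ".join(parts))
```

Normalizing both sides of a comparison to the same form (NFC or NFD) avoids mismatches between precomposed Ӑ and the sequence А plus combining breve.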

The Post-Soviet Script Shifts

Several Soviet successor states have shifted away from Cyrillic since the late 1980s, motivated by desires to distance their national identity from Russian cultural influence and to improve compatibility with the Latin-dominant internet:

Moldova: Switched from Cyrillic back to Latin (Romanian uses Latin) in 1989
Azerbaijan: Switched to Latin in 1991, fully by 2001
Uzbekistan: Gradually transitioning to Latin since 1993 (still ongoing)
Turkmenistan: Switched to Latin in 1993
Kazakhstan: Announced transition to Latin in 2017, ongoing implementation

Mongolia, though never a Soviet republic, still uses Soviet-introduced Cyrillic for standard Mongolian. The traditional Mongolian script (vertical, encoded in Unicode at U+1800–U+18AF) is recognized as co-official and is seeing a revival.

These geopolitical transitions create encoding challenges for digital archives, legacy databases, and historical documents. Unicode encodes all the necessary characters for both the Cyrillic and Latin forms of these languages, but accurate representation requires knowing which orthographic era a document comes from.
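A first step in sorting archived records by orthographic era is simply detecting which script a document is mostly written in. The Python sketch below is a rough heuristic under stated assumptions: `script_counts` and `dominant_script` are hypothetical helpers, and the script is guessed from the first word of each character's Unicode name. It cannot tell one Latin or Cyrillic orthography of a language from another, which still requires metadata or language-specific rules.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters per rough script name (first word of the Unicode name)."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts


def dominant_script(text):
    """Return the most frequent script name in text, or None if no letters."""
    counts = script_counts(text)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # The same country name in Kazakh Cyrillic and in a Latin spelling.
    for record in ["Қазақстан", "Qazaqstan"]:
        print(record, dominant_script(record))
```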