Greek and Coptic
Greek is one of the oldest alphabetic writing systems and gave Unicode many of its mathematical symbols. Its Unicode block, Greek and Coptic, serves modern Greek text and also holds several letters used in Coptic, which remains a liturgical language. This guide explores the Greek and Coptic Unicode block, the history of the script, and how Greek letters are used in mathematics and science.
Greek is one of the oldest writing systems in continuous use. For over 2,700 years, the Greek alphabet has served as the script for one of the world's foundational literary and philosophical traditions, and its influence extends far beyond the Greek language. Greek letters are the standard notation of mathematics, physics, and engineering worldwide. The alphabet is also the ancestor of the Latin script (through Etruscan) and of Cyrillic, writing systems used by billions. In Unicode, Greek shares a block with Coptic, the script of the final stage of the Egyptian language, creating a fascinating intersection of ancient and modern. This guide explores the Greek and Coptic Unicode block, the extended Greek blocks, and the many roles Greek characters play in modern computing.
A Brief History
The Greek alphabet emerged around 800 BCE, adapted from the Phoenician consonantal script. The Greeks' crucial innovation was the systematic introduction of vowel letters — they repurposed Phoenician consonants that had no equivalent in Greek to represent vowel sounds. This made Greek the first true alphabet (as opposed to an abjad or abugida), where both consonants and vowels have dedicated letters.
The word "alphabet" itself comes from the first two Greek letters: alpha (α) and beta (β).
Over the centuries, Greek script evolved through several stages:
| Period | Script Form | Key Feature |
|---|---|---|
| 800–400 BCE | Archaic Greek | Multiple local variants |
| 403 BCE | Ionic alphabet adopted | Athens standardizes on 24 letters |
| 4th c. BCE – 8th c. CE | Greek majuscule (uncial) | All uppercase, no spaces |
| 9th c. CE onwards | Greek minuscule | Lowercase book hand develops; accents and breathings written consistently |
| 1982 | Monotonic reform | Greece simplifies to single accent |
The Greek Alphabet
Modern Greek uses 24 letters:
| Upper | Lower | Name | Unicode (Upper) | Unicode (Lower) |
|---|---|---|---|---|
| Α | α | Alpha | U+0391 | U+03B1 |
| Β | β | Beta | U+0392 | U+03B2 |
| Γ | γ | Gamma | U+0393 | U+03B3 |
| Δ | δ | Delta | U+0394 | U+03B4 |
| Ε | ε | Epsilon | U+0395 | U+03B5 |
| Ζ | ζ | Zeta | U+0396 | U+03B6 |
| Η | η | Eta | U+0397 | U+03B7 |
| Θ | θ | Theta | U+0398 | U+03B8 |
| Ι | ι | Iota | U+0399 | U+03B9 |
| Κ | κ | Kappa | U+039A | U+03BA |
| Λ | λ | Lambda | U+039B | U+03BB |
| Μ | μ | Mu | U+039C | U+03BC |
| Ν | ν | Nu | U+039D | U+03BD |
| Ξ | ξ | Xi | U+039E | U+03BE |
| Ο | ο | Omicron | U+039F | U+03BF |
| Π | π | Pi | U+03A0 | U+03C0 |
| Ρ | ρ | Rho | U+03A1 | U+03C1 |
| Σ | σ/ς | Sigma | U+03A3 | U+03C3/U+03C2 |
| Τ | τ | Tau | U+03A4 | U+03C4 |
| Υ | υ | Upsilon | U+03A5 | U+03C5 |
| Φ | φ | Phi | U+03A6 | U+03C6 |
| Χ | χ | Chi | U+03A7 | U+03C7 |
| Ψ | ψ | Psi | U+03A8 | U+03C8 |
| Ω | ω | Omega | U+03A9 | U+03C9 |
Final Sigma
Greek lowercase sigma has two forms: medial sigma (σ, U+03C3), used within words, and final sigma (ς, U+03C2), used at the end of words. Unicode encodes these as separate characters, but both uppercase to Σ (U+03A3). Lowercasing Σ therefore depends on context: it becomes ς at the end of a word and σ elsewhere. Case conversion must account for this:
```python
# str.lower() applies the Final_Sigma rule; casefold() deliberately does not
word = "\u03BB\u03CC\u03B3\u03BF\u03C2"  # λόγος
caps = "\u039B\u039F\u0393\u039F\u03A3"  # ΛΟΓΟΣ
print(word.upper())     # ΛΌΓΟΣ: ς becomes Σ (the tonos is kept)
print(caps.lower())     # λογος: the word-final Σ becomes ς, not σ
print(word.casefold())  # λόγοσ: casefold maps ς to σ for caseless comparison
```
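`casefold()` maps ς to σ but leaves accents alone, so λόγος and ΛΟΓΟΣ still compare unequal after case folding. Below is a minimal sketch of a case- and accent-insensitive match key. `greek_match_key` is a hypothetical helper name. Dropping every combining mark is an assumption that suits search, but it also erases distinctions that accents can carry.

```python
import unicodedata


def greek_match_key(s: str) -> str:
    """Build a case- and accent-insensitive key for comparing Greek words (sketch)."""
    # casefold() maps final sigma (ς) and capital sigma (Σ) to medial σ
    folded = s.casefold()
    # NFD splits accented letters such as ό into base letter + combining mark
    decomposed = unicodedata.normalize("NFD", folded)
    # Drop every combining mark (tonos, dialytika, breathings, ...)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


word = "\u03BB\u03CC\u03B3\u03BF\u03C2"  # λόγος
caps = "\u039B\u039F\u0393\u039F\u03A3"  # ΛΟΓΟΣ
print(word.casefold() == caps.casefold())              # False: the tonos still differs
print(greek_match_key(word) == greek_match_key(caps))  # True
```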
Unicode Blocks for Greek
| Block | Range | Characters | Purpose |
|---|---|---|---|
| Greek and Coptic | U+0370–U+03FF | 135 | Modern Greek letters and symbols, plus seven Demotic-derived Coptic letters (upper and lower case) |
| Greek Extended | U+1F00–U+1FFF | 233 | Polytonic Greek (ancient accents) |
| Coptic | U+2C80–U+2CFF | 123 | Dedicated Coptic characters |
| Coptic Epact Numbers | U+102E0–U+102FF | 28 | Coptic calendar numbers |
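Python's standard library reports character names but not block membership, so a block lookup has to carry its own table. The sketch below hard-codes the four ranges from the table above. `GREEK_BLOCKS` and `greek_block` are hypothetical names; production code would read Blocks.txt from the Unicode Character Database instead.

```python
from typing import Optional

# (block name, first code point, last code point), copied from the table above
GREEK_BLOCKS = [
    ("Greek and Coptic", 0x0370, 0x03FF),
    ("Greek Extended", 0x1F00, 0x1FFF),
    ("Coptic", 0x2C80, 0x2CFF),
    ("Coptic Epact Numbers", 0x102E0, 0x102FF),
]


def greek_block(ch: str) -> Optional[str]:
    """Return the Greek/Coptic block containing ch, or None if it is elsewhere."""
    cp = ord(ch)
    for name, first, last in GREEK_BLOCKS:
        if first <= cp <= last:
            return name
    return None


for ch in ("\u03B1", "\u1F00", "\u2C80", "A"):
    print(f"U+{ord(ch):04X} {greek_block(ch)}")
```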
Greek and Coptic Block (U+0370–U+03FF)
This primary block contains:
- 24 modern Greek uppercase and lowercase letters
- Accented letters for monotonic Greek (ά, έ, ή, ί, ό, ύ, ώ)
- Diacritics: tonos (accent), dialytika (dieresis)
- The final sigma (ς)
- Archaic letters and numeral signs: digamma (Ϝ), koppa (Ϟ), sampi (Ϡ), and stigma (Ϛ), originally a sigma-tau ligature
- Coptic letters borrowed from Demotic that have no Greek counterpart (U+03E2–U+03EF, e.g., U+03E2 Ϣ SHEI)
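The standard `unicodedata` module can confirm this mix of Greek and Coptic. The sketch below walks U+0370–U+03FF and groups characters by their Unicode names. It assumes a Python build with Unicode 7.0 or later data, which covers any currently supported CPython.

```python
import unicodedata

# Walk the Greek and Coptic block, U+0370..U+03FF
assigned = []
for cp in range(0x0370, 0x0400):
    try:
        assigned.append((cp, unicodedata.name(chr(cp))))
    except ValueError:  # unassigned code points have no name
        pass

# Characters whose Unicode name marks them as Coptic rather than Greek
coptic = [(cp, name) for cp, name in assigned if name.startswith("COPTIC")]

print(len(assigned), len(coptic))  # 135 14
for cp, name in coptic[:2]:
    print(f"U+{cp:04X} {chr(cp)} {name}")
```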
Greek Extended Block (U+1F00–U+1FFF)
This block supports polytonic Greek — the traditional accent system used in Ancient Greek and in formal Greek writing before the 1982 reform. Polytonic Greek uses three accent marks, two breathing marks, and the iota subscript:
| Diacritic | Name | Example | Purpose |
|---|---|---|---|
| ´ | Oxia (acute) | ά | Rising pitch |
| ` | Varia (grave) | ὰ | Lowered pitch (replaces an acute on a final syllable) |
| ˜ | Perispomeni (circumflex) | ᾶ | Rising-falling pitch |
| ʽ | Dasia (rough breathing) | ἁ | Initial /h/ sound |
| ʼ | Psili (smooth breathing) | ἀ | No initial /h/ |
| ͅ | Ypogegrammeni (iota subscript) | ᾳ | Historical diphthong |
The Greek Extended block provides precomposed characters for the combinations of these diacritics that occur in polytonic text. Most are vowels, but breathing marks also appear on rho (as in ῥ):
```text
U+1F00 ἀ GREEK SMALL LETTER ALPHA WITH PSILI
U+1F01 ἁ GREEK SMALL LETTER ALPHA WITH DASIA
U+1F04 ἄ GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA
U+1F05 ἅ GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA
U+1F80 ᾀ GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
U+1F86 ᾆ GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
```
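`unicodedata.normalize` shows how a precomposed polytonic letter breaks down. The sketch below checks that ᾆ (U+1F86) is canonically equivalent to alpha plus three combining marks. It also shows that NFC rewrites oxia vowels to their tonos counterparts in the Greek and Coptic block.

```python
import unicodedata

# U+1F86 ᾆ: alpha with psili, perispomeni and ypogegrammeni
precomposed = "\u1F86"
decomposed = unicodedata.normalize("NFD", precomposed)
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+03B1 GREEK SMALL LETTER ALPHA
# U+0313 COMBINING COMMA ABOVE
# U+0342 COMBINING GREEK PERISPOMENI
# U+0345 COMBINING GREEK YPOGEGRAMMENI

# NFC recomposes the sequence into the single precomposed character
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# Oxia forms are canonically equivalent to tonos forms, so NFC replaces
# U+1F71 (alpha with oxia) with U+03AC (alpha with tonos)
oxia_alpha = "\u1F71"
nfc = unicodedata.normalize("NFC", oxia_alpha)
print(f"U+{ord(nfc):04X}")  # U+03AC
```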
Greek and Coptic: Why One Block?
When Unicode was first designed, Coptic characters were "unified" with Greek — Coptic letters that looked similar to Greek letters were given the same code points. This was a practical decision but created problems:
- Coptic and Greek are different scripts used by different communities
- Font selection broke: with shared code points, Coptic text could not be shown in Coptic letterforms while neighboring Greek text kept Greek ones
- Sorting and collation rules differ between the two scripts
Unicode 4.1 (2005) resolved this by adding a dedicated Coptic block (U+2C80–U+2CFF) with separate code points for the Coptic letters that had been unified with Greek. The letters Coptic borrowed from Demotic (U+03E2–U+03EF) were not duplicated there. They remain in the Greek and Coptic block and are still used in Coptic text alongside the new block.
What is Coptic?
Coptic is the latest stage of the ancient Egyptian language, written with a script derived from the Greek alphabet plus six or seven additional letters from Demotic Egyptian. Coptic ceased to be a spoken vernacular language around the 17th century but remains the liturgical language of the Coptic Orthodox Church, used by approximately 15–20 million Coptic Christians in Egypt.
```text
# Selected letters from the dedicated Coptic block (U+2C80–U+2CFF)
U+2C80 Ⲁ COPTIC CAPITAL LETTER ALFA
U+2C81 ⲁ COPTIC SMALL LETTER ALFA
U+2CA0 Ⲡ COPTIC CAPITAL LETTER PI
U+2CA2 Ⲣ COPTIC CAPITAL LETTER RO
U+2CEB Ⳬ COPTIC CAPITAL LETTER CRYPTOGRAMMIC SHEI
```
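Since the disunification described above, Unicode treats Coptic ALFA and Greek alpha as unrelated characters. Neither normalization nor case mapping connects them, so search and matching code has to handle them separately. A quick check with `unicodedata`:

```python
import unicodedata

greek_alpha = "\u03B1"  # α GREEK SMALL LETTER ALPHA
coptic_alfa = "\u2C81"  # ⲁ COPTIC SMALL LETTER ALFA

print(unicodedata.name(coptic_alfa))    # COPTIC SMALL LETTER ALFA
print("\u2C80".lower() == coptic_alfa)  # True: Coptic has its own case pairs

# No normalization form folds the Coptic letter into the Greek one
forms = ("NFC", "NFD", "NFKC", "NFKD")
print(any(unicodedata.normalize(f, coptic_alfa) == greek_alpha for f in forms))  # False
```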
Greek in Mathematics and Science
Greek letters are the lingua franca of mathematical and scientific notation. Unicode provides these characters in multiple contexts:
From the Greek Block (Plain Text)
These are the standard Greek letters used in running text:
| Symbol | Code Point | Common Use |
|---|---|---|
| α | U+03B1 | Angles, alpha particles, significance level |
| β | U+03B2 | Beta coefficients, beta particles |
| γ | U+03B3 | Gamma rays, Euler–Mascheroni constant |
| δ | U+03B4 | Small changes (calculus), Kronecker delta |
| ε | U+03B5 | Arbitrarily small quantities (analysis) |
| θ | U+03B8 | Angles (trigonometry) |
| λ | U+03BB | Wavelength, lambda calculus, eigenvalues |
| μ | U+03BC | Micro- prefix, mean (statistics) |
| π | U+03C0 | Pi (3.14159...) |
| σ | U+03C3 | Standard deviation, summation (upper: Σ) |
| φ | U+03C6 | Golden ratio, phase angle, Euler's totient |
| ω | U+03C9 | Angular frequency |
| Δ | U+0394 | Change/difference |
| Σ | U+03A3 | Summation |
| Π | U+03A0 | Product |
| Ω | U+03A9 | Ohm (also U+2126 OHM SIGN for compatibility) |
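The same letter can arrive under a legacy code point, and normalization handles such cases differently. The sketch below checks OHM SIGN from the table above against MICRO SIGN (U+00B5). MICRO SIGN is a Latin-1 character often typed for the micro- prefix; it is added here for illustration and is not listed in the table.

```python
import unicodedata

ohm_sign = "\u2126"    # Ω OHM SIGN
micro_sign = "\u00B5"  # µ MICRO SIGN (Latin-1 legacy)

# OHM SIGN is canonically equivalent to GREEK CAPITAL LETTER OMEGA,
# so even NFC replaces it
print(unicodedata.normalize("NFC", ohm_sign) == "\u03A9")     # True

# MICRO SIGN has only a compatibility mapping: NFC keeps it, NFKC folds it
print(unicodedata.normalize("NFC", micro_sign) == "\u03BC")   # False
print(unicodedata.normalize("NFKC", micro_sign) == "\u03BC")  # True
```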
Mathematical Alphanumeric Symbols
For mathematical typography that requires distinct styles, Unicode provides styled variants in the Mathematical Alphanumeric Symbols block (U+1D400–U+1D7FF):
| Style | Example | Range (lowercase α–ω) |
|---|---|---|
| Bold | 𝛂 𝛃 𝛄 | U+1D6C2–U+1D6DA |
| Italic | 𝛼 𝛽 𝛾 | U+1D6FC–U+1D714 |
| Bold Italic | 𝜶 𝜷 𝜸 | U+1D736–U+1D74E |
These are used in formal mathematical typesetting to distinguish between different uses of the same letter.
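Styled letters are separate code points, so they never match plain Greek in a direct string comparison. NFKC normalization, however, folds them back to the ordinary letters. That is one reason to reserve them for genuine mathematical styling, as this minimal check shows:

```python
import unicodedata

bold_alpha = "\U0001D6C2"    # 𝛂 MATHEMATICAL BOLD SMALL ALPHA
italic_alpha = "\U0001D6FC"  # 𝛼 MATHEMATICAL ITALIC SMALL ALPHA

print(unicodedata.name(bold_alpha))  # MATHEMATICAL BOLD SMALL ALPHA

# The styled letters differ from U+03B1 and from each other...
print(bold_alpha == "\u03B1", bold_alpha == italic_alpha)  # False False

# ...but NFKC folds both back to plain α
print({unicodedata.normalize("NFKC", c) for c in (bold_alpha, italic_alpha)})  # {'α'}
```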
Confusable Characters
Greek letters are a major source of homoglyph attacks because many look identical to Latin letters:
| Greek | Latin | Similarity |
|---|---|---|
| Α (U+0391) | A (U+0041) | Visually identical |
| Β (U+0392) | B (U+0042) | Visually identical |
| Ε (U+0395) | E (U+0045) | Visually identical |
| Η (U+0397) | H (U+0048) | Visually identical |
| Ι (U+0399) | I (U+0049) | Visually identical |
| Κ (U+039A) | K (U+004B) | Visually identical |
| Μ (U+039C) | M (U+004D) | Visually identical |
| Ν (U+039D) | N (U+004E) | Visually identical |
| Ο (U+039F) | O (U+004F) | Visually identical |
| Ρ (U+03A1) | P (U+0050) | Visually identical |
| Τ (U+03A4) | T (U+0054) | Visually identical |
| Χ (U+03A7) | X (U+0058) | Visually identical |
| ο (U+03BF) | o (U+006F) | Visually identical |
| ν (U+03BD) | v (U+0076) | Very similar |
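This is why the Unicode Consortium publishes the confusables.txt data file, part of Unicode Technical Standard #39 (Unicode Security Mechanisms). It is also why domain registries and browsers restrict or flag Internationalized Domain Name (IDN) labels that mix Greek and Latin characters.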
```python
# Detecting mixed scripts (potential homoglyph attack)
def get_script(char: str) -> str:
    # Simplified range check; real code should use the Unicode Script
    # property (e.g., \p{Script=Greek} in the third-party `regex` module)
    cp = ord(char)
    if 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF:
        return "Greek"
    elif 0x0041 <= cp <= 0x024F:
        return "Latin"
    return "Other"


text = "\u0391pple"  # Greek capital Alpha + Latin "pple"
scripts = {get_script(c) for c in text if c.isalpha()}
if len(scripts) > 1:
    # sorted() gives a stable order; printing a set directly does not
    print(f"Mixed scripts detected: {sorted(scripts)}")
# Mixed scripts detected: ['Greek', 'Latin']
```
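The "skeleton" idea from UTS #39 can be sketched with a small translation table. The mapping below is a hypothetical subset that holds only the Greek–Latin pairs from the table above. `GREEK_TO_LATIN`, `skeleton`, and `is_confusable` are illustrative names. Real code should load the full confusables.txt data and apply the normalization steps the standard specifies.

```python
# Hypothetical subset: only the Greek look-alikes listed in the table above.
# A real check would load confusables.txt from UTS #39 instead.
GREEK_TO_LATIN = str.maketrans({
    "\u0391": "A", "\u0392": "B", "\u0395": "E", "\u0397": "H",
    "\u0399": "I", "\u039A": "K", "\u039C": "M", "\u039D": "N",
    "\u039F": "O", "\u03A1": "P", "\u03A4": "T", "\u03A7": "X",
    "\u03BF": "o", "\u03BD": "v",
})


def skeleton(s: str) -> str:
    """Map each listed Greek look-alike to its Latin counterpart."""
    return s.translate(GREEK_TO_LATIN)


def is_confusable(a: str, b: str) -> bool:
    """True if a and b are different strings that look alike under this subset."""
    return a != b and skeleton(a) == skeleton(b)


print(is_confusable("\u0391pple", "Apple"))        # True
print(is_confusable("Apple", "Apple"))             # False: identical strings
print(is_confusable("\u039D\u0395\u039F", "NEO"))  # True
```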
Working with Greek Text in Code
Python
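```python
import unicodedata

# Modern Greek (monotonic)
text = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1"  # Καλημέρα (Good morning)
print(text.upper())  # ΚΑΛΗΜΈΡΑ: str.upper() keeps the tonos
print(text.lower())  # καλημέρα

# Print the code point and Unicode name of each character
for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Ancient Greek (polytonic)
ancient = "\u1F08\u03BD\u03B8\u03C1\u03CE\u03C0\u03BF\u03C5"  # Ἀνθρώπου
# NFD splits Ἀ and ώ into base letter + combining mark
print(len(ancient), len(unicodedata.normalize("NFD", ancient)))  # 8 10
```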
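Python's `str.upper()` keeps the tonos, so Καλημέρα becomes ΚΑΛΗΜΈΡΑ. Greek typographic convention, however, omits accents in all-caps text. Below is a minimal sketch of tonos-free uppercasing for monotonic text. `greek_upper` is a hypothetical helper, not a standard library function.

```python
import unicodedata

TONOS = "\u0301"  # NFD splits tonos letters into base letter + U+0301


def greek_upper(s: str) -> str:
    """Uppercase monotonic Greek, dropping the tonos as all-caps text does.

    Sketch only: it keeps an existing dialytika (U+0308) but does not insert
    one where Greek orthography adds it, as in Μάιος -> ΜΑΪΟΣ.
    """
    stripped = "".join(ch for ch in unicodedata.normalize("NFD", s) if ch != TONOS)
    return unicodedata.normalize("NFC", stripped.upper())


print(greek_upper("\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1"))  # ΚΑΛΗΜΕΡΑ
print(greek_upper("\u0390") == "\u03AA")  # True: ΐ -> Ϊ, dialytika kept
```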
JavaScript
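```javascript
// Greek regex matching (the u flag enables \p{...} property escapes)
const greekPattern = /\p{Script=Greek}/u;
const text = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1"; // Καλημέρα
console.log(greekPattern.test(text)); // true

// Convert polytonic to monotonic (approximate: ignores rules such as
// unaccented monosyllables)
const polytonic = "\u1F08\u03BD\u03B8\u03C1\u03CE\u03C0\u03BF\u03C5"; // Ἀνθρώπου
const monotonic = polytonic
  .normalize("NFD")
  .replace(/[\u0313\u0314\u0345]/g, "") // drop breathings and iota subscript
  .replace(/[\u0300\u0342]/g, "\u0301") // grave and circumflex become the tonos
  .normalize("NFC");
console.log(monotonic); // Ανθρώπου
```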
Summary
Greek is far more than a modern language script — it is a cornerstone of global scientific and mathematical notation, the ancestor of Latin and Cyrillic, and a writing system with nearly three millennia of continuous history. Key takeaways for developers:
- Greek and Coptic share a Unicode block but are separate scripts: write Coptic with the dedicated Coptic block (U+2C80–U+2CFF) plus the Demotic-derived letters at U+03E2–U+03EF
- Final sigma (ς, U+03C2) must be handled correctly in case conversion and text processing
- Polytonic Greek uses the Greek Extended block (U+1F00–U+1FFF) with complex combinations of breathing marks and accents
- Greek–Latin confusables are a security concern for domain names, usernames, and any mixed-script context
- Mathematical Greek uses standard Greek code points in plain text; use the Mathematical Alphanumeric Symbols block only for styled variants
- Normalize polytonic text carefully: NFC and NFD produce different code point sequences for the same text, so pick one form (usually NFC) and apply it consistently before comparing or storing