
Greek and Coptic

Greek is one of the oldest alphabetic writing systems and gave Unicode many of its mathematical symbols, with the Greek and Coptic block serving both modern Greek text and ancient Coptic liturgical use. This guide explores the Greek and Coptic Unicode block, the history of the script, and how Greek letters are used in mathematics and science.


Greek is one of the oldest writing systems in continuous use. For over 2,700 years, the Greek alphabet has served as the script for one of the world's foundational literary and philosophical traditions, and its influence extends far beyond the Greek language. Greek letters are the standard notation of mathematics, physics, and engineering worldwide. The alphabet is also the ancestor of the Latin and Cyrillic scripts, which are used by billions of people. In Unicode, Greek shares a block with Coptic, the script of the final stage of the Egyptian language, so the block brings together ancient and modern writing. This guide explores the Greek and Coptic Unicode block, the extended Greek blocks, and the many roles Greek characters play in modern computing.

A Brief History

The Greek alphabet emerged around 800 BCE, adapted from the Phoenician consonantal script. The Greeks' crucial innovation was the systematic introduction of vowel letters — they repurposed Phoenician consonants that had no equivalent in Greek to represent vowel sounds. This made Greek the first true alphabet (as opposed to an abjad or abugida), where both consonants and vowels have dedicated letters.

The word "alphabet" itself comes from the first two Greek letters: alpha (α) and beta (β).

Over the centuries, Greek script evolved through several stages:

Period | Script Form | Key Feature
800–400 BCE | Archaic Greek | Multiple local variants
403 BCE | Ionic alphabet adopted | Athens standardizes on 24 letters
4th c. BCE – 8th c. CE | Greek majuscule (uncial) | All uppercase, no spaces
9th c. CE onwards | Greek minuscule | Lowercase develops, accents added
1982 | Monotonic reform | Greece simplifies to single accent

The Greek Alphabet

Modern Greek uses 24 letters:

Upper Lower Name Unicode (Upper) Unicode (Lower)
Α α Alpha U+0391 U+03B1
Β β Beta U+0392 U+03B2
Γ γ Gamma U+0393 U+03B3
Δ δ Delta U+0394 U+03B4
Ε ε Epsilon U+0395 U+03B5
Ζ ζ Zeta U+0396 U+03B6
Η η Eta U+0397 U+03B7
Θ θ Theta U+0398 U+03B8
Ι ι Iota U+0399 U+03B9
Κ κ Kappa U+039A U+03BA
Λ λ Lambda U+039B U+03BB
Μ μ Mu U+039C U+03BC
Ν ν Nu U+039D U+03BD
Ξ ξ Xi U+039E U+03BE
Ο ο Omicron U+039F U+03BF
Π π Pi U+03A0 U+03C0
Ρ ρ Rho U+03A1 U+03C1
Σ σ/ς Sigma U+03A3 U+03C3/U+03C2
Τ τ Tau U+03A4 U+03C4
Υ υ Upsilon U+03A5 U+03C5
Φ φ Phi U+03A6 U+03C6
Χ χ Chi U+03A7 U+03C7
Ψ ψ Psi U+03A8 U+03C8
Ω ω Omega U+03A9 U+03C9
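
The table follows a regular layout: each lowercase letter sits 0x20 above its uppercase partner, and U+03A2 is left unassigned so that final sigma (U+03C2) has no uppercase twin. The following is a minimal sketch that regenerates the table from those two facts; the helper name `greek_alphabet` is only illustrative.

```python
import unicodedata

def greek_alphabet():
    """Yield (upper, lower, name) for the 24 modern Greek letters."""
    for cp in range(0x0391, 0x03A9 + 1):
        if cp == 0x03A2:  # unassigned slot opposite final sigma (U+03C2)
            continue
        upper = chr(cp)
        lower = chr(cp + 0x20)  # lowercase is offset by 0x20 from uppercase
        # Names look like "GREEK CAPITAL LETTER ALPHA"; keep the last word.
        # Note that Unicode spells lambda as "LAMDA" in character names.
        name = unicodedata.name(upper).rsplit(" ", 1)[-1].title()
        yield upper, lower, name

letters = list(greek_alphabet())
for upper, lower, name in letters:
    print(f"{upper} {lower} {name} U+{ord(upper):04X} U+{ord(lower):04X}")
```

Skipping U+03A2 matters in practice: calling `unicodedata.name()` on an unassigned code point raises `ValueError`.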

Final Sigma

Greek lowercase sigma has two forms: medial sigma (σ, U+03C3) used within words, and final sigma (ς, U+03C2) used at the end of words. Unicode encodes these as separate characters. Case conversion must account for this:

# Python handles final sigma in case conversion and case folding
word = "\u03BB\u03CC\u03B3\u03BF\u03C2"  # λόγος
print(word.upper())   # ΛΌΓΟΣ: the final sigma becomes Σ (the tonos is kept)
print(word.lower())   # λόγος: final sigma preserved
print(word.casefold())  # λόγοσ: casefold uses medial sigma (for comparison)
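
Going the other way is where context matters: a capital Σ carries no information about which lowercase form it should become. As a small sketch, assuming CPython 3.3 or later (where `str.lower()` applies the Unicode Final_Sigma rule), the example below lowercases an all-caps word and compares strings with `casefold()`; the helper `same_word` is hypothetical.

```python
# A capital sigma at the end of a word lowercases to ς; elsewhere it
# lowercases to σ.
upper = "\u039F\u0394\u03A5\u03A3\u03A3\u0395\u03A5\u03A3"  # ΟΔΥΣΣΕΥΣ
lowered = upper.lower()
print(lowered)  # two medial sigmas (σσ) and one final sigma (ς)

def same_word(a: str, b: str) -> bool:
    """Caseless comparison; casefold() maps both σ and ς to σ."""
    return a.casefold() == b.casefold()

print(same_word(upper, lowered))  # True
```

Comparing with `casefold()` rather than `lower()` avoids treating σ and ς as different letters when matching user input.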

Unicode Blocks for Greek

Block | Range | Assigned characters | Purpose
Greek and Coptic | U+0370–U+03FF | 135 | Modern Greek letters, plus Coptic letters kept from the original encoding
Greek Extended | U+1F00–U+1FFF | 233 | Polytonic Greek (ancient accents)
Coptic | U+2C80–U+2CFF | 123 | Dedicated Coptic characters
Coptic Epact Numbers | U+102E0–U+102FF | 28 | Numerals used in Coptic manuscripts
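
A block lookup is just a range test on the code point. The sketch below hard-codes the four ranges from the table above; the function name `block_of` is an assumption for illustration, and a real application would more likely load the full block list from the Unicode Character Database.

```python
from typing import Optional

# Block ranges copied from the table above.
GREEK_RELATED_BLOCKS = [
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x1F00, 0x1FFF, "Greek Extended"),
    (0x2C80, 0x2CFF, "Coptic"),
    (0x102E0, 0x102FF, "Coptic Epact Numbers"),
]

def block_of(char: str) -> Optional[str]:
    """Return the Greek-related block containing char, or None."""
    cp = ord(char)
    for start, end, name in GREEK_RELATED_BLOCKS:
        if start <= cp <= end:
            return name
    return None

for ch in ["\u03B1", "\u1F00", "\u2C80", "A"]:
    print(f"U+{ord(ch):04X} -> {block_of(ch)}")
```

Keep in mind that a block is not the same as a script: the Greek and Coptic block contains characters from both scripts, so a block test alone cannot tell Greek text from Coptic text.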

Greek and Coptic Block (U+0370–U+03FF)

This primary block contains:

  • 24 modern Greek uppercase and lowercase letters
  • Accented letters for monotonic Greek (ά, έ, ή, ί, ό, ύ, ώ)
  • Diacritics: tonos (accent), dialytika (dieresis)
  • The final sigma (ς)
  • Archaic letters: digamma (Ϝ), koppa (Ϟ), sampi (Ϡ), stigma (Ϛ)
  • Coptic letters derived from Demotic (U+03E2–U+03EF, e.g., U+03E2 Ϣ), which have no Greek counterparts

Greek Extended Block (U+1F00–U+1FFF)

This block supports polytonic Greek — the traditional accent system used in Ancient Greek and in formal Greek writing before the 1982 reform. Polytonic Greek uses three accent marks, two breathing marks, and the iota subscript:

Diacritic | Name | Example | Purpose
´ | Oxia (acute) | ά | Rising pitch
` | Varia (grave) | ὰ | Falling pitch
˜ | Perispomeni (circumflex) | ᾶ | Rising-falling pitch
ʽ | Dasia (rough breathing) | ἁ | Initial /h/ sound
ʼ | Psili (smooth breathing) | ἀ | No initial /h/
ͅ | Ypogegrammeni (iota subscript) | ᾳ | Historical diphthong

The Greek Extended block provides precomposed characters for all combinations of these diacritics on vowels:

U+1F00  ἀ  GREEK SMALL LETTER ALPHA WITH PSILI
U+1F01  ἁ  GREEK SMALL LETTER ALPHA WITH DASIA
U+1F04  ἄ  GREEK SMALL LETTER ALPHA WITH PSILI AND OXIA
U+1F05  ἅ  GREEK SMALL LETTER ALPHA WITH DASIA AND OXIA
U+1F80  ᾀ  GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI
U+1F86  ᾆ  GREEK SMALL LETTER ALPHA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI
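
Each precomposed character is canonically equivalent to a base letter followed by combining marks, which is easy to confirm with Python's standard `unicodedata` module. The sketch below decomposes U+1F86 with NFD, recomposes it with NFC, and then shows one behaviour that often surprises: letters with oxia in Greek Extended normalize to the tonos letters in the main block.

```python
import unicodedata

precomposed = "\u1F86"  # alpha with psili, perispomeni and ypogegrammeni
decomposed = unicodedata.normalize("NFD", precomposed)

for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+03B1 GREEK SMALL LETTER ALPHA
# U+0313 COMBINING COMMA ABOVE
# U+0342 COMBINING GREEK PERISPOMENI
# U+0345 COMBINING GREEK YPOGEGRAMMENI

recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == precomposed)  # True

# U+1F71 (alpha with oxia) is canonically equivalent to U+03AC (alpha with
# tonos), so NFC replaces the Greek Extended character with the main-block one.
print(unicodedata.normalize("NFC", "\u1F71") == "\u03AC")  # True
```

In the decomposed form, U+0313 is the psili; its Unicode name describes the shape (a comma above) rather than the Greek function.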

Greek and Coptic: Why One Block?

When Unicode was first designed, Coptic characters were "unified" with Greek — Coptic letters that looked similar to Greek letters were given the same code points. This was a practical decision but created problems:

  1. Coptic and Greek are different scripts used by different communities
  2. Font selection broke — a Coptic text would render with Greek fonts
  3. Sorting and collation rules differ between the two scripts

Unicode 4.1 (2005) resolved this by adding a dedicated Coptic block (U+2C80–U+2CFF) with separate code points for the Coptic letters that had been unified with Greek. The Demotic-derived Coptic letters at U+03E2–U+03EF were not duplicated: they remain in the Greek and Coptic block and are used together with the Coptic block when encoding Coptic text.

What is Coptic?

Coptic is the latest stage of the ancient Egyptian language, written with a script derived from the Greek alphabet plus six or seven additional letters from Demotic Egyptian. Coptic ceased to be a spoken vernacular language around the 17th century but remains the liturgical language of the Coptic Orthodox Church, used by approximately 15–20 million Coptic Christians in Egypt.

# Examples from the dedicated Coptic block
U+2C80  Ⲁ  COPTIC CAPITAL LETTER ALFA
U+2C81  ⲁ  COPTIC SMALL LETTER ALFA
U+2CA0  Ⲡ  COPTIC CAPITAL LETTER PI
U+2CA2  Ⲣ  COPTIC CAPITAL LETTER RO
U+2CB6  Ⲷ  COPTIC CAPITAL LETTER CRYPTOGRAMMIC EIE
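
Python's standard `unicodedata` module does not expose the Unicode Script property, so telling Coptic from Greek needs a stand-in. The sketch below uses the character name prefix as a rough heuristic; `is_coptic` is a hypothetical helper, not a substitute for a real Script lookup.

```python
import unicodedata

def is_coptic(char: str) -> bool:
    """Heuristic: treat a character as Coptic if its Unicode name says so."""
    return unicodedata.name(char, "").startswith("COPTIC")

# Greek sigma, a Coptic letter in the Greek and Coptic block, and two
# letters from the dedicated Coptic block.
samples = ["\u03A3", "\u03E2", "\u2C80", "\u2CA4"]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} coptic={is_coptic(ch)}")
```

The second sample, U+03E2, shows why a block range check is not enough: it lives in the Greek and Coptic block but is named COPTIC CAPITAL LETTER SHEI.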

Greek in Mathematics and Science

Greek letters are the lingua franca of mathematical and scientific notation. Unicode provides these characters in multiple contexts:

From the Greek Block (Plain Text)

These are the standard Greek letters used in running text:

Symbol Code Point Common Use
α U+03B1 Angles, alpha particles, significance level
β U+03B2 Beta coefficients, beta particles
γ U+03B3 Gamma rays, Euler–Mascheroni constant
δ U+03B4 Small changes (calculus), Kronecker delta
ε U+03B5 Arbitrarily small quantities (analysis)
θ U+03B8 Angles (trigonometry)
λ U+03BB Wavelength, lambda calculus, eigenvalues
μ U+03BC Micro- prefix, mean (statistics)
π U+03C0 Pi (3.14159...)
σ U+03C3 Standard deviation, summation (upper: Σ)
φ U+03C6 Golden ratio, phase angle, Euler's totient
ω U+03C9 Angular frequency
Δ U+0394 Change/difference
Σ U+03A3 Summation
Π U+03A0 Product
Ω U+03A9 Ohm (also U+2126 OHM SIGN for compatibility)
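
Compatibility characters such as OHM SIGN can make visually identical strings compare unequal. The sketch below shows how normalization treats two of them; note that MICRO SIGN (U+00B5, from the Latin-1 Supplement block) is not listed in the table above and is included here as an extra illustration.

```python
import unicodedata

OHM_SIGN = "\u2126"    # OHM SIGN
MICRO_SIGN = "\u00B5"  # MICRO SIGN (not part of the Greek block)

# OHM SIGN is canonically equivalent to GREEK CAPITAL LETTER OMEGA,
# so NFC is enough to unify them.
print(unicodedata.normalize("NFC", OHM_SIGN) == "\u03A9")     # True

# MICRO SIGN is only compatibility-equivalent to GREEK SMALL LETTER MU,
# so NFC leaves it alone and NFKC is needed.
print(unicodedata.normalize("NFC", MICRO_SIGN) == "\u03BC")   # False
print(unicodedata.normalize("NFKC", MICRO_SIGN) == "\u03BC")  # True
```

Whether NFKC is appropriate depends on the application, since it also folds other compatibility characters (such as styled mathematical letters) into their plain forms.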

Mathematical Alphanumeric Symbols

For mathematical typography that requires distinct styles, Unicode provides styled variants in the Mathematical Alphanumeric Symbols block (U+1D400–U+1D7FF):

Style | Example | Range (lowercase α–ω plus ∂)
Bold | 𝛂 𝛃 𝛄 | U+1D6C2–U+1D6DB
Italic | 𝛼 𝛽 𝛾 | U+1D6FC–U+1D715
Bold Italic | 𝜶 𝜷 𝜸 | U+1D736–U+1D74F

These are used in formal mathematical typesetting to distinguish between different uses of the same letter.
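
Because the styled lowercase letters are laid out in the same order as U+03B1–U+03C9 (including final sigma), converting between plain and bold is a fixed offset. The following is a rough sketch under that assumption; `to_math_bold` is a hypothetical helper and only handles lowercase letters.

```python
import unicodedata

PLAIN_ALPHA = 0x03B1   # GREEK SMALL LETTER ALPHA
PLAIN_OMEGA = 0x03C9   # GREEK SMALL LETTER OMEGA
BOLD_ALPHA = 0x1D6C2   # MATHEMATICAL BOLD SMALL ALPHA

def to_math_bold(text: str) -> str:
    """Map plain lowercase Greek letters to their mathematical bold forms."""
    out = []
    for ch in text:
        cp = ord(ch)
        if PLAIN_ALPHA <= cp <= PLAIN_OMEGA:
            out.append(chr(cp - PLAIN_ALPHA + BOLD_ALPHA))
        else:
            out.append(ch)  # leave everything else untouched
    return "".join(out)

plain = "\u03B1 + \u03B2"
styled = to_math_bold(plain)
print(styled)

# NFKC folds the styled letters back to plain Greek.
print(unicodedata.normalize("NFKC", styled) == plain)  # True
```

The NFKC round trip is worth remembering for search and indexing: text containing styled mathematical letters will not match plain Greek queries unless one side is normalized.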

Confusable Characters

Greek letters are a major source of homoglyph attacks because many look identical to Latin letters:

Greek Latin Identical?
Α (U+0391) A (U+0041) Visually identical
Β (U+0392) B (U+0042) Visually identical
Ε (U+0395) E (U+0045) Visually identical
Η (U+0397) H (U+0048) Visually identical
Ι (U+0399) I (U+0049) Visually identical
Κ (U+039A) K (U+004B) Visually identical
Μ (U+039C) M (U+004D) Visually identical
Ν (U+039D) N (U+004E) Visually identical
Ο (U+039F) O (U+004F) Visually identical
Ρ (U+03A1) P (U+0050) Visually identical
Τ (U+03A4) T (U+0054) Visually identical
Χ (U+03A7) X (U+0058) Visually identical
ο (U+03BF) o (U+006F) Visually identical
ν (U+03BD) v (U+0076) Very similar

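This is why the Unicode Consortium publishes the confusables.txt data file (part of UTS #39, Unicode Security Mechanisms) and why domain registries and browsers that support IDNA (Internationalized Domain Names in Applications) commonly restrict mixing Greek and Latin characters in the same domain label.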

# Detecting mixed scripts (potential homoglyph attack)
def get_script(char: str) -> str:
    # Simplified block-range check; in practice use the Unicode Script property.
    # The Greek and Coptic block also contains Coptic letters, and the Latin
    # range below includes some symbols, so treat the result as approximate.
    cp = ord(char)
    if 0x0370 <= cp <= 0x03FF or 0x1F00 <= cp <= 0x1FFF:
        return "Greek"
    if 0x0041 <= cp <= 0x024F:
        return "Latin"
    return "Other"

text = "\u0391pple"  # Greek Alpha + "pple"
scripts = {get_script(c) for c in text if c.isalpha()}
if len(scripts) > 1:
    # Sets are unordered, so sort for a stable message
    print(f"Mixed scripts detected: {sorted(scripts)}")
    # Mixed scripts detected: ['Greek', 'Latin']
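
Script mixing is one signal; another is to collapse lookalikes to a common "skeleton" and compare against a trusted string. The sketch below uses only a hand-picked subset of the Greek/Latin pairs from the table above. It is not the full confusables.txt data, and the names `latin_skeleton` and `looks_like` are illustrative.

```python
# Hand-picked Greek -> Latin lookalikes from the table above.
# NOT the full confusables.txt mapping.
GREEK_LOOKALIKES = {
    "\u0391": "A", "\u0392": "B", "\u0395": "E", "\u0397": "H",
    "\u0399": "I", "\u039A": "K", "\u039C": "M", "\u039D": "N",
    "\u039F": "O", "\u03A1": "P", "\u03A4": "T", "\u03A7": "X",
    "\u03BF": "o", "\u03BD": "v",
}
_TABLE = str.maketrans(GREEK_LOOKALIKES)

def latin_skeleton(text: str) -> str:
    """Replace known Greek lookalikes with their Latin counterparts."""
    return text.translate(_TABLE)

def looks_like(candidate: str, trusted: str) -> bool:
    """True if candidate differs from trusted but has the same skeleton."""
    return candidate != trusted and latin_skeleton(candidate) == latin_skeleton(trusted)

print(looks_like("\u0391pple", "Apple"))  # True: Greek Alpha instead of Latin A
print(looks_like("Apple", "Apple"))       # False: identical strings
```

A production check would use the complete confusables data and the skeleton algorithm described in UTS #39 rather than a short table like this one.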

Working with Greek Text in Code

Python

import unicodedata

# Modern Greek (monotonic)
text = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1"  # Καλημέρα (Good morning)
print(text.upper())  # ΚΑΛΗΜΈΡΑ (Python keeps the tonos when uppercasing)
print(text.lower())  # καλημέρα

# Print the code point and Unicode name of each character
for ch in text:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Ancient Greek (polytonic)
ancient = "\u1F08\u03BD\u03B8\u03C1\u03CE\u03C0\u03BF\u03C5"  # Ἀνθρώπου
print(unicodedata.name(ancient[0]))  # GREEK CAPITAL LETTER ALPHA WITH PSILI
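
Accented and unaccented spellings of the same word do not compare equal as strings, which matters because all-caps Greek is often written without the tonos. One common workaround, sketched here with a hypothetical `search_key` helper, is to strip combining marks after NFD decomposition and then case-fold.

```python
import unicodedata

def search_key(text: str) -> str:
    """Build an accent-insensitive, case-insensitive key for matching."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing marks (category Mn): tonos, breathings, iota subscript
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped).casefold()

accented = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1"    # Καλημέρα
all_caps = "\u039A\u0391\u039B\u0397\u039C\u0395\u03A1\u0391"    # ΚΑΛΗΜΕΡΑ
polytonic = "\u1F08\u03BD\u03B8\u03C1\u03CE\u03C0\u03BF\u03C5"   # Ἀνθρώπου

print(search_key(accented) == search_key(all_caps))  # True
print(search_key(polytonic))  # ανθρωπου
```

This key is meant for matching only. It discards distinctions that carry meaning in Greek, so the original text should be kept for display.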

JavaScript

// Greek regex matching
const greekPattern = /\p{Script=Greek}/u;
const text = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";
console.log(greekPattern.test(text)); // true

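// Normalize polytonic to monotonic (approximate)
const polytonic = "\u1F08\u03BD\u03B8\u03C1\u03CE\u03C0\u03BF\u03C5"; // Ἀνθρώπου
const nfd = polytonic.normalize("NFD");
const monotonic = nfd
  .replace(/[\u0300\u0342]/g, "\u0301") // varia and perispomeni become a plain acute (tonos)
  .replace(/[\u0313\u0314\u0345]/g, "") // drop breathings and iota subscript
  .normalize("NFC");
console.log(monotonic); // Ανθρώπου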

Summary

Greek is far more than a modern language script — it is a cornerstone of global scientific and mathematical notation, the ancestor of Latin and Cyrillic, and a writing system with nearly three millennia of continuous history. Key takeaways for developers:

  1. Greek and Coptic share a Unicode block but are separate scripts: for Coptic text, use the dedicated Coptic block (U+2C80–U+2CFF) together with the Demotic-derived letters at U+03E2–U+03EF
  2. Final sigma (ς, U+03C2) must be handled correctly in case conversion and text processing
  3. Polytonic Greek uses the Greek Extended block (U+1F00–U+1FFF) with complex combinations of breathing marks and accents
  4. Greek–Latin confusables are a security concern for domain names, usernames, and any mixed-script context
  5. Mathematical Greek uses standard Greek code points in plain text; use the Mathematical Alphanumeric Symbols block only for styled variants
  6. Normalize polytonic text carefully — NFC and NFD produce different code point sequences that must be handled consistently
