Roman Numeral Symbols
Unicode includes a Number Forms block with precomposed Roman numeral characters such as Ⅰ Ⅱ Ⅲ Ⅳ, distinct from the Latin letters I, V, X, and L that are commonly used as substitutes. This guide explains the Unicode Roman numeral characters and when to use them, and provides copy-paste tables.
Roman numerals have been in continuous use for over two thousand years, from the inscriptions of the Roman Republic to the copyright dates on modern films. Unicode provides two approaches to representing Roman numerals: using ordinary Latin letters (I, V, X, L, C, D, M) or using dedicated precomposed Roman numeral characters from the Number Forms block. This guide explains both approaches, catalogs the 32 precomposed Roman numeral characters at U+2160–U+217F, and helps developers choose the right representation for their context.
Quick Copy-Paste Table: Precomposed Roman Numerals
| Symbol | Name | Code Point | HTML Entity | Value |
|---|---|---|---|---|
| Ⅰ | Roman Numeral One | U+2160 | `&#x2160;` | 1 |
| Ⅱ | Roman Numeral Two | U+2161 | `&#x2161;` | 2 |
| Ⅲ | Roman Numeral Three | U+2162 | `&#x2162;` | 3 |
| Ⅳ | Roman Numeral Four | U+2163 | `&#x2163;` | 4 |
| Ⅴ | Roman Numeral Five | U+2164 | `&#x2164;` | 5 |
| Ⅵ | Roman Numeral Six | U+2165 | `&#x2165;` | 6 |
| Ⅶ | Roman Numeral Seven | U+2166 | `&#x2166;` | 7 |
| Ⅷ | Roman Numeral Eight | U+2167 | `&#x2167;` | 8 |
| Ⅸ | Roman Numeral Nine | U+2168 | `&#x2168;` | 9 |
| Ⅹ | Roman Numeral Ten | U+2169 | `&#x2169;` | 10 |
| Ⅺ | Roman Numeral Eleven | U+216A | `&#x216A;` | 11 |
| Ⅻ | Roman Numeral Twelve | U+216B | `&#x216B;` | 12 |
| Ⅼ | Roman Numeral Fifty | U+216C | `&#x216C;` | 50 |
| Ⅽ | Roman Numeral One Hundred | U+216D | `&#x216D;` | 100 |
| Ⅾ | Roman Numeral Five Hundred | U+216E | `&#x216E;` | 500 |
| Ⅿ | Roman Numeral One Thousand | U+216F | `&#x216F;` | 1000 |
Lowercase Precomposed Roman Numerals
| Symbol | Name | Code Point | Value |
|---|---|---|---|
| ⅰ | Small Roman Numeral One | U+2170 | 1 |
| ⅱ | Small Roman Numeral Two | U+2171 | 2 |
| ⅲ | Small Roman Numeral Three | U+2172 | 3 |
| ⅳ | Small Roman Numeral Four | U+2173 | 4 |
| ⅴ | Small Roman Numeral Five | U+2174 | 5 |
| ⅵ | Small Roman Numeral Six | U+2175 | 6 |
| ⅶ | Small Roman Numeral Seven | U+2176 | 7 |
| ⅷ | Small Roman Numeral Eight | U+2177 | 8 |
| ⅸ | Small Roman Numeral Nine | U+2178 | 9 |
| ⅹ | Small Roman Numeral Ten | U+2179 | 10 |
| ⅺ | Small Roman Numeral Eleven | U+217A | 11 |
| ⅻ | Small Roman Numeral Twelve | U+217B | 12 |
| ⅼ | Small Roman Numeral Fifty | U+217C | 50 |
| ⅽ | Small Roman Numeral One Hundred | U+217D | 100 |
| ⅾ | Small Roman Numeral Five Hundred | U+217E | 500 |
| ⅿ | Small Roman Numeral One Thousand | U+217F | 1000 |
The Number Forms Block (U+2150–U+218F)
All precomposed Roman numerals live in the Number Forms block (U+2150–U+218F), which also contains vulgar fractions (like ½, ⅓, ¼). The block provides:
- Uppercase I–XII (U+2160–U+216B): Composite characters for 1–12
- Uppercase L, C, D, M (U+216C–U+216F): Single-value characters for 50, 100, 500, 1000
- Lowercase i–xii (U+2170–U+217B): Small versions of 1–12
- Lowercase l, c, d, m (U+217C–U+217F): Small versions of 50, 100, 500, 1000
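The four ranges above can also be listed programmatically. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `roman_numeral_table` is made up for this illustration and is not part of any library.

```python
import unicodedata


def roman_numeral_table():
    """Return (code point, character, name, value) for U+2160..U+217F.

    Hypothetical helper: it simply walks the 32 code points described
    in the list above and looks each one up in the character database.
    """
    rows = []
    for cp in range(0x2160, 0x2180):
        ch = chr(cp)
        rows.append(
            (f"U+{cp:04X}", ch, unicodedata.name(ch), int(unicodedata.numeric(ch)))
        )
    return rows


for code, ch, name, value in roman_numeral_table():
    print(code, ch, name, value)
```

The names and values come from the Unicode data bundled with the Python interpreter, so the listing reflects whatever Unicode version that interpreter ships with.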
Why 1–12 Are Special
Unicode provides precomposed characters for 1 through 12, the values that appear most often in typographic contexts: clock faces (I–XII), book chapter numbers, outline numbering, and list items. A precomposed form like Ⅲ (U+2162) is a single character drawn as one glyph: the three vertical strokes are designed together by the font, rather than being three separate letters spaced by kerning.
For numbers beyond 12, you combine the individual characters. For example, "14" would be ⅩⅣ (U+2169 + U+2163) — two precomposed characters — or simply written as XIV using Latin letters.
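One way to build such sequences is sketched below. It assumes a particular convention, matching the ⅩⅣ example: values up to 12 use the single composite character, and larger values use single-value characters (Ⅹ, Ⅼ, Ⅽ, Ⅾ, Ⅿ) for the tens and above, followed by one composite character for the units digit. The function names `to_latin` and `to_precomposed` are illustrative, and other groupings are equally possible.

```python
import unicodedata

# Single-value precomposed characters for the letters that can appear
# in the tens, hundreds, and thousands part of a numeral.
SINGLE = {"X": "\u2169", "L": "\u216C", "C": "\u216D", "D": "\u216E", "M": "\u216F"}


def to_latin(num: int) -> str:
    """Standard subtractive conversion using Latin letters."""
    values = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1]
    symbols = ["M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"]
    out = []
    for value, symbol in zip(values, symbols):
        while num >= value:
            out.append(symbol)
            num -= value
    return "".join(out)


def to_precomposed(num: int) -> str:
    """Compose a numeral from precomposed characters (assumed convention)."""
    if not 1 <= num <= 3999:
        raise ValueError("expected a value from 1 to 3999")
    if num <= 12:
        return chr(0x215F + num)  # single composite character
    units = num % 10
    head = "".join(SINGLE[ch] for ch in to_latin(num - units))
    tail = chr(0x215F + units) if units else ""
    return head + tail


result = to_precomposed(14)
print(result, [f"U+{ord(c):04X}" for c in result])
print(unicodedata.normalize("NFKC", result))
```

Under this convention, compatibility normalization of the result gives the same string as the plain Latin-letter conversion, which makes the output easy to check.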
Precomposed vs Latin Letters: Which to Use
The fundamental question: should you write Roman numerals using precomposed Unicode characters (Ⅳ, U+2163) or ordinary Latin letters (IV)?
Comparison
| Aspect | Precomposed (Ⅳ) | Latin Letters (IV) |
|---|---|---|
| Character count | 1 code point | 2 code points |
| Searchability | Poor (rare encoding) | Excellent |
| Font support | Variable | Universal |
| Copy-paste | May paste as single char | Predictable behavior |
| Sorting | Has numeric value | Sorts as letters |
| Screen readers | May read as "Roman numeral four" | Reads as "I V" |
| Compatibility | Older systems may not support | Works everywhere |
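The sorting row can be illustrated with a short sketch. It assumes single precomposed characters, for which `unicodedata.numeric` returns a value, and compares them with Latin-letter strings sorted as plain text. The sample lists are invented for the example.

```python
import unicodedata

precomposed = ["\u2169", "\u2161", "\u2168"]  # Ⅹ, Ⅱ, Ⅸ
latin = ["V", "IX", "II"]                     # 5, 9, 2

# Single precomposed characters can be ordered by their numeric value.
by_value = sorted(precomposed, key=unicodedata.numeric)

# Latin-letter numerals are ordinary text, so they sort alphabetically.
by_letters = sorted(latin)

print(by_value)    # ordered by value: 2, 9, 10
print(by_letters)  # ordered as text: II, IX, V (values 2, 9, 5)
```

A multi-character numeral in either form still needs a real parser to be sorted by value; the numeric property only applies to one character at a time.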
The Unicode Consortium's Recommendation
The Unicode Standard itself states that these characters exist primarily for compatibility with East Asian encoding standards (such as GB 2312 and KS X 1001) that included precomposed Roman numerals for vertical text layout. The Standard recommends using ordinary Latin letters for Roman numerals in most contexts.
From the Unicode Standard, Chapter 22:
"For most purposes, it is preferable to compose the Roman numerals from sequences of the appropriate Latin letters."
When Precomposed Characters Make Sense
Despite the general recommendation, precomposed Roman numerals are useful in:
- CJK vertical text: In Japanese, Chinese, and Korean vertical writing, a precomposed Ⅲ occupies a single character cell and rotates correctly. Writing "III" with three Latin I characters in vertical text creates three separate rotated letters.
- Clock faces: The sequence Ⅰ through Ⅻ represents the 12 positions on an analog clock. Using precomposed characters ensures consistent glyph design.
- Semantic markup: When you need software to recognize that a character is a Roman numeral (not a Latin letter), the precomposed form carries that semantic information in its Unicode properties.
Unicode Properties of Roman Numerals
Each precomposed Roman numeral carries a numeric value in Unicode's character database, making programmatic conversion straightforward:
```python
import unicodedata

# A precomposed Roman numeral has a numeric value
char = "\u2163"  # Ⅳ
name = unicodedata.name(char)          # "ROMAN NUMERAL FOUR"
value = unicodedata.numeric(char)      # 4.0
category = unicodedata.category(char)  # "Nl" (Number, letter)

# The Latin letter "I" has no numeric value as a Roman numeral
latin_i = "I"
category_i = unicodedata.category(latin_i)  # "Lu" (Letter, uppercase)
# unicodedata.numeric(latin_i) raises ValueError
```
The General Category for precomposed Roman numerals is Nl (Number, letter), while ordinary Latin letters used as Roman numerals have category Lu (Letter, uppercase). This distinction allows programs to identify precomposed Roman numerals programmatically.
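A detection helper along these lines might look like the sketch below. The function names are hypothetical. Because the Nl category also covers other characters (for example U+3007 IDEOGRAPHIC NUMBER ZERO), the sketch checks the U+2160–U+217F range as well as the category.

```python
import unicodedata


def is_precomposed_roman(ch: str) -> bool:
    """True for the 32 precomposed Roman numerals U+2160..U+217F."""
    return (
        len(ch) == 1
        and 0x2160 <= ord(ch) <= 0x217F
        and unicodedata.category(ch) == "Nl"
    )


def find_precomposed_roman(text: str):
    """Return (index, character) pairs for precomposed numerals in text."""
    return [(i, ch) for i, ch in enumerate(text) if is_precomposed_roman(ch)]


print(find_precomposed_roman("Part \u2163, page iv"))
```

Latin letters such as the "iv" in the sample string are not reported, since nothing in their properties marks them as numerals.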
Case Mapping
Precomposed Roman numerals support case conversion:
```python
upper = "\u2160"       # Ⅰ (uppercase)
lower = upper.lower()  # ⅰ (U+2170, lowercase)
back = lower.upper()   # Ⅰ (U+2160, uppercase)

# Works for all 16 pairs
roman_12 = "\u216B"                # Ⅻ
roman_12_lower = roman_12.lower()  # ⅻ (U+217B)
```
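The simple case mappings are defined in Unicode's UnicodeData.txt, and the matching case folding (uppercase to lowercase) in CaseFolding.txt. SpecialCasing.txt is not involved, because each mapping is one character to one character.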
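The claim that every pair maps cleanly can be checked with a small loop. This is a sketch that relies only on Python's built-in `str.lower()`, `str.upper()`, and `str.casefold()`; the variable names are arbitrary.

```python
# Build the 16 (uppercase, lowercase) pairs: U+2160..U+216F and U+2170..U+217F.
pairs = [(chr(0x2160 + i), chr(0x2170 + i)) for i in range(16)]

# Collect any pair whose case conversion or case folding does not round-trip.
mismatches = [
    (u, l)
    for u, l in pairs
    if u.lower() != l or l.upper() != u or u.casefold() != l
]

print(len(pairs), "pairs checked;", len(mismatches), "mismatches")
```

An empty `mismatches` list means case-insensitive comparison via `casefold()` treats Ⅳ and ⅳ as equal, just as it does for ordinary letters.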
Compatibility Decomposition
Each precomposed Roman numeral has a compatibility decomposition to its constituent Latin letters. Under NFKD (Compatibility Decomposition) or NFKC (Compatibility Composition) normalization:
| Precomposed | Decomposes To | Normalization |
|---|---|---|
| Ⅰ (U+2160) | I (U+0049) | NFKD/NFKC |
| Ⅱ (U+2161) | II (U+0049 U+0049) | NFKD/NFKC |
| Ⅲ (U+2162) | III | NFKD/NFKC |
| Ⅳ (U+2163) | IV | NFKD/NFKC |
| Ⅻ (U+216B) | XII | NFKD/NFKC |
```python
import unicodedata

roman = "\u2162"  # Ⅲ
decomposed = unicodedata.normalize("NFKD", roman)
print(decomposed)                   # "III" (three Latin I characters)
print(len(roman), len(decomposed))  # prints "1 3"
```
This means that NFKC/NFKD normalization will destroy the distinction between precomposed Roman numerals and Latin letters. If your application applies NFKC normalization (common in search indexing), all precomposed Roman numerals will be converted to their Latin letter equivalents.
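For search, this loss of distinction is often exactly what is wanted. The sketch below shows one possible search key, NFKC followed by case folding, so that a query typed with Latin letters matches text that contains precomposed numerals. The names `search_key` and `matches` are invented for the example and do not describe any particular search engine.

```python
import unicodedata


def search_key(text: str) -> str:
    """Normalize text for matching: NFKC, then case folding."""
    return unicodedata.normalize("NFKC", text).casefold()


def matches(query: str, document: str) -> bool:
    """Substring match on normalized forms (illustrative only)."""
    return search_key(query) in search_key(document)


title = "Chapter \u2163: The Return"  # contains Ⅳ (U+2163)

print("IV" in title)                 # raw comparison misses the numeral
print(matches("chapter iv", title))  # normalized comparison finds it
```

Store the original text separately if the precomposed characters need to be displayed again, since the normalized key cannot be converted back.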
Roman Numeral Values Beyond the Basic Set
Standard Roman numeral values and their Unicode representations:
| Value | Uppercase | Lowercase | Precomposed? |
|---|---|---|---|
| 1 | Ⅰ (U+2160) | ⅰ (U+2170) | Yes |
| 2 | Ⅱ (U+2161) | ⅱ (U+2171) | Yes |
| 3 | Ⅲ (U+2162) | ⅲ (U+2172) | Yes |
| 4 | Ⅳ (U+2163) | ⅳ (U+2173) | Yes |
| 5 | Ⅴ (U+2164) | ⅴ (U+2174) | Yes |
| 6 | Ⅵ (U+2165) | ⅵ (U+2175) | Yes |
| 7 | Ⅶ (U+2166) | ⅶ (U+2176) | Yes |
| 8 | Ⅷ (U+2167) | ⅷ (U+2177) | Yes |
| 9 | Ⅸ (U+2168) | ⅸ (U+2178) | Yes |
| 10 | Ⅹ (U+2169) | ⅹ (U+2179) | Yes |
| 11 | Ⅺ (U+216A) | ⅺ (U+217A) | Yes |
| 12 | Ⅻ (U+216B) | ⅻ (U+217B) | Yes |
| 13 | XIII | xiii | No — use Latin letters |
| 14 | XIV | xiv | No — use Latin letters |
| 50 | Ⅼ (U+216C) | ⅼ (U+217C) | Yes (single value) |
| 100 | Ⅽ (U+216D) | ⅽ (U+217D) | Yes (single value) |
| 500 | Ⅾ (U+216E) | ⅾ (U+217E) | Yes (single value) |
| 1000 | Ⅿ (U+216F) | ⅿ (U+217F) | Yes (single value) |
For composite values like 14 (XIV), 27 (XXVII), or 2024 (MMXXIV), you can either use Latin letters or combine precomposed characters:
```python
# Using Latin letters (recommended)
year = "MMXXIV"  # 2024

# Using precomposed characters (CJK/vertical text)
year_precomposed = "\u216F\u216F\u2169\u2169\u2163"  # Ⅿ Ⅿ Ⅹ Ⅹ Ⅳ = MMXXIV
```
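Either spelling can be read back into an integer with one parser if the input is first folded to Latin letters. The following is a rough sketch under that assumption. `roman_to_int` is a hypothetical name, and the function does not reject malformed numerals such as "IIII" or "VX".

```python
import unicodedata

VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text: str) -> int:
    """Parse Latin-letter or precomposed Roman numerals (no strict validation)."""
    # NFKC turns precomposed characters into Latin letters; upper() handles
    # the lowercase forms.
    letters = unicodedata.normalize("NFKC", text).upper()
    total = 0
    for i, ch in enumerate(letters):
        if ch not in VALUES:
            raise ValueError(f"not a Roman numeral letter: {ch!r}")
        value = VALUES[ch]
        following = VALUES.get(letters[i + 1], 0) if i + 1 < len(letters) else 0
        # A smaller value before a larger one is subtracted (IV = 4).
        total += -value if value < following else value
    return total


print(roman_to_int("MMXXIV"))
print(roman_to_int("\u216F\u216F\u2169\u2169\u2163"))
```

Both calls go through the same code path after normalization, so the two representations of 2024 give the same result.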
Apostrophic and Vinculum Notation
Classical and medieval Roman numeral notation used additional conventions for large numbers that fall outside the 32 characters listed above:
- Vinculum (overline): A bar above a numeral multiplies it by 1,000, so V with an overline is 5,000. Unicode has no precomposed "Roman numeral with vinculum," so you must use combining characters: V + U+0305 (COMBINING OVERLINE) = V̅.
- Apostrophic notation: CIↃ for 1,000, CCIↃↃ for 10,000. The reversed C character Ↄ is encoded at U+2183 (ROMAN NUMERAL REVERSED ONE HUNDRED).
```python
# Vinculum (overline) for large Roman numerals
five_thousand = "V\u0305"   # V̅ = 5,000
ten_thousand = "X\u0305"    # X̅ = 10,000
fifty_thousand = "L\u0305"  # L̅ = 50,000
one_million = "M\u0305"     # M̅ = 1,000,000
```
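The same idea extends to arbitrary large values. The sketch below assumes one common convention: values below 4,000 are written normally, and for larger values the number of thousands is written as an overlined numeral followed by the remainder. Historical usage varied, and the function names here are illustrative only.

```python
OVERLINE = "\u0305"  # COMBINING OVERLINE


def to_latin(num: int) -> str:
    """Standard subtractive conversion; returns "" for 0."""
    values = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1]
    symbols = ["M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"]
    out = []
    for value, symbol in zip(values, symbols):
        while num >= value:
            out.append(symbol)
            num -= value
    return "".join(out)


def int_to_roman_vinculum(num: int) -> str:
    """Write large values with an overlined thousands part (assumed convention)."""
    if not 1 <= num <= 3_999_999:
        raise ValueError("expected a value from 1 to 3,999,999")
    if num < 4000:
        return to_latin(num)
    thousands, rest = divmod(num, 1000)
    # Each letter of the thousands part gets its own combining overline.
    overlined = "".join(ch + OVERLINE for ch in to_latin(thousands))
    return overlined + to_latin(rest)


print(int_to_roman_vinculum(5000))
print(int_to_roman_vinculum(1_000_000))
```

Because the overline is a combining character, each overlined letter is two code points, so string length no longer equals the number of visible letters.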
Practical Conversion Code
```python
def int_to_roman(num: int) -> str:
    """Standard Roman numeral conversion using Latin letters."""
    val = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1]
    syms = ["M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"]
    result = ""
    for i, v in enumerate(val):
        while num >= v:
            result += syms[i]
            num -= v
    return result


def int_to_roman_unicode(num: int) -> str:
    """Use precomposed Unicode characters (1-12 only)."""
    if 1 <= num <= 12:
        return chr(0x215F + num)
    # Fall back to Latin letters for larger values
    return int_to_roman(num)


print(int_to_roman(2024))        # "MMXXIV"
print(int_to_roman_unicode(7))   # "Ⅶ"
print(int_to_roman_unicode(14))  # "XIV" (fallback)
```
Key Takeaways
- Unicode provides precomposed Roman numerals Ⅰ–Ⅻ (1–12) plus L, C, D, M in both uppercase and lowercase — 32 characters total in the Number Forms block (U+2150–U+218F).
- These exist primarily for CJK compatibility (vertical text, fixed-width cells). The Unicode Consortium recommends using Latin letters (I, V, X) for most contexts.
- Precomposed characters have General Category Nl (Number, letter) and carry numeric values accessible via `unicodedata.numeric()`.
- NFKC/NFKD normalization decomposes precomposed Roman numerals into Latin letters, which can break applications that depend on the distinction.
- For values 13 and above, there are no precomposed multi-value characters — combine individual characters or use Latin letters.
- The combining overline (U+0305) can be used with Latin letters to represent vinculum notation for large numbers (V̅ = 5,000).