Latin-1 Supplement Block
The Latin-1 Supplement block (U+0080–U+00FF) extends Basic Latin with 128 additional characters covering Western European languages and a set of widely-used symbols. Together, the Basic Latin block and Latin-1 Supplement form the first 256 Unicode code points — a range that maps exactly to the ISO 8859-1 (Latin-1) character encoding, cementing the historical bridge between 8-bit encodings and Unicode.
Origins: ISO 8859-1 and the 8-Bit Era
When personal computers spread through the 1980s, ASCII's 128 characters were insufficient for European languages. ISO 8859-1, published in 1987, filled code points 128–255 with accented Latin characters and symbols needed for Western European languages including French, German, Spanish, Portuguese, Swedish, Norwegian, Danish, Dutch, and Finnish.
Unicode's decision to mirror ISO 8859-1 exactly in U+0080–U+00FF guaranteed that any Latin-1 encoded byte can be interpreted as the Unicode code point with the same numeric value. This equivalence made migration from 8-bit Western European encodings to Unicode straightforward — a critical factor in Unicode's adoption.
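The byte-to-code-point identity can be checked directly. The following is a minimal Python sketch, assuming the standard library's `latin-1` codec (Python's name for ISO 8859-1 with all 256 byte values mapped); the variable names are illustrative only.

```python
# Every byte value 0x00-0xFF decodes under Latin-1 to the code point
# with the same number, so decoding never fails and never loses data.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Compare each decoded character's code point with the original byte value.
same_values = all(ord(ch) == b for ch, b in zip(decoded, all_bytes))

# Encoding back to Latin-1 should reproduce the original bytes exactly.
round_trip = decoded.encode("latin-1") == all_bytes

print(same_values, round_trip)
```

Both checks should report `True`, which is why decoding as Latin-1 is sometimes used as a lossless way to carry arbitrary bytes through a string.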
Block Layout
| Range | Category | Contents |
|---|---|---|
| U+0080–U+009F | C1 control characters | 32 non-printable control codes |
| U+00A0–U+00BF | Symbols and punctuation | Non-breaking space, currency, fractions, special punctuation |
| U+00C0–U+00D6 | Uppercase letters | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö |
| U+00D7 | Symbol | Multiplication sign × |
| U+00D8–U+00DE | Uppercase letters | Ø Ù Ú Û Ü Ý Þ |
| U+00DF–U+00F6 | Lowercase letters | ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö |
| U+00F7 | Symbol | Division sign ÷ |
| U+00F8–U+00FF | Lowercase letters | ø ù ú û ü ý þ ÿ |
C1 Control Characters (U+0080–U+009F)
These 32 code points are the "C1 control characters," a second set of control codes standardized in ISO 6429. In legacy systems they had specific terminal-control meanings. In modern text they almost always signal an encoding mix-up: Windows-1252 (CP1252) assigns printable characters to most of these positions, so a document labeled as Latin-1 but actually encoded in Windows-1252 decodes to C1 controls where characters like € (U+20AC), „ (U+201E), and … (U+2026) were intended, unless the decoder treats the Latin-1 label as Windows-1252, as web browsers do.
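The difference between the two interpretations is easy to reproduce. This is a small sketch using Python's built-in `latin-1` and `cp1252` codecs; the sample bytes are chosen for illustration and avoid the few positions that Python's `cp1252` codec leaves undefined.

```python
import unicodedata

raw = b"\x80 \x84 \x85"  # three bytes from the 0x80-0x9F range

as_latin1 = raw.decode("latin-1")  # strict ISO 8859-1: C1 control characters
as_cp1252 = raw.decode("cp1252")   # Windows-1252: printable characters

# Under Latin-1 the non-space characters are all category Cc (control).
latin1_categories = [unicodedata.category(c) for c in as_latin1 if c != " "]

print([f"U+{ord(c):04X}" for c in as_latin1])
print(latin1_categories)
print(as_cp1252)
```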
The most notable C1 character is U+0085 NEXT LINE (NEL), which Unicode formally recognizes as a line-ending character alongside LF and CRLF.
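Some string APIs already honor NEL as a line boundary. As one illustration, Python's `str.splitlines()` splits on U+0085, while splitting on `"\n"` alone does not; the sample text is made up.

```python
text = "first\x85second\nthird\r\nfourth"

# splitlines() treats NEL, LF, and CRLF all as line boundaries.
lines = text.splitlines()

# A plain split on LF leaves the NEL and the CR embedded in the pieces.
naive = text.split("\n")

print(lines)
print(naive)
```

Code that normalizes line endings by looking only for `\n` and `\r` can therefore disagree with `splitlines()` on where lines end.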
Widely-Used Symbols
Spacing and Punctuation
- U+00A0 NO-BREAK SPACE: visually identical to a regular space but prevents a line break at that position; commonly used between a number and its unit ("10 km") or a title and a name ("Mr. Smith")
- U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK «: used as the opening quotation mark in French, Russian, and many other languages
- U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK »: the corresponding closing guillemet
- U+00AD SOFT HYPHEN: invisible hint to a line-breaking algorithm that a hyphen may be inserted here
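The effect of the no-break space on wrapping can be sketched with Python's `textwrap` module, which (in CPython, as far as this example assumes) breaks lines only at ASCII whitespace and so keeps text joined by U+00A0 together. The sample strings and the width of 12 are arbitrary.

```python
import textwrap

regular = "distance: 10 km"
no_break = "distance: 10\u00a0km"  # U+00A0 between the number and the unit

wrapped_regular = textwrap.wrap(regular, width=12)
wrapped_no_break = textwrap.wrap(no_break, width=12)

print(wrapped_regular)   # the unit may be separated from the number
print(wrapped_no_break)  # number and unit stay on the same line
```

Note that `str.split()` with no arguments does count U+00A0 as whitespace, so tokenizing code may still separate the two parts even though line wrapping does not.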
Editorial and Legal Symbols
- U+00A7 SECTION SIGN §: marks sections in legal documents, statutes, and academic citations
- U+00A9 COPYRIGHT SIGN ©: copyright notice
- U+00AE REGISTERED SIGN ®: registered trademark
- U+00B6 PILCROW SIGN ¶: paragraph mark, used in word processors to show paragraph breaks
- U+00B0 DEGREE SIGN °: temperatures, angles, geographic coordinates
- U+00B5 MICRO SIGN µ: SI prefix for micro (10⁻⁶); note that U+03BC GREEK SMALL LETTER MU is the preferred character for the micro prefix in technical contexts
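Because the micro sign and the Greek letter mu look the same but are different code points, string comparisons on units can fail unexpectedly. A minimal sketch using Python's `unicodedata` module shows that compatibility normalization (NFKC) folds U+00B5 into U+03BC, while canonical normalization (NFC) leaves it alone.

```python
import unicodedata

micro = "\u00b5"  # MICRO SIGN (Latin-1 Supplement)
mu = "\u03bc"     # GREEK SMALL LETTER MU

raw_equal = micro == mu
nfkc_equal = unicodedata.normalize("NFKC", micro) == mu
nfc_unchanged = unicodedata.normalize("NFC", micro) == micro

print(raw_equal, nfkc_equal, nfc_unchanged)
```

Normalizing both sides with NFKC before comparing is one way to treat "µm" and "μm" as the same unit, at the cost of also folding other compatibility characters.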
Mathematical Symbols
- U+00B1 PLUS-MINUS SIGN ±: tolerance ranges, measurement uncertainty
- U+00D7 MULTIPLICATION SIGN ×: clearly distinct from the letter x; used in dimensions ("1920×1080") and math
- U+00F7 DIVISION SIGN ÷: arithmetic division
- U+00BC, U+00BD, U+00BE: precomposed fractions ¼ ½ ¾
- U+00B2, U+00B3: superscript digits ² ³ (x², km³)
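The precomposed fractions and superscript digits carry numeric properties in the Unicode Character Database, which matters when parsing user input. The following sketch, again using Python's `unicodedata` module, reads those properties; the variable names are illustrative.

```python
import unicodedata

# Numeric values attached to the vulgar fractions.
fraction_values = [unicodedata.numeric(ch) for ch in "\u00bc\u00bd\u00be"]

# Superscript two is a "digit" but not a "decimal" character.
sup_two = "\u00b2"
digit_flags = (sup_two.isdigit(), sup_two.isdecimal())

# NFKC normalization replaces the superscript with a plain digit.
flattened = unicodedata.normalize("NFKC", "km\u00b3")

print(fraction_values)
print(digit_flags)
print(flattened)
```

One practical consequence: `int("²")` raises `ValueError` even though `"²".isdigit()` is true, so `isdecimal()` is the safer guard before calling `int()`.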
Currency
- U+00A3 POUND SIGN £: British pound sterling
- U+00A5 YEN SIGN ¥: Japanese yen and Chinese yuan
- U+00A2 CENT SIGN ¢: US cent
- U+00A4 CURRENCY SIGN ¤: generic currency placeholder
Accented Latin Letters
The most-used characters in this block are the accented letters for European languages:
| Language | Characters Used |
|---|---|
| French | à â ç é è ê ë î ï ô ù û ü ÿ æ œ (œ is in Latin Extended-A) |
| German | ä ö ü ß Ä Ö Ü |
| Spanish | á é í ó ú ñ ü |
| Portuguese | ã õ â ê ô á é í ó ú à ç |
| Swedish | å ä ö |
| Norwegian/Danish | å ø æ |
The character U+00DF LATIN SMALL LETTER SHARP S ß (Eszett) deserves special note: ISO 8859-1 contains no uppercase counterpart for it. (The same is true of ÿ, whose capital Ÿ sits at U+0178 in Latin Extended-A, but ß long had no single-character capital anywhere in Unicode.) German orthography traditionally substituted "SS" when capitalizing. Unicode 5.1 (2008) added U+1E9E LATIN CAPITAL LETTER SHARP S ẞ, and official German orthography has permitted it as an alternative to "SS" since 2017.
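The Eszett's case behavior is visible in most Unicode-aware string libraries. As a sketch in Python, whose `str` methods follow the Unicode default case mappings, uppercasing ß still yields "SS", while the capital ẞ lowercases to ß; the example words are illustrative.

```python
sharp_s = "\u00df"          # ß LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # ẞ LATIN CAPITAL LETTER SHARP S

upper_form = sharp_s.upper()             # default mapping expands to two letters
lower_form = capital_sharp_s.lower()     # the capital maps back to ß
folded = sharp_s.casefold()              # case folding for comparisons

# Case-insensitive comparison should use casefold(), not lower().
matches = "stra\u00dfe".casefold() == "STRASSE".casefold()

print(upper_form, lower_form == sharp_s, folded, matches)
```

Because `upper()` can change the length of a string here, code that assumes case conversion preserves length (for example, when indexing into the converted string) can break on German text.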
Encoding in UTF-8
Unlike Basic Latin, Latin-1 Supplement characters require two bytes in UTF-8. For example:
- U+00E9 LATIN SMALL LETTER E WITH ACUTE é encodes as 0xC3 0xA9
- U+00A9 COPYRIGHT SIGN © encodes as 0xC2 0xA9
The pattern for U+0080–U+00BF is 0xC2 0x80–0xC2 0xBF. For U+00C0–U+00FF the pattern is 0xC3 0x80–0xC3 0xBF. This two-byte encoding means a Latin-1 byte stream and the equivalent UTF-8 stream will differ in length whenever non-ASCII characters appear, which is a common source of encoding bugs when code assumes byte length equals character length.
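The two-byte pattern follows from the UTF-8 bit layout `110xxxxx 10xxxxxx`. The sketch below builds the bytes by hand and compares them with Python's built-in encoder; `utf8_two_byte` is a hypothetical helper written for this illustration, not a library function.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF by hand as 110xxxxx 10xxxxxx."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte form covers U+0080..U+07FF only")
    lead = 0xC0 | (code_point >> 6)     # marker 110 plus the top 5 bits
    trail = 0x80 | (code_point & 0x3F)  # marker 10 plus the low 6 bits
    return bytes([lead, trail])


for ch in "\u00e9\u00a9":
    by_hand = utf8_two_byte(ord(ch))
    built_in = ch.encode("utf-8")
    hex_bytes = " ".join(f"0x{b:02X}" for b in by_hand)
    print(f"U+{ord(ch):04X}", hex_bytes, by_hand == built_in)

# The same text has different byte lengths in the two encodings.
word = "caf\u00e9"
lengths = (len(word), len(word.encode("latin-1")), len(word.encode("utf-8")))
print(lengths)
```

For "café" the character count and the Latin-1 byte count agree, while the UTF-8 byte count is one larger, which is the mismatch that trips up code treating byte length as character length.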