Latin-1 Supplement Block

The Latin-1 Supplement block (U+0080–U+00FF) extends ASCII with accented Latin characters for Western European languages and matches the original ISO 8859-1 encoding. This guide explores the characters in the Latin-1 Supplement, their linguistic uses, and why this block is important for European language support.

The Latin-1 Supplement block (U+0080–U+00FF) extends Basic Latin with 128 additional characters covering Western European languages and a set of widely-used symbols. Together, the Basic Latin block and Latin-1 Supplement form the first 256 Unicode code points — a range that maps exactly to the ISO 8859-1 (Latin-1) character encoding, cementing the historical bridge between 8-bit encodings and Unicode.

Origins: ISO 8859-1 and the 8-Bit Era

When personal computers spread through the 1980s, ASCII's 128 characters were insufficient for European languages. ISO 8859-1, published in 1987, filled code points 128–255 with accented Latin characters and symbols needed for Western European languages including French, German, Spanish, Portuguese, Swedish, Norwegian, Danish, Dutch, and Finnish.

Unicode's decision to mirror ISO 8859-1 exactly in U+0080–U+00FF guaranteed that any Latin-1 encoded byte can be interpreted as the Unicode code point with the same numeric value. This equivalence made migration from 8-bit Western European encodings to Unicode straightforward — a critical factor in Unicode's adoption.
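The byte-to-code-point equivalence can be checked directly. The following is a minimal sketch using Python's standard `latin-1` codec; the variable names are illustrative only.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding as Latin-1 maps each byte to the code point with the same number.
decoded = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in decoded]

# The round trip back to bytes is lossless as well.
round_trip = decoded.encode("latin-1")

print(code_points == list(range(256)), round_trip == all_bytes)
```

Because no byte sequence is invalid in Latin-1, decoding with it never raises an error, which is also why mislabeled data can pass through unnoticed.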

Block Layout

Range | Category | Contents
U+0080–U+009F | C1 control characters | 32 non-printable control codes
U+00A0–U+00BF | Symbols and punctuation | Non-breaking space, currency, fractions, special punctuation
U+00C0–U+00D6 | Uppercase letters | À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö
U+00D7 | Symbol | Multiplication sign ×
U+00D8–U+00DE | Uppercase letters | Ø Ù Ú Û Ü Ý Þ
U+00DF–U+00F6 | Lowercase letters | ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö
U+00F7 | Symbol | Division sign ÷
U+00F8–U+00FF | Lowercase letters | ø ù ú û ü ý þ ÿ
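The layout can be cross-checked against the Unicode Character Database. This sketch assumes Python's bundled `unicodedata` module and tallies the general category of each code point in the block; the variable names are illustrative.

```python
import unicodedata
from collections import Counter

# Tally the Unicode general category of every code point in the block.
categories = Counter(
    unicodedata.category(chr(cp)) for cp in range(0x80, 0x100)
)

c1_controls = categories["Cc"]  # U+0080 to U+009F
uppercase = categories["Lu"]    # U+00C0 to U+00D6 plus U+00D8 to U+00DE

# The two symbols that interrupt the letter ranges are math symbols.
times_category = unicodedata.category("\u00d7")
divide_category = unicodedata.category("\u00f7")

print(c1_controls, uppercase, times_category, divide_category)
```

The lowercase tally (`Ll`) comes out slightly larger than the letter rows suggest, since U+00B5 MICRO SIGN is also classified as a lowercase letter.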

C1 Control Characters (U+0080–U+009F)

These 32 code points are the "C1 control characters," a second set of control codes standardized in ISO 6429. In legacy systems they had specific terminal-control meanings, but in modern text they show up almost exclusively as a symptom of mislabeled encodings. Windows-1252 (CP1252) assigns printable characters to most of these positions, so a document labeled as Latin-1 but actually encoded in Windows-1252 contains bytes intended as characters like € (U+20AC), „ (U+201E), and … (U+2026), which a strict Latin-1 decoder turns into C1 controls instead.
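The mislabeling problem is easy to reproduce. This is a small sketch, assuming Python's standard `latin-1` and `cp1252` codecs; the sample bytes and variable names are chosen for illustration.

```python
raw = b"\x80 \x84 \x85"  # three bytes that fall in the C1 range

# Strict Latin-1 yields the C1 controls U+0080, U+0084 and U+0085.
as_latin1 = raw.decode("latin-1")

# Windows-1252 yields printable characters for the same bytes.
as_cp1252 = raw.decode("cp1252")

# Text that was wrongly decoded as Latin-1 can be repaired by
# reversing that step and decoding again with the intended codec.
repaired = as_latin1.encode("latin-1").decode("cp1252")

print([hex(ord(ch)) for ch in as_latin1])
print(as_cp1252, repaired == as_cp1252)
```

Python's `cp1252` codec leaves a handful of byte values in this range undefined and raises an error on them, so a repair step like this one may need error handling in practice.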

One notable C1 character is U+0085 NEXT LINE (NEL), which Unicode formally recognizes as a line-ending character alongside LF and CRLF.
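Support for NEL varies between languages and libraries. As one data point, Python's `str.splitlines` treats it as a line boundary, as this short sketch shows; the sample string is made up for illustration.

```python
# NEL (U+0085) mixed with LF and CRLF line endings.
text = "first\u0085second\nthird\r\nfourth"

# str.splitlines recognizes all three as line boundaries.
lines = text.splitlines()

# Splitting on "\n" alone leaves the NEL embedded in the first piece.
naive = text.split("\n")

print(lines)
print(len(naive))
```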

Widely-Used Symbols

Spacing and Punctuation

  • U+00A0 NO-BREAK SPACE — visually identical to a regular space but prevents line wrapping; commonly needed in "10 km", "Mr. Smith"
  • U+00AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK « — used as opening quotation mark in French, Russian, and many other languages
  • U+00BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK » — corresponding closing guillemet
  • U+00AD SOFT HYPHEN — invisible hint to a line-breaking algorithm that a hyphen may be inserted here
  • U+00A7 SECTION SIGN § — marks sections in legal documents, statutes, and academic citations
  • U+00A9 COPYRIGHT SIGN © — copyright notice
  • U+00AE REGISTERED SIGN ® — registered trademark
  • U+00B6 PILCROW SIGN ¶ — paragraph mark, used in word processors to show paragraph breaks
  • U+00B0 DEGREE SIGN ° — temperatures, angles, geographic coordinates
  • U+00B5 MICRO SIGN µ — SI prefix for micro (10⁻⁶); note that U+03BC GREEK SMALL LETTER MU is the preferred character for the micro prefix in technical contexts
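Several of these characters look like something else on screen, which matters for search and comparison. The sketch below uses Python's `unicodedata` module to inspect them; `simplify_spacing` is a hypothetical helper written for this example, not a standard function.

```python
import unicodedata

NBSP, SOFT_HYPHEN = "\u00a0", "\u00ad"
MICRO, GREEK_MU = "\u00b5", "\u03bc"

nbsp_category = unicodedata.category(NBSP)        # "Zs": space separator
shy_category = unicodedata.category(SOFT_HYPHEN)  # "Cf": format character

# NFKC normalization folds MICRO SIGN into GREEK SMALL LETTER MU.
micro_nfkc = unicodedata.normalize("NFKC", MICRO)


def simplify_spacing(text: str) -> str:
    """Hypothetical helper: drop soft hyphens, turn NBSP into a space."""
    return text.replace(SOFT_HYPHEN, "").replace(NBSP, " ")


print(nbsp_category, shy_category, micro_nfkc == GREEK_MU)
print(simplify_spacing("10\u00a0km co\u00adoperate"))
```

Without a step like this, "10 km" typed with a regular space will not match the same text stored with a no-break space.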

Mathematical Symbols

  • U+00B1 PLUS-MINUS SIGN ± — tolerance ranges, measurement uncertainty
  • U+00D7 MULTIPLICATION SIGN × — clearly distinct from the letter x; used in dimensions ("1920×1080") and math
  • U+00F7 DIVISION SIGN ÷ — arithmetic division
  • U+00BC, U+00BD, U+00BE — precomposed fractions ¼ ½ ¾
  • U+00B2, U+00B3 — superscript digits ² ³ (x², km³)
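The fractions and superscripts carry numeric values in the Unicode Character Database. This sketch, assuming Python's `unicodedata` module, reads those values and shows what compatibility normalization does to a superscript; the variable names are illustrative.

```python
import unicodedata

half_value = unicodedata.numeric("\u00bd")      # vulgar fraction one half
three_quarters = unicodedata.numeric("\u00be")  # vulgar fraction three quarters
squared_digit = unicodedata.digit("\u00b2")     # superscript two

# NFKC replaces the superscript with a plain digit, which changes
# the meaning of the text: "x squared" becomes "x2".
flattened = unicodedata.normalize("NFKC", "x\u00b2")

# The multiplication sign is a different character from the letter x.
same_as_letter_x = "\u00d7" == "x"

print(half_value, three_quarters, squared_digit, flattened, same_as_letter_x)
```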

Currency

  • U+00A3 POUND SIGN £ — British pound sterling
  • U+00A5 YEN SIGN ¥ — Japanese yen and Chinese yuan
  • U+00A2 CENT SIGN ¢ — US cent
  • U+00A4 CURRENCY SIGN ¤ — generic currency placeholder

Accented Latin Letters

The most-used characters in this block are the accented letters for European languages:

Language | Characters Used
French | à â ç é è ê ë î ï ô ù û ü ÿ æ (œ is also used, but it is in Latin Extended-A)
German | ä ö ü ß Ä Ö Ü
Spanish | á é í ó ú ñ ü
Portuguese | ã õ â ê ô á é í ó ú à ç
Swedish | å ä ö
Norwegian/Danish | å ø æ

The character U+00DF LATIN SMALL LETTER SHARP S ß (Eszett) deserves special note: it has no uppercase equivalent in ISO 8859-1. Among the lowercase letters in U+00DF–U+00FF, only ß and ÿ lack an uppercase partner inside the block (the capital of ÿ, Ÿ, is U+0178 in Latin Extended-A). German orthography traditionally substituted "SS" when capitalizing. Unicode 5.1 (2008) added U+1E9E LATIN CAPITAL LETTER SHARP S ẞ, and official German orthography has permitted its use since 2017.
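These casing rules are visible in ordinary string operations. The sketch below shows how Python's built-in `str` methods handle ß and its neighbours; other languages and locales may behave differently.

```python
# Default uppercasing follows the traditional "SS" substitution,
# so the string grows by one character.
upper_eszett = "ß".upper()

# Case folding, intended for caseless comparison, maps ß to "ss".
folded = "straße".casefold()

# The capital sharp s (U+1E9E) lowercases back to ß.
capital_lower = "\u1e9e".lower()

# ÿ (U+00FF) uppercases to a character outside Latin-1 Supplement.
y_upper = "ÿ".upper()

print(upper_eszett, folded, capital_lower, hex(ord(y_upper)))
```

Since uppercasing can change the length of a string and move characters out of the Latin-1 range, code that uppercases text before writing it to a Latin-1 field may fail to encode the result.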

Encoding in UTF-8

Unlike Basic Latin, Latin-1 Supplement characters require two bytes in UTF-8. For example:

  • U+00E9 LATIN SMALL LETTER E WITH ACUTE é encodes as 0xC3 0xA9
  • U+00A9 COPYRIGHT SIGN © encodes as 0xC2 0xA9

The pattern for U+0080–U+00BF is 0xC2 0x80 through 0xC2 0xBF. For U+00C0–U+00FF the pattern is 0xC3 0x80 through 0xC3 0xBF. This two-byte encoding means a Latin-1 byte stream and the equivalent UTF-8 stream will differ in length whenever non-ASCII characters appear, which is a common source of encoding bugs when code assumes byte length equals character length.
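The two-byte pattern follows from the UTF-8 bit layout `110xxxxx 10xxxxxx`. The sketch below hand-rolls that layout and compares it with Python's built-in encoder; `encode_two_byte` is a hypothetical helper written for illustration, not a library function.

```python
def encode_two_byte(cp: int) -> bytes:
    """Hand-rolled UTF-8 encoding for code points U+0080 to U+07FF."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("not a two-byte code point")
    lead = 0xC0 | (cp >> 6)     # 110xxxxx: the upper five bits
    trail = 0x80 | (cp & 0x3F)  # 10xxxxxx: the lower six bits
    return bytes([lead, trail])


# The helper agrees with the built-in encoder across the whole block.
matches_builtin = all(
    encode_two_byte(cp) == chr(cp).encode("utf-8")
    for cp in range(0x80, 0x100)
)

# Byte length and character length diverge once non-ASCII appears.
text = "café ©"
char_len = len(text)
latin1_len = len(text.encode("latin-1"))  # one byte per character
utf8_len = len(text.encode("utf-8"))      # é and © take two bytes each

print(encode_two_byte(0xE9).hex(), encode_two_byte(0xA9).hex())
print(matches_builtin, char_len, latin1_len, utf8_len)
```

For code points below U+00C0 the upper bits give a lead byte of 0xC2, and from U+00C0 onward they give 0xC3, which is where the two ranges in the text come from.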
