ISO 8859
A family of 8-bit single-byte encodings for different language groups. ISO 8859-1 (Latin-1) was the basis for the first 256 code points of Unicode.
What is ISO 8859?
ISO 8859 is a family of 15 eight-bit, single-byte character encoding standards published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Each standard in the family covers a specific language group or region. All of them keep ASCII in the lower 128 positions and use the upper half of the byte range (0x80–0xFF, with printable characters in 0xA0–0xFF) for the characters needed by that region's scripts and languages.
The ISO 8859 family was the dominant encoding infrastructure for non-ASCII text on the internet and personal computers from the late 1980s through the 1990s. Even today, understanding ISO 8859 is essential for working with legacy data, email systems, and pre-Unicode content.
The Family Members
| Standard | Name | Languages Covered |
|---|---|---|
| ISO 8859-1 | Latin-1 | Western European (French, German, Spanish, Portuguese, Italian) |
| ISO 8859-2 | Latin-2 | Central European (Czech, Polish, Hungarian, Croatian) |
| ISO 8859-3 | Latin-3 | Southern European (Turkish, Maltese, Esperanto) |
| ISO 8859-4 | Latin-4 | Northern European (Estonian, Latvian, Lithuanian) |
| ISO 8859-5 | Cyrillic | Russian, Bulgarian, Serbian, Macedonian |
| ISO 8859-6 | Arabic | Arabic |
| ISO 8859-7 | Greek | Modern Greek |
| ISO 8859-8 | Hebrew | Hebrew |
| ISO 8859-9 | Latin-5 | Turkish (Latin-1 variant) |
| ISO 8859-10 | Latin-6 | Nordic languages |
| ISO 8859-11 | Thai | Thai (essentially TIS 620) |
| ISO 8859-13 | Latin-7 | Baltic languages |
| ISO 8859-14 | Latin-8 | Celtic languages (Irish, Welsh) |
| ISO 8859-15 | Latin-9 | Western European + Euro sign |
| ISO 8859-16 | Latin-10 | South-Eastern European |
Note: ISO 8859-12 was proposed for Devanagari but never finalized.
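As a quick way to confirm which parts exist in practice, the sketch below probes Python's codec registry for each part number. It assumes CPython's standard library codecs, where the parts are registered under names of the form `iso8859_N`; other runtimes may use different labels.

```python
import codecs

# Probe the codec registry for parts 1 through 16.
# Assumption: CPython's stdlib names, e.g. "iso8859_2".
available = []
missing = []
for part in range(1, 17):
    name = f"iso8859_{part}"
    try:
        codecs.lookup(name)
        available.append(part)
    except LookupError:
        missing.append(part)

print("available parts:", available)
print("missing parts:", missing)
```

On a standard CPython installation the only part reported as missing should be 12, matching the table above.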
How ISO 8859 Works
Every member of the family shares the same structure:
- 0x00–0x1F: C0 control characters (identical to ASCII)
- 0x20–0x7E: Printable ASCII characters (identical across all members)
- 0x7F: DEL control character
- 0x80–0x9F: C1 control characters (defined but rarely used in practice)
- 0xA0–0xFF: Region-specific printable characters
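The shared layout above can be expressed as a small classifier. This is only an illustrative sketch; the function name `classify_byte` and the label strings are hypothetical, not part of any standard.

```python
def classify_byte(value: int) -> str:
    """Return the ISO 8859 layout region that a byte value falls into."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in a single byte")
    if value <= 0x1F:
        return "C0 control"
    if value <= 0x7E:
        return "printable ASCII"
    if value == 0x7F:
        return "DEL"
    if value <= 0x9F:
        return "C1 control"
    return "region-specific"


for sample in (0x0A, 0x41, 0x7F, 0x85, 0xE9):
    print(f"0x{sample:02X}: {classify_byte(sample)}")
```

Only bytes in the last category change meaning from one family member to another.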
The region-specific characters in 0xA0–0xFF are what differ between standards. For example, byte 0xE9 means:
- ISO 8859-1: é (Latin small letter e with acute)
- ISO 8859-5: щ (Cyrillic small letter shcha)
- ISO 8859-7: ι (Greek small letter iota)
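The same comparison can be reproduced by decoding one byte with three family members. The codec names below are the ones Python's standard library uses; the variable names are just for illustration.

```python
BYTE = b"\xe9"
CODECS = ["iso8859_1", "iso8859_5", "iso8859_7"]

# Decode the same byte under each encoding and record the result.
decoded = {name: BYTE.decode(name) for name in CODECS}

for name, char in decoded.items():
    print(f"{name}: {char} (U+{ord(char):04X})")
# iso8859_1: é (U+00E9)
# iso8859_5: щ (U+0449)
# iso8859_7: ι (U+03B9)
```

The byte itself carries no information about which table applies, which is why the declared charset matters so much for ISO 8859 text.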
ISO 8859-1 and Its Importance
ISO 8859-1 (Latin-1) is the most widely used family member. It covers the characters needed for Western European languages and was adopted as:
- The default encoding of HTTP/1.0 (text/html; charset=ISO-8859-1)
- The lower 256 code points of Unicode (U+0000–U+00FF map exactly to Latin-1)
- The basis for Windows-1252
This Unicode correspondence means that converting a Latin-1 string to Unicode is trivial: each byte value directly becomes the Unicode code point.
```python
# ISO 8859-1 to Unicode: byte values are identical to code points
print(b'\xe9'.decode('iso-8859-1'))  # 'é' (U+00E9)
print(b'\xe9'.decode('latin-1'))     # same result; latin-1 is an alias

# A lone 0xE9 byte is not valid UTF-8, so this decode fails
try:
    b'\xe9'.decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc)

# The difference between Latin-1 and Windows-1252
print(repr(b'\x80'.decode('iso-8859-1')))  # '\x80', a C1 control character
print(b'\x80'.decode('windows-1252'))      # '€' (Euro sign)
```
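To check the byte-to-code-point identity for the whole range rather than a single byte, a short round-trip test over all 256 values is enough. This is a minimal sketch using only built-in codecs.

```python
# Decode every possible byte value as Latin-1.
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

# Each decoded character's code point equals the original byte value.
code_points = [ord(ch) for ch in text]
identity_holds = code_points == list(range(256))

# Encoding back to Latin-1 recovers the original bytes exactly.
round_trip = text.encode("latin-1")

print(identity_holds, round_trip == all_bytes)
```

Because every byte value decodes to something, Latin-1 decoding never fails, which also means it cannot detect that the input was really in another encoding.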
ISO 8859-15: Latin-9
ISO 8859-15 is a revision of Latin-1 that replaced 8 rarely used characters with more useful ones, most notably adding the Euro sign (€) at 0xA4. Latin-1 was defined in 1987, well before the Euro was introduced in 1999. Latin-9 also added characters for French (Œ, œ, Ÿ) and Finnish (Š, š, Ž, ž).
Despite being technically superior, ISO 8859-15 saw limited adoption — most systems had already standardized on Latin-1 or migrated to UTF-8.
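The positions that differ between Latin-1 and Latin-9 can be listed by decoding each upper-range byte with both codecs. The sketch below uses Python's `iso8859_1` and `iso8859_15` codecs; the dictionary name `changed` is illustrative.

```python
# Map each differing byte value to its (Latin-1, Latin-9) characters.
changed = {}
for value in range(0xA0, 0x100):
    raw = bytes([value])
    latin1 = raw.decode("iso8859_1")
    latin9 = raw.decode("iso8859_15")
    if latin1 != latin9:
        changed[value] = (latin1, latin9)

for value, (old, new) in changed.items():
    print(f"0x{value:02X}: {old} -> {new}")

print(len(changed), "positions differ")
```

Text that avoids these positions is byte-for-byte identical in both encodings, which is one reason a mislabelled file often goes unnoticed.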
Quick Facts
| Property | Value |
|---|---|
| Standards body | ISO/IEC JTC 1 |
| Number of parts | 15 (no ISO 8859-12) |
| Bytes per character | 1 (single-byte) |
| Characters per standard | 256 code positions (up to 191 printable; some parts leave positions unassigned) |
| ASCII compatible | Yes (0x00–0x7F identical) |
| Unicode range of Latin-1 | U+0000–U+00FF exactly |
| Status | Legacy — superseded by Unicode/UTF-8 |
Common Pitfalls
Confusing Latin-1 with Windows-1252. Windows-1252 adds printable characters in 0x80–0x9F (the C1 control range of Latin-1), including the Euro sign, smart quotes, and em-dashes. Many web browsers historically treated ISO-8859-1 declarations as windows-1252, creating a widespread discrepancy between declared and actual encoding.
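The sketch below shows the practical effect on bytes in 0x80–0x9F, along with a hypothetical helper, `decode_like_browser`, that mimics the browser behaviour of treating a Latin-1 label as Windows-1252. The helper name and its label set are assumptions made for illustration, not an API from any library.

```python
# Labels treated as Windows-1252 (illustrative, not exhaustive).
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1", "latin-1"}


def decode_like_browser(raw: bytes, declared: str) -> str:
    """Decode raw bytes, swapping a Latin-1 label for cp1252."""
    label = declared.strip().lower()
    codec = "cp1252" if label in LATIN1_LABELS else label
    return raw.decode(codec)


raw = b"\x93Price: 5\x80\x94"

strict = raw.decode("iso8859_1")                   # C1 controls, invisible
lenient = decode_like_browser(raw, "ISO-8859-1")   # curly quotes and a Euro sign

print(repr(strict))
print(lenient)
```

A few positions in 0x80–0x9F are unassigned in Windows-1252, and Python's `cp1252` codec raises `UnicodeDecodeError` for them, so production code may need an explicit error-handling strategy.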
Assuming all European text is Latin-1. Polish (ISO 8859-2), Turkish (ISO 8859-9), and Greek (ISO 8859-7) require different standards. A Polish document claiming charset=iso-8859-1 will display ą, ę, and ł as ±, ê, and ³, because those letters occupy positions that Latin-1 assigns to other characters.
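A small reproduction of this mislabelling, and of the usual repair, might look like the following. It assumes the bytes were really ISO 8859-2 and were merely decoded with the wrong table; the variable names are illustrative.

```python
original = "ąęł"

# Bytes as a Latin-2 system would have written them.
raw = original.encode("iso8859_2")

# Decoding with the wrong table yields unrelated Latin-1 characters.
garbled = raw.decode("iso8859_1")

# Latin-1 decoding is lossless, so the mistake can be reversed.
repaired = garbled.encode("iso8859_1").decode("iso8859_2")

print(raw, garbled, repaired)
```

The repair only works when the wrongly decoded text has not been further altered, for example by a lossy conversion or by stripping unfamiliar characters.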
Multi-byte East Asian languages. ISO 8859 standards are single-byte encodings and cannot represent Chinese, Japanese, or Korean characters, which require multi-byte encodings like Shift-JIS, GB2312, or Big5.
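The limitation is easy to observe: encoding Japanese text as Latin-1 fails outright, while a multi-byte encoding succeeds. The sample string below is an arbitrary illustration.

```python
text = "日本語"

# A single-byte encoding has no position for these characters.
try:
    text.encode("iso8859_1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Multi-byte encodings spend more than one byte per character.
sjis = text.encode("shift_jis")
utf8 = text.encode("utf-8")

print(latin1_ok, len(sjis), len(utf8))  # False 6 9
```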