ISO 8859

A family of 8-bit single-byte encodings for different language groups. ISO 8859-1 (Latin-1) served as the basis for the first 256 code points of Unicode.

2021-03-10 · Updated 2024-09-12

What is ISO 8859?

ISO 8859 is a family of 15 8-bit single-byte character encoding standards published by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Each standard in the family covers a specific language group or region, extending the 128-character ASCII base into the upper 128 positions (0x80–0xFF) with characters needed for that region's scripts and languages.

The ISO 8859 family was the dominant encoding infrastructure for non-ASCII text on the internet and personal computers from the late 1980s through the 1990s. Even today, understanding ISO 8859 is essential for working with legacy data, email systems, and pre-Unicode content.

The Family Members

Standard	Name	Languages Covered
ISO 8859-1	Latin-1	Western European (French, German, Spanish, Portuguese, Italian)
ISO 8859-2	Latin-2	Central European (Czech, Polish, Hungarian, Croatian)
ISO 8859-3	Latin-3	Southern European (Turkish, Maltese, Esperanto)
ISO 8859-4	Latin-4	Northern European (Estonian, Latvian, Lithuanian)
ISO 8859-5	Cyrillic	Russian, Bulgarian, Serbian, Macedonian
ISO 8859-6	Arabic	Arabic
ISO 8859-7	Greek	Modern Greek
ISO 8859-8	Hebrew	Hebrew
ISO 8859-9	Latin-5	Turkish (Latin-1 variant)
ISO 8859-10	Latin-6	Nordic languages
ISO 8859-11	Thai	Thai (essentially TIS 620)
ISO 8859-13	Latin-7	Baltic languages
ISO 8859-14	Latin-8	Celtic languages (Irish, Welsh)
ISO 8859-15	Latin-9	Western European + Euro sign
ISO 8859-16	Latin-10	South-Eastern European

Note: ISO 8859-12 was proposed for Devanagari but never finalized.
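As a rough cross-check of the table, the sketch below asks Python's standard codec registry which part numbers it can resolve. The `iso8859-N` spelling is one of the aliases CPython accepts, and `has_codec` is a hypothetical helper written for this example. The result reflects what ships with CPython, not the ISO standard itself.

```python
import codecs


def has_codec(name: str) -> bool:
    """Return True if Python's codec registry can resolve this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Probe parts 1 through 16; part 12 is expected to be the only gap.
available_parts = [n for n in range(1, 17) if has_codec(f"iso8859-{n}")]
missing_parts = [n for n in range(1, 17) if n not in available_parts]

print("available:", available_parts)
print("missing:", missing_parts)
```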

How ISO 8859 Works

Every member of the family shares the same structure:

0x00–0x1F: C0 control characters (identical to ASCII)
0x20–0x7E: Printable ASCII characters (identical across all members)
0x7F: DEL control character
0x80–0x9F: C1 control characters (defined but rarely used in practice)
0xA0–0xFF: Region-specific printable characters
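The layout above can be sketched as a small lookup. `classify_byte` is a hypothetical helper, and the labels it returns are just shorthand for the five ranges listed.

```python
def classify_byte(value: int) -> str:
    """Name the ISO 8859 range that a single byte value falls into."""
    if not 0 <= value <= 0xFF:
        raise ValueError("a byte must be in the range 0x00-0xFF")
    if value <= 0x1F:
        return "C0 control"
    if value <= 0x7E:
        return "printable ASCII"
    if value == 0x7F:
        return "DEL"
    if value <= 0x9F:
        return "C1 control"
    return "region-specific"


for sample in (0x0A, 0x41, 0x7F, 0x85, 0xE9):
    print(f"0x{sample:02X}: {classify_byte(sample)}")
```

Only the last range depends on which part of the family is in use; the first four are the same for every member.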

The region-specific characters in 0xA0–0xFF are what differ between standards. For example, byte 0xE9 means:

ISO 8859-1: é (Latin small letter e with acute)
ISO 8859-5: щ (Cyrillic small letter shcha)
ISO 8859-7: ι (Greek small letter iota)
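To see the same byte change meaning, the sketch below decodes 0xE9 with several family members using Python's built-in codecs. The choice of encodings is illustrative.

```python
raw = b"\xe9"
encodings = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-7"]

# Decode the same single byte under each member of the family.
decoded = {name: raw.decode(name) for name in encodings}

for name, char in decoded.items():
    print(f"{name}: {char} (U+{ord(char):04X})")
```

The byte never changes; only the table used to interpret it does, which is why a wrong charset declaration silently produces wrong text instead of an error.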

ISO 8859-1 and Its Importance

ISO 8859-1 (Latin-1) is the most widely used family member. It covers the characters needed for Western European languages and was adopted as:

The default encoding of HTTP/1.0 (text/html; charset=ISO-8859-1)
The lower 256 code points of Unicode (U+0000–U+00FF map exactly to Latin-1)
The basis for Windows-1252

This Unicode correspondence means that converting a Latin-1 string to Unicode is trivial: each byte value directly becomes the Unicode code point.

# ISO 8859-1 to Unicode: byte values are identical to code points
b'\xe9'.decode('iso-8859-1')    # 'é' — U+00E9
b'\xe9'.decode('latin-1')       # same (latin-1 is an alias)
b'\xe9'.decode('utf-8')         # raises UnicodeDecodeError!

# The difference between Latin-1 and Windows-1252
b'\x80'.decode('iso-8859-1')    # '\x80' — a C1 control character
b'\x80'.decode('windows-1252')  # '€' — Euro sign
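The one-to-one mapping can also be checked over the whole byte range. This is a minimal sketch using Python's `latin-1` codec; the variable names are illustrative.

```python
all_bytes = bytes(range(256))

# Decoding Latin-1 never fails: every byte value has a code point.
text = all_bytes.decode("latin-1")
code_points = [ord(char) for char in text]

# Encoding back restores the original bytes exactly.
round_trip = text.encode("latin-1")

print(code_points == list(range(256)))
print(round_trip == all_bytes)
```

Because a Latin-1 decode can never raise, a successful decode says nothing about whether the data was really Latin-1.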

ISO 8859-15: Latin-9

ISO 8859-15 is a revision of Latin-1 that replaced 8 rarely used characters with more useful ones, most notably adding the Euro sign (€) at 0xA4. Latin-1 was defined in 1987, before the Euro was introduced in 1999. Latin-9 also added characters for French (Œ, œ, Ÿ) and for Finnish and Estonian (Š, š, Ž, ž).
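The replaced positions can be listed by comparing the two code pages byte by byte. This sketch relies on Python's `iso-8859-1` and `iso-8859-15` codecs; the `differences` dictionary is an illustrative name.

```python
differences = {}

# Only the upper range can differ; 0x00-0x9F is shared.
for value in range(0xA0, 0x100):
    byte = bytes([value])
    latin1 = byte.decode("iso-8859-1")
    latin9 = byte.decode("iso-8859-15")
    if latin1 != latin9:
        differences[value] = (latin1, latin9)

for value, (old, new) in differences.items():
    print(f"0x{value:02X}: {old} -> {new}")
```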

Despite being technically superior, ISO 8859-15 saw limited adoption — most systems had already standardized on Latin-1 or migrated to UTF-8.

Quick Facts

Property	Value
Standards body	ISO/IEC JTC 1
Number of parts	15 (no ISO 8859-12)
Bytes per character	1 (single-byte)
Characters per standard	256 code positions (at most 191 printable)
ASCII compatible	Yes (0x00–0x7F identical)
Unicode range of Latin-1	U+0000–U+00FF exactly
Status	Legacy — superseded by Unicode/UTF-8

Common Pitfalls

Confusing Latin-1 with Windows-1252. Windows-1252 adds printable characters in 0x80–0x9F (the C1 control range of Latin-1), including the Euro sign, smart quotes, and em-dashes. Many web browsers historically treated ISO-8859-1 declarations as windows-1252, creating a widespread discrepancy between declared and actual encoding.
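One possible way to handle data labelled ISO-8859-1 is to try Windows-1252 first and fall back to Latin-1. The sketch below assumes Python's `windows-1252` codec, which rejects the five byte values that Windows-1252 leaves undefined. `decode_legacy_western` is a hypothetical helper, not a standard API, and the fallback policy is a design choice rather than an established rule.

```python
def decode_legacy_western(data: bytes) -> str:
    """Decode as Windows-1252, falling back to Latin-1 on undefined bytes."""
    try:
        return data.decode("windows-1252")
    except UnicodeDecodeError:
        # Bytes such as 0x81 have no Windows-1252 mapping in Python's codec.
        return data.decode("iso-8859-1")


# Curly quotes, an en dash, and a Euro sign as Windows-1252 bytes.
sample = b"\x93Hello\x94 \x96 5\x80"

as_latin1 = sample.decode("iso-8859-1")      # C1 controls, invisible or garbled
as_cp1252 = decode_legacy_western(sample)    # typographic characters

print(repr(as_latin1))
print(as_cp1252)
```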

Assuming all European text is Latin-1. Polish (ISO 8859-2), Turkish (ISO 8859-9), and Greek (ISO 8859-7) require different standards. A Polish document claiming charset=iso-8859-1 will display ą, ę, and ł as the wrong characters (±, ê, and ³), although ó happens to survive because it sits at 0xF3 in both standards.
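The effect of a wrong declaration can be reproduced by encoding Polish letters as ISO 8859-2 and decoding them as ISO 8859-1. This is a minimal sketch with Python's built-in codecs; the sample string is illustrative.

```python
polish = "ąęłó"

# What the author saved: ISO 8859-2 bytes.
stored = polish.encode("iso-8859-2")

# What a reader sees when the file is declared as ISO 8859-1.
misread = stored.decode("iso-8859-1")

print(stored.hex(" "))
print(misread)
```

Letters that share a position in both code pages come through unchanged, so the damage is partial and easy to miss in a quick review.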

Expecting ISO 8859 to cover East Asian languages. ISO 8859 standards are single-byte encodings and cannot represent Chinese, Japanese, or Korean characters, which require multi-byte encodings such as Shift JIS, GB2312, Big5, or EUC-KR.
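A quick way to see the limit is to try encoding Japanese text both ways. `can_encode` is a hypothetical helper, and `shift_jis` is the name of Python's built-in Shift JIS codec.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "日本語"

fits_latin1 = can_encode(sample, "iso-8859-1")
fits_shift_jis = can_encode(sample, "shift_jis")
shift_jis_length = len(sample.encode("shift_jis"))  # two bytes per kanji

print(fits_latin1, fits_shift_jis, shift_jis_length)
```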
