ISO 8859

A family of 8-bit single-byte encodings for different language groups. ISO 8859-1 (Latin-1) was the basis for the first 256 code points of Unicode.

2021-03-10 · Updated 2024-09-12

What is ISO 8859?

ISO 8859 is a family of fifteen 8-bit, single-byte character encoding standards published jointly by the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Each standard in the family covers a specific language group or region, extending the 128-character ASCII base into the upper 128 positions (0x80–0xFF) with characters needed for that region's scripts and languages.

The ISO 8859 family was the dominant encoding infrastructure for non-ASCII text on the internet and personal computers from the late 1980s through the 1990s. Even today, understanding ISO 8859 is essential for working with legacy data, email systems, and pre-Unicode content.

The Family Members

Standard	Name	Languages Covered
ISO 8859-1	Latin-1	Western European (French, German, Spanish, Portuguese, Italian)
ISO 8859-2	Latin-2	Central European (Czech, Polish, Hungarian, Croatian)
ISO 8859-3	Latin-3	Southern European (Turkish, Maltese, Esperanto)
ISO 8859-4	Latin-4	Northern European (Estonian, Latvian, Lithuanian)
ISO 8859-5	Cyrillic	Russian, Bulgarian, Serbian, Macedonian
ISO 8859-6	Arabic	Arabic
ISO 8859-7	Greek	Modern Greek
ISO 8859-8	Hebrew	Hebrew
ISO 8859-9	Latin-5	Turkish (Latin-1 variant)
ISO 8859-10	Latin-6	Nordic languages
ISO 8859-11	Thai	Thai (essentially TIS 620)
ISO 8859-13	Latin-7	Baltic languages
ISO 8859-14	Latin-8	Celtic languages (Irish, Welsh)
ISO 8859-15	Latin-9	Western European + Euro sign
ISO 8859-16	Latin-10	South-Eastern European

Note: ISO 8859-12 was proposed for Devanagari but never finalized.

How ISO 8859 Works

Every member of the family shares the same structure:

0x00–0x1F: C0 control characters (identical to ASCII)
0x20–0x7E: Printable ASCII characters (identical across all members)
0x7F: DEL control character
0x80–0x9F: C1 control characters (defined but rarely used in practice)
0xA0–0xFF: Region-specific printable characters
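
The layout above can be sketched as a small classifier. This is an illustration only: the function name `iso8859_region` is hypothetical and not part of any library.

```python
def iso8859_region(byte: int) -> str:
    """Return the structural region of an ISO 8859 byte value (0-255)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ISO 8859 is single-byte: value must be in 0..255")
    if byte <= 0x1F:
        return "C0 control"
    if byte <= 0x7E:
        return "printable ASCII"
    if byte == 0x7F:
        return "DEL"
    if byte <= 0x9F:
        return "C1 control"
    return "region-specific"  # 0xA0-0xFF, the only range that varies by part


for b in (0x0A, 0x41, 0x7F, 0x85, 0xE9):
    print(f"0x{b:02X}: {iso8859_region(b)}")
```

Only the last branch depends on which member of the family is in use; everything below 0xA0 is shared.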

The region-specific characters in 0xA0–0xFF are what differ between standards. For example, byte 0xE9 means:

ISO 8859-1: é (Latin small letter e with acute)
ISO 8859-5: щ (Cyrillic small letter shcha)
ISO 8859-7: ι (Greek small letter iota)
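
The same comparison can be reproduced with Python's built-in codecs. This is a minimal sketch that assumes the standard library codec names `iso8859_1`, `iso8859_5`, and `iso8859_7`; the variable names are illustrative.

```python
BYTE = b"\xe9"

# Decode the same single byte under three members of the family.
decoded = {
    codec: BYTE.decode(codec)
    for codec in ("iso8859_1", "iso8859_5", "iso8859_7")
}

for codec, ch in decoded.items():
    print(f"{codec}: {ch!r} (U+{ord(ch):04X})")
```

The byte never changes; only the table used to interpret it does, which is why a wrong charset label produces wrong characters rather than an error.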

ISO 8859-1 and Its Importance

ISO 8859-1 (Latin-1) is the most widely used family member. It covers the characters needed for Western European languages and was adopted as:

The default encoding of HTTP/1.0 (text/html; charset=ISO-8859-1)
The lower 256 code points of Unicode (U+0000–U+00FF map exactly to Latin-1)
The basis for Windows-1252

This Unicode correspondence means that converting a Latin-1 string to Unicode is trivial: each byte value directly becomes the Unicode code point.

# ISO 8859-1 to Unicode: byte values are identical to code points
b'\xe9'.decode('iso-8859-1')    # 'é' — U+00E9
b'\xe9'.decode('latin-1')       # same (latin-1 is an alias)
b'\xe9'.decode('utf-8')         # raises UnicodeDecodeError!

# The difference between Latin-1 and Windows-1252
b'\x80'.decode('iso-8859-1')    # '\x80' — a C1 control character
b'\x80'.decode('windows-1252')  # '€' — Euro sign
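
As a quick check of the byte-to-code-point identity described above, the following sketch decodes every possible byte value as Latin-1 and compares the result with the byte values themselves. The variable names are illustrative.

```python
all_bytes = bytes(range(256))

# Decoding Latin-1 never fails: every byte maps to the code point of the same value.
text = all_bytes.decode("latin-1")
identity_holds = [ord(ch) for ch in text] == list(range(256))

# The mapping is also lossless in the other direction for U+0000-U+00FF.
round_trips = text.encode("latin-1") == all_bytes

print(identity_holds, round_trips)
```

Because decoding as Latin-1 cannot raise an error, a successful decode says nothing about whether Latin-1 was the right encoding for the data.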

ISO 8859-15: Latin-9

ISO 8859-15 is a revision of Latin-1 that replaced 8 rarely used characters with more useful ones, most notably adding the Euro sign (€) at 0xA4. Latin-1 was defined in 1987, before the Euro was introduced in 1999. Latin-9 also added characters for French (Œ, œ, Ÿ) and Finnish (Š, š, Ž, ž).

Despite being technically superior, ISO 8859-15 saw limited adoption — most systems had already standardized on Latin-1 or migrated to UTF-8.
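
The positions where the two standards differ can be listed by decoding each upper-range byte both ways. This is a sketch using Python's `iso8859_1` and `iso8859_15` codecs; the `differences` dictionary is an illustrative name.

```python
differences = {}

# Compare Latin-1 and Latin-9 across the region-specific range.
for value in range(0xA0, 0x100):
    raw = bytes([value])
    latin1 = raw.decode("iso8859_1")
    latin9 = raw.decode("iso8859_15")
    if latin1 != latin9:
        differences[value] = (latin1, latin9)

for value, (old, new) in differences.items():
    print(f"0x{value:02X}: {old!r} -> {new!r}")
```

Every other byte decodes identically, so text that avoids these positions reads the same under either label.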

Quick Facts

Property	Value
Standards body	ISO/IEC JTC 1
Number of parts	15 (no ISO 8859-12)
Bytes per character	1 (single-byte)
Characters per standard	256 code positions (up to 191 printable; some parts leave positions unassigned)
ASCII compatible	Yes (0x00–0x7F identical)
Unicode range of Latin-1	U+0000–U+00FF exactly
Status	Legacy — superseded by Unicode/UTF-8

Common Pitfalls

Confusing Latin-1 with Windows-1252. Windows-1252 puts printable characters in 0x80–0x9F (the C1 control range of Latin-1), including the Euro sign, smart quotes, and em dashes. Web browsers treat an ISO-8859-1 declaration as windows-1252 (the WHATWG Encoding Standard maps the label that way), so the declared and actual encodings of web content often differ.
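
A short sketch of this pitfall, using Python's `cp1252` and `latin-1` codecs on an invented sample byte string:

```python
# Sample bytes as a Windows editor might save them: curly quotes, em dash, Euro sign.
data = b"\x93quoted\x94 \x97 \x80 5"

as_cp1252 = data.decode("cp1252")    # printable punctuation
as_latin1 = data.decode("latin-1")   # invisible C1 control characters instead

print(as_cp1252)
print(repr(as_latin1))

# Windows-1252 is not total: a few bytes in 0x80-0x9F are unassigned.
undefined_in_cp1252 = []
for value in range(0x80, 0xA0):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined_in_cp1252.append(value)

print([hex(v) for v in undefined_in_cp1252])
```

Python's `cp1252` codec raises on the unassigned bytes, whereas `latin-1` accepts anything, so a fallback from one to the other is a judgment call rather than a safe default.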

Assuming all European text is Latin-1. Polish (ISO 8859-2), Turkish (ISO 8859-9), and Greek (ISO 8859-7) require different standards. A Polish document wrongly labeled charset=iso-8859-1 will display ą, ę, and ł as ±, ê, and ³, while letters that share a position in both standards, such as ó, still look correct.
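
The effect of a wrong label can be reproduced by encoding Polish letters as ISO 8859-2 and decoding the bytes as Latin-1. This is a minimal sketch; the `misread` dictionary is an illustrative name.

```python
misread = {}

# Encode with the correct standard, then decode with the wrongly declared one.
for ch in "ąęłó":
    raw = ch.encode("iso8859_2")
    misread[ch] = raw.decode("iso8859_1")
    print(f"{ch} -> byte 0x{raw[0]:02X} -> shown as {misread[ch]}")
```

Letters that occupy the same position in both standards survive, which is why mislabeled text often looks only partly broken.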

Multi-byte East Asian languages. ISO 8859 standards are single-byte encodings and cannot represent Chinese, Japanese, or Korean characters, which require multi-byte encodings like Shift-JIS, GB2312, or Big5.
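
The single-byte limit is easy to demonstrate. The sketch below tries to encode a short Japanese string as Latin-1 and then as Shift JIS, using Python's `latin-1` and `shift_jis` codecs; the sample text and variable names are illustrative.

```python
text = "日本語"

# No ISO 8859 part has positions for these characters.
try:
    text.encode("latin-1")
    fits_in_latin1 = True
except UnicodeEncodeError:
    fits_in_latin1 = False

# A multi-byte encoding spends two bytes on each of these characters.
sjis = text.encode("shift_jis")

print(fits_in_latin1, len(text), len(sjis))
```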
