What is ISO 8859?

A family of 8-bit single-byte encodings for different language groups. ISO 8859-1 (Latin-1) formed the basis of the first 256 code points of Unicode.

What is Mojibake?

Garbled text produced when bytes are decoded with the wrong encoding. The term is Japanese (文字化け). Example: 'café' saved as UTF-8 and then read as Latin-1 becomes 'cafÃ©'.

What is Character encoding?

A system that maps characters to byte sequences for digital storage and transmission. Every text file has an encoding; the question is whether it has been declared correctly.

Windows-1252

Microsoft's superset of ISO 8859-1, which adds smart quotes, the em dash, and the euro sign in the range 0x80–0x9F. The most common legacy "Latin" encoding.

2021-03-22 · Updated 2024-06-28

What is Windows-1252?

Windows-1252 (also known as CP1252 or the "ANSI" code page) is an 8-bit character encoding developed by Microsoft as an extension of ISO 8859-1 (Latin-1). It adds 27 printable characters in the range 0x80–0x9F — the region that ISO 8859-1 reserves for C1 control characters — making it more useful in practice for Western European typography.

Windows-1252 became one of the most widely deployed encodings in history because it was the default code page for English and Western European editions of Windows from Windows 3.1 through Windows XP. Any text file created on a Western European Windows system and not explicitly encoded as UTF-8 is almost certainly Windows-1252.

How Windows-1252 Differs from Latin-1

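ISO 8859-1 leaves the range 0x80–0x9F to non-printable C1 control characters. Windows-1252 repurposes 27 of these 32 byte values for printable typographic characters (0x81, 0x8D, 0x8F, 0x90, and 0x9D remain unassigned). The most commonly encountered ones are: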

Byte	Windows-1252	ISO 8859-1
0x80	€ (Euro sign)	C1 control
0x82	‚ (single low-9 quotation mark)	C1 control
0x83	ƒ (Latin small f with hook)	C1 control
0x84	„ (double low-9 quotation mark)	C1 control
0x85	… (ellipsis)	C1 control
0x86	† (dagger)	C1 control
0x87	‡ (double dagger)	C1 control
0x8C	Œ (Latin capital OE)	C1 control
0x91	‘ (left single quotation mark)	C1 control
0x92	’ (right single quotation mark)	C1 control
0x93	“ (left double quotation mark)	C1 control
0x94	” (right double quotation mark)	C1 control
0x96	– (en dash)	C1 control
0x97	— (em dash)	C1 control
0x99	™ (trademark sign)	C1 control
0x9C	œ (Latin small oe)	C1 control

The range 0xA0–0xFF is identical between Windows-1252 and ISO 8859-1.
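
The byte-level differences described above can be checked directly. The following is a minimal sketch that relies on Python's built-in cp1252 and latin-1 codecs; the variable names are illustrative only. It assumes Python's behaviour of raising an error for the byte values that Windows-1252 leaves unassigned.

```python
# Probe every byte in 0x80-0x9F with Python's cp1252 codec.
mapping = {}
for byte in range(0x80, 0xA0):
    try:
        mapping[byte] = bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves this byte unassigned.
        mapping[byte] = None

defined = {b: ch for b, ch in mapping.items() if ch is not None}
undefined = [b for b, ch in mapping.items() if ch is None]

print(len(defined))                  # number of printable additions
print([hex(b) for b in undefined])   # bytes with no assignment

# The upper half of the table decodes identically in both encodings.
upper = bytes(range(0xA0, 0x100))
print(upper.decode("cp1252") == upper.decode("latin-1"))
```

Running this should report 27 assigned bytes, five unassigned ones, and an identical 0xA0–0xFF range.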

The Browser Compatibility Problem

For over a decade, web browsers diverged from the HTTP standard on encoding. HTTP/1.1 specified that text/html; charset=ISO-8859-1 meant pure ISO 8859-1. But because so many real-world pages were served with this declaration while actually containing Windows-1252 bytes (smart quotes, em dashes, Euro signs), browsers silently treated ISO-8859-1 as Windows-1252.

This was eventually formalized: the WHATWG Encoding Standard (used by all modern browsers) defines ISO-8859-1 as an alias for windows-1252. If a page declares charset=ISO-8859-1, browsers now treat it as Windows-1252.
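
A rough Python sketch of this browser behaviour follows. The function name `decode_like_browser` and the label set are assumptions made for illustration: the set is only a partial list of the labels the Encoding Standard maps to windows-1252, and the code is not a full implementation of the standard. One further assumption worth flagging: Python's cp1252 codec rejects the five unassigned bytes, whereas browsers map them to the corresponding C1 controls, so the sketch falls back to that mapping byte by byte.

```python
# Partial, illustrative list of labels that browsers resolve to windows-1252.
WINDOWS_1252_LABELS = {
    "windows-1252", "cp1252", "x-cp1252",
    "iso-8859-1", "iso8859-1", "iso_8859-1", "latin1", "l1",
    "ascii", "us-ascii",
}


def _cp1252_char(byte: int) -> str:
    """Decode one byte as Windows-1252, mapping unassigned bytes to C1 controls."""
    try:
        return bytes([byte]).decode("cp1252")
    except UnicodeDecodeError:
        return chr(byte)


def decode_like_browser(data: bytes, declared_charset: str) -> str:
    """Decode data roughly the way a browser would for the declared charset."""
    label = declared_charset.strip().lower()
    if label in WINDOWS_1252_LABELS:
        return "".join(_cp1252_char(b) for b in data)
    # Any other label is handed to Python's codec lookup unchanged.
    return data.decode(label)


# A page declared as ISO-8859-1 but containing Windows-1252 smart quotes.
print(decode_like_browser(b"\x93Hello\x94", "ISO-8859-1"))
```

Decoding byte by byte is slow and is used here only to keep the fallback logic visible.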

Code Examples

# Windows-1252 in Python
text = 'Hello \u2013 world'  # En dash
encoded = text.encode('windows-1252')
print(encoded)  # b'Hello \x96 world'

# Smart quotes
smart = '\u201cHello\u201d'  # “Hello”
encoded = smart.encode('windows-1252')
print(encoded)  # b'\x93Hello\x94'

# The same bytes decoded as Latin-1 and as CP1252
print(repr(b'\x93Hello\x94'.decode('latin-1')))  # '\x93Hello\x94' (C1 controls, not quotes)
print(b'\x93Hello\x94'.decode('cp1252'))         # “Hello” (correct)

# Reproducing the classic mojibake pattern:
# UTF-8 é (0xC3 0xA9) read as CP1252 gives 'Ã©'
print(b'\xc3\xa9'.decode('cp1252'))  # Ã©

// Node.js: explicit Windows-1252 decoding
const iconv = require('iconv-lite');
const buf = Buffer.from([0x93, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x94]);
console.log(iconv.decode(buf, 'win1252'));  // “Hello”

Quick Facts

Property	Value
Also Known As	CP1252, ANSI (Windows colloquial)
Base	ISO 8859-1 (Latin-1)
Extends	Adds 27 printable chars in 0x80–0x9F
Platform	Windows (default for English/Western European)
IANA name	windows-1252
Browser treatment	ISO-8859-1 is treated as a label for windows-1252 per WHATWG
Unicode coverage	U+0000–U+007F and U+00A0–U+00FF, plus 27 additional characters in 0x80–0x9F

Common Pitfalls

The "ANSI" misnomer. Windows users and older Microsoft documentation call Windows-1252 "ANSI," but ANSI (the American National Standards Institute) never defined this encoding. The term persists in Windows API functions like CreateFileA (ANSI) vs. CreateFileW (Wide/Unicode), referring to the system code page, which is often but not always CP1252.
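
Because the system code page varies between machines, one defensive habit is to name the encoding explicitly whenever a file is opened. The sketch below shows this in Python; `roundtrip_cp1252` is a hypothetical helper written for this illustration, and the value printed by `locale.getpreferredencoding` depends on the platform and on Python's configuration.

```python
import locale
import os
import tempfile

# The locale-derived encoding that open() may fall back to when
# no encoding= argument is given. This differs between systems.
print(locale.getpreferredencoding(False))


def roundtrip_cp1252(text: str) -> str:
    """Write text as Windows-1252 and read it back, naming the encoding both times."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding="cp1252") as f:
            f.write(text)
        with open(path, "r", encoding="cp1252") as f:
            return f.read()
    finally:
        os.remove(path)


print(roundtrip_cp1252("\u201cquoted\u201d \u20ac"))
```

With the encoding stated on both sides, the result no longer depends on which code page the machine happens to use.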

Smart quotes and mojibake. Microsoft Word documents saved as plain text often contain Windows-1252 smart quotes (0x91–0x94) and en/em dashes (0x96–0x97). Decoded as ISO 8859-1, these bytes become invisible C1 control characters; decoded as UTF-8, they are invalid sequences that usually show up as the replacement character (U+FFFD). The ftfy Python library can automatically fix many of these mojibake patterns.
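
For the opposite and very common case, UTF-8 text that was mistakenly decoded as Windows-1252, the damage can often be undone by reversing the two steps. This is a minimal standard-library sketch, not ftfy's algorithm; `repair_utf8_as_cp1252` is a hypothetical helper name, and it handles only this single pattern.

```python
def repair_utf8_as_cp1252(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as Windows-1252' mojibake.

    Returns the input unchanged if it does not fit that pattern.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside Windows-1252, or the
        # resulting bytes are not valid UTF-8: leave the text alone.
        return text


print(repair_utf8_as_cp1252("cafÃ©"))    # café
print(repair_utf8_as_cp1252("itâ€™s"))   # it’s
print(repair_utf8_as_cp1252("café"))     # already correct, returned as-is
```

The sketch is deliberately conservative. Pure ASCII text passes through unchanged, but a string that happens to be valid in both directions would be altered, which is one reason dedicated libraries apply additional heuristics.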

The Euro sign hazard. The Euro sign (€) is at 0x80 in Windows-1252 but at U+20AC in Unicode (encoded as the 3 bytes E2 82 AC in UTF-8). A CP1252 file containing a Euro sign is not valid UTF-8, because a lone 0x80 is a continuation byte with no lead byte. A strict decoder fails (Python raises UnicodeDecodeError), while a lenient one substitutes the replacement character (U+FFFD).
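
The following sketch illustrates the hazard and one common workaround: try strict UTF-8 first and fall back to Windows-1252 only if that fails. The helper name `decode_utf8_or_cp1252` is hypothetical, and the fallback is a heuristic that assumes the input is one of these two encodings.

```python
def decode_utf8_or_cp1252(data: bytes) -> str:
    """Decode as UTF-8 if valid, otherwise assume Windows-1252."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" covers the five bytes cp1252 leaves unassigned.
        return data.decode("cp1252", errors="replace")


euro_cp1252 = "€".encode("cp1252")   # single byte 0x80
euro_utf8 = "€".encode("utf-8")      # three bytes E2 82 AC

print(euro_cp1252, euro_utf8)

# Lenient UTF-8 decoding of the CP1252 byte yields U+FFFD, not a Euro sign.
print(repr(euro_cp1252.decode("utf-8", errors="replace")))

# The fallback recovers the Euro sign from either representation.
print(decode_utf8_or_cp1252(b"Price: \x80 5"))
print(decode_utf8_or_cp1252("Price: € 5".encode("utf-8")))
```

Trying UTF-8 first works reasonably well because Windows-1252 text containing non-ASCII characters is rarely valid UTF-8 by accident.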

Related Terms

ISO 8859 · Mojibake · Character encoding

More in Encoding

ASCII

American Standard Code for Information Interchange. A 7-bit encoding covering 128 characters (0–127): …

ASCII Art

Visual art created from text characters, originally limited to the 95 printable …

Base64

Binary-to-text encoding that represents binary data using 64 ASCII characters (A–Z, a–z, …

Byte order mark

U+FEFF placed at the start of a text stream to indicate byte order and encoding. …

Big5

Used primarily in Taiwan and Hong Kong and encoding about 13,000 CJK characters, a Traditional …

EBCDIC

Extended Binary Coded Decimal Interchange Code. With non-contiguous letter ranges, an IBM …

EUC-KR

A Korean character encoding based on KS X 1001, with Hangul syllables and Hanja as double-byte …

GB2312 / GB18030

A family of Simplified Chinese character encodings: GB2312 (6,763 characters) to GBK, then to mandatory Unicode …

IANA character set

Maintained by IANA and used in HTTP Content-Type headers and MIME, the official character encoding …

ISO 8859

A family of 8-bit single-byte encodings for different language groups. ISO 8859-1 (Latin-1), Unicode's …
