Windows-1252
A Microsoft extension of ISO 8859-1 that adds smart quotes, dashes, and the Euro sign in the 0x80–0x9F range. It is the most common legacy Latin encoding.
What is Windows-1252?
Windows-1252 (also known as CP1252 or the "ANSI" code page) is an 8-bit character encoding developed by Microsoft as an extension of ISO 8859-1 (Latin-1). It adds 27 printable characters in the range 0x80–0x9F — the region that ISO 8859-1 reserves for C1 control characters — making it more useful in practice for Western European typography.
Windows-1252 became one of the most widely deployed encodings in history because it was the default code page for English and Western European editions of Windows from Windows 3.1 through Windows XP, and it remains the default ANSI code page on later versions. A text file created on a Western European Windows system that is not valid UTF-8 is very likely Windows-1252.
How Windows-1252 Differs from Latin-1
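ISO 8859-1 leaves the range 0x80–0x9F defined as non-printable C1 control characters. Windows-1252 repurposes 27 of these 32 positions for printable characters and leaves 0x81, 0x8D, 0x8F, 0x90, and 0x9D unassigned. The most commonly encountered assignments are: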
| Byte | Windows-1252 | ISO 8859-1 |
|---|---|---|
| 0x80 | € (Euro sign) | C1 control |
| 0x82 | ‚ (single low-9 quotation mark) | C1 control |
| 0x83 | ƒ (Latin small f with hook) | C1 control |
| 0x84 | „ (double low-9 quotation mark) | C1 control |
| 0x85 | … (ellipsis) | C1 control |
| 0x86 | † (dagger) | C1 control |
| 0x87 | ‡ (double dagger) | C1 control |
| 0x8C | Œ (Latin capital OE) | C1 control |
| 0x91 | ‘ (left single quotation mark) | C1 control |
| 0x92 | ’ (right single quotation mark) | C1 control |
| 0x93 | “ (left double quotation mark) | C1 control |
| 0x94 | ” (right double quotation mark) | C1 control |
| 0x96 | – (en dash) | C1 control |
| 0x97 | — (em dash) | C1 control |
| 0x99 | ™ (trademark sign) | C1 control |
| 0x9C | œ (Latin small oe) | C1 control |
The range 0xA0–0xFF is identical between Windows-1252 and ISO 8859-1.
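The differences can be checked directly with Python's built-in codecs. The sketch below assumes CPython's `cp1252` codec, which treats the five unassigned bytes as decoding errors; the variable names are illustrative only.

```python
# Compare how the cp1252 and latin-1 codecs treat each byte in 0x80-0xFF.
defined = {}      # byte -> character assigned by Windows-1252
unassigned = []   # bytes that Python's cp1252 codec refuses to decode

for byte in range(0x80, 0xA0):
    try:
        defined[byte] = bytes([byte]).decode('cp1252')
    except UnicodeDecodeError:
        unassigned.append(byte)

print(len(defined))                   # 27
print([hex(b) for b in unassigned])   # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']

# The upper half of the table is byte-for-byte identical in both encodings.
upper = bytes(range(0xA0, 0x100))
same_upper_half = upper.decode('cp1252') == upper.decode('latin-1')
print(same_upper_half)                # True
```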
The Browser Compatibility Problem
For over a decade, web browsers diverged from the HTTP standard on encoding. HTTP/1.1 specified that text/html; charset=ISO-8859-1 meant pure ISO 8859-1. But because so many real-world pages were served with this declaration while actually containing Windows-1252 bytes (smart quotes, em dashes, Euro signs), browsers silently treated ISO-8859-1 as Windows-1252.
This was eventually formalized: the WHATWG Encoding Standard (used by all modern browsers) defines ISO-8859-1 as an alias for windows-1252. If a page declares charset=ISO-8859-1, browsers now treat it as Windows-1252.
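Python's `latin-1` codec follows strict ISO 8859-1, so code that wants to match browser behavior has to apply the alias itself. The following is a minimal sketch under that assumption: `decode_like_browser` and `WINDOWS_1252_LABELS` are hypothetical names, the label set is only a small subset of the labels listed in the Encoding Standard, and the five bytes that Python's `cp1252` codec rejects are passed through as the matching C1 controls, which is how the WHATWG index treats them.

```python
# A few of the labels the Encoding Standard resolves to windows-1252 (not exhaustive).
WINDOWS_1252_LABELS = {'windows-1252', 'cp1252', 'iso-8859-1', 'latin1', 'us-ascii', 'ascii'}

def decode_like_browser(data: bytes, label: str) -> str:
    """Hypothetical helper: decode bytes as a browser would for these labels."""
    if label.strip().lower() not in WINDOWS_1252_LABELS:
        raise LookupError(f'label not handled by this sketch: {label!r}')
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode('cp1252'))
        except UnicodeDecodeError:
            # Unassigned in Microsoft's table: fall back to the C1 control
            # that has the same number as the byte.
            chars.append(chr(byte))
    return ''.join(chars)

page = b'\x93Hello\x94 \x96 \x80 5'
print(decode_like_browser(page, 'ISO-8859-1'))   # “Hello” – € 5
print(repr(page.decode('latin-1')))              # strict Latin-1 yields C1 controls instead
```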
Code Examples
```python
# Windows-1252 in Python
text = 'Hello \u2013 world'  # en dash
encoded = text.encode('windows-1252')
print(encoded)  # b'Hello \x96 world'

# Smart quotes
smart = '\u201cHello\u201d'  # “Hello”
encoded = smart.encode('windows-1252')
print(encoded)  # b'\x93Hello\x94'

# The same bytes decoded as Latin-1 versus CP1252
b'\x93Hello\x94'.decode('latin-1')  # '\x93Hello\x94': C1 controls, not quotes
b'\x93Hello\x94'.decode('cp1252')   # '“Hello”': correct

# The classic mojibake pattern:
# UTF-8 é (0xC3 0xA9) read as CP1252 becomes 'Ã©'
b'\xc3\xa9'.decode('cp1252')  # 'Ã©'
```

```javascript
// Node.js: explicit Windows-1252 decoding (requires the iconv-lite package)
const iconv = require('iconv-lite');
const buf = Buffer.from([0x93, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x94]);
iconv.decode(buf, 'win1252'); // '“Hello”'
```
Quick Facts
| Property | Value |
|---|---|
| Also Known As | CP1252, ANSI (Windows colloquial) |
| Base | ISO 8859-1 (Latin-1) |
| Extends | Adds 27 printable chars in 0x80–0x9F |
| Platform | Windows (default for English/Western European) |
| IANA name | windows-1252 |
| Browser treatment | The ISO-8859-1 label is decoded as windows-1252 per WHATWG |
| Unicode coverage | U+0000–U+007F and U+00A0–U+00FF, plus 27 characters mapped from 0x80–0x9F |
Common Pitfalls
The "ANSI" misnomer. Windows users and older Microsoft documentation call Windows-1252 "ANSI," but ANSI (the American National Standards Institute) never defined this encoding. The term persists in Windows API functions like CreateFileA (ANSI) vs. CreateFileW (Wide/Unicode), referring to the system code page, which is often but not always CP1252.
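Because the "ANSI" code page depends on the system, it can be worth checking rather than assuming CP1252. The sketch below calls the Win32 `GetACP` function through `ctypes`; the helper name `active_ansi_code_page` is hypothetical, and on non-Windows platforms it simply returns `None`.

```python
import locale
import sys

def active_ansi_code_page():
    """Return the Windows ANSI code page number, or None on other platforms."""
    if sys.platform != 'win32':
        return None
    import ctypes
    # GetACP reports the identifier of the current ANSI code page.
    return ctypes.windll.kernel32.GetACP()

code_page = active_ansi_code_page()
if code_page is None:
    print('Not running on Windows; locale encoding is', locale.getpreferredencoding(False))
elif code_page == 1252:
    print('The "ANSI" code page is Windows-1252')
else:
    print(f'The "ANSI" code page is cp{code_page}, not Windows-1252')
```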
Smart quotes and mojibake. Microsoft Word documents saved as plain text often contain Windows-1252 smart quotes (0x91–0x94) and em/en dashes (0x96–0x97). When these files are treated as ISO 8859-1 or UTF-8, these characters appear as garbled symbols. The ftfy Python library can automatically fix many of these mojibake patterns.
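For the simplest and most common case, UTF-8 text that was mistakenly decoded as Windows-1252, a round trip through the standard codecs is often enough. This is a minimal sketch, not a substitute for ftfy: `repair_utf8_read_as_cp1252` is a hypothetical helper, it undoes only one layer of damage, and it returns the input unchanged whenever the round trip fails.

```python
def repair_utf8_read_as_cp1252(text: str) -> str:
    """Undo one round of 'UTF-8 bytes decoded as Windows-1252' mojibake."""
    try:
        return text.encode('cp1252').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was probably not damaged in this particular way.
        return text

garbled_word = 'caf\u00c3\u00a9'      # displays as 'cafÃ©'
garbled_dash = '\u00e2\u20ac\u201c'   # displays as 'â€“'

print(repair_utf8_read_as_cp1252(garbled_word))    # café
print(repair_utf8_read_as_cp1252(garbled_dash))    # –
print(repair_utf8_read_as_cp1252('plain ASCII'))   # plain ASCII
```

Correctly decoded text such as 'café' passes through untouched, because its CP1252 bytes are not valid UTF-8 and the helper falls back to the original string.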
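The Euro sign hazard. The Euro sign (€) is the single byte 0x80 in Windows-1252 but code point U+20AC in Unicode, which UTF-8 encodes as the three bytes E2 82 AC. Because a lone 0x80 is never valid UTF-8, a CP1252 file containing a Euro sign that is parsed as UTF-8 will raise an error in a strict decoder (UnicodeDecodeError in Python) or show the replacement character U+FFFD in a lenient one.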
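The Euro sign behavior is easy to reproduce with the standard codecs. The sketch below also includes one possible fallback strategy, trying UTF-8 first and Windows-1252 second; `decode_utf8_or_cp1252` is a hypothetical helper, and the approach assumes the input is one of those two encodings.

```python
euro = '\u20ac'
print(euro.encode('cp1252'))   # b'\x80'
print(euro.encode('utf-8'))    # b'\xe2\x82\xac'

cp1252_bytes = 'Cost: \u20ac 5'.encode('cp1252')   # b'Cost: \x80 5'

# Strict UTF-8 decoding rejects the lone 0x80 byte.
try:
    cp1252_bytes.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
print(strict_failed)           # True

# Lenient decoding substitutes U+FFFD and silently loses the Euro sign.
lenient = cp1252_bytes.decode('utf-8', errors='replace')
print(lenient == 'Cost: \ufffd 5')   # True

def decode_utf8_or_cp1252(data: bytes) -> str:
    """Hypothetical fallback: prefer UTF-8, then fall back to Windows-1252."""
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        return data.decode('cp1252', errors='replace')

print(decode_utf8_or_cp1252(cp1252_bytes))   # Cost: € 5
```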