EBCDIC
Extended Binary Coded Decimal Interchange Code. An IBM mainframe encoding whose character ranges are not contiguous; it is still used on financial and enterprise mainframes.
What is EBCDIC?
EBCDIC (Extended Binary Coded Decimal Interchange Code) is an 8-bit character encoding developed by IBM for its mainframe and midrange computer systems, first introduced with the IBM System/360 in 1964. Unlike ASCII, which assigns codes in an intuitive order (digits 0–9 in sequence, letters in alphabetical order), EBCDIC uses a layout derived from punched card encoding conventions that predate modern computing.
EBCDIC is not used on personal computers, web servers, or Unix/Linux systems. However, it remains the native encoding of IBM mainframes (z/OS, z/VM) and IBM midrange systems (AS/400, later renamed iSeries and IBM i), which still process a significant portion of the world's banking transactions, airline reservations, insurance claims, and government records.
Why EBCDIC Differs So Radically from ASCII
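EBCDIC's layout makes sense only in the context of punched cards. IBM's punched cards had 12 rows, and each character was represented by a combination of a zone punch (row 12, 11, or 0) and a digit punch (rows 1–9). EBCDIC carries that structure into the byte: the high nibble reflects the zone and the low nibble the digit. A–I use zone 12 with digits 1–9, J–R use zone 11 with digits 1–9, and S–Z use zone 0 with digits 2–9, which is why S starts at 0xE2 rather than 0xE1. The numeric digits 0–9 were assigned codes 0xF0–0xF9 (not 0x30–0x39 as in ASCII). The letters were split into three non-contiguous ranges: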
| Range | Characters |
|---|---|
| 0x81–0x89 | a–i |
| 0x91–0x99 | j–r |
| 0xA2–0xA9 | s–z |
| 0xC1–0xC9 | A–I |
| 0xD1–0xD9 | J–R |
| 0xE2–0xE9 | S–Z |
The gaps between letter ranges are significant: 0x8A–0x8F, 0x90, 0x9A–0xA1, etc., are not letters. Depending on the code page they hold punctuation, accented letters, or other special characters. This means that a simple "increment by 1" loop from 'A' to 'Z' does not work in EBCDIC: it walks through non-letter bytes because the letter sequence is not contiguous.
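As a minimal sketch of the consequence, the snippet below walks every byte from 'A' (0xC1) to 'Z' (0xE9) and filters it against the three uppercase ranges from the table. The names `UPPER_RANGES` and `is_ebcdic_upper` are illustrative, and code page 37 is assumed for decoding.

```python
# Uppercase ranges taken from the table above (code page 37 assumed).
UPPER_RANGES = [(0xC1, 0xC9), (0xD1, 0xD9), (0xE2, 0xE9)]


def is_ebcdic_upper(byte: int) -> bool:
    """Return True if the byte falls in one of the three uppercase ranges."""
    return any(lo <= byte <= hi for lo, hi in UPPER_RANGES)


# Naive approach: every byte value from 'A' through 'Z'.
naive = bytes(range(0xC1, 0xE9 + 1))

# Range-aware approach: keep only bytes that are actually letters.
letters = bytes(b for b in naive if is_ebcdic_upper(b))

print(len(naive), len(letters))   # 41 26
print(letters.decode("cp037"))    # ABCDEFGHIJKLMNOPQRSTUVWXYZ
```

The naive span covers 41 byte values, 15 of which are not letters.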
EBCDIC Variants
There is no single EBCDIC. IBM defines over 50 EBCDIC code pages for different national characters and regions:
| Code Page | Region/Language |
|---|---|
| EBCDIC 037 | United States, Canada, Netherlands |
| EBCDIC 500 | International (Latin-1) |
| EBCDIC 870 | Latin-2 (Central European) |
| EBCDIC 875 | Greek |
| EBCDIC 930 | Japanese (Katakana-Kanji) |
| EBCDIC 935 | Simplified Chinese |
The existence of dozens of incompatible EBCDIC variants makes EBCDIC-to-EBCDIC conversion as problematic as EBCDIC-to-ASCII conversion.
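To make the incompatibility concrete, the following sketch compares two code pages that Python ships codecs for, `cp037` and `cp500`, and lists the printable ASCII characters whose byte values differ between them. The helper name `differing_chars` is hypothetical; only the standard `str.encode` and `bytes.decode` calls are relied on.

```python
import string


def differing_chars(codec_a: str, codec_b: str, chars: str) -> list[str]:
    """Return the characters that encode to different bytes in the two codecs."""
    return [c for c in chars if c.encode(codec_a) != c.encode(codec_b)]


printable = string.ascii_letters + string.digits + string.punctuation
diff = differing_chars("cp037", "cp500", printable)
print(diff)  # letters and digits agree; some punctuation (e.g. brackets) does not

# Decoding with the wrong EBCDIC code page silently changes the text.
garbled = "a[0]".encode("cp500").decode("cp037")
print(garbled == "a[0]")  # False
```

Letters and digits survive a mix-up between these two pages, which is why this kind of error often goes unnoticed until punctuation-heavy data such as source code or JSON is involved.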
Code Examples
# Python: EBCDIC conversion
text = 'Hello'
# EBCDIC code page 37 (US)
ebcdic = text.encode('cp037')
print(ebcdic) # b'\xc8\x85\x93\x93\x96'
print([hex(b) for b in ebcdic])
# ['0xc8', '0x85', '0x93', '0x93', '0x96']
# H=0xC8, e=0x85, l=0x93, l=0x93, o=0x96
# Decode back
ebcdic_bytes = b'\xc8\x85\x93\x93\x96'
ebcdic_bytes.decode('cp037') # 'Hello'
# ASCII byte for 'A' = 0x41; EBCDIC byte for 'A' = 0xC1
ord('A') # 65 (0x41) in ASCII/Unicode
'A'.encode('cp037')[0] # 193 (0xC1) in EBCDIC-037
# Digit difference
ord('0') # 48 (0x30) in ASCII
'0'.encode('cp037')[0] # 240 (0xF0) in EBCDIC-037
Practical EBCDIC Challenges
FTP binary vs. ASCII mode. When transferring files from a mainframe via FTP in ASCII mode, the FTP server performs EBCDIC-to-ASCII translation. In binary mode, bytes are transferred unchanged. Use binary mode for non-text files (images, compiled programs), because translation corrupts them; use ASCII mode for plain text so that it arrives readable. Text transferred in binary mode arrives still in EBCDIC and must be decoded by the receiver with the correct code page.
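When a text file has been transferred in binary mode, the receiver has to do the translation itself. The sketch below assumes a fixed-length-record dataset with no record terminators, a hypothetical record length of 8 bytes, and code page 37; the function name `records_to_lines` is illustrative, and a real transfer would need the dataset's actual record length and code page.

```python
def records_to_lines(data: bytes, lrecl: int, codec: str = "cp037") -> list[str]:
    """Split fixed-length EBCDIC records and decode each one to a string.

    lrecl is the record length in bytes (assumed known from the dataset
    definition). Trailing space padding is removed after decoding.
    """
    if len(data) % lrecl != 0:
        raise ValueError("data length is not a multiple of the record length")
    return [
        data[i:i + lrecl].decode(codec).rstrip(" ")
        for i in range(0, len(data), lrecl)
    ]


# Two 8-byte records, built here only to simulate a binary-mode download.
sample = "HELLO".ljust(8).encode("cp037") + "WORLD 42".encode("cp037")

print(records_to_lines(sample, 8))  # ['HELLO', 'WORLD 42']
```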
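Newline convention. EBCDIC has its own New Line control character, NL (0x15), and places LF at 0x25 rather than ASCII's 0x0A. Text streams on EBCDIC systems commonly use NL as the line terminator, which differs from the ASCII conventions of LF (0x0A), CR (0x0D), or CR+LF (0x0D 0x0A). Traditional mainframe datasets are record-oriented, so records often carry no terminator at all and are delimited by record length instead. File transfer and parsing tools must account for both cases.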
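Python's `cp037` codec can be used to observe how the EBCDIC line-control bytes map to Unicode. This is a small illustration of that one codec's behavior, not a statement about every conversion tool: it decodes NL (0x15) to U+0085 (NEL) and 0x25 to U+000A (LF).

```python
nl = b"\x15".decode("cp037")   # EBCDIC NL
lf = b"\x25".decode("cp037")   # EBCDIC LF
print(repr(nl), repr(lf))      # '\x85' '\n'

# 'A', NL, 'B' in code page 37.
text = b"\xc1\x15\xc2".decode("cp037")

# Splitting on "\n" misses the NL-terminated line...
print(text.split("\n"))        # ['A\x85B']

# ...while str.splitlines() recognizes U+0085 as a line boundary.
print(text.splitlines())       # ['A', 'B']
```

Code that normalizes line endings after decoding may therefore want to replace `"\x85"` with `"\n"` explicitly before handing the text to tools that only understand LF.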
Sort order. In EBCDIC, lowercase letters have lower byte values than uppercase (a=0x81 < A=0xC1), while in ASCII uppercase is lower (A=0x41 < a=0x61). Digits also move: they sort after letters in EBCDIC (0xF0–0xF9) but before them in ASCII (0x30–0x39). Data sorted by byte value on an EBCDIC system will therefore not be in byte order on an ASCII system unless it is re-sorted after conversion.
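The difference in collation can be reproduced by sorting the same strings with their encoded bytes as the sort key. The word list is made up for illustration, and code page 37 is assumed for the EBCDIC side.

```python
words = ["apple", "Banana", "42", "cherry"]

# Byte-wise order as an ASCII system would see it.
ascii_order = sorted(words, key=lambda w: w.encode("ascii"))

# Byte-wise order as an EBCDIC (code page 37) system would see it.
ebcdic_order = sorted(words, key=lambda w: w.encode("cp037"))

print(ascii_order)   # ['42', 'Banana', 'apple', 'cherry']
print(ebcdic_order)  # ['apple', 'cherry', 'Banana', '42']
```

Using the encoded bytes as a sort key is also a simple way to reproduce mainframe ordering on another platform when downstream logic, such as a merge step, depends on it.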
Quick Facts
| Property | Value |
|---|---|
| Full Name | Extended Binary Coded Decimal Interchange Code |
| Developed by | IBM |
| Introduced | 1964 (IBM System/360) |
| Bits per character | 8 |
| Letter arrangement | Non-contiguous (three ranges each) |
| Digit codes | 0xF0–0xF9 |
| Variants | 50+ code pages |
| Used on | IBM mainframes (z/OS), IBM i (AS/400) |
| Python codec | cp037, cp500, cp875, etc. |
Common Pitfalls
Assuming ASCII-like letter ordering. Any algorithm that iterates letters by incrementing byte values, checks alphabetical order by comparing byte values, or uses character ranges like c >= 'A' && c <= 'Z' will fail silently on EBCDIC without adaptation.
Space and NUL byte values. The NUL character is 0x00 in both EBCDIC and ASCII. A space, however, is 0x40 in EBCDIC vs. 0x20 in ASCII, and 0x40 is '@' in ASCII. Programs that check for or strip spaces using the ASCII byte value will miss EBCDIC padding entirely.
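A short sketch of the space pitfall, assuming a 4-byte field padded with EBCDIC spaces in code page 37 (the field contents are invented for the example):

```python
# "AB" padded to 4 characters, then encoded as EBCDIC.
field = "AB".ljust(4).encode("cp037")
print(field)                                  # b'\xc1\xc2@@'

# Stripping the ASCII space byte (0x20) removes nothing.
print(field.rstrip(b" ") == field)            # True

# Either strip the EBCDIC space byte (0x40) before decoding...
print(field.rstrip(b"\x40").decode("cp037"))  # AB

# ...or decode first and strip ordinary spaces afterwards.
print(field.decode("cp037").rstrip(" "))      # AB
```

Decoding first is usually the less error-prone of the two options, since all later string handling then works on ordinary Unicode text.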
Modern context. Despite EBCDIC's age, mainframes processing financial transactions still use it. Developers building interfaces between cloud services and legacy mainframe systems (via MQ, FTP, or direct TCP) regularly encounter EBCDIC conversion requirements.