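What is Character Encoding?

A system that maps characters to byte sequences for digital storage and transmission. Every text file has an encoding, and software can only read the file correctly if that encoding is known or correctly declared.

Encoding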

EBCDIC

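Extended Binary Coded Decimal Interchange Code. An IBM mainframe encoding whose letters do not occupy a contiguous range of codes; it is still used on financial and enterprise mainframes.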

2021-05-17 · Updated 2024-07-22

What is EBCDIC?

EBCDIC (Extended Binary Coded Decimal Interchange Code) is an 8-bit character encoding developed by IBM for its mainframe and midrange computer systems, first introduced with the IBM System/360 in 1964. Unlike ASCII, which assigns codes in an intuitive order (digits 0–9 in sequence, letters in alphabetical order), EBCDIC uses a layout derived from punched card encoding conventions that predate modern computing.

EBCDIC is generally not used on personal computers, web servers, or Unix/Linux systems. However, it remains the native encoding of IBM mainframes (z/OS, z/VM) and IBM midrange systems such as the AS/400 (later iSeries, now IBM i), which still process a significant portion of the world's banking transactions, airline reservations, insurance claims, and government records.

Why EBCDIC Differs So Radically from ASCII

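EBCDIC's layout is best understood in the context of punched cards. IBM's punched cards had 12 rows, and a character was represented by the combination of rows punched in its column: a zone punch (row 12, 11, or 0) plus a digit punch (rows 1–9). EBCDIC carries this structure over: for uppercase letters, the high nibble (C, D, E) corresponds to the zone punch and the low nibble to the digit punch, so A (12-1) is 0xC1, J (11-1) is 0xD1, and S (0-2) is 0xE2. The numeric digits 0–9 were assigned codes 0xF0–0xF9 (not 0x30–0x39 as in ASCII). Each case of the alphabet is split into three non-contiguous ranges: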

Range	Characters
0x81–0x89	a–i
0x91–0x99	j–r
0xA2–0xA9	s–z
0xC1–0xC9	A–I
0xD1–0xD9	J–R
0xE2–0xE9	S–Z
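The ranges in the table can be checked against Python's built-in cp037 codec (code page 037, US/Canada). This sketch assumes code page 037; other EBCDIC code pages may place characters differently, and `contiguous_runs` is a hypothetical helper written for this example.

```python
import string

def contiguous_runs(values):
    """Group a sorted list of ints into (start, end) runs of consecutive values."""
    runs = []
    for v in values:
        if runs and v == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], v)   # extend the current run
        else:
            runs.append((v, v))           # start a new run
    return runs

# EBCDIC byte value of every Latin letter, lowercase and uppercase
letters = string.ascii_lowercase + string.ascii_uppercase
codes = sorted(ch.encode('cp037')[0] for ch in letters)
runs = contiguous_runs(codes)

for start, end in runs:
    first = bytes([start]).decode('cp037')
    last = bytes([end]).decode('cp037')
    print(f"0x{start:02X}-0x{end:02X}  {first}-{last}")
```

Six runs come out, three per case, matching the table row for row.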

The gaps between the letter ranges matter: bytes such as 0x8A–0x90, 0x9A–0xA1, 0xCA–0xD0, and 0xDA–0xE1 hold other symbols or control codes, or are unassigned, depending on the code page. A simple "increment by 1" loop from 'A' (0xC1) to 'Z' (0xE9) therefore does not work in EBCDIC: it visits 41 byte values, 15 of which are not letters.
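A short sketch of the naive loop, again assuming code page 037 via Python's cp037 codec:

```python
# Walk every byte from EBCDIC 'A' to EBCDIC 'Z', as an ASCII-style loop would
start = 'A'.encode('cp037')[0]   # 0xC1
end = 'Z'.encode('cp037')[0]     # 0xE9

visited = [bytes([b]).decode('cp037') for b in range(start, end + 1)]
non_letters = [ch for ch in visited if not ('A' <= ch <= 'Z')]

print(len(visited))       # 41 byte values between 'A' and 'Z' inclusive
print(len(non_letters))   # 15 of them are not A-Z
```

To iterate letters portably, loop over the characters ('A' to 'Z' as text) and encode each one, rather than incrementing the encoded byte.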

EBCDIC Variants

There is no single EBCDIC. IBM defines over 50 EBCDIC code pages for different languages and regions:

Code Page	Region/Language
EBCDIC 037	United States, Canada, Netherlands (and several other regions)
EBCDIC 500	International (ECMA-16)
EBCDIC 870	Latin-2 (Central European)
EBCDIC 875	Greek
EBCDIC 930	Japanese (Katakana and Kanji, mixed single-/double-byte)
EBCDIC 935	Simplified Chinese

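Latin-alphabet variants such as 037 and 500 agree on letters, digits, and the space character, but they assign many punctuation marks and national characters (for example, [, ], and !) to different byte values, and non-Latin variants diverge further. Converting between two EBCDIC code pages therefore needs the correct source and target mappings, just like EBCDIC-to-ASCII conversion; treating one variant as another silently corrupts the characters that differ.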
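A minimal sketch of an EBCDIC-to-EBCDIC conversion using Python's cp037 (US/Canada) and cp500 (International) codecs, routing through Unicode. The sample string is illustrative only.

```python
text = 'Hello [world]!'

cp037_bytes = text.encode('cp037')   # US/Canada
cp500_bytes = text.encode('cp500')   # International

# Letters share byte values across the two code pages...
print('Hello'.encode('cp037') == 'Hello'.encode('cp500'))   # True
# ...but brackets and '!' do not, so the full strings differ
print(cp037_bytes == cp500_bytes)                           # False

# Wrong: reading cp037 bytes as if they were cp500
print(cp037_bytes.decode('cp500') == text)                  # False

# Right: decode with the source code page, encode with the target
converted = cp037_bytes.decode('cp037').encode('cp500')
print(converted == cp500_bytes)                             # True
```

Going through Unicode (decode, then encode) avoids hand-maintained byte-to-byte tables and fails loudly with `UnicodeEncodeError` if a character has no equivalent in the target code page.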

Code Examples

# Python: EBCDIC conversion
text = 'Hello'

# EBCDIC code page 37 (US)
ebcdic = text.encode('cp037')
print(ebcdic)   # b'\xc8\x85\x93\x93\x96'
print([hex(b) for b in ebcdic])
# ['0xc8', '0x85', '0x93', '0x93', '0x96']
# H=0xC8, e=0x85, l=0x93, l=0x93, o=0x96

# Decode back
ebcdic_bytes = b'\xc8\x85\x93\x93\x96'
print(ebcdic_bytes.decode('cp037'))   # Hello

# ASCII byte for 'A' = 0x41; EBCDIC byte for 'A' = 0xC1
print(ord('A'))                  # 65 (0x41) in ASCII/Unicode
print('A'.encode('cp037')[0])    # 193 (0xC1) in EBCDIC-037

# Digit difference
print(ord('0'))                  # 48 (0x30) in ASCII
print('0'.encode('cp037')[0])    # 240 (0xF0) in EBCDIC-037

Practical EBCDIC Challenges

FTP binary vs. ASCII mode. When a file is transferred from a mainframe via FTP in ASCII mode, the FTP server translates it from EBCDIC to ASCII; in binary mode, bytes are transferred unchanged. Use ASCII mode for text, which it translates correctly, and binary mode for everything else: transferring a non-text file (an image, a compiled program) in ASCII mode corrupts it, because its bytes are translated as if they were characters.
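A minimal simulation of what text-mode translation does to binary data. It assumes the server's EBCDIC-to-ASCII table behaves like Python's cp037 decode followed by a latin-1 encode; real FTP servers use their own configured tables, and `simulate_text_mode` is a hypothetical helper, not part of any FTP library.

```python
def simulate_text_mode(data: bytes) -> bytes:
    """Translate every byte as if it were EBCDIC text (hypothetical stand-in
    for an FTP server's ASCII-mode conversion)."""
    # cp037 maps all 256 bytes onto U+0000..U+00FF, so latin-1 can always encode them
    return data.decode('cp037').encode('latin-1')

# Text survives: EBCDIC 'HELLO' becomes ASCII 'HELLO'
print(simulate_text_mode('HELLO'.encode('cp037')))   # b'HELLO'

# Binary does not: the first 8 bytes of every PNG file get rewritten
png_signature = b'\x89PNG\r\n\x1a\n'
translated = simulate_text_mode(png_signature)
print(translated == png_signature)                   # False
```

In binary mode the transfer is the identity function, which is why it is the safe choice for anything that is not plain text.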

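Newline convention. Where EBCDIC text uses a line-ending character, it is typically NL (New Line, 0x15), rather than ASCII's LF (0x0A), CR (0x0D), or CR+LF (0x0D 0x0A); many mainframe datasets have no line terminator at all and store lines as fixed- or variable-length records. EBCDIC also has its own LF at 0x25, and converters differ on whether NL becomes LF or Unicode NEL (U+0085). File transfer and parsing tools must account for this.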
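A sketch of how NL appears after decoding with Python's cp037 codec, which maps 0x15 to U+0085 (NEL) and 0x25 to LF. Other converters may map NL straight to LF, so check which mapping your tooling uses.

```python
# Two EBCDIC lines, each terminated by NL (0x15)
raw = 'Hi'.encode('cp037') + b'\x15' + 'There'.encode('cp037') + b'\x15'

text = raw.decode('cp037')
print(repr(text))            # 'Hi\x85There\x85'

# split('\n') misses NEL, but str.splitlines() treats U+0085 as a line boundary
print(text.split('\n'))      # ['Hi\x85There\x85']
print(text.splitlines())     # ['Hi', 'There']

# Normalize NEL to LF before handing the text to ASCII/Unix tools
unix_text = text.replace('\x85', '\n')
print(repr(unix_text))       # 'Hi\nThere\n'
```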

Sort order. In EBCDIC, lowercase letters have lower byte values than uppercase (a=0x81 < A=0xC1), while in ASCII uppercase is lower (A=0x41 < a=0x61). Digits move too: they sort after letters in EBCDIC (0xF0–0xF9) but before them in ASCII (0x30–0x39). Data sorted by byte value on an EBCDIC system is therefore out of order on an ASCII system, and vice versa, until it is re-sorted.
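Sorting by the cp037-encoded bytes reproduces EBCDIC byte order on an ASCII system. A minimal sketch with three illustrative strings:

```python
words = ['apple', 'Apple', '42']

# Code-point order, identical to ASCII byte order for ASCII text
ascii_order = sorted(words)
# Byte order as an EBCDIC (code page 037) system would see it
ebcdic_order = sorted(words, key=lambda s: s.encode('cp037'))

print(ascii_order)    # ['42', 'Apple', 'apple']
print(ebcdic_order)   # ['apple', 'Apple', '42']
```

The `key=` approach is handy when merging with a file that was already sorted on the mainframe; when the data only needs ASCII order, re-sort after conversion instead.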

Quick Facts

Property	Value
Full Name	Extended Binary Coded Decimal Interchange Code
Developed by	IBM
Introduced	1964 (IBM System/360)
Bits per character	8
Letter arrangement	Non-contiguous (three ranges per case)
Digit codes	0xF0–0xF9
Variants	50+ code pages
Used on	IBM mainframes (z/OS), IBM i (AS/400)
Python codec	cp037, cp500, cp875, etc.

Common Pitfalls

Assuming ASCII-like letter ordering. Any algorithm that iterates over letters by incrementing byte values, checks alphabetical order by comparing byte values, or uses character ranges such as c >= 'A' && c <= 'Z' will fail silently on EBCDIC without adaptation; the range test, for example, also accepts the non-letter bytes in the gaps 0xCA–0xD0 and 0xDA–0xE1.
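A sketch contrasting an ASCII-style range test with a decode-first test, assuming code page 037. Both function names are hypothetical, chosen for this example.

```python
import string

EBCDIC_A = 'A'.encode('cp037')[0]   # 0xC1
EBCDIC_Z = 'Z'.encode('cp037')[0]   # 0xE9

def naive_is_upper(b: int) -> bool:
    """ASCII-style range test applied to an EBCDIC byte (buggy)."""
    return EBCDIC_A <= b <= EBCDIC_Z

def is_upper(b: int) -> bool:
    """Decode with the code page first, then test the character."""
    return bytes([b]).decode('cp037') in string.ascii_uppercase

brace = '}'.encode('cp037')[0]      # 0xD0, in the gap between I and J
print(naive_is_upper(brace))        # True  (wrong)
print(is_upper(brace))              # False (correct)
```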

Space byte mismatch. In EBCDIC, the NULL character is 0x00, the same as in ASCII, but the space is 0x40 in EBCDIC vs. 0x20 in ASCII. Programs that check for spaces using the ASCII byte value will not recognize EBCDIC spaces, and space-padded EBCDIC fields viewed as raw ASCII appear padded with '@' (which is 0x40 in ASCII).
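A sketch of trimming a space-padded fixed-width field, assuming code page 037 and a hypothetical 8-byte name field:

```python
# A fixed-width 8-byte name field, space-padded on the mainframe
field = 'SMITH'.ljust(8).encode('cp037')
print(field)                            # b'\xe2\xd4\xc9\xe3\xc8@@@'

# EBCDIC space is 0x40 ('@' in ASCII), so stripping b' ' (0x20) removes nothing
print(field.rstrip(b' ') == field)      # True

# Decode first, then trim
print(field.decode('cp037').rstrip())   # SMITH
```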

Modern context. Despite EBCDIC's age, mainframes processing financial transactions still use it. Developers building interfaces between cloud services and legacy mainframe systems (via MQ, FTP, or direct TCP) regularly encounter EBCDIC conversion requirements.

Other Encoding Terms

ASCII

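American Standard Code for Information Interchange. A 7-bit encoding covering 128 characters (0–127), including control characters, digits, Latin letters, and basic symbols.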

ASCII Art

Visual art created from text characters, originally limited to the 95 printable …

Base64

Binary-to-text encoding that represents binary data using 64 ASCII characters (A–Z, a–z, …

Big5

A Traditional Chinese character encoding used mainly in Taiwan and Hong Kong, encoding about 13,000 CJK characters.

EUC-KR

A Korean character encoding based on KS X 1001 that maps Hangul syllables and Hanja to two-byte sequences.

GB2312 / GB18030

A family of Simplified Chinese character encodings: GB2312 (6,763 characters) evolved through GBK into GB18030, the Chinese national standard, which is compatible with Unicode.

IANA Character Sets

The official registry of character encoding names maintained by IANA, used in HTTP Content-Type headers and MIME (e.g., charset=utf-8).

ISO 8859

A family of 8-bit single-byte encodings for different language groups. ISO 8859-1 (Latin-1) became the basis for Unicode's first 256 code points.

Shift JIS

A Japanese character encoding that combines single-byte ASCII/JIS-Roman with double-byte JIS X 0208 kanji. It is still used in legacy Japanese systems.

UCS-2

An obsolete fixed-width 2-byte encoding that covers only the BMP (U+0000–U+FFFF). It is the predecessor of UTF-16 and cannot represent supplementary characters.
