Encoding

ASCII

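American Standard Code for Information Interchange. A 7-bit encoding covering 128 characters (0–127), including control characters, digits, Latin letters, and basic symbols.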


What is ASCII?

ASCII (the American Standard Code for Information Interchange) is the foundational character encoding standard that underpins virtually all modern text processing. First published in 1963 and substantially revised in 1967, when lowercase letters were added, ASCII defines a mapping between 128 integer values (0–127) and a fixed set of characters: 33 non-printable control characters (codes 0–31 and 127), the space, 10 digits, 52 letters (26 uppercase, 26 lowercase), and 32 punctuation and symbol characters.

Understanding ASCII is not merely historical homework. It remains deeply embedded in protocols, file formats, programming languages, and the design of every encoding standard that came after it — including UTF-8, which was intentionally designed to be fully backward compatible with ASCII.

How ASCII Works

ASCII represents each character as a 7-bit binary number. Although modern systems store ASCII in 8-bit bytes, the high bit is always 0, leaving 128 usable positions. The layout was carefully designed:

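  • 0–31: Non-printable control characters (NUL, TAB, LF, CR, ESC, etc.)
  • 32: Space
  • 33–47: Punctuation (!, ", #, $, %, &, ', (, ), *, +, ,, -, ., /)
  • 48–57: Digits 0–9
  • 58–64: Punctuation (:, ;, <, =, >, ?, @)
  • 65–90: Uppercase A–Z
  • 91–96: Punctuation ([, \, ], ^, _, `)
  • 97–122: Lowercase a–z
  • 123–126: Punctuation ({, |, }, ~)
  • 127: DEL (control character)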
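The layout can be expressed as a small range lookup. This is a minimal sketch: the function name `ascii_class` and its category labels are illustrative assumptions, not a standard library API. Code 127 (DEL) is treated as a control character, and every printable code that is not a space, digit, or letter is labeled as punctuation.

```python
def ascii_class(ch: str) -> str:
    """Return a rough ASCII category label for a single character."""
    code = ord(ch)
    if code > 127:
        return "not ASCII"
    if code < 32 or code == 127:
        return "control"
    if code == 32:
        return "space"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    # Everything left in 33-126 is punctuation or a symbol.
    return "punctuation"


for ch in ["\t", " ", "7", "Q", "q", "~", "é"]:
    print(repr(ch), ascii_class(ch))
```

Counting the labels over all 128 codes is a quick way to check the totals: 33 control characters, 1 space, 10 digits, 52 letters, and 32 punctuation characters.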

One clever design choice: each uppercase letter and its lowercase counterpart differ by exactly one bit (bit 5, counting from 0, which has the value 32). A is 65 (0b01000001), a is 97 (0b01100001). This made case-insensitive comparison of letters cheap in early hardware: clearing or setting a single bit converts between the two cases.
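The single-bit relationship can be sketched in a few lines. The helper names `to_upper_ascii` and `to_lower_ascii` and the constant `CASE_BIT` are hypothetical, chosen only for this illustration. In practice `str.upper()` and `str.lower()` are the idiomatic tools.

```python
CASE_BIT = 0x20  # bit 5, value 32


def to_upper_ascii(ch: str) -> str:
    """Clear bit 5 for a-z; leave every other character unchanged."""
    if "a" <= ch <= "z":
        return chr(ord(ch) & ~CASE_BIT)
    return ch


def to_lower_ascii(ch: str) -> str:
    """Set bit 5 for A-Z; leave every other character unchanged."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) | CASE_BIT)
    return ch


print(ord("A") ^ ord("a"))   # 32: the two codes differ only in bit 5
print(to_upper_ascii("a"))   # A
print(to_lower_ascii("Z"))   # z
print(to_upper_ascii("{"))   # { (unchanged, not a letter)
```

The range checks matter: flipping bit 5 blindly would also change non-letters, for example turning `{` (123) into `[` (91).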

Code Examples

# Python: ASCII value of a character
ord('A')   # 65
ord('a')   # 97
ord('0')   # 48
ord('\n')  # 10  (newline / LF)

# Character from ASCII value
chr(65)    # 'A'
chr(97)    # 'a'

# Check if a string is pure ASCII (str.isascii requires Python 3.7+)
'Hello'.isascii()       # True
'Héllo'.isascii()       # False
'Hello\n'.isascii()     # True  (control chars count)

// JavaScript
'A'.charCodeAt(0);   // 65
String.fromCharCode(65);  // 'A'

// Check ASCII-safe range
[...'Hello'].every(c => c.charCodeAt(0) < 128);  // true

Quick Facts

| Property | Value |
|---|---|
| Full name | American Standard Code for Information Interchange |
| Year | 1963 (first published), 1967 (major revision) |
| Bits per character | 7 |
| Total characters | 128 (0–127) |
| Printable characters | 95 (including the space) |
| Control characters | 33 (0–31 and 127) |
| Standards body | ASA (now ANSI) |
| Modern relevance | Subset of Unicode, UTF-8, Latin-1, Windows-1252 |

Common Pitfalls

Confusing ASCII with Latin-1 or Windows-1252. ASCII ends at code point 127. The characters in the 128–255 range (é, ü, ñ, etc.) are NOT ASCII — they belong to extended encodings like ISO 8859-1 or Windows-1252. Many developers incorrectly say "ASCII" when they mean one of these extended encodings.
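A short check makes the boundary concrete. This sketch uses only Python's built-in codecs. The variable names are illustrative.

```python
text = "é"

# The same character has different byte representations per encoding.
latin1_bytes = text.encode("latin-1")   # b'\xe9'
cp1252_bytes = text.encode("cp1252")    # b'\xe9'
utf8_bytes = text.encode("utf-8")       # b'\xc3\xa9'
print(latin1_bytes, cp1252_bytes, utf8_bytes)

# Strict ASCII encoding rejects anything above code point 127.
try:
    text.encode("ascii")
    is_ascii = True
except UnicodeEncodeError:
    is_ascii = False
print(is_ascii)  # False

# The extended encodings also disagree with each other: byte 0x80 is the
# euro sign in Windows-1252 but a C1 control character in ISO 8859-1.
euro = b"\x80".decode("cp1252")
c1_control = b"\x80".decode("latin-1")
print(euro, repr(c1_control))
```

Because the 8-bit encodings differ in the 128–255 range, naming the actual encoding (rather than saying "ASCII") is what lets a reader decode the bytes correctly.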

Assuming "ASCII text" is safe everywhere. ASCII characters keep the same code point values in every Unicode encoding, but only UTF-8 stores them as the same single bytes. UTF-16 uses 2 bytes per ASCII character and UTF-32 uses 4, and both may start with a BOM. ASCII bytes decoded as UTF-16 will produce garbage or a decoding error.
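The effect is easy to reproduce. The following sketch encodes the same two-character string in several encodings and then deliberately decodes ASCII bytes with the wrong codec. The little-endian codec names are used so that no BOM is emitted.

```python
ascii_bytes = "Hi".encode("ascii")
utf8_bytes = "Hi".encode("utf-8")
utf16_bytes = "Hi".encode("utf-16-le")
utf32_bytes = "Hi".encode("utf-32-le")

print(ascii_bytes)   # b'Hi'
print(utf8_bytes)    # b'Hi' (identical to ASCII)
print(utf16_bytes)   # b'H\x00i\x00'
print(utf32_bytes)   # b'H\x00\x00\x00i\x00\x00\x00'

# Reading the two ASCII bytes as one UTF-16 code unit yields a single,
# unrelated character (0x48 and 0x69 combine into U+6948).
misread = ascii_bytes.decode("utf-16-le")
print(len(misread), hex(ord(misread)))  # 1 0x6948

# An odd number of bytes cannot be valid UTF-16 at all.
try:
    b"Hey".decode("utf-16-le")
    odd_length_ok = True
except UnicodeDecodeError:
    odd_length_ok = False
print(odd_length_ok)  # False
```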

Forgetting that control characters are ASCII. Tab (9), newline (10), carriage return (13), null (0), escape (27) — these are all ASCII values. The presence of control characters does not mean a file is "not ASCII."
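The distinction between "ASCII" and "printable ASCII" can be made explicit with two separate checks. This is a minimal sketch. The function names `is_ascii` and `is_printable_ascii` and the sample bytes are assumptions made for the example.

```python
def is_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit range 0-127."""
    return all(b < 0x80 for b in data)


def is_printable_ascii(data: bytes) -> bool:
    """True if every byte is in the printable range 32-126."""
    return all(0x20 <= b <= 0x7E for b in data)


# Tab, CR, LF, and ESC are control characters, yet still ASCII.
sample = b"name\tvalue\r\n\x1b[0m"
print(is_ascii(sample))            # True
print(is_printable_ascii(sample))  # False
```

A validator that rejects control characters is therefore enforcing a stricter rule than "is ASCII", and it helps to name it accordingly.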

ASCII in the Unicode Ecosystem

Unicode's first 128 code points (U+0000 to U+007F) are identical to ASCII. This was a deliberate choice to ensure backward compatibility. Every ASCII document is automatically valid UTF-8 without any changes to the byte values. This compatibility is one of the key reasons UTF-8 became the dominant web encoding.
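The compatibility claim can be verified exhaustively, since there are only 128 values to check. The sketch below relies only on Python's built-in `ascii` and `utf-8` codecs. The sample request line is an arbitrary illustration.

```python
# Every ASCII code point encodes to the same single byte in UTF-8.
for code in range(128):
    ch = chr(code)
    assert ch.encode("ascii") == bytes([code])
    assert ch.encode("utf-8") == bytes([code])

# Consequently, ASCII bytes decode identically under either codec.
raw = b"GET /index.html HTTP/1.1\r\n"
same_text = raw.decode("ascii") == raw.decode("utf-8")
print(same_text)  # True

# The first code point outside ASCII already needs two bytes in UTF-8.
first_non_ascii = chr(128).encode("utf-8")
print(len(first_non_ascii))  # 2
```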

The ASCII control characters also survive in Unicode, though several (like NUL, U+0000) have special handling in software. The full ASCII range (U+0000–U+007F) forms the Basic Latin block in Unicode terminology. The printable characters occupy U+0020–U+007E within it.

Related Terms

More terms in Encoding

ASCII Art

Visual art created from text characters, originally limited to the 95 printable …

Base64

Binary-to-text encoding that represents binary data using 64 ASCII characters (A–Z, a–z, …

Big5

A Traditional Chinese character encoding used mainly in Taiwan and Hong Kong, encoding roughly 13,000 CJK characters.

EBCDIC

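Extended Binary Coded Decimal Interchange Code. An IBM mainframe encoding whose character ranges are not contiguous, still used on financial and enterprise mainframes.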

EUC-KR

A Korean character encoding based on KS X 1001 that maps Hangul syllables and Hanja to 2-byte sequences.

GB2312 / GB18030

The Simplified Chinese character encoding family: GB2312 (6,763 characters) evolved through GBK into GB18030, China's national standard, which is compatible with Unicode.

IANA Character Sets

The official registry of character encoding names maintained by IANA, used in HTTP Content-Type headers and MIME (e.g., charset=utf-8).

ISO 8859

A family of 8-bit single-byte encodings for different language groups. ISO 8859-1 (Latin-1) became the basis for the first 256 Unicode code points.

Shift JIS

A Japanese character encoding that combines single-byte ASCII/JIS Roman with double-byte JIS X 0208 kanji. Still used in legacy Japanese systems.

UCS-2

An obsolete fixed-width 2-byte encoding that covers only the BMP (U+0000–U+FFFF). The predecessor of UTF-16, it cannot represent supplementary characters.