ASCII
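American Standard Code for Information Interchange. A 7-bit encoding covering 128 characters (0–127), including control characters, digits, Latin letters, and basic symbols.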
What is ASCII?
ASCII, the American Standard Code for Information Interchange, is the foundational character encoding standard that underpins virtually all modern text processing. Finalized in 1963 and revised in 1967, ASCII defines a mapping between 128 integer values (0–127) and a specific set of characters: 33 non-printable control characters (0–31 and 127), 10 digits, 52 letters (26 uppercase, 26 lowercase), 32 punctuation and symbol characters, and the space.
Understanding ASCII is not merely historical homework. It remains deeply embedded in protocols, file formats, programming languages, and the design of every encoding standard that came after it — including UTF-8, which was intentionally designed to be fully backward compatible with ASCII.
How ASCII Works
ASCII represents each character as a 7-bit binary number. Although modern systems store ASCII in 8-bit bytes, the high bit is always 0, leaving 128 usable positions. The layout was carefully designed:
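- 0–31: Non-printable control characters (NUL, TAB, LF, CR, ESC, etc.)
- 32: Space
- 33–47: Punctuation (! " # $ % & ' ( ) * + , - . /)
- 48–57: Digits 0–9
- 58–64: Punctuation (: ; < = > ? @)
- 65–90: Uppercase A–Z
- 91–96: Punctuation ([ \ ] ^ _ and the backtick)
- 97–122: Lowercase a–z
- 123–126: Punctuation ({ | } ~)
- 127: DEL (a control character)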
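The ranges above can be checked mechanically. Below is a minimal Python sketch; `ascii_category` and its labels are illustrative names chosen here, not part of any standard library. It classifies every code point from 0 to 127 and tallies the groups, which should come out to 33 control characters (0–31 plus DEL), 1 space, 10 digits, 52 letters, and 32 punctuation/symbol characters.

```python
from collections import Counter


def ascii_category(code: int) -> str:
    """Return a coarse category for a 7-bit ASCII code point (hypothetical helper)."""
    if not 0 <= code <= 127:
        raise ValueError(f"{code} is outside the 7-bit ASCII range")
    if code < 32 or code == 127:
        return "control"      # 0-31 and DEL
    if code == 32:
        return "space"
    ch = chr(code)
    if "0" <= ch <= "9":
        return "digit"        # 48-57
    if "A" <= ch <= "Z":
        return "upper"        # 65-90
    if "a" <= ch <= "z":
        return "lower"        # 97-122
    return "punctuation"      # 33-47, 58-64, 91-96, 123-126


counts = Counter(ascii_category(c) for c in range(128))
print(dict(counts))

# Every 7-bit value fits in a byte with the high bit (0x80) clear.
assert all(c & 0x80 == 0 for c in range(128))
```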
One clever design choice: uppercase and lowercase letters differ by exactly one bit (bit 5). A is 65 (0b01000001), a is 97 (0b01100001). This made case-insensitive comparisons trivial in early hardware by simply masking or setting a single bit.
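A minimal Python sketch of the bit-5 trick, assuming plain `str` input; `toggle_case` and `ascii_casefold_equal` are hypothetical helper names. The range check matters: setting bit 5 unconditionally would also map `@` (64) to the backtick (96) and `[` (91) to `{` (123), so only letters may be folded.

```python
CASE_BIT = 0x20  # bit 5, value 32: the only difference between 'A' and 'a'


def toggle_case(ch: str) -> str:
    """Flip the case of an ASCII letter by XOR-ing bit 5; return other characters unchanged."""
    code = ord(ch)
    if 0x41 <= code <= 0x5A or 0x61 <= code <= 0x7A:  # A-Z or a-z
        return chr(code ^ CASE_BIT)
    return ch


def ascii_casefold_equal(a: str, b: str) -> bool:
    """Compare two strings ignoring ASCII letter case."""
    def fold(ch: str) -> int:
        code = ord(ch)
        # Set bit 5 on uppercase letters only (lowercase already has it);
        # non-letters such as '@' and '[' must keep their own values.
        return code | CASE_BIT if 0x41 <= code <= 0x5A else code

    return len(a) == len(b) and all(fold(x) == fold(y) for x, y in zip(a, b))


print(toggle_case("A"), toggle_case("a"), toggle_case("5"))  # a A 5
print(ascii_casefold_equal("HeLLo", "hello"))                # True
```

Python's built-in `str.lower()` and `str.casefold()` handle full Unicode; the single-bit trick is only valid inside the two ASCII letter ranges.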
Code Examples
```python
# Python: ASCII value of a character
ord('A')   # 65
ord('a')   # 97
ord('0')   # 48
ord('\n')  # 10 (newline / LF)

# Character from ASCII value
chr(65)  # 'A'
chr(97)  # 'a'

# Check if a string is pure ASCII
'Hello'.isascii()    # True
'Héllo'.isascii()    # False
'Hello\n'.isascii()  # True (control chars count)
```
```javascript
// JavaScript
'A'.charCodeAt(0);        // 65
String.fromCharCode(65);  // 'A'

// Check ASCII-safe range
[...'Hello'].every(c => c.charCodeAt(0) < 128);  // true
```
Quick Facts
| Property | Value |
|---|---|
| Full Name | American Standard Code for Information Interchange |
| Year | 1963 (finalized), 1967 (revised) |
| Bits per character | 7 |
| Total characters | 128 (0–127) |
| Printable characters | 95 (32–126, including space) |
| Control characters | 33 (0–31 and 127) |
| Standard body | ASA (now ANSI) |
| Modern relevance | Subset of Unicode, UTF-8, Latin-1, Windows-1252 |
Common Pitfalls
Confusing ASCII with Latin-1 or Windows-1252. ASCII ends at code point 127. The characters in the 128–255 range (é, ü, ñ, etc.) are NOT ASCII — they belong to extended encodings like ISO 8859-1 or Windows-1252. Many developers incorrectly say "ASCII" when they mean one of these extended encodings.
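A short Python illustration, using only the standard `latin-1`, `cp1252`, and `ascii` codecs. The byte 0xE9 decodes to é under Latin-1 but is rejected by the ASCII codec because it is above 127. The byte 0x80 shows that Windows-1252 and Latin-1 are not interchangeable either.

```python
text = "café"
data = text.encode("latin-1")   # é becomes the single byte 0xE9 (233)

print(data)                              # b'caf\xe9'
print(data.decode("latin-1") == text)    # True
print(data.isascii())                    # False: one byte is above 127

try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    print("not ASCII:", err)             # the ASCII codec rejects byte 0xE9

# Byte 0x80 means different things in the two "extended ASCII" encodings.
print(hex(ord(b"\x80".decode("cp1252"))))   # 0x20ac (EURO SIGN)
print(repr(b"\x80".decode("latin-1")))      # '\x80' (a C1 control character)
```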
Assuming "ASCII text" is safe everywhere. ASCII characters keep the same code point values in UTF-8, UTF-16, and UTF-32, but only UTF-8 stores them as the same single bytes. UTF-16 uses 2-byte code units and UTF-32 uses 4-byte code units, and either may begin with a byte order mark (BOM). An ASCII file decoded as UTF-16 will therefore produce garbage.
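The byte-level difference is easy to see in Python. This sketch uses the explicit little-endian codecs (`utf-16-le`, `utf-32-le`) so the output does not depend on the machine's byte order; the generic `utf-16` codec adds a 2-byte BOM.

```python
text = "Hi"

print(text.encode("utf-8"))      # b'Hi'
print(text.encode("utf-16-le"))  # b'H\x00i\x00'
print(text.encode("utf-32-le"))  # b'H\x00\x00\x00i\x00\x00\x00'

# The generic "utf-16" codec prepends a 2-byte BOM: 2 + 2 * 2 bytes.
print(len(text.encode("utf-16")))  # 6

# Reading plain ASCII bytes as UTF-16 pairs them into unrelated code points.
misread = text.encode("ascii").decode("utf-16-le")
print(len(misread), hex(ord(misread)))  # 1 0x6948 (a CJK ideograph, not "Hi")
```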
Forgetting that control characters are ASCII. Tab (9), newline (10), carriage return (13), null (0), escape (27) — these are all ASCII values. The presence of control characters does not mean a file is "not ASCII."
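A small Python sketch separating the two questions: "is it ASCII?" versus "is it printable ASCII?". The helper `is_printable_ascii` is a hypothetical name that simply checks the 32–126 range.

```python
def is_printable_ascii(s: str) -> bool:
    """True only if every character is in the printable range 32-126 (space to '~')."""
    return all(32 <= ord(c) <= 126 for c in s)


sample = "name\tage\r\n"             # TAB, CR and LF are control characters
print(sample.isascii())              # True: control characters are still ASCII
print(is_printable_ascii(sample))    # False: but they are not printable
print("\x00\x1b".isascii())          # True: NUL and ESC are ASCII too
```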
ASCII in the Unicode Ecosystem
Unicode's first 128 code points (U+0000 to U+007F) are identical to ASCII. This was a deliberate choice to ensure backward compatibility. Every ASCII document is automatically valid UTF-8 without any changes to the byte values. This compatibility is one of the key reasons UTF-8 became the dominant web encoding.
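A quick Python check of the compatibility claim. It also shows the other half of UTF-8's design: every byte of a multi-byte UTF-8 sequence has its high bit set, so a byte below 0x80 in UTF-8 text is always an ASCII character.

```python
all_ascii = "".join(chr(c) for c in range(128))

# Encoding the 128 ASCII characters as ASCII or as UTF-8 gives identical bytes.
assert all_ascii.encode("ascii") == all_ascii.encode("utf-8")

# Non-ASCII characters become multi-byte sequences whose bytes are all >= 0x80.
for ch in ("\u00e9", "\u20ac", "\U0001F600"):  # é, €, 😀
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", encoded.hex(), all(b >= 0x80 for b in encoded))
```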
The ASCII control characters also survive in Unicode, though several (like NUL, U+0000) have special handling in software. In Unicode terminology, the whole range U+0000–U+007F forms the Basic Latin block; the printable ASCII characters occupy U+0020–U+007E within it.
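Python's standard `unicodedata` module reflects this: within U+0000–U+007F, exactly the 33 ASCII control codes carry the Unicode general category `Cc` (Control), and, unlike letters, they have no formal character name.

```python
import unicodedata

# Within U+0000-U+007F, only 0-31 and 127 are in general category "Cc".
controls = [c for c in range(128) if unicodedata.category(chr(c)) == "Cc"]
print(len(controls))                           # 33

print(unicodedata.name("A"))                   # LATIN CAPITAL LETTER A
# Control characters have no character name, so name() needs a default here.
print(unicodedata.name("\x00", "<no name>"))   # <no name>
```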
Related Terms
More in Encoding
Visual art created from text characters, originally limited to the 95 printable …
Binary-to-text encoding that represents binary data using 64 ASCII characters (A–Z, a–z, …
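A Traditional Chinese character encoding used mainly in Taiwan and Hong Kong, covering about 13,000 CJK characters.
Extended Binary Coded Decimal Interchange Code. An IBM mainframe encoding with non-contiguous letter ranges, still used on banking and enterprise mainframes today.
A Korean character encoding based on KS X 1001 that maps Hangul syllables and Hanja to double-byte sequences.
A family of Simplified Chinese character encodings: GB2312 (6,763 characters) evolved through GBK into GB18030, a Unicode-compatible mandatory Chinese national standard.
The official registry of character encoding names maintained by IANA, used in HTTP Content-Type headers and MIME (e.g. charset=utf-8).
A family of 8-bit single-byte encodings for different language groups; ISO 8859-1 (Latin-1) is the basis for Unicode's first 256 code points.
A Japanese character encoding that combines single-byte ASCII/JIS Roman characters with double-byte JIS X 0208 kanji, still used in legacy Japanese systems.
An obsolete fixed 2-byte encoding that covers only the BMP (U+0000–U+FFFF); the predecessor of UTF-16, it cannot represent supplementary characters.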