The Birth of ASCII (1963)
ASCII was created in 1963 by the American Standards Association to standardize communication between different computers and telegraph systems using a 7-bit character set of 128 code points. This article tells the story of ASCII's creation, the engineers behind it, and how a 60-year-old standard still underpins modern computing.
ASCII — the American Standard Code for Information Interchange — is one of the most consequential technical standards ever created. Ratified in 1963 and refined through the 1960s, it established a shared language between machines at a time when every computer manufacturer invented its own character encoding. Its 128 characters still underpin virtually every digital text system in use today.
The Problem ASCII Solved
Before ASCII, computing was a tower of Babel. IBM machines used one encoding, Burroughs used another, and teletype networks used yet another. A document typed on a UNIVAC terminal could not be meaningfully transmitted to an IBM 1401. In an era when government agencies, universities, and businesses were beginning to connect computers via telephone lines, this incompatibility was a serious obstacle.
The situation was inherited partly from the telegraph era. The Baudot code (1870) and its successor ITA-2 (1932) were already in wide use for teleprinter networks. These 5-bit codes had 32 possible values — enough for uppercase letters and digits, but not lowercase or punctuation. As computers needed richer character sets, manufacturers improvised their own solutions.
Bob Bemer and the X3.2 Committee
The effort to create a universal American standard began in 1960 under ASA (American Standards Association, later ANSI). The committee responsible was X3.2, the Subcommittee on Coded Character Sets and Data Format.
The pivotal figure was Bob Bemer, an IBM programmer who had been advocating for a unified character code since the late 1950s. Bemer contributed several ideas that made it into the final standard, including the escape character (ESC, 0x1B), which allowed a single character set to signal switches to alternate modes — a concept that survives in ANSI escape sequences used in terminals to this day.
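Bemer's escape character is still doing exactly the job he designed it for. As a minimal sketch (the specific sequence shown follows the ANSI/VT100 conventions mentioned above, standardized later in ECMA-48, not anything in the 1963 standard itself):

```python
ESC = "\x1b"  # Bemer's escape character, code point 0x1B

# A VT100/ANSI sequence: ESC [ <params> m selects display attributes.
# "31" means red foreground; "0" resets to defaults.
print(f"{ESC}[31mred text{ESC}[0m")
```

On any ANSI-capable terminal this prints "red text" in red, then resets: a single in-band character, defined in 1963, still switching modern terminals into alternate modes.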
Bemer is also credited with popularizing the backslash character (\) as a complement to the forward slash, and he was a strong advocate for including lowercase letters, which some committee members initially considered a luxury.
Why 7 Bits? Why 128 Characters?
ASCII uses 7 bits per character, yielding exactly 128 code points (0–127). This was a deliberate engineering compromise. Six bits (64 values) was too few — it could not accommodate both uppercase and lowercase letters alongside digits and punctuation. Eight bits (256 values) was seen as wasteful given the transmission costs of the era, where every bit added to a character increased transmission time and line costs.
Seven bits also fit comfortably inside the 8-bit bytes that were becoming standard, leaving the eighth bit free for parity checking — an error-detection mechanism critical in noisy telephone-line transmissions of the 1960s.
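The parity scheme is simple enough to sketch in a few lines. This illustrative helper (not part of any standard API) packs a 7-bit code into an 8-bit byte under even parity, where the spare high bit is set so that the total number of 1-bits is even:

```python
def add_even_parity(ch: str) -> int:
    """Pack a 7-bit ASCII code into an 8-bit byte whose high bit
    makes the total count of 1-bits even (even parity)."""
    code = ord(ch)
    assert code < 128, "not a 7-bit ASCII character"
    parity = bin(code).count("1") % 2  # 1 if the 7-bit code has odd weight
    return (parity << 7) | code

# 'A' is 0x41 = 0b1000001: two 1-bits, already even, parity bit stays 0.
print(hex(add_even_parity("A")))  # 0x41
# 'C' is 0x43 = 0b1000011: three 1-bits, so the high bit is set.
print(hex(add_even_parity("C")))  # 0xc3
```

A receiver recomputes the parity of all eight bits; any single flipped bit on the line makes the count odd and flags the character as corrupted.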
The 128 slots were allocated with considerable care:
- 0–31 and 127: 33 control characters (non-printing)
- 32: Space
- 33–47: Punctuation and symbols
- 48–57: Digits 0–9
- 58–64: More punctuation
- 65–90: Uppercase A–Z
- 91–96: More punctuation ([, \, ], ^, _, `)
- 97–122: Lowercase a–z
- 123–126: {, |, }, ~
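One consequence of this careful layout is worth spelling out: placing A at 65 and a at 97 puts each uppercase letter exactly 32 (0x20) below its lowercase counterpart, so case conversion is a single-bit operation. A minimal illustration:

```python
def to_upper(ch: str) -> str:
    """Uppercase one ASCII character by clearing bit 5 (0x20)."""
    code = ord(ch)
    if 0x61 <= code <= 0x7A:  # 'a'..'z'
        code &= ~0x20         # clear bit 5; uppercase sits 32 below
    return chr(code)

print(to_upper("q"))        # Q
print(ord("A"), ord("a"))   # 65 97 -- they differ by exactly 0x20
```

The same trick works in hardware, which mattered in 1963 when case folding had to be cheap for printers and terminals.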
Control Characters: ASCII's Hidden Layer
Nearly a quarter of ASCII's 128 values are control characters — codes that do not represent printable symbols but instead command devices. Many have their roots in telegraphy and Teletype machine operation.
Key control characters include:
- NUL (0x00): Null, used as a string terminator in C.
- BEL (0x07): Bell — literally rang a bell on teletype machines; still triggers a terminal beep.
- BS (0x08): Backspace — moved the print head left without erasing.
- HT (0x09): Horizontal Tab.
- LF (0x0A): Line Feed — advances paper one line. Unix line endings use LF alone.
- CR (0x0D): Carriage Return — returns the print head to the start of the line. Windows uses CR+LF.
- ESC (0x1B): Escape — signals a following sequence has special meaning.
- DEL (0x7F): Delete — originally punched all holes in a paper tape to obliterate a character.
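Several of these codes survive as escape sequences in modern languages, which makes them easy to inspect. A quick sketch (code points taken from the list above):

```python
# Control characters by name and code point; Python's repr() shows
# which ones earned their own escape sequence ('\t', '\n', '\r').
controls = [("NUL", 0x00), ("BEL", 0x07), ("BS", 0x08), ("HT", 0x09),
            ("LF", 0x0A), ("CR", 0x0D), ("ESC", 0x1B), ("DEL", 0x7F)]
for name, code in controls:
    print(f"{name:<4} 0x{code:02X} {chr(code)!r}")
```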
The CR/LF split reflects the physical operation of the Teletype Model 33, where two separate mechanical actions were required to start a new line. This historical accident explains why Windows and Unix still disagree on line endings sixty years later.
The 1963 and 1968 Standards
The first published ASCII standard appeared in 1963 as ASA X3.4-1963. It was revised in 1967 and finalized in its now-canonical form as ANSI X3.4-1968. President Lyndon B. Johnson mandated that all federal computers use ASCII in 1968, which dramatically accelerated its adoption.
The standard was also published internationally as ISO 646 (1967), though ISO 646 allowed national variants that substituted characters for local needs — for example, replacing # with £ in the British variant, or @ with à in French versions. This created compatibility headaches of its own.
ASCII's Lasting Legacy
ASCII's influence is so pervasive it is almost invisible. The Latin letters, digits, and punctuation in ASCII occupy the first 128 code points of every Unicode version ever published. UTF-8, the dominant encoding of the modern web, was specifically designed so that any ASCII-only byte stream is simultaneously valid UTF-8. A plain-text file written in 1968 on an ASCII terminal is byte-for-byte identical to a modern UTF-8 document containing the same content.
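That byte-for-byte compatibility is easy to verify directly. A minimal demonstration using Python's standard codecs:

```python
data = "Hello, world!".encode("ascii")

# Any pure-ASCII byte stream is already valid UTF-8, with identical bytes:
assert data == "Hello, world!".encode("utf-8")
assert data.decode("utf-8") == data.decode("ascii")

# Non-ASCII characters, by contrast, require multi-byte UTF-8 sequences
# in which every byte has the high bit set -- outside ASCII's 0-127 range:
print("é".encode("utf-8"))  # b'\xc3\xa9'
```

This is the design Ken Thompson and Rob Pike chose deliberately: UTF-8 reserves the single-byte values 0–127 for ASCII alone, so decades of ASCII files, protocols, and tools kept working unchanged.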
The control characters have fared nearly as well. CR, LF, HT, and NUL are universal. ESC sequences spawned the entire ANSI/VT100 terminal protocol family, which in turn became the foundation for xterm, iTerm2, and every modern terminal emulator.
Bob Bemer, who lived until 2004, often expressed ambivalence about ASCII's longevity. He understood it as a practical compromise, not a vision of perfection. The real testament to the standard is not that it was ideal, but that it was good enough, and arrived at exactly the right moment.