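UTF-8 is a variable-length Unicode encoding that uses 1 to 4 bytes per character. It is the dominant encoding on the web (over 98% of websites) and is fully backward compatible with ASCII.

What is character encoding?

A system that maps characters to byte sequences for digital storage and transmission. Every text file has an encoding, and what matters is whether it is declared correctly.

What is a control character?

Non-printing characters that control text processing. C0 (U+0000–U+001F): NUL, TAB, LF, CR, ESC. C1 (U+0080–U+009F): rarely used in modern Unicode. General category: Cc.

Unicode is a universal character encoding standard that assigns a unique number (code point) to every character in every writing system. Version 16.0 contains 154,998 assigned characters.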

ASCII

American Standard Code for Information Interchange. A 7-bit encoding covering 128 characters (0–127), including control characters, digits, Latin letters, and basic symbols.

2021-02-01 · Updated 2024-06-15

What is ASCII?

ASCII, the American Standard Code for Information Interchange, is the foundational character encoding standard that underpins virtually all modern text processing. First published in 1963 and substantially revised in 1967, ASCII defines a mapping between 128 integer values (0–127) and a specific set of characters: 33 non-printable control characters, 10 digits, 52 letters (26 uppercase, 26 lowercase), and 33 punctuation and symbol characters (counting the space).

Understanding ASCII is not merely historical homework. It remains deeply embedded in protocols, file formats, programming languages, and the design of every encoding standard that came after it — including UTF-8, which was intentionally designed to be fully backward compatible with ASCII.

How ASCII Works

ASCII represents each character as a 7-bit binary number. Although modern systems store ASCII in 8-bit bytes, the high bit is always 0, leaving 128 usable positions. The layout was carefully designed:

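0–31: Non-printable control characters (NUL, TAB, LF, CR, ESC, etc.)
32: Space
33–47: Punctuation (!, ", #, $, %, &, ', (, ), *, +, ,, -, ., /)
48–57: Digits 0–9
58–64: Punctuation (:, ;, <, =, >, ?, @)
65–90: Uppercase A–Z
91–96: Punctuation ([, \, ], ^, _, `)
97–122: Lowercase a–z
123–126: Punctuation ({, |, }, ~)
127: DEL (control character)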
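
The layout can be checked with a short Python sketch. The function name `ascii_category` and its labels are hypothetical, chosen for illustration. It treats code 127 (DEL) as a control character and groups every remaining printable symbol as punctuation.

```python
def ascii_category(code: int) -> str:
    """Classify a 7-bit code (0-127) by the ASCII layout ranges."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit ASCII code")
    if code < 32 or code == 127:   # C0 controls plus DEL
        return "control"
    if code == 32:
        return "space"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    return "punctuation"           # every remaining printable symbol

# Tally how many codes fall into each group.
counts = {}
for code in range(128):
    kind = ascii_category(code)
    counts[kind] = counts.get(kind, 0) + 1

print(counts)
```

The tallies add up to 128, with the 95 printable characters split across space, digits, letters, and punctuation.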

One clever design choice: uppercase and lowercase letters differ by exactly one bit (bit 5). A is 65 (0b01000001), a is 97 (0b01100001). This made case-insensitive comparisons trivial in early hardware by simply masking or setting a single bit.
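
The single-bit case trick can be sketched in Python as follows. The helper names (`CASE_BIT`, `to_upper_ascii`, `to_lower_ascii`) are illustrative only, and the functions deliberately handle just the ASCII letters A–Z and a–z.

```python
CASE_BIT = 0x20  # bit 5, value 32: the only bit that differs between 'A' and 'a'

def to_upper_ascii(ch: str) -> str:
    """Clear bit 5 for ASCII lowercase letters; leave everything else alone."""
    if "a" <= ch <= "z":
        return chr(ord(ch) & ~CASE_BIT)
    return ch

def to_lower_ascii(ch: str) -> str:
    """Set bit 5 for ASCII uppercase letters; leave everything else alone."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) | CASE_BIT)
    return ch

def equal_ignoring_ascii_case(a: str, b: str) -> bool:
    """Case-insensitive comparison restricted to ASCII letters."""
    return [to_lower_ascii(c) for c in a] == [to_lower_ascii(c) for c in b]

print(bin(ord("A")), bin(ord("a")))   # the two values differ only in bit 5
print(to_upper_ascii("a"), to_lower_ascii("A"))
```

The range checks matter: applying the mask blindly would also alter non-letters, for example turning `{` (123) into `[` (91).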

Code Examples

# Python: ASCII value of a character
ord('A')   # 65
ord('a')   # 97
ord('0')   # 48
ord('\n')  # 10  (newline / LF)

# Character from ASCII value
chr(65)    # 'A'
chr(97)    # 'a'

# Check if a string is pure ASCII (str.isascii() requires Python 3.7+)
'Hello'.isascii()       # True
'Héllo'.isascii()       # False
'Hello\n'.isascii()     # True  (control characters are ASCII too)

// JavaScript
'A'.charCodeAt(0);   // 65
String.fromCharCode(65);  // 'A'

// Check ASCII-safe range
[...'Hello'].every(c => c.charCodeAt(0) < 128);  // true

Quick Facts

Property	Value
Full Name	American Standard Code for Information Interchange
Year	1963 (first published), 1967 (revised)
Bits per character	7
Total characters	128 (0–127)
Printable characters	95
Control characters	33
Standards body	ASA (now ANSI)
Modern relevance	Subset of Unicode, UTF-8, Latin-1, Windows-1252

Common Pitfalls

Confusing ASCII with Latin-1 or Windows-1252. ASCII ends at code point 127. The characters in the 128–255 range (é, ü, ñ, etc.) are NOT ASCII — they belong to extended encodings like ISO 8859-1 or Windows-1252. Many developers incorrectly say "ASCII" when they mean one of these extended encodings.
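
A small Python sketch makes the boundary concrete. The variable names are illustrative. It uses Python's built-in `ascii`, `latin-1`, and `cp1252` codecs to show that a byte above 127 is rejected by ASCII and means different things in the two extended encodings.

```python
text = "é"

# Latin-1 stores é as the single byte 0xE9, which is outside the ASCII range.
latin1_bytes = text.encode("latin-1")

# Encoding the same character as ASCII fails.
try:
    text.encode("ascii")
    ascii_encodable = True
except UnicodeEncodeError:
    ascii_encodable = False

# A byte above 127 is not valid ASCII input either.
try:
    b"\x80".decode("ascii")
    ascii_decodable = True
except UnicodeDecodeError:
    ascii_decodable = False

# The same byte means different things in the two extended encodings.
as_cp1252 = b"\x80".decode("cp1252")    # the euro sign
as_latin1 = b"\x80".decode("latin-1")   # the C1 control character U+0080

print(latin1_bytes, ascii_encodable, ascii_decodable, as_cp1252, repr(as_latin1))
```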

Assuming "ASCII text" is safe everywhere. ASCII characters keep the same code point values in UTF-8, UTF-16, and UTF-32, but only UTF-8 encodes them as the same single bytes. UTF-16 uses 2 bytes per ASCII character, UTF-32 uses 4, and both may begin with a BOM, so an ASCII file opened as UTF-16 will produce garbage.
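
As a rough illustration, the following Python sketch encodes the same two ASCII characters under each encoding, then misreads plain ASCII bytes as UTF-16. The variable names are illustrative. The explicit little-endian codecs are used so that no BOM is added except where noted.

```python
text = "Hi"

utf8 = text.encode("utf-8")         # 1 byte per ASCII character
utf16 = text.encode("utf-16-le")    # 2 bytes per character, no BOM
utf32 = text.encode("utf-32-le")    # 4 bytes per character, no BOM
utf16_bom = text.encode("utf-16")   # generic codec prepends a 2-byte BOM

print(utf8, utf16, utf32, utf16_bom)

# Reading the raw ASCII bytes as UTF-16 fuses 'H' (0x48) and 'i' (0x69)
# into one 16-bit code unit, 0x6948, which is an unrelated CJK character.
misread = text.encode("ascii").decode("utf-16-le")
print(misread, hex(ord(misread)))
```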

Forgetting that control characters are ASCII. Tab (9), newline (10), carriage return (13), null (0), escape (27) — these are all ASCII values. The presence of control characters does not mean a file is "not ASCII."

ASCII in the Unicode Ecosystem

Unicode's first 128 code points (U+0000 to U+007F) are identical to ASCII. This was a deliberate choice to ensure backward compatibility. Every ASCII document is automatically valid UTF-8 without any changes to the byte values. This compatibility is one of the key reasons UTF-8 became the dominant web encoding.
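
This compatibility can be checked directly. The sketch below, with illustrative variable names, decodes all 128 ASCII byte values with both the `ascii` and `utf-8` codecs and compares the results.

```python
# Every possible ASCII byte value, 0 through 127.
all_ascii = bytes(range(128))

as_ascii = all_ascii.decode("ascii")
as_utf8 = all_ascii.decode("utf-8")

# Both decoders produce the same text.
same_text = as_ascii == as_utf8

# Each character's code point equals its original byte value.
same_code_points = [ord(c) for c in as_utf8] == list(range(128))

# Encoding back to UTF-8 reproduces the original bytes unchanged.
round_trip = as_utf8.encode("utf-8") == all_ascii

print(same_text, same_code_points, round_trip)
```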

The ASCII control characters also survive in Unicode, though several (like NUL, U+0000) have special handling in software. The full ASCII range (U+0000–U+007F) forms the Basic Latin block in Unicode; its printable characters occupy U+0020–U+007E.
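
One way to see how Unicode classifies the ASCII range is Python's standard `unicodedata` module, which reports each character's general category. This is only a sketch, and the variable names are illustrative.

```python
import unicodedata

# Collect the ASCII code points whose Unicode general category is Cc (control).
control_codes = [c for c in range(128) if unicodedata.category(chr(c)) == "Cc"]

# A few printable ASCII characters and their general categories.
sample_categories = {ch: unicodedata.category(ch) for ch in ("A", "a", "0", " ")}

print(len(control_codes), control_codes[0], control_codes[-1])
print(sample_categories)
```

The control list covers codes 0–31 plus 127 (DEL), matching the 33 control characters that ASCII defines.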