What is character encoding?

A system that maps characters to byte sequences for digital storage and transmission. Every text file has an encoding; the question is whether it is declared correctly.
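
As a small illustration of that mapping (a sketch using Python's built-in `str.encode`; the sample character is chosen here for illustration), the same character becomes different bytes under different encodings, which is why the declared encoding matters:

```python
# One character, three encodings, three different byte sequences.
text = "\u00e9"  # é (U+00E9), sample character chosen for illustration

utf8_bytes = text.encode("utf-8")        # two bytes: C3 A9
latin1_bytes = text.encode("latin-1")    # one byte: E9
utf16_bytes = text.encode("utf-16-le")   # two bytes: E9 00

# Decoding with the wrong declared encoding does not fail here;
# it silently produces the wrong characters (mojibake).
misread = utf8_bytes.decode("latin-1")
print(utf8_bytes, latin1_bytes, utf16_bytes, misread)
```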

What is percent-encoding (URL encoding)?

The encoding of non-ASCII and reserved characters in a URL by replacing each byte with %XX. The character is first encoded as UTF-8, then each resulting byte is percent-encoded: é → %C3%A9
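
The two-step process can be sketched with the standard library's `urllib.parse` module. This is a minimal illustration; the sample strings are made up for the example:

```python
from urllib.parse import quote, unquote

# quote() encodes the text as UTF-8 by default, then percent-encodes each byte.
pct = quote("\u00e9")  # é becomes '%C3%A9'

# With safe="" every reserved character is escaped too (space, &, and so on).
pct_reserved = quote("a b&c", safe="")  # 'a%20b%26c'

# unquote() reverses both steps: %XX to bytes, then UTF-8 to text.
round_trip = unquote(pct)
print(pct, pct_reserved, round_trip)
```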

Encoding

Base64

Binary-to-text encoding that represents binary data using 64 ASCII characters (A–Z, a–z, 0–9, +, /). Used for embedding binary data in text-based protocols like email (MIME) and data URIs.

What is Base64?

Base64 is a binary-to-text encoding scheme that represents arbitrary binary data using a set of 64 printable ASCII characters. The name comes directly from the size of the character set. It is defined in RFC 4648 (The Base16, Base32, and Base64 Data Encodings) and is one of the most widely deployed encoding schemes in computing, appearing in email attachments, data URIs, HTTP Basic Authentication, JSON Web Tokens, and cryptographic certificate formats.

The need for Base64 arises because many communication protocols and storage systems were designed for text and cannot reliably transmit arbitrary binary bytes. Email protocols (SMTP), HTTP headers, and other early internet protocols treat certain byte values as control characters with special meaning. Base64 sidesteps this problem by re-encoding the data so that every three input bytes become four printable ASCII characters.

The Base64 Alphabet

The standard Base64 alphabet (RFC 4648 Table 1) consists of:

Uppercase letters A–Z (26 characters)
Lowercase letters a–z (26 characters)
Digits 0–9 (10 characters)
Plus sign + and forward slash / (2 characters)

Each of these 64 characters represents exactly 6 bits of data (2⁶ = 64). Base64 processes input in 3-byte (24-bit) groups, splitting each group into four 6-bit values and looking up each value in the alphabet. Three input bytes always produce exactly four output characters, so the encoded form is about 33% larger than the input.
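
The grouping step can be written out by hand. The sketch below is for illustration only: `encode_group` and `ALPHABET` are names invented here, and the function deliberately handles only a full 3-byte group, with no padding logic.

```python
import base64
import string

# Standard alphabet (RFC 4648 Table 1): index 0-63 maps to one character.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(group: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    if len(group) != 3:
        raise ValueError("this sketch handles only full 3-byte groups")
    # Pack the three bytes into one 24-bit integer.
    bits = int.from_bytes(group, "big")
    # Slice the 24 bits into four 6-bit indices, most significant first.
    indices = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)

print(encode_group(b"Man"))  # TWFu
```

In practice `base64.b64encode` does this for input of any length; the hand-written version only makes the bit slicing visible.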

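Padding: If the input length is not a multiple of 3, the output is padded with one or two = characters to make the output length a multiple of 4. One leftover byte produces two characters followed by ==, and two leftover bytes produce three characters followed by =.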
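
A short sketch of the padding rule, using inputs of one, two, and three bytes. The helper `encoded_length` is a name introduced here for illustration; it computes the padded output length as 4 × ceil(n / 3).

```python
import base64

def encoded_length(n: int) -> int:
    """Length of padded Base64 output for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)

for raw in (b"A", b"AB", b"ABC"):
    out = base64.b64encode(raw)
    print(raw, out, encoded_length(len(raw)))
# b'A' b'QQ==' 4
# b'AB' b'QUI=' 4
# b'ABC' b'QUJD' 4
```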

import base64

# Encoding
data = b"Hello, Unicode!"
encoded = base64.b64encode(data)
# b'SGVsbG8sIFVuaWNvZGUh'

# Decoding
decoded = base64.b64decode(encoded)
# b'Hello, Unicode!'

URL-Safe Variant (Base64url)

The + and / characters have special meaning in URLs (+ is commonly decoded as a space in query strings, and / separates path segments), which makes standard Base64 unsafe for use in query strings, filenames, or JWTs. RFC 4648 defines Base64url, which replaces + with - and / with _. Padding = characters are often omitted in Base64url contexts.

# URL-safe encoding (used in JWTs and in URL or filename contexts)
encoded_url = base64.urlsafe_b64encode(data)
# b'SGVsbG8sIFVuaWNvZGUh' (same as above: this input yields no + or /)
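
The difference between the two alphabets only shows up when the output contains + or /. The sketch below uses sample bytes picked to trigger both, and adds two hypothetical helpers (`b64url_nopad`, `b64url_decode_nopad`) showing one common way to drop and restore padding; they are not part of the `base64` module.

```python
import base64

raw = b"\xfb\xff"  # sample bytes chosen so the output contains + and /

standard = base64.b64encode(raw)          # b'+/8='
url_safe = base64.urlsafe_b64encode(raw)  # b'-_8='

def b64url_nopad(payload: bytes) -> str:
    """Hypothetical helper: Base64url with trailing '=' removed."""
    return base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")

def b64url_decode_nopad(text: str) -> bytes:
    """Hypothetical helper: restore padding to a multiple of 4, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

print(standard, url_safe, b64url_nopad(raw))
```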

Common Use Cases

Data URIs: Browsers support inline resources encoded as data: URIs. A small PNG icon can be embedded directly in HTML or CSS without a network request:

data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==
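
Building and reading such a URI takes only a few lines. The helpers `make_data_uri` and `read_data_uri` below are hypothetical names for this sketch, and the parser assumes the simple `data:<mime>;base64,<payload>` form with no extra parameters.

```python
import base64

def make_data_uri(payload: bytes, mime_type: str) -> str:
    """Hypothetical helper: wrap raw bytes in a Base64 data: URI."""
    b64 = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{b64}"

def read_data_uri(uri: str) -> bytes:
    """Hypothetical helper: decode the payload of a ;base64 data: URI."""
    header, _, b64 = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a Base64 data URI")
    return base64.b64decode(b64)

png_signature = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature that starts every PNG
uri = make_data_uri(png_signature, "image/png")
print(uri)  # data:image/png;base64,iVBORw0KGgo=
```

Note that data URIs use the standard alphabet with padding, not Base64url.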

MIME email (RFC 2045): Binary attachments and non-ASCII message bodies are commonly Base64-encoded so that they survive transmission through SMTP servers that may modify bytes with values above 127 or rewrite CR/LF sequences.

HTTP Basic Authentication: The string username:password is Base64-encoded (not encrypted, so it is trivially reversible) and sent in the Authorization header. For the credentials user:password the header is: Authorization: Basic dXNlcjpwYXNzd29yZA==
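
A minimal sketch of how that header value is produced and how easily it is reversed. `basic_auth_header` is a hypothetical helper written for this example, and UTF-8 is assumed for the credential bytes.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Hypothetical helper: build the value of an Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

header_value = basic_auth_header("user", "password")
print(header_value)  # Basic dXNlcjpwYXNzd29yZA==

# Anyone who sees the header can recover the credentials:
recovered = base64.b64decode(header_value.split(" ", 1)[1]).decode("utf-8")
print(recovered)  # user:password
```

Because decoding needs no key, Basic Authentication is only safe over an encrypted transport such as HTTPS.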

JSON Web Tokens (JWT): Each section of a JWT (header, payload, signature) is Base64url-encoded without padding, and the three sections are joined with dots.
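
The encoding side of a JWT can be sketched as follows. This example covers only the Base64url step for the header and payload; it does not compute a signature, and the claim values and helper names (`b64url`, `b64url_decode`) are hypothetical.

```python
import base64
import json

def b64url(data: bytes) -> str:
    """Base64url-encode and strip the trailing '=' padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def b64url_decode(segment: str) -> bytes:
    """Restore padding to a multiple of 4, then decode."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))

header = {"alg": "HS256", "typ": "JWT"}
payload = {"sub": "1234567890"}  # hypothetical claims

# Compact JSON (no spaces) for each section, then Base64url without padding.
segments = [
    b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
    for obj in (header, payload)
]
signing_input = ".".join(segments)
# A complete token appends a third segment: b64url(signature over signing_input).

decoded_header = json.loads(b64url_decode(segments[0]))
print(signing_input)
```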

Quick Facts

Property	Value
Defined in	RFC 4648
Character set size	64 printable ASCII characters
Encoding ratio	3 bytes input → 4 characters output (33% overhead)
Padding character	`=` (one or two)
URL-safe variant	Base64url: replaces `+`→`-`, `/`→`_`
Key use cases	MIME email, data URIs, JWT, HTTP Basic Auth
Python module	`import base64` (stdlib)

Related terms

ASCII, character encoding, percent-encoding (URL encoding)

More in Encoding

ASCII

American Standard Code for Information Interchange. A 7-bit encoding covering 128 characters …

ASCII Art

Visual art created from text characters, originally limited to the 95 printable …

Big5

A Traditional Chinese character encoding used mainly in Taiwan and Hong Kong. It encodes approximately 13,000 CJK characters.

EBCDIC

Extended Binary Coded Decimal Interchange Code. An IBM mainframe encoding with non-contiguous letter ranges, still used in banking and enterprise mainframes.

EUC-KR

A Korean character encoding based on KS X 1001 that maps Hangul and Hanja to two-byte sequences.

GB2312 / GB18030

A family of Simplified Chinese encodings: GB2312 (6,763 characters) evolved into GBK and then into GB18030, the mandatory Chinese national standard, which is compatible with Unicode.

ISO 8859

A family of 8-bit single-byte encodings for different language groups. ISO 8859-1 (Latin-1) is the basis of the first 256 code points of Unicode.

Shift JIS

A Japanese character encoding that mixes single-byte ASCII/JIS Roman with double-byte JIS X 0208 kanji. It is still in use in legacy Japanese systems.

UCS-2

An obsolete fixed-width 2-byte encoding that covers only the BMP (U+0000–U+FFFF). It is the predecessor of UTF-16 and cannot represent supplementary characters.

UTF-16

A variable-length Unicode encoding that uses 2 or 4 bytes (1 or 2 code units of 16 …
