What is a Control Character?

Non-printing characters that control how text is processed. C0 (U+0000–U+001F): NUL, TAB, LF, CR, ESC. C1 (U+0080–U+009F): rarely used in modern Unicode. General category: Cc.

ASCII (American Standard Code for Information Interchange). A 7-bit encoding covering 128 characters (0–127): control characters, digits, Latin letters, and basic symbols.

Programming and Development

Null Character

U+0000 (NUL). The first Unicode/ASCII character, used as the string terminator in C/C++. Security risk: null byte injection can truncate strings in vulnerable systems.

2024-08-15 · Updated 2025-07-21

What Is the Null Character?

The Null Character is U+0000, the code point at position zero in the Unicode standard. It is also known as NUL, NULL, or \0. It was inherited from ASCII (where it is defined as 000 in octal, 0x00 in hex) and has the lowest possible code point value.
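As a quick check of the equivalences above, the following Python sketch compares the common spellings of U+0000 and looks up its general category with the standard `unicodedata` module. The variable names are illustrative only.

```python
import unicodedata

nul = "\x00"

# The octal, hex, and Unicode escapes all denote the same code point.
same_spelling = nul == "\0" == "\u0000" == chr(0)

code_point = ord(nul)                  # 0, the lowest code point
category = unicodedata.category(nul)   # "Cc" (control character)
ascii_bytes = nul.encode("ascii")      # the single byte 0x00

# unicodedata has no name for control characters, so pass a default
# instead of letting name() raise ValueError.
name = unicodedata.name(nul, "<control>")
```

The default passed to `unicodedata.name` is an arbitrary placeholder chosen for this sketch.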

In many systems and programming languages, the null character serves as a string terminator — a sentinel value that marks the end of a string. In C and related languages, strings are arrays of bytes terminated by a \0 byte. In higher-level languages like Python and JavaScript, strings are length-counted rather than null-terminated, so \0 is a valid character that can appear anywhere in a string.

In C and C-Style Languages

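// C: null-terminated strings
#include <stdio.h>
#include <string.h>

int main(void) {
    char str[] = "Hello";               // stored as: H e l l o \0
    printf("%d\n", str[5] == '\0');     // 1: str[5] is the null terminator

    // strlen() counts bytes until the first \0
    printf("%zu\n", strlen("Hello\0World"));  // 5: stops at the first \0
    printf("%s\n", "Hello\0World");           // prints "Hello" only
    return 0;
}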

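The null terminator convention is the source of null injection attacks: if a high-level language allows \0 in strings but a lower-level system truncates at it, the two layers disagree about what the string contains. An attacker can craft an input like "admin\0.jpg" that passes a check on the full string (it ends in .jpg) while the lower-level code acts on "admin" alone.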
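The mismatch between the two layers can be observed from Python with the standard `ctypes` module, which exposes a C-style view of a byte buffer. This is a minimal sketch using the "admin\0.jpg" input from the text. It illustrates the truncation only and is not a model of any particular vulnerable system.

```python
import ctypes

filename = "admin\x00.jpg"

# A high-level check sees the whole length-counted string.
passes_extension_check = filename.endswith(".jpg")

# A C-style consumer stops reading at the first NUL byte.
buf = ctypes.create_string_buffer(filename.encode("utf-8"))
seen_by_c = buf.value     # bytes up to (not including) the first NUL
full_buffer = buf.raw     # every byte, plus the terminator ctypes appends
```

Here `passes_extension_check` is true while `seen_by_c` is only `b"admin"`, which is the disagreement an attacker relies on.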

In Python

Python strings are length-counted; \0 is a valid string character:

s = "Hello\x00World"
len(s)          # 11 — counts the null
s[5]            # "\x00"
"\x00" in s     # True
print(s)        # writes the null too; most terminals render nothing for it ("HelloWorld")

# Null characters are rejected at OS-level interfaces, which expect C strings
import os
try:
    os.stat("file\x00name")
except ValueError as e:
    print(e)  # exact message varies by Python version and platform

# Checking for null characters in untrusted input
user_input = "report\x00.pdf"             # example value
has_null = "\x00" in user_input           # True: possible null injection
cleaned = user_input.replace("\x00", "")  # "report.pdf" (rejecting is usually safer than stripping)

In JavaScript

// JavaScript strings can contain \0
const s = "Hello\u0000World";
s.length;           // 11
s.charCodeAt(5);    // 0
s.includes("\0");   // true

// alert() and DOM APIs may truncate or mishandle null characters
console.log(s);     // rendering varies: the null is usually invisible or shown as an escape or placeholder

Security Implications

Null characters have been exploited in several vulnerability classes:

- Null byte injection in file paths: given "../etc/passwd\0.jpg", C-level fopen sees the path as "../etc/passwd" and ignores the .jpg suffix.
- SQL injection with nulls: some SQL parsers or ORMs may mishandle null bytes in query parameters.
- LDAP injection: null bytes can terminate LDAP filter strings prematurely.

# Secure input validation: reject null bytes
def validate_filename(name: str) -> str:
    if "\x00" in name:
        raise ValueError("Filename contains null byte")
    return name

In File Formats

Null bytes have specific roles in binary file formats: padding in fixed-width fields, record separators in some database formats, and terminators in null-padded string fields (common in C structs serialized to disk).

# Reading a fixed-width null-padded field from binary
raw_field = b"Alice\x00\x00\x00\x00\x00"  # 10 bytes, null-padded
name = raw_field.rstrip(b"\x00").decode("utf-8")  # "Alice"
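For fields that come from serialized C structs, the standard `struct` module can handle both the padding and the unpacking. The sketch below assumes a hypothetical record layout (a 10-byte name followed by a 32-bit little-endian integer), which is not taken from any real file format.

```python
import struct

# Hypothetical layout: 10-byte name field, then a 32-bit little-endian age.
RECORD = struct.Struct("<10sI")

# The "s" format pads a short value with NUL bytes up to the field width.
packed = RECORD.pack(b"Alice", 30)
raw_name, age = RECORD.unpack(packed)

# Keep only the bytes before the first NUL, as a C reader would.
name = raw_name.split(b"\x00", 1)[0].decode("utf-8")
```

Splitting at the first NUL differs from `rstrip(b"\x00")` when a field holds stale bytes after its terminator: `rstrip` removes only trailing nulls, while the split discards everything after the first one.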

Unicode Status

In Unicode, U+0000 is a valid code point but a restricted character in several contexts:
- XML forbids U+0000 in documents.
- The UTF-8 encoding of U+0000 is the single byte 0x00. Java's Modified UTF-8 instead encodes it as the two bytes 0xC0 0x80 to avoid embedded nulls.
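These points can be checked from Python. The sketch below uses the standard library's `xml.etree.ElementTree` parser as one example of an XML parser, and `to_modified_utf8_nul` is a hypothetical helper written for this illustration. It covers only the NUL rule of Modified UTF-8, not the rest of that encoding.

```python
import xml.etree.ElementTree as ET

# Standard UTF-8 encodes U+0000 as the single byte 0x00.
utf8_nul = "\x00".encode("utf-8")

def to_modified_utf8_nul(text: str) -> bytes:
    """Hypothetical helper: UTF-8 with each NUL rewritten as 0xC0 0x80.

    Only the NUL rule is sketched; Modified UTF-8's treatment of
    supplementary characters is not implemented here.
    """
    # In UTF-8 the byte 0x00 occurs only as the encoding of U+0000,
    # so a bytewise replacement is safe.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")

modified = to_modified_utf8_nul("A\x00B")

# A strict UTF-8 decoder rejects the two-byte form.
try:
    b"\xc0\x80".decode("utf-8")
    overlong_accepted = True
except UnicodeDecodeError:
    overlong_accepted = False

# An XML parser refuses a document containing U+0000.
try:
    ET.fromstring("<a>\x00</a>")
    xml_accepts_nul = True
except ET.ParseError:
    xml_accepts_nul = False
```

Because strict decoders reject 0xC0 0x80, data written as Modified UTF-8 generally needs a decoder that knows about that form.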

Quick Facts

Property	Value
Code point	U+0000
Name	NULL (NUL)
ASCII equivalent	NUL (0x00), written `\0` in C-style escapes
C usage	String terminator
Python/JS	Valid string character (length-counted strings)
UTF-8 encoding	`0x00` (single byte)
XML	Forbidden in XML documents
Security risk	Null injection — always sanitize in security-sensitive contexts