Null character
U+0000 (NUL). The first Unicode/ASCII character, used as a string terminator in C/C++. Security risk: null byte injection can truncate strings in vulnerable systems.
What Is the Null Character?
The Null Character is U+0000, the code point at position zero in the Unicode standard. It is also known as NUL, NULL, or \0. It was inherited from ASCII (where it is defined as 000 in octal, 0x00 in hex) and has the lowest possible code point value.
In many systems and programming languages, the null character serves as a string terminator — a sentinel value that marks the end of a string. In C and related languages, strings are arrays of bytes terminated by a \0 byte. In higher-level languages like Python and JavaScript, strings are length-counted rather than null-terminated, so \0 is a valid character that can appear anywhere in a string.
In C and C-Style Languages
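// C: null-terminated strings
#include <stdio.h>
#include <string.h>

int main(void) {
    char str[] = "Hello";  // stored as: H e l l o \0
    printf("%d\n", str[5] == '\0');           // 1: str[5] is the null terminator
    // strlen() counts bytes until the first \0
    printf("%zu\n", strlen("Hello\0World"));  // 5: stops at the first \0
    printf("%s\n", "Hello\0World");           // prints "Hello" only
    return 0;
}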
The null terminator convention is the source of null injection attacks in security: if a high-level language allows \0 in strings but a lower-level system truncates at it, an attacker can craft inputs like "admin\0.jpg" to confuse the system.
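The mismatch between the two views can be reproduced from Python with the standard ctypes module. This is a minimal sketch, not a real exploit: the payload is the example input from the paragraph above, and the variable names python_view and c_view are illustrative only.

```python
import ctypes

# A length-counted Python bytes object keeps every byte, including the NUL.
payload = b"admin\x00.jpg"

# create_string_buffer copies the bytes into a C char array and appends a final NUL.
buf = ctypes.create_string_buffer(payload)

python_view = payload   # what the high-level code sees
c_view = buf.value      # what C string functions see: stops at the first NUL

print(len(python_view), python_view)  # 10 b'admin\x00.jpg'
print(len(c_view), c_view)            # 5 b'admin'
```

The bytes after the NUL are still in memory (buf.raw contains them), but any code that treats the buffer as a C string never reaches them.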
In Python
Python strings are length-counted; \0 is a valid string character:
s = "Hello\x00World"
len(s)        # 11: the null counts as one character
s[5]          # "\x00"
"\x00" in s   # True
print(s)      # writes Hello, a NUL, then World; most terminals render the NUL as nothing

# Null bytes are rejected by OS-level interfaces that expect C strings
import os
try:
    os.stat("file\x00name")
except ValueError as e:
    print(e)  # e.g. "embedded null byte" (exact wording varies by version and platform)

# Checking for null bytes
user_input = "report\x00.pdf"             # example untrusted input
has_null = "\x00" in user_input           # True: security check for null injection
cleaned = user_input.replace("\x00", "")  # "report.pdf": strip null bytes
In JavaScript
// JavaScript strings can contain \0
const s = "Hello\u0000World";
s.length; // 11
s.charCodeAt(5); // 0
s.includes("\0"); // true
// alert() and DOM APIs may truncate or mishandle null characters
console.log(s); // Hello, a NUL, then World (most consoles show the NUL as nothing or as a placeholder glyph)
Security Implications
Null characters have been exploited in several vulnerability classes:
- Null byte injection in file paths: "../etc/passwd\0.jpg". A C-level fopen sees the path as "../etc/passwd", ignoring the .jpg suffix.
- SQL injection with nulls: Some SQL parsers or ORMs may mishandle null bytes in query parameters.
- LDAP injection: Null bytes can terminate LDAP filter strings prematurely.
# Secure input validation: reject null bytes
def validate_filename(name: str) -> str:
    if "\x00" in name:
        raise ValueError("Filename contains null byte")
    return name
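To see why the null check has to run before any other test on the name, the sketch below compares a naive extension check with what a NUL-terminated API would receive. The helpers c_string_view and is_safe_upload_name, and the allowed extensions, are hypothetical names invented for this illustration.

```python
def c_string_view(name: str) -> str:
    """Hypothetical helper: the part of the name a NUL-terminated API would see."""
    return name.split("\x00", 1)[0]


def is_safe_upload_name(name: str) -> bool:
    """Hypothetical policy: reject null bytes first, then check the extension."""
    if "\x00" in name:
        return False
    return name.lower().endswith((".jpg", ".png"))


malicious = "../etc/passwd\x00.jpg"

naive_ok = malicious.endswith(".jpg")   # True: the suffix check alone is fooled
seen_by_c = c_string_view(malicious)    # "../etc/passwd"
strict_ok = is_safe_upload_name(malicious)  # False: rejected because of the NUL

print(naive_ok, repr(seen_by_c), strict_ok)
```

This sketch only covers the null byte; it does not address the "../" traversal in the same input, which would need its own check.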
In File Formats
Null bytes have specific roles in binary file formats: padding in fixed-width fields, record separators in some database formats, and terminators in null-padded string fields (common in C structs serialized to disk).
# Reading a fixed-width null-padded field from binary
raw_field = b"Alice\x00\x00\x00\x00\x00" # 10 bytes, null-padded
name = raw_field.rstrip(b"\x00").decode("utf-8") # "Alice"
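For a whole record rather than a single field, the standard struct module can pack and unpack null-padded strings directly. The sketch below assumes a made-up layout (a 10-byte name followed by a little-endian 32-bit id); RECORD, pack_record, and unpack_record are illustrative names, not part of any real format.

```python
import struct
from typing import Tuple

# Assumed layout for illustration: 10-byte NUL-padded name, then a little-endian uint32 id.
RECORD = struct.Struct("<10sI")


def pack_record(name: str, user_id: int) -> bytes:
    # The "s" format pads short values with NUL bytes up to the field width.
    return RECORD.pack(name.encode("utf-8"), user_id)


def unpack_record(data: bytes) -> Tuple[str, int]:
    raw_name, user_id = RECORD.unpack(data)
    # Cut at the first NUL, which matches how C would read the field.
    name = raw_name.split(b"\x00", 1)[0].decode("utf-8")
    return name, user_id


blob = pack_record("Alice", 42)
print(blob)                 # b'Alice\x00\x00\x00\x00\x00*\x00\x00\x00'
print(unpack_record(blob))  # ('Alice', 42)

# rstrip() and split() differ when a NUL appears inside the field:
odd_field = b"Al\x00ice\x00\x00\x00\x00"
stripped = odd_field.rstrip(b"\x00")        # b'Al\x00ice' (embedded NUL kept)
c_style = odd_field.split(b"\x00", 1)[0]    # b'Al' (stops at first NUL, like C)
```

Which behavior is right depends on the format: rstrip suits fields that are padded but may legitimately contain zero bytes, while cutting at the first NUL matches fields written from C strings.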
Unicode Status
In Unicode, U+0000 is a valid code point but a restricted character in several contexts:
- XML forbids U+0000 in documents.
- The UTF-8 encoding of U+0000 is the single byte 0x00. Java's Modified UTF-8 instead encodes it as the two-byte sequence 0xC0 0x80, so that encoded strings never contain an embedded null byte.
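The difference between the two encodings of U+0000 can be checked in a few lines. The helper to_modified_utf8_nul_only is a hypothetical name and covers only the null-character rule; real Modified UTF-8 also encodes supplementary characters differently, which this sketch ignores.

```python
def to_modified_utf8_nul_only(text: str) -> bytes:
    """Hypothetical helper: standard UTF-8, except U+0000 becomes C0 80.

    Safe as a byte-level replace because 0x00 appears in UTF-8 only as
    the encoding of U+0000. Supplementary characters are NOT handled.
    """
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


text = "A\x00B"
standard = text.encode("utf-8")             # b'A\x00B'
modified = to_modified_utf8_nul_only(text)  # b'A\xc0\x80B'

# A strict UTF-8 decoder treats C0 80 as invalid (it is an overlong form).
try:
    b"\xc0\x80".decode("utf-8")
    strict_decoder_rejects = False
except UnicodeDecodeError:
    strict_decoder_rejects = True

print(standard, modified, strict_decoder_rejects)
```

Because strict decoders reject the two-byte form, data written as Modified UTF-8 has to be read back with a decoder that knows about it.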
Quick Facts
| Property | Value |
|---|---|
| Code point | U+0000 |
| Name | NULL (NUL) |
| ASCII equivalent | NUL (0x00), written \0 in C |
| C usage | String terminator |
| Python/JS | Valid string character (length-counted strings) |
| UTF-8 encoding | 0x00 (single byte) |
| XML | Forbidden in XML documents |
| Security risk | Null injection — always sanitize in security-sensitive contexts |