What Is a Control Character?

Non-printable characters that control text processing. C0 (U+0000–U+001F): NUL, TAB, LF, CR, ESC, and others; C1 (U+0080–U+009F): rarely used in modern Unicode. General Category: Cc.
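In Python, the standard unicodedata module reports the General Category directly, so the ranges above can be checked in a few lines. This is a small illustrative sketch. is_c0_or_c1 is a hypothetical helper, not a library function. Note that DEL (U+007F) is also Cc even though it lies outside both ranges.

```python
import unicodedata

# Classify a few code points by Unicode General Category.
# C0 and C1 controls (and DEL, U+007F) all report "Cc".
samples = {
    "NUL": "\x00",
    "TAB": "\t",
    "LF": "\n",
    "CR": "\r",
    "ESC": "\x1b",
    "DEL": "\x7f",
    "NEL (C1)": "\x85",
    "letter A": "A",
}

for label, ch in samples.items():
    print(f"U+{ord(ch):04X} {label:<9} {unicodedata.category(ch)}")

def is_c0_or_c1(ch: str) -> bool:
    """Hypothetical helper: True for U+0000-U+001F (C0) or U+0080-U+009F (C1)."""
    cp = ord(ch)
    return cp <= 0x1F or 0x80 <= cp <= 0x9F
```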

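ASCII (American Standard Code for Information Interchange): a 7-bit encoding covering 128 characters (0–127), including control characters, digits, basic Latin letters, and basic symbols.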
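Because ASCII stops at 127, checking whether text is ASCII comes down to checking that every code point is below 128. The sketch below uses only built-in str methods and the standard "ascii" codec. The sample strings are arbitrary.

```python
# ASCII covers code points 0-127, so every ASCII character fits in 7 bits.
text_ok = "Hello, World! 123"
text_bad = "café"

print(text_ok.isascii())                     # True (str.isascii needs Python 3.7+)
print(text_bad.isascii())                    # False: "é" is U+00E9 (233)
print(max(ord(c) for c in text_ok) < 128)    # True
print("\x00".encode("ascii"))                # b'\x00': NUL is ASCII code 0

try:
    text_bad.encode("ascii")
except UnicodeEncodeError as e:
    print("not ASCII:", e.reason)            # not ASCII: ordinal not in range(128)
```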


Null Character

U+0000 (NUL), the first Unicode/ASCII character. It is used as the string terminator in C/C++. Security risk: null byte injection can truncate strings in vulnerable systems.

2024-08-15 · Updated 2025-07-21

What Is the Null Character?

The Null Character is U+0000, the code point at position zero in the Unicode standard. It is also known as NUL, NULL, or \0. It was inherited from ASCII (where it is defined as 000 in octal, 0x00 in hex) and has the lowest possible code point value.
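The different notations all name the same code point. Python accepts several of them in string literals, and the numeric views match the octal and hex values above. The default passed to unicodedata.name is there because U+0000 has no formal character name. "NULL" is only a name alias.

```python
import unicodedata

# Equivalent ways to write U+0000 in a Python string literal.
nul_forms = ["\0", "\x00", "\u0000", "\U00000000", chr(0)]
print(all(form == nul_forms[0] for form in nul_forms))  # True
print(len(set(nul_forms)))      # 1: all five are the same one-character string

# Numeric views of the code point.
print(ord("\0"), hex(ord("\0")), oct(ord("\0")))  # 0 0x0 0o0

# U+0000 has no formal Unicode character name ("NULL" is a name alias),
# so unicodedata.name() falls back to the supplied default.
print(unicodedata.name("\0", "<no name>"))  # <no name>
```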

In many systems and programming languages, the null character serves as a string terminator — a sentinel value that marks the end of a string. In C and related languages, strings are arrays of bytes terminated by a \0 byte. In higher-level languages like Python and JavaScript, strings are length-counted rather than null-terminated, so \0 is a valid character that can appear anywhere in a string.
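The contrast between length-counted and null-terminated strings can be seen from Python itself. Python bytes keep the embedded NUL. A ctypes char buffer, which is read the way C reads a char array, stops at it. c_strlen is a hypothetical pure-Python stand-in for C's strlen(). Unlike C, it returns the full length when no NUL is present instead of reading past the end.

```python
import ctypes

data = b"Hello\0World"

# Python bytes/str are length-counted: the NUL is just another element.
print(len(data))                  # 11

# A C char buffer is read as a null-terminated string: .value stops at the first NUL.
buf = ctypes.create_string_buffer(data)   # copies data and appends a terminating NUL
print(buf.value)                  # b'Hello'
print(len(buf))                   # 12 (11 bytes + terminator)
print(buf.raw)                    # b'Hello\x00World\x00'

# Hypothetical pure-Python emulation of strlen(): count bytes up to the first NUL.
def c_strlen(b: bytes) -> int:
    end = b.find(b"\0")
    return len(b) if end == -1 else end

print(c_strlen(data))             # 5
```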

In C and C-Style Languages

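// C: null-terminated strings
#include <stdio.h>
#include <string.h>

int main(void) {
    char str[] = "Hello";  // stored as: H e l l o \0 (sizeof str == 6)
    printf("%d\n", str[5] == '\0');  // 1: str[5] is the null terminator

    // strlen() counts bytes until the first \0
    printf("%zu\n", strlen("Hello\0World"));  // 5: stops at the first \0
    printf("%s\n", "Hello\0World");           // prints "Hello" only
    return 0;
}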

The null terminator convention is the root of null byte injection attacks. If a high-level language allows \0 in strings but a lower-level system truncates at it, an attacker can craft an input like "admin\0.jpg". That input passes a high-level check (for example, "ends with .jpg"), while the lower-level system sees only "admin".
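The mismatch can be simulated in Python. The extension check runs on the full string, and a hypothetical helper, as_c_sees_it, mimics an API that stops at the first NUL. This is an emulation only. A real C API would receive encoded bytes, not a Python str.

```python
# The high-level check sees the whole, length-counted string...
user_supplied = "admin\0.jpg"
passes_extension_check = user_supplied.endswith(".jpg")

# ...but an API that stops at the first NUL sees only the prefix.
# Hypothetical emulation: real C APIs receive encoded bytes, not a Python str.
def as_c_sees_it(s: str) -> str:
    return s.split("\0", 1)[0]

print(passes_extension_check)       # True
print(as_c_sees_it(user_supplied))  # admin
```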

In Python

Python strings are length-counted; \0 is a valid string character:

s = "Hello\x00World"
len(s)          # 11: counts the null
s[5]            # "\x00"
"\x00" in s     # True
print(s)        # many terminals render the NUL invisibly, so this may look like HelloWorld

# Null bytes are rejected at C-level OS interfaces
import os
try:
    os.stat("file\x00name")   # raises ValueError
except ValueError as e:
    print(e)  # e.g. "embedded null byte" (wording varies by platform and version)

# Checking for and stripping null bytes
user_input = "report\x00.txt"            # example untrusted input
"\x00" in user_input                     # True: possible null byte injection
user_input.replace("\x00", "")           # "report.txt": null bytes stripped

In JavaScript

// JavaScript strings can contain \0
const s = "Hello\u0000World";
s.length;           // 11
s.charCodeAt(5);    // 0
s.includes("\0");   // true

// alert() and some DOM APIs may truncate or mishandle null characters
console.log(s);     // the NUL is usually rendered invisibly, e.g. "HelloWorld"

Security Implications

Null characters have been exploited in several vulnerability classes:

- Null byte injection in file paths: "../etc/passwd\0.jpg". A C-level fopen() sees the path as "../etc/passwd" and ignores the .jpg suffix.
- SQL injection with nulls: some SQL parsers or ORMs may mishandle null bytes in query parameters.
- LDAP injection: null bytes can terminate LDAP filter strings prematurely.
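For LDAP, the usual defense is escaping rather than rejecting. RFC 4515 requires NUL, "*", "(", ")" and "\" in a filter assertion value to be written as a backslash plus two hex digits. The sketch below assumes that rule. escape_ldap_filter_value is a hypothetical helper name. Libraries such as python-ldap ship an equivalent (ldap.filter.escape_filter_chars).

```python
# RFC 4515: these characters in an LDAP filter assertion value must be
# written as a backslash followed by two hex digits.
_LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}

def escape_ldap_filter_value(value: str) -> str:
    """Hypothetical helper: escape a value before inserting it into a filter."""
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)

username = "alice\0)(uid=*"
print(f"(uid={escape_ldap_filter_value(username)})")
# (uid=alice\00\29\28uid=\2a)
```

Escaping character by character means the backslash replacement can never double-escape output from another rule.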

# Secure input validation: reject null bytes
def validate_filename(name: str) -> str:
    if "\x00" in name:
        raise ValueError("Filename contains null byte")
    return name

In File Formats

Null bytes have specific roles in binary file formats: padding in fixed-width fields, record separators in some database formats, and terminators in null-padded string fields (common in C structs serialized to disk).

# Reading a fixed-width null-padded field from binary
raw_field = b"Alice\x00\x00\x00\x00\x00"  # 10 bytes, null-padded
name = raw_field.rstrip(b"\x00").decode("utf-8")  # "Alice"
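Writing such a field is the mirror image. Python's struct module pads an "Ns" field with NUL bytes when the value is shorter than N. The record layout below (a 10-byte name plus a 32-bit age) is a hypothetical example, not a real file format.

```python
import struct

# Hypothetical record layout, like the C struct { char name[10]; uint32_t age; }
# serialized little-endian with no alignment padding ("<" disables it).
RECORD = struct.Struct("<10sI")

packed = RECORD.pack("Alice".encode("utf-8"), 30)
print(len(packed))        # 14
print(packed[:10])        # b'Alice\x00\x00\x00\x00\x00' ("10s" pads with NULs)

raw_name, age = RECORD.unpack(packed)
name = raw_name.rstrip(b"\x00").decode("utf-8")
print(name, age)          # Alice 30
```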

Unicode Status

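In Unicode, U+0000 is a valid code point, but it is restricted in several contexts:
- XML (1.0 and 1.1) forbids U+0000 in documents, even as the character reference &#0;.
- Standard UTF-8 encodes U+0000 as the single byte 0x00. The two-byte sequence 0xC0 0x80 is an overlong encoding and is invalid in standard UTF-8.
- Java's Modified UTF-8 (used by JNI and DataOutputStream.writeUTF) deliberately encodes U+0000 as 0xC0 0x80, so encoded strings never contain an embedded 0x00 byte.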
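These rules can be checked with Python's standard UTF-8 codec and the expat-based ElementTree parser. nul_as_modified_utf8 is a hypothetical sketch of the NUL rule only. It assumes the input has no supplementary characters, because Modified UTF-8 encodes those differently as well.

```python
import xml.etree.ElementTree as ET

# Standard UTF-8: U+0000 is the single byte 0x00.
print("\0".encode("utf-8"))              # b'\x00'

# The overlong two-byte form 0xC0 0x80 is rejected by a conforming decoder.
overlong_rejected = False
try:
    b"\xc0\x80".decode("utf-8")
except UnicodeDecodeError as e:
    overlong_rejected = True
    print("rejected:", e.reason)         # rejected: invalid start byte

# Sketch of Modified UTF-8's NUL rule only. Assumption: no supplementary
# characters in the input (Modified UTF-8 also encodes those differently).
def nul_as_modified_utf8(s: str) -> bytes:
    # In UTF-8 a 0x00 byte can only come from U+0000, so a byte replace is safe.
    return s.encode("utf-8").replace(b"\x00", b"\xc0\x80")

print(nul_as_modified_utf8("a\0b"))      # b'a\xc0\x80b'

# XML parsers reject U+0000, even written as the character reference &#0;.
xml_rejected = False
try:
    ET.fromstring("<a>&#0;</a>")
except ET.ParseError as e:
    xml_rejected = True
    print("XML error:", e)
```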

Quick Facts

Property	Value
Code point	U+0000
Name	NULL (NUL)
ASCII	NUL, code 0 (0x00)
Escape sequence	`\0` (C, Python, JavaScript)
C usage	String terminator
Python/JS	Valid string character (length-counted strings)
UTF-8 encoding	`0x00` (single byte)
XML	Forbidden in XML documents
Security risk	Null injection — always sanitize in security-sensitive contexts

More in Programming & Development

Java Unicode

Java strings use UTF-16 internally. A char is a 16-bit code unit, so a single char can hold only a BMP character. For supplementary …

Python Unicode

Python 3 uses Unicode strings by default (str = UTF-8 internally via …

Rust Unicode

Rust strings (str/String) are guaranteed valid UTF-8. char type represents a Unicode …

Unicode Regular Expressions

Regular expression patterns that use Unicode properties: \p{L} (any letter), \p{Script=Greek} (Greek script), \p{Emoji}. Support varies by language and regex engine.

Unicode Escape Sequences

Syntax for writing Unicode characters in source code, which varies by language: \u2713 (Python/Java/JS), \u{2713} (JS/Ruby/Rust), \U00012345 (Python/C).

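Invisible Characters

Characters with no visible glyph: whitespace, zero-width characters, control characters, and format characters. They can enable security problems such as spoofing and text steganography.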

Mojibake

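Surrogate Pair

Two 16-bit code units (a high surrogate U+D800–U+DBFF followed by a low surrogate U+DC00–U+DFFF) that together encode a supplementary character in UTF-16: 😀 = D83D DE00.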
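The pair comes from simple arithmetic. Subtract 0x10000, then split the remaining 20 bits into a high 10-bit half and a low 10-bit half. to_surrogate_pair is a hypothetical helper. The last line cross-checks it against Python's built-in UTF-16 codec.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Hypothetical helper: split U+10000-U+10FFFF into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000                    # 20-bit value
    high = 0xD800 + (v >> 10)           # top 10 bits
    low = 0xDC00 + (v & 0x3FF)          # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(ord("😀"))  # U+1F600
print(f"{high:04X} {low:04X}")           # D83D DE00

# Cross-check against Python's own UTF-16 encoder (big-endian, no BOM).
print("😀".encode("utf-16-be").hex())    # d83dde00
```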

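String

A sequence of characters in a programming language. Internal representations vary: UTF-8 (Go, Rust), UTF-16 (Java, JavaScript, C#), or, in CPython 3.3+, a flexible Latin-1/UCS-2/UCS-4 representation chosen per string (PEP 393).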

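String Length Ambiguity

The "length" of a Unicode string depends on the unit being counted: code units (JavaScript .length), code points (Python len()), or grapheme clusters. 👨‍👩‍👧‍👦 = 7 code points, 1 grapheme cluster.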
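The counts can be compared in Python for the family emoji. It is written here with explicit escapes: four person emoji joined by three U+200D zero-width joiners. UTF-16 code units, the unit behind JavaScript's .length, are counted by encoding without a BOM and halving the byte count. Python's standard library has no grapheme-cluster segmentation. Counting the single cluster needs a third-party tool, such as the regex module's \X pattern, which is not shown here.

```python
# 👨‍👩‍👧‍👦: MAN, ZWJ, WOMAN, ZWJ, GIRL, ZWJ, BOY
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

print(len(family))                            # 7 code points (Python len())
print(len(family.encode("utf-16-le")) // 2)   # 11 UTF-16 code units (JS .length)
print(len(family.encode("utf-8")))            # 25 bytes in UTF-8
```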
