Null Character

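U+0000 (NUL) is the first Unicode/ASCII character and serves as the string terminator in C/C++. Security risk: null byte injection can truncate strings in vulnerable systems.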

What Is the Null Character?

The Null Character is U+0000, the code point at position zero in the Unicode standard. It is also known as NUL, NULL, or \0. It was inherited from ASCII (where it is defined as 000 in octal, 0x00 in hex) and has the lowest possible code point value.
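The values above can be checked with a few lines of Python. This is only an illustrative sketch; the variable name nul is chosen here and does not come from any library.

```python
import unicodedata

nul = "\x00"

# The same character written with different escapes
assert nul == "\0" == "\u0000" == chr(0)

print(ord(nul))                     # 0
print(f"{ord(nul):03o}")            # 000 (octal)
print(f"0x{ord(nul):02X}")          # 0x00 (hex)
print(unicodedata.category(nul))    # Cc (control character)
print(nul.encode("ascii"))          # b'\x00'
```

One caveat when writing "\0" in source code: it is an octal escape, so a digit placed directly after it (as in "\012") is read as part of the same escape.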

In many systems and programming languages, the null character serves as a string terminator — a sentinel value that marks the end of a string. In C and related languages, strings are arrays of bytes terminated by a \0 byte. In higher-level languages like Python and JavaScript, strings are length-counted rather than null-terminated, so \0 is a valid character that can appear anywhere in a string.
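The difference between the two conventions can be sketched in Python by comparing the stored length of a byte string with the length a sentinel-based scan would report. The helper c_strlen below is hypothetical and written for illustration; it only imitates what C's strlen() does.

```python
def c_strlen(buf: bytes) -> int:
    """Length as a sentinel scan would see it: bytes before the first 0x00."""
    end = buf.find(b"\x00")
    if end == -1:
        # A real C strlen() would keep reading past the buffer here
        raise ValueError("no null terminator in buffer")
    return end


data = b"Hello\x00World\x00"
print(len(data))        # 12: length-counted view, nulls included
print(c_strlen(data))   # 5: sentinel view, stops at the first null
```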

In C and C-Style Languages

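// C: null-terminated strings
#include <stdio.h>
#include <string.h>

int main(void) {
    char str[] = "Hello";
    // Stored as: H e l l o \0
    // str[5] == '\0' (null terminator), so sizeof(str) == 6

    // strlen() counts bytes until the first \0
    size_t n = strlen("Hello\0World");  // 5: stops at the first \0
    printf("%zu %s\n", n, str);         // prints "5 Hello"
    printf("%s\n", "Hello\0World");     // prints "Hello" only
    return 0;
}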

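The null terminator convention is the source of null injection attacks in security: if a high-level language allows \0 in strings but a lower-level system truncates at it, the two layers disagree about what the string contains. An attacker can craft an input like "admin\0.jpg" that passes a high-level check on the full string (it ends in ".jpg") but is read as "admin" by the lower-level code.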
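The mismatch can be reproduced from Python with the standard ctypes module, which reads byte strings the way C does. This is a minimal sketch of the disagreement between the two layers, not a real exploit; the variable names are illustrative.

```python
import ctypes

filename = b"admin\x00.jpg"

# The high-level check looks at all 10 bytes and sees the .jpg suffix
passes_check = filename.endswith(b".jpg")

# c_char_p.value reads a C string, that is, up to the first null byte
seen_by_c = ctypes.c_char_p(filename).value

print(passes_check)  # True
print(seen_by_c)     # b'admin'
```

The check and the consumer each behave correctly on their own; the problem comes from validating one view of the string and acting on another.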

In Python

Python strings are length-counted; \0 is a valid string character:

s = "Hello\x00World"
len(s)          # 11 — counts the null
s[5]            # "\x00"
"\x00" in s     # True
print(s)        # writes the null too; most terminals render nothing for it, so it often looks like "HelloWorld"

# Null bytes are rejected by functions that pass strings to C-level OS APIs
import os
try:
    os.stat("file\x00name")   # raises ValueError
except ValueError as e:
    print(e)  # e.g. "embedded null byte" (exact wording varies by Python version)

# Checking for null bytes
user_input = "report\x00.txt"               # example untrusted input
has_null = "\x00" in user_input             # True: security check for null injection
cleaned = user_input.replace("\x00", "")    # "report.txt": null bytes stripped

In JavaScript

// JavaScript strings can contain \0
const s = "Hello\u0000World";
s.length;           // 11
s.charCodeAt(5);    // 0
s.includes("\0");   // true

// alert() and DOM APIs may truncate or mishandle null characters
console.log(s);     // the null is still in the output; many consoles render it as nothing or as a placeholder glyph

Security Implications

Null characters have been exploited in several vulnerability classes:

  1. Null byte injection in file paths: "../etc/passwd\0.jpg" — C-level fopen sees the path as "../etc/passwd", ignoring the .jpg suffix.
  2. SQL injection with nulls: Some SQL parsers or ORMs may mishandle null bytes in query parameters.
  3. LDAP injection: Null bytes can terminate LDAP filter strings prematurely.

# Secure input validation: reject null bytes
def validate_filename(name: str) -> str:
    if "\x00" in name:
        raise ValueError("Filename contains null byte")
    return name
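Rejecting null bytes, as the validator above does, is generally safer than stripping them after other checks have run. The sketch below illustrates why, using a toy blocklist; BLOCKED, is_allowed, strip_after_check and reject_nulls_then_check are all hypothetical names invented for this example.

```python
BLOCKED = ("passwd",)  # hypothetical blocklist, for illustration only


def is_allowed(name: str) -> bool:
    """Toy check: refuse names containing a blocked word."""
    return not any(word in name for word in BLOCKED)


def strip_after_check(name: str) -> str:
    """Flawed order: validate first, then strip nulls."""
    if not is_allowed(name):
        raise ValueError("blocked name")
    return name.replace("\x00", "")


def reject_nulls_then_check(name: str) -> str:
    """Safer order: refuse null bytes outright, then validate."""
    if "\x00" in name:
        raise ValueError("name contains null byte")
    if not is_allowed(name):
        raise ValueError("blocked name")
    return name


crafted = "pass\x00wd"
print(strip_after_check(crafted))  # passwd: the blocked word reappears
try:
    reject_nulls_then_check(crafted)
except ValueError as exc:
    print("rejected:", exc)
```

If stripping is unavoidable, a reasonable assumption is that it should happen before validation, so that every check sees the same string the rest of the system will use.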

In File Formats

Null bytes have specific roles in binary file formats: padding in fixed-width fields, record separators in some database formats, and terminators in null-padded string fields (common in C structs serialized to disk).

# Reading a fixed-width null-padded field from binary
raw_field = b"Alice\x00\x00\x00\x00\x00"  # 10 bytes, null-padded
name = raw_field.rstrip(b"\x00").decode("utf-8")  # "Alice"
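The same kind of field can be written and read as part of a whole record with the standard struct module, whose "s" format pads short values with null bytes. The record layout below (a 10-byte name followed by a 4-byte little-endian integer) is an assumption made up for this sketch, not a real file format.

```python
import struct

# Hypothetical record layout: 10-byte null-padded name, then a 4-byte
# little-endian unsigned integer
RECORD = struct.Struct("<10sI")

packed = RECORD.pack(b"Alice", 30)
print(RECORD.size)     # 14
print(packed[:10])     # b'Alice\x00\x00\x00\x00\x00'

raw_name, age = RECORD.unpack(packed)
# Cut at the first null, as C code reading the field would
name = raw_name.split(b"\x00", 1)[0].decode("utf-8")
print(name, age)       # Alice 30
```

Splitting at the first null mirrors C semantics, whereas rstrip(b"\x00") keeps any nulls embedded in the middle of the field. Which behaviour is wanted depends on the format being read.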

Unicode Status

In Unicode, U+0000 is a valid code point, but it is handled specially in several contexts:

  - XML forbids U+0000 in documents.
  - Standard UTF-8 encodes U+0000 as the single byte 0x00. Java's Modified UTF-8 instead encodes it as the two bytes 0xC0 0x80, so that encoded strings never contain an embedded null byte.
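The two encodings of U+0000 can be compared in Python. The helper encode_modified_utf8_nul is a hypothetical sketch that reproduces only the NUL difference; real Modified UTF-8 also encodes supplementary characters differently, which is not handled here.

```python
def encode_modified_utf8_nul(text: str) -> bytes:
    """Sketch: standard UTF-8, except U+0000 becomes 0xC0 0x80."""
    # In UTF-8 the byte 0x00 only ever encodes U+0000, so a byte-level
    # replace cannot corrupt other characters.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


s = "A\x00B"
print(s.encode("utf-8"))            # b'A\x00B'
print(encode_modified_utf8_nul(s))  # b'A\xc0\x80B'

# Strict UTF-8 decoders treat 0xC0 0x80 as an invalid (overlong) sequence
try:
    b"\xc0\x80".decode("utf-8")
except UnicodeDecodeError:
    print("0xC0 0x80 is rejected by a strict UTF-8 decoder")
```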

Quick Facts

Property          Value
Code point        U+0000
Name              NULL (NUL)
ASCII equivalent  NUL (0x00), written \0 in C escapes
C usage           String terminator
Python/JS         Valid string character (length-counted strings)
UTF-8 encoding    0x00 (single byte)
XML               Forbidden in XML documents
Security risk     Null injection: always sanitize in security-sensitive contexts
