Miscellaneous

Control Characters

Non-printable characters that control text processing. C0 (U+0000–U+001F) includes NUL, TAB, LF, CR, and ESC. C1 (U+0080–U+009F) is rarely used in modern Unicode text. General category: Cc.


What is a Control Character?

Control characters are Unicode (and ASCII) characters that do not represent printable glyphs but instead carry instructions that control the behavior of text-processing devices, terminals, communication protocols, and rendering systems. In Unicode they occupy two code point ranges plus the single code point DEL:

  • C0 controls: U+0000 through U+001F (32 characters) — the original ASCII control characters
  • DEL: U+007F — originally the "delete" character from punched tape computing
  • C1 controls: U+0080 through U+009F (32 characters) — extensions added in the ISO 8859 era for 8-bit systems

Together these 65 code points form what Unicode calls the control characters or the Cc general category.
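The count can be checked against the Unicode Character Database. The following is a minimal sketch using Python's standard `unicodedata` module; the function name `control_code_points` is chosen here for illustration only.

```python
import sys
import unicodedata


def control_code_points():
    """Return every code point whose general category is Cc."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Cc"
    ]


cc = control_code_points()
c0 = [cp for cp in cc if cp <= 0x1F]          # C0 controls
c1 = [cp for cp in cc if 0x80 <= cp <= 0x9F]  # C1 controls

# 32 C0 controls + DEL + 32 C1 controls
print(len(c0), 0x7F in cc, len(c1), len(cc))
```

Scanning the whole code space rather than hard-coding the three ranges confirms that no Cc characters exist outside them.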

Origins: Teletype and Mainframe Computing

Control characters were designed for the era of teletypes, punched tape, and serial communication. The ASCII control characters (0–31, plus DEL at 127) encoded physical machine operations, for example:

  • U+0007 BEL (Bell): Ring the physical bell on a teletype to alert the operator
  • U+0008 BS (Backspace): Move the print head one character to the left
  • U+0009 HT (Horizontal Tab): Advance to the next tab stop
  • U+000A LF (Line Feed): Advance paper by one line
  • U+000D CR (Carriage Return): Return the print head to the beginning of the line
  • U+001B ESC (Escape): Signal the start of a control sequence (used in terminal escape codes)
  • U+007F DEL: Originally erased a character on punched tape by punching all holes

Several of these remain meaningful throughout modern computing:

  • LF (U+000A): Unix newline
  • CR+LF (U+000D U+000A): Windows newline
  • TAB (U+0009): Indentation in code and data files
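Because Unix and Windows disagree on the newline sequence, text that crosses platforms is often normalized to a single convention. A minimal sketch, assuming LF is the target and that a lone CR should also count as a line break (the helper name `normalize_newlines` is hypothetical):

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    # Replace CRLF first so its CR is not converted a second time
    # by the lone-CR pass.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(repr(normalize_newlines("one\r\ntwo\rthree\n")))
```

The order of the two replacements matters: handling lone CR first would turn every CRLF into two line feeds.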

The C1 Controls

The C1 range (U+0080–U+009F) was added for 8-bit character sets (ISO 8859) to extend control functionality into the 128–159 byte range. These include characters like:

  • U+0085 NEL (Next Line): A newline variant used in IBM mainframe EBCDIC environments
  • U+008D RI (Reverse Line Feed): Move cursor up one line
  • U+009B CSI (Control Sequence Introducer): Alternative to ESC+[ for ANSI terminal sequences

C1 controls are rarely used in modern text but appear in legacy data, mainframe transfers, and occasionally in security exploits.

Control Characters in Modern Computing

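Most control characters have no visual representation and are not normally rendered as glyphs. Unicode assigns all of them the general category Cc (Control). Most have the bidirectional class BN (Boundary Neutral), but the ones that act as separators do not: TAB is S (Segment Separator), and LF, CR, and NEL are B (Paragraph Separator).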
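These properties can be inspected directly. The sketch below uses Python's `unicodedata` module to look up the general category and bidirectional class of a few controls. The sample set and the helper name `describe` are chosen for illustration. Note that separator-like controls such as TAB and LF report a class other than BN.

```python
import unicodedata

# A few representative controls (an illustrative selection, not a full list).
SAMPLES = {
    "NUL": "\x00",
    "TAB": "\t",
    "LF": "\n",
    "CR": "\r",
    "ESC": "\x1b",
    "DEL": "\x7f",
    "NEL": "\x85",
}


def describe(ch: str) -> tuple:
    """Return (general category, bidirectional class) for one character."""
    return unicodedata.category(ch), unicodedata.bidirectional(ch)


for name, ch in SAMPLES.items():
    category, bidi = describe(ch)
    print(f"{name:<4} U+{ord(ch):04X}  {category}  {bidi}")
```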

The ones still actively used in modern software:

Character   Code Point   Modern Use
NUL         U+0000       C string terminator; null byte in binary protocols
TAB         U+0009       Indentation, TSV data format
LF          U+000A       Unix line endings
CR          U+000D       Part of Windows CRLF line endings
ESC         U+001B       ANSI escape sequences for terminal color/cursor
DEL         U+007F       Terminal delete key signal

Security Considerations

Control characters are a significant source of security vulnerabilities:

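  • NUL byte injection (U+0000): In languages with C-string roots, an embedded NUL terminates a string. A filename containing NUL can be truncated when it reaches a C-level API, so a name that passes a check on its full form (for example, on its extension) may open a different file than the one that was validated.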
  • CRLF injection (U+000D U+000A): Inserting CRLF sequences into HTTP headers, email headers, or log entries can split headers or forge entries. In HTTP this is known as response splitting; the email and log variants are usually called header injection and log injection.
  • Escape sequence injection (U+001B): Embedding ANSI escape sequences in log files can corrupt terminal displays or, in some terminal emulators, execute arbitrary commands.
  • C1 control obfuscation: C1 controls like NEL (U+0085) are sometimes used to bypass input validation that only strips C0 characters.
  • Bidi controls (U+202A–U+202E, U+2066–U+2069): Technically Format characters (Cf), not control characters (Cc), but closely related in their invisibility and security impact.

Input validation and sanitization routines in web applications should explicitly handle the full range of Unicode control characters, not just ASCII characters 0–31.
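A minimal sketch of such handling follows, keyed on the Cc general category so that C0, DEL, and C1 are all covered. The helper names and the choice to allow TAB and LF in body text are assumptions for illustration. Real applications will need their own allow-list, and bidi format characters (Cf) need separate treatment that this sketch does not attempt.

```python
import unicodedata

# Assumption: TAB and LF are acceptable in free-form body text.
ALLOWED_CONTROLS = frozenset({"\t", "\n"})


def is_control(ch: str) -> bool:
    """True for any C0, DEL, or C1 control (general category Cc)."""
    return unicodedata.category(ch) == "Cc"


def strip_controls(text: str, allowed=ALLOWED_CONTROLS) -> str:
    """Drop every control character that is not explicitly allowed."""
    return "".join(ch for ch in text if ch in allowed or not is_control(ch))


def is_safe_header_value(value: str) -> bool:
    """Reject header values containing any control character.

    This is stricter than necessary (it also rejects TAB), which is a
    deliberate simplification in this sketch.
    """
    return not any(is_control(ch) for ch in value)


def escape_for_log(text: str) -> str:
    """Render controls as visible \\xNN escapes so that a log line cannot
    be split by CR/LF or carry a live terminal escape sequence."""
    return "".join(
        f"\\x{ord(ch):02x}" if is_control(ch) else ch for ch in text
    )


print(repr(strip_controls("a\x00b\x85c\td")))
print(is_safe_header_value("x\r\nSet-Cookie: a=b"))
print(escape_for_log("ok\x1b[2J"))
```

Escaping rather than deleting controls in logs keeps the evidence of an injection attempt visible to whoever reads the log.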

Quick Facts

Property            Value
C0 range            U+0000 – U+001F (32 characters)
DEL                 U+007F (1 character)
C1 range            U+0080 – U+009F (32 characters)
Unicode category    Cc (Control)
Total count         65 code points
Still widely used   NUL, TAB, LF, CR, ESC
Security risks      NUL injection, CRLF injection, escape injection
Origin              Teletype and punched tape era (1960s ASCII)

Related Terms