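What is the Unicode Bidirectional Algorithm (UBA)?

An algorithm that determines the display order of mixed-direction text (for example, English plus Arabic) using the bidirectional categories of characters and explicit directional overrides.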

What is RTL (right-to-left)?

A text direction in which characters flow from right to left. It is used by Arabic, Hebrew, Thaana, and other scripts, and correct display requires the bidirectional algorithm.

Security

Bidi Override Attack

An attack that uses Unicode bidirectional control characters (U+202A to U+202E, U+2066 to U+2069) to disguise malicious filenames or code. 'readme[RLO]fdp.exe' is displayed as 'readmeexe.pdf'.

2024-11-04 · Updated 2025-07-03

What is a Bidi Override?

A bidi override is a Unicode text manipulation technique that uses bidirectional (bidi) control characters to reorder how characters are displayed on screen, so that text looks different from what it actually contains. Because many languages, including Arabic, Hebrew, Persian, and Urdu, are written right-to-left (RTL), Unicode defines a bidirectional algorithm (Unicode Standard Annex #9) and a set of invisible control characters for adjusting text direction. Attackers exploit these characters to disguise the true content of filenames, URLs, source code, and messages.

The Bidirectional Algorithm

Unicode's Bidirectional Algorithm (UBA) determines the display order of characters in mixed-direction text. It works automatically for most text: Arabic runs right-to-left, Latin runs left-to-right, and the algorithm arranges the runs correctly on screen. The bidi control characters let authors override or modify this automatic behavior.
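
The automatic behavior described above is driven by a bidi class assigned to every code point. As a minimal sketch (the helper name bidi_classes is hypothetical, and this only looks up classes, it does not run the algorithm), Python's standard unicodedata module can show which class each character carries:

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs for each code point in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Latin letter, Hebrew letter, Arabic letter, space, digit, and the RLO control.
sample = "a\u05d0\u0627 1\u202e"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

Strong classes such as L, R, and AL set the direction of a run, while weak and neutral classes (digits, spaces) take their direction from context. The control characters carry their own classes, such as RLO.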

The bidi control characters most relevant to security are:

Character	Code Point	Name	Effect
RLO	U+202E	RIGHT-TO-LEFT OVERRIDE	Forces following characters to display RTL until PDF or the end of the paragraph
LRO	U+202D	LEFT-TO-RIGHT OVERRIDE	Forces following characters to display LTR until PDF or the end of the paragraph
RLE	U+202B	RIGHT-TO-LEFT EMBEDDING	Starts an embedded RTL run
LRE	U+202A	LEFT-TO-RIGHT EMBEDDING	Starts an embedded LTR run
PDF	U+202C	POP DIRECTIONAL FORMATTING	Ends the most recent override or embedding
RLI	U+2067	RIGHT-TO-LEFT ISOLATE	Starts an RTL run isolated from the surrounding text
LRI	U+2066	LEFT-TO-RIGHT ISOLATE	Starts an LTR run isolated from the surrounding text
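
For illustration, the table can be turned into a small lookup that reports where these characters occur in a string. This is a sketch under stated assumptions: the names BIDI_CONTROLS and find_bidi_controls are hypothetical, and the mapping also includes FSI (U+2068) and PDI (U+2069), which fall in the U+2066 to U+2069 range but are not listed in the table.

```python
# Code point -> abbreviation for the explicit bidi formatting characters.
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI",
    "\u2068": "FSI", "\u2069": "PDI",  # not in the table above
}

def find_bidi_controls(text):
    """Return (index, abbreviation) for every bidi control found in text."""
    return [(i, BIDI_CONTROLS[ch]) for i, ch in enumerate(text) if ch in BIDI_CONTROLS]

print(find_bidi_controls("evil\u202efdp.exe"))
print(find_bidi_controls("plain.txt"))
```

An empty result means none of the nine characters is present. It says nothing about other confusable or invisible characters.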

Attack Scenarios

Filename spoofing (the classic attack): A malicious executable can be named to appear as a harmless document. For example, the actual filename might be stored as:

evil[RLO]fdp.exe

When rendered, the RLO character causes everything after it to display right-to-left, so the stored fdp.exe appears reversed as exe.pdf. The screen shows: evilexe.pdf. The name appears to end in .pdf, while the real .exe extension is absorbed into what looks like the base name. Users believe they are opening a PDF file.
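
The gap between the stored name and the displayed name can be sketched in a few lines. The function naive_rlo_display below is a hypothetical helper and a deliberate simplification, not an implementation of the bidirectional algorithm: it assumes a single RLO in a left-to-right context and simply reverses the run that follows it, up to a PDF or the end of the string.

```python
import os

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

def naive_rlo_display(name):
    """Approximate what a bidi-aware renderer shows for one RLO run."""
    head, sep, tail = name.partition(RLO)
    if not sep:
        return name  # no override present, displayed as stored
    run, _, rest = tail.partition(PDF)
    return head + run[::-1] + rest

stored = "evil" + RLO + "fdp.exe"
print(naive_rlo_display(stored))    # evilexe.pdf
print(os.path.splitext(stored)[1])  # .exe
```

The operating system acts on the stored name, so the file is still handled as an .exe even though the displayed name ends in .pdf.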

This attack was actively used in malware distribution, particularly in USB-spread worms and email attachments, and was exploited in targeted attacks against energy companies and government systems around 2011–2012.

Source code backdoors (CVE-2021-42574, "Trojan Source"): Described by researchers Nicholas Boucher and Ross Anderson at the University of Cambridge in 2021, the Trojan Source attack embeds bidi control characters inside code comments or string literals. The source code then renders differently in a text editor (which applies the bidi algorithm) than the compiler interprets it (the compiler reads characters in their stored logical order and treats the controls as ordinary comment or string content). An attacker can make a security check appear to be inside an if block when it is actually outside, or hide an entire malicious code path inside what appears to be a comment.
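
A reviewer or CI job can look for this pattern without rendering anything. The following is a rough sketch with hypothetical names (scan_source, OPENERS, CLOSERS). It reports each line that contains bidi controls and whether any opened run is left unterminated at the end of that line. It assumes a single shared counter is good enough, so it does not distinguish PDF (which closes embeddings and overrides) from PDI (which closes isolates).

```python
OPENERS = set("\u202a\u202b\u202d\u202e\u2066\u2067\u2068")  # LRE RLE LRO RLO LRI RLI FSI
CLOSERS = set("\u202c\u2069")                                # PDF PDI

def scan_source(source):
    """Return (line_number, code_points, unterminated) for lines with bidi controls."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        open_count = 0
        hits = []
        for ch in line:
            if ch in OPENERS:
                open_count += 1
                hits.append(f"U+{ord(ch):04X}")
            elif ch in CLOSERS:
                open_count = max(0, open_count - 1)
                hits.append(f"U+{ord(ch):04X}")
        if hits:
            findings.append((lineno, hits, open_count > 0))
    return findings

# Illustrative input only: a string literal carrying hidden controls.
src = 'x = 1\nif level != "user\u202e \u2066# check\u2069 \u2066":\n    pass\n'
for lineno, hits, unterminated in scan_source(src):
    print(lineno, hits, "unterminated" if unterminated else "balanced")
```

Flagging every occurrence is the conservative choice for code review. Projects that keep legitimate right-to-left text in strings or comments may prefer to flag only the unterminated cases.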

URL obfuscation: Bidi overrides in URLs displayed in browser status bars or email clients can make a malicious domain appear to be a legitimate one.

Mitigations

Compilers and interpreters: After CVE-2021-42574, toolchains including GCC and Rust added warnings or errors for bidi control characters in source code; the response from other projects such as Clang, Go, and Python varied
Code editors: VS Code, Vim, and others added visual indicators for bidi control characters
File managers: Some operating systems and file managers warn about, or visibly mark, filenames containing bidi overrides
Email clients: Display raw filenames or strip bidi characters from attachment names
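
The "display raw" and "strip" strategies in the list above can be sketched as two small helpers. The names show_bidi and strip_bidi are hypothetical, and the character class covers only the U+202A to U+202E and U+2066 to U+2069 ranges discussed in this article.

```python
import re

# Explicit bidi formatting characters: embeddings, overrides, isolates.
BIDI_RE = re.compile("[\u202a-\u202e\u2066-\u2069]")

def show_bidi(text):
    """Replace each bidi control with a visible <U+XXXX> marker."""
    return BIDI_RE.sub(lambda m: f"<U+{ord(m.group()):04X}>", text)

def strip_bidi(text):
    """Remove bidi controls entirely."""
    return BIDI_RE.sub("", text)

name = "evil\u202efdp.exe"
print(show_bidi(name))   # evil<U+202E>fdp.exe
print(strip_bidi(name))  # evilfdp.exe
```

Making the characters visible keeps the evidence of tampering in front of the user, while stripping silently changes the name. Which is appropriate depends on whether the text may contain legitimate right-to-left content.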

Quick Facts

Property	Value
Primary control character	U+202E RIGHT-TO-LEFT OVERRIDE (RLO)
Governing algorithm	Unicode Standard Annex #9 (UAX#9)
Notable CVE	CVE-2021-42574 (Trojan Source)
Trojan Source researchers	Boucher & Anderson, Cambridge, 2021
Affected surfaces	Filenames, source code, URLs, messages
Compiler responses	Diagnostics added from 2021 onward (for example GCC and Rust); responses from Clang, Go, and Python varied
Legitimate use	Displaying Arabic/Hebrew mixed with Latin text