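Bidirectional Override Attack
An attack that uses Unicode bidirectional control characters (U+202A–U+202E, U+2066–U+2069) to disguise malicious filenames or code. For example, a file stored as 'readme' + U+202E + 'fdp.exe' is displayed as 'readmeexe.pdf'.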
What is a Bidi Override?
A bidi override is a Unicode text manipulation technique that uses bidirectional (bidi) control characters to reverse or scramble the display order of characters on screen, making text appear different from what it actually contains. Because many writing systems — Arabic, Hebrew, Persian, Urdu, and others — are written right-to-left (RTL), Unicode includes a bidirectional algorithm (Unicode Standard Annex #9) and a set of invisible control characters to control text direction. Attackers exploit these characters to disguise the true content of filenames, URLs, source code, and messages.
The Bidirectional Algorithm
Unicode's Bidirectional Algorithm (UBA) determines the display order of characters in mixed-direction text. It works automatically for most text — Arabic runs right-to-left, Latin runs left-to-right, and they are visually arranged correctly. The bidi control characters allow authors to override or modify this automatic behavior.
The most dangerous characters for security purposes are:
| Character | Code Point | Name | Effect |
|---|---|---|---|
| RLO | U+202E | RIGHT-TO-LEFT OVERRIDE | Forces following characters to display RTL, until a PDF or the end of the paragraph |
| LRO | U+202D | LEFT-TO-RIGHT OVERRIDE | Forces following characters to display LTR, until a PDF or the end of the paragraph |
| RLE | U+202B | RIGHT-TO-LEFT EMBEDDING | Creates RTL embedding level |
| LRE | U+202A | LEFT-TO-RIGHT EMBEDDING | Creates LTR embedding level |
| PDF | U+202C | POP DIRECTIONAL FORMATTING | Ends the most recent override or embedding |
| RLI | U+2067 | RIGHT-TO-LEFT ISOLATE | Isolates RTL section |
| LRI | U+2066 | LEFT-TO-RIGHT ISOLATE | Isolates LTR section |
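
As a rough illustration of how these characters can be located in a string, the following Python sketch lists every explicit bidi control it finds. The names `BIDI_CONTROLS` and `find_bidi_controls` are illustrative, not part of any standard API, and the set covers only the U+202A–U+202E and U+2066–U+2069 ranges discussed here.

```python
import unicodedata

# Explicit directional formatting characters defined by UAX #9.
BIDI_CONTROLS = frozenset(
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def find_bidi_controls(text):
    """Return (index, code point, Unicode name) for each bidi control in text."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


print(find_bidi_controls("evil\u202efdp.exe"))
# [(4, 'U+202E', 'RIGHT-TO-LEFT OVERRIDE')]
```

Reporting the index and the Unicode name, rather than a simple yes/no, makes it easier to show a reviewer exactly where the invisible character sits.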
Attack Scenarios
Filename spoofing (the classic attack)
A malicious executable can be named to appear as a harmless document. For example, the actual filename might be stored as:
evil[RLO]fdp.exe
When rendered, the RLO character forces everything after it to display right-to-left, so the stored sequence fdp.exe is drawn reversed as exe.pdf. The screen shows evilexe.pdf: the name appears to end in .pdf, while the real extension is still .exe. Users believe they are opening a PDF file.
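
The gap between the stored name and the displayed name can be sketched in Python. The helper `naive_display` is a hypothetical, deliberately simplified model: it handles a single RLO over Latin text by reversing everything up to a PDF or the end of the string, and it is not an implementation of the full UAX #9 algorithm.

```python
import os.path

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def naive_display(stored):
    """Approximate how a name containing a single RLO is drawn.

    Simplification: text after the RLO is reversed up to a PDF or the
    end of the string. This is not the full UAX #9 algorithm.
    """
    head, sep, tail = stored.partition(RLO)
    if not sep:
        return stored  # no override present, shown as stored
    overridden, _, rest = tail.partition(PDF)
    return head + overridden[::-1] + rest


stored = "evil" + RLO + "fdp.exe"
print(naive_display(stored))        # evilexe.pdf
print(os.path.splitext(stored)[1])  # .exe
```

The operating system acts on the stored bytes, so the file is still treated as an .exe regardless of what the user sees.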
This attack was actively used in malware distribution, particularly in USB-spread worms and email attachments, and was exploited in targeted attacks against energy companies and government systems around 2011–2012.
Source code backdoors (CVE-2021-42574, "Trojan Source")
Described by researchers Nicholas Boucher and Ross Anderson at Cambridge in 2021, the Trojan Source attack embeds bidi control characters inside code comments or string literals. The source code then renders differently in a text editor (which applies the bidi algorithm for display) than it is interpreted by the compiler (which reads the characters in their stored, logical order and treats the controls as ordinary comment or string content). An attacker can make a security check appear to be inside an if block when it is actually outside, or hide an entire malicious code path inside what appears to be a comment.
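
One way a review tool might flag such code is to report lines whose bidi embeddings, overrides, or isolates are opened but never closed. The Python sketch below assumes a simple per-line counter; the function name `lines_with_unterminated_bidi` is hypothetical, and the pairing logic is a simplification of the matching rules in UAX #9.

```python
# Openers and the characters that close them (simplified from UAX #9).
EMBEDDING_OPENERS = set("\u202a\u202b\u202d\u202e")  # LRE, RLE, LRO, RLO
ISOLATE_OPENERS = set("\u2066\u2067\u2068")          # LRI, RLI, FSI
PDF = "\u202c"  # closes an embedding or override
PDI = "\u2069"  # closes an isolate


def lines_with_unterminated_bidi(source):
    """Return 1-based numbers of lines whose bidi openers are never closed."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        embeddings = isolates = 0
        for ch in line:
            if ch in EMBEDDING_OPENERS:
                embeddings += 1
            elif ch in ISOLATE_OPENERS:
                isolates += 1
            elif ch == PDF and embeddings:
                embeddings -= 1
            elif ch == PDI and isolates:
                isolates -= 1
        if embeddings or isolates:
            flagged.append(number)
    return flagged


sample = (
    "is_admin = False\n"
    "# check \u202e\u2066 if is_admin:\n"
    "print('done')\n"
)
print(lines_with_unterminated_bidi(sample))  # [2]
```

A stricter policy, closer to rejecting any bidi control in comments and literals, would flag every line that contains one of these characters at all; the balance check here is the more permissive option.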
URL obfuscation
Bidi overrides in URLs displayed in browser status bars or email clients can make a malicious domain appear to be a legitimate one.
Mitigations
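- Compilers and interpreters: After CVE-2021-42574, several toolchains added diagnostics for bidi control characters in source code, for example Rust (deny-by-default lints) and GCC (the `-Wbidi-chars` warning); other projects, including Clang, Go, and Python, responded with linter checks or guidance rather than identical compiler changes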
- Code editors: VS Code, Vim, and others added visual indicators for bidi control characters
- File managers: Some file managers and security tools flag or neutralize filenames containing bidi overrides
- Email clients: Display raw filenames or strip bidi characters from attachment names
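
The stripping approach mentioned for email clients can be sketched in a few lines of Python. The helper names `sanitize_attachment_name` and `is_suspicious` are illustrative only, and the pattern covers just the explicit controls in the U+202A–U+202E and U+2066–U+2069 ranges.

```python
import re

# The nine explicit bidi formatting characters (U+202A-U+202E, U+2066-U+2069).
_BIDI_RE = re.compile("[\u202a-\u202e\u2066-\u2069]")


def sanitize_attachment_name(name):
    """Remove bidi controls so the name displays in its stored order."""
    return _BIDI_RE.sub("", name)


def is_suspicious(name):
    """Return True if the name contains any bidi control character."""
    return _BIDI_RE.search(name) is not None


name = "evil\u202efdp.exe"
print(is_suspicious(name))             # True
print(sanitize_attachment_name(name))  # evilfdp.exe
```

Stripping changes how legitimate right-to-left filenames are displayed, so a real client might prefer to warn the user rather than rewrite the name silently.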
Quick Facts
| Property | Value |
|---|---|
| Primary control character | U+202E RIGHT-TO-LEFT OVERRIDE (RLO) |
| Governing algorithm | Unicode Standard Annex #9 (UAX#9) |
| Notable CVE | CVE-2021-42574 (Trojan Source) |
| Trojan Source researchers | Boucher & Anderson, Cambridge, 2021 |
| Affected surfaces | Filenames, source code, URLs, messages |
| Compiler responses | Diagnostics added in toolchains such as GCC and Rust; responses varied across other languages |
| Legitimate use | Displaying Arabic/Hebrew mixed with Latin text |