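Security

Bidi Override Attack

An attack that uses Unicode bidirectional control characters (U+202A–U+202E, U+2066–U+2069) to disguise malicious filenames or code. For example, 'readme[RLO]fdp.exe', where [RLO] stands for the invisible character U+202E, is displayed as 'readmeexe.pdf'.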


What is a Bidi Override?

A bidi override is a Unicode text manipulation technique that uses bidirectional (bidi) control characters to reverse or scramble the display order of characters on screen, making text appear different from what it actually contains. Because many writing systems — Arabic, Hebrew, Persian, Urdu, and others — are written right-to-left (RTL), Unicode includes a bidirectional algorithm (Unicode Standard Annex #9) and a set of invisible control characters to control text direction. Attackers exploit these characters to disguise the true content of filenames, URLs, source code, and messages.

The Bidirectional Algorithm

Unicode's Bidirectional Algorithm (UBA) determines the display order of characters in mixed-direction text. It works automatically for most text — Arabic runs right-to-left, Latin runs left-to-right, and they are visually arranged correctly. The bidi control characters allow authors to override or modify this automatic behavior.
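The algorithm starts from the bidi class that Unicode assigns to each character. As a small illustration (the helper name bidi_classes is made up for this sketch), Python's standard unicodedata module can report those classes:

```python
import unicodedata

def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Return (code point, bidi class) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]

# Latin letter, Hebrew letter, RIGHT-TO-LEFT OVERRIDE, European digit
sample = "a\u05d0\u202e1"
for code_point, bidi_class in bidi_classes(sample):
    print(code_point, bidi_class)
# U+0061 L
# U+05D0 R
# U+202E RLO
# U+0031 EN
```

Strong classes such as L and R drive the automatic ordering, while classes such as RLO mark the explicit controls that override it.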

The most dangerous characters for security purposes are:

Character   Code Point   Name                          Effect
RLO         U+202E       RIGHT-TO-LEFT OVERRIDE        Forces all following characters to display RTL
LRO         U+202D       LEFT-TO-RIGHT OVERRIDE        Forces all following characters to display LTR
RLE         U+202B       RIGHT-TO-LEFT EMBEDDING       Creates RTL embedding level
LRE         U+202A       LEFT-TO-RIGHT EMBEDDING       Creates LTR embedding level
PDF         U+202C       POP DIRECTIONAL FORMATTING    Ends the most recent override or embedding
RLI         U+2067       RIGHT-TO-LEFT ISOLATE         Isolates RTL section
LRI         U+2066       LEFT-TO-RIGHT ISOLATE         Isolates LTR section
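Because these characters are invisible, a first step in any inspection is to make them visible. The following is a minimal sketch: the function name reveal_bidi_controls and the bracketed tag style are choices made for this illustration. It also covers FSI (U+2068) and PDI (U+2069), which fall in the U+2066–U+2069 range but are not listed in the table.

```python
# Abbreviations for the explicit bidi controls (U+202A-U+202E, U+2066-U+2069).
BIDI_TAGS = {
    "\u202a": "LRE",
    "\u202b": "RLE",
    "\u202c": "PDF",
    "\u202d": "LRO",
    "\u202e": "RLO",
    "\u2066": "LRI",
    "\u2067": "RLI",
    "\u2068": "FSI",
    "\u2069": "PDI",
}

def reveal_bidi_controls(text: str) -> str:
    """Replace each invisible bidi control with a visible tag like [RLO]."""
    return "".join(f"[{BIDI_TAGS[ch]}]" if ch in BIDI_TAGS else ch for ch in text)

print(reveal_bidi_controls("evil\u202efdp.exe"))  # evil[RLO]fdp.exe
```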

Attack Scenarios

Filename spoofing (the classic attack)

A malicious executable can be named to appear as a harmless document. For example, the actual filename might be stored as:

evil[RLO]fdp.exe

When rendered, the RLO character causes everything after it to display right-to-left, so the stored characters fdp.exe appear as exe.pdf. The screen shows evilexe.pdf: the name appears to end in .pdf, while the real extension, as the operating system sees it, is still .exe. Users believe they are opening a PDF file.
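The gap between the stored name and the displayed name can be sketched in a few lines. This is a deliberate simplification, not an implementation of UAX #9: it assumes a single RLO with no terminator and reverses everything after it. The function names are invented for this illustration.

```python
import os

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

def approximate_display(name: str) -> str:
    """Roughly mimic how a bidi-aware UI shows a name containing one RLO.

    Simplification: everything after the first RLO is reversed. The real
    algorithm also handles nesting, terminators, and mirrored characters.
    """
    head, sep, tail = name.partition(RLO)
    return head + tail[::-1] if sep else name

def real_extension(name: str) -> str:
    """Return the extension in stored (logical) order, as the OS sees it."""
    return os.path.splitext(name)[1]

stored = "evil" + RLO + "fdp.exe"
print(approximate_display(stored))  # evilexe.pdf
print(real_extension(stored))       # .exe
```

Comparing the two results makes the mismatch explicit: the displayed name ends in .pdf, while the extension used to pick a handler is .exe.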

This attack was actively used in malware distribution, particularly in USB-spread worms and email attachments, and was exploited in targeted attacks against energy companies and government systems around 2011–2012.

Source code backdoors (CVE-2021-42574, "Trojan Source")

Described in 2021 by researchers Nicholas Boucher and Ross Anderson of the University of Cambridge, the Trojan Source attack embeds bidi control characters inside code comments or string literals. A text editor that applies the bidi algorithm displays the code in one order, while the compiler reads the characters in their stored (logical) order and treats the controls as ordinary comment or string content. An attacker can make a security check appear to be inside an if block when it is actually outside, or hide an entire malicious code path inside what appears to be a comment.
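A simple defensive check is to scan source text for these characters before review or merge. The sketch below is an assumption about how such a check might look, not the logic of any particular compiler or linter. The name scan_source is hypothetical.

```python
# Explicit bidi controls: U+202A-U+202E and U+2066-U+2069.
BIDI_CONTROLS = frozenset("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")

def scan_source(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, code point) for each bidi control in source.

    Lines and columns are 1-based; columns count characters, not bytes.
    """
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, f"U+{ord(ch):04X}"))
    return findings

sample = "a = 1\nb = 2 # \u202enote\n"
print(scan_source(sample))  # [(2, 9, 'U+202E')]
```

A check like this flags every occurrence, including legitimate ones in right-to-left comments or strings, so a real workflow would likely need an allowlist or a human review step.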

URL obfuscation

Bidi overrides in URLs displayed in browser status bars or email clients can make a malicious domain appear to be a legitimate one.

Mitigations

  • Compilers and interpreters: After CVE-2021-42574, several toolchains added warnings, lints, or errors for bidi control characters in source code, for example GCC's -Wbidi-chars option and Rust's text_direction_codepoint_in_comment and text_direction_codepoint_in_literal lints. Coverage varies by language and version.
  • Code editors: VS Code, Vim, and others added visual indicators for bidi control characters.
  • File managers: Some operating systems and file managers warn about, or visibly mark, filenames containing bidi overrides.
  • Email clients: Display raw filenames or strip bidi characters from attachment names.
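The stripping approach mentioned for attachment names can be sketched as follows. The function name strip_bidi_controls is hypothetical, and removing every control unconditionally is an assumption made for brevity.

```python
# Explicit bidi controls: U+202A-U+202E and U+2066-U+2069.
_BIDI_CONTROLS = "\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"

# str.translate deletes any character whose code point maps to None.
_STRIP_TABLE = {ord(ch): None for ch in _BIDI_CONTROLS}

def strip_bidi_controls(name: str) -> str:
    """Return name with all explicit bidi control characters removed."""
    return name.translate(_STRIP_TABLE)

print(strip_bidi_controls("evil\u202efdp.exe"))  # evilfdp.exe
```

After stripping, the name displays in its stored order, so the .exe extension is visible. The trade-off is that names which legitimately mix right-to-left and left-to-right text may display less cleanly, which is why some tools prefer to warn rather than strip.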

Quick Facts

Property                    Value
Primary control character   U+202E RIGHT-TO-LEFT OVERRIDE (RLO)
Governing algorithm         Unicode Standard Annex #9 (UAX #9)
Notable CVE                 CVE-2021-42574 (Trojan Source)
Trojan Source researchers   Boucher & Anderson, Cambridge, 2021
Affected surfaces           Filenames, source code, URLs, messages
Compiler responses          Warnings or lints added to several toolchains (e.g. GCC, Rust) from 2021 onward
Legitimate use              Displaying Arabic/Hebrew mixed with Latin text

Related Terms

More terms in Security

Bidi Text Attack

Exploiting Unicode bidirectional control characters to disguise malicious code or filenames. The …

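IDN Homograph Attack

An attack that impersonates a legitimate site by using visually similar Unicode characters in a domain name. аpple.com (with a Cyrillic а) looks like apple.com. Browsers defend against this with rules for when to display the Punycode form.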

Normalization Attack

Exploiting Unicode normalization to bypass security filters. Input validated before normalization may …

Unicode Spoofing

Using Unicode features to deceive users: homoglyphs for fake domains, bidi overrides for fake file extensions, and invisible characters for hidden text.

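Zero Width Joiner (ZWJ)

U+200D. Requests that adjacent characters be joined. It is essential for emoji sequences (👩+ZWJ+💻=👩‍💻) and requests conjunct formation in Indic scripts. It can also be used to hide text boundaries.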

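Zero Width Non-Joiner (ZWNJ)

U+200C. Prevents adjacent characters from joining. It is required for correct letter forms in Persian and Arabic, and is also used in Devanagari to prevent conjuncts.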

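Homoglyph

Characters from different scripts that look identical or very similar, for example Latin 'a' and Cyrillic 'а'. They are used in phishing, spoofing, and social engineering attacks.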

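Confusables

Unicode's official term for visually confusable character pairs, as defined in confusables.txt (part of the Unicode security data described in UTS #39). The concept is broader than homoglyphs and includes characters that are merely similar.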

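Mixed-Script Detection

Identifies text that mixes characters from different scripts (for example, Latin + Cyrillic). It is a primary defense against homoglyph attacks, and browsers use it to trigger Punycode display.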