Unicode Spoofing
Using Unicode features to deceive users: homoglyphs for fake domains, bidirectional overrides for fake file extensions, and invisible characters for hidden text.
What is Unicode Spoofing?
Unicode spoofing is a class of cyberattack that exploits the visual similarity between Unicode characters to deceive users, systems, or automated tools. Rather than hacking servers or stealing credentials directly, Unicode spoofing attacks manipulate human perception: they make something malicious appear legitimate by substituting characters that look the same but have different Unicode code points.
The attack is possible because Unicode encodes over 149,000 characters from more than 150 scripts, and many characters across those scripts look identical or nearly identical when rendered on screen. A Latin a and a Cyrillic а are different code points that most fonts render identically, and a Greek α is a third code point that looks similar in many typefaces.
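To make the difference between the lookalike letters concrete, the short Python sketch below prints the code point and official Unicode name of each one using the standard `unicodedata` module. The helper name `describe` is made up for this illustration.

```python
import unicodedata

# Three lookalike letters: Latin a, Cyrillic а, Greek α (written as escapes
# so the difference is visible in the source).
lookalikes = ["a", "\u0430", "\u03b1"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in lookalikes:
    print(describe(ch))
# U+0061 LATIN SMALL LETTER A
# U+0430 CYRILLIC SMALL LETTER A
# U+03B1 GREEK SMALL LETTER ALPHA
```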
How Unicode Spoofing Works
The general pattern involves three steps:
- Identify a target string — a domain name, username, file name, or code identifier that the attacker wants to impersonate
- Substitute lookalike characters — replace one or more characters with visually identical Unicode equivalents from a different script or block
- Deploy the spoofed string — register the domain, create the account, commit the file, or send the message
To a human reader — and to many software systems that do not perform script analysis — the spoofed string appears identical to the original.
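The substitution step can be sketched in a few lines of Python, which also shows why software that compares code points or bytes still sees two different strings. The `LOOKALIKES` map and the `substitute_first` helper are hypothetical and hand-picked for this example; they are not the full confusables data.

```python
# Hypothetical, hand-picked Latin -> Cyrillic lookalike map (illustration only).
LOOKALIKES = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440"}


def substitute_first(target: str) -> str:
    """Replace the first character that has a lookalike; otherwise return target unchanged."""
    for i, ch in enumerate(target):
        if ch in LOOKALIKES:
            return target[:i] + LOOKALIKES[ch] + target[i + 1:]
    return target


original = "apple.com"
spoofed = substitute_first(original)

print(original == spoofed)        # False: the strings differ even though they look alike
print(original.encode("utf-8"))   # b'apple.com'
print(spoofed.encode("utf-8"))    # b'\xd0\xb0pple.com'
```

A plain equality check is therefore enough to tell the strings apart. What it cannot tell you is that they look the same, which is the question the defenses have to answer.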
Common Attack Scenarios
Phishing via IDN homograph attack
An attacker registers аpple.com where а is Cyrillic (U+0430) instead of Latin (U+0061). The domain resolves to a phishing server. Users who click a link to this domain see what looks like apple.com in the address bar, especially in older browsers or email clients that do not display Punycode.
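The ASCII (Punycode) form of such a domain can be inspected with Python's built-in `idna` codec, as in the sketch below. Note that this codec implements the older IDNA 2003 rules, so it is shown only to reveal the underlying name, not as a model of what any particular browser does. The helper name `to_ascii` is an assumption for this example.

```python
# Cyrillic а (U+0430) followed by Latin "pple.com".
spoofed_domain = "\u0430pple.com"


def to_ascii(domain: str) -> str:
    """Encode a domain name with Python's built-in IDNA (2003) codec."""
    return domain.encode("idna").decode("ascii")


print(to_ascii(spoofed_domain))  # xn--pple-43d.com
print(to_ascii("apple.com"))     # apple.com
```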
Username impersonation
On platforms that allow Unicode usernames, an attacker creates @elоn with a Cyrillic о (U+043E). Followers of the real @elon may be deceived into interacting with the fake account, especially in notifications or @mentions.
Source code backdoors
A malicious contributor submits code that defines a function def verify_раssword(...) in which р and а are Cyrillic. In code review the name reads as verify_password. Because call sites that use the lookalike name invoke the attacker's function, the real verify_password is never called on certain paths, allowing authentication bypass.
File name spoofing
A malicious file named report_finalе.pdf uses a Cyrillic е (U+0435) at the end. File managers display it identically to report_finale.pdf. Combined with bidirectional override characters, the displayed filename can be made to look entirely different from the actual filename.
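One way to catch the bidirectional variant is to scan a filename for bidirectional control characters before displaying it. The sketch below checks the ranges U+202A to U+202E and U+2066 to U+2069. The function name `find_bidi_controls` and the sample filename are hypothetical.

```python
# Bidirectional control characters: embeddings and overrides (U+202A..U+202E)
# plus isolates (U+2066..U+2069).
BIDI_CONTROLS = {chr(cp) for cp in range(0x202A, 0x202F)} | {
    chr(cp) for cp in range(0x2066, 0x206A)
}


def find_bidi_controls(name):
    """Return (index, code point) for each bidirectional control character in name."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(name) if ch in BIDI_CONTROLS]


# Hypothetical filename: the RIGHT-TO-LEFT OVERRIDE (U+202E) causes the text after it
# to be displayed reversed, so the real ".exe" ending can appear to be ".pdf".
suspicious = "invoice\u202efdp.exe"

print(find_bidi_controls(suspicious))          # [(7, 'U+202E')]
print(find_bidi_controls("report_final.pdf"))  # []
```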
Mitigation Techniques
At the browser level: Modern browsers display internationalized domain names that mix scripts in their Punycode form (e.g., xn--pple-43d.com) to alert users to potential spoofing.
At the platform level: Social platforms can normalize usernames by mapping confusable characters to a canonical form, then preventing registration of two usernames that normalize identically.
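A minimal sketch of this registration check follows, assuming a tiny hand-written confusables map. A real platform would load the full mapping from the Unicode confusables data, and the case folding used here is an extra choice for usernames rather than part of that data. The names `CONFUSABLES`, `skeleton`, and `UsernameRegistry` are all hypothetical.

```python
import unicodedata

# Hypothetical, tiny excerpt of a confusables map (illustration only).
CONFUSABLES = {"\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p"}


def skeleton(name: str) -> str:
    """Simplified canonical form: case-fold, decompose, map confusables, decompose again."""
    decomposed = unicodedata.normalize("NFD", name.casefold())
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


class UsernameRegistry:
    """Rejects a username whose skeleton collides with an already registered one."""

    def __init__(self):
        self._taken = {}  # skeleton -> username as originally registered

    def register(self, name: str) -> bool:
        key = skeleton(name)
        if key in self._taken:
            return False
        self._taken[key] = name
        return True


registry = UsernameRegistry()
print(registry.register("elon"))       # True
print(registry.register("el\u043en"))  # False: Cyrillic о collides with "elon"
```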
At the application level: Developers can apply Unicode TR39 confusables checks to any identifier or string that will be displayed to users alongside other identifiers.
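A related, cheaper check is to flag strings that mix scripts. Python's standard library does not expose the Unicode Script property, so the sketch below approximates a letter's script from the first word of its Unicode character name. That is a rough heuristic chosen for this example, not the TR39 algorithm, and the function names are hypothetical.

```python
import unicodedata


def rough_script(ch):
    """Approximate a letter's script from the first word of its Unicode name (heuristic)."""
    if not ch.isalpha():
        return None  # ignore digits, punctuation, and separators
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else None


def scripts_in(text):
    """Return the set of approximate script labels used by the letters in text."""
    return {s for s in map(rough_script, text) if s is not None}


def is_mixed_script(text):
    return len(scripts_in(text)) > 1


print(sorted(scripts_in("\u0430pple.com")))  # ['CYRILLIC', 'LATIN']
print(is_mixed_script("\u0430pple.com"))     # True
print(is_mixed_script("apple.com"))          # False
```

A production check would more likely rely on a library that implements the real script property and the TR39 restriction levels, since some legitimate text mixes scripts.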
At the code review level: Security-aware editors and static analysis tools can flag source files that contain characters outside the expected ASCII or script range.
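As a rough illustration of such a check, the sketch below reports every non-ASCII character in a source string with its line, column, code point, and Unicode name. It assumes a project policy of ASCII-only source. A project that uses other scripts legitimately would need an allowlist instead. The function name `find_non_ascii` and the sample source are hypothetical.

```python
import unicodedata


def find_non_ascii(source):
    """Return (line, column, code point, name) for every non-ASCII character in source."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                findings.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
                )
    return findings


# Sample source in which the "p" and "a" of "password" are Cyrillic.
sample = "def verify_\u0440\u0430ssword(user, pw):\n    return True\n"

for finding in find_non_ascii(sample):
    print(finding)
# (1, 12, 'U+0440', 'CYRILLIC SMALL LETTER ER')
# (1, 13, 'U+0430', 'CYRILLIC SMALL LETTER A')
```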
Quick Facts
| Property | Value |
|---|---|
| Root cause | Visual equivalence across Unicode scripts |
| Key reference data | Unicode TR39 confusables dataset |
| Primary attack surfaces | Domain names, usernames, source code, filenames |
| Technical name for domain variant | IDN homograph attack |
| Browser defense | Punycode fallback rendering |
| Source code defense | Linters, Unicode character set whitelisting |
| Year of notable browser fix | 2005 (Firefox added Punycode fallback) |
Related Terms
More terms in Security
Exploiting Unicode bidirectional control characters to disguise malicious code or filenames. The …
An attack that impersonates a legitimate site by using visually similar Unicode characters in a domain name. аpple.com (Cyrillic …
Exploiting Unicode normalization to bypass security filters. Input validated before normalization may …
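U+200D. Requests that adjacent characters be joined. Essential for emoji sequences (👩+ZWJ+💻=👩💻). In Indic scripts, ligature formation …
U+200C. Prevents adjacent characters from joining. Essential for correct letterforms in Persian/Arabic, and in Devanagari …
Characters from different scripts that look identical or very similar. Example: Latin 'a' and …
An attack that uses Unicode bidirectional control characters (U+202A–U+202E, U+2066–U+2069) to disguise malicious filenames or code. …
The official Unicode term for pairs of characters that can be visually confused, as defined in confusables.txt (UCD). …
Identifies text that mixes characters from different scripts (e.g., Latin + Cyrillic). Homoglyph …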