What is a Homoglyph?

Characters from different scripts that look identical or nearly identical, such as Latin 'a' and Cyrillic 'а'. Used in phishing, spoofing, and social engineering attacks.

What is an IDN Homograph Attack?

The use of lookalike Unicode characters in a domain name to impersonate a legitimate website. For example, аpple.com (with a Cyrillic а) looks like apple.com. Browsers defend against this with rules for when to display the Punycode form.

What is a Bidirectional Override Attack?

The use of Unicode bidirectional control characters (U+202A–U+202E, U+2066–U+2069) to disguise file names or malicious code. For example, the name 'readme' followed by U+202E and then 'fdp.exe' is displayed as 'readmeexe.pdf'.

What is an Invisible Character?

Any character with no visible glyph: whitespace, zero-width characters, control characters, and format characters. These can cause security problems such as spoofing and text smuggling.

Security

Unicode Spoofing

The use of Unicode features to deceive users: homoglyphs for fake domain names, bidi overrides for fake file extensions, or invisible characters for hidden text.

2025-01-20 · Updated 2025-11-12

What is Unicode Spoofing?

Unicode spoofing is a class of attack that exploits the visual similarity between Unicode characters to deceive users, systems, or automated tools. Rather than compromising servers or stealing credentials directly, it manipulates human perception: something malicious is made to look legitimate by substituting characters that look the same but have different Unicode code points.

The attack is possible because Unicode encodes over 149,000 characters from more than 150 scripts, and many characters across those scripts look identical or nearly identical when rendered on screen. A Latin a (U+0061) and a Cyrillic а (U+0430) are different code points, yet most fonts draw them identically. A Greek α (U+03B1) is a third code point that is usually distinguishable but still close enough to mislead at a glance.
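To confirm that these lookalikes are distinct characters, a short Python sketch using the standard unicodedata module can print each code point with its official name. The escapes are used so that the script of each letter is unambiguous in the source.

```python
import unicodedata

# Three lookalike letters, written as escapes so each one's script is explicit.
lookalikes = ["\u0061", "\u0430", "\u03b1"]  # Latin a, Cyrillic a, Greek alpha

for ch in lookalikes:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The strings compare unequal even when a font draws them the same way.
print("\u0061" == "\u0430")  # False
```

The comparison at the end is the core of the problem: software sees two different strings where a person sees one.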

How Unicode Spoofing Works

The general pattern involves three steps:

1. Identify a target string: a domain name, username, file name, or code identifier that the attacker wants to impersonate.
2. Substitute lookalike characters: replace one or more characters with visually identical Unicode equivalents from a different script or block.
3. Deploy the spoofed string: register the domain, create the account, commit the file, or send the message.

To a human reader, and to many software systems that do not perform script analysis, the spoofed string appears identical to the original.
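The effect of a single substitution can be illustrated with a few lines of Python. This is only a sketch: the LOOKALIKES mapping is a tiny hand-picked sample rather than real confusables data, and the function name substitute_lookalikes is hypothetical.

```python
# Hypothetical, hand-picked Latin -> Cyrillic lookalike pairs (illustration only).
LOOKALIKES = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
}

def substitute_lookalikes(target: str, max_swaps: int = 1) -> str:
    """Replace up to max_swaps characters with a lookalike from LOOKALIKES."""
    out, swaps = [], 0
    for ch in target:
        if swaps < max_swaps and ch in LOOKALIKES:
            out.append(LOOKALIKES[ch])
            swaps += 1
        else:
            out.append(ch)
    return "".join(out)

target = "apple"                         # the string being impersonated
spoofed = substitute_lookalikes(target)  # first "a" becomes Cyrillic
print(spoofed == target)                 # False: different code points
print(len(spoofed) == len(target))       # True: same number of characters
print(spoofed.encode("utf-8"))           # b'\xd0\xb0pple'
```

The UTF-8 bytes make the difference visible: the Cyrillic letter occupies two bytes where the Latin letter would occupy one.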

Common Attack Scenarios

Phishing via IDN homograph attack: An attacker registers аpple.com where а is Cyrillic (U+0430) instead of Latin (U+0061). The domain resolves to a phishing server. Users who click a link to this domain see what looks like apple.com in the address bar, especially in older browsers or email clients that do not display Punycode.

Username impersonation: On platforms that allow Unicode usernames, an attacker creates @elоn with a Cyrillic о (U+043E). Followers of the real @elon may be deceived into interacting with the fake account, especially in notifications or @mentions.

Source code backdoors: A malicious contributor submits code containing a function def verify_раssword(...) where р (U+0440) and а (U+0430) are Cyrillic. In code review the name reads as verify_password, but it is a distinct identifier. Code paths that call the lookalike never reach the real verify_password, allowing authentication bypass.

File name spoofing: A malicious file named report_finalе.pdf uses a Cyrillic е (U+0435) at the end of its base name. Most file managers display it identically to report_finale.pdf. Combined with bidirectional override characters, the displayed filename can be made to look entirely different from the actual filename.
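A file name can be screened for bidirectional control characters before it is shown to a user. The sketch below assumes the relevant set is the embedding and override controls U+202A to U+202E plus the isolate controls U+2066 to U+2069. The helper name find_bidi_controls is hypothetical.

```python
# Bidi embedding/override controls (U+202A-U+202E) and isolates (U+2066-U+2069).
BIDI_CONTROLS = {chr(cp) for cp in range(0x202A, 0x202F)} | {
    chr(cp) for cp in range(0x2066, 0x206A)
}

def find_bidi_controls(name: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each bidi control character in name."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(name)
        if ch in BIDI_CONTROLS
    ]

suspicious = "readme\u202efdp.exe"       # can render as "readmeexe.pdf"
print(find_bidi_controls(suspicious))    # [(6, 'U+202E')]
print(find_bidi_controls("report.pdf"))  # []
```

Whether to reject such names outright or only flag them is a policy decision, since these controls have legitimate uses in right-to-left text.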

Mitigation Techniques

At the browser level: Modern browsers display internationalized domain names that mix scripts in their Punycode form (e.g., xn--pple-43d.com) to alert users to potential spoofing.
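The Punycode form of a domain can be reproduced with Python's built-in idna codec, which implements the older IDNA 2003 rules. This sketch only shows the encoding itself, not any browser's actual display policy.

```python
# The first letter is CYRILLIC SMALL LETTER A (U+0430), not Latin "a".
spoofed_domain = "\u0430pple.com"

# The built-in "idna" codec encodes each label to its ASCII-compatible form.
ascii_form = spoofed_domain.encode("idna").decode("ascii")
print(ascii_form)  # xn--pple-43d.com

# A pure-ASCII domain is unchanged by the same encoding.
print("apple.com".encode("idna").decode("ascii"))  # apple.com
```

Seeing the xn-- prefix in place of a familiar name is the signal that the domain contains non-ASCII characters.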

At the platform level: Social platforms can normalize usernames by mapping confusable characters to a canonical form, then preventing registration of two usernames that normalize identically.
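One way to picture this is a registry that stores a canonical key for every username. The sketch below is loosely inspired by the skeleton idea in Unicode's security guidance but does not implement that algorithm. The CONFUSABLES table is a hand-picked sample, and the names skeleton and UsernameRegistry are hypothetical.

```python
import unicodedata

# Hypothetical, hand-picked confusable mappings for illustration only.
# A real deployment would load a full confusables dataset instead.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}

def skeleton(name: str) -> str:
    """Map a username to a canonical key used only for collision checks."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded)

class UsernameRegistry:
    def __init__(self) -> None:
        self._taken: dict[str, str] = {}  # canonical key -> registered name

    def register(self, name: str) -> bool:
        """Register name unless its canonical key is already taken."""
        key = skeleton(name)
        if key in self._taken:
            return False
        self._taken[key] = name
        return True

registry = UsernameRegistry()
print(registry.register("elon"))       # True
print(registry.register("el\u043en"))  # False: Cyrillic o collides with "elon"
print(registry.register("alice"))      # True
```

The canonical key is used only for comparison, so the name a user typed can still be stored and displayed unchanged.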

At the application level: Developers can apply the confusables checks from Unicode Technical Standard #39 (UTS #39, often cited as TR39) to any identifier or string that will be displayed to users alongside other identifiers.
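A lighter related check is to flag strings that mix scripts. The sketch below is a rough heuristic, not a confusables check: Python's standard library does not expose the Unicode Script property, so it assumes the first word of each letter's character name (such as LATIN or CYRILLIC) is a usable stand-in. The function names are hypothetical.

```python
import unicodedata

def rough_scripts(text: str) -> set[str]:
    """Approximate the scripts used by the letters in text.

    Uses the first word of each letter's Unicode name as a proxy for its
    script, because the stdlib has no direct Script property lookup.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts

def is_mixed_script(text: str) -> bool:
    """True when the letters in text appear to come from more than one script."""
    return len(rough_scripts(text)) > 1

print(rough_scripts("apple"))         # {'LATIN'}
print(is_mixed_script("\u0430pple"))  # True: Cyrillic and Latin letters
print(is_mixed_script("apple"))       # False
```

This heuristic misses whole-script spoofs, where every letter comes from the same non-Latin script, so it complements a confusables check but does not replace one.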

At the code review level: Security-aware editors and static analysis tools can flag source files that contain characters outside the expected ASCII or script range.
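For Python sources, such a check can be sketched with the standard tokenize module by reporting every identifier that contains non-ASCII characters. The function name non_ascii_identifiers and the sample source are hypothetical. A project that legitimately uses non-ASCII identifiers would need an allowlist instead of a blanket flag.

```python
import io
import tokenize

def non_ascii_identifiers(source: str) -> list[tuple[int, str]]:
    """Return (line_number, identifier) for identifiers with non-ASCII characters."""
    hits = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and not tok.string.isascii():
            hits.append((tok.start[0], tok.string))
    return hits

# The second function name contains Cyrillic U+0440 and U+0430.
sample = (
    "def verify_password(pw):\n"
    "    return False\n"
    "def verify_\u0440\u0430ssword(pw):\n"
    "    return True\n"
)

for line_no, ident in non_ascii_identifiers(sample):
    # Print escapes so the hidden characters are visible in the report.
    print(line_no, ident.encode("unicode_escape").decode("ascii"))
```

Printing the escaped form matters here, because echoing the raw identifier would reproduce the same visual ambiguity the check is meant to expose.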

Quick Facts

Property	Value
Root cause	Visual equivalence across Unicode scripts
Key reference standard	Unicode UTS #39 (TR39) confusables dataset
Primary attack surfaces	Domain names, usernames, source code, filenames
Technical name for domain variant	IDN homograph attack
Browser defense	Punycode fallback rendering
Source code defense	Linters, Unicode character set whitelisting
Year of notable browser fix	2005 (Firefox added Punycode fallback)