What is a confusable character?

Unicode's official term for pairs of characters that can be visually confused with each other, defined in the confusables.txt data file (published with the UTS #39 security data). The term is broader than "homoglyph": it covers characters that are merely similar, not only those that are identical.

What is an IDN homograph attack?

Impersonating a legitimate site by using visually similar Unicode characters in a domain name. аpple.com (with a Cyrillic а) looks like apple.com. Browsers defend against this with rules that decide when a domain is shown in Punycode form.

What is a script?

The writing system a character belongs to (for example, Latin, Cyrillic, Han). Unicode 16.0 defines 168 scripts; the Script property is key to security checks and mixed-script detection.

Security

Homoglyph

Characters from different alphabets that look identical or very similar, for example Latin 'a' and Cyrillic 'а'. Used in phishing, spoofing, and social engineering attacks.

2024-09-01 · Updated 2025-08-11

What is a Homoglyph?

A homoglyph is a character that looks visually identical or nearly identical to another character but has a completely different Unicode code point, name, and meaning. The word comes from the Greek homos (same) and glyphe (carving or symbol). Because modern typefaces render these characters with the same shape on screen, human eyes — and sometimes software — cannot distinguish between them.

The most well-known example is the Latin lowercase letter a (U+0061) and the Cyrillic lowercase letter а (U+0430). They are rendered identically in most fonts, yet they are entirely different code points belonging to different Unicode scripts. Dozens of such pairs exist across Latin, Greek, Cyrillic, Armenian, and many other scripts.
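The difference between the two letters is easy to confirm in code. The following is a small sketch using Python's standard `unicodedata` module; the variable names are chosen here for illustration only.

```python
import unicodedata

latin_a = "\u0061"     # LATIN SMALL LETTER A
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A

# Print the code point and official Unicode name of each character.
for ch in (latin_a, cyrillic_a):
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# The glyphs may look the same, but the strings do not compare equal.
print(latin_a == cyrillic_a)  # False
```

Any comparison, hash, or lookup treats the two as unrelated characters, which is why a lookalike name can be registered alongside the genuine one.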

Why Homoglyphs Are a Security Problem

The Unicode Standard encodes characters from over 150 scripts, and many scripts independently developed symbols that resemble those in other scripts. This is expected and linguistically valid. The security problem arises when attackers deliberately substitute one character for another to trick users into believing they are looking at something they are not.

Common targets include:

Domain names: The Internationalized Domain Names in Applications (IDNA) standard allows non-ASCII characters in domain names. An attacker can register pаypal.com using a Cyrillic а and create a convincing phishing site that appears to be paypal.com to a casual viewer.
Usernames and handles: Social platforms that allow Unicode usernames are vulnerable to impersonation attacks where a fake account mimics a real one character-for-character.
Source code and filenames: Homoglyphs in variable names or filenames can introduce subtle backdoors that are nearly impossible to spot during code review.
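For the domain-name case, converting a name to its ASCII form makes the substitution visible. This sketch assumes Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the helper name `to_ascii` is hypothetical, and production code may prefer a library that follows the current IDNA rules.

```python
def to_ascii(domain: str) -> str:
    """Return the ASCII (Punycode) form of a domain via Python's built-in IDNA codec."""
    return domain.encode("idna").decode("ascii")

genuine = "paypal.com"
spoofed = "p\u0430ypal.com"  # second letter is CYRILLIC SMALL LETTER A

print(to_ascii(genuine))  # an all-ASCII name is returned unchanged
print(to_ascii(spoofed))  # the spoofed label becomes an "xn--" label
```

Comparing the ASCII forms rather than the displayed forms is one way to keep a lookalike from being mistaken for the genuine name.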

Common Homoglyph Pairs

Many scripts contribute characters that visually overlap with Latin letters:

Latin o (U+006F), Cyrillic о (U+043E), Greek ο (U+03BF) — all look like "o"
Latin p (U+0070) and Cyrillic р (U+0440) — identical lowercase forms
Latin c (U+0063) and Cyrillic с (U+0441) — identical lowercase forms
Latin e (U+0065) and Cyrillic е (U+0435) — identical lowercase forms
Latin H (U+0048) and Cyrillic Н (U+041D) — identical uppercase forms

This means the word "COPE" written entirely in Cyrillic characters — СОРЕ — looks exactly like the Latin word "COPE" in most fonts.
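The "COPE" example can be reproduced with a small translation table. The mapping below is a hand-picked illustration covering only these four letters, not a complete lookalike table.

```python
# Hand-picked Latin -> Cyrillic lookalikes for the four letters in the example.
LATIN_TO_CYRILLIC = str.maketrans({
    "C": "\u0421",  # CYRILLIC CAPITAL LETTER ES
    "O": "\u041E",  # CYRILLIC CAPITAL LETTER O
    "P": "\u0420",  # CYRILLIC CAPITAL LETTER ER
    "E": "\u0415",  # CYRILLIC CAPITAL LETTER IE
})

latin_word = "COPE"
cyrillic_word = latin_word.translate(LATIN_TO_CYRILLIC)

print(cyrillic_word)                                # looks like COPE in most fonts
print(latin_word == cyrillic_word)                  # False
print([f"U+{ord(c):04X}" for c in cyrillic_word])   # four Cyrillic code points
```

Because every character here comes from a single script, a word like this does not mix scripts at all.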

How to Detect and Prevent Homoglyph Attacks

Unicode Technical Standard #39 (UTS #39, Unicode Security Mechanisms) defines a confusables dataset that maps thousands of characters to a prototype character standing for their shared visual appearance. Software can use this dataset to normalize text before comparison or to flag suspicious strings.
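A simplified sketch of how such a dataset might be used follows. The three sample lines are written by hand in the `source ; target ; type # comment` layout of confusables.txt and are illustrative only; a real check would load the full published file. The function names are hypothetical, and the skeleton step (decompose, map, decompose again) is a reduced version of the procedure UTS #39 describes.

```python
import unicodedata

# Hand-written sample lines, NOT the real data file.
SAMPLE = """\
0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
043E ; 006F ; MA # CYRILLIC SMALL LETTER O -> LATIN SMALL LETTER O
0440 ; 0070 ; MA # CYRILLIC SMALL LETTER ER -> LATIN SMALL LETTER P
"""

def parse_confusables(text):
    """Build a {source string: prototype string} map from confusables-style lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        source, target, *_ = (field.strip() for field in line.split(";"))
        src = "".join(chr(int(cp, 16)) for cp in source.split())
        mapping[src] = "".join(chr(int(cp, 16)) for cp in target.split())
    return mapping

def skeleton(text, mapping):
    """Simplified skeleton: NFD, replace each character by its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(mapping.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def confusable(a, b, mapping):
    """Two strings are treated as confusable when their skeletons match."""
    return skeleton(a, mapping) == skeleton(b, mapping)

mapping = parse_confusables(SAMPLE)
print(confusable("p\u0430ypal", "paypal", mapping))  # True
```

Comparing skeletons instead of raw strings lets a registration system notice that a new name collides visually with an existing one, even when the code points differ.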

Common defenses include:

Script mixing detection — reject or warn when a string contains characters from more than one script
Confusables normalization — map potentially confusing characters to a canonical form before storage or comparison
Punycode display — browsers display internationalized domain names in Punycode (xn--...) form when mixed scripts are detected
Visual diff tools — security-aware editors can highlight characters that are not in the expected script
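Script mixing detection, the first defense in the list above, can be approximated as follows. Python's standard library does not expose the Unicode Script property, so this sketch assumes a rough stand-in: the first word of each letter's Unicode name. That heuristic is an assumption made for illustration and misclassifies some characters (for example fullwidth or mathematical letters); a real implementation would read the Script property from Unicode data.

```python
import unicodedata

def rough_script(ch):
    """Approximate a letter's script from the first word of its Unicode name.

    Heuristic only: stands in for the real Script property.
    Returns None for non-letters such as digits and punctuation.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else None

def scripts_in(text):
    """Set of approximate scripts used by the letters in text."""
    return {s for s in map(rough_script, text) if s is not None}

def is_mixed_script(text):
    """True when the letters in text come from more than one approximate script."""
    return len(scripts_in(text)) > 1

print(is_mixed_script("paypal"))               # False
print(is_mixed_script("p\u0430ypal"))          # True: Latin plus one Cyrillic letter
print(is_mixed_script("\u0421\u041e\u0420\u0415"))  # False: entirely Cyrillic
```

The last call shows the limit of this check: a word written entirely in one lookalike script passes, which is why confusables normalization appears alongside it in the list of defenses.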

Quick Facts

Property	Value
Term origin	Greek homos (same) + glyphe (symbol)
Key Unicode document	Unicode Technical Standard #39 (UTS #39), Unicode Security Mechanisms
Confusables data file	`confusables.txt`, published with the UTS #39 security data
Most exploited scripts	Latin, Cyrillic, Greek, Armenian
Primary attack surface	Domain names (IDN), usernames, source code
Browser defense	Punycode fallback for mixed-script domains
Related terms	Confusable, IDN homograph attack
