Homoglyph
Characters from different scripts that look identical or very similar. Example: Latin 'a' and Cyrillic 'а'. Used in phishing, spoofing, and social engineering attacks.
What is a Homoglyph?
A homoglyph is a character that looks identical or nearly identical to another character but has a different Unicode code point and name, and often a different meaning. The word comes from the Greek homos (same) and glyphe (carving or symbol). Because many typefaces render these characters with the same shape, readers cannot tell them apart by sight, and software that does not compare code points can be fooled as well.
The most well-known example is the Latin lowercase letter a (U+0061) and the Cyrillic lowercase letter а (U+0430). They are rendered identically in most fonts, yet they are entirely different code points belonging to different Unicode scripts. Dozens of such pairs exist across Latin, Greek, Cyrillic, Armenian, and many other scripts.
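One way to confirm that the two letters really are different characters is to print their code points and official names. This is a minimal sketch using only Python's standard `unicodedata` module; the helper name `describe` is hypothetical and chosen for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (illustrative helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


latin_a = "\u0061"     # Latin small letter a
cyrillic_a = "\u0430"  # Cyrillic small letter a

print(describe(latin_a))      # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))   # U+0430 CYRILLIC SMALL LETTER A
print(latin_a == cyrillic_a)  # False: same shape, different code points
```

The comparison is done on code points, so the two strings never compare equal even though they render alike.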
Why Homoglyphs Are a Security Problem
The Unicode Standard encodes characters from over 150 scripts, and many scripts independently developed symbols that resemble those in other scripts. This is expected and linguistically valid. The security problem arises when attackers deliberately substitute one character for another to trick users into believing they are looking at something they are not.
Common targets include:
- Domain names: The Internationalized Domain Names in Applications (IDNA) standard allows non-ASCII characters in domain names. An attacker can register pаypal.com using a Cyrillic а and create a convincing phishing site that appears to be paypal.com to a casual viewer.
- Usernames and handles: Social platforms that allow Unicode usernames are vulnerable to impersonation attacks where a fake account mimics a real one character-for-character.
- Source code and filenames: Homoglyphs in variable names or filenames can introduce subtle backdoors that are nearly impossible to spot during code review.
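For the domain-name case, the difference becomes visible once the name is converted to its ASCII (Punycode) form. The sketch below assumes Python's built-in `idna` codec, which implements the older IDNA 2003 rules; a production system would more likely use a library that follows current IDNA rules. The variable names are illustrative.

```python
# The spoofed name uses Cyrillic U+0430 as its second letter.
spoofed = "p\u0430ypal.com"
genuine = "paypal.com"

# The built-in "idna" codec converts each dot-separated label to ASCII.
# Labels containing non-ASCII characters get the "xn--" prefix.
spoofed_ascii = spoofed.encode("idna")
genuine_ascii = genuine.encode("idna")

print(spoofed_ascii)                         # an xn-- form, clearly not "paypal.com"
print(genuine_ascii)                         # b'paypal.com'
print(spoofed_ascii == genuine_ascii)        # False
print(spoofed_ascii.startswith(b"xn--"))     # True
```

On the wire and in DNS the two names are unrelated; only the rendered Unicode form looks the same.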
Common Homoglyph Pairs
Many scripts contribute characters that visually overlap with Latin letters:
- Latin o (U+006F), Cyrillic о (U+043E), Greek ο (U+03BF) — all look like "o"
- Latin p (U+0070) and Cyrillic р (U+0440) — identical lowercase forms
- Latin c (U+0063) and Cyrillic с (U+0441) — identical lowercase forms
- Latin e (U+0065) and Cyrillic е (U+0435) — identical lowercase forms
- Latin H (U+0048) and Cyrillic Н (U+041D) — identical uppercase forms
This means the word "COPE" written entirely in Cyrillic characters — СОРЕ — looks exactly like the Latin word "COPE" in most fonts.
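The "COPE" example can be checked directly. The sketch below, which uses only the standard library, also illustrates a point that is easy to miss: Unicode normalization (NFC, NFD, NFKC, NFKD) does not merge cross-script look-alikes, so normalizing input is not a homoglyph defense on its own.

```python
import unicodedata

latin = "COPE"
# Cyrillic capital letters ES, O, ER, IE
cyrillic = "\u0421\u041e\u0420\u0415"

print([f"U+{ord(c):04X}" for c in latin])     # ['U+0043', 'U+004F', 'U+0050', 'U+0045']
print([f"U+{ord(c):04X}" for c in cyrillic])  # ['U+0421', 'U+041E', 'U+0420', 'U+0415']
print(latin == cyrillic)                      # False

# None of the four normalization forms maps the Cyrillic letters to Latin ones.
results = {
    form: unicodedata.normalize(form, cyrillic) == unicodedata.normalize(form, latin)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(results)  # every value is False
```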
How to Detect and Prevent Homoglyph Attacks
Unicode Technical Standard #39 (Unicode Security Mechanisms) defines a confusables dataset that maps thousands of characters to a common prototype for each group of look-alikes. Software can use this dataset to reduce strings to a comparable "skeleton" form, or to flag suspicious text.
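The idea of mapping look-alikes to a shared form can be sketched as follows. This is a simplified illustration, not the real dataset: the `PROTOTYPES` table is a hand-picked assumption containing only the pairs listed earlier on this page, and the function names `skeleton` and `looks_like` are hypothetical. A real implementation would load the full confusables.txt data.

```python
import unicodedata

# ASSUMPTION: tiny hand-made table for illustration, NOT the real confusables.txt.
PROTOTYPES = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u043e": "o",  # Cyrillic small o
    "\u03bf": "o",  # Greek small omicron
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0435": "e",  # Cyrillic small ie
    "\u041d": "H",  # Cyrillic capital en
})


def skeleton(text: str) -> str:
    """Decompose, map each character to its prototype, then decompose again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = decomposed.translate(PROTOTYPES)
    return unicodedata.normalize("NFD", mapped)


def looks_like(a: str, b: str) -> bool:
    """Two strings are treated as confusable when their skeletons match."""
    return skeleton(a) == skeleton(b)


print(looks_like("p\u0430ypal.com", "paypal.com"))  # True: Cyrillic a maps to Latin a
print(looks_like("paypa1.com", "paypal.com"))       # False: digit 1 is not in this tiny table
```

Skeletons are meant for comparison only; they are not suitable for display or storage as the user's actual text.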
Common defenses include:
- Script mixing detection: reject or warn when a string contains characters from more than one script
- Confusables normalization: map potentially confusing characters to a canonical form before storage or comparison
- Punycode display: browsers display internationalized domain names in Punycode (xn--...) form when mixed scripts are detected
- Visual diff tools: security-aware editors can highlight characters that are not in the expected script
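The script mixing check from the list above can be approximated in a few lines. Python's standard library does not expose the Unicode Script property, so this sketch assumes a rough heuristic: it treats the first word of each letter's Unicode name (for example LATIN, CYRILLIC, GREEK) as its script. The function names are hypothetical, and a real implementation would use proper Script data.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Collect an approximate script label for every letter in the text.

    ASSUMPTION: the first word of the Unicode character name stands in for
    the Script property. Digits and punctuation are ignored.
    """
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def is_mixed_script(text: str) -> bool:
    """Flag text whose letters come from more than one (approximate) script."""
    return len(scripts_used(text)) > 1


print(scripts_used("p\u0430ypal.com"))     # contains both 'LATIN' and 'CYRILLIC'
print(is_mixed_script("p\u0430ypal.com"))  # True
print(is_mixed_script("paypal.com"))       # False
```

A blanket "one script only" rule is too strict for some languages: Japanese text, for instance, routinely combines several scripts, so practical policies usually allow specific known combinations rather than rejecting all mixing.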
Quick Facts
| Property | Value |
|---|---|
| Term origin | Greek homos (same) + glyphe (symbol) |
| Key Unicode document | UTS #39: Unicode Security Mechanisms |
| Confusables data file | confusables.txt in the Unicode security data files (UTS #39) |
| Most exploited scripts | Latin, Cyrillic, Greek, Armenian |
| Primary attack surface | Domain names (IDN), usernames, source code |
| Browser defense | Punycode fallback for mixed-script domains |
| Related terms | Confusable, IDN homograph attack |
Related Terms
More terms in Security
Exploiting Unicode bidirectional control characters to disguise malicious code or filenames. The …
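An attack that uses Unicode bidirectional control characters (U+202A to U+202E, U+2066 to U+2069) to disguise malicious filenames or code. For example, a file named 'readme' + U+202E + 'fdp.exe' is displayed as 'readmeexe.pdf'.
An attack that impersonates a legitimate site by using visually similar Unicode characters in a domain name. аpple.com (with a Cyrillic а) looks like apple.com. Browsers defend against it with their Punycode display rules.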
Exploiting Unicode normalization to bypass security filters. Input validated before normalization may …
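Using Unicode features to deceive users: homoglyphs for fake domains, Bidi overrides for fake file extensions, and invisible characters for hidden text.
U+200D. Requests that adjacent characters be joined. Essential for emoji sequences (👩 + ZWJ + 💻 = 👩‍💻). In Indic scripts it requests ligature formation. It is also used to hide text boundaries.
U+200C. Prevents adjacent characters from joining. Required in Persian and Arabic for correct letter forms, and also used in Devanagari to prevent ligatures.
Unicode's official term for visually confusable character pairs, as defined in confusables.txt (UTS #39). A broader concept than homoglyph: it also covers characters that are merely similar.
Identifies text that mixes characters from different scripts (for example, Latin + Cyrillic). A primary defense against homoglyph attacks; browsers use it to trigger Punycode display.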