Unicode Security
Security implications of Unicode
5 guides in this series
Unicode's vast character set introduces a range of security vulnerabilities including homograph attacks, bidirectional text spoofing, normalization exploits, and invisible character injection. This overview explains the major categories of Unicode security risks and provides a framework for defending against them in web applications and APIs.
IDN homograph attacks use look-alike Unicode characters to register domain names that appear identical to legitimate ones, for example by replacing the Latin 'a' (U+0061) with the Cyrillic 'а' (U+0430). This guide explains how to detect and prevent IDN homograph attacks in domain registration systems, browser UI, and link validation code.
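As a rough illustration of homograph detection, the sketch below flags labels that mix letters from more than one script. It assumes that the first word of a character's Unicode name (such as LATIN or CYRILLIC) is a good enough stand-in for its script. That is a heuristic, not the Script property defined by the Unicode standard, and the function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label):
    """Return the approximate scripts used by the letters in a label.

    Assumption: the first word of the Unicode character name
    (e.g. 'LATIN', 'CYRILLIC') approximates the character's script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def looks_mixed_script(label):
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in_label(label)) > 1


# "p\u0430ypal" uses Cyrillic 'а' (U+0430) in place of Latin 'a'.
print(looks_mixed_script("p\u0430ypal"))
print(looks_mixed_script("paypal"))
```

A check like this misses spoofs written entirely in one script, and it flags legitimate mixed-script text such as Japanese, so it is better treated as one signal among several than as a verdict.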
Zero-width and other invisible Unicode characters can be used to fingerprint text for tracking, hide malicious payloads in code, or bypass content filters while remaining undetectable to the human eye. This guide explains how to detect, visualize, and remove invisible Unicode characters from user input and stored text using code examples in Python and JavaScript.
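A minimal Python sketch of detection and removal follows. It assumes that characters in the Unicode general category Cf (format) are a reasonable approximation of "invisible"; that category includes the zero-width space (U+200B), the zero-width joiner (U+200D), and the byte order mark (U+FEFF). The function names are hypothetical.

```python
import unicodedata


def find_invisible(text):
    """List (index, code point, name) for each format (Cf) character."""
    found = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            code_point = f"U+{ord(ch):04X}"
            found.append((i, code_point, unicodedata.name(ch, "UNKNOWN")))
    return found


def strip_invisible(text):
    """Remove every format (Cf) character from the text."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


sample = "ad\u200bmin"  # zero-width space hidden inside "admin"
print(find_invisible(sample))
print(strip_invisible(sample))
```

Stripping every Cf character is a blunt instrument: the zero-width joiner and non-joiner carry meaning in emoji sequences and in scripts such as Persian, so a real filter would probably need an allow-list per context.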
Unicode passwords introduce normalization ambiguity that can cause authentication failures or allow password bypasses when different normalization forms produce different byte sequences for the same visible password. This guide covers the security implications of Unicode in authentication systems and best practices for normalizing and hashing Unicode passwords.
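The ambiguity can be seen with the word "café", which can be encoded with a precomposed é (U+00E9) or with a plain e followed by a combining acute accent (U+0301). The sketch below normalizes to NFKC before hashing so that both spellings produce the same hash. The choice of NFKC, the PBKDF2 parameters, and the function names are assumptions made for illustration; what matters is applying the same normalization at enrollment and at verification.

```python
import hashlib
import hmac
import unicodedata


def hash_password(password, salt, iterations=600_000):
    """Normalize the password, then derive a key with PBKDF2-HMAC-SHA256.

    Assumption: NFKC and the iteration count are illustrative choices.
    """
    normalized = unicodedata.normalize("NFKC", password)
    return hashlib.pbkdf2_hmac(
        "sha256", normalized.encode("utf-8"), salt, iterations
    )


def verify_password(password, salt, expected, iterations=600_000):
    """Compare in constant time against the stored hash."""
    candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, expected)


composed = "caf\u00e9"     # precomposed é
decomposed = "cafe\u0301"  # e + combining acute accent

# Same visible text, different bytes before normalization.
print(composed.encode("utf-8") == decomposed.encode("utf-8"))
```

In practice the salt would come from a source such as `os.urandom(16)` and be stored alongside the hash.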
Phishing attacks increasingly exploit Unicode confusables, bidirectional overrides, and invisible characters to create deceptive URLs, spoofed sender addresses, and misleading link text. This guide covers the techniques used in Unicode-based phishing attacks and the detection, prevention, and user-education strategies to defend against them.
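To make the bidirectional override technique concrete, the sketch below scans a string for bidirectional control characters. A filename such as "invoice" followed by U+202E (RIGHT-TO-LEFT OVERRIDE) and "gpj.exe" may be displayed with the trailing part reversed, so it appears to end in ".jpg". The set of code points and the function name are assumptions chosen for illustration.

```python
# Bidirectional control characters: marks, embeddings, overrides, isolates.
BIDI_CONTROLS = {
    "\u061c",  # ARABIC LETTER MARK
    "\u200e",  # LEFT-TO-RIGHT MARK
    "\u200f",  # RIGHT-TO-LEFT MARK
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(text):
    """Return (index, code point) for each bidi control in the text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


print(find_bidi_controls("invoice\u202egpj.exe"))
```

Some of these characters are legitimate in right-to-left text, so a hit is a reason to inspect or escape the string for display rather than proof of an attack.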