🧭

The Unicode Odyssey

From Zero to Unicode Expert

A 10-chapter journey that takes you from knowing nothing about character encoding to understanding the full depth of Unicode. The definitive learning path for developers who want to truly understand the system.

10 chapters · 39,000 words · ~156 min read

The Problem: Why We Need Unicode

Before Unicode, different languages and platforms relied on their own incompatible encodings, which produced mojibake whenever text crossed system boundaries. This chapter explores the fragmented world of code pages and why a universal standard became essential.

~3,500 words · ~14 min

The Solution: How Unicode Works

Unicode assigns a unique number, called a code point, to every character it encodes, across all supported scripts. This chapter explains how the code space is organized into planes and blocks, including the Basic Multilingual Plane.

~4,000 words · ~16 min
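
As a small taste of the plane structure this chapter covers, the sketch below computes which plane a character falls in. The helper name `plane_of` is hypothetical and introduced only for illustration; it relies on the fact that each plane spans 65,536 (0x10000) code points.

```python
def plane_of(ch: str) -> int:
    """Return the Unicode plane (0 to 16) of a single character.

    Each plane holds 0x10000 code points, so the plane number is the
    code point shifted right by 16 bits.
    """
    return ord(ch) >> 16


if __name__ == "__main__":
    for ch in ["A", "\uD55C", "\U0001F600"]:
        # Plane 0 is the Basic Multilingual Plane (BMP).
        print(f"U+{ord(ch):04X} is in plane {plane_of(ch)}")
```

Here U+0041 and U+D55C land in plane 0 (the BMP), while U+1F600 lands in plane 1.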

Encoding the Code Points: UTF-8, UTF-16, UTF-32

Code points need to be encoded into bytes for storage and transmission. This chapter demystifies the three main Unicode encodings, their trade-offs, and why UTF-8 won the web.

~5,000 words · ~20 min
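
To make the trade-offs between the three encodings concrete, here is a minimal sketch that counts how many bytes the same text occupies in each. The helper name `encoded_sizes` is hypothetical; the little-endian codec variants are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the encoded length in bytes of `text` under each encoding.

    The "-le" codecs are chosen so Python does not prepend a BOM,
    which would otherwise inflate the UTF-16 and UTF-32 numbers.
    """
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {enc: len(text.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    for sample in ["A", "\uD55C", "\U0001F600"]:
        print(f"U+{ord(sample):04X}", encoded_sizes(sample))
```

An ASCII letter takes 1, 2, and 4 bytes respectively, a Hangul syllable such as U+D55C takes 3, 2, and 4, and an emoji outside the BMP takes 4 bytes in all three.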

Characters Are Not What You Think

What you see as a single character on screen might be multiple code points combined. This chapter explores combining marks, grapheme clusters, and emoji sequences — the gap between code points and visual characters.

~4,000 words · ~16 min
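
The gap between code points and visible characters can be seen with nothing more than the standard library. The sketch below lists the code points behind two strings that each render as a single glyph in most environments; `code_points` is a hypothetical helper name. Note that the standard library does not segment grapheme clusters, so this only exposes the underlying sequence.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """List the code points in `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# "e" followed by COMBINING ACUTE ACCENT: usually drawn as one glyph.
accented = "e\u0301"

# MAN + ZWJ + WOMAN + ZWJ + GIRL: a family emoji sequence.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

if __name__ == "__main__":
    print(len(accented), code_points(accented))
    print(len(family), code_points(family))
    # A nonzero combining class marks U+0301 as a combining mark.
    print(unicodedata.combining("\u0301"))
```

In Python, `len()` counts code points, so the accented letter reports a length of 2 and the family emoji a length of 5.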

The World's Writing Systems in Unicode

From Latin to CJK, from right-to-left Arabic to vertical Mongolian — Unicode encodes the world's writing systems. This chapter surveys the diversity of scripts and the technical challenges each presents.

~3,500 words · ~14 min

Unicode in Your Programming Language

Every programming language handles Unicode differently. This chapter compares string internals, iteration, and slicing across Python, JavaScript, Java, and Rust — with practical examples and gotchas.

~4,000 words · ~16 min

Normalization: When Equal Isn't Equal

Two strings that look identical might not be equal. This chapter explains the four Unicode normalization forms (NFC, NFD, NFKC, NFKD), canonical equivalence, and compatibility decomposition.

~4,500 words · ~18 min
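
A short sketch of the problem this chapter addresses, using Python's standard `unicodedata.normalize`. The helper name `equal_under` is hypothetical, and the example strings are chosen only for illustration.

```python
import unicodedata


def equal_under(form: str, a: str, b: str) -> bool:
    """Compare two strings after normalizing both to the given form.

    `form` is one of "NFC", "NFD", "NFKC", "NFKD".
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00E9"      # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # "e" plus COMBINING ACUTE ACCENT, two code points
ligature = "\uFB01"      # LATIN SMALL LIGATURE FI

if __name__ == "__main__":
    print(composed == decomposed)                   # raw comparison
    print(equal_under("NFC", composed, decomposed))  # canonical equivalence
    print(equal_under("NFC", ligature, "fi"))        # canonical forms keep the ligature
    print(equal_under("NFKC", ligature, "fi"))       # compatibility forms fold it
```

The two spellings of "é" differ as raw strings but match under NFC or NFD, because they are canonically equivalent. The "fi" ligature only matches the two-letter spelling under the compatibility forms NFKC and NFKD.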

Security: The Dark Side of Unicode

Unicode's power can be exploited. Homoglyph attacks, bidirectional text abuse, zero-width character injection, and IDN spoofing — this chapter explores the security risks and how to defend against them.

~4,000 words · ~16 min
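
One possible first line of defense against zero-width injection and bidirectional overrides is to flag invisible format characters in untrusted input. The sketch below is a heuristic under stated assumptions, not a complete defense: it reports every character in general category Cf, which also covers legitimate uses such as the zero-width joiner inside emoji sequences. The helper name `find_format_chars` is hypothetical.

```python
import unicodedata


def find_format_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, name) for each character in general category Cf.

    Cf ("Format") includes zero-width characters and bidirectional
    controls. This is a coarse filter: callers still have to decide
    which of the reported characters are acceptable in their context.
    """
    return [
        (i, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


if __name__ == "__main__":
    # A zero-width space hidden in the middle of a word.
    print(find_format_chars("pay\u200Bpal"))
    # A right-to-left override that visually reorders what follows.
    print(find_format_chars("invoice\u202Efdp.exe"))
```

This does not address homoglyph or IDN spoofing, which involve visible characters and need a different kind of check.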

Unicode on the Web: HTML, CSS, and Beyond

The web is built on Unicode — from HTML entities to CSS content properties to web fonts. This chapter covers everything you need to know about using Unicode correctly in web development.

~3,500 words · ~14 min

The Future of Unicode

Unicode continues to evolve with new characters, scripts, and emoji. This chapter looks at Unicode 16.0+, the emoji submission process, AI and Unicode, and the quest to encode the world's undeciphered scripts.

~3,000 words · ~12 min