🧭

The Unicode Odyssey

From Zero to Unicode Expert

A 10-chapter journey that takes you from knowing nothing about character encoding to understanding the full depth of Unicode. The definitive learning path for developers who want to truly understand the system.

10 chapters · 39,000 words · ~156 min read
1

The Problem: Why We Need Unicode

Before Unicode, every language needed its own encoding — leading to the chaos of mojibake and incompatible systems. This chapter explores the fragmented world of code pages and why a universal standard became essential.

~3,500 words · ~14 min
2

The Solution: How Unicode Works

Unicode assigns a unique number — a code point — to every character in every language. This chapter explains the elegant structure of planes, blocks, and the Basic Multilingual Plane.

~4,000 words · ~16 min
3

Encoding the Codepoints: UTF-8, UTF-16, UTF-32

Code points need to be encoded into bytes for storage and transmission. This chapter demystifies the three main Unicode encodings, their trade-offs, and why UTF-8 won the web.

~5,000 words · ~20 min
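As a small taste of the trade-offs this chapter covers, the sketch below compares how many bytes the same character takes in each encoding. The sample characters and the helper name `encoded_lengths` are illustrative choices, not taken from the chapter; the little-endian codec variants are used only so that no byte order mark is counted.

```python
# Sample characters chosen for illustration: ASCII, Latin-1 range,
# a BMP symbol, and an emoji outside the BMP.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]


def encoded_lengths(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


for ch in SAMPLES:
    # Print the code point in U+XXXX notation next to its encoded sizes.
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-8 grows from one to four bytes as the code point gets larger, UTF-16 uses two or four bytes, and UTF-32 always uses four.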
4

Characters Are Not What You Think

What you see as a single character on screen might be multiple code points combined. This chapter explores combining marks, grapheme clusters, and emoji sequences — the gap between code points and visual characters.

~4,000 words · ~16 min
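To make the gap between code points and visible characters concrete, here is a minimal sketch using only Python's standard library. The two sample strings and the helper name `describe` are illustrative assumptions. Note that the standard library counts code points; it does not segment text into grapheme clusters.

```python
import unicodedata

# "e" followed by a combining acute accent: one visible character, two code points.
decomposed_e = "e\u0301"

# Man + ZWJ + woman + ZWJ + girl: typically rendered as one family emoji.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"


def describe(text):
    """List each code point in text with its U+XXXX value and Unicode name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in text]


for sample in (decomposed_e, family):
    # len() counts code points, not what the reader sees on screen.
    print(len(sample), describe(sample))
```

Both strings usually display as a single character, yet `len()` reports 2 and 5 respectively.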
5

The World's Writing Systems in Unicode

From Latin to CJK, from right-to-left Arabic to vertical Mongolian — Unicode encodes the world's writing systems. This chapter surveys the diversity of scripts and the technical challenges each presents.

~3,500 words · ~14 min
6

Unicode in Your Programming Language

Every programming language handles Unicode differently. This chapter compares string internals, iteration, and slicing across Python, JavaScript, Java, and Rust — with practical examples and gotchas.

~4,000 words · ~16 min
7

Normalization: When Equal Isn't Equal

Two strings that look identical might not be equal. This chapter explains the four Unicode normalization forms (NFC, NFD, NFKC, NFKD), canonical equivalence, and compatibility decomposition.

~4,500 words · ~18 min
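A short sketch of the problem this chapter addresses, using Python's standard `unicodedata.normalize`. The sample strings and the helper name `equal_under` are illustrative assumptions, not the chapter's own examples.

```python
import unicodedata

composed = "\u00e9"     # precomposed "e with acute"
decomposed = "e\u0301"  # "e" followed by a combining acute accent
ligature = "\ufb01"     # the "fi" ligature, a compatibility character


def equal_under(form, a, b):
    """Compare two strings after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings differ even though they look identical on screen.
print(composed == decomposed)
# Canonically equivalent strings compare equal under NFC and NFD.
print(equal_under("NFC", composed, decomposed))
print(equal_under("NFD", composed, decomposed))
# Only the compatibility forms fold the ligature into plain letters.
print(unicodedata.normalize("NFC", ligature) == "fi")
print(unicodedata.normalize("NFKC", ligature) == "fi")
```

The canonical forms (NFC, NFD) reconcile the two spellings of the accented letter, while only the compatibility forms (NFKC, NFKD) replace the ligature with the two letters "fi".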
8

Security: The Dark Side of Unicode

Unicode's power can be exploited. Homoglyph attacks, bidirectional text abuse, zero-width character injection, and IDN spoofing — this chapter explores the security risks and how to defend against them.

~4,000 words · ~16 min
9

Unicode on the Web: HTML, CSS, and Beyond

The web is built on Unicode — from HTML entities to CSS content properties to web fonts. This chapter covers everything you need to know about using Unicode correctly in web development.

~3,500 words · ~14 min
10

The Future of Unicode

Unicode continues to evolve with new characters, scripts, and emoji. This chapter looks at Unicode 16.0+, the emoji submission process, AI and Unicode, and the quest to encode the world's undeciphered scripts.

~3,000 words · ~12 min