📊 Encoding Visualizer
See visually how characters are encoded at the byte level in UTF-8, UTF-16, and UTF-32. Header bits, payload bits, and surrogate pairs are highlighted.
UTF-8
UTF-16
UTF-32
Fixed 4 bytes: the code point value is stored directly
Compare
Enter a character to see how it is encoded in UTF-8, UTF-16, and UTF-32.
Type or paste any character or short string into the input field. Single characters reveal the most detail; strings show how multiple code points chain together as byte sequences in each encoding form.
The tool renders each encoding as a color-coded byte diagram. UTF-8 shows the bit pattern that signals 1-, 2-, 3-, or 4-byte sequences; UTF-16 shows each 16-bit code unit and surrogate pairs; UTF-32 shows the fixed 4-byte representation. Hover over bytes to see the bit values and their structural role in the encoding.
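The header/payload split that the UTF-8 diagram shows can be reproduced by hand. The following is a minimal Python sketch, not the tool's own implementation; the function names `utf8_bits` and `bits_to_bytes` are hypothetical and exist only for illustration. Each byte is returned as a `header|payload` bit string, and the result is checked against Python's built-in encoder.

```python
def utf8_bits(code_point: int) -> list:
    """Return the UTF-8 bytes of one code point as 'header|payload' bit strings."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x80:
        # 1-byte form: header bit '0' followed by 7 payload bits
        return ["0|" + format(code_point, "07b")]
    if code_point < 0x800:
        count, lead_header, lead_bits = 2, "110", 5
    elif code_point < 0x10000:
        count, lead_header, lead_bits = 3, "1110", 4
    else:
        count, lead_header, lead_bits = 4, "11110", 3
    # Total payload width: lead byte bits plus 6 bits per continuation byte
    total_bits = lead_bits + 6 * (count - 1)
    payload = format(code_point, "0{}b".format(total_bits))
    parts = [lead_header + "|" + payload[:lead_bits]]
    for i in range(count - 1):
        start = lead_bits + 6 * i
        # Every continuation byte carries the header '10'
        parts.append("10|" + payload[start:start + 6])
    return parts


def bits_to_bytes(parts: list) -> bytes:
    """Join 'header|payload' strings back into raw bytes."""
    return bytes(int(p.replace("|", ""), 2) for p in parts)


for ch in "A", "é", "한", "😀":
    parts = utf8_bits(ord(ch))
    # Cross-check the hand-built bytes against the built-in encoder
    assert bits_to_bytes(parts) == ch.encode("utf-8")
    print("U+{:04X}".format(ord(ch)), " ".join(parts))
```

For "é" (U+00E9) this yields `110|00011 10|101001`, which is the byte pair C3 A9: the lead byte's `110` header announces a 2-byte sequence, and the `10` header marks the continuation byte.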
Use the side-by-side comparison view to see how the same character differs in byte count, bit layout, and hexadecimal representation across UTF-8, UTF-16 LE/BE, UTF-32 LE/BE, Latin-1, and Windows-1252. This view is especially valuable for understanding why supplementary plane characters require more bytes.
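A rough text-only equivalent of the comparison view can be built with Python's standard codecs. This is a sketch under the assumption that the codec names `latin-1` and `cp1252` correspond to the tool's Latin-1 and Windows-1252 columns; the helper name `compare` is hypothetical. Characters that an encoding cannot represent are reported as `None` rather than raising.

```python
ENCODINGS = [
    "utf-8",
    "utf-16-le", "utf-16-be",
    "utf-32-le", "utf-32-be",
    "latin-1", "cp1252",
]


def compare(text: str) -> dict:
    """Map each encoding name to a spaced hex string, or None if unencodable."""
    result = {}
    for name in ENCODINGS:
        try:
            # bytes.hex(sep) requires Python 3.8 or later
            result[name] = text.encode(name).hex(" ")
        except UnicodeEncodeError:
            # Legacy single-byte encodings cover only a small repertoire
            result[name] = None
    return result


for name, hex_bytes in compare("€").items():
    print("{:10} {}".format(name, hex_bytes))
```

For the euro sign (U+20AC) the little-endian and big-endian rows show the same code unit with its bytes swapped (`ac 20` versus `20 ac`), Windows-1252 uses the single byte `80`, and Latin-1 has no representation at all.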
Character encoding is the mechanism by which abstract Unicode code points become concrete sequences of bytes suitable for storage and transmission. While Unicode defines the character inventory and assigns code points, encoding transforms those code points into binary data. The three main Unicode encoding forms — UTF-8, UTF-16, and UTF-32 — make different trade-offs between storage efficiency, processing simplicity, and compatibility with existing systems.
UTF-8 dominates the web (over 98% of web pages) because it uses only one byte for ASCII characters (U+0000–U+007F), making it compact for English and programming language text while still supporting the full Unicode code space of roughly 1.1 million code points. Non-ASCII characters take 2–4 bytes, with the byte count determined by the code point's range: 2 bytes for U+0080–U+07FF, 3 bytes for U+0800–U+FFFF, and 4 bytes for U+10000–U+10FFFF. UTF-16 uses 2 bytes for Basic Multilingual Plane characters and 4 bytes (a surrogate pair) for supplementary characters, which makes it more compact than UTF-8 for texts heavy in CJK characters (2 bytes instead of 3 for most of them). UTF-32 uses a fixed 4 bytes per code point regardless, which simplifies random access by code point at the cost of up to 4x the storage of UTF-8 for ASCII-heavy text.
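The size rules described above can be written down directly as range checks. The sketch below uses a hypothetical helper, `encoded_sizes`, and verifies its answers against Python's built-in encoders; it is meant as an illustration of the ranges, not as a replacement for a real encoder.

```python
def encoded_sizes(code_point: int) -> tuple:
    """Return (utf8_bytes, utf16_bytes, utf32_bytes) for one code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if code_point < 0x80:
        utf8 = 1          # ASCII
    elif code_point < 0x800:
        utf8 = 2
    elif code_point < 0x10000:
        utf8 = 3          # rest of the Basic Multilingual Plane
    else:
        utf8 = 4          # supplementary planes
    # UTF-16: one 2-byte code unit in the BMP, a surrogate pair above it
    utf16 = 2 if code_point < 0x10000 else 4
    utf32 = 4             # always one 4-byte code unit
    return utf8, utf16, utf32


for ch in "A", "é", "한", "😀":
    sizes = encoded_sizes(ord(ch))
    # Cross-check against the standard library encoders (no BOM variants)
    assert sizes == (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )
    print("U+{:04X}".format(ord(ch)), sizes)
```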
Visualizing encodings at the bit and byte level demystifies the machinery behind internationalized software. Why does an emoji that takes 4 bytes in UTF-8 occupy 2 JavaScript string indices? Why does a CJK character that takes 3 bytes in UTF-8 fit in a single 2-byte UTF-16 code unit? Why does UTF-8's self-synchronizing property enable fast error recovery? These concepts are essential for engineers building text editors, database schemas, network protocols, and any system where strings cross language or platform boundaries. Encoding awareness prevents the class of bugs that arise from conflating bytes, code units, and code points.
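The difference between bytes, code units, and code points can be made concrete with a few lines of Python. This is a minimal sketch with hypothetical helper names (`surrogate_pair`, `counts`). It assumes that a JavaScript string length equals the number of UTF-16 code units, which is how the emoji ends up occupying 2 indices.

```python
def surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def counts(text: str) -> dict:
    """Count the same text in three different units."""
    return {
        "code_points": len(text),  # Python str is indexed by code point
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


high, low = surrogate_pair(ord("😀"))
print("U+1F600 -> {:04X} {:04X}".format(high, low))
print(counts("😀"))
print(counts("한"))
```

The emoji is 1 code point, 2 UTF-16 code units, and 4 UTF-8 bytes, while "한" is 1 code point, 1 code unit, and 3 bytes. Any API that mixes these units up will miscount or split characters.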