UTF-8: A variable-length Unicode encoding that uses 1–4 bytes per character. It is the dominant encoding on the web (98%+ of websites) and is fully backward compatible with ASCII.

Punycode: An ASCII-compatible encoding for Unicode domain names that converts internationalized labels into ASCII strings with the xn-- prefix. münchen.de → xn--mnchen-3ya.de.

Web & HTML

Percent-encoding (URL encoding)

Encodes non-ASCII and special characters in a URL by replacing each byte with %XX. The character is first encoded as UTF-8, then each byte is percent-encoded: é → %C3%A9.

2023-11-27 · Updated 2025-02-03

What Is URL Encoding?

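URL encoding, formally called percent-encoding, is the mechanism for representing arbitrary bytes in a URI. Any byte that is neither an "unreserved" ASCII character nor a reserved character acting as a delimiter is replaced with a percent sign followed by two hexadecimal digits representing the byte value: %XX. Uppercase hex digits are the recommended form, although decoders treat %c3 and %C3 as equivalent.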

The unreserved characters (which are never encoded) are: A–Z, a–z, 0–9, -, _, ., and ~.

Reserved characters like /, ?, #, &, = have special meaning in URLs and must be encoded when they appear as data rather than syntax.
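As a minimal sketch of the data-versus-syntax distinction, the snippet below encodes a value that happens to contain the query delimiters & and =. The variable names are illustrative only; quote() and urlencode() are the standard urllib.parse functions.

```python
from urllib.parse import quote, urlencode

# A value that happens to contain the query delimiters "&" and "=".
value = "a&b=c"

# safe="" asks quote() to escape every reserved character, including "/".
encoded_value = quote(value, safe="")
print(encoded_value)  # a%26b%3Dc

# urlencode() applies the same escaping to each key and value,
# then joins the pairs with the real "&" and "=" delimiters.
query = urlencode({"q": value})
print(query)  # q=a%26b%3Dc
```

Without the escaping, a server would read a&b=c as two separate parameters rather than one value.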

Encoding Unicode Characters

Unicode characters above U+007F are first encoded as UTF-8 bytes, and each byte is then percent-encoded. The character ✓ (U+2713, CHECK MARK) has the UTF-8 byte sequence E2 9C 93, so it becomes %E2%9C%93 in a URL.

from urllib.parse import quote, quote_plus, unquote, urlencode

# Encoding
quote("café")         # "caf%C3%A9"
quote("✓ done")       # "%E2%9C%93%20done"
quote("한국어")        # "%ED%95%9C%EA%B5%AD%EC%96%B4"
quote("😀")           # "%F0%9F%98%80"

# Encoding query parameters (spaces → +)
quote_plus("hello world")  # "hello+world"
urlencode({"q": "unicode ✓"})  # "q=unicode+%E2%9C%93"

# Decoding
unquote("%E2%9C%93")   # "✓"
unquote("%ED%95%9C")   # "한"
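The two-step process described in this section (UTF-8 first, then %XX for each byte) can also be written out by hand. This is only a sketch to show the mechanism: percent_encode is a hypothetical helper, not a library function, and in practice urllib.parse.quote() should be preferred.

```python
from urllib.parse import quote

# Unreserved characters: letters, digits, "-", "_", "." and "~".
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then escape every byte that is not unreserved."""
    out = []
    for byte in text.encode("utf-8"):   # step 1: characters -> UTF-8 bytes
        if byte in UNRESERVED:
            out.append(chr(byte))       # unreserved bytes pass through unchanged
        else:
            out.append(f"%{byte:02X}")  # step 2: byte -> %XX (uppercase hex)
    return "".join(out)

print(percent_encode("✓"))     # %E2%9C%93
print(percent_encode("café"))  # caf%C3%A9

# Matches the standard library when no reserved characters are marked safe.
print(percent_encode("✓ done") == quote("✓ done", safe=""))  # True
```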

JavaScript URL Encoding

JavaScript provides two levels of encoding:

// encodeURIComponent: encodes everything except A–Z a–z 0–9 - _ . ~ ! * ' ( )
// Use for individual values: query parameters, path segments, fragment identifiers
encodeURIComponent("hello world");  // "hello%20world"
encodeURIComponent("café");         // "caf%C3%A9"
encodeURIComponent("a/b");          // "a%2Fb"

// encodeURI — preserves URL structure characters (/:@?#&=+$,;)
// Use for a complete URL
encodeURI("https://example.com/path?q=café");
// "https://example.com/path?q=caf%C3%A9"

// Decoding
decodeURIComponent("caf%C3%A9");    // "café"

Practical Examples

Search query: /search?q=Unicode%20%E2%9C%93
File path: /files/caf%C3%A9.pdf
Hash fragment: /docs#section-%EC%84%B9%EC%85%98
Form field: name=%E6%9D%B1%E4%BA%AC&lang=ja
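Going the other way, the examples above can be decoded with the standard library. This is a small sketch: the input strings are copied from the list, and the variable names are illustrative.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Search query: split off the query string, then decode its parameters.
parts = urlsplit("/search?q=Unicode%20%E2%9C%93")
search = parse_qs(parts.query)
print(search)  # {'q': ['Unicode ✓']}

# File path and hash fragment: unquote() reverses the %XX escapes as UTF-8.
file_path = unquote("/files/caf%C3%A9.pdf")
fragment = unquote("section-%EC%84%B9%EC%85%98")
print(file_path)  # /files/café.pdf
print(fragment)   # section-섹션

# Form field: parse_qs() handles several key=value pairs at once.
form = parse_qs("name=%E6%9D%B1%E4%BA%AC&lang=ja")
print(form)  # {'name': ['東京'], 'lang': ['ja']}
```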

IDN and Path Encoding

Domain names use a different system (Punycode) rather than percent-encoding. URL paths and query strings use percent-encoding. A URL combining both looks like:

https://xn--e1afmapc.com/path/%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80
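A rough sketch of building such a URL in Python, using the well-known münchen.de domain as the example host. It assumes Python's built-in "idna" codec, which implements the older IDNA 2003 rules; applications that need current IDNA behavior may want a dedicated library instead.

```python
from urllib.parse import quote

# Host: each non-ASCII label is converted to Punycode with the xn-- prefix.
host = "münchen.de".encode("idna").decode("ascii")

# Path: UTF-8 bytes are percent-encoded; "/" is kept as a separator by default.
path = quote("/path/café")

url = f"https://{host}{path}"
print(url)  # https://xn--mnchen-3ya.de/path/caf%C3%A9
```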

Common Mistakes

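Wrong encoding: Using the system's default or a legacy encoding (such as Latin-1) instead of UTF-8. RFC 3986 recommends UTF-8 for textual data in new URI schemes, and RFC 3987 (IRIs) requires it when mapping non-ASCII characters to URIs.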

Double encoding: Encoding an already-encoded string. %25 is % encoded; encoding again gives %2525.

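Encoding slashes in paths: Percent-encoding the / separators of a path destroys its structure. In Python, use quote(path, safe='/') (the default) to preserve path separators, and pass safe='' only when the slash is data, such as inside a single path segment or a query value.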

# safe parameter preserves specified characters
quote("/path/to/file", safe="/")  # "/path/to/file"
quote("/path/to/file", safe="")   # "%2Fpath%2Fto%2Ffile"
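The first two mistakes in this section are easy to reproduce. The sketch below shows what a legacy encoding and an accidental second pass look like in practice; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# Wrong encoding: the same character yields different bytes per encoding.
utf8_form = quote("é")                        # UTF-8 is quote()'s default
latin1_form = quote("é", encoding="latin-1")  # legacy single-byte form
print(utf8_form)    # %C3%A9
print(latin1_form)  # %E9

# A UTF-8 decoder cannot recover the Latin-1 form; by default unquote()
# substitutes U+FFFD (the replacement character) for the invalid byte.
print(unquote(latin1_form) == "\ufffd")  # True

# Double encoding: the "%" of each escape is itself escaped as %25.
once = quote("café")
twice = quote(once)
print(once)   # caf%C3%A9
print(twice)  # caf%25C3%25A9

# One round of decoding only peels off one layer.
print(unquote(twice) == once)  # True
```

A practical rule is to encode raw values exactly once, at the point where the URL is assembled, and to never re-encode a string that has already been through quote().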

Quick Facts

Property	Value
Format	`%XX` where XX is a hex byte value
Unreserved chars (never encoded)	`A–Z a–z 0–9 - _ . ~`
Unicode encoding	UTF-8 bytes, then percent-encode each byte
Space encoding	`%20` (RFC) or `+` (application/x-www-form-urlencoded)
Python function	`urllib.parse.quote()`
JS function	`encodeURIComponent()`
Max length	Technically unlimited; browsers typically support 2,000+ chars

Related Terms

UTF-8 Punycode
