
Percent-encoding (URL encoding)

Encodes non-ASCII and special characters in a URL by replacing each byte with %XX. The character is first encoded as UTF-8, then each resulting byte is percent-encoded: é → %C3%A9.


What Is URL Encoding?

URL encoding, formally called percent-encoding, is the mechanism for representing arbitrary bytes in a URI. A byte that is neither an "unreserved" ASCII character nor a reserved character acting as URL syntax is replaced with a percent sign followed by two hexadecimal digits representing the byte value: %XX. RFC 3986 recommends uppercase hex digits, although lowercase digits decode to the same byte.

The unreserved characters (which are never encoded) are: A–Z, a–z, 0–9, -, _, ., and ~.

Reserved characters like /, ?, #, &, = have special meaning in URLs and must be encoded when they appear as data rather than syntax.
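
The rule above can be sketched by hand in Python. This is a minimal illustration rather than a replacement for the standard library: the helper name percent_encode is hypothetical, and it treats every reserved character as data, so it suits a single value and not a complete URL.

```python
# Unreserved characters per RFC 3986; these bytes are copied through unchanged.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every byte of `text` that is not unreserved.

    Hypothetical helper for illustration only.
    """
    out = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))
        else:
            # Two uppercase hex digits for the byte value.
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("a/b?c=d"))    # a%2Fb%3Fc%3Dd
print(percent_encode("100% sure"))  # 100%25%20sure
```

In real code, urllib.parse.quote(text, safe="") should give the same result on Python 3.7 and later.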

Encoding Unicode Characters

Unicode characters above U+007F are first encoded as UTF-8 bytes, and each byte is then percent-encoded. The character ✓ (U+2713, CHECK MARK) has the UTF-8 byte sequence E2 9C 93, so it becomes %E2%9C%93 in a URL.

from urllib.parse import quote, quote_plus, unquote, urlencode

# Encoding
quote("café")         # "caf%C3%A9"
quote("✓ done")       # "%E2%9C%93%20done"
quote("한국어")        # "%ED%95%9C%EA%B5%AD%EC%96%B4"
quote("😀")           # "%F0%9F%98%80"

# Encoding query parameters (spaces → +)
quote_plus("hello world")  # "hello+world"
urlencode({"q": "unicode ✓"})  # "q=unicode+%E2%9C%93"

# Decoding
unquote("%E2%9C%93")   # "✓"
unquote("%ED%95%9C")   # "한"
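
Decoding has the same split between %20 and + as encoding does. The short sketch below shows how the standard library functions differ; the sample strings are made up for illustration.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# unquote() only reverses %XX escapes; a literal "+" stays a plus sign.
print(unquote("hello+world%21"))       # hello+world!

# unquote_plus() also maps "+" back to a space, mirroring quote_plus().
print(unquote_plus("hello+world%21"))  # hello world!

# parse_qs() applies form-style decoding to a whole query string.
params = parse_qs("q=unicode+%E2%9C%93&lang=en")
print(params)  # {'q': ['unicode ✓'], 'lang': ['en']}
```

Using unquote() on form data leaves stray plus signs in the result, so it helps to pick the decoder that matches how the string was encoded.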

JavaScript URL Encoding

JavaScript provides two levels of encoding:

// encodeURIComponent: encodes everything except A–Z a–z 0–9 - _ . ~ ! * ' ( )
// Use for values in query strings, path segments, fragment identifiers
encodeURIComponent("hello world");  // "hello%20world"
encodeURIComponent("café");         // "caf%C3%A9"
encodeURIComponent("a/b");          // "a%2Fb"

// encodeURI — preserves URL structure characters (/:@?#&=+$,;)
// Use for a complete URL
encodeURI("https://example.com/path?q=café");
// "https://example.com/path?q=caf%C3%A9"

// Decoding
decodeURIComponent("caf%C3%A9");    // "café"
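
For comparison, the two JavaScript levels can be approximated in Python by adjusting the safe argument of quote(). This is a sketch under assumptions: the function names encode_uri_component and encode_uri are hypothetical, Python 3.7 or later is assumed (earlier versions encode ~), and edge cases such as lone surrogates are not handled the way JavaScript handles them.

```python
from urllib.parse import quote

# Characters the JavaScript functions leave alone beyond letters and digits.
# quote() already keeps "-", "_", "." and "~", so only the extras are listed.
COMPONENT_SAFE = "!*'()"
URI_SAFE = COMPONENT_SAFE + ";/?:@&=+$,#"


def encode_uri_component(value: str) -> str:
    """Approximate JavaScript's encodeURIComponent (hypothetical helper)."""
    return quote(value, safe=COMPONENT_SAFE)


def encode_uri(url: str) -> str:
    """Approximate JavaScript's encodeURI (hypothetical helper)."""
    return quote(url, safe=URI_SAFE)


print(encode_uri_component("hello world"))  # hello%20world
print(encode_uri_component("a/b"))          # a%2Fb
print(encode_uri("https://example.com/path?q=café"))
# https://example.com/path?q=caf%C3%A9
```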

Practical Examples

Search query: /search?q=Unicode%20%E2%9C%93
File path: /files/caf%C3%A9.pdf
Hash fragment: /docs#section-%EC%84%B9%EC%85%98
Form field: name=%E6%9D%B1%E4%BA%AC&lang=ja
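
The four examples above can be reproduced with urllib.parse. This is a sketch: the variable names are illustrative, and the unencoded inputs are inferred by decoding the strings shown.

```python
from urllib.parse import quote, urlencode

# Search query: quote_via=quote emits %20 for spaces instead of "+".
search = "/search?" + urlencode({"q": "Unicode ✓"}, quote_via=quote)

# File path: "/" is kept by default, so only the bytes of "é" are encoded.
file_path = quote("/files/café.pdf")

# Hash fragment: encode the fragment text, then attach it after "#".
fragment = "/docs#" + quote("section-섹션")

# Form field: urlencode() joins key=value pairs with "&".
form = urlencode({"name": "東京", "lang": "ja"})

print(search)     # /search?q=Unicode%20%E2%9C%93
print(file_path)  # /files/caf%C3%A9.pdf
print(fragment)   # /docs#section-%EC%84%B9%EC%85%98
print(form)       # name=%E6%9D%B1%E4%BA%AC&lang=ja
```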

IDN and Path Encoding

Domain names use a different system (Punycode) rather than percent-encoding. URL paths and query strings use percent-encoding. A URL combining both looks like:

https://xn--e1afmapc.com/path/%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80

Common Mistakes

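Wrong encoding: Using the system's default encoding instead of UTF-8. RFC 3986 recommends UTF-8 for textual data in new URI schemes, and RFC 3987 (IRIs) and the WHATWG URL Standard both rely on it, so UTF-8 is what servers and browsers expect.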

Double encoding: Encoding an already-encoded string. A literal % encodes to %25; encoding that result again gives %2525, which decodes back to %25 rather than %.

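Encoding slashes in paths: Python's quote() keeps / unencoded by default (safe='/'), which preserves path separators. Pass safe='' only when a slash is data inside a single path segment or query value.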

# safe parameter preserves specified characters
quote("/path/to/file", safe="/")  # "/path/to/file"
quote("/path/to/file", safe="")   # "%2Fpath%2Fto%2Ffile"
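
The first two mistakes are easy to reproduce. The sketch below uses made-up sample strings to show a Latin-1 encoded value failing to round-trip and a percent sign being encoded twice.

```python
from urllib.parse import quote, unquote

# Wrong encoding: Latin-1 yields one byte for "é" instead of UTF-8's two.
latin1 = quote("café", encoding="latin-1")
print(latin1)  # caf%E9

# A UTF-8 decoder cannot interpret the lone E9 byte, so unquote() falls
# back to U+FFFD (the replacement character).
garbled = unquote(latin1)
print(garbled == "caf\ufffd")  # True

# Double encoding: the "%" introduced by the first pass is encoded again.
once = quote("100%")
twice = quote(once)
print(once)   # 100%25
print(twice)  # 100%2525

# One decode of the double-encoded value returns the encoded form, not the data.
print(unquote(twice))           # 100%25
print(unquote(unquote(twice)))  # 100%
```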

Quick Facts

Format: %XX, where XX is a hex byte value
Unreserved chars (never encoded): A–Z a–z 0–9 - _ . ~
Unicode encoding: UTF-8 bytes, then percent-encode each byte
Space encoding: %20 (RFC 3986) or + (application/x-www-form-urlencoded)
Python function: urllib.parse.quote()
JS function: encodeURIComponent()
Max length: Technically unlimited; browsers typically support 2,000+ chars
