Percent-Encoding (URL Encoding)
Encoding of non-ASCII and reserved characters in URLs by replacing each byte with %XX. The character is first encoded as UTF-8, and each resulting byte is then percent-encoded: é → %C3%A9.
What Is URL Encoding?
URL encoding, formally called percent-encoding, is the mechanism for representing arbitrary bytes in a URI. A byte that is neither an "unreserved" ASCII character nor a reserved character acting as URL syntax is replaced with a percent sign followed by two hexadecimal digits giving the byte value: %XX. RFC 3986 recommends uppercase hex digits, although %c3 and %C3 are equivalent.
The unreserved characters (which are never encoded) are: A–Z, a–z, 0–9, -, _, ., and ~.
Reserved characters like /, ?, #, &, = have special meaning in URLs and must be encoded when they appear as data rather than syntax.
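The rule above can be sketched by hand in a few lines of Python. This is only an illustration of the mechanism, not how any library implements it: the names UNRESERVED and percent_encode are hypothetical, and the sketch assumes the strictest case, where every byte outside the unreserved set is data and must be escaped.

```python
# Hypothetical helper for illustration; real code should use urllib.parse.quote.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Escape every UTF-8 byte of `text` that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)               # unreserved: copy through unchanged
        else:
            out.append(f"%{byte:02X}")   # anything else: % plus two uppercase hex digits
    return "".join(out)


print(percent_encode("a-b_c.d~e"))  # a-b_c.d~e
print(percent_encode("a b&c=d"))    # a%20b%26c%3Dd
print(percent_encode("é"))          # %C3%A9
```

Treating reserved characters as data everywhere is the right choice for a single query value or path segment, but it would also escape the / and ? that give a full URL its structure.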
Encoding Unicode Characters
Unicode characters above U+007F are first encoded as UTF-8 bytes, and each byte is then percent-encoded. The character ✓ (U+2713, CHECK MARK) has the UTF-8 byte sequence E2 9C 93, so it becomes %E2%9C%93 in a URL. In Python, the urllib.parse functions perform both steps:
from urllib.parse import quote, quote_plus, unquote, urlencode
# Encoding
quote("café") # "caf%C3%A9"
quote("✓ done") # "%E2%9C%93%20done"
quote("한국어") # "%ED%95%9C%EA%B5%AD%EC%96%B4"
quote("😀") # "%F0%9F%98%80"
# Encoding query parameters (spaces → +)
quote_plus("hello world") # "hello+world"
urlencode({"q": "unicode ✓"}) # "q=unicode+%E2%9C%93"
# Decoding
unquote("%E2%9C%93") # "✓"
unquote("%ED%95%9C") # "한"
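Decoding has the same split between %20 and + as encoding does. The sketch below, which uses only standard-library functions from urllib.parse, shows how the form-style decoders differ from plain unquote(); the sample query string is made up for illustration.

```python
from urllib.parse import parse_qs, quote, unquote, unquote_plus

# unquote() leaves "+" alone; unquote_plus() treats it as a space,
# matching application/x-www-form-urlencoded data.
plain = unquote("hello+world")
form = unquote_plus("hello+world")
print(plain)  # hello+world
print(form)   # hello world

# parse_qs() splits a query string and decodes each value form-style.
params = parse_qs("q=unicode+%E2%9C%93&lang=en")
print(params)  # {'q': ['unicode ✓'], 'lang': ['en']}

# Round trip: unquote() undoes quote().
round_trip = unquote(quote("한국어"))
print(round_trip)  # 한국어
```

Using unquote() on form data leaves literal plus signs in the result, so the decoder should match whichever encoder produced the string.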
JavaScript URL Encoding
JavaScript provides two levels of encoding:
// encodeURIComponent: encodes everything except A–Z a–z 0–9 - _ . ~ ! * ' ( )
// Use for values in query strings, path segments, fragment identifiers
encodeURIComponent("hello world"); // "hello%20world"
encodeURIComponent("café"); // "caf%C3%A9"
encodeURIComponent("a/b"); // "a%2Fb"
// encodeURI: additionally preserves URL structure characters (; / ? : @ & = + $ , #)
// Use for a complete URL, not for individual values
encodeURI("https://example.com/path?q=café");
// "https://example.com/path?q=caf%C3%A9"
// Decoding
decodeURIComponent("caf%C3%A9"); // "café"
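For comparison, the two JavaScript levels can be approximated in Python with the safe parameter of urllib.parse.quote. This is a rough sketch rather than an exact port: encode_uri_component and encode_uri are hypothetical names, and edge cases such as lone surrogates are handled differently by the two languages.

```python
from urllib.parse import quote

# Characters the JavaScript functions leave unescaped, beyond the
# letters, digits, and "-_.~" that quote() always keeps (Python 3.7+).
_COMPONENT_SAFE = "!*'()"
_URI_SAFE = _COMPONENT_SAFE + ";/?:@&=+$,#"


def encode_uri_component(value: str) -> str:
    """Approximate encodeURIComponent: escape URL syntax characters too."""
    return quote(value, safe=_COMPONENT_SAFE)


def encode_uri(url: str) -> str:
    """Approximate encodeURI: keep the characters that structure a URL."""
    return quote(url, safe=_URI_SAFE)


print(encode_uri_component("hello world"))  # hello%20world
print(encode_uri_component("a/b"))          # a%2Fb
print(encode_uri("https://example.com/path?q=café"))
# https://example.com/path?q=caf%C3%A9
```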
Practical Examples
Search query: /search?q=Unicode%20%E2%9C%93
File path: /files/caf%C3%A9.pdf
Hash fragment: /docs#section-%EC%84%B9%EC%85%98
Form field: name=%E6%9D%B1%E4%BA%AC&lang=ja
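The four examples above can be reproduced with urllib.parse, which is one way to check an encoding by hand. The input strings are inferred from the encoded forms shown, and passing quote_via=quote to urlencode() is what selects %20 instead of + for the space in the search query.

```python
from urllib.parse import quote, urlencode

# Search query with %20 for the space (quote_via=quote overrides the "+" default).
search = "/search?" + urlencode({"q": "Unicode ✓"}, quote_via=quote)

# File path: quote() keeps "/" by default, so the separators survive.
file_path = quote("/files/café.pdf")

# Fragment: only the non-ASCII part needs escaping.
fragment = "/docs#section-" + quote("섹션")

# Form body: urlencode() joins key=value pairs with "&".
form_body = urlencode({"name": "東京", "lang": "ja"})

print(search)     # /search?q=Unicode%20%E2%9C%93
print(file_path)  # /files/caf%C3%A9.pdf
print(fragment)   # /docs#section-%EC%84%B9%EC%85%98
print(form_body)  # name=%E6%9D%B1%E4%BA%AC&lang=ja
```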
IDN and Path Encoding
Domain names use a different system (Punycode) rather than percent-encoding, while URL paths and query strings use percent-encoding. A URL combining both, here with the Russian word пример ("example") as both the host label and the path segment, looks like:
https://xn--e1afmkfd.com/path/%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80
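A short sketch of how the two systems combine, assuming the sample hostname пример.com chosen here for illustration. Python's built-in idna codec implements the older IDNA 2003 rules, so production code may prefer a dedicated IDNA library; the codec is used only to show that the host and the path go through different encoders.

```python
from urllib.parse import quote

host = "пример.com"     # sample hostname, chosen for illustration
path = "/path/пример"

# Host: each non-ASCII label becomes an "xn--" Punycode label.
ascii_host = host.encode("idna").decode("ascii")

# Path: UTF-8 bytes, percent-encoded, with "/" kept as the separator.
url = f"https://{ascii_host}{quote(path)}"

print(ascii_host)  # xn--e1afmkfd.com
print(url)         # https://xn--e1afmkfd.com/path/%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80
```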
Common Mistakes
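Wrong encoding: Using the system's default encoding instead of UTF-8. RFC 3986 recommends UTF-8 for new URI schemes, and RFC 3987 (IRIs) and the WHATWG URL Standard both rely on it.
Double encoding: Encoding an already-encoded string. A literal % encodes to %25; encoding that result again gives %2525, which no longer decodes to the original in one pass.
Encoding slashes in paths: Escaping the / separators turns a multi-segment path into a single segment. Python's quote() keeps / by default (safe='/'); pass safe='' only when a slash is data inside one segment.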
# safe parameter preserves specified characters
quote("/path/to/file", safe="/") # "/path/to/file"
quote("/path/to/file", safe="") # "%2Fpath%2Fto%2Ffile"
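The first two mistakes are easy to reproduce. The sketch below uses only urllib.parse; the sample strings are arbitrary, and the variable names are for illustration.

```python
from urllib.parse import quote, unquote

# Double encoding: the "%" of an existing escape is itself escaped.
once = quote("100%")
twice = quote(once)
print(once)   # 100%25
print(twice)  # 100%2525

# Wrong charset: Latin-1 yields a single byte that is not valid UTF-8.
latin1 = quote("café", encoding="latin-1")
print(latin1)  # caf%E9

# unquote() decodes as UTF-8 and substitutes U+FFFD for the invalid byte.
decoded = unquote(latin1)
print(decoded == "caf\ufffd")  # True
```

A string that may already be encoded is best decoded once and re-encoded, rather than passed through quote() a second time.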
Quick Facts
| Property | Value |
|---|---|
| Format | %XX where XX is a hex byte value |
| Unreserved chars (never encoded) | A–Z a–z 0–9 - _ . ~ |
| Unicode encoding | UTF-8 bytes, then percent-encode each byte |
| Space encoding | %20 (RFC) or + (application/x-www-form-urlencoded) |
| Python function | urllib.parse.quote() |
| JS function | encodeURIComponent() |
| Max length | Technically unlimited; browsers typically support 2,000+ chars |