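Web & HTML

Content-Type Charset

An HTTP header parameter that declares the character encoding of a response (Content-Type: text/html; charset=utf-8). It takes precedence over encoding declarations inside the document, such as a meta tag.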


What Is the Content-Type Charset Parameter?

The Content-Type HTTP response header tells the browser two things: the media type of the response body (like text/html) and — optionally — the character encoding used to encode it. The encoding is specified via the charset parameter:

Content-Type: text/html; charset=UTF-8
Content-Type: text/plain; charset=ISO-8859-1
Content-Type: application/json; charset=UTF-8

Without this parameter, browsers must guess the encoding using heuristics, byte-order marks, or HTML meta tags — a process that can go wrong and produce garbled text (mojibake).
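For a sense of what a client does with the header, here is a minimal sketch that splits a Content-Type value into its media type and charset using Python's standard `email.message.Message` class. The helper name `parse_content_type` is hypothetical, and a real HTTP client may use its own parser.

```python
from email.message import Message

def parse_content_type(value):
    """Hypothetical helper: return (media_type, charset) for a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lowercases the value and returns None if charset is absent
    return msg.get_content_type(), msg.get_content_charset()

print(parse_content_type("text/html; charset=UTF-8"))  # ('text/html', 'utf-8')
print(parse_content_type("text/html"))                  # ('text/html', None)

# With a charset available, decoding the body is unambiguous.
media_type, charset = parse_content_type("text/plain; charset=utf-8")
body = "Hello, 世界".encode("utf-8")
print(body.decode(charset))  # Hello, 世界
```

When the second call returns `None`, the client has nothing authoritative to go on and falls back to guessing.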

Why It Matters for Unicode

Unicode text is a sequence of abstract code points. To transmit it over a network, those code points must be encoded as bytes. UTF-8 is by far the most common encoding: it can represent every Unicode code point and is backward-compatible with ASCII. If the server sends UTF-8 bytes but the browser interprets them as ISO-8859-1, each byte of a multi-byte sequence is read as a separate character.

Example: the string "café" encoded in UTF-8 is the five bytes 63 61 66 C3 A9. Interpreted as ISO-8859-1:
- C3 → Ã
- A9 → ©
- Result displayed: cafÃ©, classic mojibake.
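The same mix-up can be reproduced in a few lines of Python: encode with one codec, decode with another. This sketch uses only built-in `str.encode` and `bytes.decode`.

```python
text = "café"
data = text.encode("utf-8")
# ASCII letters stay one byte each; é becomes the two bytes C3 A9
print(data.hex(" "))  # 63 61 66 c3 a9

# Decoding with the wrong codec: every byte becomes its own Latin-1 character
wrong = data.decode("iso-8859-1")
print(wrong)  # cafÃ©

# As long as no bytes were lost, reversing the wrong step recovers the text
print(wrong.encode("iso-8859-1").decode("utf-8"))  # café
```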

Header vs. Meta Tag

For HTML, the encoding can be declared in two places:

<!-- HTML meta tag (in-document declaration) -->
<meta charset="UTF-8">
<!-- or legacy form: -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
# HTTP header (sent by server)
Content-Type: text/html; charset=UTF-8

The HTTP header takes precedence over the meta tag when both are present. The meta tag is a fallback for situations where the HTTP header is absent (e.g., opening a local HTML file).
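The precedence order can be sketched as a small function. This is a simplified illustration, not the full WHATWG encoding-sniffing algorithm: the name `pick_encoding` is hypothetical, the regex is a crude stand-in for the browser's meta prescan, and the `windows-1252` fallback is an assumed placeholder for locale-dependent browser heuristics. It also checks for a UTF-8 BOM first, because under the WHATWG HTML standard a BOM outranks even the HTTP header.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"
# Crude match for <meta charset="..."> and the http-equiv form (assumption, not a real prescan)
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)

def pick_encoding(header_charset, body, fallback="windows-1252"):
    """Hypothetical helper: choose a decoder for an HTML body.

    header_charset: charset parameter from Content-Type, or None if absent.
    fallback: assumed stand-in for browser heuristics.
    """
    if body.startswith(UTF8_BOM):  # a BOM outranks everything else
        return "utf-8"
    if header_charset:  # the HTTP header beats the meta tag
        return header_charset.lower()
    # HTML requires the meta declaration to fit in the first 1024 bytes
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return fallback

print(pick_encoding("UTF-8", b'<meta charset="ISO-8859-1">'))  # utf-8
print(pick_encoding(None, b'<meta charset="ISO-8859-1">'))     # iso-8859-1
```

The second call shows the meta tag acting as the fallback described above, for example when a local file is opened with no HTTP header at all.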

Setting the Charset in Django

# settings.py: "utf-8" is already Django's default; set DEFAULT_CHARSET only to change it
DEFAULT_CHARSET = "utf-8"

# Response header automatically becomes:
# Content-Type: text/html; charset=utf-8

# Custom response
from django.http import HttpResponse
response = HttpResponse("Hello, 世界", content_type="text/plain; charset=utf-8")

Setting the Charset in Other Environments

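# Flask
from flask import Flask, Response
app = Flask(__name__)

@app.route("/")
def index():
    # content_type takes the full header value; mimetype="text/plain" alone also
    # works, because Werkzeug appends "; charset=utf-8" to text/* types itself
    return Response("Hello, 世界", content_type="text/plain; charset=utf-8")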
// Node.js / Express
const app = require("express")();
app.get("/", (req, res) => {
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.send("<p>Hello, 世界</p>");
});
# Nginx — add charset to responses
charset utf-8;
charset_types text/html text/plain text/css application/javascript;
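Without a framework, the application sets the header itself. The sketch below is a plain WSGI app using only the standard library; the function name `app` is illustrative. The key point is that the body is encoded with the same codec the header names, and that Content-Length counts bytes, not characters.

```python
def app(environ, start_response):
    # Encode with exactly the codec declared in the header
    body = "Hello, 世界".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        # 13 bytes: 7 for "Hello, " plus 3 for each CJK character
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves it on port 8000.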

JSON and charset

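RFC 7159 allowed JSON to be encoded in UTF-8, UTF-16, or UTF-32; its successor, RFC 8259, requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem. The application/json media type defines no charset parameter, so adding one is redundant but harmless; compliant recipients ignore it: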

Content-Type: application/json; charset=UTF-8

Modern HTTP APIs typically omit the charset for JSON since UTF-8 is assumed.
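Python's standard `json` module illustrates why the parameter is unnecessary: `json.loads` accepts raw bytes and detects UTF-8, UTF-16, or UTF-32 on its own. A minimal sketch:

```python
import json

payload = {"greeting": "Hello, 世界"}

# ensure_ascii=False keeps 世界 as real characters instead of \u escapes,
# so the bytes on the wire genuinely depend on the encoding (UTF-8 here)
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# No charset is passed anywhere: json.loads inspects the bytes itself
print(json.loads(body))  # {'greeting': 'Hello, 世界'}

# UTF-16 input (with BOM) is detected as well
print(json.loads(json.dumps(payload).encode("utf-16")) == payload)  # True
```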

BOM (Byte Order Mark)

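Some tools prepend a UTF-8 BOM (EF BB BF) to UTF-8 files. Browsers treat it as a UTF-8 signal; under the WHATWG HTML standard a BOM even overrides the charset in the HTTP header. The BOM itself, however, decodes to an invisible character (U+FEFF) that can cause problems in JavaScript tooling and JSON parsing, where many parsers reject it. Prefer the charset header over relying on BOMs.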
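A short demonstration with Python's standard library shows the JSON problem: decoding with plain `utf-8` keeps the BOM as U+FEFF, which `json.loads` then refuses, while the `utf-8-sig` codec strips it.

```python
import json

data = b'\xef\xbb\xbf{"ok": true}'  # UTF-8 BOM followed by a JSON document

text = data.decode("utf-8")  # plain utf-8 keeps the BOM as U+FEFF
print(repr(text[0]))  # '\ufeff'

try:
    json.loads(text)
except json.JSONDecodeError as exc:
    print("failed:", exc.msg)  # failed: Unexpected UTF-8 BOM (decode using utf-8-sig)

# utf-8-sig removes a leading BOM if present
print(json.loads(data.decode("utf-8-sig")))  # {'ok': True}
```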

Quick Facts

| Property | Value |
| --- | --- |
| Header format | Content-Type: text/html; charset=UTF-8 |
| Priority vs. meta tag | HTTP header wins when both are present |
| Recommended charset | UTF-8 for all new content |
| Default if omitted | BOM, meta tag, or browser heuristics (unreliable) |
| JSON standard | UTF-8 (RFC 8259); charset parameter optional and ignored |
| Django default | utf-8 via the DEFAULT_CHARSET setting |
| Case sensitivity | Parameter name and charset value are both case-insensitive (UTF-8 and utf-8 are equivalent) |
