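Content-Type Charset
An HTTP header parameter that declares the character encoding of a response (Content-Type: text/html; charset=utf-8). It takes precedence over encoding declarations inside the document.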
What Is the Content-Type Charset Parameter?
The Content-Type HTTP response header tells the browser two things: the media type of the response body (like text/html) and — optionally — the character encoding used to encode it. The encoding is specified via the charset parameter:
```http
Content-Type: text/html; charset=UTF-8
Content-Type: text/plain; charset=ISO-8859-1
Content-Type: application/json; charset=UTF-8
```
Without this parameter, the browser has to work out the encoding from other signals: a byte order mark, an HTML meta tag, or, failing those, heuristic guessing. That last step can go wrong and produce garbled text (mojibake).
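Reading the parameter back out of a header is a small parsing job. In Python, the headers object on urllib responses (http.client.HTTPMessage) is a subclass of email.message.Message, so the same get_content_charset() method also works on a bare header value. The helper name charset_from_content_type is hypothetical; this is a minimal sketch, not a full HTTP header parser. It returns None when no charset is declared, which is exactly the case where a browser must fall back on other signals.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() unquotes the value and lowercases it,
    # because charset names are case-insensitive.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))            # utf-8
print(charset_from_content_type('text/plain; charset="ISO-8859-1"'))    # iso-8859-1
print(charset_from_content_type("application/json"))                    # None
```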
Why It Matters for Unicode
Unicode text is a sequence of abstract code points. To transmit it over a network, you must encode those code points as bytes. UTF-8 is by far the most common encoding: it can represent every Unicode code point, using one to four bytes per code point, and it is backwards-compatible with ASCII. If the server sends UTF-8 bytes but the browser interprets them as ISO-8859-1, every byte of a multi-byte sequence is read as a separate character.
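A quick Python check of the two properties claimed above (a sketch; the sample characters are arbitrary): each code point becomes one to four UTF-8 bytes, and pure-ASCII text produces exactly the same bytes in ASCII and UTF-8.

```python
samples = ["A", "é", "世", "😀"]

# Show each code point and the UTF-8 bytes it encodes to.
for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex(' ')}")
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+4E16 -> e4 b8 96
# U+1F600 -> f0 9f 98 80

# ASCII compatibility: ASCII-only text is byte-for-byte identical in UTF-8.
assert "Hello".encode("utf-8") == "Hello".encode("ascii")
```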
Example: The string "café" encoded in UTF-8 is 63 61 66 C3 A9. If interpreted as ISO-8859-1:
- C3 → Ã
- A9 → ©
- Result displayed: cafÃ© (classic mojibake).
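The example can be reproduced in Python. One caveat: browsers actually treat the label ISO-8859-1 as windows-1252 (per the WHATWG Encoding Standard), but for the bytes C3 and A9 both decode to the same characters. Because ISO-8859-1 maps every byte to exactly one code point, this particular kind of damage is reversible, as the last step shows.

```python
original = "caf\u00e9"  # "café" with precomposed U+00E9
utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex(" "))  # 63 61 66 c3 a9

# Decoding UTF-8 bytes with the wrong charset yields mojibake.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # cafÃ©

# Undo it: re-encode with the wrong charset to get the original bytes back,
# then decode them with the right one.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)  # café
```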
Header vs. Meta Tag
For HTML, the encoding can be declared in two places:
```html
<!-- HTML meta tag (in-document declaration) -->
<meta charset="UTF-8">
<!-- or legacy form: -->
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
```

```http
# HTTP header (sent by server)
Content-Type: text/html; charset=UTF-8
```
The HTTP header takes precedence over the meta tag when both are present. The meta tag is a fallback for situations where the HTTP header is absent (e.g., opening a local HTML file).
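The precedence order can be sketched in Python with a hypothetical sniff_encoding function. This is a simplified model, not the WHATWG encoding-sniffing algorithm: the meta check is a rough regular expression over the first 1024 bytes instead of the spec's prescan, and the final fallback (windows-1252 here) is an assumption, since real browsers choose it based on locale. It does keep one detail from the spec: a byte order mark, when present, outranks both the header and the meta tag.

```python
import re
from email.message import Message
from typing import Optional

# Byte order marks, checked first. The UTF-8 BOM is EF BB BF.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xff\xfe", "utf-16-le"),
    (b"\xfe\xff", "utf-16-be"),
]

# Rough stand-in for the HTML prescan: matches <meta charset="..."> and the
# legacy <meta http-equiv="Content-Type" content="...; charset=..."> form.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.IGNORECASE
)


def sniff_encoding(body: bytes, content_type: Optional[str],
                   fallback: str = "windows-1252") -> str:
    """Simplified precedence: BOM, then HTTP header, then meta tag, then fallback."""
    # 1. A byte order mark outranks everything else.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 2. The charset parameter of the Content-Type header.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            return charset
    # 3. A meta declaration within the first 1024 bytes.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Nothing declared: fall back to a guess (assumed default).
    return fallback


# The header wins over a conflicting meta tag.
print(sniff_encoding(b'<meta charset="ISO-8859-1">', "text/html; charset=UTF-8"))  # utf-8
```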
Setting the Charset in Django
```python
# settings.py: "utf-8" is already Django's default, so this line is optional
DEFAULT_CHARSET = "utf-8"
# Responses then carry the header automatically:
# Content-Type: text/html; charset=utf-8

# views.py: a response with an explicit content type (hello is a hypothetical view)
from django.http import HttpResponse


def hello(request):
    return HttpResponse("Hello, 世界", content_type="text/plain; charset=utf-8")
```
Setting the Charset in Other Environments
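```python
# Flask
from flask import Flask, Response

app = Flask(__name__)


@app.route("/")
def index():
    # content_type is sent verbatim. Passing mimetype="text/plain" instead also
    # works: Flask then appends "; charset=utf-8" itself.
    return Response("Hello, 世界", content_type="text/plain; charset=utf-8")
```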
```javascript
// Node.js / Express, inside a route handler such as app.get("/", (req, res) => { ... })
res.setHeader("Content-Type", "text/html; charset=utf-8");
res.send("<p>Hello, 世界</p>");
```
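```nginx
# Nginx: append "; charset=utf-8" to the Content-Type of matching responses
# (allowed in http, server, or location blocks)
charset utf-8;
# text/html is always covered; listing it here is redundant but harmless
charset_types text/html text/plain text/css application/javascript;
```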
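Django and Flask both run on WSGI, so underneath, setting the charset comes down to one header passed to start_response. The framework-free sketch below uses only the WSGI calling convention from PEP 3333; the names app and fake_start_response are illustrative. The key point is that the declared charset only helps if the body bytes were actually produced with that encoding, and Content-Length must count bytes, not characters.

```python
def app(environ, start_response):
    """Minimal WSGI app: declare UTF-8 and send bytes encoded as UTF-8."""
    body = "Hello, 世界".encode("utf-8")  # encoding must match the declared charset
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),  # byte length, not character count
        ],
    )
    return [body]


# Call the app directly with a stub to see what a server would send.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


body = b"".join(app({}, fake_start_response))
print(captured["headers"]["Content-Type"])  # text/plain; charset=utf-8
print(len(body))  # 13 (7 ASCII bytes + 3 bytes each for 世 and 界)
```

To serve it for real, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() is enough for local testing.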
JSON and charset
RFC 7159 required JSON text to be encoded in UTF-8, UTF-16, or UTF-32. RFC 8259, which obsoletes it, requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem, and the application/json media type defines no charset parameter. Adding one has no effect on compliant recipients, so it is redundant but harmless:
```http
Content-Type: application/json; charset=UTF-8
```
Modern HTTP APIs typically omit the charset for JSON since UTF-8 is assumed.
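In Python's json module (a sketch; the payload is arbitrary), the charset question can be sidestepped or handled explicitly. With the default ensure_ascii=True, non-ASCII characters are escaped, so the body is pure ASCII and decodes the same under any ASCII-compatible charset. With ensure_ascii=False, you encode the text as UTF-8 yourself. On the receiving side, json.loads accepts bytes and detects UTF-8, UTF-16, or UTF-32 on its own (Python 3.6+).

```python
import json

payload = {"greeting": "Hello, 世界"}

# Default: non-ASCII characters become \uXXXX escapes, so the output is pure ASCII.
print(json.dumps(payload))  # {"greeting": "Hello, \u4e16\u754c"}

# Keep the characters and encode the text as UTF-8 for the wire;
# send it with Content-Type: application/json (no charset needed).
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")

# json.loads accepts bytes and detects the Unicode encoding itself.
assert json.loads(body) == payload
```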
BOM (Byte Order Mark)
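Some tools prepend a UTF-8 BOM (EF BB BF) to UTF-8 files. Browsers treat it as a definitive UTF-8 signal; under the WHATWG encoding rules a BOM even overrides the charset in the Content-Type header. The BOM itself is the invisible character U+FEFF, which breaks parsers that do not expect it: RFC 8259 forbids adding one to JSON sent over a network, and JavaScript's JSON.parse throws a SyntaxError on text that begins with it. Prefer the charset header over relying on BOMs.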
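The failure mode is easy to reproduce in Python (a sketch; the sample bytes are made up). Decoding with plain utf-8 keeps the BOM as a U+FEFF character, which json.loads rejects when given a str. The utf-8-sig codec strips a leading BOM if one is present and behaves like utf-8 otherwise.

```python
import codecs
import json

raw = codecs.BOM_UTF8 + b'{"ok": true}'  # bytes as saved by a BOM-adding editor

# Plain UTF-8 decoding keeps the BOM as an invisible U+FEFF character...
text = raw.decode("utf-8")
print(repr(text[0]))  # '\ufeff'

# ...which json.loads refuses when given a str.
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)

# utf-8-sig strips the leading BOM, so parsing succeeds.
print(json.loads(raw.decode("utf-8-sig")))  # {'ok': True}
```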
Quick Facts
| Property | Value |
|---|---|
| Header format | Content-Type: text/html; charset=UTF-8 |
| Priority vs. meta tag | HTTP header wins when both are present (a BOM overrides both) |
| Recommended charset | UTF-8 for all new content |
| Default if omitted | BOM, then meta tag, then browser heuristics (unreliable) |
| JSON standard | UTF-8 (RFC 8259); application/json defines no charset parameter |
| Django default | utf-8 via DEFAULT_CHARSET setting |
| Case sensitivity | Both the parameter name and the charset value are case-insensitive (utf-8 = UTF-8) |