Unicode in QR Codes

QR codes can encode Unicode text using UTF-8, but many QR code generators and scanners default to ISO 8859-1, causing non-Latin characters to appear garbled when scanned. This guide explains how QR codes handle Unicode, how to generate QR codes with correct Unicode encoding, and how to verify that your QR code encodes non-ASCII text properly.

Published 2024-08-12 · Updated 2025-11-03

QR codes are everywhere — on product packaging, restaurant menus, business cards, and advertising. They were originally designed for tracking automotive parts in Japanese factories (by Denso Wave in 1994), but their ability to encode text has made them a universal data carrier. What many people do not realize is that QR codes can encode full Unicode text, enabling multilingual content, emoji, and characters from any writing system. This guide explains how Unicode works in QR codes, the encoding modes available, capacity trade-offs, and best practices for generating and scanning Unicode QR codes.

QR Code Encoding Modes

QR codes support four encoding modes, each optimized for different types of data:

Mode Indicator	Name	Characters	Bits per Character
0001	Numeric	0-9	3.33 (10 bits per 3 digits)
0010	Alphanumeric	0-9, A-Z, space, $%*+-./:	5.5 (11 bits per 2 chars)
0100	Byte (8-bit)	Any byte (0x00-0xFF)	8
1000	Kanji	Shift JIS double-byte characters	13
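As a rough illustration of the table above, the sketch below picks the densest of the Numeric, Alphanumeric, and Byte modes for a string and counts the data bits it would need. The function names `pick_mode` and `payload_bits` are hypothetical, Kanji mode is left out, and the mode indicator and character count header are not included in the totals.

```python
ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")
DIGITS = set("0123456789")


def pick_mode(text: str) -> str:
    """Return the densest mode (Kanji excluded) that can hold every character."""
    if text and all(ch in DIGITS for ch in text):
        return "numeric"
    if text and all(ch in ALPHANUMERIC for ch in text):
        return "alphanumeric"
    return "byte"


def payload_bits(text: str) -> int:
    """Data bits for text in its densest mode, excluding segment headers."""
    mode = pick_mode(text)
    n = len(text)
    if mode == "numeric":
        # 10 bits per group of 3 digits; a 1- or 2-digit remainder takes 4 or 7 bits.
        return 10 * (n // 3) + (0, 4, 7)[n % 3]
    if mode == "alphanumeric":
        # 11 bits per pair; a trailing single character takes 6 bits.
        return 11 * (n // 2) + 6 * (n % 2)
    # Byte mode: 8 bits per UTF-8 byte.
    return 8 * len(text.encode("utf-8"))


print(pick_mode("12345"), payload_bits("12345"))              # numeric 17
print(pick_mode("HELLO WORLD"), payload_bits("HELLO WORLD"))  # alphanumeric 61
print(pick_mode("hello"), payload_bits("hello"))              # byte 40
```

Lowercase letters are outside the Alphanumeric set, which is why ordinary mixed-case text and URLs usually end up in Byte mode.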

Where does Unicode fit?

Unicode text is encoded using Byte mode — the QR code stores raw bytes, and the convention is to encode those bytes as UTF-8. This means any Unicode character can be stored in a QR code, but it consumes more capacity than ASCII-optimized modes.

The flow for Unicode text:

Unicode text ("Hello, world")
    |
    v
UTF-8 encoding (bytes)
    |
    v
QR code Byte mode (each byte = 8 bits in the QR symbol)
    |
    v
Scanner reads bytes
    |
    v
UTF-8 decoding -> Unicode text
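The flow above can be sketched in a few lines of Python. The helper names `to_qr_bytes` and `from_qr_bytes` are hypothetical and stand in for what a generator and a scanner do around the Byte mode payload; no QR library is involved.

```python
def to_qr_bytes(text: str) -> bytes:
    """Generator side: turn Unicode text into the bytes stored in Byte mode."""
    return text.encode("utf-8")


def from_qr_bytes(payload: bytes) -> str:
    """Scanner side: turn the bytes read from the symbol back into text."""
    return payload.decode("utf-8")


text = "Héllo, wörld"
payload = to_qr_bytes(text)

# Two of the twelve characters need two bytes each in UTF-8.
print(len(text), len(payload))         # 12 14
print(from_qr_bytes(payload) == text)  # True
```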

The Kanji mode exception

Kanji mode is a special case for Japanese. It encodes Shift JIS double-byte characters at 13 bits per character, which is more efficient than Byte mode (16 bits for a two-byte Shift JIS character, and typically 24 bits for the same character in UTF-8). However, Kanji mode is limited to the Shift JIS character set and cannot encode arbitrary Unicode. For modern multilingual use, Byte mode with UTF-8 is preferred.
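To make the trade-off concrete, the sketch below compares the data bits for a short Japanese string in Kanji mode and in Byte mode. The function names are hypothetical, and the eligibility check is a simplification: it only confirms that every character encodes to two bytes in Python's `shift_jis` codec, not that it falls inside the exact byte ranges Kanji mode accepts.

```python
def kanji_mode_bits(text: str) -> int:
    """13 bits per character, assuming every character is double-byte Shift JIS."""
    # Raises UnicodeEncodeError for characters outside Shift JIS (emoji, Hangul, ...).
    encoded = text.encode("shift_jis")
    if len(encoded) != 2 * len(text):
        raise ValueError("single-byte Shift JIS characters are not Kanji-mode data")
    return 13 * len(text)


def byte_mode_bits(text: str, encoding: str = "utf-8") -> int:
    """8 bits per encoded byte."""
    return 8 * len(text.encode(encoding))


text = "日本語"
print(kanji_mode_bits(text))              # 39
print(byte_mode_bits(text, "shift_jis"))  # 48
print(byte_mode_bits(text, "utf-8"))      # 72
```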

Capacity and Unicode

QR code capacity depends on the version (size), error correction level, and encoding mode. Here is the maximum data capacity for the largest QR code (Version 40, 177x177 modules) at the lowest error correction (Level L):

Mode	Maximum Capacity (Version 40, Level L)
Numeric	7,089 digits
Alphanumeric	4,296 characters
Byte	2,953 bytes
Kanji	1,817 characters

UTF-8 byte cost per character

Since Unicode in QR codes uses Byte mode with UTF-8 encoding, the cost varies by script:

Character Type	UTF-8 Bytes	QR Bits (Byte mode)	Example
ASCII (U+0000-U+007F)	1	8	A, 1, @
Accented Latin, Greek, Cyrillic, Hebrew, Arabic (U+0080-U+07FF)	2	16	é, Cyrillic, Arabic
CJK and most other scripts (U+0800-U+FFFF)	3	24	Chinese, Japanese, Korean, Thai
Supplementary planes (U+10000-U+10FFFF)	4	32	Emoji, musical symbols
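The byte costs in the table can be checked directly in Python. This is a minimal sketch; `utf8_cost` is a hypothetical helper name. Note that many emoji are sequences of several code points (skin tone modifiers, ZWJ sequences), so a single visible emoji can cost more than 4 bytes.

```python
def utf8_cost(text: str) -> list:
    """Return (character, UTF-8 byte count) for each code point in text."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# U+0041, U+00E9, U+4E2D, U+1F600: one code point from each row of the table.
sample = "A\u00e9\u4e2d\U0001F600"
print([cost for _, cost in utf8_cost(sample)])  # [1, 2, 3, 4]
print(len(sample.encode("utf-8")))              # 10

# Thumbs up plus a skin tone modifier: one visible emoji, two code points.
thumbs_up = "\U0001F44D\U0001F3FD"
print(len(thumbs_up.encode("utf-8")))           # 8
```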

Practical capacity examples

For a Version 10 QR code (57x57 modules) with medium error correction (Level M), Byte mode holds 213 bytes. The maximum length for each kind of content follows from dividing 213 by the UTF-8 bytes per character:

Content	UTF-8 Bytes per Character	Maximum Length
English URL (ASCII)	1	213 characters
Korean text	3	71 characters
Chinese text	3	71 characters
Emoji message	4	53 emoji (single code point each)
Arabic text	2	106 characters
Mixed English + emoji	Varies	Sum the bytes per character
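For mixed content, it is usually easiest to measure the encoded length rather than estimate it. The sketch below assumes the 213-byte Byte mode capacity of a Version 10, Level M symbol; the constant and function names are hypothetical.

```python
V10_M_BYTE_CAPACITY = 213  # Byte mode capacity of Version 10 at Level M


def fits(text: str, capacity: int = V10_M_BYTE_CAPACITY) -> tuple:
    """Return (fits, bytes_remaining) for text encoded as UTF-8."""
    used = len(text.encode("utf-8"))
    return used <= capacity, capacity - used


print(fits("한" * 71))  # (True, 0)
print(fits("한" * 72))  # (False, -3)
print(fits("Order ready \U0001F600"))  # (True, 197)
```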

Error correction levels

Level	Recovery	Overhead	Use Case
L (Low)	~7%	Least	Clean environments, screens
M (Medium)	~15%	Moderate	General purpose
Q (Quartile)	~25%	High	Moderate damage expected
H (High)	~30%	Most	Harsh environments, printed labels

Higher error correction reduces data capacity. For Unicode-heavy QR codes where capacity is tight, Level L or M is recommended.
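The capacity cost of each level can be seen by fixing the version. The sketch below uses the Byte mode capacities for Version 10 from the standard QR capacity tables (an assumption brought in for illustration; only the Level M figure corresponds to the example in this guide) and returns the strongest level that still fits a given text. The names are hypothetical.

```python
# Byte mode capacity in bytes for a Version 10 symbol, per error correction level.
V10_BYTE_CAPACITY = {"L": 271, "M": 213, "Q": 151, "H": 119}


def strongest_level_that_fits(text: str, capacities: dict = V10_BYTE_CAPACITY):
    """Return the highest error correction level whose capacity holds text, or None."""
    size = len(text.encode("utf-8"))
    for level in ("H", "Q", "M", "L"):  # strongest first
        if size <= capacities[level]:
            return level
    return None


print(strongest_level_that_fits("a" * 119))  # H
print(strongest_level_that_fits("中" * 71))   # M (213 bytes)
print(strongest_level_that_fits("中" * 91))   # None (273 bytes)
```

When the result is None, the options are a larger version or shorter content.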

ECI (Extended Channel Interpretation)

The QR code specification includes ECI (Extended Channel Interpretation) — a mechanism to declare the encoding of Byte mode data:

ECI Value	Encoding
000003	ISO-8859-1 (Latin-1)
000020	Shift JIS
000025	UTF-16 Big Endian
000026	UTF-8

The ECI problem

In theory, ECI 26 (UTF-8) should be included in any QR code containing UTF-8 data. In practice:

Scenario	Reality
Most QR generators	Do not include ECI
Most QR scanners	Assume UTF-8 for Byte mode
Specification	ECI recommended but not required
Cross-scanner compatibility	Often better without ECI (some older scanners do not process it correctly)

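The QR specification's default interpretation of Byte mode data is ISO 8859-1, which is why garbled output still occurs with tools that follow it strictly. In practice, though, the industry has converged on an informal convention: Byte mode data is assumed to be UTF-8 unless an ECI indicator says otherwise. Most modern scanners (smartphone cameras, barcode scanner apps) handle UTF-8 correctly without an ECI declaration.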

However, if you are encoding text for high-reliability applications (logistics, medical), including ECI 26 is a safer choice. Test with your target scanners.
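For readers curious what an ECI declaration looks like inside the symbol, the sketch below builds the header bits that would precede the payload: the ECI mode indicator, the assignment number, then the Byte mode indicator and byte count. The layout reflects my reading of ISO/IEC 18004 (ECI indicator 0111, a one-byte assignment number for values up to 127, and an 8-bit count for Versions 1-9 or 16-bit for Versions 10-40); none of it comes from this guide, and `eci_byte_segment_header` is a hypothetical name. In practice a QR library that supports ECI would emit this for you.

```python
def eci_byte_segment_header(eci: int, byte_count: int, version: int) -> str:
    """Return the header bits for an ECI declaration followed by a Byte mode segment."""
    if not 0 <= eci <= 127:
        raise ValueError("this sketch only handles one-byte ECI assignment numbers")
    if not 1 <= version <= 40:
        raise ValueError("version must be between 1 and 40")
    count_bits = 8 if version <= 9 else 16
    if byte_count >= 1 << count_bits:
        raise ValueError("byte count does not fit the character count field")
    return (
        "0111"                                # ECI mode indicator
        + format(eci, "08b")                  # assignment number, e.g. 26 for UTF-8
        + "0100"                              # Byte mode indicator
        + format(byte_count, f"0{count_bits}b")
    )


payload = "é".encode("utf-8")  # 2 bytes
print(eci_byte_segment_header(26, len(payload), version=1))
# 011100011010010000000010
```

The declaration costs 12 extra bits per symbol, which is small next to the payload but still reduces capacity slightly.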

Generating Unicode QR Codes

Python (qrcode library)

import qrcode

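# The library encodes str input as UTF-8 and stores it in Byte mode
# whenever the text does not fit the Numeric or Alphanumeric modes.
qr = qrcode.QRCode(
    version=None,  # let make(fit=True) choose the smallest version that fits
    error_correction=qrcode.constants.ERROR_CORRECT_M,
    box_size=10,   # pixels per module
    border=4,      # quiet zone width in modules
)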
qr.add_data("Unicode text here")
qr.make(fit=True)

img = qr.make_image(fill_color="black", back_color="white")
img.save("unicode_qr.png")

JavaScript (qrcode.js)

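// Browser (qrcode.js): renders into the element with id "qrcode"
const qr = new QRCode(document.getElementById("qrcode"), {
    text: "Unicode text here",
    width: 256,
    height: 256,
    correctLevel: QRCode.CorrectLevel.M
});

// Node.js (qrcode package): a separate script, not combined with the browser snippet
const QRCode = require('qrcode');
QRCode.toFile('unicode_qr.png', 'Unicode text here', { errorCorrectionLevel: 'M' })
    .catch(console.error);  // toFile returns a Promise when no callback is passed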

Handling capacity overflow

If your Unicode text exceeds the capacity of a given QR version:

Increase QR version: Version 1 (21x21) to Version 40 (177x177)
Lower error correction: From H to M or L
Use a URL shortener: Encode a short URL that redirects to the full content
Compress the text: For structured data, use a compact format
Split across multiple QR codes: The QR specification's Structured Append feature links up to 16 symbols into one message, though not every scanner supports it
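If splitting is the chosen route, the text has to be cut on character boundaries so that no UTF-8 sequence is broken across symbols. The sketch below is one way to do that; `split_utf8` is a hypothetical helper, it does not generate Structured Append headers, and it splits between code points, so a multi-code-point emoji or a base letter and its combining mark can still end up in different chunks.

```python
def split_utf8(text: str, max_bytes: int) -> list:
    """Split text into chunks whose UTF-8 encoding is at most max_bytes each."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 so any code point fits")
    chunks, current, used = [], [], 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > max_bytes:
            # Close the current chunk before this character would overflow it.
            chunks.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += size
    if current:
        chunks.append("".join(current))
    return chunks


parts = split_utf8("中中中", max_bytes=6)
print(parts)                      # ['中中', '中']
print("".join(parts) == "中中中")  # True
```

A caller targeting Structured Append would also want to check that the number of chunks does not exceed 16.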

Scanning and Decoding

How scanners handle Unicode

Modern QR code scanners (smartphone cameras, Google Lens, dedicated apps) typically:

Detect Byte mode data
Check for ECI indicator (if present, use declared encoding)
If no ECI, attempt UTF-8 decoding
If UTF-8 fails, fall back to ISO-8859-1 or system locale
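The decision sequence above can be sketched as a small function. This is an illustration of the heuristic, not the code of any particular scanner; `decode_byte_segment` and the `ECI_CODECS` mapping are hypothetical names, and the fallback here is fixed to ISO-8859-1 rather than the system locale.

```python
from typing import Optional

# ECI assignment numbers mapped to Python codec names.
ECI_CODECS = {3: "iso-8859-1", 20: "shift_jis", 25: "utf-16-be", 26: "utf-8"}


def decode_byte_segment(payload: bytes, eci: Optional[int] = None) -> str:
    """Decode Byte mode data: declared ECI first, then UTF-8, then ISO-8859-1."""
    if eci is not None:
        return payload.decode(ECI_CODECS[eci])
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte value, so this fallback never fails.
        return payload.decode("iso-8859-1")


print(decode_byte_segment("中".encode("utf-8")))             # 中
print(decode_byte_segment(b"caf\xe9"))                       # café (Latin-1 fallback)
print(decode_byte_segment("日".encode("shift_jis"), eci=20))  # 日
```

Because the fallback always succeeds, a wrong guess produces garbled text rather than an error, which is why testing with the target scanners matters.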

Scanner compatibility for Unicode

Scanner	UTF-8 Support	ECI Support
iOS Camera	Excellent	Yes
Android Camera (Google)	Excellent	Yes
Google Lens	Excellent	Yes
WeChat (built-in scanner)	Excellent	Yes
Dedicated barcode apps	Usually good	Varies
Older industrial scanners	May need ECI	Partial
ZXing library	Excellent	Yes

Common scanning issues

Problem	Cause	Fix
Garbled text after scan	Scanner assumed Latin-1 instead of UTF-8	Add an ECI indicator or test with a different scanner
Partial text	QR too small / too much data	Increase QR version or reduce content
Emoji not displaying	Scanner decoded correctly but display font lacks emoji	Scanner/OS issue, not QR issue
Mixed script text broken	BiDi rendering issue in display app	Not a QR encoding problem
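When diagnosing the first row of the table, it can help to confirm that the garbled output really is UTF-8 that was read as Latin-1. The sketch below reverses that specific mistake; `repair_mojibake` is a hypothetical helper, and it assumes the scanner used ISO-8859-1 exactly (a scanner that used Windows-1252 or another code page would need a different codec).

```python
def repair_mojibake(scanned: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mistake; return the input unchanged otherwise."""
    try:
        return scanned.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside Latin-1 or its bytes are not
        # valid UTF-8, so it was not garbled in this particular way.
        return scanned


garbled = "日本語".encode("utf-8").decode("iso-8859-1")
print(repair_mojibake(garbled))   # 日本語
print(repair_mojibake("Ã©"))      # é
print(repair_mojibake("hello"))   # hello
```

If the repaired string is readable, the QR code itself is fine and the scanner's charset guess is the problem.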

Use Cases for Unicode QR Codes

Use Case	Content	Script
Restaurant menu	Menu items in local language	CJK, Thai, Arabic, etc.
Business card (vCard)	Name, company in native script	Any
Wi-Fi login	SSID with Unicode characters	Any
Product label	Multilingual product info	Multiple scripts
Event ticket	Attendee name in native script	Any
Cryptocurrency	Address + memo in local language	Any
Tourism	Landmark description in visitor's language	Multiple

Wi-Fi QR code with Unicode SSID

The Wi-Fi QR code format supports Unicode SSIDs:

WIFI:T:WPA;S:MyNetwork;P:password123;;

If the SSID contains Unicode characters, they are encoded as UTF-8 bytes in Byte mode. Most smartphone Wi-Fi QR scanners handle this correctly.

Key Takeaways

QR codes store Unicode text via Byte mode with UTF-8 encoding. Any Unicode character — CJK, Arabic, emoji, and more — can be encoded in a QR code.
Capacity decreases with character complexity: ASCII uses 1 byte per character, CJK uses 3, and emoji use 4. Plan your QR version and error correction level accordingly.
ECI indicators formally declare UTF-8 encoding but are optional in practice. Most modern scanners assume UTF-8 by default.
For maximum compatibility, keep QR content short, use error correction level M, and test with multiple scanners (iOS, Android, dedicated apps).
When Unicode content exceeds QR capacity, use a URL shortener to redirect to the full content rather than trying to encode everything directly.
