
Unicode in QR Codes

QR codes can encode Unicode text using UTF-8, but many QR code generators and scanners default to ISO 8859-1, causing non-Latin characters to appear garbled when scanned. This guide explains how QR codes handle Unicode, how to generate QR codes with correct Unicode encoding, and how to verify that your QR code encodes non-ASCII text properly.


QR codes are everywhere — on product packaging, restaurant menus, business cards, and advertising. They were originally designed for tracking automotive parts in Japanese factories (by Denso Wave in 1994), but their ability to encode text has made them a universal data carrier. What many people do not realize is that QR codes can encode full Unicode text, enabling multilingual content, emoji, and characters from any writing system. This guide explains how Unicode works in QR codes, the encoding modes available, capacity trade-offs, and best practices for generating and scanning Unicode QR codes.

QR Code Encoding Modes

QR codes support four encoding modes, each optimized for different types of data:

| Mode indicator | Name | Characters | Bits per character |
|---|---|---|---|
| 0001 | Numeric | 0-9 | 3.33 (10 bits per 3 digits) |
| 0010 | Alphanumeric | 0-9, A-Z, space, $%*+-./: | 5.5 (11 bits per 2 characters) |
| 0100 | Byte (8-bit) | Any byte (0x00-0xFF) | 8 |
| 1000 | Kanji | Shift JIS double-byte characters | 13 |
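
As a rough illustration of how the modes in the table differ, the following Python sketch picks the most compact single mode for a string and estimates the bits its character data would need. The names `pick_mode` and `data_bits` are hypothetical, Kanji mode is left out, and real encoders also add a mode indicator and a length field and may mix modes within one symbol.

```python
import re

# Character sets of the two compact modes, as listed in the table.
NUMERIC = re.compile(r"[0-9]*")
ALPHANUMERIC = re.compile(r"[0-9A-Z $%*+\-./:]*")


def pick_mode(text):
    """Return the most compact single mode that can hold the whole string."""
    if NUMERIC.fullmatch(text):
        return "numeric"
    if ALPHANUMERIC.fullmatch(text):
        return "alphanumeric"
    return "byte"  # everything else, including lowercase and non-ASCII text


def data_bits(text):
    """Bits used by the character data alone (no mode indicator or length field)."""
    mode = pick_mode(text)
    n = len(text)
    if mode == "numeric":
        # 10 bits per group of 3 digits; 1 or 2 leftover digits take 4 or 7 bits
        return 10 * (n // 3) + (0, 4, 7)[n % 3]
    if mode == "alphanumeric":
        # 11 bits per pair of characters; a leftover single character takes 6 bits
        return 11 * (n // 2) + 6 * (n % 2)
    # Byte mode, assuming the text is stored as UTF-8
    return 8 * len(text.encode("utf-8"))


for sample in ("12345", "HELLO WORLD", "Hello"):
    print(sample, pick_mode(sample), data_bits(sample))
```

Under these assumptions, "12345" needs 17 bits in Numeric mode and "HELLO WORLD" needs 61 bits in Alphanumeric mode, while "Hello" falls back to Byte mode at 40 bits because of its lowercase letters.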

Where does Unicode fit?

Unicode text is encoded using Byte mode: the QR code stores raw bytes, and the prevailing convention is to encode those bytes as UTF-8. This means any Unicode character can be stored in a QR code, but it consumes more capacity than the Numeric and Alphanumeric modes: at least 8 bits per character, and 16 to 32 bits for each non-ASCII character.

The flow for Unicode text:

Unicode text ("Hello, world")
    |
    v
UTF-8 encoding (bytes)
    |
    v
QR code Byte mode (each byte = 8 bits in the QR symbol)
    |
    v
Scanner reads bytes
    |
    v
UTF-8 decoding -> Unicode text
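
The same flow can be traced in a few lines of Python. This is only a sketch of the text-to-bytes steps, with no QR symbol involved; the helper name `round_trip` and the sample string are illustrative.

```python
def round_trip(text):
    """Follow the flow: text -> UTF-8 bytes -> (QR Byte mode) -> bytes -> text."""
    payload = text.encode("utf-8")     # what the generator places in Byte mode
    bit_count = 8 * len(payload)       # each byte occupies 8 bits in the symbol
    decoded = payload.decode("utf-8")  # what a UTF-8-aware scanner recovers
    return payload, bit_count, decoded


payload, bit_count, decoded = round_trip("Hello, 世界 🌍")
print(len(payload), "bytes,", bit_count, "bits")
print(decoded)
```

The sample has 11 characters but occupies 18 bytes (144 bits), because each CJK character takes 3 bytes and the emoji takes 4.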

The Kanji mode exception

Kanji mode is a special case for Japanese. It encodes Shift JIS double-byte characters at 13 bits per character, which is more efficient than Byte mode (16 bits for a two-byte Shift JIS character). However, Kanji mode is limited to the Shift JIS character set and cannot encode arbitrary Unicode. For modern multilingual use, Byte mode with UTF-8 is preferred.
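
To make the trade-off concrete, the sketch below compares the data bits needed for Japanese text in Kanji mode and in Byte mode with UTF-8. The function names are hypothetical, and the eligibility check assumes the Kanji-mode Shift JIS ranges 0x8140-0x9FFC and 0xE040-0xEBBF; mode indicators and length fields are ignored.

```python
def kanji_mode_bits(text):
    """Return the data bits needed in Kanji mode, or None if the text is not eligible."""
    try:
        raw = text.encode("shift_jis")
    except UnicodeEncodeError:
        return None  # contains characters outside Shift JIS
    if len(raw) != 2 * len(text):
        return None  # at least one character is single-byte in Shift JIS
    for i in range(0, len(raw), 2):
        value = (raw[i] << 8) | raw[i + 1]
        # Assumed Kanji-mode ranges for double-byte Shift JIS values
        if not (0x8140 <= value <= 0x9FFC or 0xE040 <= value <= 0xEBBF):
            return None
    return 13 * len(text)


def utf8_byte_mode_bits(text):
    """Data bits needed in Byte mode when the text is stored as UTF-8."""
    return 8 * len(text.encode("utf-8"))


sample = "日本語"
print(kanji_mode_bits(sample), utf8_byte_mode_bits(sample))
```

For "日本語" this gives 39 bits in Kanji mode against 72 bits as UTF-8 in Byte mode, while Korean or mixed-script text returns `None` and has to use Byte mode.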

Capacity and Unicode

QR code capacity depends on the version (size), error correction level, and encoding mode. Here is the maximum data capacity for the largest QR code (Version 40, 177x177 modules) at the lowest error correction (Level L):

| Mode | Maximum capacity (Version 40, Level L) |
|---|---|
| Numeric | 7,089 digits |
| Alphanumeric | 4,296 characters |
| Byte | 2,953 bytes |
| Kanji | 1,817 characters |

UTF-8 byte cost per character

Since Unicode in QR codes uses Byte mode with UTF-8 encoding, the cost varies by script:

| Character type | UTF-8 bytes | QR bits (Byte mode) | Examples |
|---|---|---|---|
| ASCII (U+0000-U+007F) | 1 | 8 | A, 1, @ |
| Latin Extended and other two-byte scripts (U+0080-U+07FF) | 2 | 16 | é, Cyrillic, Arabic |
| CJK and most other scripts (U+0800-U+FFFF) | 3 | 24 | Chinese, Japanese, Korean, Thai |
| Emoji and rare scripts (U+10000 and above) | 4 | 32 | Emoji, musical symbols |
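
The per-character cost in the table can be checked directly in Python. The helper `utf8_cost` is a hypothetical name, and the sample string simply picks one character from each row.

```python
def utf8_cost(text):
    """List (character, code point, UTF-8 byte count) for each character."""
    return [(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


# One character per row of the table: A, é, 中, 😀
for ch, code_point, size in utf8_cost("A\u00e9\u4e2d\U0001F600"):
    print(ch, code_point, size, "byte(s),", 8 * size, "bits in Byte mode")
```

The four sample characters cost 1, 2, 3, and 4 bytes respectively, or 10 bytes (80 bits) in total.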

Practical capacity examples

A Version 10 QR code (57x57 modules) with medium error correction (Level M) holds up to 213 bytes in Byte mode. The largest content of each type that still fits:

| Content | Size | Fits in 213 bytes? |
|---|---|---|
| English URL (ASCII) | 213 characters (1 byte each) | Yes |
| Korean text | ~71 characters (3 bytes each) | Yes |
| Chinese text | ~71 characters (3 bytes each) | Yes |
| Emoji message | ~53 emoji (4 bytes each) | Yes |
| Arabic text | ~106 characters (2 bytes each) | Yes |
| Mixed English + emoji | Varies | Calculate per character |
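
For mixed content, the per-character calculation is easiest to leave to code. The sketch below checks a string against the 213-byte Byte-mode capacity used in the examples above; the function name and constant are illustrative.

```python
V10_M_BYTE_CAPACITY = 213  # Version 10, Level M, Byte mode


def fits_v10_m(text):
    """Return (fits, bytes_used) for UTF-8 text in a Version 10, Level M symbol."""
    used = len(text.encode("utf-8"))
    return used <= V10_M_BYTE_CAPACITY, used


print(fits_v10_m("가" * 71))   # 71 Korean characters, 3 bytes each
print(fits_v10_m("가" * 72))   # one character too many
print(fits_v10_m("Menu 🍜 " * 20))
```

Seventy-one Korean characters use exactly 213 bytes and fit; a seventy-second pushes the payload to 216 bytes and would need a larger version or a lower error correction level.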

Error correction levels

| Level | Recovery | Overhead | Use case |
|---|---|---|---|
| L (Low) | ~7% | Least | Clean environments, screens |
| M (Medium) | ~15% | Moderate | General purpose |
| Q (Quartile) | ~25% | High | Moderate damage expected |
| H (High) | ~30% | Most | Harsh environments, printed labels |

Higher error correction reduces data capacity. For Unicode-heavy QR codes where capacity is tight, Level L or M is recommended.
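
One way to act on this trade-off is to pick the strongest level that still fits the payload. The sketch below does that for a Version 10 symbol, assuming the Byte-mode capacities from the standard capacity table (271, 213, 151, and 119 bytes for L, M, Q, and H); only the Level M figure appears elsewhere in this guide, and the function name is hypothetical.

```python
# Assumed Byte-mode capacities for Version 10, strongest level first.
V10_BYTE_CAPACITY = {"H": 119, "Q": 151, "M": 213, "L": 271}


def strongest_level_that_fits(text):
    """Return the highest error correction level whose capacity holds the UTF-8 text."""
    size = len(text.encode("utf-8"))
    for level in ("H", "Q", "M", "L"):
        if size <= V10_BYTE_CAPACITY[level]:
            return level
    return None  # too large for Version 10 at any level


print(strongest_level_that_fits("a" * 100))   # short ASCII text
print(strongest_level_that_fits("가" * 80))   # 240 bytes of Korean text
```

Here 100 ASCII bytes still fit at Level H, whereas 80 Korean characters (240 bytes) only fit at Level L.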

ECI (Extended Channel Interpretation)

The QR code specification includes ECI (Extended Channel Interpretation) — a mechanism to declare the encoding of Byte mode data:

| ECI value | Encoding |
|---|---|
| 000003 | ISO-8859-1 (Latin-1) |
| 000020 | Shift JIS |
| 000025 | UTF-16 Big Endian |
| 000026 | UTF-8 |

The ECI problem

In theory, ECI 26 (UTF-8) should be included in any QR code containing UTF-8 data. In practice:

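| Scenario | Reality |
|---|---|
| Most QR generators | Do not include ECI |
| Most QR scanners | Assume UTF-8 for Byte mode |
| Specification | ECI is the formal way to declare a non-default encoding such as UTF-8 (the default is ISO 8859-1) |
| Cross-scanner compatibility | Often better without ECI (some older scanners ignore or mishandle it) |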

The industry has converged on an informal standard: Byte mode data is assumed to be UTF-8 unless an ECI indicator says otherwise. Most modern scanners (smartphone cameras, barcode scanner apps) handle UTF-8 correctly without an ECI declaration.

However, if you are encoding text for high-reliability applications (logistics, medical), including ECI 26 is a safer choice. Test with your target scanners.
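
For readers curious what an ECI declaration looks like at the bit level, here is a minimal sketch of the segment header that precedes the Byte mode data. It assumes the usual layout: the ECI mode indicator 0111 followed by a designator of 8, 16, or 24 bits depending on the assignment value. The function name `eci_header_bits` is hypothetical, and most libraries build this header internally when asked to.

```python
def eci_header_bits(assignment):
    """Return the ECI header as a bit string: mode indicator 0111 plus designator."""
    if not 0 <= assignment <= 999999:
        raise ValueError("ECI assignment value must be between 0 and 999999")
    if assignment < 128:
        designator = format(assignment, "08b")          # 0bbbbbbb
    elif assignment < 16384:
        designator = "10" + format(assignment, "014b")  # 10 + 14 value bits
    else:
        designator = "110" + format(assignment, "021b") # 110 + 21 value bits
    return "0111" + designator


print(eci_header_bits(26))  # header declaring UTF-8
```

For ECI 26 the header is the 12-bit string `011100011010`; the Byte mode segment (mode indicator 0100, length, data) would follow it.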

Generating Unicode QR Codes

Python (qrcode library)

import qrcode

# UTF-8 is the default for string input
qr = qrcode.QRCode(
    version=None,  # Auto-size
    error_correction=qrcode.constants.ERROR_CORRECT_M,
    box_size=10,
    border=4,
)
qr.add_data("Unicode text here")
qr.make(fit=True)

img = qr.make_image(fill_color="black", back_color="white")
img.save("unicode_qr.png")

JavaScript (qrcode.js)

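// Browser: the qrcode.js script exposes a global QRCode constructor
const qr = new QRCode(document.getElementById("qrcode"), {
    text: "Unicode text here",
    width: 256,
    height: 256,
    correctLevel: QRCode.CorrectLevel.M
});

// Node.js (qrcode package), in a separate file from the browser snippet
const QRCode = require('qrcode');
QRCode.toFile('unicode_qr.png', 'Unicode text here', { errorCorrectionLevel: 'M' }, (err) => {
    if (err) throw err;  // surface write or encoding failures
});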

Handling capacity overflow

If your Unicode text exceeds the capacity of a given QR version:

  1. Increase QR version: Version 1 (21x21) to Version 40 (177x177)
  2. Lower error correction: From H to M or L
  3. Use a URL shortener: Encode a short URL that redirects to the full content
  4. Compress the text: For structured data, use a compact format
  5. Split across multiple QR codes: The QR specification's Structured Append feature links up to 16 symbols into one message, though not all scanners support it
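
If splitting is the chosen route, the text has to be cut at character boundaries so that every chunk is valid UTF-8 on its own. The sketch below shows one way to do that; `split_utf8` is a hypothetical helper, and it only prepares the chunks, leaving Structured Append headers (or any other linking scheme) to the QR library.

```python
def split_utf8(text, max_bytes):
    """Split text into chunks whose UTF-8 encoding is at most max_bytes each."""
    if max_bytes < 4:
        raise ValueError("max_bytes must be at least 4 to hold any UTF-8 character")
    chunks, current, used = [], [], 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if used + size > max_bytes and current:
            # Adding this character would overflow: close the current chunk
            chunks.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += size
    if current:
        chunks.append("".join(current))
    return chunks


print(split_utf8("가나다라마", 6))
```

With a 6-byte limit the five Korean characters are split into chunks of two, two, and one. The split is per code point, so sequences such as combining marks or multi-part emoji can still be separated; a grapheme-aware splitter would be needed to avoid that.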

Scanning and Decoding

How scanners handle Unicode

Modern QR code scanners (smartphone cameras, Google Lens, dedicated apps) typically:

  1. Detect Byte mode data
  2. Check for ECI indicator (if present, use declared encoding)
  3. If no ECI, attempt UTF-8 decoding
  4. If UTF-8 fails, fall back to ISO-8859-1 or system locale
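
The decision steps above can be sketched as a small Python function. This is an approximation of typical scanner behavior rather than any particular scanner's code; `decode_byte_segment` and the `ECI_CODECS` mapping are illustrative names, and the fallback is fixed to ISO-8859-1 instead of a system locale.

```python
# ECI assignment values from this guide mapped to Python codec names.
ECI_CODECS = {3: "iso-8859-1", 20: "shift_jis", 25: "utf-16-be", 26: "utf-8"}


def decode_byte_segment(payload, eci=None):
    """Decode Byte mode data the way a tolerant scanner might."""
    if eci is not None:
        # Step 2: a declared encoding wins (unknown values raise KeyError here)
        return payload.decode(ECI_CODECS[eci])
    try:
        # Step 3: no ECI, so try UTF-8 first
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        # Step 4: not valid UTF-8, fall back to ISO-8859-1
        return payload.decode("iso-8859-1")


print(decode_byte_segment("日本語".encode("utf-8")))
print(decode_byte_segment(b"caf\xe9"))
print(decode_byte_segment("日本語".encode("shift_jis"), eci=20))
```

Trying UTF-8 first is reasonably safe because text in other encodings rarely happens to be valid UTF-8, whereas ISO-8859-1 accepts any byte sequence and would never signal a failure.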

Scanner compatibility for Unicode

| Scanner | UTF-8 support | ECI support |
|---|---|---|
| iOS Camera | Excellent | Yes |
| Android Camera (Google) | Excellent | Yes |
| Google Lens | Excellent | Yes |
| WeChat (built-in scanner) | Excellent | Yes |
| Dedicated barcode apps | Usually good | Varies |
| Older industrial scanners | May need ECI | Partial |
| ZXing library | Excellent | Yes |

Common scanning issues

| Problem | Cause | Fix |
|---|---|---|
| Garbled text after scan | Scanner assumed Latin-1 instead of UTF-8 | Add ECI indicator or test with different scanner |
| Partial text | QR too small / too much data | Increase QR version or reduce content |
| Emoji not displaying | Scanner decoded correctly but display font lacks emoji | Scanner/OS issue, not QR issue |
| Mixed script text broken | BiDi rendering issue in display app | Not a QR encoding problem |
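
The first row of the table is easy to reproduce, which helps when diagnosing a scan result. The sketch below shows what UTF-8 data looks like when read as Latin-1 and how the damage can be reversed when every byte survived; both function names are hypothetical, and the repair assumes strict ISO-8859-1 rather than a variant such as Windows-1252.

```python
def garble_as_latin1(text):
    """Simulate a scanner that reads UTF-8 bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_latin1_mojibake(garbled):
    """Undo the misreading; return the input unchanged if it does not apply."""
    try:
        return garbled.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


scanned = garble_as_latin1("café")
print(scanned)                          # the typical garbled form
print(repair_latin1_mojibake(scanned))  # recovered text
```

A scan result that shows "Ã©" where "é" was expected is a strong hint that the bytes in the symbol are fine and the scanner chose the wrong encoding.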

Use Cases for Unicode QR Codes

| Use case | Content | Script |
|---|---|---|
| Restaurant menu | Menu items in local language | CJK, Thai, Arabic, etc. |
| Business card (vCard) | Name, company in native script | Any |
| Wi-Fi login | SSID with Unicode characters | Any |
| Product label | Multilingual product info | Multiple scripts |
| Event ticket | Attendee name in native script | Any |
| Cryptocurrency | Address + memo in local language | Any |
| Tourism | Landmark description in visitor's language | Multiple |

Wi-Fi QR code with Unicode SSID

The Wi-Fi QR code format supports Unicode SSIDs:

WIFI:T:WPA;S:MyNetwork;P:password123;;

If the SSID contains Unicode characters, they are encoded as UTF-8 bytes in Byte mode. Most smartphone Wi-Fi QR scanners handle this correctly.
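
A payload in this format can be assembled with a short helper. The sketch below assumes the commonly documented escaping convention for the format, in which a backslash precedes `\`, `;`, `,`, `:`, and `"` inside field values; the function names are hypothetical, and the resulting string would then be passed to a QR library as ordinary UTF-8 text.

```python
def escape_wifi_field(value):
    """Backslash-escape the characters that have meaning in the WIFI: format."""
    out = []
    for ch in value:
        if ch in '\\;,:"':
            out.append("\\")
        out.append(ch)
    return "".join(out)


def wifi_payload(ssid, password, auth="WPA"):
    """Build a WIFI: payload string for the given network."""
    return f"WIFI:T:{auth};S:{escape_wifi_field(ssid)};P:{escape_wifi_field(password)};;"


payload = wifi_payload("카페 와이파이", "password123")
print(payload)
print(len(payload.encode("utf-8")), "bytes in Byte mode")
```

Non-ASCII SSID characters pass through unescaped and are simply counted at their UTF-8 byte cost when checking capacity.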

Key Takeaways

  • QR codes store Unicode text via Byte mode with UTF-8 encoding. Any Unicode character — CJK, Arabic, emoji, and more — can be encoded in a QR code.
  • Capacity decreases with character complexity: ASCII uses 1 byte per character, CJK uses 3, and emoji use 4. Plan your QR version and error correction level accordingly.
  • ECI indicators formally declare UTF-8 encoding but are optional in practice. Most modern scanners assume UTF-8 by default.
  • For maximum compatibility, keep QR content short, use error correction level M, and test with multiple scanners (iOS, Android, dedicated apps).
  • When Unicode content exceeds QR capacity, use a URL shortener to redirect to the full content rather than trying to encode everything directly.
