Unicode in Excel
Microsoft Excel is central to business workflows worldwide, and its handling of Unicode has improved dramatically over the years — but significant pitfalls remain, particularly around CSV import/export. This guide explains how Excel handles Unicode in its native XLSX format, the encoding traps with CSV files, and the UNICHAR/UNICODE functions that let you work with code points directly in formulas.
XLSX: Unicode by Default
The modern Excel format (.xlsx) is an Open XML format — a collection of XML files in
a ZIP container. All text is stored as UTF-8 in the XML. This means XLSX natively
supports the full range of Unicode characters.
| Aspect | XLSX Behavior |
|---|---|
| Text encoding | UTF-8 (in XML) |
| Maximum characters per cell | 32,767 |
| Emoji support | Yes (display depends on OS/font) |
| CJK characters | Full support |
| RTL text | Supported (with cell formatting) |
| Supplementary characters (U+10000+) | Stored correctly, display depends on font |
When you save as .xlsx, you never lose Unicode data. The problems arise when you move
data in or out of Excel via CSV, copy-paste, or legacy formats.
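You can check the "ZIP of UTF-8 XML" claim directly. The sketch below builds a minimal stand-in archive (not a valid workbook) containing a `xl/sharedStrings.xml` part — the file where real XLSX workbooks keep cell text — and reads it back as UTF-8:

```python
import io
import zipfile

# A .xlsx is a ZIP container of UTF-8 XML parts. Build a minimal
# stand-in archive with a sharedStrings.xml part and read it back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("xl/sharedStrings.xml",
               '<?xml version="1.0" encoding="UTF-8"?>'
               '<sst><si><t>München €</t></si></sst>')

with zipfile.ZipFile(buf) as z:
    xml = z.read("xl/sharedStrings.xml").decode("utf-8")

assert "München €" in xml  # non-ASCII text survives the round trip
```

Unzipping a real .xlsx (e.g. with any archive tool) shows the same layout, with many more parts such as `xl/workbook.xml` and `xl/styles.xml`.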
The CSV Encoding Problem
CSV (Comma-Separated Values) is a plain text format with no standard encoding declaration. When you open a CSV file in Excel, Excel must guess the encoding — and it often guesses wrong.
Excel's CSV encoding behavior
| Action | Default Encoding | Result |
|---|---|---|
| Open CSV (double-click) | System locale (e.g., Windows-1252) | Non-ASCII corrupted |
| Save as CSV | System locale (e.g., Windows-1252) | Non-ASCII may be lost |
| Save as "CSV UTF-8" | UTF-8 with BOM | Unicode preserved |
| Import via Data > From Text | User-selected | Correct if UTF-8 chosen |
The BOM (Byte Order Mark)
The UTF-8 BOM is the byte sequence EF BB BF at the start of a file. It serves as a
signal to applications that the file is UTF-8:
| File starts with | Excel interprets as |
|---|---|
| EF BB BF (UTF-8 BOM) | UTF-8 |
| FF FE (UTF-16 LE BOM) | UTF-16 Little Endian |
| FE FF (UTF-16 BE BOM) | UTF-16 Big Endian |
| No BOM | System locale (often Windows-1252 or Shift_JIS) |
Critical rule: If you generate CSV files for Excel consumption, always include the UTF-8 BOM. Without it, Excel will mangle non-ASCII characters.
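The table above can be sketched as a small sniffing helper in Python (`sniff_bom` is a hypothetical name, not an Excel or library API):

```python
# Report which encoding Excel would assume from a file's first bytes.
def sniff_bom(raw: bytes) -> str:
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if raw.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if raw.startswith(b"\xfe\xff"):
        return "utf-16-be"
    return "system locale (no BOM)"

# Python's utf-8-sig codec prepends the BOM; plain utf-8 does not
assert sniff_bom("München".encode("utf-8-sig")) == "utf-8-sig"
assert sniff_bom("München".encode("utf-8")) == "system locale (no BOM)"
```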
Generating UTF-8 CSV with BOM
In Python:
```python
import csv

with open("output.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "City", "Symbol"])
    writer.writerow(["Müller", "München", "€"])
```
The utf-8-sig encoding writes the BOM automatically. The newline="" parameter
prevents double line breaks on Windows.
In other languages:
```javascript
// Node.js: prepend the BOM manually before writing
const fs = require('fs');

const BOM = '\uFEFF';           // serialized as EF BB BF in UTF-8
fs.writeFileSync('output.csv', BOM + csvContent, 'utf8');
```
Opening UTF-8 CSV correctly in Excel
If you receive a UTF-8 CSV without BOM:
- Open Excel (do not open the CSV directly)
- Go to Data > From Text/CSV
- In the import wizard, set File Origin to "65001: Unicode (UTF-8)"
- Preview the data to confirm it looks correct
- Click Load
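If you would rather fix the file once than walk every recipient through the wizard, you can re-save a BOM-less UTF-8 file with a BOM. A sketch (`add_bom` is a hypothetical helper; the work is done by Python's `utf-8-sig` codec, which prepends EF BB BF):

```python
# Re-save a BOM-less UTF-8 file with a UTF-8 BOM so Excel opens it
# correctly on double-click.
def add_bom(src_path: str, dst_path: str) -> None:
    with open(src_path, encoding="utf-8") as src:
        data = src.read()
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(data)
```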
UNICHAR and UNICODE Functions
Excel 2013+ includes functions for working with Unicode code points:
UNICHAR(number)
Returns the character for a given Unicode code point (decimal):
| Formula | Result | Description |
|---|---|---|
| =UNICHAR(65) | A | Latin capital letter A |
| =UNICHAR(169) | © | Copyright sign |
| =UNICHAR(8364) | € | Euro sign (U+20AC) |
| =UNICHAR(9731) | ☃ | Snowman (U+2603) |
| =UNICHAR(128512) | 😀 | Grinning face (U+1F600) |
| =UNICHAR(HEX2DEC("2603")) | ☃ | Convert hex to decimal first |
UNICODE(text)
Returns the decimal code point of the first character in a string:
| Formula | Result | Description |
|---|---|---|
| =UNICODE("A") | 65 | Code point of A |
| =UNICODE("€") | 8364 | Code point of the euro sign |
| =DEC2HEX(UNICODE("€")) | 20AC | Hex code point |
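Python's `chr()` and `ord()` are direct analogues of UNICHAR and UNICODE, which makes them handy for checking formula results outside Excel:

```python
# chr()/ord() mirror UNICHAR/UNICODE, code point for code point
assert chr(65) == "A"                    # =UNICHAR(65)
assert chr(8364) == "€"                  # =UNICHAR(8364)
assert ord("€") == 8364                  # =UNICODE("€")
assert format(ord("€"), "X") == "20AC"   # =DEC2HEX(UNICODE("€"))
assert chr(int("2603", 16)) == "☃"       # =UNICHAR(HEX2DEC("2603"))
```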
CHAR vs UNICHAR
Excel has both CHAR and UNICHAR, but they behave differently:
| Function | Range | Encoding |
|---|---|---|
| CHAR(n) | 1-255 | Windows code page (platform-dependent) |
| UNICHAR(n) | 1-1114111 | Unicode code point (universal) |
CHAR(169) returns the copyright sign on Windows (code page 1252) but may return a
different character on a system with a different active code page. UNICHAR(169) always
returns the copyright sign because it uses the Unicode standard.
Best practice: Always use UNICHAR and UNICODE instead of CHAR and CODE for
reliable cross-platform behavior.
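A quick Python sketch illustrates why code-page-based functions are fragile: the same byte decodes to different characters under different Windows code pages, while a Unicode code point is unambiguous. (Byte 196 is used here rather than 169 because it diverges visibly between common code pages.)

```python
# One byte, two code pages, two different characters
b = bytes([196])
assert b.decode("cp1252") == "Ä"   # Western European: A with diaeresis
assert b.decode("cp1251") == "Д"   # Cyrillic: a completely different letter

# One code point, one character, everywhere
assert chr(196) == "Ä"
```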
Working with Multilingual Data in Excel
Sorting
Excel sorts based on the system locale by default. For multilingual data, this can produce unexpected results:
| Scenario | Issue | Solution |
|---|---|---|
| German umlauts (ä, ö, ü) | May sort after Z | Custom sort order |
| Chinese characters | Sorted by Unicode code point, not pinyin | Sort by a pinyin helper column |
| Mixed scripts | Latin before CJK before emoji | Usually acceptable |
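The umlaut row is easy to reproduce in Python: a plain sort compares code points, so Ä (U+00C4) lands after Z (U+005A), while a locale-aware sort (assuming a German locale is installed on the system) interleaves it with A:

```python
import locale

words = ["Zebra", "Äpfel", "Apfel"]

# Code-point sort: "Ä" (U+00C4) sorts after "Z" (U+005A)
assert sorted(words) == ["Apfel", "Zebra", "Äpfel"]

# Locale-aware sort; typically ['Apfel', 'Äpfel', 'Zebra'] under de_DE
try:
    locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")
    print(sorted(words, key=locale.strxfrm))
except locale.Error:
    pass  # German locale not installed on this system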
Filtering
Excel's text filters work on the stored Unicode text, so filtering for CJK characters
or accented text works correctly in .xlsx files. In CSV files opened with wrong encoding,
filters will match the garbled text, not the intended characters.
Cell formatting for RTL
To display Arabic or Hebrew text correctly:
- Select the cell(s)
- Right-click > Format Cells > Alignment
- Set Text direction to "Right-to-Left"
Excel also supports mixed BiDi text within a single cell.
Legacy Formats
| Format | Encoding | Unicode Support |
|---|---|---|
| .xlsx | UTF-8 (XML) | Full |
| .xls (97-2003) | UTF-16 internally | Good but some limitations |
| .csv | No standard | Depends on how opened |
| .tsv | No standard | Same as CSV |
| .xlsb | Binary | Full (same as XLSX) |
| .ods | UTF-8 (XML) | Full |
The legacy .xls format uses UTF-16 internally and handles Unicode reasonably well,
but it has a row limit of 65,536 and lacks modern features. Always prefer .xlsx.
Power Query and Unicode
Excel's Power Query (Get & Transform) handles encoding better than the basic CSV import:
- Go to Data > Get Data > From File > From Text/CSV
- Power Query automatically detects UTF-8 if a BOM is present
- For BOM-less files, use the File Origin dropdown to specify encoding
- Power Query preserves Unicode through all transformation steps
Power Query is the recommended way to import CSV data when Unicode correctness matters.
Practical Tips
Detecting encoding issues
If your data looks like this after import:
| Garbled | Correct | Encoding mismatch |
|---|---|---|
| Ã© | é | UTF-8 read as Windows-1252 |
| e followed by a stray accent (e + U+0301) | é | NFD form displayed without combining |
| ??? | CJK characters | Non-Unicode encoding or missing font |
| □□□ | Emoji | Font does not support emoji |
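The first two rows can often be repaired programmatically. A Python sketch, assuming you know the original encoding and normalization form:

```python
import unicodedata

# UTF-8 bytes misread as Windows-1252: undo by re-encoding as cp1252,
# then decoding as UTF-8 (only safe if that is really what happened)
garbled = "Ã©"
assert garbled.encode("cp1252").decode("utf-8") == "\u00e9"  # é

# NFD text (e + combining acute) folded to the precomposed NFC form
nfd = "e\u0301"
nfc = unicodedata.normalize("NFC", nfd)
assert nfc == "\u00e9" and len(nfc) == 1
```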
Preserving Unicode in VBA
VBA strings are internally UTF-16, so they handle Unicode well. But file I/O defaults to the system code page:
```vba
' Writing a UTF-8 file from VBA with ADODB.Stream
Dim stream As Object
Set stream = CreateObject("ADODB.Stream")
stream.Type = 2                     ' adTypeText
stream.Charset = "utf-8"
stream.Open
stream.WriteText "Unicode text here"
stream.SaveToFile "output.txt", 2   ' adSaveCreateOverWrite
stream.Close
```
ADODB.Stream writes a UTF-8 BOM by default when the charset is "utf-8", which is exactly what Excel expects when reopening the file.
Key Takeaways
- XLSX preserves all Unicode natively (UTF-8 in XML). If you need full Unicode support, avoid CSV where possible.
- When you must use CSV, save as "CSV UTF-8 (Comma delimited)" from Excel, or include a UTF-8 BOM (EF BB BF) when generating CSV from code.
- Use UNICHAR/UNICODE (not CHAR/CODE) for cross-platform Unicode formula work.
- Import CSV via Data > From Text/CSV or Power Query to control encoding, rather than double-clicking the file.
- The UTF-8 BOM is the simplest way to ensure Excel interprets your CSV correctly.