Unicode in Excel
Microsoft Excel is central to business workflows worldwide, and its handling of Unicode has improved dramatically over the years — but significant pitfalls remain, particularly around CSV import/export. This guide explains how Excel handles Unicode in its native XLSX format, the encoding traps with CSV files, and the UNICHAR/UNICODE functions that let you work with code points directly in formulas.
XLSX: Unicode by Default
The modern Excel format (.xlsx) is an Open XML format — a collection of XML files in
a ZIP container. All text is stored as UTF-8 in the XML. This means XLSX natively
supports the full range of Unicode characters.
| Aspect | XLSX Behavior |
|---|---|
| Text encoding | UTF-8 (in XML) |
| Maximum characters per cell | 32,767 |
| Emoji support | Yes (display depends on OS/font) |
| CJK characters | Full support |
| RTL text | Supported (with cell formatting) |
| Supplementary characters (U+10000+) | Stored correctly, display depends on font |
When you save as .xlsx, you never lose Unicode data. The problems arise when you move
data in or out of Excel via CSV, copy-paste, or legacy formats.
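You can check the "ZIP of UTF-8 XML" claim directly. The sketch below builds a minimal stand-in archive (not a valid workbook) containing a `xl/sharedStrings.xml` part — the file where real XLSX workbooks keep cell text — and reads it back as UTF-8:

```python
import io
import zipfile

# A .xlsx is a ZIP container of UTF-8 XML parts. Build a minimal
# stand-in archive with a sharedStrings.xml part and read it back.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("xl/sharedStrings.xml",
               '<?xml version="1.0" encoding="UTF-8"?>'
               '<sst><si><t>München €</t></si></sst>')

with zipfile.ZipFile(buf) as z:
    xml = z.read("xl/sharedStrings.xml").decode("utf-8")

assert "München €" in xml  # non-ASCII text survives the round trip
```

Unzipping a real .xlsx (e.g. with any archive tool) shows the same layout, with many more parts such as `xl/workbook.xml` and `xl/styles.xml`.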
The CSV Encoding Problem
CSV (Comma-Separated Values) is a plain text format with no standard encoding declaration. When you open a CSV file in Excel, Excel must guess the encoding — and it often guesses wrong.
Excel's CSV encoding behavior
| Action | Default Encoding | Result |
|---|---|---|
| Open CSV (double-click) | System locale (e.g., Windows-1252) | Non-ASCII corrupted |
| Save as CSV | System locale (e.g., Windows-1252) | Non-ASCII may be lost |
| Save as "CSV UTF-8" | UTF-8 with BOM | Unicode preserved |
| Import via Data > From Text | User-selected | Correct if UTF-8 chosen |
The BOM (Byte Order Mark)
The UTF-8 BOM is the byte sequence EF BB BF at the start of a file. It serves as a
signal to applications that the file is UTF-8:
| File starts with | Excel interprets as |
|---|---|
| EF BB BF (UTF-8 BOM) | UTF-8 |
| FF FE (UTF-16 LE BOM) | UTF-16 Little Endian |
| FE FF (UTF-16 BE BOM) | UTF-16 Big Endian |
| No BOM | System locale (often Windows-1252 or Shift_JIS) |
Critical rule: If you generate CSV files for Excel consumption, always include the UTF-8 BOM. Without it, Excel will mangle non-ASCII characters.
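The table above can be sketched as a small sniffing helper in Python (`sniff_bom` is a hypothetical name, not an Excel or library API):

```python
# Report which encoding Excel would assume from a file's first bytes.
def sniff_bom(raw: bytes) -> str:
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if raw.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if raw.startswith(b"\xfe\xff"):
        return "utf-16-be"
    return "system locale (no BOM)"

# Python's utf-8-sig codec prepends the BOM; plain utf-8 does not
assert sniff_bom("München".encode("utf-8-sig")) == "utf-8-sig"
assert sniff_bom("München".encode("utf-8")) == "system locale (no BOM)"
```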
Generating UTF-8 CSV with BOM
In Python:
```python
import csv

with open("output.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "City", "Symbol"])
    writer.writerow(["Müller", "München", "€"])
```
The utf-8-sig encoding writes the BOM automatically. The newline="" parameter
prevents double line breaks on Windows.
In other languages:
```javascript
// Node.js: prepend the BOM manually before writing
const fs = require('fs');

const BOM = '\uFEFF';           // serialized as EF BB BF in UTF-8
fs.writeFileSync('output.csv', BOM + csvContent, 'utf8');
```
Opening UTF-8 CSV correctly in Excel
If you receive a UTF-8 CSV without BOM:
- Open Excel (do not open the CSV directly)
- Go to Data > From Text/CSV
- In the import wizard, set File Origin to "65001: Unicode (UTF-8)"
- Preview the data to confirm it looks correct
- Click Load
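If you would rather fix the file once than walk every recipient through the wizard, you can re-save a BOM-less UTF-8 file with a BOM. A sketch (`add_bom` is a hypothetical helper; the work is done by Python's `utf-8-sig` codec, which prepends EF BB BF):

```python
# Re-save a BOM-less UTF-8 file with a UTF-8 BOM so Excel opens it
# correctly on double-click.
def add_bom(src_path: str, dst_path: str) -> None:
    with open(src_path, encoding="utf-8") as src:
        data = src.read()
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(data)
```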
UNICHAR and UNICODE Functions
Excel 2013+ includes functions for working with Unicode code points:
UNICHAR(number)
Returns the character for a given Unicode code point (decimal):
| Formula | Result | Description |
|---|---|---|
| =UNICHAR(65) | A | Latin capital letter A |
| =UNICHAR(169) | © | Copyright sign |
| =UNICHAR(8364) | € | Euro sign (U+20AC) |
| =UNICHAR(9731) | ☃ | Snowman (U+2603) |
| =UNICHAR(128512) | 😀 | Grinning face (U+1F600) |
| =UNICHAR(HEX2DEC("2603")) | ☃ | Convert hex to decimal first |
UNICODE(text)
Returns the decimal code point of the first character in a string:
| Formula | Result | Description |
|---|---|---|
| =UNICODE("A") | 65 | Code point of A |
| =UNICODE("€") | 8364 | Code point of the euro sign |
| =DEC2HEX(UNICODE("€")) | 20AC | Hex code point |
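Python's `chr()` and `ord()` are direct analogues of UNICHAR and UNICODE, which makes them handy for checking formula results outside Excel:

```python
# chr()/ord() mirror UNICHAR/UNICODE, code point for code point
assert chr(65) == "A"                    # =UNICHAR(65)
assert chr(8364) == "€"                  # =UNICHAR(8364)
assert ord("€") == 8364                  # =UNICODE("€")
assert format(ord("€"), "X") == "20AC"   # =DEC2HEX(UNICODE("€"))
assert chr(int("2603", 16)) == "☃"       # =UNICHAR(HEX2DEC("2603"))
```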
CHAR vs UNICHAR
Excel has both CHAR and UNICHAR, but they behave differently:
| Function | Range | Encoding |
|---|---|---|
| CHAR(n) | 1-255 | Windows code page (platform-dependent) |
| UNICHAR(n) | 1-1114111 | Unicode code point (universal) |
CHAR(169) returns the copyright sign on Windows (code page 1252) but may return a
different character on a system with a different active code page. UNICHAR(169) always
returns the copyright sign because it uses the Unicode standard.
Best practice: Always use UNICHAR and UNICODE instead of CHAR and CODE for
reliable cross-platform behavior.
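A quick Python sketch illustrates why code-page-based functions are fragile: the same byte decodes to different characters under different Windows code pages, while a Unicode code point is unambiguous. (Byte 196 is used here rather than 169 because it diverges visibly between common code pages.)

```python
# One byte, two code pages, two different characters
b = bytes([196])
assert b.decode("cp1252") == "Ä"   # Western European: A with diaeresis
assert b.decode("cp1251") == "Д"   # Cyrillic: a completely different letter

# One code point, one character, everywhere
assert chr(196) == "Ä"
```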
Working with Multilingual Data in Excel
Sorting
Excel sorts based on the system locale by default. For multilingual data, this can produce unexpected results:
| Scenario | Issue | Solution |
|---|---|---|
| German umlauts (ä, ö, ü) | May sort after Z | Custom sort order |
| Chinese characters | Sorted by Unicode code point, not pinyin | Sort by a pinyin helper column |
| Mixed scripts | Latin before CJK before emoji | Usually acceptable |
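The umlaut row is easy to reproduce in Python: a plain sort compares code points, so Ä (U+00C4) lands after Z (U+005A), while a locale-aware sort (assuming a German locale is installed on the system) interleaves it with A:

```python
import locale

words = ["Zebra", "Äpfel", "Apfel"]

# Code-point sort: "Ä" (U+00C4) sorts after "Z" (U+005A)
assert sorted(words) == ["Apfel", "Zebra", "Äpfel"]

# Locale-aware sort; typically ['Apfel', 'Äpfel', 'Zebra'] under de_DE
try:
    locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")
    print(sorted(words, key=locale.strxfrm))
except locale.Error:
    pass  # German locale not installed on this system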
Filtering
Excel's text filters work on the stored Unicode text, so filtering for CJK characters
or accented text works correctly in .xlsx files. In CSV files opened with wrong encoding,
filters will match the garbled text, not the intended characters.
Cell formatting for RTL
To display Arabic or Hebrew text correctly:
- Select the cell(s)
- Right-click > Format Cells > Alignment
- Set Text direction to "Right-to-Left"
Excel also supports mixed BiDi text within a single cell.
Legacy Formats
| Format | Encoding | Unicode Support |
|---|---|---|
| .xlsx | UTF-8 (XML) | Full |
| .xls (97-2003) | UTF-16 internally | Good but some limitations |
| .csv | No standard | Depends on how opened |
| .tsv | No standard | Same as CSV |
| .xlsb | Binary | Full (same as XLSX) |
| .ods | UTF-8 (XML) | Full |
The legacy .xls format uses UTF-16 internally and handles Unicode reasonably well,
but it has a row limit of 65,536 and lacks modern features. Always prefer .xlsx.
Power Query and Unicode
Excel's Power Query (Get & Transform) handles encoding better than the basic CSV import:
- Go to Data > Get Data > From File > From Text/CSV
- Power Query automatically detects UTF-8 if a BOM is present
- For BOM-less files, use the File Origin dropdown to specify encoding
- Power Query preserves Unicode through all transformation steps
Power Query is the recommended way to import CSV data when Unicode correctness matters.
Practical Tips
Detecting encoding issues
If your data looks like this after import:
| Garbled | Correct | Encoding mismatch |
|---|---|---|
| Ã© | é | UTF-8 read as Windows-1252 |
| e followed by a stray accent (e + U+0301) | é | NFD form displayed without combining |
| ??? | CJK characters | Non-Unicode encoding or missing font |
| □□□ | Emoji | Font does not support emoji |
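The first two rows can often be repaired programmatically. A Python sketch, assuming you know the original encoding and normalization form:

```python
import unicodedata

# UTF-8 bytes misread as Windows-1252: undo by re-encoding as cp1252,
# then decoding as UTF-8 (only safe if that is really what happened)
garbled = "Ã©"
assert garbled.encode("cp1252").decode("utf-8") == "\u00e9"  # é

# NFD text (e + combining acute) folded to the precomposed NFC form
nfd = "e\u0301"
nfc = unicodedata.normalize("NFC", nfd)
assert nfc == "\u00e9" and len(nfc) == 1
```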
Preserving Unicode in VBA
VBA strings are internally UTF-16, so they handle Unicode well. But file I/O defaults to the system code page:
```vba
' Writing a UTF-8 file from VBA with ADODB.Stream
Dim stream As Object
Set stream = CreateObject("ADODB.Stream")
stream.Type = 2                     ' adTypeText
stream.Charset = "utf-8"
stream.Open
stream.WriteText "Unicode text here"
stream.SaveToFile "output.txt", 2   ' adSaveCreateOverWrite
stream.Close
```
ADODB.Stream writes a UTF-8 BOM by default when the charset is "utf-8", which is exactly what Excel expects when reopening the file.
Key Takeaways
- XLSX preserves all Unicode natively (UTF-8 in XML). If you need full Unicode support, avoid CSV where possible.
- When you must use CSV, save as "CSV UTF-8 (Comma delimited)" from Excel, or include a UTF-8 BOM (EF BB BF) when generating CSV from code.
- Use UNICHAR/UNICODE (not CHAR/CODE) for cross-platform Unicode formula work.
- Import CSV via Data > From Text/CSV or Power Query to control encoding, rather than double-clicking the file.
- The UTF-8 BOM is the simplest way to ensure Excel interprets your CSV correctly.