Unicode in Excel

Microsoft Excel stores text in Unicode but has historically struggled with non-Latin characters in CSV imports, RTL text layout, and font coverage for scripts like Devanagari or Arabic. This guide covers how to handle Unicode correctly in Excel, including the UNICHAR and UNICODE functions, importing CSV with the right encoding, and displaying international text.

Microsoft Excel is central to business workflows worldwide, and its handling of Unicode has improved dramatically over the years — but significant pitfalls remain, particularly around CSV import/export. This guide explains how Excel handles Unicode in its native XLSX format, the encoding traps with CSV files, and the UNICHAR/UNICODE functions that let you work with code points directly in formulas.

XLSX: Unicode by Default

The modern Excel format (.xlsx) is an Open XML format — a collection of XML files in a ZIP container. All text is stored as UTF-8 in the XML. This means XLSX natively supports the full range of Unicode characters.

| Aspect | XLSX Behavior |
| --- | --- |
| Text encoding | UTF-8 (in XML) |
| Maximum characters per cell | 32,767 |
| Emoji support | Yes (display depends on OS/font) |
| CJK characters | Full support |
| RTL text | Supported (with cell formatting) |
| Supplementary characters (U+10000+) | Stored correctly; display depends on font |

When you save as .xlsx, you never lose Unicode data. The problems arise when you move data in or out of Excel via CSV, copy-paste, or legacy formats.
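
If you generate .xlsx files from code, a mainstream library will carry Unicode through untouched. A minimal sketch using the third-party openpyxl package (an assumption here; any XLSX writer behaves the same way):

# Writing Unicode text to .xlsx -- assumes openpyxl is installed (pip install openpyxl)
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "München"           # Latin with umlaut
ws["A2"] = "日本語テキスト"      # CJK
ws["A3"] = "€ ☃ 😀"            # symbols and a supplementary-plane emoji
wb.save("unicode_test.xlsx")   # all of the above survive the round trip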

The CSV Encoding Problem

CSV (Comma-Separated Values) is a plain text format with no standard encoding declaration. When you open a CSV file in Excel, Excel must guess the encoding — and it often guesses wrong.

Excel's CSV encoding behavior

| Action | Default Encoding | Result |
| --- | --- | --- |
| Open CSV (double-click) | System locale (e.g., Windows-1252) | Non-ASCII corrupted |
| Save as CSV | System locale (e.g., Windows-1252) | Non-ASCII may be lost |
| Save as "CSV UTF-8" | UTF-8 with BOM | Unicode preserved |
| Import via Data > From Text | User-selected | Correct if UTF-8 chosen |
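
You can reproduce the double-click failure outside Excel. This sketch encodes text as UTF-8 and then decodes the same bytes as Windows-1252, which is effectively what Excel does when it guesses the system code page:

text = "München"
data = text.encode("utf-8")             # b'M\xc3\xbcnchen'
guessed = data.decode("windows-1252")   # what a locale-guessing reader produces
print(guessed)                          # MÃ¼nchen -- classic mojibake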

The BOM (Byte Order Mark)

The UTF-8 BOM is the byte sequence EF BB BF at the start of a file. It serves as a signal to applications that the file is UTF-8:

| File starts with | Excel interprets as |
| --- | --- |
| EF BB BF (UTF-8 BOM) | UTF-8 |
| FF FE (UTF-16 LE BOM) | UTF-16 Little Endian |
| FE FF (UTF-16 BE BOM) | UTF-16 Big Endian |
| No BOM | System locale (often Windows-1252 or Shift_JIS) |

Critical rule: If you generate CSV files for Excel consumption, always include the UTF-8 BOM. Without it, Excel will mangle non-ASCII characters.
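
If you are unsure what a file starts with, a few bytes are enough to check. A small sketch (the function name sniff_bom is ours, not a library call):

def sniff_bom(path):
    """Best-guess encoding from the file's leading bytes; None means no BOM."""
    with open(path, "rb") as f:
        head = f.read(3)
    if head.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if head.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if head.startswith(b"\xfe\xff"):
        return "utf-16-be"
    return None  # no BOM: Excel falls back to the system locale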

Generating UTF-8 CSV with BOM

In Python:

import csv

with open("output.csv", "w", encoding="utf-8-sig", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Name", "City", "Symbol"])
    writer.writerow(["Muller", "Munchen", "EUR sign is here"])

The utf-8-sig encoding writes the BOM automatically. The newline="" parameter prevents the csv module's line endings from being doubled into blank rows on Windows.

In other languages:

// Node.js
const fs = require('fs');
const BOM = '\uFEFF';
fs.writeFileSync('output.csv', BOM + csvContent, 'utf8');

Opening UTF-8 CSV correctly in Excel

If you receive a UTF-8 CSV without BOM:

  1. Open Excel (do not open the CSV directly)
  2. Go to Data > From Text/CSV
  3. In the import wizard, set File Origin to "65001: Unicode (UTF-8)"
  4. Preview the data to confirm it looks correct
  5. Click Load

UNICHAR and UNICODE Functions

Excel 2013+ includes functions for working with Unicode code points:

UNICHAR(number)

Returns the character for a given Unicode code point (decimal):

| Formula | Result | Description |
| --- | --- | --- |
| =UNICHAR(65) | A | Latin capital letter A |
| =UNICHAR(169) | © | Copyright sign |
| =UNICHAR(8364) | € | Euro sign (U+20AC) |
| =UNICHAR(9731) | ☃ | Snowman (U+2603) |
| =UNICHAR(128512) | 😀 | Grinning face (U+1F600) |
| =UNICHAR(HEX2DEC("2603")) | ☃ | Convert hex to decimal first |

UNICODE(text)

Returns the decimal code point of the first character in a string:

| Formula | Result | Description |
| --- | --- | --- |
| =UNICODE("A") | 65 | Code point of A |
| =UNICODE("€") | 8364 | Code point of the euro sign |
| =DEC2HEX(UNICODE("€")) | 20AC | Hex code point |
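
If you prepare or verify data outside Excel, Python's built-in chr() and ord() are direct counterparts of UNICHAR and UNICODE:

print(chr(8364))       # €      -- like =UNICHAR(8364)
print(ord("€"))        # 8364   -- like =UNICODE("€")
print(hex(ord("€")))   # 0x20ac -- like =DEC2HEX(UNICODE("€"))
print(chr(0x2603))     # ☃      -- hex literal instead of HEX2DEC("2603")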

CHAR vs UNICHAR

Excel has both CHAR and UNICHAR, but they behave differently:

| Function | Range | Encoding |
| --- | --- | --- |
| CHAR(n) | 1-255 | Windows code page (platform-dependent) |
| UNICHAR(n) | 1-1114111 | Unicode code point (universal) |

CHAR(169) returns the copyright sign on Windows (code page 1252) but may return a different character on a system with a different active code page. UNICHAR(169) always returns the copyright sign because it uses the Unicode standard.

Best practice: Always use UNICHAR and UNICODE instead of CHAR and CODE for reliable cross-platform behavior.

Working with Multilingual Data in Excel

Sorting

Excel sorts based on the system locale by default. For multilingual data, this can produce unexpected results:

| Scenario | Issue | Solution |
| --- | --- | --- |
| German umlauts (ä, ö, ü) | May sort after Z | Custom sort order (see the sketch below) |
| Chinese characters | Sorted by Unicode code point, not pinyin | Sort Options with phonetic sorting (Chinese-language versions of Excel) |
| Mixed scripts | Latin before CJK before emoji | Usually acceptable |
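
One workaround is to sort the data before it reaches Excel. A sketch using Python's locale module; the locale name "de_DE.UTF-8" is a typical Linux/macOS identifier and varies by platform:

import locale

names = ["Zürich", "Ärger", "Apfel", "Öl"]
print(sorted(names))  # code-point order: ['Apfel', 'Zürich', 'Ärger', 'Öl']

# German collation puts umlauts next to their base letters
locale.setlocale(locale.LC_COLLATE, "de_DE.UTF-8")  # platform-dependent name
print(sorted(names, key=locale.strxfrm))  # ['Apfel', 'Ärger', 'Öl', 'Zürich']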

Filtering

Excel's text filters work on the stored Unicode text, so filtering for CJK characters or accented text works correctly in .xlsx files. In CSV files opened with wrong encoding, filters will match the garbled text, not the intended characters.

Cell formatting for RTL

To display Arabic or Hebrew text correctly:

  1. Select the cell(s)
  2. Right-click > Format Cells > Alignment
  3. Set Text direction to "Right-to-Left"

Excel also supports mixed BiDi text within a single cell.

Legacy Formats

| Format | Encoding | Unicode Support |
| --- | --- | --- |
| .xlsx | UTF-8 (XML) | Full |
| .xls (97-2003) | UTF-16 internally | Good, with some limitations |
| .csv | No standard | Depends on how the file is opened |
| .tsv | No standard | Same as CSV |
| .xlsb | Binary | Full (same as XLSX) |
| .ods | UTF-8 (XML) | Full |

The legacy .xls format uses UTF-16 internally and handles Unicode reasonably well, but it has a row limit of 65,536 and lacks modern features. Always prefer .xlsx.

Power Query and Unicode

Excel's Power Query (Get & Transform) handles encoding better than the basic CSV import:

  1. Go to Data > Get Data > From File > From Text/CSV
  2. Power Query automatically detects UTF-8 if a BOM is present
  3. For BOM-less files, use the File Origin dropdown to specify encoding
  4. Power Query preserves Unicode through all transformation steps

Power Query is the recommended way to import CSV data when Unicode correctness matters.

Practical Tips

Detecting encoding issues

If your data looks like this after import:

| Garbled | Correct | Likely cause |
| --- | --- | --- |
| Ã© | é | UTF-8 read as Windows-1252 (often reversible; see the sketch below) |
| e + U+0301 (decomposed) | é | NFD form displayed without combining the accent |
| ??? | CJK characters | Non-Unicode encoding or missing font |
| □□□ | Emoji | Font does not support emoji |
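
The first pattern (UTF-8 read as Windows-1252) can usually be repaired by reversing the round trip, provided no bytes were dropped along the way; a sketch:

garbled = "MÃ¼nchen"  # UTF-8 bytes that were mis-decoded as Windows-1252
repaired = garbled.encode("windows-1252").decode("utf-8")
print(repaired)       # München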

Preserving Unicode in VBA

VBA strings are internally UTF-16, so they handle Unicode well. But file I/O defaults to the system code page:

' Writing UTF-8 file from VBA
Dim stream As Object
Set stream = CreateObject("ADODB.Stream")
stream.Type = 2  ' Text
stream.Charset = "utf-8"
stream.Open
stream.WriteText "Unicode text here"
stream.SaveToFile "output.txt", 2  ' Overwrite
stream.Close
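
Note that ADODB.Stream writes a UTF-8 BOM when the charset is "utf-8"; for output destined for Excel, that is exactly the behavior you want.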

Key Takeaways

  • XLSX preserves all Unicode natively (UTF-8 in XML). If you need full Unicode support, avoid CSV where possible.
  • When you must use CSV, save as "CSV UTF-8 (Comma delimited)" from Excel, or include a UTF-8 BOM (EF BB BF) when generating CSV from code.
  • Use UNICHAR/UNICODE (not CHAR/CODE) for cross-platform Unicode formula work.
  • Import CSV via Data > From Text/CSV or Power Query to control encoding, rather than double-clicking the file.
  • The UTF-8 BOM is the simplest way to ensure Excel interprets your CSV correctly.
