Platform Guides
Unicode usage in specific platforms and applications
10 guides in this series
Microsoft Word supports the full Unicode character set and provides several methods for inserting special characters, including Alt+X code point entry, the Symbol dialog, and autocorrect substitutions. This guide covers how to insert, search, and troubleshoot Unicode characters in Microsoft Word documents.
Google Docs and Sheets use UTF-8 internally and provide a Special Characters panel for inserting Unicode symbols by drawing, searching by name, or browsing by category. This guide explains how to insert and work with Unicode characters in Google Docs and Sheets, including formula functions for character conversion.
Modern terminals support Unicode and UTF-8, but correctly displaying all Unicode characters requires a compatible terminal emulator, the right locale settings, and a font with adequate Unicode coverage. This guide covers how to configure your terminal for Unicode, insert special characters from the command line, and debug character display issues.
PDF supports Unicode text through embedded fonts and ToUnicode maps, but in many PDFs created from scans or older tools, copying and pasting text yields garbled output or missing characters. This guide explains how Unicode is stored in PDF files, how to diagnose text extraction problems, and best practices for creating accessible Unicode PDFs.
Microsoft Excel stores text in Unicode but has historically struggled with non-Latin characters in CSV imports, RTL text layout, and font coverage for scripts like Devanagari or Arabic. This guide covers how to handle Unicode correctly in Excel, including the CHAR and UNICODE functions, importing CSV with the right encoding, and displaying international text.
Social media platforms handle Unicode text with varying degrees of support, affecting how emoji, RTL text, special characters, and invisible formatting appear in posts, bios, and usernames. This guide explains how Twitter, Instagram, TikTok, and LinkedIn handle Unicode, and how to use special characters effectively across social platforms.
Both XML and JSON are defined to use Unicode text, but each has its own rules for encoding characters, escaping special code points, and declaring the document encoding. This guide explains Unicode in XML (including the XML declaration and character references) and JSON (including \uXXXX escape sequences and surrogate pair handling).
Natural language processing and data science pipelines frequently encounter Unicode issues including encoding errors, normalization mismatches, invisible characters, and language detection challenges. This guide addresses Unicode challenges specific to data science and NLP, covering pandas, text preprocessing, tokenization, and multilingual datasets.
QR codes can encode Unicode text using UTF-8, but many QR code generators and scanners default to ISO 8859-1, causing non-Latin characters to appear garbled when scanned. This guide explains how QR codes handle Unicode, how to generate QR codes with correct Unicode encoding, and how to verify that your QR code encodes non-ASCII text properly.
Allowing Unicode characters in passwords increases the keyspace and can improve security, but it also introduces normalization ambiguity, where the same visible password maps to different byte sequences. This guide explores the security and usability implications of Unicode passwords, covering normalization, SASLprep, and how major platforms handle Unicode credentials.
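The normalization ambiguity mentioned in the Unicode passwords summary can be illustrated with a minimal Python sketch. It uses only the standard library's `unicodedata` module. The helper name `normalize_password` is hypothetical, and NFKC is chosen here because SASLprep builds on it. This is not a full SASLprep implementation, which also maps and prohibits certain characters.

```python
import unicodedata


def normalize_password(password: str, form: str = "NFKC") -> bytes:
    """Hypothetical helper: normalize a password, then encode it as UTF-8.

    This applies only Unicode normalization; it does not perform the
    mapping and prohibited-character checks that full SASLprep requires.
    """
    return unicodedata.normalize(form, password).encode("utf-8")


# Two spellings of the same visible string "café":
composed = "caf\u00e9"     # U+00E9, a single precomposed code point
decomposed = "cafe\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Without normalization the byte sequences differ, so their hashes would too.
raw_match = composed.encode("utf-8") == decomposed.encode("utf-8")

# After normalization both spellings yield the same bytes.
normalized_match = normalize_password(composed) == normalize_password(decomposed)

print(raw_match, normalized_match)
```

Normalizing before hashing, both when a password is set and when it is checked, is one way to keep the two spellings from being treated as different credentials.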