How New Characters Get Added to Unicode
Adding a new character to Unicode requires submitting a detailed proposal to the Unicode Consortium that demonstrates the character's need, usage, and distinctness from existing characters. This guide walks through the Unicode proposal process step by step, from initial submission to final approval and inclusion in a Unicode release.
Unicode does not grow by accident. Every one of the 154,998 characters in Unicode 16.0 was added through a formal process that can take years from initial idea to published standard. Understanding this process reveals both the rigor behind the standard and the practical barriers that keep some scripts and characters waiting for decades.
Who Can Submit a Proposal?
Anyone can submit a character proposal to the Unicode Technical Committee (UTC); you do not need to be a member of the Consortium. Academic researchers, language community representatives, governments, independent scholars, and employees of member companies have all successfully submitted proposals.
However, writing a successful proposal requires significant technical knowledge. The UTC expects proposals to follow detailed documentation standards, and weak proposals are typically returned with feedback rather than approved. Many successful proposals are submitted by specialists in linguistics, computer science, or both.
The Script Encoding Initiative (SEI) at UC Berkeley actively assists communities in preparing proposals for minority and historical scripts. SEI has been particularly important for scripts that lack advocates within major technology companies.
What Unicode Encodes (and What It Does Not)
A foundational principle of Unicode is that it encodes characters, not glyphs. A character is an abstract unit of meaning — the letter "A," the currency symbol "$," the Devanagari vowel "अ." A glyph is a specific visual rendering of a character, dependent on font, style, and rendering context.
This distinction has practical consequences. Unicode does not encode:
- Different fonts or typefaces of the same character
- Stylistic variations that do not change meaning (italic, bold)
- Stylistic alternates of characters that are already encoded elsewhere
The most famous application of this principle is Han Unification: Chinese, Japanese, and Korean ideographs that have the same origin and meaning are encoded as a single character, even if they are written somewhat differently in each tradition. This decision remains controversial (see the CJK Unification article), but it reflects the character-not-glyph principle.
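The character side of this distinction is visible in any Unicode-aware runtime. The following is a minimal sketch using Python's standard unicodedata module: each character has a code point and a formal name, and neither depends on the font used to draw it. The helper name describe is an illustrative assumption, not part of any Unicode tooling.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character, independent of any font."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The three example characters mentioned above.
for ch in ["A", "$", "अ"]:
    print(describe(ch))
```

However a font chooses to draw these characters (serif, sans-serif, bold, italic), the code point and name stay the same.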
The Formal Proposal Document
A complete character proposal must include:
- Character names: Formal names following Unicode naming conventions (uppercase Latin letters, digits, spaces, and hyphens; no abbreviations)
- Character properties: Script, category (letter, digit, punctuation, etc.), bidi class, combining class
- Code point range: A suggested range within the Unicode code space
- Representativeness: Evidence that the characters are actually used in documents, books, software, or communities
- Cross-references: Comparison with similar characters already encoded, explaining why new encoding is needed rather than using existing characters
- Font samples: At least one font rendering of each proposed character
- Collation and sorting: How the characters interact with existing sorting algorithms
- Keyboard input: Evidence or proposals for how users would type these characters
The requirement for font samples is significant. Unicode does not develop fonts, but a proposal without a font suggests that no implementation infrastructure exists yet, which can slow approval.
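To make the name and property requirements concrete, here is a small sketch that inspects the properties of characters that are already encoded, using Python's standard unicodedata module. The name check is a deliberately simplified assumption (uppercase Latin letters, digits, spaces, and hyphens only) and does not cover every rule in the actual naming conventions. The helper names looks_like_valid_name and properties are hypothetical.

```python
import re
import unicodedata

# Simplified assumption: names use only A-Z, 0-9, spaces, and hyphens,
# and begin and end with a letter or digit. The real rules are stricter.
NAME_PATTERN = re.compile(r"[A-Z0-9][A-Z0-9 -]*[A-Z0-9]")

def looks_like_valid_name(name: str) -> bool:
    """Rough check that a proposed name uses the permitted characters."""
    return NAME_PATTERN.fullmatch(name) is not None

def properties(ch: str) -> dict:
    """Collect the kinds of properties a proposal has to specify."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),         # general category
        "bidi_class": unicodedata.bidirectional(ch),   # bidirectional class
        "combining_class": unicodedata.combining(ch),  # canonical combining class
    }

print(properties("A"))
print(properties("\u0301"))  # COMBINING ACUTE ACCENT
print(looks_like_valid_name("LATIN CAPITAL LETTER A"))
print(looks_like_valid_name("Latin capital letter a"))
```

Looking up the properties of similar, already encoded characters in this way is one practical starting point for the cross-reference part of a proposal.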
The Review Cycle
Proposals are submitted to the UTC mailing list and tracked in the Pipeline — a public document listing all characters under consideration, along with their status (Under Consideration, Accepted in Principle, In Progress, etc.).
The UTC reviews proposals at its quarterly meetings. A proposal may go through several rounds:
- Initial review: The UTC reads the proposal and may return it with questions or requests for more information.
- Accepted in Principle: The UTC agrees the characters should be encoded but may request revisions to names, code points, or properties.
- Accepted: Final approval. The characters are scheduled for a specific Unicode version.
- Publication: The new version of the standard is published, typically once per year.
Between meetings, technical working groups and individual committee members correspond with proposal authors via email. This informal feedback loop often determines whether a proposal succeeds or fails.
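The stages above can be pictured as a simple state machine. This is only an illustrative model of the rounds described here, not how the UTC actually tracks proposals: the Stage names mirror the list above, and the transition table (including the self-loops that stand for "returned for revision") is an assumption.

```python
from enum import Enum

class Stage(Enum):
    INITIAL_REVIEW = "Initial review"
    ACCEPTED_IN_PRINCIPLE = "Accepted in Principle"
    ACCEPTED = "Accepted"
    PUBLISHED = "Publication"

# Assumed transitions. A stage that maps to itself models a proposal
# being sent back to its authors for questions or revisions.
ALLOWED = {
    Stage.INITIAL_REVIEW: {Stage.INITIAL_REVIEW, Stage.ACCEPTED_IN_PRINCIPLE},
    Stage.ACCEPTED_IN_PRINCIPLE: {Stage.ACCEPTED_IN_PRINCIPLE, Stage.ACCEPTED},
    Stage.ACCEPTED: {Stage.PUBLISHED},
    Stage.PUBLISHED: set(),
}

def advance(current: Stage, target: Stage) -> Stage:
    """Move a proposal to the target stage, or raise if the step is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target

stage = Stage.INITIAL_REVIEW
stage = advance(stage, Stage.INITIAL_REVIEW)         # returned with questions
stage = advance(stage, Stage.ACCEPTED_IN_PRINCIPLE)
stage = advance(stage, Stage.ACCEPTED)
stage = advance(stage, Stage.PUBLISHED)
print(stage.value)
```

The model makes one point explicit: no stage can be skipped, so every round of revision adds at least one more meeting cycle.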
Timelines: Measured in Years
The UTC is candid that encoding new scripts takes time — often many years. The reasons are practical:
- Research: For historical scripts, scholarly consensus on decipherment may still be forming. Unicode will not encode characters with disputed meanings.
- Stability requirement: Once a character is assigned a code point, that assignment is permanent. Mistakes are very difficult to correct. The UTC is therefore conservative.
- Capacity: The UTC processes dozens of proposals simultaneously. Complex scripts (with many characters, complex joining behavior, or bidirectional properties) require more review time.
- Font development: The requirement for reference fonts creates a dependency on type designers, who may not be immediately available.
Simple additions (a few characters to an existing script, a new symbol or emoji) can be processed in one or two meeting cycles — roughly 6 to 18 months. A new script encoding, especially for a living language, typically takes 3 to 7 years from first proposal to publication. Historical scripts with scholarly uncertainty can take a decade or more.
From Proposal to Publication
Once a character is accepted, it is assigned to a specific Unicode version under development. The Unicode Consortium publishes one major version per year; recent versions, including Unicode 16.0, have been released in September. The development cycle for each version includes:
- An alpha period with a proposed character list
- A public review period (typically 90 days) where anyone can comment on proposed additions and changes
- Beta testing of data files
- Final publication
Even after publication, operating system vendors need time to implement the new characters in their fonts and rendering engines. A character published in Unicode 16.0 may not be visible to end users until their OS ships a font update, which can take months to years.
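The same lag applies to programming language runtimes, which bundle their own copy of the Unicode Character Database. As a rough sketch, Python's standard unicodedata module reports which Unicode version it was built with, and the general category Cn marks code points that version does not assign. The helper name is_assigned is an illustrative assumption.

```python
import unicodedata

def is_assigned(ch: str) -> bool:
    """True if this runtime's Unicode data assigns the code point.

    General category 'Cn' covers unassigned code points (and noncharacters)
    in the bundled data. This says nothing about whether a font can draw it.
    """
    return unicodedata.category(ch) != "Cn"

print("Unicode data version:", unicodedata.unidata_version)
print(is_assigned("A"))        # assigned since Unicode 1.0
print(is_assigned("\u0378"))   # a reserved gap in the Greek block
```

A character added in a newer Unicode version than the one reported here will look unassigned to this runtime, even though it is part of the published standard.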
Why Proposals Are Rejected
The UTC rejects or defers proposals for several reasons:
- Duplicate encoding: A character with the same meaning already exists at another code point.
- Not a character: Purely stylistic variants, logo elements, or private-use symbols.
- Insufficient evidence of use: Characters invented for a proposal without demonstrated real-world deployment.
- Unstable script: Scripts still under active scholarly revision.
- Glyph variation: Differences that belong in font design, not character encoding.
Understanding these criteria helps explain why some communities feel their scripts are excluded — the barrier of "demonstrated use" can disadvantage scripts used primarily in oral or non-digital contexts.
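The duplicate-encoding and glyph-variation criteria are easier to appreciate by looking at duplicates that already exist in the standard, mostly for compatibility with older encodings. The sketch below uses Unicode normalization from Python's standard unicodedata module to show such characters folding back to ones encoded elsewhere. Normalization is used here only as an illustration and is not a description of how the UTC evaluates proposals. The helper name compatibility_form is hypothetical.

```python
import unicodedata
from typing import Optional

def compatibility_form(ch: str) -> Optional[str]:
    """Return the NFKC form of ch if it differs from ch, else None."""
    normalized = unicodedata.normalize("NFKC", ch)
    return normalized if normalized != ch else None

samples = {
    "\u212B": "ANGSTROM SIGN",
    "\uFB01": "LATIN SMALL LIGATURE FI",
    "\U0001D400": "MATHEMATICAL BOLD CAPITAL A",
    "A": "LATIN CAPITAL LETTER A",
}

for ch, label in samples.items():
    print(f"{label}: {compatibility_form(ch)!r}")
```

Every such duplicate forces software to normalize text before comparing or searching it, which is one reason new proposals that repeat an existing character are turned away.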