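ISO 10646 / Universal Coded Character Set
An international standard (ISO/IEC 10646) kept in sync with Unicode. It defines the same character repertoire and code points, but does not include Unicode's additional algorithms and properties.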
What is ISO/IEC 10646?
ISO/IEC 10646 is the international standard that defines the Universal Coded Character Set (UCS) — a character repertoire and encoding architecture developed jointly by ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). It is, in practical terms, the same character set as Unicode.
The two standards are maintained in close synchronization by their respective organizations: the Unicode Consortium and ISO/IEC JTC 1/SC 2/WG 2. Every character assigned a code point in Unicode has the same code point in ISO 10646, and vice versa. The character names and code point values are identical.
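As a small illustration of the shared code points and names, the sketch below looks up a few characters with Python's standard `unicodedata` module. The module is built from the Unicode Character Database; treating its output as also valid for ISO/IEC 10646 relies on the synchronization described above. The helper name `describe` is made up for this example.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point label, character name) for a single character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)


# A Latin letter, an accented letter, a Han ideograph, and an emoji.
for ch in ("\u0041", "\u00E9", "\u6F22", "\U0001F600"):
    print(*describe(ch))
```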
History: Parallel Origins
During the 1980s, two independent efforts to build a universal character set got under way:
- Unicode: Led by Xerox and Apple engineers, later formalized as the Unicode Consortium (1991)
- ISO/IEC 10646: ISO's Working Group 2 (WG2) began work on a universal character set in 1984
Both projects recognized that two incompatible universal character sets would be unworkable, and in 1991 they agreed to merge their character repertoires. Unicode 1.1 and ISO/IEC 10646-1:1993 were aligned at the code point level, and the two organizations have maintained synchronization since.
How They Differ
Despite sharing the same character repertoire, the two standards differ in scope:
| Aspect | Unicode Standard | ISO/IEC 10646 |
|---|---|---|
| Character repertoire | Identical | Identical |
| Character names | Identical | Identical |
| Encoding forms | UTF-8, UTF-16, UTF-32 defined | UTF-8, UTF-16, UTF-32 (UCS-4) defined; UCS-2 historical |
| Character properties | Extensive (UCD) | Minimal |
| Algorithms | Bidi, collation, normalization | Not included |
| Emoji specifications | Detailed | Not included |
| Locale data (CLDR) | Via Consortium | Not included |
In practice, ISO 10646 defines "what" (the characters and their code points); the Unicode Standard defines "what and how" (characters plus their properties and processing algorithms). A system claiming ISO 10646 conformance is compatible with Unicode at the character level but may not support Unicode-specific features like bidirectional text rendering.
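The "what" versus "what and how" split can be sketched with Python's standard `unicodedata` module. In this sketch the code point and name stand for the part both standards share, while the general category, bidirectional class, and normalization come from the Unicode side. The function name `unicode_view` and the dictionary keys are made up for this example.

```python
import unicodedata


def unicode_view(ch: str) -> dict:
    """Collect shared identity data plus Unicode-side property data."""
    return {
        # Shared by Unicode and ISO/IEC 10646: code point and name.
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # Unicode Character Database properties.
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
    }


# U+05D0 is a right-to-left letter, so its bidi class matters for display.
print(unicode_view("\u05D0"))

# Normalization is a Unicode algorithm: "e" + combining acute composes
# to the single code point U+00E9 under NFC.
composed = unicodedata.normalize("NFC", "e\u0301")
print(len(composed), f"U+{ord(composed):04X}")
```

A system that only knows the character repertoire could store and transmit these characters, but it would need the Unicode property data to order the Hebrew letter correctly in mixed-direction text or to treat the two spellings of "é" as equivalent.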
UCS Encoding Forms
ISO 10646 introduced the terminology UCS (Universal Coded Character Set) and originally defined:
- UCS-2: Fixed 2-byte encoding, BMP-only (no supplementary characters)
- UCS-4: Fixed 4-byte encoding (functionally identical to UTF-32 now that the code space is limited to U+0000 through U+10FFFF)
UTF-8 and UTF-16 were later incorporated into 10646 as additional encoding forms. UCS-2 is now considered obsolete; UTF-16 supersedes it by adding surrogate pair support for supplementary characters.
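To make the difference between the encoding forms concrete, the following sketch counts how many bytes one character occupies in each form, using Python's built-in codecs. The helper name `encoded_sizes` is made up for this example, and the big-endian codec variants are chosen only so that no byte order mark is added to the count.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the encoded byte length of one character in each form."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


# ASCII letter, BMP ideograph, and a supplementary-plane emoji.
for ch in ("A", "\u6F22", "\U0001F600"):
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

UTF-32 (UCS-4) always uses 4 bytes. UTF-16 uses 2 bytes inside the BMP and 4 bytes (a surrogate pair) above it, which is the case that fixed-width UCS-2 cannot express.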
Why Both Standards Exist
Both standards exist because of different institutional ecosystems:
- Government procurement: Many national governments require ISO standards for technology purchasing. Having ISO 10646 alignment means Unicode-based software meets ISO compliance requirements.
- Telecommunications: ITU (International Telecommunication Union) references ISO 10646 in protocols like ASN.1 and X.400.
- Industrial standards: Many domain-specific standards (healthcare HL7, automotive AUTOSAR) reference ISO 10646.
For a software developer, the distinction is largely irrelevant — implementing Unicode is implementing ISO 10646, and vice versa.
Common Misconceptions
"ISO 10646 and Unicode are different character sets" — They are the same character set, maintained in sync. Differences are in the supplemental specifications only.
"UCS-2 is the same as UTF-16": UCS-2 is BMP-only and has no surrogate support. UTF-16 extends UCS-2 with surrogate pairs. Legacy systems that support only UCS-2 cannot represent characters above U+FFFF, which includes most emoji.
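The surrogate mechanism that separates UTF-16 from UCS-2 is simple arithmetic, sketched below. The function names `fits_in_ucs2` and `to_surrogate_pair` are made up for this example. The constants are the standard UTF-16 surrogate ranges (high surrogates start at U+D800, low surrogates at U+DC00).

```python
from typing import Tuple


def fits_in_ucs2(cp: int) -> bool:
    """True if the code point is a BMP character a UCS-2 unit can hold."""
    return 0 <= cp <= 0xFFFF and not (0xD800 <= cp <= 0xDFFF)


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


cp = 0x1F600  # GRINNING FACE
print(fits_in_ucs2(cp))
print([f"{unit:04X}" for unit in to_surrogate_pair(cp)])

# Cross-check against Python's own UTF-16 encoder.
print(chr(cp).encode("utf-16-be").hex())
```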
Quick Facts
| Property | Value |
|---|---|
| Full name | ISO/IEC 10646 |
| Also known as | UCS, Universal Coded Character Set |
| Maintained by | ISO/IEC JTC 1/SC 2/WG 2 |
| First edition | ISO/IEC 10646-1:1993 |
| Current edition | ISO/IEC 10646:2020 (regularly amended) |
| Character repertoire | Identical to Unicode |
| Encoding forms defined | UTF-8, UTF-16, UTF-32 (UCS-4) |
| Synchronization with Unicode | Maintained by both organizations |
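Related Terms
Other terms in Unicode Standard
Chinese, Japanese, and Korean: the collective term for the unified Han ideograph blocks and related scripts in Unicode. CJK Unified Ideographs contain more than 20,992 characters.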
The process of mapping Chinese, Japanese, and Korean ideographs that share a …
The individual consonant and vowel components (jamo) of the Korean Hangul writing …
A universal character encoding standard that assigns a unique number (code point) to every character of every writing system. Version 16.0 contains 154,998 assigned characters.
Normative or informative documents that are integral parts of the Unicode Standard. …
Informational documents published by the Unicode Consortium covering specific topics like security …
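The nonprofit organization that develops and maintains the Unicode Standard. Its members include many companies, such as Apple, Google, Microsoft, and Meta.
All code points except the surrogate code points (U+D800 to U+DFFF). This is the set of valid values that can represent actual characters, 1,112,064 in total.
A major release of the Unicode Standard that adds new characters, scripts, and features. Unicode 16.0, for example, was released in September 2024.
A policy guaranteeing that a character's code point and name never change once assigned. Properties may be revised, but assignments are permanent.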