Unicode Standard

ISO 10646 / Universal Character Set

The international standard (ISO/IEC 10646), synchronized with Unicode, that defines the same character set and code points but omits Unicode's additional algorithms and properties.

2021-07-21 · Updated 2024-09-02

What is ISO/IEC 10646?

ISO/IEC 10646 is the international standard that defines the Universal Coded Character Set (UCS) — a character repertoire and encoding architecture developed jointly by ISO (International Organization for Standardization) and IEC (International Electrotechnical Commission). It is, in practical terms, the same character set as Unicode.

The two standards are maintained in close synchronization by their respective organizations: the Unicode Consortium and ISO/IEC JTC 1/SC 2/WG 2. Every character assigned a code point in Unicode has the same code point in ISO 10646, and vice versa. The character names and code point values are identical.

History: Parallel Origins

During the 1980s, two independent efforts were under way:

Unicode: Led by Xerox and Apple engineers, later formalized as the Unicode Consortium (1991)
ISO/IEC 10646: ISO's Working Group 2 (WG2) began work on a universal character set in 1984

Both projects recognized that two incompatible "universal" character sets would defeat the purpose, and in 1991 they agreed to merge their character repertoires. Unicode 1.1 and ISO/IEC 10646-1:1993 were aligned at the code point level, and the two organizations have kept them synchronized ever since.

How They Differ

Despite sharing the same character repertoire, the two standards differ in scope:

Aspect	Unicode Standard	ISO/IEC 10646
Character repertoire	Identical	Identical
Character names	Identical	Identical
Encoding forms	UTF-8, UTF-16, UTF-32	UTF-8, UTF-16, UTF-32; historically also UCS-2 and UCS-4
Character properties	Extensive (UCD)	Minimal
Algorithms	Bidi, collation, normalization	Not included
Emoji specifications	Detailed	Not included
Locale data (CLDR)	Via Consortium	Not included

In practice, ISO 10646 defines "what" (the characters and their code points); the Unicode Standard defines "what and how" (characters plus their properties and processing algorithms). A system claiming ISO 10646 conformance is compatible with Unicode at the character level but may not support Unicode-specific features like bidirectional text rendering.
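Python's standard-library unicodedata module exposes data from the Unicode Character Database, so it can illustrate this split. In the sketch below, the code point and character name are repertoire-level facts shared by both standards, while the general category, bidirectional class, and normalization come from Unicode-specific specifications. The describe helper is a hypothetical name chosen for illustration, and the module's data tracks whichever UCD version ships with the Python interpreter in use.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Hypothetical helper: collect repertoire-level facts (shared with
    ISO 10646) alongside Unicode-only property data for one character."""
    return {
        # Repertoire level: identical in Unicode and ISO 10646
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        # Unicode Character Database properties, not specified by ISO 10646
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
    }

for ch in ["A", "\u05D0", "\u00E9"]:
    print(describe(ch))

# Normalization is a Unicode algorithm (UAX #15): NFD splits U+00E9 into e + combining acute
decomposed = unicodedata.normalize("NFD", "\u00E9")
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0065', 'U+0301']
```

The Hebrew letter alef reports bidi class R (right-to-left), which is exactly the kind of information a bidirectional rendering engine needs and which ISO 10646 alone does not supply.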

UCS Encoding Forms

ISO 10646 introduced the terminology UCS (Universal Coded Character Set) and originally defined:

UCS-2: Fixed 2-byte encoding, BMP-only (no supplementary characters)
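UCS-4: Fixed 4-byte encoding, originally covering a 31-bit code space; since the code space was restricted to U+0000–U+10FFFF it is equivalent to UTF-32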

UTF-8 and UTF-16 were later incorporated into ISO 10646 as additional encoding forms. UCS-2 is now considered obsolete: UTF-16 supersedes it by using surrogate pairs to reach supplementary characters.
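The surrogate mechanism that separates UTF-16 from UCS-2 is plain arithmetic: subtract 0x10000 from a supplementary code point, then split the remaining 20 bits into two 10-bit halves offset by 0xD800 and 0xDC00. The Python sketch below assumes hypothetical helper names (utf16_code_units, ucs2_code_unit) and cross-checks the result against Python's built-in utf-16-be codec.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as a list of 16-bit UTF-16 code units."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#X}")
    if cp <= 0xFFFF:
        # BMP: a single code unit, the same value UCS-2 would store
        return [cp]
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return [high, low]

def ucs2_code_unit(cp: int) -> int:
    """UCS-2 has no surrogate mechanism, so only BMP values fit."""
    if not (0 <= cp <= 0xFFFF):
        raise ValueError(f"U+{cp:X} is outside the BMP; UCS-2 cannot encode it")
    return cp

grin = 0x1F600  # GRINNING FACE
print([f"{u:04X}" for u in utf16_code_units(grin)])  # ['D83D', 'DE00']
# Cross-check against Python's UTF-16 codec (big-endian, no BOM)
print(chr(grin).encode("utf-16-be").hex())  # d83dde00
```

Calling ucs2_code_unit(0x1F600) raises ValueError, which mirrors how a UCS-2-only system has no way to store the character at all.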

Why Both Standards Exist

Both standards exist because of different institutional ecosystems:

Government procurement: Many national governments require ISO standards for technology purchasing. Having ISO 10646 alignment means Unicode-based software meets ISO compliance requirements.
Telecommunications: ITU (International Telecommunication Union) references ISO 10646 in specifications such as ASN.1, whose BMPString and UniversalString types are defined in terms of it, and X.400.
Industrial standards: Many domain-specific standards (healthcare HL7, automotive AUTOSAR) reference ISO 10646.

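For a software developer, the distinction is largely irrelevant: implementing Unicode includes implementing the ISO 10646 repertoire. The converse does not hold, since conformance to ISO 10646 alone does not require Unicode's properties or algorithms.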

Common Misconceptions

"ISO 10646 and Unicode are different character sets." They are the same character set, maintained in sync. The differences lie only in the supplementary specifications.

"UCS-2 is the same as UTF-16." UCS-2 is BMP-only, with no surrogate support; UTF-16 extends it with surrogate pairs. Legacy systems limited to UCS-2 cannot represent characters above U+FFFF, which include most emoji and the CJK Extension B ideographs.
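Not every emoji is out of reach for UCS-2: some, such as U+2764 HEAVY BLACK HEART, sit in the BMP. A quick way to see what a UCS-2-only system would reject is to scan for code points above U+FFFF. The sketch below is illustrative; non_bmp_characters is a hypothetical helper, and it relies on Python strings being sequences of code points rather than UTF-16 code units.

```python
def non_bmp_characters(text: str) -> list[str]:
    """Return the code points a UCS-2-only system could not represent."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 0xFFFF]

# U+2764 is in the BMP; U+1F600 (emoji) and U+20000 (CJK Extension B) are not
sample = "I \u2764 Unicode \U0001F600 \U00020000"
print(non_bmp_characters(sample))  # ['U+1F600', 'U+20000']

# Length in code points vs. UTF-16 code units: each non-BMP character costs two units
utf16_units = len(sample.encode("utf-16-le")) // 2
print(len(sample), utf16_units)  # 15 17
```

The gap between the two lengths is one reason UCS-2-era APIs that report string length in 16-bit units can disagree with code point counts once supplementary characters appear.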

Quick Facts

Property	Value
Full name	ISO/IEC 10646
Also known as	UCS, Universal Coded Character Set
Maintained by	ISO/IEC JTC 1/SC 2/WG 2
First edition	ISO 10646-1:1993
Current edition	ISO/IEC 10646:2020 (regularly amended)
Character repertoire	Identical to Unicode
Encoding forms defined	UTF-8, UTF-16, UTF-32, UCS-4
Synchronization with Unicode	Maintained by both organizations

Related Terms

Unicode, Code Point

More in Unicode Standard

Basic Multilingual Plane (BMP)

Plane 0 (U+0000–U+FFFF), containing the most frequently used characters, including Latin, Greek, Cyrillic, CJK, Arabic, and most symbols. Characters in this plane fit in a single code unit …

CJK

Chinese, Japanese, and Korean: an umbrella term for the unified Han ideograph blocks and related scripts in Unicode. CJK Unified Ideographs contains more than 20,992 …

Han Unification

The process of mapping Chinese, Japanese, and Korean ideographs that share a …

Hangul Jamo

The individual consonant and vowel components (jamo) of the Korean Hangul writing …

Unicode

The universal character encoding standard that assigns a unique number (a code point) to characters from the world's writing systems. Version 16.0 defines 154,998 assigned characters.

Unicode Character Database (UCD)

A collection of machine-readable data files that define all Unicode character properties, including UnicodeData.txt, Blocks.txt, Scripts.txt, and others.

Unicode Standard Annex (UAX)

Normative or informative documents that are integral parts of the Unicode Standard. …

Unicode Technical Report (UTR)

Informational documents published by the Unicode Consortium covering specific topics like security …

Unicode Scalar Value

Any code point except the surrogate code points (U+D800–U+DFFF): the set of valid values that can represent actual characters, 1,112,064 values in total.

Code Point

A numeric value in the Unicode codespace (U+0000 to U+10FFFF), written in U+ notation with at least four hexadecimal digits (e.g. U+0041, U+1F600). Not every code point is assigned to a character.
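The figures in the scalar value and code point entries above follow from simple arithmetic over the codespace. The Python sketch below derives the 1,112,064 count and computes the plane of a code point (plane 0 being the BMP); the constant and function names are hypothetical, chosen only for illustration.

```python
CODESPACE_SIZE = 0x10FFFF + 1        # U+0000..U+10FFFF inclusive
SURROGATES = 0xDFFF - 0xD800 + 1     # U+D800..U+DFFF inclusive
SCALAR_VALUES = CODESPACE_SIZE - SURROGATES

def plane(cp: int) -> int:
    """Each plane holds 0x10000 code points; plane 0 is the BMP."""
    return cp >> 16

def is_scalar_value(cp: int) -> bool:
    """True for any code point in the codespace that is not a surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)

def format_code_point(cp: int) -> str:
    """U+ notation: uppercase hex, padded to at least four digits."""
    return f"U+{cp:04X}"

print(CODESPACE_SIZE, SURROGATES, SCALAR_VALUES)       # 1114112 2048 1112064
print(plane(0xFFFF), plane(0x1F600), plane(0x10FFFF))  # 0 1 16
print(format_code_point(0x41), format_code_point(0x1F600))  # U+0041 U+1F600
```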
