What is the Unicode Consortium?

The non-profit organization that develops and maintains the Unicode Standard. Its members include Apple, Google, Microsoft, Meta, and many other organizations.

Unicode Standard

Unicode Version

Each major version of the Unicode Standard adds new characters, scripts, and properties. The current version is Unicode 16.0 (September 2024).

2021-06-30 · Updated 2024-07-16

What is a Unicode Version?

A Unicode version is a numbered, dated release of the Unicode Standard that may add new characters, scripts, blocks, emoji, or property values, and may update algorithms, documentation, or normalization tables. Each version is commonly identified by its major.minor number (e.g., 15.0, 15.1, 16.0); the full form adds a third update field (e.g., 15.1.0). Major versions are published on a roughly annual cycle.

Understanding Unicode versions matters for software developers because:

- An application may not support characters added in versions newer than its Unicode library
- Character properties can change between minor versions (though stability policies limit this)
- Emoji added in a new version will display as boxes ("tofu") on devices with older Unicode support
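The first point can be checked at runtime. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the helper name `unknown_code_points` is hypothetical. It lists the characters in a string that the bundled Unicode data treats as unassigned (general category `Cn`).

```python
import unicodedata


def unknown_code_points(text: str) -> list[str]:
    """Return 'U+XXXX' labels for characters whose general category is Cn
    (unassigned) in the Unicode data bundled with this Python build."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cn"]


# U+FFFF is a noncharacter, so it is reported as Cn in every version.
# Whether other characters are flagged depends on the runtime's Unicode version.
sample = "caf\u00e9 \U0001F600 \uFFFF"
print(unicodedata.unidata_version, unknown_code_points(sample))
```

A character flagged here is not necessarily invalid; it may simply have been added in a Unicode version newer than the one this runtime ships with.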

Version History Highlights

Version	Year	Notable Additions
1.0	1991	First release — 7,129 characters, 24 scripts
2.0	1996	Extended to 17 planes; surrogate pairs standardized
3.0	1999	49,194 characters; Braille, Ethiopic, Cherokee
4.0	2003	Cypriot, Linear B, Shavian
5.0	2006	NKo, Balinese, Phags-pa
6.0	2010	Emoji officially added (722 characters)
7.0	2014	Pictographic symbols, ruble sign (₽)
8.0	2015	Skin tone modifiers for emoji
10.0	2017	Bitcoin sign (₿), 56 emoji
12.0	2019	61 new emoji including gender-neutral options
13.0	2020	55 new emoji; Chorasmian, Dives Akuru scripts
14.0	2021	37 new emoji; Toto, Vithkuqi scripts
15.0	2022	20 new emoji; Kawi, Nag Mundari
15.1	2023	Minor update; 627 new characters, mostly CJK ideographs
16.0	2024	154,998 total characters; Garay, Sunuwar scripts

Major vs Minor Versions

Unicode uses a two-level versioning scheme:

- Major versions (e.g., 15.0 → 16.0): Can add new scripts, new blocks, and new characters, and may update normalization data
- Minor versions (e.g., 15.0 → 15.1): More limited in scope. They typically add a smaller set of characters, fix documentation errors, or update property values for already-assigned characters. The Unicode stability policies restrict what may change in any version, major or minor; for example, an assigned character is never removed.
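Because version strings such as "9.0" and "15.1" do not sort correctly as text, comparisons are safer on numeric tuples. The sketch below is one way to do that; the helper names `parse_unicode_version` and `supports` are hypothetical, and only `unicodedata.unidata_version` comes from the standard library.

```python
import unicodedata


def parse_unicode_version(version: str) -> tuple[int, ...]:
    """Turn '15.1' or '15.1.0' into a tuple of integers such as (15, 1, 0)."""
    return tuple(int(part) for part in version.split("."))


def supports(required: str, available: str = unicodedata.unidata_version) -> bool:
    """Return True if `available` is at least `required`.

    Missing trailing fields are treated as 0, so '15.1' equals '15.1.0'.
    """
    req = parse_unicode_version(required)
    have = parse_unicode_version(available)
    width = max(len(req), len(have))
    req += (0,) * (width - len(req))
    have += (0,) * (width - len(have))
    return have >= req


print(supports("15.1"))            # depends on the running Python build
print(supports("9.0", "10.0.0"))   # numeric comparison, not string comparison
```

Comparing tuples avoids the trap where the string "9.0" sorts after "10.0.0".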

Checking Unicode Version in Code

import sys
import unicodedata

# Python interpreter version (sys.version does not report the Unicode version)
print(sys.version)
# Version of the Unicode Character Database bundled with this Python build
print(unicodedata.unidata_version)    # e.g., "15.0.0"

# Look up a character's name; the default is returned if this build has no name for it
name = unicodedata.name("😀", "UNKNOWN")
print(name)  # "GRINNING FACE"

// JavaScript environments don't expose the Unicode version through a standard API,
// but Intl.Collator and the RegExp /u flag use the engine's Unicode data.

// In Node.js (V8), compare `node --version` with the ICU version in process.versions.icu.
// process.versions.unicode is only present when Node.js is built with ICU.
console.log(process.versions.unicode ?? "unavailable"); // e.g., "15.0"

Version Compatibility Issues

The "tofu" problem: Emoji added in Unicode 15.0 display as an empty box or question mark on devices whose OS and fonts predate Unicode 15.0 support. This is most common on older Android versions.

Library lag: Python 3.11 uses Unicode 14.0; Python 3.12 uses Unicode 15.0; Python 3.13 uses Unicode 15.1. If your application processes text containing characters from a version newer than your Python's Unicode library, those characters may be misclassified, for example reported as unassigned.
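The effect of library lag can be reproduced without installing an old interpreter. CPython's `unicodedata` module also ships a frozen Unicode 3.2 database as `unicodedata.ucd_3_2_0`; the sketch below (the helper name `classify` is hypothetical) compares how that older database and the current one classify the same characters.

```python
import unicodedata

old = unicodedata.ucd_3_2_0  # frozen Unicode 3.2 data bundled with CPython


def classify(ch: str) -> tuple[str, str]:
    """Return (category in Unicode 3.2, category in the current database)."""
    return old.category(ch), unicodedata.category(ch)


print("old:", old.unidata_version, "current:", unicodedata.unidata_version)
for ch in ("A", "\u20ac", "\U0001F600"):
    # U+1F600 was assigned after Unicode 3.2, so the old database reports Cn
    print(f"U+{ord(ch):04X}", *classify(ch))
```

A library built on older data behaves like the left-hand column: newer characters look unassigned, so checks based on category, script, or case mapping may quietly give different answers.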

Normalization stability: Once a character's decomposition is defined, it cannot change. This means normalization form conversions are version-stable for all previously assigned characters.
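As a small illustration of what stays stable, the sketch below decomposes and recomposes U+00E9 using the standard `unicodedata.normalize` function. Under the stability guarantee described above, this round trip should give the same result on any Unicode version that includes these characters.

```python
import unicodedata

composed = "\u00e9"                                   # é as a single code point
decomposed = unicodedata.normalize("NFD", composed)   # e + combining acute accent
recomposed = unicodedata.normalize("NFC", decomposed)

print([f"U+{ord(c):04X}" for c in decomposed])
print(recomposed == composed)
```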

Quick Facts

Property	Value
Current version	16.0 (September 2024)
Total assigned (v16.0)	154,998 characters
New scripts in v16.0	Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar, Todhri, Tulu-Tigalari
Release cadence	Approximately annual
Version format	major.minor (e.g., 15.1); full form major.minor.update (e.g., 15.1.0)
Stability guarantee	Assigned characters never removed or reassigned
Emoji count (v16.0)	~3,790 (with all variations/sequences)
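The total in the table can be approximated for whichever Unicode version your own runtime ships. The sketch below tallies every code point by general category, then sums the assigned ones. It assumes the published total excludes control characters (Cc), private-use code points (Co), and surrogates (Cs) as well as unassigned code points (Cn); the result depends on the Python build.

```python
import sys
import unicodedata
from collections import Counter


def count_by_category() -> Counter:
    """Count all code points (U+0000 to U+10FFFF) by general category."""
    return Counter(unicodedata.category(chr(cp)) for cp in range(sys.maxunicode + 1))


counts = count_by_category()

# Assumption: the headline figure covers graphic and format characters only
excluded = {"Cn", "Cs", "Co", "Cc"}
assigned = sum(n for cat, n in counts.items() if cat not in excluded)

print("Unicode", unicodedata.unidata_version, "assigned characters:", assigned)
```

The loop visits about 1.1 million code points, so it takes a moment to run, but it needs no external data files.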

Related Terms

Unicode, Unicode Consortium

More in Unicode Standard

Basic Multilingual Plane (BMP)

Plane 0 (U+0000–U+FFFF), which contains the most commonly used characters, including Latin, Greek, Cyrillic, CJK, Arabic, and most symbols. Characters in this plane fit in a single code unit …

CJK

Chinese, Japanese, and Korean. A collective term for the unified Han ideograph blocks and related scripts in Unicode. CJK Unified Ideographs contains more than 20,992 characters …

Han Unification

The process of mapping Chinese, Japanese, and Korean ideographs that share a …

Hangul Jamo

The individual consonant and vowel components (jamo) of the Korean Hangul writing …

ISO 10646 / Universal Character Set

The international standard (ISO/IEC 10646) that is synchronized with Unicode. It defines the same character set and code points but does not include Unicode's additional algorithms and properties.

Unicode

The universal character encoding standard that assigns a unique number (code point) to every character in every writing system. Version 16.0 contains 154,998 assigned characters.

Unicode Character Database (UCD)

A collection of machine-readable data files that define all Unicode character properties, including UnicodeData.txt, Blocks.txt, Scripts.txt, and others.

Unicode Standard Annex (UAX)

Normative or informative documents that are integral parts of the Unicode Standard. …

Unicode Technical Report (UTR)

Informational documents published by the Unicode Consortium covering specific topics like security …

Unicode Scalar Value

Any code point except the surrogate code points (U+D800–U+DFFF). These are the valid values that can represent actual characters, 1,112,064 in total.
