Unicode Standard

Unicode Technical Report (UTR)

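Informational documents published by the Unicode Consortium covering specific topics such as security considerations (UTR #36). Several widely used specifications share the same report series and numbering, including security mechanisms (UTS #39), text segmentation (UAX #29), and line breaking (UAX #14).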

What is a Unicode Technical Report (UTR)?

A Unicode Technical Report (UTR) is an informative document published by the Unicode Consortium that provides guidance, analysis, data, or algorithms related to Unicode text processing. UTRs are not normative — they do not define requirements that implementations must follow to be conformant — but they carry significant authority as the Consortium's official technical recommendations and are widely implemented across platforms and programming languages.

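UTRs share a single numbering sequence with Unicode Standard Annexes (UAXes) and Unicode Technical Standards (UTSes), so a document keeps its number if its status changes. They cover topics ranging from security vulnerabilities to text layout algorithms. They are distinct from the Unicode Standard itself and from Unicode Standard Annexes.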

Key UTRs

UTS #39: Unicode Security Mechanisms. Often cited as UTR #39, this document is in fact published as a Unicode Technical Standard (UTS), not a Technical Report. Perhaps the most security-critical Unicode document, it defines algorithms and data for detecting potentially dangerous uses of Unicode in identifiers and text. It introduces the concepts of confusables, mixed-script detection, and identifier profiles. The associated data files (confusables.txt, intentional.txt) are used by browsers, programming language parsers, and security tools worldwide.
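The confusable check described above can be sketched in a few lines. This is a minimal illustration, not the full algorithm from the Unicode Security Mechanisms document: the three data lines only imitate the confusables.txt layout (source ; target ; type # comment) and are not copied from a specific release, and the names `parse_confusables`, `skeleton`, and `confusable` are hypothetical helpers invented for this example.

```python
import unicodedata

# Illustrative lines in the confusables.txt layout (assumption: a tiny
# hand-written excerpt; real tools load the published data file).
SAMPLE = """\
0430 ; 0061 ; MA # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A
0435 ; 0065 ; MA # CYRILLIC SMALL LETTER IE -> LATIN SMALL LETTER E
0440 ; 0070 ; MA # CYRILLIC SMALL LETTER ER -> LATIN SMALL LETTER P
"""


def parse_confusables(text):
    """Build a {source string: target string} map from confusables-style lines."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table


def skeleton(s, table):
    """Simplified skeleton: NFD, map each character to its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def confusable(a, b, table):
    """Two strings are treated as confusable when their skeletons match."""
    return skeleton(a, table) == skeleton(b, table)


TABLE = parse_confusables(SAMPLE)
spoof = "p\u0430yp\u0430l"  # looks like "paypal" but uses Cyrillic U+0430
print(spoof == "paypal", confusable(spoof, "paypal", TABLE))
```

Comparing skeletons rather than raw strings is what lets a registry or parser flag a lookalike identifier even though the two strings differ at the code point level.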

UTR #36: Unicode Security Considerations. A companion to UTS #39, this report catalogs the security problems that arise from Unicode's complexity: visual spoofing, delimiter confusion, canonicalization attacks, overlong (non-shortest-form) UTF-8 encodings that conformant decoders must reject, and more. Essential reading for anyone building security-sensitive systems that process Unicode text.
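The overlong-encoding problem mentioned above is easy to demonstrate. The sketch below uses Python's built-in strict UTF-8 decoder; the helper name `decodes_strictly` is hypothetical. The byte pair C0 AF carries the same bits as U+002F (slash) padded into a two-byte form, which is ill-formed and should be rejected rather than silently decoded.

```python
def decodes_strictly(data: bytes) -> bool:
    """Return True if Python's strict decoder accepts data as UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


SLASH = b"/"                    # U+002F in its only valid form (one byte)
OVERLONG_SLASH = b"\xc0\xaf"    # same bits padded into two bytes: ill-formed
OVERLONG_SLASH_3 = b"\xe0\x80\xaf"  # three-byte overlong form, also ill-formed

print(decodes_strictly(SLASH))             # True
print(decodes_strictly(OVERLONG_SLASH))    # False
print(decodes_strictly(OVERLONG_SLASH_3))  # False
```

A decoder that accepted the overlong forms would let a filter that scans raw bytes for "/" be bypassed, which is why validation should happen before any security check on the text.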

UTR #29: Unicode Text Segmentation (now a UAX). Originally a Technical Report, UTR #29 defined rules for finding grapheme cluster, word, and sentence boundaries. It has since been promoted to a Unicode Standard Annex (UAX #29), making it part of the Standard. Its history illustrates the lifecycle: a report can be promoted to a UAX as implementations mature.
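To see why grapheme cluster boundaries matter, the sketch below contrasts code point counting with a deliberately rough grouping that attaches combining marks to the preceding character. It is not the UAX #29 algorithm: it ignores Hangul jamo sequences, emoji ZWJ sequences, regional indicator pairs, and the other rules the annex defines. The helper name `rough_clusters` is hypothetical.

```python
import unicodedata


def rough_clusters(text):
    """Group each combining mark (Mn, Mc, Me) with the character before it.

    A crude approximation for illustration only; real segmentation should
    follow the grapheme cluster rules in UAX #29.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc", "Me")
        if is_mark and clusters:
            clusters[-1] += ch  # extend the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


word = "cafe\u0301"  # "café" written with e + COMBINING ACUTE ACCENT
print(len(word))                  # 5 code points
print(len(rough_clusters(word)))  # 4 user-perceived characters
```

Operations such as cursor movement, truncation, and string reversal should work on clusters like these, not on individual code points.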

UTR #14: Unicode Line Breaking Algorithm (now a UAX). Similarly, the line breaking algorithm, which defines where text may be broken across lines, began as UTR #14 and was promoted to UAX #14 as it became essential infrastructure.

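UTR #50: Unicode Vertical Text Layout (now a UAX). Defines the Vertical_Orientation property, which CSS vertical writing modes and other vertical layout implementations use to decide whether a character stays upright or is rotated in East Asian vertical text. Like the two reports above, it has since been promoted and is published as UAX #50.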

How UTRs Differ from UAXes

| Aspect | UTR | UAX |
|---|---|---|
| Normative status | Informative only | Normative (part of the Standard) |
| Conformance | Not required | May be required |
| Publication | Standalone document | Integral to Unicode version releases |
| Lifecycle | May be superseded, withdrawn, or promoted | Updated with each Unicode version |

UTRs often serve as the research and development stage for features that may eventually become normative UAXes. When the community achieves consensus on an algorithm or property, it may be promoted to a UAX and bundled with the Unicode Standard release.

Finding UTRs

All current UTRs are published at https://www.unicode.org/reports/ alongside UAXes and Unicode Technical Standards (UTSes). Each document lists its current status, revision history, and whether it has been superseded.
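Individual reports live under that directory at a path built from the report number, and the path uses the prefix "tr" whether the document is currently a UTR, a UTS, or a UAX. A small sketch, with `report_url` as a hypothetical helper name:

```python
REPORTS_BASE = "https://www.unicode.org/reports/"


def report_url(number: int) -> str:
    """Return the landing page URL for report number N (UTR, UTS, or UAX)."""
    return f"{REPORTS_BASE}tr{number}/"


for n in (14, 29, 36, 39, 50):
    print(report_url(n))
```

Because the number and path stay the same when a document changes status, the status line at the top of each page is the reliable way to tell the document types apart.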

Quick Facts

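| Property | Value |
|---|---|
| Normative status | Informative (not required for conformance) |
| Naming convention | UTR #N (e.g., UTR #36) |
| Key security report | UTR #36, Unicode Security Considerations (companion standard: UTS #39, Unicode Security Mechanisms) |
| Key security data | confusables.txt, intentional.txt (published with UTS #39) |
| Publication URL | unicode.org/reports/ |
| Lifecycle | May be superseded, promoted to UAX, or withdrawn |
| Related document types | UAX (annex that is part of the Standard), UTS (independent specification with its own conformance requirements) |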

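Related Terms

Other terms in Unicode Standard

CJK (Han, Kana, Hangul)

Chinese, Japanese, and Korean: a collective term for the unified Han ideograph blocks and related scripts in Unicode. CJK Unified Ideographs include at least 20,992 characters (the size of the basic block, U+4E00 to U+9FFF), with more in the extension blocks.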

Han Unification

The process of mapping Chinese, Japanese, and Korean ideographs that share a …

Hangul Jamo

The individual consonant and vowel components (jamo) of the Korean Hangul writing …

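ISO 10646 / Universal Character Set

The international standard (ISO/IEC 10646) that is kept synchronized with Unicode. It defines the same character repertoire and code points but does not include Unicode's additional algorithms and properties.

Unicode

A universal character encoding standard that assigns a unique number (code point) to every character in every writing system. Version 16.0 contains 154,998 assigned characters.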

Unicode Standard Annex (UAX)

Normative or informative documents that are integral parts of the Unicode Standard. …

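Unicode Consortium

The non-profit organization that develops and maintains the Unicode Standard. Its members include many companies, such as Apple, Google, Microsoft, and Meta.

Unicode Scalar Value

Any code point except the surrogate code points (U+D800 to U+DFFF). Scalar values form the set of valid values that can represent actual characters, 1,112,064 in total.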

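Unicode Version

A major release of the Unicode Standard that adds new characters, scripts, and features. The version described in this glossary is Unicode 16.0 (released September 2024).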

Unicode Stability Policy

A policy guaranteeing that the code point and name of a character never change once assigned. Properties may be revised, but assignments are permanent.
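The count of 1,112,064 given in the Unicode scalar value entry above follows from simple arithmetic on the code space. A short sketch, where the constant names and `is_scalar_value` are hypothetical helpers chosen for this example:

```python
MAX_CODE_POINT = 0x10FFFF   # last code point in the Unicode code space
SURROGATE_FIRST = 0xD800    # first surrogate code point
SURROGATE_LAST = 0xDFFF     # last surrogate code point


def is_scalar_value(cp: int) -> bool:
    """True for any code point in range that is not a surrogate."""
    in_range = 0 <= cp <= MAX_CODE_POINT
    is_surrogate = SURROGATE_FIRST <= cp <= SURROGATE_LAST
    return in_range and not is_surrogate


total_code_points = MAX_CODE_POINT + 1             # 1,114,112
surrogates = SURROGATE_LAST - SURROGATE_FIRST + 1  # 2,048
scalar_values = total_code_points - surrogates     # 1,112,064
print(f"{scalar_values:,}")
```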