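What is the Unicode Bidirectional Algorithm (UBA)?

An algorithm that determines the display order of mixed-direction text (for example, English plus Arabic) using each character's bidirectional category and any explicit directional overrides.

What is RTL (right-to-left)?

A text direction in which characters flow from right to left. It is used by Arabic, Hebrew, Thaana, and other scripts, and it requires the bidirectional algorithm for correct display.

Property

Bidirectional Category

A property that determines how a character behaves in bidirectional text (LTR, RTL, weak, neutral). The Unicode Bidirectional Algorithm uses it to determine display order.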

2022-02-09 · Updated 2024-08-19

What Is the Bidirectional Category?

The Bidi Class (formally Bidi_Class, also called bidirectional category) is a Unicode property that controls how characters are positioned in a line of mixed left-to-right (LTR) and right-to-left (RTL) text. It is the primary input to the Unicode Bidirectional Algorithm (UBA, described in Unicode Standard Annex #9), which determines the visual display order of characters in a paragraph.

Every code point is assigned one of 23 Bidi Class values. The algorithm uses these values, along with explicit directional formatting characters (embeddings, overrides, and isolates), to resolve the rendering order for Arabic mixed with English, Hebrew mixed with numbers, or any other bidirectional combination.
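The count of 23 can be checked against the Unicode database bundled with a Python interpreter. The following is a minimal sketch, assuming a Python version whose `unicodedata` module is based on Unicode 6.3 or later (the release that added the isolate classes). Code points with no entry in that database are reported as an empty string and are skipped here.

```python
import sys
import unicodedata

# Collect every Bidi_Class code reported by the bundled Unicode database.
# Code points without an entry come back as "" and are discarded.
classes = {unicodedata.bidirectional(chr(cp)) for cp in range(sys.maxunicode + 1)}
classes.discard("")

print("Unicode version:", unicodedata.unidata_version)
print(len(classes), sorted(classes))
```

The scan visits every code point once, so it takes a moment to run. It is meant as a one-off check, not something to call in a hot path.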

The Major Bidi Class Values

Code	Name	Typical Characters
L	Left-to-Right	Latin, Greek, and Cyrillic letters
R	Right-to-Left	Hebrew letters
AL	Arabic Letter	Arabic and Thaana letters
EN	European Number	0–9
AN	Arabic Number	Arabic-Indic digits ٠–٩
ES	European Separator	+ −
ET	European Terminator	$ % °
CS	Common Separator	, . : /
ON	Other Neutral	most punctuation
BN	Boundary Neutral	Format chars, ZWJ
NSM	Non-Spacing Mark	combining marks (inherit from base)
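WS	Whitespace	space (U+0020), form feed (U+000C)
B	Paragraph Separator	U+2029, line feed (U+000A), carriage return (U+000D)
S	Segment Separator	tab (U+0009)
LRE/RLE/LRO/RLO/PDF	Explicit Embedding and Override	U+202A–U+202E
L / R (marks)	Directional Marks	U+200E LEFT-TO-RIGHT MARK has class L and U+200F RIGHT-TO-LEFT MARK has class R; these are characters, not separate class values
LRI/RLI/FSI/PDI	Isolate	U+2066–U+2069, directional isolates (Unicode 6.3+)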

import unicodedata

# Sample characters paired with a human-readable label.
chars = [("A", "Latin"), ("ب", "Arabic"), ("5", "Digit"),
         ("\u200F", "RLM"), ("\u200E", "LRM")]

# unicodedata.bidirectional() returns the Bidi_Class code as a string.
for char, label in chars:
    bc = unicodedata.bidirectional(char)
    print(f"  {label:12} U+{ord(char):04X}  Bidi={bc}")

# Output:
# Latin        U+0041  Bidi=L
# Arabic       U+0628  Bidi=AL
# Digit        U+0035  Bidi=EN
# RLM          U+200F  Bidi=R
# LRM          U+200E  Bidi=L
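The same lookup can be used to guess a paragraph's base direction from its first strong character (class L, R, or AL). The sketch below is a simplified illustration in the spirit of the first-strong rule in UAX #9, not a full implementation: it does not skip text inside isolates as the specification requires, and the function name `first_strong_direction` and its `default` parameter are hypothetical.

```python
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Return "ltr" or "rtl" based on the first strong character in text.

    Simplified: characters inside directional isolates are NOT skipped.
    """
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        if bc == "L":
            return "ltr"
        if bc in ("R", "AL"):
            return "rtl"
    # No strong character found (for example, digits and punctuation only).
    return default


print(first_strong_direction("Hello \u0645\u0631\u062D\u0628\u0627"))   # ltr
print(first_strong_direction("123 \u05E9\u05DC\u05D5\u05DD"))           # rtl
print(first_strong_direction("123 !?"))                                 # ltr
```

Digits and spaces are weak or neutral, so the second call ignores "123 " and decides on the first Hebrew letter.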

Why It Matters in Practice

Without correct bidi handling, a string like "Hello مرحبا World" can display with the Arabic word in the wrong position or with adjacent punctuation displaced. HTML provides the dir attribute to set the base direction, and Unicode provides the marks U+200E (LRM) and U+200F (RLM) and the bidi isolate characters (U+2066–U+2069) to guide the algorithm. Web developers working with RTL content must understand that the visual order of characters on screen can differ from their logical (storage) order.
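One common use of the isolate characters is to wrap text of unknown direction (a user name, a file name) before inserting it into a larger string, so that it cannot reorder its neighbours. The following is a minimal sketch, assuming plain-text output where markup such as the dir attribute is not available. The helper name `isolate` and the sample message are hypothetical.

```python
import unicodedata

FSI = "\u2068"  # FIRST STRONG ISOLATE: direction taken from the wrapped text
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes the isolate


def isolate(text):
    """Wrap text in FSI ... PDI so it is laid out as an isolated run."""
    return f"{FSI}{text}{PDI}"


username = "\u0645\u0631\u062D\u0628\u0627"  # Arabic sample text
message = f"User {isolate(username)} posted 3 comments"

# The two wrapper characters carry their own Bidi_Class values.
print([unicodedata.bidirectional(c) for c in (FSI, PDI)])  # ['FSI', 'PDI']
# The stored (logical) string grows by exactly two code points.
print(len(message) - len(f"User {username} posted 3 comments"))  # 2
```

The wrapper changes only the stored string. How the result looks on screen still depends on the renderer implementing the bidirectional algorithm.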

Quick Facts

Property	Value
Unicode property name	`Bidi_Class`
Short alias	`bc`
Number of values	23
Python function	`unicodedata.bidirectional(char)` → string code
Algorithm spec	Unicode Standard Annex #9 (UAX #9)
Key characters	U+200E LRM, U+200F RLM, U+2066–U+2069 isolates