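What is the Unicode Bidirectional Algorithm (UBA)?

An algorithm that uses each character's bidirectional category, together with explicit directional overrides, to determine the display order of mixed-direction text (such as English combined with Arabic).

What is RTL (right-to-left)?

A text direction in which characters flow from right to left, used by scripts such as Arabic, Hebrew, and Thaana. Displaying it correctly requires the bidirectional algorithm.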

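Character Properties

Bidirectional Category

A property that determines how a character behaves in bidirectional text (LTR, RTL, weak, or neutral). The Unicode Bidirectional Algorithm uses it to determine display order.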

2022-02-09 · Updated 2024-08-19

What Is the Bidirectional Category?

The Bidi Class (formally Bidi_Class, also called bidirectional category) is a Unicode property that controls how characters are positioned in a line of mixed left-to-right (LTR) and right-to-left (RTL) text. It is the primary input to the Unicode Bidirectional Algorithm (UBA, described in Unicode Standard Annex #9), which determines the visual display order of characters in a paragraph.

Every code point is assigned one of 23 Bidi Class values. The algorithm uses these values, along with explicit directional formatting characters (embeddings, overrides, and isolates), to resolve the correct rendering order for Arabic mixed with English, Hebrew mixed with numbers, or any other bidirectional combination.
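
One of the first things the algorithm needs is a base direction for the paragraph. The sketch below is a simplified take on the "first strong character" heuristic from UAX #9 (rules P2 and P3): it scans for the first character whose class is L, R, or AL. The function name `first_strong_direction` and the `default` parameter are illustrative assumptions, and unlike the real rule this version does not skip text enclosed in directional isolates.

```python
import unicodedata

def first_strong_direction(text, default="ltr"):
    """Guess a paragraph's base direction from its first strong character.

    Simplified sketch: the full rule in UAX #9 also skips over any text
    between an isolate initiator and its matching PDI.
    """
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        if bc == "L":
            return "ltr"
        if bc in ("R", "AL"):
            return "rtl"
    # No strong character found (digits, punctuation, or spaces only).
    return default

arabic_word = "\u0645\u0631\u062D\u0628\u0627"  # the Arabic word for "hello"

print(first_strong_direction("Hello " + arabic_word))  # ltr
print(first_strong_direction("123 " + arabic_word))    # rtl
print(first_strong_direction("123 + 456"))             # ltr (default)
```

Digits and spaces are weak or neutral, so they do not settle the direction; only the first strong letter does.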

The Major Bidi Class Values

Code	Name	Typical Characters
L	Left-to-Right	Latin, Greek, and Cyrillic letters
R	Right-to-Left	Hebrew letters
AL	Arabic Letter	Arabic and Thaana letters
EN	European Number	0–9
AN	Arabic Number	Arabic-Indic digits ٠–٩
ES	European Separator	+ −
ET	European Terminator	$ % °
ON	Other Neutral	most punctuation
BN	Boundary Neutral	Format chars, ZWJ
NSM	Non-Spacing Mark	combining marks (inherit from base)
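WS	Whitespace	space, form feed
B	Paragraph Separator	U+2029, line feed (U+000A)
S	Segment Separator	tab (U+0009)
LRE/RLE/LRO/RLO/PDF	Explicit Embedding and Override	U+202A–U+202E directional embedding, override, and pop characters
LRM/RLM	Implicit Mark (characters, not class values)	U+200E LEFT-TO-RIGHT MARK (class L), U+200F RIGHT-TO-LEFT MARK (class R)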
LRI/RLI/FSI/PDI	Isolate	Unicode 6.3+ directional isolates
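
UAX #9 groups the class values into strong, weak, neutral, and explicit formatting types. The following is a minimal sketch of that grouping; the names `BIDI_GROUPS` and `bidi_group` are hypothetical helpers introduced here, not part of any library.

```python
import unicodedata

# Grouping of the 23 Bidi_Class values by type, following UAX #9.
BIDI_GROUPS = {
    "strong": {"L", "R", "AL"},
    "weak": {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"},
    "neutral": {"B", "S", "WS", "ON"},
    "explicit": {"LRE", "LRO", "RLE", "RLO", "PDF",
                 "LRI", "RLI", "FSI", "PDI"},
}

def bidi_group(ch):
    """Return the group name for a character's Bidi_Class value."""
    bc = unicodedata.bidirectional(ch)
    for group, classes in BIDI_GROUPS.items():
        if bc in classes:
            return group
    # unicodedata returns "" for code points it has no data for.
    return "unassigned"

# Latin A, Hebrew alef, digit, plus sign, space, tab, "!", RLI
for ch in ["A", "\u05D0", "7", "+", " ", "\t", "!", "\u2067"]:
    bc = unicodedata.bidirectional(ch)
    print(f"U+{ord(ch):04X}  {bc:4} {bidi_group(ch)}")
```

Note that the tab character reports class S while the space reports WS, even though both fall in the neutral group.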

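import unicodedata

# Sample characters, each paired with a human-readable label.
chars = [("A", "Latin"), ("ب", "Arabic"), ("5", "Digit"),
         ("\u200F", "RLM"), ("\u200E", "LRM")]

# Look up and print the Bidi_Class short code for each character.
for char, label in chars:
    bc = unicodedata.bidirectional(char)
    print(f"  {label:12} U+{ord(char):04X}  Bidi={bc}")

# Output (RLM and LRM carry the strong classes R and L):
#   Latin        U+0041  Bidi=L
#   Arabic       U+0628  Bidi=AL
#   Digit        U+0035  Bidi=EN
#   RLM          U+200F  Bidi=R
#   LRM          U+200E  Bidi=L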

Why It Matters in Practice

Without correct bidi handling, a string like "Hello مرحبا World" can display with the Arabic word in the wrong position or with punctuation displaced. HTML provides the dir attribute, and Unicode provides U+200F (RLM), U+200E (LRM), and the bidi isolate characters (U+2066–U+2069) to guide the algorithm. Web developers working with RTL content must keep in mind that the visual order of characters on screen differs from their logical (storage) order.
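
As one way to apply the isolate characters in plain text, the sketch below wraps an inserted fragment in U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE so that its direction does not affect the surrounding text. The helper name `isolate` is an assumption made for illustration. The example also shows that the stored (logical) order is untouched; only a bidi-aware renderer changes the visual order.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(fragment):
    """Wrap a fragment so its direction is resolved independently."""
    return f"{FSI}{fragment}{PDI}"

arabic_word = "\u0645\u0631\u062D\u0628\u0627"  # the Arabic word for "hello"
message = f"Hello {isolate(arabic_word)} World"

# The logical order in memory is simply: "Hello ", FSI, the five
# Arabic letters in reading order, PDI, " World".
for index, ch in enumerate(message[6:13], start=6):
    print(index, f"U+{ord(ch):04X}")
```

In HTML, the dir attribute is usually preferable to embedding control characters directly; the characters are mainly useful where markup is not available.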

Quick Facts

Property	Value
Unicode property name	`Bidi_Class`
Short alias	`bc`
Number of values	23
Python function	`unicodedata.bidirectional(char)` → string code
Algorithm spec	Unicode Standard Annex #9 (UAX #9)
Key characters	U+200E LRM, U+200F RLM, U+2066–U+2069 isolates
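
The value count above can be checked against the Unicode data bundled with a given Python build. This is a rough sketch: the result depends on the Unicode version reported by `unicodedata.unidata_version`, and it assumes a build new enough to include the isolate classes introduced in Unicode 6.3.

```python
import sys
import unicodedata
from collections import Counter

# Tally the Bidi_Class value of every code point Python knows about.
counts = Counter(
    unicodedata.bidirectional(chr(cp)) for cp in range(sys.maxunicode + 1)
)
# Code points without data report an empty string; drop that bucket.
counts.pop("", None)

print("Unicode data version:", unicodedata.unidata_version)
print("Distinct Bidi_Class values:", len(counts))
for bc, n in counts.most_common(5):
    print(f"  {bc:4} {n}")
```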