What is the Unicode Bidirectional Algorithm (UBA)?

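An algorithm that uses each character's bidirectional category, together with explicit directional overrides, to determine the display order of mixed-direction text (for example, English mixed with Arabic).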
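One early step of the algorithm picks a paragraph's base direction from its first strong character (UAX #9 rules P2 and P3), skipping anything inside directional isolates. The following is a minimal sketch of just that step using Python's `unicodedata`. The function name and its `default` parameter are illustrative assumptions, and the remaining rules (weak types, neutrals, reordering) are not shown:

```python
import unicodedata

# Strong Bidi_Class values and the paragraph direction each one implies.
STRONG_DIRECTION = {"L": "ltr", "R": "rtl", "AL": "rtl"}
ISOLATE_INITIATORS = {"LRI", "RLI", "FSI"}


def first_strong_direction(paragraph: str, default: str = "ltr") -> str:
    """Return "ltr" or "rtl" from the first strong character outside any isolate.

    Simplified take on UAX #9 rules P2-P3. `default` is returned when no
    strong character is found (plain P3 gives level 0, i.e. LTR).
    """
    depth = 0  # number of isolate initiators currently open
    for ch in paragraph:
        bc = unicodedata.bidirectional(ch)
        if bc in ISOLATE_INITIATORS:
            depth += 1
        elif bc == "PDI":
            if depth > 0:
                depth -= 1  # a PDI with no open isolate is simply ignored
        elif depth == 0 and bc in STRONG_DIRECTION:
            return STRONG_DIRECTION[bc]
    return default


print(first_strong_direction("Hello مرحبا"))            # ltr
print(first_strong_direction("123 שלום"))               # rtl: digits are weak, not strong
print(first_strong_direction("\u2066abc\u2069 שלום"))   # rtl: "abc" sits inside an isolate
```

The same first-strong rule is what FSI and HTML's `dir="auto"` rely on to choose a direction for their content.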

What is RTL (Right-to-Left)?

A text direction in which characters flow from right to left. It is used by scripts such as Arabic, Hebrew, and Thaana, and correct display requires the bidirectional algorithm.
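A common way to detect whether a string needs RTL handling is to look for characters whose Bidi Class is one of the two strong right-to-left values: R (Hebrew and similar) or AL (Arabic, Thaana, and similar). The following is a small sketch. The helper name `contains_rtl` is hypothetical:

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left Bidi_Class values


def contains_rtl(text: str) -> bool:
    """True if any character in `text` is a strong right-to-left character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


print(contains_rtl("Hello"))    # False
print(contains_rtl("שלום"))     # True  (Hebrew letters are R)
print(contains_rtl("\u0780"))   # True  (THAANA LETTER HAA is AL)
print(contains_rtl("\u0663"))   # False (ARABIC-INDIC DIGIT THREE is AN, a weak class)
```

A string consisting only of Arabic-Indic digits is reported as non-RTL here because AN is weak. Whether that is the desired behavior depends on the application.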

Property

Bidirectional Category

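A property that determines how a character behaves in bidirectional text (left-to-right, right-to-left, weak, or neutral). The Unicode Bidirectional Algorithm uses it to determine display order.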
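UAX #9 groups the Bidi Class values into strong, weak, neutral, and explicit formatting types. The sketch below encodes that grouping as a lookup table. `BIDI_GROUPS` and `bidi_group` are names chosen for illustration:

```python
import unicodedata

# Grouping of Bidi_Class values into the types used by UAX #9.
BIDI_GROUPS = {
    "strong": {"L", "R", "AL"},
    "weak": {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"},
    "neutral": {"B", "S", "WS", "ON"},
    "explicit": {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"},
}


def bidi_group(ch: str) -> str:
    """Return "strong", "weak", "neutral", or "explicit" for one character."""
    bc = unicodedata.bidirectional(ch)
    for group, classes in BIDI_GROUPS.items():
        if bc in classes:
            return group
    return "unknown"  # fallback for any class string not in the table above


for ch in ["A", "ب", "5", ",", " ", "\u202E"]:
    print(f"U+{ord(ch):04X} {unicodedata.bidirectional(ch):4} {bidi_group(ch)}")
```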

2022-02-09 · Updated 2024-08-19

What Is the Bidirectional Category?

The Bidi Class (formally Bidi_Class, also called bidirectional category) is a Unicode property that controls how characters are positioned in a line of mixed left-to-right (LTR) and right-to-left (RTL) text. It is the primary input to the Unicode Bidirectional Algorithm (UBA, described in Unicode Standard Annex #9), which determines the visual display order of characters in a paragraph.
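To make "visual display order" concrete, here is a deliberately tiny toy that is not the UBA. It assumes a left-to-right paragraph containing only letters, spaces, and punctuation (no digits, combining marks, or explicit formatting characters). R and AL characters get level 1 and L characters get level 0. Runs of neutrals get level 1 only when RTL text lies on both sides (in the spirit of rule N1), and otherwise the paragraph level 0. Each level-1 run is then reversed, as rule L2 would do. The name `toy_visual_order` is hypothetical:

```python
import unicodedata


def toy_visual_order(text: str) -> str:
    """Visual order for a simple LTR paragraph (toy model, not the full UBA)."""
    # Strong characters get a level now; neutrals (None) are resolved below.
    levels = []
    for ch in text:
        bc = unicodedata.bidirectional(ch)
        levels.append(1 if bc in ("R", "AL") else 0 if bc == "L" else None)

    # A neutral run takes level 1 only when RTL text is on both sides;
    # the paragraph edges count as level 0 (LTR).
    i, n = 0, len(text)
    while i < n:
        if levels[i] is not None:
            i += 1
            continue
        j = i
        while j < n and levels[j] is None:
            j += 1
        before = levels[i - 1] if i > 0 else 0
        after = levels[j] if j < n else 0
        fill = 1 if before == after == 1 else 0
        levels[i:j] = [fill] * (j - i)
        i = j

    # Reverse each maximal run of level-1 characters.
    out, i = [], 0
    while i < n:
        j = i
        while j < n and levels[j] == levels[i]:
            j += 1
        run = text[i:j]
        out.append(run[::-1] if levels[i] == 1 else run)
        i = j
    return "".join(out)


# Inspect code points rather than printing the text: a terminal that applies
# bidi itself would reorder the string a second time.
visual = toy_visual_order("Hello مرحبا World")
print([f"U+{ord(c):04X}" for c in visual[6:11]])
# ['U+0627', 'U+0628', 'U+062D', 'U+0631', 'U+0645']
```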

Every code point is assigned one of 23 Bidi Class values. The algorithm uses these values, along with explicit directional formatting characters (embeddings, overrides, and isolates), to resolve the correct rendering order for Arabic mixed with English, Hebrew mixed with numbers, or any other bidirectional combination.
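The count of 23 can be checked against the Unicode database bundled with Python. This sketch tallies the class of every code point. On a Python whose database is Unicode 6.3 or later (when the isolate classes were added), it should find 23 distinct non-empty values:

```python
import sys
import unicodedata
from collections import Counter

# Tally the Bidi_Class of every code point in Python's Unicode database.
counts = Counter(
    unicodedata.bidirectional(chr(cp)) for cp in range(sys.maxunicode + 1)
)
counts.pop("", None)  # drop an empty class string, if the database reports one

print(unicodedata.unidata_version, len(counts))
for bc, n in counts.most_common(5):
    print(f"{bc:4} {n}")
```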

The Major Bidi Class Values

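Code	Name	Typical Characters
L	Left-to-Right	Latin, Greek, Cyrillic, CJK, and most other LTR letters
R	Right-to-Left	Hebrew letters
AL	Arabic Letter	Arabic and Thaana letters
EN	European Number	0–9
AN	Arabic Number	Arabic-Indic digits ٠–٩
ES	European Separator	+ - −
ET	European Terminator	# $ % °
CS	Common Separator	, . / :
ON	Other Neutral	most punctuation and symbols
BN	Boundary Neutral	most format characters, ZWJ, ZWNJ
NSM	Non-Spacing Mark	combining marks (inherit from base)
WS	Whitespace	space, form feed, U+2028 LINE SEPARATOR
B	Paragraph Separator	LF, CR, U+2029 PARAGRAPH SEPARATOR
S	Segment Separator	tab (U+0009), vertical tab (U+000B)
LRE/RLE/LRO/RLO	Explicit Embedding/Override	U+202A, U+202B, U+202D, U+202E
PDF	Pop Directional Format	U+202C, closes an embedding or override
LRI/RLI/FSI/PDI	Isolate	U+2066–U+2069, Unicode 6.3+ directional isolates

LRM (U+200E) and RLM (U+200F) are not separate Bidi Class values. They are invisible characters whose Bidi Class is L and R respectively, as the output below shows.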
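A quick way to check a table like this against the Unicode data shipped with Python is to look up one representative character per class and report any mismatch. The sample characters below are illustrative choices:

```python
import unicodedata

# One representative character per Bidi_Class value, with the expected class.
samples = {
    "A": "L",
    "\u05D0": "R",      # HEBREW LETTER ALEF
    "\u0628": "AL",     # ARABIC LETTER BEH
    "7": "EN",
    "\u0663": "AN",     # ARABIC-INDIC DIGIT THREE
    "+": "ES",
    "%": "ET",
    ",": "CS",
    "!": "ON",
    "\u200D": "BN",     # ZERO WIDTH JOINER
    "\u0301": "NSM",    # COMBINING ACUTE ACCENT
    " ": "WS",
    "\u2029": "B",      # PARAGRAPH SEPARATOR
    "\t": "S",
    "\u202B": "RLE",
    "\u202C": "PDF",
    "\u2067": "RLI",
    "\u2069": "PDI",
}

mismatches = {
    f"U+{ord(ch):04X}": (expected, unicodedata.bidirectional(ch))
    for ch, expected in samples.items()
    if unicodedata.bidirectional(ch) != expected
}
print(mismatches or "all samples match")
```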

```python
import unicodedata

chars = [("A", "Latin"), ("ب", "Arabic"), ("5", "Digit"),
         ("\u200F", "RLM"), ("\u200E", "LRM")]

for char, label in chars:
    bc = unicodedata.bidirectional(char)
    print(f"  {label:12} U+{ord(char):04X}  Bidi={bc}")

# Output:
# Latin        U+0041  Bidi=L
# Arabic       U+0628  Bidi=AL
# Digit        U+0035  Bidi=EN
# RLM          U+200F  Bidi=R
# LRM          U+200E  Bidi=L
```

Why It Matters in Practice

Without correct bidi handling, a string like "Hello مرحبا World" displays incorrectly: drawn in storage order, the letters of مرحبا come out reversed, and punctuation next to RTL text can end up displaced. HTML provides the dir attribute (including dir="auto") and the <bdi> element. Unicode provides U+200F (RLM), U+200E (LRM), and the directional isolate characters (U+2066–U+2069) to guide the algorithm. Web developers working with RTL content must understand that the visual order of characters on screen differs from their logical (storage) order.
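In plain text, or anywhere an HTML <bdi> element is not available, a common pattern is to wrap interpolated text of unknown direction in FIRST STRONG ISOLATE (U+2068) and POP DIRECTIONAL ISOLATE (U+2069). The inserted run then picks its own direction without disturbing the surrounding sentence. The following is a sketch with a hypothetical `isolate` helper. The string keeps its logical order, and only the renderer reorders it:

```python
import unicodedata

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str) -> str:
    """Wrap text of unknown direction so it cannot affect its surroundings."""
    return f"{FSI}{text}{PDI}"


message = f"User {isolate('مرحبا')} sent 3 files"

# Logical (storage) order: code points stay in typing order,
# even though a renderer draws the Arabic run right-to-left.
for ch in isolate("مرحبا"):
    print(f"U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
```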

Quick Facts

Property	Value
Unicode property name	`Bidi_Class`
Short alias	`bc`
Number of values	23
Python function	`unicodedata.bidirectional(char)` → string code
Algorithm spec	Unicode Standard Annex #9 (UAX #9)
Key characters	U+200E LRM, U+200F RLM, U+2066–U+2069 isolates