What is a Bidirectional Category?

Property that determines how a character behaves in bidirectional text (LTR, RTL, weak, neutral). Used by the Unicode Bidirectional Algorithm to determine display order.

What is RTL (Right-to-Left)?

Text directionality in which characters flow from right to left. Used by Arabic, Hebrew, Thaana, and other scripts; requires the Bidirectional Algorithm to display correctly.

What is a Bidirectional Override Attack?

Use of Unicode directional formatting characters (U+202A–U+202E, U+2066–U+2069) to disguise filenames or malicious code. 'readme‮fdp.exe' displays as 'readmeexe.pdf'.

Algorithms

Bidirectional Algorithm

Algorithm that determines the display order of characters in mixed-direction text (for example, English + Arabic), using the characters' bidi categories and explicit directional overrides.

2022-08-08 · Updated 2024-10-08

The Challenge of Mixed-Direction Text

English reads left-to-right (LTR). Arabic and Hebrew read right-to-left (RTL). When you have both in the same paragraph — a common situation in multilingual documents, URLs in Arabic text, or numbers in Hebrew — the rendering engine needs a precise set of rules to determine the visual order of characters. That set of rules is the Unicode Bidirectional Algorithm (UBA), specified in Unicode Standard Annex #9.

The UBA operates on logical order (the order in which characters are stored) and produces visual order (the order in which glyphs are rendered on screen). Most of the time this is invisible to users: text simply displays correctly. When it goes wrong, however, punctuation and numbers can land on the wrong side of a phrase, whole runs can appear in reverse order, or security-relevant filenames can be displayed in a different order than they are stored.

Implicit vs. Explicit Directionality

The UBA assigns every character a bidi category taken from its Bidi_Class character property. UAX #9 defines 23 classes; common ones include:

Category	Abbr	Examples
Left-to-Right	L	Latin, Cyrillic, CJK
Right-to-Left	R	Hebrew
Arabic Letter	AL	Arabic, Thaana
European Number	EN	0–9
Common Separator	CS	`,` `.`
Paragraph Separator	B	newline
Boundary Neutral	BN	zero-width joiner, soft hyphen, most control codes
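Python's standard `unicodedata` module reports a character's Bidi_Class through `unicodedata.bidirectional()`, using the same abbreviations as the table. The sketch below is only an illustration: the helper name `bidi_classes` and the sample string are arbitrary, and the values come from whichever Unicode version the running Python ships with.

```python
import unicodedata


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Return (code point, Bidi_Class abbreviation) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


if __name__ == "__main__":
    # Latin A, Hebrew alef, Arabic beh, a digit, a comma, a newline, and ZWJ
    sample = "A\u05d0\u06285,\n\u200d"
    for code_point, bidi_class in bidi_classes(sample):
        print(code_point, bidi_class)
    # Expected classes, in order: L, R, AL, EN, CS, B, BN
```

`unicodedata.bidirectional()` returns an empty string for a character with no Bidi_Class value in Python's bundled database.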

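Using these categories, the algorithm assigns each character an embedding level (even = LTR, odd = RTL): weak types such as numbers and separators are resolved from their context, and neutrals take the direction of the surrounding strong text, or the embedding direction when the two sides disagree. Runs are then reversed according to their levels to produce the visual order. This implicit handling covers the vast majority of cases.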
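The final stage, turning resolved embedding levels into visual order, can be sketched in a few lines. This is a minimal Python illustration of rule L2 of UAX #9: from the highest level on the line down to the lowest odd level, reverse every maximal run of characters at that level or higher. It assumes the levels have already been computed (here they are written by hand) and leaves out line breaking, rule L1's whitespace reset, and mirroring of characters such as parentheses. `reorder_line` is a hypothetical name, and, following the convention of UAX #9's own examples, uppercase Latin letters stand in for RTL characters.

```python
def reorder_line(text: str, levels: list[int]) -> str:
    """Apply UAX #9 rule L2 to one line whose embedding levels are already resolved."""
    if len(text) != len(levels):
        raise ValueError("need exactly one embedding level per character")
    chars, lvls = list(text), list(levels)
    odd_levels = [lvl for lvl in lvls if lvl % 2 == 1]
    if not odd_levels:
        return text  # no odd level: every reversal would be undone by the next one
    highest, lowest_odd = max(lvls), min(odd_levels)
    # From the highest level down to the lowest odd level, reverse every
    # maximal run of characters whose level is at least the current level.
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(chars):
            if lvls[i] < level:
                i += 1
                continue
            j = i
            while j < len(chars) and lvls[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            lvls[i:j] = lvls[i:j][::-1]
            i = j
    return "".join(chars)


if __name__ == "__main__":
    # Uppercase letters play the role of RTL characters.
    print(reorder_line("abc DEF ghi", [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0]))  # abc FED ghi
    print(reorder_line("ABC def", [1, 1, 1, 1, 2, 2, 2]))  # def CBA
```

In the second call the line belongs to an RTL paragraph (level 1) containing an LTR phrase at level 2: the phrase is reversed twice, so it still reads left to right, while the RTL text around it is reversed once.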

Explicit Directional Formatting Characters

When implicit resolution produces the wrong order, Unicode provides directional formatting characters to override it:

Character	Code point	Name	Purpose
LRE	U+202A	Left-to-Right Embedding	Start LTR embedded text
RLE	U+202B	Right-to-Left Embedding	Start RTL embedded text
LRO	U+202D	Left-to-Right Override	Force LTR regardless of characters
RLO	U+202E	Right-to-Left Override	Force RTL regardless of characters
PDF	U+202C	Pop Directional Formatting	End embedding/override
LRI	U+2066	Left-to-Right Isolate	Isolate LTR run (Unicode 6.3+)
RLI	U+2067	Right-to-Left Isolate	Isolate RTL run (Unicode 6.3+)
FSI	U+2068	First Strong Isolate	Isolate a run whose direction is set by its first strong character (Unicode 6.3+)
PDI	U+2069	Pop Directional Isolate	End isolate (Unicode 6.3+)

The LRI/RLI/FSI/PDI isolate controls (added in Unicode 6.3) are preferred over the older embedding controls because an isolated run is fully contained: to the surrounding text it behaves like a single neutral character, so its contents cannot affect how neighboring characters are resolved, and vice versa.
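When interpolating text of unknown direction (a user name, a file name, a search term) into a sentence, the isolates are the natural tool. A minimal Python sketch, assuming a hypothetical helper `isolate()` that opens with LRI, RLI, or FSI and always closes with PDI:

```python
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"

_INITIATORS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text in a directional isolate; 'auto' uses FSI (first strong character)."""
    try:
        start = _INITIATORS[direction]
    except KeyError:
        raise ValueError(f"direction must be one of {sorted(_INITIATORS)}") from None
    return f"{start}{text}{PDI}"


if __name__ == "__main__":
    user_name = "\u05d3\u05e0\u05d4"  # a Hebrew name (Dana)
    message = f"{isolate(user_name)} posted 3 comments"
    print(ascii(message))  # '\u2068\u05d3\u05e0\u05d4\u2069 posted 3 comments'
```

Because UAX #9 rule P2 skips the contents of an isolate when looking for the first strong character, wrapping the name this way also keeps a plain-text paragraph that starts with it from being given an RTL base direction.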

Security: The Bidi Trojan Source Attack

The RLO character (U+202E) can be used maliciously to display a filename or code string in a different order than it is stored. A file named innocent‮fdp.exe (with an invisible RLO before "fdp") displays as innocentexe.pdf, disguising an executable as a PDF. The "Trojan Source" attack (CVE-2021-42574) applies the same idea to source code: bidi control characters placed in comments and string literals make code render differently in editors and review tools than the compiler interprets it. Mitigation: strip or escape U+202A–U+202E and U+2066–U+2069 in user-supplied text displayed in security contexts.
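The mitigation can be sketched with a regular expression over exactly the two ranges it names. This is an illustration rather than a complete defense: the hypothetical helpers below leave the implicit marks (LRM U+200E, RLM U+200F, ALM U+061C) untouched, and a stricter policy might escape those as well. Escaping is often preferable to silent stripping when showing file names or code to a human reviewer, because it makes the hidden character visible.

```python
import re

# Embeddings/overrides (U+202A-U+202E) and isolates (U+2066-U+2069).
_BIDI_CONTROLS = re.compile("[\u202a-\u202e\u2066-\u2069]")


def escape_bidi_controls(text: str) -> str:
    """Replace each bidi control character with a visible <U+XXXX> tag."""
    return _BIDI_CONTROLS.sub(lambda m: f"<U+{ord(m.group()):04X}>", text)


def strip_bidi_controls(text: str) -> str:
    """Remove bidi control characters entirely."""
    return _BIDI_CONTROLS.sub("", text)


if __name__ == "__main__":
    name = "innocent\u202efdp.exe"
    print(escape_bidi_controls(name))  # innocent<U+202E>fdp.exe
    print(strip_bidi_controls(name))   # innocentfdp.exe
```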

Quick Facts

Property	Value
Specification	Unicode Standard Annex #9 (UAX #9)
Also known as	UBA, Bidi Algorithm
Paragraph base direction	Determined by the first strong character (UAX #9 rules P2–P3), unless set explicitly by a higher-level protocol such as the HTML `dir` attribute
CSS property	`direction: rtl/ltr`, `unicode-bidi: embed/bidi-override/isolate`
HTML attribute	`dir="rtl"`, `dir="ltr"`, `dir="auto"`
Security risk	RLO spoofing (Trojan Source, CVE-2021-42574)
Preferred controls	Isolates (LRI/RLI/FSI/PDI) over legacy embeddings
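The first-strong rule for the paragraph base direction can be sketched as well. This simplified Python version of UAX #9 rules P2 and P3 scans a single paragraph, skips everything between an isolate initiator (LRI, RLI, FSI) and its matching PDI, and returns level 0 (LTR) or 1 (RTL). P3 sets the level to 0 when no strong character is found; the `default` parameter, an assumption of this sketch, stands in for a higher-level protocol overriding that. `paragraph_base_level` is a hypothetical name.

```python
import unicodedata

ISOLATE_INITIATORS = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDI = "\u2069"


def paragraph_base_level(text: str, default: int = 0) -> int:
    """Return 0 (LTR) or 1 (RTL) from the first strong character, skipping isolates."""
    depth = 0  # nesting depth of currently open isolates
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "B":
            break  # paragraph separator: only the first paragraph is examined
        if ch in ISOLATE_INITIATORS:
            depth += 1
        elif ch == PDI:
            if depth > 0:
                depth -= 1  # an unmatched PDI is ignored
        elif depth == 0:
            # Embedding/override controls are not L, R, or AL, so they are passed over.
            if bidi_class == "L":
                return 0
            if bidi_class in ("R", "AL"):
                return 1
    return default


if __name__ == "__main__":
    print(paragraph_base_level("hello"))                     # 0
    print(paragraph_base_level("123 \u05e9\u05dc\u05d5\u05dd"))  # 1
    print(paragraph_base_level("\u2067\u05e9\u05dc\u05d5\u05dd\u2069 hello"))  # 0
```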

Related terms

Bidirectional Category, RTL (Right-to-Left), Bidirectional Override Attack

More in Algorithms

Collation Algorithm

Standard algorithm for comparing and sorting Unicode strings using multi-level comparison: character …

Line Breaking Algorithm

Rules for determining where text may be broken to continue on the …

Case Folding

Mapping characters to a common case form for case-insensitive comparison. More comprehensive …

Composition Exclusion

Characters excluded from canonical composition (NFC) to avoid decomposition that is not …

Grapheme Cluster Boundary

Rules (UAX#29) for determining where one user-perceived character ends and another begins. …

Sentence Boundary

The position between sentences according to Unicode rules. More complex than …

Word Boundary

The position between words according to the word-breaking rules of …

NFC (Canonical Composition)

Normalization Form C: decompose, then recompose canonically, producing the …

NFD (Canonical Decomposition)

Normalization Form D: full decomposition without recomposition. Used by the system …

NFKC (Compatibility Composition)

Normalization Form KC: compatibility decomposition followed by canonical composition. Merges …
