What is Case Mapping?

Rules for converting characters between uppercase, lowercase, and titlecase. Can be locale-dependent (the Turkish I problem) and one-to-many (ß → SS).

What is the Collation Algorithm?

The standard algorithm for comparing and sorting Unicode strings using multi-level comparison: base character → accents → case → tie-breakers. Customizable per locale.

What is Normalization?

The process of converting Unicode text to a standard canonical form. Four forms: NFC (composed), NFD (decomposed), NFKC (compatibility composed), NFKD (compatibility decomposed).

Algorithms

Case Folding

Mapping characters to a common case form for case-insensitive comparison. More comprehensive than lowercasing: German ß → ss, Turkish İ → i (with locale considerations).

What is Case Folding?

Case folding is a Unicode operation that converts text to a form suitable for case-insensitive comparison. It is defined in the Unicode Standard and supported by the CaseFolding.txt data file in the Unicode Character Database. Case folding is closely related to, but distinct from, simple lowercasing: while lowercasing converts a string to its lowercase representation for display, case folding converts a string to a canonical form specifically optimized for string comparison regardless of case.

The practical difference is that case folding handles edge cases that simple lowercasing misses — particularly in languages with complex case mapping behavior.

CaseFolding.txt: The Data File

The Unicode Consortium publishes CaseFolding.txt as part of the Unicode Character Database. It maps each character to its case-folded form using one of four status codes:

Status	Meaning
C (Common)	Safe for all contexts; included in both simple and full folding
F (Full)	Full case folding only; maps one character to multiple characters
S (Simple)	Simple case folding only; maps one character to one character
T (Turkic)	Special folding for Turkic languages (overrides the default mappings for I and İ)
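
As a rough illustration of how the status codes combine, the sketch below parses a handful of lines in the CaseFolding.txt format (`code; status; mapping; # name`) and builds a simple table from the C and S entries and a full table from the C and F entries. The helper names `load_folding` and `fold` are hypothetical, and the embedded sample is only a small excerpt, not the complete data file.

```python
# A few lines in the CaseFolding.txt format: <code>; <status>; <mapping>; # <name>
SAMPLE = """\
0041; C; 0061; # LATIN CAPITAL LETTER A
0049; C; 0069; # LATIN CAPITAL LETTER I
0049; T; 0131; # LATIN CAPITAL LETTER I
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
0130; F; 0069 0307; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
"""


def load_folding(text, statuses):
    """Build {char: folded string} from entries whose status is in `statuses`."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        code, status, mapping = [field.strip() for field in line.split(";")[:3]]
        if status in statuses:
            folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            table[chr(int(code, 16))] = folded
    return table


def fold(s, table):
    """Fold a string with the given table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in s)


simple = load_folding(SAMPLE, {"C", "S"})  # one character to one character
full = load_folding(SAMPLE, {"C", "F"})    # may expand to several characters

print(fold("A\u1e9e", simple))  # capital sharp S folds to a single ß
print(fold("A\u1e9e", full))    # capital sharp S expands to "ss"
```

A Turkic table could be built the same way by letting the T entries override the defaults for I and İ.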

Simple vs. Full Case Folding

Simple case folding maps every character to exactly one character, so the number of characters never changes. It is suitable for environments where string length must be preserved.

Full case folding allows one character to map to a sequence of multiple characters. The classic example is the German sharp S:

ß (U+00DF, Latin Small Letter Sharp S)
Simple fold: ß → ß (no change: ß has only an F entry, so simple folding leaves it as-is)
Full fold: ß → ss (two characters)

This means that a full case-fold comparison of "STRASSE" and "Straße" would correctly identify them as equal (both fold to "strasse"), while a simple lowercase comparison would not.

# Python uses full case folding via str.casefold()
"STRASSE".casefold() == "Straße".casefold()  # True
"STRASSE".lower() == "Straße".lower()         # False

# The key difference
"Straße".casefold()  # "strasse"
"Straße".lower()     # "straße"  ← ß preserved

Python's str.casefold() implements full Unicode case folding, while str.lower() applies the Unicode lowercase mapping, which is meant for case conversion rather than comparison and leaves ß unchanged.
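
A minimal sketch of a case-insensitive comparison built on full case folding might look like the following. The function name `caseless_equal` is an assumption for illustration. It does not apply normalization or the Turkic mappings, so it reflects only the default, locale-independent behavior.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings case-insensitively using full case folding."""
    return a.casefold() == b.casefold()


print(caseless_equal("STRASSE", "Straße"))   # sharp S expands to "ss"
print(caseless_equal("\ufb01le", "FILE"))    # the fi ligature (U+FB01) expands to "fi"
print(len("ß"), len("ß".casefold()))         # full folding can lengthen a string
```

Because the folded form can be longer than the input, offsets computed on folded text do not necessarily line up with offsets in the original string.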

Locale-Sensitive Folding: The Turkish Problem

The most significant locale-specific case folding issue involves the Turkish and Azerbaijani I. In most languages:

Uppercase I → lowercase i
Uppercase İ does not exist (or is rare)

In Turkish and Azerbaijani:

Uppercase İ (U+0130, Latin Capital Letter I with Dot Above) → lowercase i (U+0069)
Uppercase I (U+0049, Latin Capital Letter I) → lowercase ı (U+0131, Latin Small Letter Dotless I)

The T (Turkic) status entries in CaseFolding.txt provide the Turkic-specific mappings. Standard Unicode case folding without the T entries is incorrect for Turkish text: it would map I → i rather than I → ı, causing "KISA" and "kısa" (meaning "short") to compare as unequal while "KISA" and "kisa" would compare as equal — the wrong result.

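# Python's str.casefold() is locale-independent and never applies the Turkic (T) mappings
"KISA".casefold() == "kısa".casefold()  # False: I folds to i, not ı
"KISA".casefold() == "kisa".casefold()  # True, which is wrong for Turkish
# For Turkish text, use a locale-aware library such as PyICU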
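
Where a full ICU binding is not available, the two T entries can be approximated in plain Python by applying them before the default fold. This is only a sketch under that assumption: `turkic_casefold` is a hypothetical helper, not a standard library function, and it should be used only on text known to be Turkish or Azerbaijani.

```python
# The two Turkic (T) mappings from CaseFolding.txt, applied before the default fold:
#   I (U+0049) -> ı (U+0131)   and   İ (U+0130) -> i (U+0069)
_TURKIC = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkic_casefold(s: str) -> str:
    """Case-fold text using the Turkic mappings, then the default full fold."""
    return s.translate(_TURKIC).casefold()


print(turkic_casefold("KISA") == turkic_casefold("k\u0131sa"))  # dotless forms match
print(turkic_casefold("KISA") == turkic_casefold("kisa"))       # dotted i stays distinct
```

Applying the T mappings first matters: once the default fold has turned I into i, the distinction between dotted and dotless forms is already lost.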

How Case Folding Differs from Lowercasing

Operation	Purpose	Handles ß→ss	Handles Turkish İ	String length
`str.lower()`	Display (lowercase)	No (ß→ß)	No (I→i)	Usually preserved (İ becomes i + U+0307)
`str.casefold()`	Comparison	Yes (ß→ss)	No	May increase
Turkic case fold	Comparison in TR/AZ	Yes	Yes	May increase

Quick Facts

Property	Value
Data file	`CaseFolding.txt` in Unicode Character Database
Status codes	C (Common), F (Full), S (Simple), T (Turkic)
Key difference from lower()	Full folding expands ß→ss
Turkish exception	I→ı and İ→i (T status entries)
Python lowercasing (not a fold)	`str.lower()`
Python full fold	`str.casefold()`
Use case	Case-insensitive string comparison and search

Related Terms

Case Mapping, Collation Algorithm, Normalization

More in Algorithms

Composition Exclusion

To prevent non-starter decomposition and ensure algorithmic stability, from canonical composition (NFC) …

Sentence Boundary

The position between sentences according to Unicode rules. More complex than splitting on periods …

Grapheme Cluster Boundary

Rules (UAX#29) for determining where one user-perceived character ends and another begins. …

Collation Algorithm

The standard algorithm for comparing and sorting Unicode strings using multi-level comparison: …

Word Boundary

The position between words as determined by Unicode word-breaking rules. Simple, space-based …

Text Segmentation

Algorithms for finding boundaries in text: grapheme cluster, word, and sentence boundaries. Cursor movement, …

NFC (Canonical Composition)

Normalization Form C: by canonically decomposing and then recomposing, the shortest form …

NFD (Canonical Decomposition)

Normalization Form D: fully decomposes without recomposing. By the macOS HFS+ file system …

NFKC (Compatibility Composition)

Normalization Form KC: compatibility decomposition followed by canonical composition. Visually similar characters …

NFKD (Compatibility Decomposition)

Normalization Form KD: compatibility decomposition without recomposition. The most aggressive normalization, the most …
