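Private Use Area (PUA)
A range of code points reserved so that organizations can assign their own characters. It comprises the BMP PUA (U+E000–U+F8FF) and the supplementary PUAs in planes 15 and 16.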
What is the Private Use Area?
The Private Use Area (PUA) refers to three ranges of Unicode code points that are permanently reserved for applications to define their own characters. Unlike most of the Unicode code space, PUA code points will never be assigned official characters by the Unicode Consortium. Instead, any organization can use them for proprietary characters — custom icons, corporate logos, game symbols, or glyphs not yet in Unicode.
There are three PUA regions in Unicode:
| Name | Range | Size |
|---|---|---|
| BMP Private Use Area | U+E000–U+F8FF | 6,400 code points |
| Supplementary Private Use Area A | U+F0000–U+FFFFD | 65,534 code points |
| Supplementary Private Use Area B | U+100000–U+10FFFD | 65,534 code points |
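Total: 137,468 code points (6,400 + 65,534 + 65,534). Each supplementary range stops two code points short of the end of its plane because U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are noncharacters, not private-use characters.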
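The total is simple arithmetic over the inclusive ranges. A minimal Python check, assuming the private-use ranges end at U+FFFFD and U+10FFFD (the last two code points of every plane are noncharacters):

```python
# PUA ranges as inclusive (first, last) code points.
PUA_RANGES = {
    "BMP PUA": (0xE000, 0xF8FF),
    "Supplementary PUA-A": (0xF0000, 0xFFFFD),
    "Supplementary PUA-B": (0x100000, 0x10FFFD),
}

# Size of an inclusive range is last - first + 1.
sizes = {name: last - first + 1 for name, (first, last) in PUA_RANGES.items()}
for name, size in sizes.items():
    print(f"{name}: {size:,}")

print(f"Total: {sum(sizes.values()):,}")  # Total: 137,468
```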
How the PUA is Used
Because PUA code points have no standard meaning, their interpretation is entirely up to the parties exchanging the text. This requires both sides to agree on a mapping — typically through a custom font that maps PUA code points to specific glyphs.
Common use cases:
- Icon fonts: Font Awesome, Material Icons, and similar libraries map their icons to PUA code points (Font Awesome, for example, uses U+F000 onward). The font renders each PUA code point as the intended icon.
- Corporate logo characters: Companies sometimes use PUA slots for brand marks in specialized documents.
- Pre-standardization characters: Klingon, Tengwar (Tolkien's Elvish script), and other scripts not yet in Unicode have community-defined PUA assignments, collected in the ConScript Unicode Registry (CSUR).
- Regional and historic writing systems: Script communities awaiting official Unicode encoding use the PUA to exchange text within their community.
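Whatever a PUA character means, it is stored and transmitted exactly like any other code point. The sketch below shows how a BMP PUA code point and a plane-15 PUA code point are serialized; U+F000 and U+F0001 are arbitrary examples, and `pua_encodings` is an illustrative helper name. BMP PUA characters take three bytes in UTF-8 and one UTF-16 code unit, while supplementary PUA characters take four bytes in UTF-8 and a surrogate pair in UTF-16.

```python
def pua_encodings(cp: int) -> tuple[bytes, bytes]:
    """Return the UTF-8 and UTF-16 (big-endian, no BOM) bytes for a code point."""
    ch = chr(cp)
    return ch.encode("utf-8"), ch.encode("utf-16-be")

for cp in (0xF000, 0xF0001):  # one BMP PUA and one plane-15 PUA code point
    utf8, utf16 = pua_encodings(cp)
    print(f"U+{cp:04X}: UTF-8 {utf8.hex(' ')} | UTF-16BE {utf16.hex(' ')}")

# Output:
# U+F000: UTF-8 ef 80 80 | UTF-16BE f0 00
# U+F0001: UTF-8 f3 b0 80 81 | UTF-16BE db 80 dc 01
```

Nothing in these bytes marks the character as private; a receiver can only tell from the code point range.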
The Interoperability Problem
PUA usage is inherently non-interoperable across applications or organizations unless both sides use the same font and the same mapping. The code point U+E001 might be a "thumbs up" icon in one font and a currency symbol in another. When text containing PUA characters moves between systems that use different fonts, the receiver sees the wrong glyph, or a blank box if none of its fonts covers that code point.
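The problem can be sketched with two hypothetical private mappings. `FONT_A`, `FONT_B`, and their contents are invented for illustration; they stand in for two icon fonts that happen to assign different meanings to the same code points.

```python
# Two hypothetical private agreements mapping PUA code points to meanings.
FONT_A = {0xE001: "thumbs up", 0xE002: "thumbs down"}
FONT_B = {0xE001: "currency sign", 0xE002: "calendar"}

def interpret(text: str, mapping: dict[int, str]) -> list[str]:
    """Describe each character; BMP PUA characters are looked up in the mapping."""
    result = []
    for ch in text:
        cp = ord(ch)
        if 0xE000 <= cp <= 0xF8FF:  # BMP PUA only, for brevity
            result.append(mapping.get(cp, "?"))  # '?' = not in this mapping
        else:
            result.append(ch)
    return result

message = "OK \uE001"
print(interpret(message, FONT_A))  # ['O', 'K', ' ', 'thumbs up']
print(interpret(message, FONT_B))  # ['O', 'K', ' ', 'currency sign']
```

The stored text is identical in both cases; only the mapping differs, which is why the mapping has to be agreed on alongside the data.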
```python
import unicodedata

# PUA code points have no official name
cp = 0xE001  # a BMP PUA code point
try:
    name = unicodedata.name(chr(cp))
except ValueError as e:
    print(e)  # no such name

category = unicodedata.category(chr(cp))
print(category)  # "Co" (Private Use)
```
PUA in Emoji History
Before emoji were standardized in Unicode 6.0 (2010), Japanese mobile carriers (DoCoMo, KDDI, SoftBank) each used their own PUA encodings for emoji. DoCoMo used the range U+E63E–U+E757; SoftBank used a different range. This is why early cross-carrier emoji were garbled — each carrier had a different PUA mapping. Unicode 6.0 unified these into standardized code points.
Detecting PUA Characters
```python
import unicodedata

def is_pua(char: str) -> bool:
    # General category "Co" is assigned only to private-use code points
    return unicodedata.category(char) == "Co"

print(is_pua("\uE001"))      # True (BMP PUA)
print(is_pua("\U000F0001"))  # True (Supplementary PUA A)
print(is_pua("A"))           # False
```
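`unicodedata.category` relies on the Unicode database bundled with the Python interpreter. Because the PUA ranges are permanently fixed, an explicit range check is an equally valid alternative and is easy to port to environments without a Unicode database. A minimal sketch (`is_pua_by_range` is an illustrative name); the supplementary ranges stop at U+FFFFD and U+10FFFD because the last two code points of each plane are noncharacters (category Cn):

```python
import unicodedata

def is_pua_by_range(char: str) -> bool:
    """Check a single character against the three fixed PUA ranges."""
    cp = ord(char)
    return (
        0xE000 <= cp <= 0xF8FF            # BMP PUA
        or 0xF0000 <= cp <= 0xFFFFD       # Supplementary PUA-A (plane 15)
        or 0x100000 <= cp <= 0x10FFFD     # Supplementary PUA-B (plane 16)
    )

# Cross-check against the "Co" category at and around each range boundary.
for cp in (0xDFFF, 0xE000, 0xF8FF, 0xF900, 0xFFFFD, 0xFFFFE, 0x10FFFD, 0x10FFFF):
    ch = chr(cp)
    assert is_pua_by_range(ch) == (unicodedata.category(ch) == "Co"), hex(cp)
print("range check matches unicodedata at all boundaries")
```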
Common Pitfalls
Assuming PUA characters are portable: Never embed PUA characters in data exchanged with external systems without documenting the required font/mapping.
Font Awesome characters in databases: Storing Font Awesome PUA icons in a database works only if the rendering system also uses Font Awesome. On different systems, PUA values appear as blank boxes or unrelated glyphs.
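One practical safeguard against both pitfalls is to scan outgoing text for PUA characters and report where they occur, so they can be replaced or documented before the data leaves your system. A minimal sketch using the same `Co` test as `is_pua`; `find_pua` is an illustrative name, not a library function, and U+F005 in the sample record simply stands for some icon-font glyph:

```python
import unicodedata

def find_pua(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]

record = "Rating: \uF005\uF005\uF005 (3 stars)"
hits = find_pua(record)
if hits:
    # 3 PUA character(s) found: [(8, 'U+F005'), (9, 'U+F005'), (10, 'U+F005')]
    print(f"{len(hits)} PUA character(s) found: {hits}")
```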
Quick Facts
| Property | Value |
|---|---|
| BMP PUA range | U+E000–U+F8FF |
| Supplementary PUA A | U+F0000–U+FFFFD |
| Supplementary PUA B | U+100000–U+10FFFD |
| Total PUA code points | 137,468 |
| General category | Co (Private Use) |
| Official character assignment | Never — permanently private |
| Common use | Icon fonts (Font Awesome, Material Icons) |
| Registry for scripts | CSUR (ConScript Unicode Registry) |