Unicode Scalar Value
Any code point except the surrogate code points (U+D800–U+DFFF). This is the set of valid values that can represent actual characters, 1,112,064 in total.
What is a Unicode Scalar Value?
A Unicode scalar value is any Unicode code point except the 2,048 surrogate code points (U+D800–U+DFFF). In other words, scalar values are all code points in the ranges U+0000–U+D7FF and U+E000–U+10FFFF, totaling 1,112,064 values.
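The range definition above translates directly into a membership test. The following is a minimal sketch in Python; the function name `is_scalar_value` and the constant names are illustrative choices, not part of any standard library.

```python
# Boundaries taken from the definition above.
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF
MAX_CODE_POINT = 0x10FFFF


def is_scalar_value(code_point: int) -> bool:
    """Return True if code_point is a Unicode scalar value."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        return False  # outside the Unicode code space entirely
    return not (SURROGATE_FIRST <= code_point <= SURROGATE_LAST)


print(is_scalar_value(0x41))      # True  (LATIN CAPITAL LETTER A)
print(is_scalar_value(0xD7FF))    # True  (last code point before the surrogates)
print(is_scalar_value(0xD800))    # False (first surrogate)
print(is_scalar_value(0xE000))    # True  (first code point after the surrogates)
print(is_scalar_value(0x110000))  # False (beyond U+10FFFF)
```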
The concept was introduced because UTF-8 and UTF-32 cannot encode surrogate code points — they are only meaningful in UTF-16 as part of surrogate pairs. A value that can validly be encoded in all three UTF forms is called a scalar value.
The Rust programming language uses char to represent exactly a Unicode scalar value,
making this concept central to Rust's string semantics.
Why Exclude Surrogates?
In UTF-16, surrogates only have meaning as paired code units. A lone surrogate (not part of a valid pair) represents a malformed sequence. Including surrogates in "valid Unicode scalars" would mean applications must handle them as standalone code points — but there is nothing meaningful they could represent.
By defining scalar values as the set of code points that can be independently encoded, Unicode avoids the ambiguity of "what does a lone surrogate mean outside UTF-16?"
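To illustrate why surrogates only make sense in pairs, here is a small sketch of how UTF-16 splits a supplementary scalar value into a high and a low surrogate code unit. The helper name `to_surrogate_pair` is hypothetical; the arithmetic follows the standard UTF-16 encoding scheme, and the result is cross-checked against Python's built-in `utf-16-be` codec.

```python
def to_surrogate_pair(scalar: int) -> tuple:
    """Split a supplementary scalar value (U+10000..U+10FFFF) into
    its UTF-16 (high, low) surrogate code units."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError("only supplementary scalar values need a surrogate pair")
    offset = scalar - 0x10000           # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00

# Cross-check against the built-in codec.
assert "\U0001F600".encode("utf-16-be") == bytes([0xD8, 0x3D, 0xDE, 0x00])
```

Neither half carries meaning on its own: only the combination of the two code units maps back to the scalar value U+1F600.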
All code points: U+0000–U+10FFFF = 1,114,112
Surrogate code points: U+D800–U+DFFF = 2,048
Unicode scalar values: all minus surrogates = 1,112,064
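The counts above can be re-derived with a few lines of arithmetic. This is only a sanity-check sketch; the variable names are illustrative.

```python
# Size of the whole code space: U+0000 through U+10FFFF inclusive.
all_code_points = 0x10FFFF + 1

# Surrogate block: U+D800 through U+DFFF inclusive.
surrogates = 0xDFFF - 0xD800 + 1

scalar_values = all_code_points - surrogates

print(f"{all_code_points:,}")  # 1,114,112
print(f"{surrogates:,}")       # 2,048
print(f"{scalar_values:,}")    # 1,112,064

# The same total, counted as the two scalar ranges.
assert scalar_values == (0xD7FF - 0x0000 + 1) + (0x10FFFF - 0xE000 + 1)
```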
Scalar Values in Rust
Rust's char type is defined as "a Unicode scalar value" — exactly 4 bytes, guaranteed to be in
the range U+0000–U+D7FF or U+E000–U+10FFFF:
// Rust: char = Unicode scalar value
let a: char = 'A'; // U+0041
let emoji: char = '😀'; // U+1F600
let max: char = '\u{10FFFF}'; // Maximum scalar value
// This would be a compile error (surrogates are not scalar values):
// let bad: char = '\u{D800}'; // ERROR: not a valid Unicode scalar
// u32::from(char) always gives a valid scalar value
println!("{}", u32::from(emoji)); // 128512
Scalar Values in Swift
Swift's Unicode.Scalar type mirrors Rust's concept:
let scalar = Unicode.Scalar("A")!
print(scalar.value) // 65
// Surrogate values are rejected:
let invalid = Unicode.Scalar(0xD800) // nil — not a scalar value
Scalar Values in Python
Python 3's str type is a sequence of Unicode code points. Technically, Python allows lone
surrogates in strings (using \ud800 etc.), but they are not proper scalar values:
# Python allows lone surrogates in a str, but they cannot be encoded
s = "\uD800"  # lone high surrogate
try:
    s.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc)  # encoding fails: a surrogate is not a scalar value
# Proper scalar values encode without error
s = "😀"
print(s.encode("utf-8"))  # b'\xf0\x9f\x98\x80'
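Because a Python `str` can hold lone surrogates, code that hands text to an encoder or to another system may want to check first that every code point is a scalar value. The following is one possible sketch; `contains_only_scalar_values` is a hypothetical helper name, not a standard library function.

```python
def contains_only_scalar_values(text: str) -> bool:
    """Return True if no code point in text is a surrogate (U+D800..U+DFFF)."""
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


print(contains_only_scalar_values("A\U0001F600"))  # True
print(contains_only_scalar_values("A\ud800"))      # False
```

A string that passes this check can be encoded as UTF-8, UTF-16, or UTF-32 without a `UnicodeEncodeError` caused by surrogates.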
Scalar Values vs Characters
Unicode scalar values are a subset of code points, but they are still not identical to user-perceived characters (grapheme clusters). A single grapheme cluster like 🏳️‍🌈 (rainbow flag) consists of four scalar values:
U+1F3F3 WAVING WHITE FLAG
U+FE0F VARIATION SELECTOR-16
U+200D ZERO WIDTH JOINER
U+1F308 RAINBOW
All four are valid scalar values, but together they form one grapheme cluster.
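The decomposition can be inspected in Python, where iterating over a `str` yields one code point at a time. This is a small sketch using the standard `unicodedata` module; note that the standard library does not segment grapheme clusters, so `len()` reports the number of code points, not the number of user-perceived characters.

```python
import unicodedata

# Rainbow flag written with explicit escapes so the sequence is unambiguous.
RAINBOW_FLAG = "\U0001F3F3\uFE0F\u200D\U0001F308"

for ch in RAINBOW_FLAG:
    # Print each scalar value with its formal Unicode character name.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(len(RAINBOW_FLAG))                  # 4 scalar values
print(len(RAINBOW_FLAG.encode("utf-8")))  # 14 bytes in UTF-8 (4 + 3 + 3 + 4)
```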
Quick Facts
| Property | Value |
|---|---|
| Definition | All code points except surrogates |
| Range | U+0000–U+D7FF and U+E000–U+10FFFF |
| Total count | 1,112,064 |
| Excluded | Surrogates U+D800–U+DFFF (2,048 values) |
| Rust type | char (exactly a Unicode scalar value) |
| Swift type | Unicode.Scalar |
| Encodable in UTF-8? | Yes — all scalar values have valid UTF-8 encodings |
| Encodable in UTF-32? | Yes |
| Encodable in UTF-16? | Yes (with surrogate pairs for supplementary scalars) |
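The last three rows of the table can be checked with Python's built-in codecs. The sketch below encodes one supplementary scalar value in each UTF form and then confirms that a lone surrogate is rejected by all of them; the helper names `encode_all_forms` and `encodable_everywhere` are illustrative.

```python
FORMS = ("utf-8", "utf-16-be", "utf-32-be")


def encode_all_forms(text: str) -> dict:
    """Encode text in each UTF form; raises UnicodeEncodeError on surrogates."""
    return {form: text.encode(form) for form in FORMS}


def encodable_everywhere(text: str) -> bool:
    """Return True if text encodes cleanly in UTF-8, UTF-16, and UTF-32."""
    try:
        encode_all_forms(text)
    except UnicodeEncodeError:
        return False
    return True


encoded = encode_all_forms("\U0001F600")
for form, data in encoded.items():
    print(f"{form:10} {data.hex()}")
    assert data.decode(form) == "\U0001F600"  # round-trips in every form

print(encodable_everywhere("\U0001F600"))  # True
print(encodable_everywhere("\ud800"))      # False: lone surrogate
```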