Text to Unicode Converter

The Text to Unicode Converter turns any string into its Unicode code points and back again. It offers five output formats: U+XXXX (the Unicode standard notation), \uXXXX (JavaScript escape), &#x; (HTML hex entity), &#; (HTML decimal entity), or bare decimal numbers. Emojis and rare CJK characters that use surrogate pairs in UTF-16 are decoded correctly via String.prototype.codePointAt, so you get the actual Unicode code point, not two half-code-units.

Built by Bob · Article by Lace · QA by Ben · Shipped May 9, 2026

How to use

1. Pick mode: Text → Unicode encodes characters into code points; Unicode → Text decodes a list of code points back into the original text.
2. Type or paste your text. The output appears instantly.
3. Encoding: pick the output format. U+XXXX is the Unicode standard notation. \uXXXX matches JavaScript string escapes. &#x; / &#; are the HTML entity forms. Decimal is just numbers.
4. Encoding: pick a separator (space, comma, none, or newline) to place between code points.
5. Decoding accepts any of the formats mixed together: U+0048 \u0065 &#x6C; 108 111 decodes to 'Hello'.
6. Tap Copy to grab the result.


Frequently asked questions

What's a Unicode code point?

A unique number assigned to every character in the Unicode standard (about 155,000 characters as of Unicode 16.0, released in 2024). 'A' is U+0041 (decimal 65), '世' is U+4E16, '👋' is U+1F44B. The encoding (UTF-8, UTF-16, UTF-32) determines how the code point is stored as bytes; the code point itself is just the abstract character ID.

Why are there 5 output formats?

Different ecosystems use different conventions. U+XXXX is what the Unicode Consortium uses in documentation. \uXXXX is JavaScript's string-escape syntax. &#x; and &#; are the HTML entity formats (hex and decimal). Plain decimal is what databases and CSVs typically store. Pick whichever matches the place you're pasting.

What about emojis?

Handled correctly. Emojis like 👋 (U+1F44B) live above U+FFFF, so JavaScript stores them as a surrogate pair (two 16-bit code units) — but a naive .charCodeAt(0) only gives you the first half. This tool uses .codePointAt(0) which returns the full code point. Emoji decode the same way: \u{1F44B} or U+1F44B both produce 👋.

Can I round-trip — encode then decode?

Yes — encoding produces a list of code points; decoding accepts that list and reconstructs the original text. The two modes are inverses for any valid Unicode input.

What's the difference between Unicode and UTF-8?

Unicode is the character set (which characters exist, what each one's code point is). UTF-8, UTF-16, and UTF-32 are encodings: different ways to store those code points as actual bytes. UTF-8 uses 1-4 bytes per code point; UTF-32 always uses 4. This tool works at the code point level, which is encoding-agnostic.

Why don't all my decimal numbers decode?

Bare decimal numbers are ambiguous — '108' could be a code point or just the number 108. The decoder requires a 2+ digit number to be considered a code point candidate, and it must be a valid Unicode value (0 to 0x10FFFF). Stray 0s and 1s in surrounding text get ignored.

What's the biggest valid Unicode code point?

U+10FFFF (decimal 1,114,111). Anything above that is invalid, and the encoder/decoder will refuse it. The space is divided into 17 'planes' of 65,536 code points each: plane 0 (the BMP, Basic Multilingual Plane) covers most modern scripts, while the supplementary planes 1-16 hold emoji, historic scripts, less common CJK, and private-use areas.
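Since every plane holds exactly 65,536 (0x10000) code points, the plane number falls out of a single bit shift. A minimal sketch, where planeOf is a hypothetical helper name and not part of the tool:

```javascript
// Hypothetical helper (not the tool's code): which of the 17 planes
// does a code point fall in?
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10FFFF) {
    throw new RangeError(`Not a valid Unicode code point: ${codePoint}`);
  }
  // Each plane spans 0x10000 values, so dropping the low 16 bits
  // leaves the plane number (0 through 16).
  return codePoint >> 16;
}

console.log(planeOf(0x0041));   // 0  (BMP)
console.log(planeOf(0x1F44B));  // 1  (where most emoji live)
console.log(planeOf(0x10FFFF)); // 16 (the last plane)
```

String.fromCodePoint applies the same upper limit: passing it 0x110000 throws a RangeError.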

Is my text saved or sent anywhere?

No. Everything happens in your browser. Nothing is uploaded, logged, or stored.


What Unicode Code Points Are

Every character in every script — Latin, Cyrillic, CJK, emoji, mathematical symbols, ancient hieroglyphs — has a unique number assigned by the Unicode Consortium. A is U+0041 (decimal 65). 世 is U+4E16. 👋 is U+1F44B. The number is called a code point; the encoding (UTF-8, UTF-16, UTF-32) determines how the code point is stored as bytes. Code points themselves are encoding-agnostic — they're the abstract identity of each character.

The Text to Unicode Converter goes both directions: encode any text into its code points (in five different formats), and decode a list of code points back into text. The conversion handles emoji and rare CJK correctly, which sounds trivial but isn't — naive JavaScript breaks on anything outside the Basic Multilingual Plane.

How the Microapp Text to Unicode Converter Works

Pick mode (encode or decode). Encode mode: type or paste text, pick the output format (U+XXXX, \uXXXX, &#x; HTML hex, &#; HTML decimal, or plain decimal numbers), pick a separator (space, comma, none, or newline). The output appears below as you type.

Decode mode: paste any list of code points in any of the five formats — they can even be mixed in the same input. The decoder finds them with a regex and reconstructs the original text. Both modes use String.prototype.codePointAt and String.fromCodePoint, the modern JavaScript APIs that handle the entire Unicode space correctly.
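The regex-based decoding described above could look roughly like the sketch below. The tool's actual pattern is not shown here, so the TOKEN regex and the decodeCodePoints name are assumptions made for illustration; the sketch simply ignores anything that is not a recognizable token.

```javascript
// Illustrative pattern only; the tool's real regex may differ.
// Alternatives, in order: U+XXXX, \u{X...}, \uXXXX, &#xHEX;, &#DEC;, bare decimal.
const TOKEN =
  /U\+([0-9A-Fa-f]{4,6})|\\u\{([0-9A-Fa-f]{1,6})\}|\\u([0-9A-Fa-f]{4})|&#x([0-9A-Fa-f]{1,6});|&#(\d{1,7});|\b(\d{2,7})\b/g;

function decodeCodePoints(input) {
  let out = "";
  for (const m of input.matchAll(TOKEN)) {
    // Groups 1-4 captured hexadecimal digits, groups 5-6 decimal digits.
    const hex = m[1] ?? m[2] ?? m[3] ?? m[4];
    const dec = m[5] ?? m[6];
    const cp = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Skip values beyond the last valid code point, U+10FFFF.
    if (cp <= 0x10FFFF) out += String.fromCodePoint(cp);
  }
  return out;
}

console.log(decodeCodePoints("U+0048 \\u0065 &#x6C; 108 111")); // Hello
console.log(decodeCodePoints("U+0048 U+0069 U+0020 U+1F44B"));  // Hi 👋
```

A pair of \uD83D \uDC4B escapes also decodes to 👋 in this sketch, because the two surrogate code units end up adjacent in the output string.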

Worked example. Encode Hi 👋:
U+0048 U+0069 U+0020 U+1F44B
Notice the wave emoji becomes a single code point (U+1F44B), not two surrogate halves (U+D83D U+DC4B). Decode the same string back and you get Hi 👋. Round-trip safe.

Five Output Formats — Why So Many?

Different ecosystems use different conventions for representing the same code point. Pick the one that matches the place you're pasting:

Format	Example	Where you'll see it
U+XXXX	`U+0048`	Unicode Consortium docs, character pickers, OS info dialogs
\uXXXX	`\u0048`	JavaScript and JSON string literals
&#x; HTML hex	`&#x48;`	HTML entities (hexadecimal form)
&#; HTML decimal	`&#72;`	HTML entities (decimal form)
Plain decimal	`72`	Database fields, CSVs, raw integer arrays
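To make the table concrete, here is a rough sketch of how one code point can be rendered in each of the five formats. The function names and the format keys ("unicode", "js", "htmlHex", "htmlDec", "decimal") are invented for this example and are not the tool's internals.

```javascript
// Sketch only: names and format keys are assumptions for illustration.
function formatCodePoint(cp, format) {
  const hex = cp.toString(16).toUpperCase();
  switch (format) {
    case "unicode": return "U+" + hex.padStart(4, "0");
    // Above U+FFFF the four-digit \uXXXX form cannot hold the value.
    case "js":      return cp > 0xFFFF ? `\\u{${hex}}` : "\\u" + hex.padStart(4, "0");
    case "htmlHex": return `&#x${hex};`;
    case "htmlDec": return `&#${cp};`;
    case "decimal": return String(cp);
    default: throw new Error(`Unknown format: ${format}`);
  }
}

function encodeText(text, format, separator = " ") {
  // Array.from walks the string by code point, so a surrogate pair
  // reaches the callback as one two-unit string.
  return Array.from(text, (ch) => formatCodePoint(ch.codePointAt(0), format)).join(separator);
}

console.log(encodeText("Hi 👋", "unicode"));      // U+0048 U+0069 U+0020 U+1F44B
console.log(encodeText("Hi 👋", "htmlDec", ","));  // &#72;,&#105;,&#32;,&#128075;
```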

The Surrogate-Pair Trap

Code points up to U+FFFF fit in a single 16-bit JavaScript "code unit." Code points above that (emoji, ancient scripts, supplementary CJK) require surrogate pairs: two 16-bit values that together encode one code point. Naive code that uses str.charCodeAt(0) only sees the first half of a surrogate pair, which is meaningless on its own.

Example of the trap: "👋".charCodeAt(0) returns 55357 (the high-surrogate half), not the actual code point 128075. Old code that splits text into "characters" using array indexing breaks on emoji for the same reason. Modern str.codePointAt(0) looks at both halves and returns the real code point. The same fix applies on output: String.fromCharCode(0x1F44B) truncates the value to 16 bits and produces the wrong character (U+F44B); String.fromCodePoint(0x1F44B) produces 👋.
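The trap is easy to reproduce in any browser console or in Node. A short demonstration using only built-in string methods:

```javascript
const wave = "👋"; // U+1F44B, stored as two UTF-16 code units

console.log(wave.length);         // 2
console.log(wave.charCodeAt(0));  // 55357  (0xD83D, high surrogate)
console.log(wave.charCodeAt(1));  // 56395  (0xDC4B, low surrogate)
console.log(wave.codePointAt(0)); // 128075 (0x1F44B, the real code point)

// Indexing splits the pair; spreading iterates by code point.
console.log(wave[0] === wave);    // false
console.log([...wave].length);    // 1

// fromCharCode keeps only the low 16 bits of its argument.
const truncated = String.fromCharCode(0x1F44B);
console.log(truncated === "\uF44B");                   // true
console.log(String.fromCodePoint(0x1F44B) === wave);   // true
```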

This tool uses the modern APIs throughout, which is why emoji round-trip cleanly.

What's the Difference Between Unicode and UTF-8?

Unicode is the character set — it answers "which characters exist and what's each one's code point?" UTF-8, UTF-16, and UTF-32 are encodings — they answer "how do I store these code points as actual bytes?"

UTF-8 uses 1-4 bytes per code point depending on the value (ASCII gets 1 byte; emoji get 4). UTF-16 uses 2 or 4 bytes. UTF-32 always uses 4. UTF-8 won the web because it's backwards-compatible with ASCII (an ASCII file is also a valid UTF-8 file) and it doesn't waste bytes on Latin text. This tool works at the code point level — encoding-agnostic — so the output is the same regardless of how your text is stored.
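The byte counts above can be checked from JavaScript. In this sketch, the UTF-8 size comes from the standard TextEncoder API, while the UTF-16 and UTF-32 sizes are computed from code unit and code point counts; the three helper names are made up for the example.

```javascript
// Helper names are illustrative. TextEncoder always produces UTF-8.
const encoder = new TextEncoder();

function utf8Bytes(text)  { return encoder.encode(text).length; }
function utf16Bytes(text) { return text.length * 2; }             // 2 bytes per code unit
function utf32Bytes(text) { return Array.from(text).length * 4; } // 4 bytes per code point

for (const ch of ["A", "世", "👋"]) {
  console.log(ch, utf8Bytes(ch), utf16Bytes(ch), utf32Bytes(ch));
}
// A  1 2 4
// 世 3 2 4
// 👋 4 4 4
```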

Common Pitfalls

Mixing escape syntaxes. JavaScript's \uXXXX only handles up to U+FFFF; for larger code points you need \u{XXXXX} with curly braces (introduced in ES2015). The encoder uses the right form automatically based on the code point's value.
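One related wrinkle: JSON accepts only the four-digit \uXXXX form, so a code point above U+FFFF has to be written there as two escapes, one per surrogate. The sketch below shows both spellings. The toJsEscape name and its braces option are hypothetical, and the surrogate-pair branch is an alternative shown for comparison, not something the tool is documented to output.

```javascript
// Hypothetical helper: escape one code point for a JS or JSON string.
function toJsEscape(cp, { braces = true } = {}) {
  const hex4 = (n) => n.toString(16).toUpperCase().padStart(4, "0");
  if (cp <= 0xFFFF) return "\\u" + hex4(cp);
  if (braces) return "\\u{" + cp.toString(16).toUpperCase() + "}";
  // UTF-16 surrogate arithmetic: subtract 0x10000, then split the
  // remaining 20 bits into a high half and a low half of 10 bits each.
  const offset = cp - 0x10000;
  const high = 0xD800 + (offset >> 10);
  const low = 0xDC00 + (offset & 0x3FF);
  return "\\u" + hex4(high) + "\\u" + hex4(low);
}

console.log(toJsEscape(0x48));                       // \u0048
console.log(toJsEscape(0x1F44B));                    // \u{1F44B}
console.log(toJsEscape(0x1F44B, { braces: false })); // \uD83D\uDC4B

// The surrogate form survives a trip through JSON.parse.
const parsed = JSON.parse('"' + toJsEscape(0x1F44B, { braces: false }) + '"');
console.log(parsed === "👋"); // true
```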

Combining characters. Some characters render as one visual glyph but are encoded as multiple code points — accented letters can be a base letter plus a combining accent (é = U+0065 + U+0301) or a single precomposed character (é = U+00E9). The encoder shows you the actual code points, which can be more than the visible character count. Use Unicode normalization (str.normalize("NFC")) if you need a canonical form.
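The two spellings of é look identical on screen but compare as different strings until they are normalized. A quick check with the built-in String.prototype.normalize:

```javascript
const decomposed = "e\u0301"; // base letter + combining acute accent
const precomposed = "\u00E9"; // single precomposed character

console.log(Array.from(decomposed).length);   // 2 code points
console.log(Array.from(precomposed).length);  // 1 code point
console.log(decomposed === precomposed);      // false

// NFC composes where possible; NFD decomposes.
console.log(decomposed.normalize("NFC") === precomposed);  // true
console.log(precomposed.normalize("NFD") === decomposed);  // true
```

Normalizing both sides to the same form before comparing or counting avoids this mismatch.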

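Skin-tone and ZWJ-joined emoji. Compound emoji like 👨‍👩‍👧 are sequences of multiple emoji joined by zero-width joiners (U+200D). The encoder shows every code point in the sequence, usually 5 or more for family emoji. A skin-tone variant such as 👋🏽 is likewise two code points: the base emoji followed by a modifier in the range U+1F3FB to U+1F3FF. That's correct, just longer than expected.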
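Listing the code points makes these sequences visible. In this sketch, listCodePoints is a hypothetical helper, and the emoji are written as escapes so the invisible joiners can be seen in the source:

```javascript
// Hypothetical helper: every code point of a string in U+XXXX notation.
function listCodePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Family emoji: man + ZWJ + woman + ZWJ + girl
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(listCodePoints(family).join(" "));
// U+1F468 U+200D U+1F469 U+200D U+1F467
console.log(family.length); // 8 UTF-16 code units for one visible glyph

// Waving hand + medium skin-tone modifier
const wavingMedium = "\u{1F44B}\u{1F3FD}";
console.log(listCodePoints(wavingMedium).join(" "));
// U+1F44B U+1F3FD
```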

Related Tools

For browsing emoji visually, use the Emoji Picker or search by name with the Emoji Search. To encode characters as HTML entities for embedding in pages, the HTML Encoder/Decoder is the right tool. For URL-encoding (different from Unicode encoding — covers reserved URL characters), see the URL Encoder/Decoder. To encode arbitrary binary as ASCII-safe text, use the Base64 Encoder/Decoder.
