Improve UTF-8 decoding and encoding functions #410
Merged
- Ensure proper UTF-8 encoding (1 to 4 bytes).
- Handle invalid encodings (return 0xFFFD and consume a single byte).
- Individually encoded surrogate code points are accepted.
- Add `utf8_scan()` to analyze a byte array for UTF-8 contents: it detects invalid encodings and computes the number of codepoints and the content kind (plain ASCII, 8-bit, 16-bit or larger codepoints).
- Add `utf8_encode_len(c)` to compute the number of bytes needed to encode `c`.
- Rename `unicode_to_utf8` as `utf8_encode`.
- Rename `unicode_from_utf8` as `utf8_decode`.
- Add `utf8_decode_buf8(dest, size, src, len)` to decode a UTF-8 encoded byte array known to contain only ASCII and 8-bit codepoints.
- Add `utf8_decode_buf16(dest, size, src, len)` to decode a UTF-8 encoded byte array into an array of 16-bit codepoints, using UTF-16 surrogate pairs for non-BMP codepoints.
- Add `utf8_encode_buf8(dest, size, src, len)` to encode an array of 8-bit codepoints as a UTF-8 encoded null-terminated string.
- Add `utf16_encode_buf8(dest, size, src, len)` to encode an array of 16-bit codepoints (including surrogate pairs) as a UTF-8 encoded null-terminated string.
- Rework `JS_AtomGetStrRT` and `JS_NewStringLen` using the above functions.

This commit is a preliminary step for another PR fixing some `JSAtom` creation inconsistencies and inefficiencies.
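
For reviewers who want a concrete picture of the decoding rules described above, here is a minimal sketch in C. It is not the code from this PR: the names `sketch_utf8_encode_len` and `sketch_utf8_decode`, and their signatures, are hypothetical and chosen only for illustration. It assumes the behavior stated in the description: sequences of 1 to 4 bytes, 0xFFFD returned with a single byte consumed on invalid input, and individually encoded surrogates accepted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the PR's utf8_encode_len): number of bytes
   needed to encode code point c, assuming c <= 0x10FFFF. */
static size_t sketch_utf8_encode_len(uint32_t c)
{
    if (c < 0x80)
        return 1;
    if (c < 0x800)
        return 2;
    if (c < 0x10000)
        return 3;
    return 4;
}

/* Hypothetical decoder (not the PR's utf8_decode): decode one code point
   from p[0..len) and store the number of bytes consumed in *consumed.
   On invalid input, return 0xFFFD and consume a single byte.
   Individually encoded surrogates (0xD800..0xDFFF) are accepted. */
static uint32_t sketch_utf8_decode(const uint8_t *p, size_t len,
                                   size_t *consumed)
{
    /* Smallest code point allowed for 1, 2, 3 and 4 byte sequences;
       anything below is an overlong encoding. */
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    uint32_t c;
    size_t n, i;

    if (len == 0) {
        *consumed = 0;
        return 0xFFFD;
    }
    *consumed = 1;              /* also the value used on every error path */
    c = p[0];
    if (c < 0x80)
        return c;               /* plain ASCII */
    if (c >= 0xC2 && c <= 0xDF) {
        n = 1; c &= 0x1F;       /* 2-byte sequence */
    } else if (c >= 0xE0 && c <= 0xEF) {
        n = 2; c &= 0x0F;       /* 3-byte sequence */
    } else if (c >= 0xF0 && c <= 0xF4) {
        n = 3; c &= 0x07;       /* 4-byte sequence */
    } else {
        return 0xFFFD;          /* stray continuation or invalid lead byte */
    }
    if (len < n + 1)
        return 0xFFFD;          /* truncated sequence */
    for (i = 1; i <= n; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 0xFFFD;      /* missing continuation byte */
        c = (c << 6) | (p[i] & 0x3F);
    }
    if (c < min_cp[n] || c > 0x10FFFF)
        return 0xFFFD;          /* overlong or out of range */
    *consumed = n + 1;
    return c;
}
```

Consuming exactly one byte on error lets a caller resynchronize on the next byte, so a single bad byte yields a single replacement character instead of swallowing valid data that follows it.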