Invalid UTF-8 literal strings with leading surrogate character at end of string #10973
Related: #10274
I've been running into further inconsistencies and bugs with character and string literals and would appreciate some guidance. I would consider both of these to be bugs: you shouldn't have single surrogate characters in UTF-8.
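A lone surrogate such as U+D800 would encode to the bytes `ED A0 80`, which a strict UTF-8 decoder must reject. The sketch below illustrates that rule. It is written in Python rather than the project's own code, and the function name `is_valid_utf8` is hypothetical. It rejects overlong forms, code points above U+10FFFF, truncated sequences, and encoded surrogates.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Strict UTF-8 check (illustrative sketch): rejects overlong forms,
    code points above U+10FFFF, truncated sequences, and encoded
    surrogates (U+D800..U+DFFF)."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            i += 1  # ASCII byte
            continue
        if 0xC2 <= b0 <= 0xDF:
            length, cp = 2, b0 & 0x1F
        elif 0xE0 <= b0 <= 0xEF:
            length, cp = 3, b0 & 0x0F
        elif 0xF0 <= b0 <= 0xF4:
            length, cp = 4, b0 & 0x07
        else:
            # Stray continuation byte, C0/C1 (always overlong), or F5..FF.
            return False
        if i + length > n:
            return False  # sequence truncated at end of input
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                return False  # expected a continuation byte 10xxxxxx
            cp = (cp << 6) | (b & 0x3F)
        min_cp = (0, 0, 0x80, 0x800, 0x10000)[length]
        if cp < min_cp or cp > 0x10FFFF:
            return False  # overlong encoding or beyond Unicode range
        if 0xD800 <= cp <= 0xDFFF:
            return False  # surrogates are never valid in UTF-8
        i += length
    return True
```

For example, `is_valid_utf8(b"\xed\xa0\x80")` (the encoded form of U+D800) is rejected, while `b"\xed\x9f\xbf"` (U+D7FF, the code point just below the surrogate range) is accepted.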
We completely ignored surrogate-pair issues when writing this code. This should ideally be fixed if it can be done without wrecking performance, but I'm not entirely sure what major trouble it causes.
Well, in the conversion code I've been writing, I'm getting it 3 to 60 times faster so far (and that's only testing short strings and 64K strings, not anything really large), and all of the conversions check for invalid surrogate pairs. So I do think it can be done without sacrificing performance!
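The kind of surrogate-pair check mentioned above, including the case in this issue's title (a leading surrogate at the very end of the string), can be sketched as a UTF-16 to UTF-8 conversion. This is a hypothetical Python illustration, not the conversion code the comment refers to. The name `utf16_to_utf8` and the representation of input as a list of 16-bit code units are assumptions, and the sketch makes no attempt at the performance described above.

```python
def utf16_to_utf8(units: list[int]) -> bytes:
    """Convert UTF-16 code units to UTF-8 (illustrative sketch), raising
    ValueError on any unpaired surrogate, including a leading (high)
    surrogate at the end of the input."""
    out = bytearray()
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # Leading (high) surrogate: must be followed by a trailing one.
            if i + 1 >= n:
                raise ValueError(f"leading surrogate 0x{u:04X} at end of string")
            lo = units[i + 1]
            if not 0xDC00 <= lo <= 0xDFFF:
                raise ValueError(
                    f"leading surrogate 0x{u:04X} at index {i} "
                    "not followed by a trailing surrogate")
            # Combine the pair into one supplementary-plane code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired trailing surrogate 0x{u:04X} at index {i}")
        else:
            cp = u  # BMP code point outside the surrogate range
            i += 1
        # cp is now a valid Unicode scalar value, so Python's encoder is safe here.
        out += chr(cp).encode("utf-8")
    return bytes(out)
```

For instance, `[0xD83D, 0xDE00]` combines into U+1F600, whereas `[0x0041, 0xD800]` fails because the leading surrogate has nothing to pair with.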
Closed by #11203
According to the comments in issue #10 by @StefanKarpinski, `"\ud800"` should give an error, since that would be an invalid UTF-8 string.
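An escape that names a surrogate code point cannot produce valid UTF-8, so the lexer can reject it outright instead of emitting bytes. The sketch below shows that check in Python. It is not the project's own parser. It assumes a simplified escape grammar (only `\\`, `\"`, `\n`, and `\u` followed by exactly four hex digits), and the function name `unescape_literal` is hypothetical.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def unescape_literal(body: str) -> str:
    """Expand escapes in the raw text between a literal's quotes
    (simplified, hypothetical grammar), rejecting \\u escapes that
    name a surrogate code point."""
    out = []
    i, n = 0, len(body)
    while i < n:
        c = body[i]
        if c != "\\":
            out.append(c)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("dangling backslash at end of literal")
        e = body[i + 1]
        if e in ('\\', '"'):
            out.append(e)
            i += 2
        elif e == "n":
            out.append("\n")
            i += 2
        elif e == "u":
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or not all(d in HEX_DIGITS for d in digits):
                raise ValueError("\\u must be followed by exactly four hex digits")
            cp = int(digits, 16)
            if 0xD800 <= cp <= 0xDFFF:
                # A lone surrogate has no valid UTF-8 encoding: reject it.
                raise ValueError(f"invalid escape \\u{digits}: surrogate code point")
            out.append(chr(cp))
            i += 6
        else:
            raise ValueError(f"unsupported escape \\{e}")
    return "".join(out)
```

With this sketch, the source text `\ud800` raises an error, while `\u00e9` expands to `é`. An escaped backslash such as `\\u0041` is kept literally rather than being treated as a `\u` escape.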