Implement std::convert traits for char #35755
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
    ///
    /// Surrogates are used in the UTF-16 encoding, and therefore are not characters.
    SurrogateCodePoint,
}
We're never going to need to add any extra cases to this enum, right? Should we stick a `__ForExtensibility` variant just in case?
I personally feel okay about the… cc @rust-lang/libs
Seems reasonable to me, but I'd prefer to use an opaque struct with optional method accessors rather than an enum for the error type in…
249b789 to 82678c5
@sfackler Re extensibility, I don’t expect this to ever be needed. The range of Unicode Scalar Values changed exactly once in the history of Unicode. At first it was “16 bits ought to be enough for everybody”: 0x0000...0xFFFF. When that turned out not to be enough and a lot of systems were already using…

In Unicode 9.0, 76% of the million and some code points are unassigned, so they’re not expected to run out. And breaking compatibility with UTF-16 is such a breaking change that I imagine it’s not even considered. And this concern disappears with…

@alexcrichton Yeah, I also considered an opaque struct, even without an accessor method, since I can’t think of a use case for it. (Code like a WTF-8 implementation that wants to deal with surrogate code points will likely do its own code point arithmetic anyway.) And a method can always be added later. I’ve updated the PR.
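For readers following the discussion of which `u32` values are valid `char`s, here is a minimal sketch. It assumes a current toolchain where `TryFrom<u32> for char` is stable; the helper names `is_scalar_value` and `to_char` are hypothetical and not part of this PR.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: true if `cp` is a Unicode scalar value, i.e. it is
/// at most 0x10FFFF and is not a UTF-16 surrogate (0xD800..=0xDFFF).
fn is_scalar_value(cp: u32) -> bool {
    cp <= 0x10FFFF && !(0xD800..=0xDFFF).contains(&cp)
}

/// Hypothetical helper: fallible conversion that discards the opaque error.
fn to_char(cp: u32) -> Option<char> {
    char::try_from(cp).ok()
}

fn main() {
    // Ordinary code points convert successfully.
    assert_eq!(to_char(0x41), Some('A'));
    // Surrogates and values past the end of the range are rejected.
    assert_eq!(to_char(0xD800), None);
    assert_eq!(to_char(0x110000), None);

    // The fallible conversion agrees with the range check at the boundaries.
    for &cp in &[0u32, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000] {
        assert_eq!(char::try_from(cp).is_ok(), is_scalar_value(cp));
    }
}
```

Because the error is opaque, the caller learns only that the conversion failed, which matches the point above that code needing to distinguish surrogates would likely do its own arithmetic.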
Discussed during @rust-lang/libs triage today, conclusion was to merge. Thanks for the update @SimonSapin! @bors: r+
📌 Commit 82678c5 has been approved by
…hton Implement std::convert traits for char This is motivated by avoiding the `as` operator, which sometimes silently truncates, and instead use conversions that are explicitly lossless and infallible. I’m less certain that `From<u8> for char` should be implemented: while it matches an existing behavior of `as`, it’s not necessarily the right thing to use for non-ASCII bytes. It effectively decodes bytes as ISO/IEC 8859-1 (since Unicode designed its first 256 code points to be compatible with that encoding), but that is not apparent in the API name.
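The concern about non-ASCII bytes can be illustrated with a short sketch. The function name `decode_latin1` is hypothetical; it only wraps the `From<u8> for char` conversion discussed here.

```rust
/// Hypothetical helper: maps each byte to the code point with the same
/// value, which amounts to decoding the input as ISO/IEC 8859-1.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    // 0xE9 is 'é' in ISO/IEC 8859-1 and U+00E9 in Unicode.
    let latin1 = [0x63, 0x61, 0x66, 0xE9];
    assert_eq!(decode_latin1(&latin1), "café");

    // The same bytes are not valid UTF-8.
    assert!(std::str::from_utf8(&latin1).is_err());

    // Applying the conversion to UTF-8 bytes (0xC3 0xA9 for 'é') yields
    // two unrelated characters instead of one.
    assert_eq!(decode_latin1("é".as_bytes()), "Ã©");
}
```

The conversion never fails, but whether its result is meaningful depends on the bytes really being ISO/IEC 8859-1, which is the ambiguity raised in the description.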
☔ The latest upstream changes (presumably #35656) made this pull request unmergeable. Please resolve the merge conflicts.
@@ -176,6 +172,41 @@ pub unsafe fn from_u32_unchecked(i: u32) -> char {
     transmute(i)
 }

 #[stable(feature = "char_convert", since = "1.12.0")]
Should these not be "1.13.0"?
Not at the time I first opened this PR, but now yes. Fixed.
🔒 Merge conflict
These fit with other From implementations between integer types. This helps the coding style of avoiding the 'as' operator that sometimes silently truncates, and signals that these specific conversions are lossless and infallible.
For symmetry with From<char> for u32.
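A minimal sketch of the difference these commit messages describe, using the `From<char> for u32` and `From<u8> for char` impls from this PR. The helper names `code_point` and `truncating_byte` are hypothetical.

```rust
/// Hypothetical helper: lossless conversion, every `char` fits in a `u32`.
fn code_point(c: char) -> u32 {
    u32::from(c)
}

/// Hypothetical helper: `as` keeps only the low 8 bits of the code point.
fn truncating_byte(c: char) -> u8 {
    c as u8
}

fn main() {
    let snowman = '☃'; // U+2603

    // `From` preserves the full value.
    assert_eq!(code_point(snowman), 0x2603);

    // `as` compiles without complaint and silently truncates to 0x03.
    assert_eq!(truncating_byte(snowman), 0x03);

    // Every `u8` maps to a `char`, so this direction is also infallible.
    assert_eq!(char::from(0x41u8), 'A');

    // There is no `From<char> for u8`, so `u8::from(snowman)` would be
    // rejected at compile time rather than truncating.
}
```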
82678c5 to f040208
Mini-reminder: let's tag more user-visible stuff with relnotes
This is motivated by avoiding the `as` operator, which sometimes silently truncates, and instead use conversions that are explicitly lossless and infallible.

I’m less certain that `From<u8> for char` should be implemented: while it matches an existing behavior of `as`, it’s not necessarily the right thing to use for non-ASCII bytes. It effectively decodes bytes as ISO/IEC 8859-1 (since Unicode designed its first 256 code points to be compatible with that encoding), but that is not apparent in the API name.