fix #6540: utf16(s) for binary data, etc. #6546
…ary data, and add is_valid_utf16(s)
lgtm
i'm not that well versed in UTF16 strings. should the |
    elseif data[1] == 0xfffe # byte-swapped
        convert(T, Uint16[bswap(data[i]) for i=2:length(data)])
    else
        convert(T, copy(data)) # assume native byte order
Is this copy necessary?
Not strictly necessary, but my thinking was that whether or not this function makes a copy of the data should not depend on the contents of the data.
Should we have a converting copy? E.g.:

    function copy(T::Type, x)
        a = convert(T, x)
        if a === x
            a = copy(a)
        end
        a
    end
You could always do convert(T, copy(x)) if you need to force a copy in a particular case. In any case, === won't work here since I don't think the UTF16String will compare equal to an Array{Uint16} even if they share the same data.
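A minimal sketch of why the === test cannot detect shared data, written in current Julia syntax (UInt16 rather than Uint16). Utf16Wrapper is a hypothetical stand-in for UTF16String, assumed here to be an immutable wrapper around the code-unit array; it is not the type from this patch.

```julia
# Hypothetical stand-in for UTF16String: an immutable wrapper
# around a vector of UTF-16 code units.
struct Utf16Wrapper
    data::Vector{UInt16}
end

units = UInt16[0x0068, 0x0069]
w = Utf16Wrapper(units)           # wraps the array without copying it

wrapper_is_array = (w === units)  # values of different types are never ===
shares_storage = (w.data === units)

println("wrapper === array:      ", wrapper_is_array)
println("wrapper.data === array: ", shares_storage)
```

Under that assumption, a converting copy would have to compare the wrapped array, not the wrapper itself, to decide whether a copy is still needed.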
Note that if the user has an array containing a UTF-16 string in the native byte order, with no BOM, and it is important to convert it to a string without making a copy, she can always force that behavior by doing utf16(reinterpret(Uint16, array)). So, I don't think the copy here could pose a big problem in practice.
@vtjnash, my feeling was that if the user supplies an |
fix #6540: utf16(s) for binary data, etc.
This patch adds utf16(s) for binary data, an is_valid_utf16(s) function analogous to is_valid_utf8, and restricts IOBuffer(s::String) to ByteStrings as discussed in #6540.
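For readers less familiar with UTF-16, a rough sketch of the kind of check a validity function performs, in current Julia syntax. valid_utf16_sketch is a hypothetical name, and this is not necessarily how the patch implements is_valid_utf16; it only checks that surrogate code units are correctly paired.

```julia
# Hypothetical validity check: every high (lead) surrogate must be
# immediately followed by a low (trail) surrogate, and no low surrogate
# may appear on its own.
function valid_utf16_sketch(units::AbstractVector{UInt16})
    i = 1
    n = length(units)
    while i <= n
        u = units[i]
        if 0xd800 <= u <= 0xdbff                 # high surrogate
            (i < n && 0xdc00 <= units[i+1] <= 0xdfff) || return false
            i += 2                               # skip the whole pair
        elseif 0xdc00 <= u <= 0xdfff             # unpaired low surrogate
            return false
        else
            i += 1                               # ordinary BMP code unit
        end
    end
    return true
end

paired    = valid_utf16_sketch(UInt16[0x0068, 0xd83d, 0xde00])  # 'h' plus a surrogate pair
lone_high = valid_utf16_sketch(UInt16[0xd83d])                  # truncated pair
reversed  = valid_utf16_sketch(UInt16[0xde00, 0xd83d])          # low surrogate first
```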