-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix symbolize_keys: true
to properly create UTF-8 symbols
#246
Conversation
@tagomoris apologies for the ping, but do you think you'll have time to review this PR soon? It's causing issue for people using |
@casperisfine Probably I'll be able to find time to review changes early next week |
ext/msgpack/buffer.h
Outdated
@@ -494,9 +494,9 @@ static inline VALUE msgpack_buffer_read_top_as_string(msgpack_buffer_t* b, size_ | |||
#endif // HAVE_RB_ENC_INTERNED_STR | |||
} | |||
|
|||
static inline VALUE msgpack_buffer_read_top_as_symbol(msgpack_buffer_t* b, size_t length) | |||
static inline VALUE msgpack_buffer_read_top_as_symbol(msgpack_buffer_t* b, size_t length, int utf8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
msgpack_buffer_read_top_as_string
defines the last argument as bool
.
Is there any reason to use int
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, just an oversight, I'll fix it right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
69230d6
to
0179a1d
Compare
spec/unpacker_spec.rb
Outdated
it 'MessagePack.unpack symbolize_keys preserve encoding' do | ||
hash = { :ascii => 1, :utf8_é => 2} | ||
loaded_hash = MessagePack.load(MessagePack.pack(hash), :symbolize_keys => true) | ||
|
||
loaded_hash.keys[0].should == hash.keys[0] | ||
loaded_hash.keys[0].encoding.should == hash.keys[0].encoding | ||
|
||
loaded_hash.keys[1].should == hash.keys[1] | ||
loaded_hash.keys[1].encoding.should == hash.keys[1].encoding | ||
|
||
MessagePack.unpack(MessagePack.pack(hash), :symbolize_keys => true).should == hash | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case is not enough clear to me because the encodings of :ascii
and :utf8_é
depend on the behavior of Ruby runtime and be are not described in this test case.
IMO it will be more clear to me if the original encoding of serialized symbols and expected encoding of deserialized symbols are written explicitly.
@casperisfine What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I'll update right away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it look now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Thanks!
When `symbolize_keys: true` was set, we'd already read the string as binary, even if the native type is UTF-8. This was causing non-ASCII keys to be improperly deserialized. `rb_str_intern` will still notice strings that are ASCII only, and return ASCII symbol for them. So there's no backward compatibility concerns.
0179a1d
to
01b57bd
Compare
👍 |
When
symbolize_keys: true
was set, we'd already read the string as binary, even if the native type is UTF-8. This was causing non-ASCII keys to be improperly deserialized.rb_str_intern
will still notice strings that are ASCII only, and return ASCII symbol for them. So there's no backward compatibility concerns.This fixes: ruby-i18n/i18n#606