Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow sending messages with static or shared data #175

Closed
wants to merge 4 commits into from

Conversation

icewind1991
Copy link

This allows the server to skip allocating buffers holding the message data to be send if the data is static.

  • This doesn't really save anything for clients atm due to the need of masking, although changing the masking logic to run on the output buffer instead of the message buffer could solve that.
  • The change in the Message fields data make this a breaking change for any code directly creating or deconstructing the Message enum.
  • Keeping the text and binary methods intact requires adding separate static_* methods, an alternative might be to change text/binary to take an Into<Cow<_>> directly, this would break any code calling those methods with anything that doesn't implement Into<Cow<_>> but should be fine for the most common parameters of String/&str/Vec<u8>/&[u8]

@daniel-abramov
Copy link
Member

an alternative might be to change text/binary to take an Into<Cow<_>> directly

Another approach would be to implement #96 similar to the way it was tried to be implemented here: #104 , but to use Bytes as a least common denominator for the type of the data that the message holds. Of course it's not the same as holding 'static data, yet more efficient than storing Vec<u8> or String.

@icewind1991
Copy link
Author

I've changed things around a bit, switching to custom enums for the binary and string data to allow dealing with shared data trough Bytes and changed the Message::text/Mesage::binary methods to handle both the unique(Vec<u8>/String), shared(Bytes) and static(&'static [u8]/&'static str) cases transparently.
I've also made the datatype that stores the message data opaque so things can be changed in the future without any api breaks.

I've currently not added support from shared string data, as there is no way to go from Arc<String> to Bytes without copying, it is still possible to add this later by making the Frame deal with the shared text case seperate.

@icewind1991 icewind1991 changed the title allow sending messages with static data allow sending messages with static or shared data Feb 13, 2021
@strohel
Copy link
Contributor

strohel commented Feb 15, 2021

I'd eventually like to review this, but I'll probably need a couple more days before I manage so. @jxs, do you want too?

Copy link
Contributor

@strohel strohel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks generally good. I'm trying to understand whether this would make the client side any more inefficient.

The general question we should answer is: are the (efficiency) gains worth increasing the complexity slightly? It would be interesting to see a benchmark quantifying the gains.

src/protocol/data.rs Outdated Show resolved Hide resolved
src/protocol/data.rs Outdated Show resolved Hide resolved
Message::Text(string) => string.into_bytes(),
Message::Binary(data) | Message::Ping(data) | Message::Pong(data) => data,
Message::Text(string) => String::from(string).into(),
Message::Binary(data) => data.into(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it true that this would now copy in case when data is Bytes, while previously this never copied? Not sure whether that's a concern at all.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is an issue in practice, since you're probably only interested in the message data when it's an incoming message, in which case it's always unique (and thus stored as an Vec<u8>) anyway.

@icewind1991
Copy link
Author

I'm trying to understand whether this would make the client side any more inefficient.

I have some mostly finished code that removes the need to make the data a unique copy when applying the mask that I was originally going to put into a follow up pr but I can add it here is desired,

@daniel-abramov
Copy link
Member

I have some mostly finished code that removes the need to make the data a unique copy when applying the mask that I was originally going to put into a follow up pr but I can add it here is desired

That would be interesting to check (probably in a separate commit?).

this removes the need to have unique mutable access to the payload data
@icewind1991
Copy link
Author

pushed

@daniel-abramov daniel-abramov requested a review from jxs August 9, 2021 13:35
@Selyatin
Copy link

When will this be fully merged? I think it's a very nice feature and doesn't make stuff too complex.

@Dushistov
Copy link
Contributor

I need to send data that allocated by custom allocator.

Can it be trait instead of enum MessageData, with method to provide &[u8] and Vec<u8>?

@daniel-abramov
Copy link
Member

When will this be fully merged? I think it's a very nice feature and doesn't make stuff too complex.

It has some unresolved merge conflicts. It seems like replacing the binary bytes with Bytes would be the simplest solution and will take less code, so unless it has some unfortunate consequences for the performance, I think it would be a way to go.

Can it be trait instead of enum MessageData, with method to provide &[u8] and Vec?

Do you mean Message becoming Message<T: AsRef>? In theory, it's of course possible (not sure if it would be particularly ergonomic though). If we used Bytes instead of a Vec<u8>, would it solve your problem?

@Dushistov
Copy link
Contributor

If we used Bytes instead of a Vec, would it solve your problem?

I don't see how Bytes can help without unsafe.

I have data allocated with libc::malloc and want to send them via web socket.

Yes, I can transmutate them into &'static [u8] and then call Bytes::from_static.
But this is if I really desperate for performance. So Bytes and Vec is not what I need.
&[u8] would be nice interface (but tungstenite-rs for some reason may want to modify data?),
so trait looks the most promising.

Do you mean Message becoming Message<T: AsRef>

Yes, something like this, it can be type Message = GenericMessage<Vec<u8>>,
so the end user don't see the difference.

@daniel-abramov
Copy link
Member

I don't see how Bytes can help without unsafe.
I have data allocated with libc::malloc and want to send them via web socket.

What I mean is that imagine that you're allocating memory with libc::malloc, but store it internally as BytesMut or something like this (might indeed require unsafe), then it would be easy to take Bytes and pass it to tungstenite without copying the data one more time.

&[u8] would be nice interface (but tungstenite-rs for some reason may want to modify data?),
so trait looks the most promising.

&[u8] is not a problem, but the question is how are we going to store it (taking into account that the library could be used with Tokio runtime), i.e. when users read messages, the returned Message must own the data. When users write messages, the Message must own something that allows getting a reference to the underlying data in a thread-safe way (so that the Message could be passed between threads).

The only universal solution that I have in mind is to make Message generic over T where T would implement AsRef or something like this. This means that the connect and similar user-friendly methods would use Vec<u8> for T, whereas when the users create sockets by themselves, they can specify any type they want (in your case it would mean wrapping the data into some sort of an Arc though if you're using it with Tokio for instance).

@daniel-abramov
Copy link
Member

@Dushistov, so for your particular question the workaround would be to create Vec<u8> from the data that you allocated with libc and then pass it to the library.

The only problem with this approach is that you'll have to copy the data which is not particularly efficient, but if we make the type of Message more generic (as I described in the comment above), that alone won't solve your problem, because whenever we receive a frame from the user, we write it to the send queue (that eventually gets flushed), e.g. imaging calling write_message() 4 times in a row with 4 different messages. If they all end up being in a queue, tungstenite will have to own the data inside to be able to send them.

In the case of this PR, it's mitigated by making the Frame hold Bytes (e.g. a strong shared reference to the original data). In other words: you'll have to place your data inside Bytes anyway (or inside another Arc based structure, but that would effectively be a reinvention of a Bytes 🙂).

@Dushistov
Copy link
Contributor

Dushistov commented Jul 20, 2022

so for your particular question the workaround would be to create Vec

I am already doing so. But this looks like waste, C++ allocate buffer, copy data to it
transfer ownership to Rust code, Rust code then again allocate memory via Vec, copy to it,
and then give it to tungstenite-rs.

If they all end up being in a queue, tungstenite will have to own the data inside to be able to send them.

if tungstenite only want to own data then this is not problem, wraper around allocated data has proper Drop implementation that call libc::free, so it is actually like Vec<u8> with custom allocator and without capacity field.

The only universal solution that I have in mind is to make Message generic

generic Message simplify thing, it would be possible to have two Message, one for sending, and one for receiving.

@daniel-abramov
Copy link
Member

if tungstenite only want to own data then this is not problem, wraper around allocated data has proper Drop implementation that call libc::free, so it is actually like Vec with custom allocator and without capacity field.

Ok, I think I finally got your point :) So basically you don't really need sending the shared data (like this PR mentions), you're more concerned about the fact that currently you can't pass a custom T: AsRef<[u8]> structure to the Message, right?

generic Message simplify thing, it would be possible to have two Message, one for sending, and one for receiving.

Hm, not quite. I mean, of course we could define different data types for incoming and outgoing message (strictly speaking there is nothing wrong with it), but we would need to update quite a few things to make it ergonomic with tokio-tungstenite then (e.g. won't be able to use StreamExt::split() and similar things), but there are some solutions for this available though.

@sukhmel
Copy link

sukhmel commented May 1, 2023

Just wanted to say that this PR seems capable of helping to save a lot on allocations in cases where the same messages are being broadcast to lots of web socket connections like it is in our use case. Although, admittedly, it would take some changes in other crates before this can be utilized by end users.

@Dushistov
Copy link
Contributor

There is idea for Bytes to allow user to deal with it's vtable: tokio-rs/bytes#437 .

It would make possible to support my use case, when I want pass buffer created by C/C++ code with malloc to tungstenite-rs. I convert it to Bytes with zero allocation and zero memcpy, and then it would be nice to pass it directly to tungstenite-rs with zero overhead.

@Dushistov
Copy link
Contributor

There is idea for Bytes to allow user to deal with it's vtable: tokio-rs/bytes#437 .

Looks like it is possible now via Bytes::from_owner

@daniel-abramov
Copy link
Member

The ideas from this PR have been incorporated and implemented in #462 / #465!

@icewind1991 icewind1991 deleted the static-data branch December 14, 2024 17:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants