-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow sending of shared data #96
Comments
Hmm, given your other issue, I understand your concern, but I think your usecase is rather rare. Thus putting everybodies message data always in an Arc will incur an overhead all the time for everybody that doesn't need this. I would suggest that a connection loss is an exceptional and erroneous situation, in which case it's generally acceptable to have a performance cost. What I'm suggesting is that you regenerate the messages that haven't been acknowledged in this situation. This can happen concurrently to establishing a new connection. That being said, maybe using a data type like |
In my experience, a TCP crash is only a matter of time, especially for long running connections (16bit checksum, etc.). Thus if you want reliable messaging you need to keep a copy of each message locally until it is ack'd. Currently, you need to clone a
I know that this is the approach that |
Sure, if you can't regenerate them just in case of failure, that's not nice for memory consumption. What I hope by suggesting using Bytes is that we might get ecosystem wide possibility of copyless end-to-end. From where the data enters the process to where it leaves. There is an interesting new proposal for the AsyncRead/AsyncWrite traits in a PR on tokio that could really make that happen. |
That's cool, I'll take a look. |
@najamelan What are your thoughts on having a asymettric pub enum SendMsg {
Text(String),
ShText(Arc<String>),
Binary(Vec<u8>),
ShBinary(Bytes),
Ping(Vec<u8>),
Pong(Vec<u8>),
Close(Option<CloseFrame<'static>>),
}
pub enum RecvMsg {
Text(String),
Binary(Vec<u8>),
Ping(Vec<u8>),
Pong(Vec<u8>),
Close(Option<CloseFrame<'static>>),
} This would enable zero-copy scenarios while not forcing the user to use |
An alternative would be to provide a separate enum SharedMessage {
Binary(Bytes),
} |
Can't we instead change
This would allow to cache messages if needed without cloning data and will allow users to remove unnecessary allocations, e.g. if messages are fixed or with if reusable buffer is used. The same type can be used for receiving messages as well, with the following signature changes: // this function will use an inner buffer stored inside `WebSocket` instance
pub fn read_message<'a>(&'a mut self) -> Result<Message<'a>> { .. }
// we also could add function like this
pub fn read_message_to<'a>(&mut self, buf: &'a Vec<u8>) -> Result<Message<'a>> { .. } This would allow additional optimizations, although it may complicate code a bit (especially for async). |
Not for receiving. The incoming message may be split across multiple packets. It may work with |
But it would work with |
In the current implementation it would be not more efficient than just copying the vector afterwards. We keep the half-received message in a |
I also proposed some idea in a comment to #104, perhaps it would also be a solution. |
I'll give #104 another try in a couple of weeks. I'm thinking about writing a decent benchmark to view the overhead of using
Note that both cases would lead to a breaking change. |
Is there any updates? I think replace Vec<u8> by Bytes is a better choice, even if there is a breaking change. |
@x1957 , awaiting a feedback from @agalakhov on #104 (review) |
Hello, i am new to this and very interested. Maybe it would be possible to add a |
There is #104 which is still open. Perhaps the best and simplest solution would be the one that has already been suggested by contributors in this issue, e.g. usage of |
Hello. Broadcasting data to multiple connections like (10k connections) would be very inefficient since we need to copy message 10k times. I vote for solution with |
I also like using I looked at the source code, and |
Hi! i'd like to try working on this issue if possible, but i'd need a few clarifications to implement this properly:
thanks |
Hello, would it be possible to decide on this? as it's opened for years despite it's not complicated and several users propose to do it.
Common thing to do: out_buffer of FrameCodec should be a vector of Solution 2 seems better, as first one can extra allocate and it's a breaking change in the public enum. |
by the way, i did have an incomplete branch adding the breaking change, i'll see if i can resurrect it, completely forgot about this |
@francois-random, I see your point. I also agree that it's a bit embarrassing that we did not clearly communicate a decision, sorry for that. Regarding your suggestions, let's put it like this: as long as there are no objections or strong opinions from those who made significant contributions rel. to the performance and/or even created crates that depend on tungstenite/tokio-tungstenite (@alexheretic and @sdroege come to mind), I would be okay with any of the proposed strategies (I'm also okay with a small breaking change) as long as the implementation and the API surface remain clean/easy to understand and the performance penalty is negligible. |
I'd have to see the exact implementation to say for sure, but it generally sounds good to me and for me the For the string case, |
We could try to resolve this using traits and making pub enum Message<S: AsRef<str>> {
Text<S>,
...
}
type RecvMsg = Message<String>; The original idea behind the interface is to make memory allocations explicit: "If there has to be |
|
Worth a try but this often makes the API harder to use. You'd need two types (one for |
Another potential problem with generics that comes to mind is that essentially it goes in the direction of having separate types of |
Working with shared data effectively means using different types for reading and writing. Using |
We know how to make it. An enum with both OwnedData (String for text and Vec for binary, same as now) and with SharedData, like solution 2- I mentionned earlier, OwnedData will be used in all case by the lib to fill the message when it's received, and it's up to the user to decide if he wants to send an OwnedData or a SharedData. |
This is similar to pub enum Text {
Owned(String),
Shared(Arc<str>),
}
pub enum Message {
Text(Text),
...
} This is more idiomatic and could be better when (and if) we eventually port Tungstenite to |
No strong preference for me but perhaps your solution with enum of enum is more understandable from the library user's point of view, as it keeps one enum type per kind of websocket message. And inside the type, you choose Owned or Shared. |
For construct - Text(String),
+ Text(Arc<str>), is free for users who enough -Message::Text(s)
+Message::Text(Arc::from(s)) // free For users who need - Text(String),
+ #[cfg(feature = "rt-multi-thread")]
+ Text(Arc<str>),
+ #[cfg(not(feature = "rt-multi-thread"))]
+ Text(Rc<str>), pub enum Text {
Owned(String),
Shared(Arc<str>),
} it's worse than just casting |
@qRoC there is extra allocation on client's side if you force all text messages to be 'Arc' because you will need to apply a mask and it requires the message to be mutable (it will be the same if you also force all binary message to be Bytes). |
By the way, Bytes type has now https://docs.rs/bytes/latest/bytes/struct.Bytes.html#method.from_owner ,
would be good fit send any data in form of |
I think that it would make sense that
Message
be instead:This is because the websocket transport cannot guarantee messages to not be lost (from what I understand). Thus someone who wants to prevent messages from being lost would keep a list of messages and only consider them "sent" when they are ack'd by the peer. Using a
Arc
would greatly reduce the cost of doing so since clone operations would be more efficient.The text was updated successfully, but these errors were encountered: