message: support bytes::Bytes as a payload type #462

Closed · 6 commits

Conversation

@daniel-abramov (Member) commented Dec 11, 2024

Preface

Recently, there has been another PR trying to solve the long-standing #96.

The issue had already been addressed in another PR that has been open for a while. I noticed that I even approved that PR some time ago. Yet it was never merged, for reasons I can't remember (AFAIR, I was waiting for another approval from other maintainers/active users, but somehow it stalled). Now the discussions on the PR and the issue are so long that it's hard to get a brief overview of the current state and blockers, if any.

Yesterday, I quickly went through #175 and the discussions on the issue to see if I could come up with something that would give it at least some resolution (#175 has conflicts with the master branch) and create a small, easy-to-review PR that we could merge rather quickly.

TL;DR

Essentially, the only thing this one changes is that all binary messages are backed by Bytes instead of a Vec<u8>. The distinction between Owned and Shared is internal and invisible to the end user (see #462 (comment)). I have also not touched text messages for now (there have been several proposals, and I'm not sure there is consensus on which one is best).

So the question is whether we are good with having Bytes instead of a Vec<u8>, or whether we want to expose that Payload type and make Message rely on it (like in #175). If I'm not mistaken, the cost of Bytes is negligible (see #462 (comment)); the main concern with using Bytes as the type for Message::Binary was the additional allocation/copy of the buffer before applying a mask (since Bytes is immutable), and #175 handled this by always creating a copy before masking. I changed it so that we skip this copy when the Bytes has unique access.
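
Roughly, a minimal sketch of that masking path (mask_payload and apply_mask are illustrative names, not tungstenite's actual internals; the key point is that Vec::from(Bytes) reclaims the underlying buffer without copying when the handle is the unique owner):

use bytes::Bytes;

fn mask_payload(payload: Bytes, mask: [u8; 4]) -> Bytes {
    // Vec::from(Bytes) hands back the underlying buffer without a copy
    // when this handle is the unique owner; only shared buffers get copied.
    let mut buf: Vec<u8> = Vec::from(payload);
    apply_mask(&mut buf, mask);
    Bytes::from(buf)
}

fn apply_mask(buf: &mut [u8], mask: [u8; 4]) {
    for (i, byte) in buf.iter_mut().enumerate() {
        *byte ^= mask[i & 3]; // i & 3 == i % 4
    }
}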

Does it make any sense, or did I miss something?


@Dushistov, @sdroege, @francois-random: I decided to mention you folks since you participated in the discussion the most over the past year and seem interested in the outcome of this one. I'll also mention @alexheretic (you seem to be interested in performance-related improvements 🙂).

--

Fixes #96
Supersedes #175 if we agree that it's a sensible solution.

@Zarathustra2

I am not sure if this is just on my end, but when I run cargo bench on master I get:

     Running benches/write.rs (target/release/deps/write-9e4d5d42fd6d53b9)
Gnuplot not found, using plotters backend
write 100k small texts then flush
                        time:   [5.3589 ms 5.3623 ms 5.3665 ms]
                        change: [-0.1032% +0.0279% +0.1456%] (p = 0.67 > 0.05)
                        No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

and when running for this PR/branch:

     Running benches/write.rs (target/release/deps/write-9e4d5d42fd6d53b9)
Gnuplot not found, using plotters backend
write 100k small texts then flush
                        time:   [7.2634 ms 7.2756 ms 7.2928 ms]
                        change: [+35.427% +35.679% +36.006%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

@daniel-abramov (Member, Author) commented Dec 11, 2024

Thanks, benchmarking is a good thing! I'll investigate what's going on.

UPD: OK, when I run the write benchmarks, I often get a ±5% difference even without changes, but a 35% difference is quite stark. Apparently, using Bytes in this context is not as "negligible" a cost as I thought. I'm not sure what this could be attributed to; my only guess is the vtable indirection that Bytes carries.

Anyway, if the performance cost is so high, then perhaps Bytes is not suitable as the underlying type for Message. So I updated the PR to use Payload across all of the places (similarly to #175). Now the benchmarks are identical (no changes beyond the noise threshold, with only an occasional 2-3% difference).

@alexheretic (Contributor)

I think this is a step in the right direction.

Nice to see the write bench was useful :) I think we would benefit from benchmarking some more scenarios in this way: binary writes & reads, etc. If I can find some time, I'll raise a PR proposing some more.

@agalakhov (Member)

@alexheretic IIRC, I looked at the bytes crate while writing the very first version of Tungstenite, and the only reason I did not adopt it was its API instability at that time. As bytes is a solid part of the ecosystem now, it's a good idea to use it.

If implemented correctly, converting to Bytes should not affect the benchmarks at all, being truly zero-cost. It is only important to make the correct choices between Bytes and Buf/BufMut in all places. I believe it is even possible to improve performance compared to the old approach, as scatter-gather I/O is now stabilized in std. Using read_vectored()/write_vectored() together with a collection of BufMut/Bytes could make Tungstenite truly zero-copy.
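
For illustration, a minimal sketch of the scatter-gather idea (write_frame is a hypothetical helper, not tungstenite's actual I/O path): the frame header and the payload go out in one vectored call without being concatenated into a single buffer first.

use std::io::{IoSlice, Write};

fn write_frame<W: Write>(out: &mut W, header: &[u8], payload: &[u8]) -> std::io::Result<()> {
    let bufs = [IoSlice::new(header), IoSlice::new(payload)];
    // A real implementation would loop here: write_vectored may perform
    // a short write and report fewer bytes than the buffers contain.
    out.write_vectored(&bufs)?;
    Ok(())
}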

@alexheretic (Contributor)

I noticed you can use this branch for writing text too with Frame::message(str_bytes, OpCode::Data(Data::Text), true)
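
For reference, the call looks roughly like this (module paths and the Bytes-accepting signature are assumed from this branch; a sketch, not necessarily the exact public API):

use bytes::Bytes;
use tungstenite::protocol::frame::coding::{Data, OpCode};
use tungstenite::protocol::frame::Frame;

fn main() {
    // Build a text frame directly from an already-encoded UTF-8 buffer,
    // skipping the intermediate String allocation.
    let str_bytes = Bytes::from_static(b"hello");
    let frame = Frame::message(str_bytes, OpCode::Data(Data::Text), true);
    let _ = frame; // would then be written out via the usual frame path
}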

I struggled to produce a benchmark win using that, though, even though it should really eliminate an allocation. Maybe allocation just isn't significant relative to the other write work currently?

I raised a PR with the extra benches: #463

daniel-abramov changed the title from "message: switch to bytes::Bytes" to "message: support bytes::Bytes as a payload type" on Dec 13, 2024
@daniel-abramov (Member, Author) commented Dec 13, 2024

> I struggled to produce a benchmark win using that though, even though it should really eliminate allocation. Maybe allocation just isn't significant enough to the other write work currently?

I don't expect any improvements in the benchmarks from this PR, at least for the benches in question. The reason is that this change does not switch to Bytes completely (I've just updated the PR title, because it referred to the first attempt to switch to Bytes outright). The TL;DR is that it switches the payload and message type from Vec<u8> to a Payload type that is defined as:

pub enum Payload {
    /// Owned data with unique ownership.
    Owned(Vec<u8>),
    /// Shared data with shared ownership.
    Shared(Bytes),
}

In other words, folks who used tungstenite with owned data (String or Vec<u8>) should not notice any difference at all, because their messages are internally converted to Payload::Owned(Vec<u8>). But those who wanted to use Bytes are now able to do so as well. This means that the only performance improvement I expect from this change applies to users who write servers (specifically those who broadcast messages to many recipients or send the same message several times for whatever reason).
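
The conversions implied here would look something like this (hypothetical impls illustrating the idea, not necessarily the PR's exact code):

use bytes::Bytes;

impl From<Vec<u8>> for Payload {
    fn from(data: Vec<u8>) -> Self {
        Payload::Owned(data) // owned input keeps its unique ownership
    }
}

impl From<Bytes> for Payload {
    fn from(data: Bytes) -> Self {
        Payload::Shared(data) // cloning Bytes later is a cheap refcount bump
    }
}

impl From<String> for Payload {
    fn from(data: String) -> Self {
        Payload::Owned(data.into_bytes()) // a String is reused as its byte buffer
    }
}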

That's why there was no difference in performance when I compared it to the master branch (the benchmarks in master use owned data).

However, there is a difference when testing it against your new benchmarks. This PR shows, on average, a 4-5% decrease in read and write performance. I'm still unsure what this could be attributed to, given that the underlying type in all benchmarks remains the same even for this PR (no Bytes involved, just Vec<u8>). My only [non-scientific] guess is that the machine representation of a Payload is a couple of bytes larger. This is probably invisible in regular use, but for a benchmark that sends 100k messages, this tiny cost might accumulate into a visible minor performance decrease (after all, all messages in the benchmark are pretty small).

I then modified your write benchmark for the use cases explicitly mentioned in #96. Instead of creating small messages in a loop, I made a moderately sized message (1KB) and sent it repeatedly in that loop. I ran it on this branch and then compared it to the version that uses Vec<u8> in master, and the performance increase is quite massive:

(master)
write 100k small messages then flush (server)
                        time:   [8.3416 ms 8.3952 ms 8.4669 ms]
                        change: [+123.71% +126.18% +128.52%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  7 (7.00%) low mild
  1 (1.00%) high severe

So the master version is ~126% slower than the version that uses Bytes (due to the massive number of allocations caused by constant cloning).

The net result of the changes is that performance for general use cases drops by about 5% (and I'm not sure why, apart from the guess mentioned above). However, for server users, the performance wins may be above 125% if they specifically choose to use Bytes for cases when it's appropriate (broadcasting the same message, etc) since they avoid all of the costs associated with expensive cloning.

Tbh, if that small performance drop for the general use case could somehow be eliminated, then switching to the new Payload type would be a no-brainer.

@agalakhov (Member) commented Dec 13, 2024

In fact, Vec is not needed here at all, as it could be replaced as well. Bytes is, by definition, just an Arc-based Vec, which probably makes it more efficient to pass around. It is, however, less efficient in a single-producer, single-consumer model unless the compiler manages to optimize away the Arc, since atomic reference counting isn't cheap.

@alexheretic (Contributor) commented Dec 13, 2024

FWIW, I didn't see much difference between the branches in the new benches. Though I did port them to this branch slightly differently, and I think I stumbled upon slice::to_vec being faster, which skewed the comparison in this branch's favour.

Now that I run using to_vec in both branches, I see the following, which I interpret as:

  • Read speed is about the same
  • Write speed is slightly impacted
$ cargo bench --bench \* -- --quick --noplot --baseline a
read+unmask 100k small messages (server)
                        time:   [13.010 ms 13.028 ms 13.101 ms]
                        change: [+2.1102% +2.8253% +3.5454%] (p = 0.10 > 0.05)

read 100k small messages (client)
                        time:   [11.052 ms 11.053 ms 11.053 ms]
                        change: [+2.0613% +2.2854% +2.5105%] (p = 0.08 > 0.05)

write 100k small messages then flush (server)
                        time:   [4.6399 ms 4.6452 ms 4.6661 ms]
                        change: [+11.700% +12.027% +12.354%] (p = 0.09 > 0.05)

write+mask 100k small messages then flush (client)
                        time:   [5.6127 ms 5.6169 ms 5.6338 ms]
                        change: [+3.1398% +3.6457% +4.1546%] (p = 0.10 > 0.05)

@daniel-abramov (Member, Author)

Thanks! Yep, this roughly aligns with the results that I'm getting.

> I then modified your writing benchmark for the use cases explicitly mentioned in #96. Instead of creating small messages in a loop, I made a moderately sized message (1KB) and sent it in that loop in the writing benchmark. I ran it on this branch and then compared it to the version that uses Vec in master, and the performance increase is quite massive:

I'm thinking about adding this benchmark. The version in master would then create a single message and send it multiple times (cloning it each time before sending), while the version from this branch would wrap it in Bytes and pass it to tungstenite (eliminating the expensive cloning).
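
Something along these lines (a sketch with a hypothetical Client abstraction standing in for a per-connection WebSocket wrapper):

use bytes::Bytes;

trait Client {
    fn send_binary(&mut self, data: Bytes);
}

fn broadcast<C: Client>(clients: &mut [C], data: &Bytes) {
    for client in clients {
        // Each send clones a handle (a refcount bump on the shared buffer)
        // instead of copying the 1KB message payload itself.
        client.send_binary(data.clone());
    }
}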

@alexheretic (Contributor)

TBH, I don't see the minor perf regression in the benches as blocking this PR.

The scenario where a server broadcasts a message to N clients clearly shows a massive advantage of supporting Bytes. The regression reported by the benches is not large.

@alexheretic (Contributor)

I got a bit carried away trying to add text support 😅 #465

@daniel-abramov (Member, Author)

Awesome! Since yours is rebased on top of this branch, I'll merge it instead of integrating the two separately! So far it seems no one has objected, so I think we're good to go!
