Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

breaking: update tokio-tungstenite to 0.26 and update examples #3078

Merged
merged 7 commits into from
Dec 18, 2024

Conversation

adryzz
Copy link
Contributor

@adryzz adryzz commented Dec 15, 2024

Motivation

tokio-tungstenite had a new (breaking) release, 0.25/0.26 that finally allows sending shared data over websockets, and it would be cool if this new breaking change could get in axum 0.8.0 given it's a new major release

Solution

the websocket extract has been changed, and it now exposes Bytes and Utf8Bytes (UTF-8 wrapper around Bytes)

@adryzz adryzz force-pushed the tokio-tungstenite-25 branch 2 times, most recently from 42bdf98 to 23f96a3 Compare December 15, 2024 22:13
@yanns
Copy link
Collaborator

yanns commented Dec 16, 2024

The CI checks are failing. Can you check?

@adryzz
Copy link
Contributor Author

adryzz commented Dec 16, 2024

given that we have our own downstream Message type that is 1:1 with tungstenite's, should we have our own downstream Payload/Utf8Payload types too?

also, will clean up the commits in a bit

@jplatte
Copy link
Member

jplatte commented Dec 16, 2024

Yes, we don't want to have tungstenite as a public dep. However I'm a bit undecided about what to expose exactly - I'll comment later after work.

Also cc @SabrinaJewson

@SabrinaJewson
Copy link
Contributor

So, my intial observation is that Tungstenite’s Payload type is very weirdly-designed. It has three separate variants for BytesMut, Bytes and Vec<u8>. But the thing is:

  • the BytesMut variant seems pointless, since one can convert Bytes ↔ BytesMut with zero cost;
  • the only benefit of having the Vec<u8> variant is to avoid a small heap allocation if one has a Vec<u8> with excess capacity.

To me, neither of these advantage seem worth it to add a whole new Payload type to the public API instead of using Bytes directly (and I do question why tungstenite made this strange design choice). For the UTF-8 case, it seems like this should come from a utility crate, but in the absence of that making our own UTF-8 wrapper of Bytes isn’t too bad…? related: tokio-rs/bytes#697

@jplatte
Copy link
Member

jplatte commented Dec 16, 2024

Those were pretty much my thoughts looking at the tungstenite change too 😅

So let's switch to Bytes then? And a Utf8Bytes type in addition? Pretty sure we even have one in the codebase already, used for path deserialization IIRC. Just not public at the moment.

@alexheretic
Copy link
Contributor

alexheretic commented Dec 17, 2024

The payload variants do have reasons to exist. For example, removing Payload::Vec variant impacts benched write performance by 10-30%. See snapview/tungstenite-rs#465 & snapview/tungstenite-rs#462

axum/src/extract/ws.rs Outdated Show resolved Hide resolved
axum/src/extract/ws.rs Outdated Show resolved Hide resolved
@@ -75,7 +75,7 @@ async fn ws_handler(
res = ws.recv() => {
match res {
Some(Ok(ws::Message::Text(s))) => {
let _ = sender.send(s);
let _ = sender.send(s.to_string());
Copy link
Contributor

@alexheretic alexheretic Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use case could show the shared bytes reuse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could probably be left as a follow up task, if so maybe just add a TODO here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An issue usually works better than a fixme comment for stuff like this. Feel free to open one.

@SabrinaJewson
Copy link
Contributor

For example, removing Payload::Vec variant impacts benched write performance by 10-30%.

Do you know why that’s the case? Is it really just the memcpy due to having excess capacity in the Vec? Is it the cost of virtualization?

Looking at the implementation of Bytes, it should be possible to avoid the memcpy. Currently, Bytes (in its Vec representation) stores the base ptr twice, which is unnecessary; if it instead used that field to store capacity, there would be no need for shrinking the allocation. Maybe we could look into making this change in Bytes upstream, thus allowing the storage of Vec<u8> inside Bytes to be truly zero-cost?

The reason I’m somewhat against a Payload enum is that it strikes me as a code smell: Bytes is already intended to be the type representing “an enum over all the 'static thread-safe bytes containers”. So the addition of Payload means we have two layers of bytes-type-that-can-hold-any-container. If one is to go ahead with the design, I think it should be made very clear in the documentation what the exact limitations of Bytes are that it can’t subsume this use-case.

@alexheretic
Copy link
Contributor

alexheretic commented Dec 17, 2024

I do agree, I think payload started as Bytes. It is just the performance regressions driving the enum. I'm (re)investigating if those perf issues can be resolved: snapview/tungstenite-rs#473

@alexheretic
Copy link
Contributor

It looks like the perf issues can be resolved. If possible please review that upstream PR and suggest changes so we can get the next breaking tungstenite version as right as possible.

@adryzz
Copy link
Contributor Author

adryzz commented Dec 17, 2024

alright, will update this PR as soon as i get home with the newly released version.

what to do about Utf8Bytes? i can't seem to find a similiar type in the codebase already

@jplatte
Copy link
Member

jplatte commented Dec 17, 2024

I just checked, looks like I was wrong. I think we can just newtype tungstenite's type then.

@jplatte jplatte added this to the 0.8 milestone Dec 17, 2024
@adryzz adryzz changed the title breaking: update tokio-tungstenite to 0.25 and update examples breaking: update tokio-tungstenite to 0.26 and update examples Dec 18, 2024
@adryzz adryzz force-pushed the tokio-tungstenite-25 branch from cf67e9f to 67c2ffc Compare December 18, 2024 09:59
axum/src/extract/ws.rs Outdated Show resolved Hide resolved
axum/src/extract/ws.rs Outdated Show resolved Hide resolved
axum/src/extract/ws.rs Outdated Show resolved Hide resolved
axum/src/extract/ws.rs Outdated Show resolved Hide resolved
Co-Authored-By: Alex Butler <[email protected]>
@jplock jplock mentioned this pull request Dec 18, 2024
1 task
Copy link
Member

@jplatte jplatte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a few things I'd like to look at and possibly improve before releasing, but this is definitely a big step in the right direction so I'll merge 👍

Self::Binary(ref data) | Self::Ping(ref data) | Self::Pong(ref data) => {
Ok(std::str::from_utf8(data).map_err(Error::new)?)
}
Self::Close(None) => Ok(""),
Self::Close(Some(ref frame)) => Ok(&frame.reason),
}
}

/// Create a new text WebSocket message from a stringable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how to word this, but "stringable" is not a word in my vocabulary :D

Comment on lines +654 to +656
impl<T> PartialEq<T> for Utf8Bytes
where
for<'a> &'a str: PartialEq<T>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: revisit.

@jplatte jplatte merged commit 96e071c into tokio-rs:main Dec 18, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants