Benchmark is mostly idle at 10 connections #29

Closed
uNetworkingAB opened this issue May 7, 2023 · 19 comments
@uNetworkingAB commented May 7, 2023

The current run of only 10 connections is not enough to stress the servers, at least not uWS. Here is a list of differences between uWS and fastwebsockets at different numbers of connections, assuming 1 kB messages. 10 connections has the smallest difference, so it's a natural pick if one wants to convey a minimal diff:

  • at 10 connections the diff is 7%
  • at 100 connections the diff is 9%
  • at 200 connections the diff is 16%
  • at 500 connections the diff is 34%

So it's pretty easy to tell there are scaling issues that aren't being conveyed with the low count of only 10 connections. This can be improved by using more connections.

Edit: oh wow for 16 kB messages the diff is 56% at 200 connections

@uNetworkingAB (Author) commented May 7, 2023

For the next rerun, I have a few relevant changes in v20.41.0.

On master, load_test now takes a byte length, so you can specify any length (it swaps from short to medium to long messages as needed).

@uNetworkingAB (Author)

For 16 kB messages at 500 connections, the diff is more than 100%:

Using message size of 16000 bytes
Running benchmark now...
Msg/sec: 60466.250000
Msg/sec: 60521.250000
Msg/sec: 61029.250000

Using message size of 16000 bytes
Running benchmark now...
Msg/sec: 124614.000000
Msg/sec: 122536.500000

So those graphs are quite misleading as of now

@littledivy (Member) commented May 7, 2023

Can reproduce this 👍

Areas to improve:

  1. Payloads are always copied; they should be a clone-on-write view into a shared recv buffer. I wanted to do this earlier, but Rust lifetimes won't let us do it with the current API.

We also can't use the normal std::borrow::Cow here because masking happens in place and we need a mutable borrow of the recv buffer. Instead, something like this:

// A Cow-like wrapper that can hand out mutable access to either a borrowed
// slice of the shared recv buffer or an owned copy.
pub enum MutCow<'a, B>
where
    B: 'a + ToOwned + ?Sized,
    <B as ToOwned>::Owned: AsRef<B> + AsMut<B>,
{
    Borrowed(&'a mut B),
    Owned(<B as ToOwned>::Owned),
}

  2. Be smart about using vectored writes. I think we should just enable writev when the frame size is large enough (see the sketch after this list). Alternatively, we could improve the write buffer logic with sendto.

  3. Excessive yields back to the Tokio scheduler. Under heavy load (~500 conns), I/O resources are almost always ready and quickly fill up the coop budget in Tokio - this forces Tokio to yield back to the scheduler so that "other tasks" can get a chance to be polled.

     However, in this particular echo_server benchmark there are no "other tasks" we care about, so we essentially end up wasting time.

@littledivy (Member)

Meh, I just realised MutCow is overkill and Frame payloads can just be a &'f mut [u8] :)
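
Something like this shape, roughly (illustrative only; the field names are not the exact fastwebsockets definition):

// The frame borrows a mutable window into the shared recv buffer for its
// lifetime, so unmasking can happen in place without a copy.
pub struct Frame<'f> {
    pub fin: bool,
    pub opcode: u8,
    pub payload: &'f mut [u8],
}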

@bartlomieju (Member)

> Excessive yields back to the Tokio scheduler. Under heavy load (~500 conns), I/O resources are almost always ready and quickly fill up the coop budget in Tokio - this forces Tokio to yield back to the scheduler so that "other tasks" can get a chance to be polled.

Wrap the relevant task in https://docs.rs/tokio/latest/tokio/task/fn.unconstrained.html to avoid forced yields.
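
For example, a minimal plain-TCP echo sketch (a stand-in for the benchmark's echo loop, not the actual fastwebsockets echo_server) where each connection task opts out of the coop budget:

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpListener;
use tokio::task;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:9001").await?;
    loop {
        let (mut stream, _) = listener.accept().await?;
        // unconstrained() disables Tokio's cooperative budget for this task,
        // so a constantly-ready socket is never forced to yield.
        tokio::spawn(task::unconstrained(async move {
            let mut buf = vec![0u8; 64 * 1024];
            loop {
                match stream.read(&mut buf).await {
                    Ok(0) | Err(_) => break,
                    Ok(n) => {
                        if stream.write_all(&buf[..n]).await.is_err() {
                            break;
                        }
                    }
                }
            }
        }));
    }
}

Note that unconstrained tasks can starve the rest of the runtime, which is exactly why this only makes sense in a benchmark like this one where there are no other tasks to care about.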

@uNetworkingAB (Author)

I've added initial io_uring in v21:

[screenshot: benchmark results]

@littledivy (Member)

Cool, I was playing with tokio-uring the other day and it seems doable to add feature-gated code to support tokio-uring TCP streams. https://docs.rs/tokio-uring/latest/tokio_uring/net/struct.TcpStream.html#method.read
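
For reference, a rough sketch of what a feature-gated echo loop could look like on tokio-uring's owned-buffer API (driven under tokio_uring::start; shape only, not fastwebsockets code, and the "uring" feature name is made up):

#[cfg(feature = "uring")]
mod uring_echo {
    use std::io;
    use tokio_uring::net::TcpStream;

    // Completion-based I/O takes ownership of the buffer and hands it back with
    // the result, so the loop threads `buf` through each call instead of
    // borrowing it like the readiness-based epoll path does.
    pub async fn echo(stream: TcpStream) -> io::Result<()> {
        let mut buf = vec![0u8; 64 * 1024];
        loop {
            let (res, b) = stream.read(buf).await;
            buf = b;
            let n = res?;
            if n == 0 {
                return Ok(());
            }
            // Echo back the bytes we just read; a real implementation would
            // retry on short writes.
            buf.truncate(n);
            let (res, b) = stream.write(buf).await;
            buf = b;
            res?;
            buf.resize(64 * 1024, 0);
        }
    }
}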

@littledivy (Member)

Published fastwebsockets 0.4.2

@uNetworkingAB you might be interested in these charts:

[benchmark charts]

@littledivy (Member)

Current analysis:

fastwebsockets (msg/sec)   uWS (msg/sec)   conns   size (bytes)   % (+/-)
197921                     203761          10      20             -3%
211226                     214914          200     20             -2%
213680                     227030          500     20             -5%
101496                     86058           10      16386          18%
122088                     97946           200     16386          25%
106938                     80347           500     16386          33%

@uNetworkingAB (Author)

Ah, yes, writev with 2 chunks beats write for long messages; not something I've bothered with (yet?). The short-message bars make no sense though; they definitely do not match what I see here. I see at least 40% better short-message perf (1 kB and less) with uWS. You never tried v21, right? Even v20 beats fastwebsockets 0.4.2 on small messages by at least 15%, but the diff is extremely apparent in v21.

@littledivy (Member) commented May 11, 2023

Does v21 use epoll/kqueue by default for EchoServer?

@uNetworkingAB (Author)

Don't get me wrong, this competition is good. I'm already looking at adding no-copy writev sends for anything above a threshold. This is good, and I can confirm those numbers, but the current short-message numbers are way off.

v21 defaults to epoll; there is a release post on how to compile with io_uring, but you need Linux 6.0 or later.

@littledivy (Member)

Small msgs with uWS v21 EchoServer

fastwebsockets (msg/sec)   uWS (msg/sec)   conns   size (bytes)   % (+/-)
191362                     208341          10      20             -8%
211942                     216165          200     20             -1.9%
200574                     224980          500     20             -10%

Linux divy 5.19.0-1022-gcp #24~22.04.1-Ubuntu SMP x86_64 GNU/Linux

32GiB System memory
Intel(R) Xeon(R) CPU @ 3.10GHz

It does degrade by up to 10%, but I cannot reproduce the drastic ~40% here.

@uNetworkingAB (Author) commented May 11, 2023

It needs Linux 6.0; you are on 5.19. You also need to recompile load_test so that it uses io_uring - otherwise you just have epoll trying to stress io_uring. You know it's set up right if strace only lists io_uring_enter, for both EchoServer and load_test.

@littledivy (Member) commented May 11, 2023

I want to compare epoll-based implementations for now, to find out where the 40% degradation you see comes from.

The uWS EchoServer I compiled uses epoll, and the above results are for that. Is the 40% diff you see because of io_uring? (If so, that explains the diff.)

@uNetworkingAB (Author)

Yes, the 40% is from io_uring on Linux 6.0. There are features of 6.0 that are very central to that bigger diff, and that's why I target this kernel version as the minimum. This backend will be the default as soon as it is stable, so it would be very strange to exclude it.

Anyway, the first thing is probably adding this writev send path so we don't have gigantic diffs on bigger messages. I did remember why I never added it, though: it's not applicable for compressed messages or SSL, so it's a very specific bypass for only non-SSL, non-compressed, big messages.
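
In other words, the check boils down to something like this (names and the threshold are made up for illustration, not uWS internals):

// The two-chunk writev bypass only applies when nothing has to transform the
// payload (no TLS record framing, no permessage-deflate) and the payload is
// large enough to make the copy worth avoiding.
fn can_use_writev_bypass(is_ssl: bool, is_compressed: bool, payload_len: usize) -> bool {
    const BYPASS_THRESHOLD: usize = 16 * 1024; // assumed cut-off
    !is_ssl && !is_compressed && payload_len >= BYPASS_THRESHOLD
}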

@littledivy (Member)

Cool, the 40% diff will be relevant once fastwebsockets has an io_uring backend. Opened #31 for tracking io_uring support.

Self note: Add SSL benchmarks sometime in the future.

Anyway, I believe most of these things have been fixed, and I'll continue to improve perf on small msgs (a max 10% diff is fine for now). Feel free to open more related issues - this has been constructive 👍

@uNetworkingAB (Author)

Yes, competition creates an incentive to improve, which is good. I will have the writev fix done any time now.

@uNetworkingAB (Author)

Oh wow, uWS is 10% faster on 16 kB echoes with writev now :D
