
default connect timeouts are too low #1701

Closed
tarzanek opened this issue Jun 6, 2023 · 4 comments · Fixed by #1719
Comments


tarzanek commented Jun 6, 2023

Default Cassandra server-side timeouts are usually 5s for reads and 10s for other operations,

while the gocql defaults in https://github.com/gocql/gocql/blob/master/cluster.go#L54 are 600ms for both.

As you all know, client timeouts should be larger than server-side ones; otherwise a reconnect storm or request-retry storm can happen.

Could you increase the default timeouts above (Timeout and ConnectTimeout) so they are above the server-side defaults?

e.g. 6s for Timeout and 11s for ConnectTimeout

While on the same topic: the default retry policy is also a bit aggressive (I think 3 retries within 1s). Wouldn't it make sense to change it to some backoff policy? (That said, the main focus of this issue should be the default timeouts, please.)

@martin-sucha (Contributor)

Hi @tarzanek, thank you for the suggestion and sorry for the delayed reply.

Aligning the default values with the Cassandra defaults sounds good to me.

These are Cassandra 4.1 defaults:

| Config option | Default value |
|---|---|
| read_request_timeout | 5s |
| range_request_timeout | 10s |
| write_request_timeout | 2s |
| counter_write_request_timeout | 5s |
| cas_contention_timeout | 1s |
| truncate_request_timeout | 60s |
| request_timeout | 10s |

Here is a survey of some other clients:

| Driver | Version | Connect timeout | Request timeout |
|---|---|---|---|
| DataStax Java Driver | 4.16 | connect-timeout=5s, init-query-timeout=5s | 2s |
| DataStax Python Driver | 3.28 | 5s | 10s |
| scylla-go-driver | ce81923df69 | 500ms | 500ms |
| DataStax C++ Driver | 2.16.2 | 5s | 12s |

It seems it might be worth opening tickets there as well.

Regarding gocql, it seems to me that we should set both the connect timeout and the request timeout to slightly more than 10 seconds, because range requests have a default timeout of 10s in Cassandra. Truncate requests are rare, and 60s seems excessive.
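Until a change to the defaults lands, applications can already raise these values on their own `ClusterConfig`. A minimal sketch along the lines of the reasoning above (the contact point is a placeholder, and 11s is an assumption for illustration, not a current or proposed default):

```go
cluster := gocql.NewCluster("10.0.0.1") // placeholder contact point
// Slightly above Cassandra's 10s range_request_timeout default:
cluster.Timeout = 11 * time.Second        // per-request timeout
cluster.ConnectTimeout = 11 * time.Second // dial / initial handshake timeout
// ...then create the session as usual:
// session, err := cluster.CreateSession()
```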

What was the reason you suggested Timeout lower than ConnectTimeout in the original post? Is it that multiple requests need to happen during session initialization?


tarzanek commented Jun 9, 2023

My suggestion was based on the assumption that the connect timeout covers the first contact, which should default to 10s,
but that subsequent queries (e.g. against system.peers) should already use the read request timeout.
But rather than me assuming, we should check it in the code; I didn't have time to go that deep or to ask a developer of that part of the codebase.


mykaul commented Aug 27, 2023

It'd be great to move forward with this. We've had a use case that used the defaults and really hit a struggling node hard, with more and more clients trying to reconnect far too fast because of the low timeout. Should we send a PR raising the default connect timeout to a higher value?

martin-sucha added a commit to kiwicom/gocql that referenced this issue Aug 28, 2023
Client timeouts need to be higher than server timeouts,
so that work does not accumulate on the server with retries.

This was not true by default, the gocql default timeout was lower
than the Cassandra default timeout.

Closes apache#1671
Closes apache#1701
@martin-sucha (Contributor)

@mykaul @tarzanek I prepared PR #1719 fixing the defaults and updating the docs. Could you please review?
