
RFC: Load-based replica read #105

Merged: 7 commits merged into tikv:master on Sep 18, 2023

Conversation

sticnarf (Contributor)

No description provided.

@sticnarf (Contributor Author) commented on Jan 12, 2023

This design only maintains the load of TiKV nodes in the client, and the client receives the load info only when ServerIsBusy is returned. I hope we can get enough benefit with this simple design.

I'm still not confident about the retry strategy in this document. Completely different strategy designs are welcome.


The current queue length is easily known, but we have to predict the average time slice in the near future. We can use the EWMA of the previous time slices to estimate it. $S_{now}$ is the average time slice length of the read pool in the past second. We update the latest EWMA $S_{i}$ every second using the following formula:

$$S_{i}=\alpha \cdot S_{now}+(1-\alpha) \cdot S_{i-1}$$
@ekexium (Contributor) commented on Jan 12, 2023

This seems a bit vague to me. Does $S_i$ represent an estimated value or observed data? Maybe we should distinguish them. Oh, I get it: $S_i$ always means the estimate, and only $S_{now}$ is observed data. We could use different symbols to distinguish them.

@sticnarf (Contributor Author)

Now I use $\hat S$ for the predicted value (EWMA) and $Y_{t}$ as the observed value.
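For concreteness, here is a minimal sketch of that update in Go (purely illustrative; the `TimeSliceEWMA` type, its field names, and the `alpha` knob are assumptions of this sketch rather than part of the RFC, and TiKV itself implements the server side in Rust). It computes $\hat S_{i}=\alpha \cdot Y_{i}+(1-\alpha) \cdot \hat S_{i-1}$:

```go
// Illustrative EWMA estimator for the read pool's average time slice.
package loadsketch

// TimeSliceEWMA holds the smoothing factor and the current prediction Ŝ.
type TimeSliceEWMA struct {
	alpha     float64 // smoothing factor in (0, 1]; an assumed tuning knob
	predicted float64 // Ŝ_{i-1}: current predicted time slice, in seconds
}

// Update folds the latest observation Y_i into the prediction:
// Ŝ_i = alpha*Y_i + (1-alpha)*Ŝ_{i-1}.
func (e *TimeSliceEWMA) Update(observed float64) {
	e.predicted = e.alpha*observed + (1-e.alpha)*e.predicted
}

// Predicted returns the current estimate Ŝ_i.
func (e *TimeSliceEWMA) Predicted() float64 { return e.predicted }
```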


Knowing the current queue length $L$ and the average time slice $S$ of the read pool, we can estimate that the wait duration is $T_{waiting} =L \cdot S$.

The current queue length is easily known, but we have to predict the average time slice in the near future. We can use the EWMA of the previous time slices to estimate it. $S_{now}$ is the average time slice length of the read pool in the past second. We update the latest EWMA $S_{i}$ every second using the following formula:
Contributor

It seems it can take up to 1 second for the mechanism to recognize a spike in load. Underestimating the load might undermine the optimization.
Would a shorter interval improve the sensitivity without introducing much more overhead?

@sticnarf (Contributor Author)

I changed it to 200ms. The average time slice does not change much under a load spike, so the update interval needn't be very short.
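Continuing the sketch above in the same hypothetical package (it additionally imports `time`), a 200 ms refresh loop could look like the following; `readPoolStats` is an assumed interface introduced only for illustration, not a real TiKV API:

```go
// readPoolStats is an assumed view of the read pool used only in this sketch.
type readPoolStats interface {
	QueueLen() int                  // L: current queue length
	ObservedTimeSlice() float64     // Y_i: average time slice over the last interval, in seconds
	PublishEstimatedWait(s float64) // expose T_waiting so it can be piggybacked on ServerIsBusy
}

// runEstimator refreshes T_waiting = L * Ŝ every 200 ms.
func runEstimator(stats readPoolStats, ewma *TimeSliceEWMA) {
	ticker := time.NewTicker(200 * time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
		ewma.Update(stats.ObservedTimeSlice())
		waiting := float64(stats.QueueLen()) * ewma.Predicted()
		stats.PublishEstimatedWait(waiting)
	}
}
```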


Knowing the current queue length $L$ and the average time slice $S$ of the read pool, we can estimate that the wait duration is $T_{waiting} =L \cdot S$.

The current queue length is easily known, but we have to predict the average time slice in the near future. We can use the EWMA of the previous time slices to estimate it. $S_{now}$ is the average time slice length of the read pool in the past second. We update the latest EWMA $S_{i}$ every second using the following formula:
Contributor

When load is extremely low (e.g. there is only 1 large read request, or even 0), could it misestimate $S_{now}$ by simply calculating the average?

@sticnarf (Contributor Author)

Yes, it's a good point. I added a paragraph below for this case.


To make use of as many resources as possible, the load we predict should not be larger than the current load. Otherwise, we may skip a node that is already free to execute requests and not get the best performance.

We use `estimatedWait - time.Since(waitTimeUpdatedAt)` as the estimated waiting duration in the client. This estimate is almost certainly smaller than the real value because TiKV keeps accepting requests in the meantime and some queries don't finish within a single time slice.
@cfzjywxk (Contributor) commented on Jan 18, 2023

My initial thought was to let the client use observed metrics like cop_task_avg_wait_duration over a recent time interval, or something like that, to decide which replica to choose next. This `estimatedWait - time.Since(waitTimeUpdatedAt)` looks simpler and could avoid retrying already-busy replicas 🤔
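As a minimal client-side sketch of that expression (the `replicaLoad` bookkeeping and the `pickReplica` helper are hypothetical names used only for illustration; client-go's real types differ):

```go
package loadsketch

import "time"

// replicaLoad is hypothetical per-store bookkeeping, refreshed whenever a
// ServerIsBusy error carries new load information from that store.
type replicaLoad struct {
	estimatedWait     time.Duration // wait duration reported by the store
	waitTimeUpdatedAt time.Time     // when that report was received
}

// remainingWait decays the reported wait by the time elapsed since the
// report, so the estimate tends to undershoot the store's real load.
func (r *replicaLoad) remainingWait() time.Duration {
	remaining := r.estimatedWait - time.Since(r.waitTimeUpdatedAt)
	if remaining < 0 {
		return 0
	}
	return remaining
}

// pickReplica chooses the candidate with the smallest remaining estimated wait.
func pickReplica(candidates []*replicaLoad) *replicaLoad {
	if len(candidates) == 0 {
		return nil
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if c.remainingWait() < best.remainingWait() {
			best = c
		}
	}
	return best
}
```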


Because we will retry in replica-read mode, we don't need the follower or learner to issue a read index RPC again after knowing the applied index.

What will the replica-read node do when its applied index is not satisfied?

@sticnarf (Contributor Author)

It waits until it has applied up to that index. This saves the read index RPC; the other procedures are the same as the original replica read.

@zhangjinpeng87 (Member)

Please also consider the cross-AZ data transfer fee when deploying TiKV across AZs.

@sticnarf (Contributor Author)

> Please also consider the cross-AZ data transfer fee when deploying TiKV across AZs.

If user experience is more important, this feature is also worth considering despite the extra cost.

Anyway, this mode is not currently available to users of the closest-replica/adaptive mode.

@ekexium (Contributor) commented on Sep 18, 2023

/merge

@ekexium merged commit 23a29b6 into tikv:master on Sep 18, 2023