Add timeout to Ceph GET API calls #900
Can you be more specific about which APIs you mean? When I read "GET API calls" I think of the RGW (HTTP) APIs, but the linked issue doesn't seem to be RGW-specific. The APIs that wrap Ceph's C calls do not support things like Go's contexts, so the typical Go techniques for timing out do not work. There are some timeout-related parameters in the Ceph configuration that you could apply to a rados connection. You'd probably need to experiment with them to see what works for your use case (if any).
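A rough sketch of applying such parameters from Go, assuming go-ceph's `*rados.Conn`, whose `SetConfigOption(option, value string) error` method sets a Ceph option on the connection before `Connect` is called. The `configSetter` interface, the `applyTimeouts` helper and the `fakeConn` type are hypothetical names introduced for illustration; the fake lets the sketch run without librados. The option values are placeholders to experiment with, and whether these options cover the calls you care about would need testing.

```go
package main

import (
	"fmt"
	"sort"
)

// configSetter matches the SetConfigOption method of go-ceph's *rados.Conn,
// so a real connection could be passed in. The interface is hypothetical.
type configSetter interface {
	SetConfigOption(option, value string) error
}

// applyTimeouts sets each Ceph option on the connection. Keys are applied in
// sorted order so that the first failure reported is deterministic.
func applyTimeouts(c configSetter, opts map[string]string) error {
	keys := make([]string, 0, len(opts))
	for k := range opts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if err := c.SetConfigOption(k, opts[k]); err != nil {
			return fmt.Errorf("setting %s=%s: %w", k, opts[k], err)
		}
	}
	return nil
}

// fakeConn records the options it receives so the sketch runs without librados.
type fakeConn struct{ set map[string]string }

func (f *fakeConn) SetConfigOption(option, value string) error {
	if f.set == nil {
		f.set = map[string]string{}
	}
	f.set[option] = value
	return nil
}

func main() {
	// Values are in seconds; 0 (the default) disables these timeouts.
	// The numbers below are placeholders, not recommendations.
	timeouts := map[string]string{
		"rados_osd_op_timeout": "30",
		"rados_mon_op_timeout": "30",
	}
	c := &fakeConn{}
	if err := applyTimeouts(c, timeouts); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(c.set)
}
```

With a real connection, the helper would be called after `rados.NewConn()` and `ReadDefaultConfigFile()` but before `Connect()`, since options set after connecting may not take effect.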
Hi @phlogistonjohn, the problem we are trying to solve is the CSI pod hanging when something is wrong in the Ceph cluster or there are network problems. In such cases, restarting the pod is currently the only (manual) fix. So we are trying to add timeouts to these CSI calls (mainly the GET calls). If it is possible to do that directly in rados, that would be great; otherwise we might need to write wrappers around the GET calls to handle it. Some more context on this can be found here (a bit old, though). Thanks for your input on the timeout-related parameters in the Ceph configuration. Let me check whether those can be useful here.
Any updates?
I would prefer not to implement timeouts in go-ceph, as it is meant to be a thin wrapper around the C libraries. Adding timeout support to go-ceph would require major refactoring of the project as well as of its consumers. Storage systems are expected to be transparent: if something is not in the expected state, that should be visible, and we should not pretend otherwise. Moreover, timeouts are not useful in every use case. Since we only need timeouts for GET calls in the CSI driver, we can implement a wrapper on the driver side, something like:

```go
// This is just a mockup.
package main

import (
	"context"
	"fmt"
	"time"
)

func TimedWrapper(ctx context.Context) (string, error) {
	// Type this channel to the return type of the wrapped call. It is
	// buffered so the goroutine can still send and exit after a timeout.
	done := make(chan string, 1)
	go func() {
		defer close(done)
		// Mock up the call to a go-ceph API function (~10s of work).
		chunks := 10
		for i := 0; i < chunks; i++ {
			time.Sleep(time.Second)
		}
		// Return the value.
		done <- "success!"
	}()
	select {
	case a := <-done:
		return a, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	fmt.Println(TimedWrapper(ctx))
}
```

The drawback of this approach is that we have no way to kill the command once it is in flight. We might be able to terminate the goroutine itself using signals. Since these are just simple GET calls, would leaving the goroutine running be an issue? Please share your thoughts on this. Thank you!
The main question I have is whether it is possible to terminate a goroutine while it is executing a librados, librbd or libcephfs call. Interrupting the C library call may not be possible, or may not be reliable, depending on the call. Experimenting with that and sharing the results would be needed to really understand whether this approach gives any benefit. @ansiwen might have ideas about interrupting CGo calls too.
This is to add the necessary changes in go-ceph to handle the ceph-csi issue ceph/ceph-csi#3657.
Provide a way to configure the timeout for the Ceph GET API calls, to avoid commands getting stuck when there is a problem between the Ceph cluster and the CSI driver (cluster health, slow ops, or a short network connectivity problem).
For more info, please refer to the ceph-csi issue.