Ready check does not include current database connectivity #831
Comments
Good point, that should really be the case. |
The ready-checkers are registered here: keto/internal/driver/registry_default.go Line 88 in e9e6385
Currently none are registered, which means that Keto appears healthy as soon as it runs. |
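To make the gap concrete, here is a minimal sketch of what a database-backed ready checker could look like. It is not Keto's actual code: the `ReadyChecker` signature, the `NewDatabaseReadyChecker` name, and the 2 second timeout are assumptions made for illustration, and the real registration in registry_default.go may expect a different shape. The `Pinger` interface is satisfied by `*sql.DB` from the standard library through its `PingContext` method.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Pinger is the small subset of *sql.DB this sketch needs.
// *sql.DB satisfies it through its PingContext method.
type Pinger interface {
	PingContext(ctx context.Context) error
}

// ReadyChecker is a hypothetical checker signature used only in this
// sketch; the signature Keto actually registers may differ.
type ReadyChecker func() error

// pingTimeout is an assumed value, chosen so that a hanging database
// cannot block the readiness endpoint indefinitely.
const pingTimeout = 2 * time.Second

// NewDatabaseReadyChecker returns a checker that pings the database on
// every call and reports an error while the database is unreachable.
func NewDatabaseReadyChecker(db Pinger) ReadyChecker {
	return func() error {
		ctx, cancel := context.WithTimeout(context.Background(), pingTimeout)
		defer cancel()
		if err := db.PingContext(ctx); err != nil {
			return fmt.Errorf("database not reachable: %w", err)
		}
		return nil
	}
}

// errStubDown and stubPinger exist only so the sketch can be tried out
// without a real database; they are not part of any library.
var errStubDown = errors.New("stub: connection refused")

type stubPinger struct{ err error }

func (s stubPinger) PingContext(context.Context) error { return s.err }
```

With a checker like this registered, `NewDatabaseReadyChecker(stubPinger{})()` returns nil, while `NewDatabaseReadyChecker(stubPinger{err: errStubDown})()` returns a wrapped error that a readiness endpoint could translate into a non-200 response.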
From a Kubernetes point of view, you don't want to include external dependencies, such as a database, in your readiness checks. Otherwise you might end up in a cascading failure scenario where all pods are taken out of service and are unable to serve requests, and you are greeted with some generic error that doesn't really tell you what is causing the issue. |
Interesting standpoint; maybe @Demonsthere can give his opinion on this? Keto is generally not able to serve any request without a working database connection. Init migration jobs will also not complete, so you will end up in an error loop anyway on helm install. |
Imho, from a deployment perspective:
|
Sounds good, so basically we would ping the database on startup and report as ready once that has succeeded. Further ready checks will not ping the database again, but will always return true. |
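The "ping once, then stay ready" behaviour described above can be sketched as a small latch. This is only an illustration under stated assumptions: `PingFunc`, `LatchedReadiness`, and `ErrNotReady` are hypothetical names, not part of Keto or any Ory library, and `PingFunc` stands in for whatever call actually pings the database (for example a closure around `(*sql.DB).Ping`).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// PingFunc is a hypothetical stand-in for a database ping.
type PingFunc func() error

// ErrNotReady is returned until the first successful ping.
var ErrNotReady = errors.New("database not reachable yet")

// LatchedReadiness reports "not ready" until one ping has succeeded.
// After that it never pings again and always reports "ready".
type LatchedReadiness struct {
	mu    sync.Mutex
	ping  PingFunc
	ready bool
}

// NewLatchedReadiness wraps the given ping in a latch.
func NewLatchedReadiness(ping PingFunc) *LatchedReadiness {
	return &LatchedReadiness{ping: ping}
}

// Check returns nil once the database has been reached at least once.
func (l *LatchedReadiness) Check() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.ready {
		return nil // latched: later database outages are not reported here
	}
	if err := l.ping(); err != nil {
		return fmt.Errorf("%w: %v", ErrNotReady, err)
	}
	l.ready = true
	return nil
}
```

The trade-off of this design is visible in `Check`: once latched, a later database outage no longer flips readiness, which avoids taking all pods out of rotation at once but also means the endpoint does not reflect current connectivity.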
In Kubernetes, we can define the failure threshold, i.e. how many times a probe is retried before the pod is restarted. We can also define initialDelaySeconds to wait for operational tasks to complete before health/readiness requests are sent. IMHO, adding a database health check might be good as well. |
In the helm charts the values for probes are exposed and can be configured to your liking :) |
Edit: we actually ran into a related issue some time ago 😅 which caused us to rethink the setup a bit. We have now exposed the option to change the probes to custom ones, as seen here in kratos, and will work on reworking the health checks in general |
Isn't this solved now? I think one of the probes now checks DB connectivity |
They would have to be added here, right? keto/internal/driver/registry_default.go Line 122 in 9215c06
Maybe that was a different project, and we can transfer the change? |
:O Yes, definitely, that needs to be checked! Otherwise we could run into an outage if we encounter one of those SQL connection bugs with cockroach that need a pod restart |
I just ran into an issue using PostgreSQL as the backend, with calls to Keto reporting something like:
DB was up and retries didn't work. However, restarting the pod worked. I am wondering if there's a chance of this issue making it over the finish line? |
Preflight checklist
Describe the bug
The health/ready endpoint returns OK when database connectivity is no longer available. I would expect it to check this because, according to the docs: "This endpoint returns a 200 status code when the HTTP server is up running and the environment dependencies (e.g. the database) are responsive as well."
Reproducing the bug
However, when I try to insert or query tuples, I am obviously greeted with an error code.
Relevant log output
No response
Relevant configuration
No response
Version
0.6.0-alpha.1
On which operating system are you observing this issue?
No response
In which environment are you deploying?
Kubernetes with Helm
Additional Context
No response