Recover gracefully from laptop sleep #1558
Comments
cc @luryus This is a problem across Connectors, and we do want to improve support for sleep.
Hi! I noticed that this (or a very similar issue) was fixed in the Go connector: GoogleCloudPlatform/cloud-sql-go-connector#686. Any chance this could now be ported to this Java library as well?
Ack. We'll port the fix over shortly.
I think this change should match the Go connector: check for an invalid cert and, if the cert has in fact expired, block until a new one is available.
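A minimal sketch of that behavior, written for illustration only and not taken from either connector's code: a hypothetical `CertCache` keeps the most recent certificate plus any in-flight refresh, and `get()` refuses to hand out a certificate whose expiration has already passed by wall-clock time, blocking on a refresh instead. `CertCache`, `Cert`, and the `refresher` supplier are assumed names, not part of the Cloud SQL Java Connector API.

```java
import java.time.Instant;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

/** Hypothetical cache that never returns an expired certificate. */
final class CertCache {
  /** Hypothetical stand-in for an ephemeral client certificate. */
  record Cert(String id, Instant expiration) {}

  private final Supplier<Cert> refresher; // assumed: fetches a fresh cert
  private final Supplier<Instant> now;    // wall clock in production (Instant::now)
  private Cert current;
  private CompletableFuture<Cert> inFlight;

  CertCache(Supplier<Cert> refresher, Supplier<Instant> now) {
    this.refresher = refresher;
    this.now = now;
  }

  /** Returns a valid cert, blocking on a refresh if the cached one has expired. */
  Cert get() {
    CompletableFuture<Cert> pending;
    synchronized (this) {
      if (current != null && now.get().isBefore(current.expiration())) {
        return current; // fast path: cached cert is still valid
      }
      // Missing or expired (e.g. after the laptop slept through its expiry):
      // start a refresh unless one is already running.
      if (inFlight == null || inFlight.isDone()) {
        inFlight = CompletableFuture.supplyAsync(refresher);
      }
      pending = inFlight;
    }
    // Block until the new cert arrives; a failed refresh surfaces as a
    // CompletionException, and the next call starts a new attempt.
    Cert fresh = pending.join();
    synchronized (this) {
      current = fresh;
    }
    return fresh;
  }
}
```

Comparing against `Instant` rather than relying on a scheduled timer is the key choice here: a timer may not fire on time after sleep, but the wall clock has moved on, so the expiry check still catches the stale certificate on the next connection attempt.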
Hi @ttosta-google, it seems that this is still not fixed in v1.16.0, even with #1771. Although the library now immediately starts a refresh when it notices that the current cert has expired, the forced refresh does not succeed. The logs indicate that it somehow gets stuck in the refresh rate limiter: the library logs "Acquiring rate limiter permit." but never "Rate limiter permit acquired". All subsequent connection retries then just get the good old … This reproduces with both my reproducer project and DBeaver. Here are sample logs:
Thanks for the info @luryus! I've reopened this issue and will investigate it further.
This is another example where our rate-limiting approach to refreshes doesn't work so well. The rate limiter should only wait 30 seconds before trying again, though.
Hi, any updates on this? This still happens with 1.17.1. I just retested it with my test program, and it seems that after the PC has been asleep, the rate limiter lets the forced refresh succeed only after a long time, probably only once the next scheduled refresh is due. Until then, connection attempts fail with the errors I pasted in my previous message, even if I wait more than 30 seconds between attempts. It looks like a thread pool is starved or something, and that's why the refresh operations get stuck in the rate limiter. Please take another look at this: I run into it almost daily while using DBeaver, and it's super annoying.
We've recently fixed a similar issue in the Proxy. For now, I'd recommend connecting through the Proxy (which handles sleep without issue). Meanwhile, we've moved this up our list of priorities and will investigate soon.
This issue seems to be fixed by switching to the lazy refresh strategy. That makes sense: Windows sleep disturbs timers just like idling on serverless platforms does, so avoiding timers entirely fixes the problem. There's a slight delay while the certificates are refreshed after I wake my laptop and run a query in DBeaver, but that's not a problem at all. Perhaps the docs could be updated to cover this use case?
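To illustrate why a lazy strategy sidesteps the problem, here is a minimal sketch under stated assumptions; it is not the connector's implementation. A hypothetical `LazyCertCache` has no background thread or scheduled refresh at all: each `get()` compares the wall clock against the certificate's expiration minus an assumed 4-minute buffer and, if needed, fetches a new certificate inline while the caller waits. `LazyCertCache`, `Cert`, `fetcher`, and `REFRESH_BUFFER` are illustrative names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical lazy cache: no timers, refresh happens only on access. */
final class LazyCertCache {
  record Cert(String id, Instant expiration) {}

  // Assumed margin: refresh slightly before expiry so a connection attempt
  // never uses a cert that is about to lapse.
  private static final Duration REFRESH_BUFFER = Duration.ofMinutes(4);

  private final Supplier<Cert> fetcher; // assumed: fetches a new cert
  private final Supplier<Instant> now;  // wall clock in production (Instant::now)
  private Cert current;

  LazyCertCache(Supplier<Cert> fetcher, Supplier<Instant> now) {
    this.fetcher = fetcher;
    this.now = now;
  }

  synchronized Cert get() {
    Instant refreshAt =
        current == null ? Instant.MIN : current.expiration().minus(REFRESH_BUFFER);
    if (!now.get().isBefore(refreshAt)) {
      current = fetcher.get(); // synchronous: this is the short delay seen after wake-up
    }
    return current;
  }
}
```

Because nothing is scheduled, there is no timer for sleep to disrupt; the trade-off is that the first connection after a long pause pays the cost of fetching a new certificate.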
@luryus I agree. It's probably a good idea for us to update the README's Lazy Refresh section to mention the waking-from-sleep use case.
Right now, if a laptop goes to sleep with an active connection to a Cloud SQL instance, the socket factory won't recover on its own once the laptop has been restored.
Java has limitations in handling this:
https://stackoverflow.com/questions/52876556/how-does-java-calculate-sleep-time-when-pc-goes-into-hibernate-mode
Nonetheless, we should throw out a bad connection and force a refresh, so that the socket factory recovers gracefully from sleep.
See https://github.com/luryus/cloud-sql-jdbc-hang for a reproduction of this issue.
Related to GoogleCloudPlatform/cloud-sql-proxy#1788.
Whatever we do here, we should port to the AlloyDB Java Connector as well.