Expose with_retry in storage options? #3182

Open
oceanusxiv opened this issue Nov 27, 2024 · 1 comment

Comments

@oceanusxiv

The underlying object_store crate supports a with_retry configuration, which is useful for exponential backoff and jitter during temporary network outages. It should be exposed to the user via the storage_options API (or some other API) so that it can be set. As it stands, I don't think there's any exponential backoff to the download retries?

@westonpace
Contributor

Correct, there is no exponential backoff to the download retries (though it depends on your definition of exponential). However, I'm not sure that object_store is the place to configure retry backoffs due to network outages. What sort of max duration are you looking for?

If you are looking for something over 5 minutes then you will encounter this warning from object_store:

As requests are retried without renewing credentials or regenerating request payloads, this number should be kept below 5 minutes to avoid errors due to expired credentials and/or request payloads

If you are looking for something less than 5 minutes then you can probably get there by exposing with_retry in some way. It should be a fairly straightforward change. Probably the simplest thing to expose would be init_backoff. I'd advise anyone working on this to read up on the actual algorithm used, which is "decorrelated jitter" and not "classic exponential growth". It is designed to avoid waves of concurrent requests, not to solve network outages. Its growth is sub-linear.
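For anyone reading up on this, here is a minimal, std-only sketch of the general decorrelated-jitter idea: each new delay is drawn uniformly between the initial backoff and the previous delay times a multiplier, then capped at a maximum. This is an illustration under my own assumptions and is not copied from object_store. The names `DecorrelatedJitter`, `XorShift64`, and `next_delay` are hypothetical, and the tiny PRNG is only there to avoid an external crate.

```rust
use std::time::Duration;

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// (Hypothetical helper; a real implementation would use a proper RNG.)
struct XorShift64(u64);

impl XorShift64 {
    /// Returns a pseudo-random float in [0, 1).
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        // Use the top 53 bits so the result fits an f64 mantissa exactly.
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Hypothetical decorrelated-jitter backoff state (all values in seconds).
struct DecorrelatedJitter {
    init: f64, // lower bound for every delay
    max: f64,  // upper bound for every delay
    base: f64, // multiplier applied to the previous delay (assumed >= 1)
    next: f64, // delay that the next call will return
    rng: XorShift64,
}

impl DecorrelatedJitter {
    fn new(init: Duration, max: Duration, base: f64, seed: u64) -> Self {
        let init = init.as_secs_f64();
        Self {
            init,
            max: max.as_secs_f64(),
            base,
            next: init,
            // xorshift must not be seeded with zero.
            rng: XorShift64(seed.max(1)),
        }
    }

    /// Returns the delay to sleep now, then draws the following delay
    /// uniformly from [init, previous * base) and caps it at `max`.
    fn next_delay(&mut self) -> Duration {
        let current = self.next;
        let upper = current * self.base;
        let drawn = self.init + self.rng.next_f64() * (upper - self.init);
        self.next = drawn.min(self.max);
        Duration::from_secs_f64(current)
    }
}
```

Because each delay is random rather than a fixed doubling, concurrent clients spread out instead of retrying in lockstep, which is the purpose described above.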

We do have an outer retry loop that we use in most places, which can be configured with download_retry_count (sadly not documented), but this only applies to the download of the data and not to the initial transmission of headers.

So, if we want a retry loop for intermittent network timeouts it probably needs to be a new retry loop. I'd be open to the idea but also slightly cautious as this feels like something not all users will need and the users that do can build their own retry loop outside of Lance.
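As a rough illustration of what a retry loop outside of Lance could look like, here is a minimal std-only sketch. The helper `retry_with_backoff` and all of its parameters are hypothetical and not part of Lance or object_store. It uses plain doubling with a cap rather than decorrelated jitter, and it assumes the caller can tell transient errors (such as timeouts) apart from permanent ones.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: run `op` until it succeeds, a non-transient error
/// occurs, or `max_attempts` attempts have been made. `op` always runs at
/// least once. The delay doubles after each failed attempt, capped at
/// `max_delay`.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    init_delay: Duration,
    max_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
    is_transient: impl Fn(&E) -> bool,
) -> Result<T, E> {
    let mut delay = init_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Retry only transient failures while attempts remain.
            Err(e) if attempt < max_attempts && is_transient(&e) => {
                sleep(delay);
                delay = (delay * 2).min(max_delay);
                attempt += 1;
            }
            // Permanent error, or the attempt budget is exhausted.
            Err(e) => return Err(e),
        }
    }
}
```

A caller would wrap the read that may hit a network timeout in the `op` closure and pass a predicate that recognises its own timeout errors. Keeping the total wait well under the 5 minute limit quoted above still seems prudent if the wrapped call reuses credentials.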
