Cannot download ~200GB file using storage blob #1425
Comments
What version of google-cloud-storage are you using? Additionally, what type/size of instance are you trying to download the blob to?
Hi @BenWhitehead
Thanks for confirming. I'll see what I can determine for a fix. In the meantime, if you're blocked on this, an alternative approach you could use is as follows:
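(The original snippet did not survive this page; below is a sketch of what the ReadChannel-based alternative referenced later in the thread typically looks like. The bucket/object names, destination path, and chunk size are placeholders.)

```java
import com.google.cloud.ReadChannel;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class ReadChannelDownload {
  public static void main(String[] args) throws Exception {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    BlobId blobId = BlobId.of("my-bucket", "my-200gb-object"); // placeholder names

    try (ReadChannel reader = storage.reader(blobId);
        FileChannel out = FileChannel.open(
            Paths.get("/data/my-200gb-object"), // placeholder destination
            StandardOpenOption.CREATE,
            StandardOpenOption.WRITE,
            StandardOpenOption.TRUNCATE_EXISTING)) {
      // Each chunk is fetched with its own request, so a dropped connection only
      // affects the chunk in flight rather than the whole transfer.
      reader.setChunkSize(16 * 1024 * 1024);
      ByteBuffer buffer = ByteBuffer.allocate(16 * 1024 * 1024);
      while (reader.read(buffer) >= 0) {
        buffer.flip();
        while (buffer.hasRemaining()) {
          out.write(buffer);
        }
        buffer.clear();
      }
    }
  }
}
```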
This method will be slower than `downloadTo`, but because each chunk is fetched with its own request it should be resilient to the interruption you're seeing.
Hi @BenWhitehead, thanks for the alternative approach. I have already used it, but the reason I use `downloadTo` is … Thanks in advance.
Wanted to let you know that I have been able to get the error to happen with some reliability (~60% of attempts). I'm still trying to figure out why it's erroring and not retrying from where it left off; the implementation of `downloadTo` should be attempting to retry the request.
fix: update request method of HttpStorageRpc to properly configure offset on requests

When invoking downloadTo(..., OutputStream), if a retry was attempted the proper byte offset was not being sent in the retried request. Update the low-level method we use to send the request so that it does send the byte offset. Update ITRetryConformanceTest to run Scenario 8 test cases, which cover resuming a download and could have caught this error sooner. Update StorageException.translate(IOException) to classify `IOException: Premature EOF` as the existing retryable reason `connectionClosedPrematurely`. Add a case to DefaultRetryHandlingBehaviorTest to ensure conformance to this categorization. Related to #1425
fix: update request method of HttpStorageRpc to properly configure offset on requests (#1434)

When invoking downloadTo(..., OutputStream), if a retry was attempted the proper byte offset was not being sent in the retried request. Update the logic of HttpStorageRpc.read to manually set the range header, rather than relying on MediaDownloader to do it, and to not automatically decompress the byte stream. Update ITRetryConformanceTest to run Scenario 8 test cases, which cover resuming a download and could have caught this error sooner. Update StorageException.translate(IOException) to classify `IOException: Premature EOF` as the existing retryable reason `connectionClosedPrematurely`. Add a case to DefaultRetryHandlingBehaviorTest to ensure conformance to this categorization. Break the downloadTo integration tests out into their own class, and separate the multiple scenarios being tested in the same method. Related to #1425
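For context on what "properly configure offset on requests" means, here is an illustrative sketch (not the library's actual code) of resuming a download with a ranged GET: the retried request must carry a `Range: bytes=<offset>-` header for the bytes already written, and should avoid transparent decompression so offsets line up with the stored object.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class RangedResumeSketch {
  /** Re-request the object starting at {@code offset} and copy the remainder to {@code out}. */
  static void resumeFrom(URL objectUrl, long offset, OutputStream out) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) objectUrl.openConnection();
    // Open-ended range: "bytes=<offset>-" asks the server for everything after the bytes
    // we already have, instead of restarting from byte 0.
    conn.setRequestProperty("Range", "bytes=" + offset + "-");
    // Ask for the stored bytes as-is so byte offsets are not skewed by gzip decompression.
    conn.setRequestProperty("Accept-Encoding", "identity");
    try (InputStream in = conn.getInputStream()) {
      byte[] buf = new byte[64 * 1024];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    } finally {
      conn.disconnect();
    }
  }
}
```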
Hello @BenWhitehead! We are using google-cloud-storage:2.9.2 now, which contains your fixes, but we are still facing the same problem: … Thanks in advance!
I spent some more time trying to diagnose why it may still be happening for you, but unfortunately I haven't been able to narrow down exactly what's happening. My best guess, still unverified (the code is pretty difficult to trace precisely for order of operations), is that somehow the retry budget is exhausted to the point that, when the retry logic tries to determine whether it can retry, there isn't any budget left. If I'm able to prove it one way or the other I'll report back.
Hi @BenWhitehead! Do you have any updates on this issue? Please let me know if there is some progress.
Apologies for the long delay, lots of things got in the way. From what I've been able to suss out, this behavior seems to be related to the `totalTimeout` of the `RetrySettings` in use. The reason we don't see the error when using the ReadChannel is that currently the ReadChannel issues a new RPC for each chunk's worth of data, whereas downloadTo will open a single request and read from the socket as long as it can. To allow retries to happen on such large files you'll need to increase the `totalTimeout` in your `RetrySettings` (a configuration sketch follows the repro output below).

In case you're curious, below is the minimized repro I was able to generate after digging about:

Minimized reproduction of behavior

Code

package zx;
import com.google.api.core.NanoClock;
import com.google.api.gax.retrying.BasicResultRetryAlgorithm;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.RetryHelper;
import com.google.common.base.Stopwatch;
import java.util.concurrent.atomic.AtomicInteger;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.threeten.bp.Duration;
final class Retry {
private static final Logger LOGGER = LoggerFactory.getLogger(Retry.class);
public static void main(String[] args) {
LOGGER.debug(">>> info(args : {})", args);
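// A 12s totalTimeout combined with ~5s per attempt means the retry budget is
// exhausted after two or three attempts, well before maxAttempts (5) is reached.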
RetrySettings retrySettings = RetrySettings.newBuilder()
.setMaxAttempts(5)
.setInitialRpcTimeout(Duration.ofSeconds(2))
.setMaxRpcTimeout(Duration.ofSeconds(4))
.setTotalTimeout(Duration.ofSeconds(12))
.build();
AtomicInteger x = new AtomicInteger(1);
Stopwatch s = Stopwatch.createStarted();
try {
String result = RetryHelper.runWithRetries(
() -> {
LOGGER.info("callable");
if (true) {
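// Each attempt sleeps ~5s before failing, steadily consuming the 12s totalTimeout.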
Thread.sleep(5_000);
throw new RuntimeException("kaboom " + x.getAndIncrement());
}
return "something";
},
retrySettings,
new BasicResultRetryAlgorithm<>() {
@Override
public boolean shouldRetry(Throwable previousThrowable, Object previousResponse) {
LOGGER.info(">>> shouldRetry(previousThrowable : {}, previousResponse : {})", previousThrowable, previousResponse);
return true;
}
},
NanoClock.getDefaultClock()
);
LOGGER.info("result = {}", result);
} finally {
Stopwatch stop = s.stop();
LOGGER.info("stop = {}", stop);
}
}
}

Output
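Separately, to act on the suggestion above about raising the retry budget, a sketch of configuring a larger `totalTimeout` on the Storage client might look like this (the values are placeholders, not recommendations):

```java
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import org.threeten.bp.Duration;

final class LongDownloadRetrySettings {
  static Storage storageWithLongRetryBudget() {
    RetrySettings retrySettings = RetrySettings.newBuilder()
        .setMaxAttempts(10)
        .setInitialRetryDelay(Duration.ofMillis(500))
        .setMaxRetryDelay(Duration.ofSeconds(32))
        .setRetryDelayMultiplier(2.0)
        // The key knob for very large downloads: the total elapsed time allowed across
        // all attempts must exceed the expected transfer time, otherwise retries are refused.
        .setTotalTimeout(Duration.ofHours(6))
        .build();
    return StorageOptions.newBuilder()
        .setRetrySettings(retrySettings)
        .build()
        .getService();
  }
}
```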
Hi again, after the recent fix in #1799 the alternative approach in #1425 (comment) now provides bandwidth competitive with downloadTo[1] while providing transparent automatic retries. For the next major version we will very likely switch the internals of `downloadTo` over to this approach.
Execution fails when ~128 GB has been downloaded, and the failure does not depend on the time spent downloading those 128 GB.
Environment details
OS type and version: Linux
Java version: 11
Code example
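(The reporter's code example was not captured in this page. A minimal call matching the described scenario, with hypothetical bucket/object names and destination path, would be:)

```java
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.nio.file.Paths;

final class DownloadLargeObject {
  public static void main(String[] args) {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    Blob blob = storage.get(BlobId.of("my-bucket", "my-200gb-object"));
    // Fails partway through (~128 GB in) when the connection is closed prematurely.
    blob.downloadTo(Paths.get("/data/my-200gb-object"));
  }
}
```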
Stack trace