
Can not download ~200GB file using storage blob #1425

Closed

rdubitsky-syberry opened this issue May 31, 2022 · 10 comments

Labels
api: storage - Issues related to the googleapis/java-storage API.
priority: p2 - Moderately-important priority. Fix may not be included in next release.
status: investigating - The issue is under investigation, which is determined to be non-trivial.
type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments


rdubitsky-syberry commented May 31, 2022

Execution fails once roughly 128 GB has been downloaded, and the failure does not depend on how much time was spent downloading those 128 GB.

Environment details

OS type and version: Linux
Java version: 11

Code example

JsonObject jsonObject = new JsonObject();
jsonObject.addProperty(GoogleCredentialsConstants.CLIENT_ID, CLIENT_ID);
jsonObject.addProperty(GoogleCredentialsConstants.CLIENT_EMAIL, CLIENT_EMAIL);
jsonObject.addProperty(GoogleCredentialsConstants.ACCOUNT_TYPE, ACCOUNT_TYPE);
jsonObject.addProperty(GoogleCredentialsConstants.PRIVATE_KEY_ID, PRIVATE_KEY_ID);
jsonObject.addProperty(GoogleCredentialsConstants.PRIVATE_KEY, PRIVATE_KEY);
String credentials = jsonObject.toString();

GoogleCredentials googleCredentials;
try (InputStream inputStream = new ByteArrayInputStream(credentials.getBytes(StandardCharsets.UTF_8))) {
    googleCredentials = GoogleCredentials.fromStream(inputStream);
} catch (IOException ex) {
    throw new CustomException(ex);
}

Storage storage = StorageOptions.newBuilder()
      .setProjectId(projectId)
      .setCredentials(googleCredentials)
      .build()
      .getService();

Blob blob = storage.get(BlobId.of(bucketName, remoteFilePath));
blob.downloadTo(localFilePath); // line where code fails

Stack trace

Caused by: java.net.SocketException: Connection reset by peer (Write failed)
	at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.base/java.net.SocketOutputStream.socketWrite(Unknown Source)
	at java.base/java.net.SocketOutputStream.write(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketOutputRecord.flush(Unknown Source)
	at java.base/sun.security.ssl.OutputRecord.changeWriteCiphers(Unknown Source)
	at java.base/sun.security.ssl.KeyUpdate$KeyUpdateProducer.produce(Unknown Source)
	at java.base/sun.security.ssl.KeyUpdate$KeyUpdateKickstartProducer.produce(Unknown Source)
	at java.base/sun.security.ssl.SSLHandshake.kickstart(Unknown Source)
	at java.base/sun.security.ssl.PostHandshakeContext.kickstart(Unknown Source)
	at java.base/sun.security.ssl.TransportContext.kickstart(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl.tryKeyUpdate(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl.decode(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(Unknown Source)
	at java.base/java.io.BufferedInputStream.read1(Unknown Source)
	at java.base/java.io.BufferedInputStream.read(Unknown Source)
	at java.base/sun.net.www.MeteredStream.read(Unknown Source)
	at java.base/java.io.FilterInputStream.read(Unknown Source)
	at java.base/sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
	at com.google.api.client.http.javanet.NetHttpResponse$SizeValidatingInputStream.read(NetHttpResponse.java:164)
	at java.base/java.io.BufferedInputStream.read1(Unknown Source)
	at java.base/java.io.BufferedInputStream.read(Unknown Source)
	at java.base/java.io.FilterInputStream.read(Unknown Source)
	at com.google.common.io.ByteStreams.copy(ByteStreams.java:112)
	at com.google.api.client.googleapis.media.MediaHttpDownloader.executeCurrentRequest(MediaHttpDownloader.java:243)
	at com.google.api.client.googleapis.media.MediaHttpDownloader.download(MediaHttpDownloader.java:183)
	at com.google.api.client.googleapis.media.MediaHttpDownloader.download(MediaHttpDownloader.java:142)
product-auto-label bot added the api: storage label May 31, 2022
@BenWhitehead
Collaborator

Hi @rdubitsky-syberry,

What version of com.google.cloud:google-cloud-storage are you using (mvn dependency:list -Dincludes=com.google.cloud:google-cloud-storage)?

Additionally, what type/size of instance are you trying to download the blob to?

BenWhitehead added the type: question and needs more info labels May 31, 2022
@rdubitsky-syberry
Author

Hi @BenWhitehead
com.google.cloud:google-cloud-storage:2.4.0 is used.
The file being downloaded is a CSV, 220 GB in size.

@BenWhitehead
Collaborator

Hi @rdubitsky-syberry,

Thanks for confirming. I'll see what I can determine for a fix.

In the meantime, if you're blocked on this, an alternative approach that could be used is as follows:

    try (
        ReadChannel r = storage.reader(BlobId.of(bucketName, remoteFilePath));
        // localFilePath is a java.nio.file.Path, as in the downloadTo call above
        WritableByteChannel w = new FileOutputStream(localFilePath.toFile()).getChannel()
    ) {
      r.setChunkSize(16 * 1024 * 1024);  // attempt to read up to 16 MiB for each sub-request
      com.google.common.io.ByteStreams.copy(r, w);
    }

This method will be slower than downloadTo but has the advantage that it can incrementally retry the download.


rdubitsky-syberry commented Jun 1, 2022

Hi, @BenWhitehead

Thanks for the alternative approach. I have already used it, but the reason I use downloadTo is that it is quicker than the channel copy. I hope you'll come up with a fix.

Thanks in advance.

BenWhitehead added the priority: p2, status: investigating, and type: bug labels and removed the type: question and needs more info labels Jun 3, 2022
@BenWhitehead
Collaborator

Wanted to let you know, I have been able to get the error to happen with some reliability (~60%). I'm still trying to figure out why it's erroring and not retrying from where it left off. The implementation of downloadTo should be attempting to retry the request.

BenWhitehead added a commit that referenced this issue Jun 6, 2022
fix: update request method of HttpStorageRpc to properly configure offset on requests

When invoking downloadTo(..., OutputStream) if a retry was attempted the proper byte offset was not being sent in the retried request. Update the low level method we use to send the request so that it does send the byte offset.

Update ITRetryConformanceTest to run Scenario 8 test cases, which cover resuming a download which could have caught this error sooner.

Update StorageException.translate(IOException) to classify `IOException: Premature EOF` as the existing retryable reason `connectionClosedPrematurely`. Add case to DefaultRetryHandlingBehaviorTest to ensure conformance to this categorization.

Related to #1425
BenWhitehead added a commit that referenced this issue Jun 8, 2022
fix: update request method of HttpStorageRpc to properly configure offset on requests (#1434)

* fix: update request method of HttpStorageRpc to properly configure offset on requests

When invoking downloadTo(..., OutputStream) if a retry was attempted the proper byte
offset was not being sent in the retried request. Update logic of HttpStorageRpc.read
to manually set the range header rather than trying to rely on MediaDownloader to do
it along with not automatically decompressing the byte stream.

Update ITRetryConformanceTest to run Scenario 8 test cases, which cover resuming a
download which could have caught this error sooner.

Update StorageException.translate(IOException) to classify `IOException: Premature EOF`
as the existing retryable reason `connectionClosedPrematurely`. Add case to
DefaultRetryHandlingBehaviorTest to ensure conformance to this categorization.

Break downloadTo integration test out into their own class, and separate
the multiple scenarios being tested in the same method.

Related to #1425
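
In other words, the fix makes the retried request resume at the byte offset that has already been written, rather than relying on MediaDownloader to track it. The following is only a rough conceptual sketch of that idea using plain java.net with made-up names, not the library's internal code:

import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;

// Conceptual sketch only (not java-storage internals): resume a download from
// the number of bytes already written locally by sending a Range header.
final class ResumeFromOffsetSketch {
  static void resume(URL objectUrl, RandomAccessFile dest) throws IOException {
    long offset = dest.length(); // bytes already written before the failure
    HttpURLConnection conn = (HttpURLConnection) objectUrl.openConnection();
    // Ask the server for the remainder of the object, starting at `offset`.
    conn.setRequestProperty("Range", "bytes=" + offset + "-");
    try (InputStream in = conn.getInputStream()) {
      dest.seek(offset); // continue writing where the previous attempt stopped
      byte[] buf = new byte[64 * 1024];
      for (int read = in.read(buf); read != -1; read = in.read(buf)) {
        dest.write(buf, 0, read);
      }
    } finally {
      conn.disconnect();
    }
  }
}

As noted in the PR description, the library also classifies `IOException: Premature EOF` under the existing retryable reason `connectionClosedPrematurely`, so a prematurely closed connection can trigger such a resumed request.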
@denis-yusevich

Hello @BenWhitehead! We are using google-cloud-storage:2.9.2 now, which contains your fixes, but we are still facing the same problem:
after downloading about 50% of the file, the socket hangs up and the same error arises. Please take a look at it.

Thanks in advance!

@BenWhitehead
Collaborator

I spent some more time trying to diagnose why it may still be happening for you, but unfortunately haven't been able to narrow down exactly what's happening.

My best guess, though still unverified (the code is pretty difficult to trace precisely for order of operations), is that the retry budget is somehow exhausted to the point that, when the retry logic checks whether it can retry, there is no budget left. If I'm able to prove it one way or the other, I'll report back.

@denis-yusevich

Hi @BenWhitehead! Do you have any updates on this issue? Please let me know if there is any progress.

@BenWhitehead
Collaborator

Apologies for the long delay, lots of things got in the way.

From what I've been able to suss out, this behavior seems to be related to the totalTimeout value from the RetrySettings. When a call fails, in addition to checking whether the error is retriable, the elapsed time is checked against the totalTimeout. If the elapsed time of the previous invocation(s) exceeds the totalTimeout, the Retryer will not proceed.

The reason we don't see the error when using the ReadChannel is that currently the ReadChannel issues a new RPC for each chunk's worth of data, whereas downloadTo will open a single request and read from the socket as long as it can.

To allow retries to happen on such large files you'll need to increase the totalTimeout for your instance of Storage similar to how it's demonstrated in the java code sample on https://cloud.google.com/storage/docs/retry-strategy (How Cloud Storage tools implement retry strategies > Client libraries > Java).
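
For example, roughly along the lines of that sample, and reusing the projectId and googleCredentials from the code at the top of this issue (the six-hour value below is only an illustration; size it to your object and bandwidth):

import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import org.threeten.bp.Duration;

// Raise totalTimeout so the retry budget covers the whole (multi-hour) download.
RetrySettings retrySettings = StorageOptions.getDefaultRetrySettings().toBuilder()
    .setTotalTimeout(Duration.ofHours(6)) // illustration only; pick a value larger than your worst-case download time
    .build();

Storage storage = StorageOptions.newBuilder()
    .setProjectId(projectId)
    .setCredentials(googleCredentials)
    .setRetrySettings(retrySettings)
    .build()
    .getService();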

In case you're curious, below is the minimized repro I was able to generate after digging about:

Minimized Reproduction of behavior

Code

package zx;

import com.google.api.core.NanoClock;
import com.google.api.gax.retrying.BasicResultRetryAlgorithm;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.RetryHelper;
import com.google.common.base.Stopwatch;
import java.util.concurrent.atomic.AtomicInteger;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.threeten.bp.Duration;

final class Retry {
  private static final Logger LOGGER = LoggerFactory.getLogger(Retry.class);
    
  public static void main(String[] args) {
    LOGGER.debug(">>> info(args : {})", args);
    RetrySettings retrySettings = RetrySettings.newBuilder()
      .setMaxAttempts(5)
      .setInitialRpcTimeout(Duration.ofSeconds(2))
      .setMaxRpcTimeout(Duration.ofSeconds(4))
      .setTotalTimeout(Duration.ofSeconds(12))
      .build();
    
    AtomicInteger x = new AtomicInteger(1);
    Stopwatch s = Stopwatch.createStarted();
    try {
      String result = RetryHelper.runWithRetries(
          () -> {
            LOGGER.info("callable");
            if (true) {
              Thread.sleep(5_000);
              throw new RuntimeException("kaboom " + x.getAndIncrement());
            }
            return "something";
          },
          retrySettings,
          new BasicResultRetryAlgorithm<>() {
            @Override
            public boolean shouldRetry(Throwable previousThrowable, Object previousResponse) {
              LOGGER.info(">>> shouldRetry(previousThrowable : {}, previousResponse : {})", previousThrowable, previousResponse);
              return true;
            }
          },
          NanoClock.getDefaultClock()
      );
      LOGGER.info("result = {}", result);
    } finally {
      Stopwatch stop = s.stop();
      LOGGER.info("stop = {}", stop);
    }
  }
}

Output

2022-09-23 15:13:50,424 INFO  [main] zx.Retry - callable
2022-09-23 15:13:55,433 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 1, previousResponse : null)
2022-09-23 15:13:55,434 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 1, previousResponse : null)
2022-09-23 15:13:55,435 INFO  [main] zx.Retry - callable
2022-09-23 15:14:00,435 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 2, previousResponse : null)
2022-09-23 15:14:00,435 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 2, previousResponse : null)
2022-09-23 15:14:00,435 INFO  [main] zx.Retry - callable
2022-09-23 15:14:05,436 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 3, previousResponse : null)
2022-09-23 15:14:05,436 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 3, previousResponse : null)
2022-09-23 15:14:05,436 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 3, previousResponse : null)
2022-09-23 15:14:05,436 INFO  [main] zx.Retry - stop = 15.02 s
Exception in thread "main" com.google.cloud.RetryHelper$RetryHelperException: java.lang.RuntimeException: kaboom 3
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:54)
    at zx.Retry.main(Retry.java:29)
Caused by: java.lang.RuntimeException: kaboom 3
    at zx.Retry.lambda$main$0(Retry.java:34)
    at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
    at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
    ... 1 more

@BenWhitehead
Collaborator

Hi again. After the recent fix in #1799, the alternative approach in #1425 (comment) now provides bandwidth competitive with downloadTo[1] while providing transparent automatic retries.

For the next major version we will very likely switch the internals of downloadTo to use the new reader to gain these benefits transparently.
