
Can not download ~200GB file using storage blob #1425

Closed

rdubitsky-syberry opened this issue May 31, 2022 · 10 comments

Labels
api: storage - Issues related to the googleapis/java-storage API.
priority: p2 - Moderately-important priority. Fix may not be included in next release.
status: investigating - The issue is under investigation, which is determined to be non-trivial.
type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments


rdubitsky-syberry commented May 31, 2022

Execution fails once roughly 128 GB has been downloaded, and the failure does not depend on how much time was spent downloading those 128 GB.

Environment details

OS type and version: Linux
Java version: 11

Code example

JsonObject jsonObject = new JsonObject();
jsonObject.addProperty(GoogleCredentialsConstants.CLIENT_ID, CLIENT_ID);
jsonObject.addProperty(GoogleCredentialsConstants.CLIENT_EMAIL, CLIENT_EMAIL);
jsonObject.addProperty(GoogleCredentialsConstants.ACCOUNT_TYPE, ACCOUNT_TYPE);
jsonObject.addProperty(GoogleCredentialsConstants.PRIVATE_KEY_ID, PRIVATE_KEY_ID);
jsonObject.addProperty(GoogleCredentialsConstants.PRIVATE_KEY, PRIVATE_KEY);
String credentials = jsonObject.toString();

GoogleCredentials googleCredentials;
try (InputStream inputStream = new ByteArrayInputStream(credentials.getBytes(StandardCharsets.UTF_8))) {
    googleCredentials = GoogleCredentials.fromStream(inputStream);
} catch (IOException ex) {
    throw new CustomException(ex);
}

Storage storage = StorageOptions.newBuilder()
      .setProjectId(projectId)
      .setCredentials(googleCredentials)
      .build()
      .getService();

Blob blob = storage.get(BlobId.of(bucketName, remoteFilePath));
blob.downloadTo(localFilePath); // line where code fails

Stack trace

Caused by: java.net.SocketException: Connection reset by peer (Write failed)
	at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.base/java.net.SocketOutputStream.socketWrite(Unknown Source)
	at java.base/java.net.SocketOutputStream.write(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketOutputRecord.flush(Unknown Source)
	at java.base/sun.security.ssl.OutputRecord.changeWriteCiphers(Unknown Source)
	at java.base/sun.security.ssl.KeyUpdate$KeyUpdateProducer.produce(Unknown Source)
	at java.base/sun.security.ssl.KeyUpdate$KeyUpdateKickstartProducer.produce(Unknown Source)
	at java.base/sun.security.ssl.SSLHandshake.kickstart(Unknown Source)
	at java.base/sun.security.ssl.PostHandshakeContext.kickstart(Unknown Source)
	at java.base/sun.security.ssl.TransportContext.kickstart(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl.startHandshake(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl.tryKeyUpdate(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl.decode(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(Unknown Source)
	at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(Unknown Source)
	at java.base/java.io.BufferedInputStream.read1(Unknown Source)
	at java.base/java.io.BufferedInputStream.read(Unknown Source)
	at java.base/sun.net.www.MeteredStream.read(Unknown Source)
	at java.base/java.io.FilterInputStream.read(Unknown Source)
	at java.base/sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
	at com.google.api.client.http.javanet.NetHttpResponse$SizeValidatingInputStream.read(NetHttpResponse.java:164)
	at java.base/java.io.BufferedInputStream.read1(Unknown Source)
	at java.base/java.io.BufferedInputStream.read(Unknown Source)
	at java.base/java.io.FilterInputStream.read(Unknown Source)
	at com.google.common.io.ByteStreams.copy(ByteStreams.java:112)
	at com.google.api.client.googleapis.media.MediaHttpDownloader.executeCurrentRequest(MediaHttpDownloader.java:243)
	at com.google.api.client.googleapis.media.MediaHttpDownloader.download(MediaHttpDownloader.java:183)
	at com.google.api.client.googleapis.media.MediaHttpDownloader.download(MediaHttpDownloader.java:142)
product-auto-label bot added the api: storage label May 31, 2022
@BenWhitehead
Collaborator

Hi @rdubitsky-syberry,

What version of com.google.cloud:google-cloud-storage are you using (mvn dependency:list -Dincludes=com.google.cloud:google-cloud-storage)?

Additionally, what type/size of instance are you trying to download the blob to?

BenWhitehead added the type: question and needs more info labels May 31, 2022
@rdubitsky-syberry
Author

Hi @BenWhitehead
com.google.cloud:google-cloud-storage:2.4.0 is used.
The file being downloaded is a CSV, 220 GB in size.

@BenWhitehead
Collaborator

Hi @rdubitsky-syberry,

Thanks for confirming. I'll see what I can determine for a fix.

In the meantime, if you're blocked on this, an alternative approach that could be used is as follows:

    try (
        ReadChannel r = storage.reader(BlobId.of(bucketName, remoteFilePath));
        // localFilePath is a java.nio.file.Path, as in the downloadTo call above
        WritableByteChannel w = new FileOutputStream(localFilePath.toFile()).getChannel()
    ) {
      r.setChunkSize(16 * 1024 * 1024);  // attempt to read up to 16 MiB for each sub-request
      com.google.common.io.ByteStreams.copy(r, w);
    }

This method will be slower than downloadTo but has the advantage that it can incrementally retry the download.


rdubitsky-syberry commented Jun 1, 2022

Hi, @BenWhitehead

Thanks for the alternative approach. I have already used it, but the reason I use downloadTo is that it is quicker than the channel copy. I hope you'll come up with a fix.

Thanks in advance.

BenWhitehead added the priority: p2, status: investigating, and type: bug labels and removed the type: question and needs more info labels Jun 3, 2022
@BenWhitehead
Collaborator

Wanted to let you know, I have been able to get the error to happen with some reliability (~60%). I'm still trying to figure out why it's erroring and not retrying from where it left off. The implementation of downloadTo should be attempting to retry the request.

BenWhitehead added a commit that referenced this issue Jun 6, 2022
fix: update request method of HttpStorageRpc to properly configure offset on requests

When invoking downloadTo(..., OutputStream) if a retry was attempted the proper byte offset was not being sent in the retried request. Update the low level method we use to send the request so that it does send the byte offset.

Update ITRetryConformanceTest to run Scenario 8 test cases, which cover resuming a download which could have caught this error sooner.

Update StorageException.translate(IOException) to classify `IOException: Premature EOF` as the existing retryable reason `connectionClosedPrematurely`. Add case to DefaultRetryHandlingBehaviorTest to ensure conformance to this categorization.

Related to #1425
BenWhitehead added a commit that referenced this issue Jun 8, 2022
fix: update request method of HttpStorageRpc to properly configure offset on requests (#1434)

* fix: update request method of HttpStorageRpc to properly configure offset on requests

When invoking downloadTo(..., OutputStream) if a retry was attempted the proper byte
offset was not being sent in the retried request. Update logic of HttpStorageRpc.read
to manually set the range header rather than trying to rely on MediaDownloader to do
it along with not automatically decompressing the byte stream.

Update ITRetryConformanceTest to run Scenario 8 test cases, which cover resuming a
download which could have caught this error sooner.

Update StorageException.translate(IOException) to classify `IOException: Premature EOF`
as the existing retryable reason `connectionClosedPrematurely`. Add case to
DefaultRetryHandlingBehaviorTest to ensure conformance to this categorization.

Break downloadTo integration test out into their own class, and separate
the multiple scenarios being tested in the same method.

Related to #1425
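
In other words, the fix makes the retried request resume at the byte offset that has already been written, rather than relying on MediaDownloader to track it. The following is only a rough conceptual sketch of that idea using plain java.net with made-up names, not the library's internal code:

import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.URL;

// Conceptual sketch only (not java-storage internals): resume a download from
// the number of bytes already written locally by sending a Range header.
final class ResumeFromOffsetSketch {
  static void resume(URL objectUrl, RandomAccessFile dest) throws IOException {
    long offset = dest.length(); // bytes already written before the failure
    HttpURLConnection conn = (HttpURLConnection) objectUrl.openConnection();
    // Ask the server for the remainder of the object, starting at `offset`.
    conn.setRequestProperty("Range", "bytes=" + offset + "-");
    try (InputStream in = conn.getInputStream()) {
      dest.seek(offset); // continue writing where the previous attempt stopped
      byte[] buf = new byte[64 * 1024];
      for (int read = in.read(buf); read != -1; read = in.read(buf)) {
        dest.write(buf, 0, read);
      }
    } finally {
      conn.disconnect();
    }
  }
}

As noted in the PR description, the library also classifies `IOException: Premature EOF` under the existing retryable reason `connectionClosedPrematurely`, so a prematurely closed connection can trigger such a resumed request.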
@denis-yusevich

Hello @BenWhitehead! We are using google-cloud-storage:2.9.2 now, which contains your fixes, but we are still facing the same problem:
after downloading about 50% of the file, the socket hangs up and the same error arises. Please take a look at it.

Thanks in advance!

@BenWhitehead
Collaborator

I spent some more time trying to diagnose why it may still be happening for you, but unfortunately haven't been able to narrow down exactly what's happening.

My best guess, though still unverified (the code is pretty difficult to trace precisely for order of operations), is that the retry budget is somehow exhausted to the point that, when the retry logic checks whether it can retry, there is no budget left. If I'm able to prove it one way or the other, I'll report back.

@denis-yusevich

Hi @BenWhitehead! Do you have any updates on this issue? Please let me know if there is any progress.

@BenWhitehead
Collaborator

Apologies for the long delay, lots of things got in the way.

From what I've been able to suss out, this behavior seems to be related to the totalTimeout value from the RetrySettings. When a call fails, in addition to checking whether the error is retriable, the elapsed time is checked against the totalTimeout. If the elapsed time of the previous invocation(s) exceeds the totalTimeout, the Retryer will not proceed.

The reason we don't see the error when using the ReadChannel is that currently the ReadChannel issues a new RPC for each chunk's worth of data, whereas downloadTo will open a single request and read from the socket as long as it can.

To allow retries to happen on such large files you'll need to increase the totalTimeout for your instance of Storage similar to how it's demonstrated in the java code sample on https://cloud.google.com/storage/docs/retry-strategy (How Cloud Storage tools implement retry strategies > Client libraries > Java).
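
For example, roughly along the lines of that sample, and reusing the projectId and googleCredentials from the code at the top of this issue (the six-hour value below is only an illustration; size it to your object and bandwidth):

import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import org.threeten.bp.Duration;

// Raise totalTimeout so the retry budget covers the whole (multi-hour) download.
RetrySettings retrySettings = StorageOptions.getDefaultRetrySettings().toBuilder()
    .setTotalTimeout(Duration.ofHours(6)) // illustration only; pick a value larger than your worst-case download time
    .build();

Storage storage = StorageOptions.newBuilder()
    .setProjectId(projectId)
    .setCredentials(googleCredentials)
    .setRetrySettings(retrySettings)
    .build()
    .getService();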

In case you're curious, below is the minimized repro I was able to generate after digging about:

Minimized Reproduction of behavior

Code

package zx;

import com.google.api.core.NanoClock;
import com.google.api.gax.retrying.BasicResultRetryAlgorithm;
import com.google.api.gax.retrying.RetrySettings;
import com.google.cloud.RetryHelper;
import com.google.common.base.Stopwatch;
import java.util.concurrent.atomic.AtomicInteger;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.threeten.bp.Duration;

final class Retry {
  private static final Logger LOGGER = LoggerFactory.getLogger(Retry.class);
    
  public static void main(String[] args) {
    LOGGER.debug(">>> info(args : {})", args);
    RetrySettings retrySettings = RetrySettings.newBuilder()
      .setMaxAttempts(5)
      .setInitialRpcTimeout(Duration.ofSeconds(2))
      .setMaxRpcTimeout(Duration.ofSeconds(4))
      .setTotalTimeout(Duration.ofSeconds(12))
      .build();
    
    AtomicInteger x = new AtomicInteger(1);
    Stopwatch s = Stopwatch.createStarted();
    try {
      String result = RetryHelper.runWithRetries(
          () -> {
            LOGGER.info("callable");
            if (true) {
              Thread.sleep(5_000);
              throw new RuntimeException("kaboom " + x.getAndIncrement());
            }
            return "something";
          },
          retrySettings,
          new BasicResultRetryAlgorithm<>() {
            @Override
            public boolean shouldRetry(Throwable previousThrowable, Object previousResponse) {
              LOGGER.info(">>> shouldRetry(previousThrowable : {}, previousResponse : {})", previousThrowable, previousResponse);
              return true;
            }
          },
          NanoClock.getDefaultClock()
      );
      LOGGER.info("result = {}", result);
    } finally {
      Stopwatch stop = s.stop();
      LOGGER.info("stop = {}", stop);
    }
  }
}

Output

2022-09-23 15:13:50,424 INFO  [main] zx.Retry - callable
2022-09-23 15:13:55,433 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 1, previousResponse : null)
2022-09-23 15:13:55,434 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 1, previousResponse : null)
2022-09-23 15:13:55,435 INFO  [main] zx.Retry - callable
2022-09-23 15:14:00,435 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 2, previousResponse : null)
2022-09-23 15:14:00,435 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 2, previousResponse : null)
2022-09-23 15:14:00,435 INFO  [main] zx.Retry - callable
2022-09-23 15:14:05,436 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 3, previousResponse : null)
2022-09-23 15:14:05,436 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 3, previousResponse : null)
2022-09-23 15:14:05,436 INFO  [main] zx.Retry - >>> shouldRetry(previousThrowable : java.lang.RuntimeException: kaboom 3, previousResponse : null)
2022-09-23 15:14:05,436 INFO  [main] zx.Retry - stop = 15.02 s
Exception in thread "main" com.google.cloud.RetryHelper$RetryHelperException: java.lang.RuntimeException: kaboom 3
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:54)
    at zx.Retry.main(Retry.java:29)
Caused by: java.lang.RuntimeException: kaboom 3
    at zx.Retry.lambda$main$0(Retry.java:34)
    at com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:103)
    at com.google.cloud.RetryHelper.run(RetryHelper.java:76)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
    ... 1 more

@BenWhitehead
Collaborator

Hi again. After the recent fix in #1799, the alternative approach in #1425 (comment) now provides bandwidth competitive with downloadTo[1] while providing transparent automatic retries.

For the next major version we will very likely switch the internals of downloadTo to use the new reader to gain these benefits transparently.
