Files occasionally being truncated when downloading #2055

Closed
Jake7D opened this issue Aug 31, 2022 · 3 comments · Fixed by #2056
Labels
  • api: storage (Issues related to the googleapis/nodejs-storage API.)
  • priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

@Jake7D

Jake7D commented Aug 31, 2022

Issue

Occasionally, files are truncated when they are downloaded, and the storage library doesn't throw an error.
We concurrently pull down a set of files from a location and then process them, skipping any 0-byte files. We create multiple Pub/Sub subscription objects for the same subscription because the Node.js Pub/Sub library doesn't support concurrent message handling the way we need it to.
This happens in around 5% of our downloads.

Environment details

  • OS: Windows
  • Node.js version: 16.16.0
  • npm version: 8.11.0
  • @google-cloud/storage version: 6.2.0

Steps to reproduce

This is difficult to explain, so here is a snippet of what we're doing. We use the storage library alongside the Pub/Sub library, and we've included the Pub/Sub code as well because we suspect it might be causing some of the issues.

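const { PubSub } = require('@google-cloud/pubsub');
const { Storage } = require('@google-cloud/storage');

// projectId, validReleaseSub and destinationPath are defined elsewhere in our app.
const pubsub = new PubSub({ projectId });
const storage = new Storage({ projectId });

// Five subscriber objects on the same subscription, each handling up to 5 messages at a time.
const subscriptions = [...Array(5).keys()].map(() => pubsub.subscription(validReleaseSub, {
  ackDeadline: 600,
  flowControl: {
    allowExcessMessages: false,
    maxMessages: 5
  }
}));

for (const subscription of subscriptions) {
  subscription.on('message', async (message) => {
    // Assumption: the payload is JSON such as {"bucket": "...", "location": "..."}.
    const { bucket: bucketName, location } = JSON.parse(message.data.toString());
    const bucket = storage.bucket(bucketName);
    const [files] = await bucket.getFiles({ prefix: `${location}/` });

    // Download every non-empty file concurrently (metadata.size is a string, e.g. "0").
    await Promise.all(
      files
        .filter(file => Number(file.metadata.size) > 0)
        .map(async file => {
          const destination = file.name.replace(location, destinationPath);
          await file.download({ destination });
        })
    );

    // process the files, then acknowledge the message
    message.ack();
  });
}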
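Because the library returns without an error when this happens, one way to at least detect a truncated download is to compare the file written to disk with the object's size from the listing. This is a minimal sketch, assuming `file.metadata.size` is populated by `getFiles()` (the GCS JSON API reports it as a string); `sizeMismatch` and `downloadAndCheck` are hypothetical helper names, not part of `@google-cloud/storage`.

```javascript
const fs = require('fs');

// Hypothetical helper: true if the local file's byte count differs from the
// expected object size (which may be a string such as "1024").
async function sizeMismatch(path, expectedSize) {
  const { size } = await fs.promises.stat(path);
  return size !== Number(expectedSize);
}

// Hypothetical wrapper around File#download that turns a short (or oversized)
// file on disk into a rejected promise instead of a silent success.
async function downloadAndCheck(file, destination) {
  await file.download({ destination });
  if (await sizeMismatch(destination, file.metadata.size)) {
    throw new Error(`Size mismatch for ${file.name} downloaded to ${destination}`);
  }
}
```

In the snippet above, `await file.download({ destination })` would become `await downloadAndCheck(file, destination)`, so a truncated file rejects the surrounding `Promise.all` rather than passing silently. A size check only catches missing or extra bytes, not corrupted ones.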
Jake7D added the priority: p2 and type: bug labels Aug 31, 2022
product-auto-label bot added the api: storage label Aug 31, 2022
danielbankhead self-assigned this Aug 31, 2022
danielbankhead added the priority: p1 label and removed the priority: p2 label Aug 31, 2022
@Jake7D
Author

Jake7D commented Sep 1, 2022

We've also noticed that this seems to start happening when CPU usage ramps up on the machine. Files are delivered to the bucket from an external source, so they arrive sporadically and in varying numbers.
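Since the truncation appears to correlate with CPU load, and the snippet in the issue starts every download for a folder at once across five subscriptions, one possible mitigation while investigating is to cap the number of downloads in flight. The following is a minimal sketch of a generic concurrency limiter; `mapWithConcurrency` is a hypothetical helper, not part of `@google-cloud/storage` or `@google-cloud/pubsub`, and it is not the upstream fix.

```javascript
// Hypothetical helper (not part of any Google Cloud library): applies `worker`
// to each item with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let nextIndex = 0;

  // Each runner claims the next unprocessed index. JavaScript runs this code
  // on a single thread, so `nextIndex++` cannot race between runners.
  async function runner() {
    while (nextIndex < items.length) {
      const index = nextIndex++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

In the message handler, the `Promise.all(files.filter(...).map(...))` call could become `mapWithConcurrency(nonEmptyFiles, 4, file => file.download({ destination: ... }))`, where `nonEmptyFiles` and the limit of 4 are illustrative. Each handler gets its own limiter, so with 5 subscriptions handling up to 5 messages each, the overall cap is still 25 times the per-handler limit.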

@danielbankhead
Contributor

Hey @Jake7D, thanks for your patience. A fix has been merged (#2056), and the next release will be published shortly.

To better mitigate issues like this, and to catch them in testing in the future, we're planning to revamp our stream implementation in the next major release.

@Jake7D
Copy link
Author

Jake7D commented Sep 5, 2022

Perfect, thanks @danielbankhead. We'll give it a try and report back if we run into any other problems.
