Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v6.4.1 seems to introduce a bug with gzipped files #2044

Closed
neilstuartcraig opened this issue Aug 17, 2022 · 15 comments · Fixed by #2056
Closed

v6.4.1 seems to introduce a bug with gzipped files #2044

neilstuartcraig opened this issue Aug 17, 2022 · 15 comments · Fixed by #2056
Assignees
Labels
api: storage Issues related to the googleapis/nodejs-storage API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@neilstuartcraig
Copy link

neilstuartcraig commented Aug 17, 2022

Environment details

  • OS: Cloud Run
  • Node.js version: 18.3.0 (ALPINE: 18.3-alpine3.15 Docker image)
  • npm version: 8.12.1
  • @google-cloud/storage version: 6.4.1

Steps to reproduce

I have a log file processing pipeline on Cloud Run/Node which is pretty well monitored and managed via CI/CD (GitHub -> Cloud Build -> Cloud Run). Dependabot raised a PR for 6.4.1 of this lib which passed all my tests and ran fine overnight on pre-live so I merged it which deploys to live. Immediately, the number of log files being rejected by the pipeline jumped and I saw that the Node app was logging errors:
gzip error: unexpected end of file (code: Z_BUF_ERROR)
The processing time per file also jumped massively (by an order of magnitude or 2).
Once I reverted this version of the lib, the errors disappeared and processing time returned to normal.

The Node app Does this:

  1. Receives a "new GCS file" push from Eventarc (HTTP lib is Fastify)
  2. Streams the new file through the processing pipeline, using gunzip-maybe
  3. Streams resultant data into BigQuery

So I think the usage of this lib is relatively simple - i am only using a few methods (via async/await):

  • setMetadata()
  • exists()
  • bucket()
  • file()
  • delete()
  • createReadStream() - probably the most likely cuplrit here, i guess

Please let me know if you have more Qs. Sorry this isn't a simple set of steps, I thought ^ was more practical in this case.

Making sure to follow these steps will guarantee the quickest resolution possible.

Thanks!

@neilstuartcraig neilstuartcraig added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Aug 17, 2022
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Aug 17, 2022
@danielbankhead danielbankhead self-assigned this Aug 17, 2022
@danielbankhead danielbankhead added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Aug 17, 2022
@shaffeeullah
Copy link
Contributor

@neilstuartcraig what version of the library did you upgrade from when you saw this issue?

@danielbankhead
Copy link
Contributor

Hey @neilstuartcraig - could you provide a sample that replicates the issue and any additional associated logs?

@neilstuartcraig
Copy link
Author

@neilstuartcraig what version of the library did you upgrade from when you saw this issue?

Oh sorry, I forgot that. I upgraded from 6.4.0. Hopefully that helps narrow down what to look at as the changes were minimal between the two (as shown in https://github.com/googleapis/nodejs-storage/releases/tag/v6.4.1)

@neilstuartcraig
Copy link
Author

neilstuartcraig commented Aug 18, 2022

Hey @neilstuartcraig - could you provide a sample that replicates the issue and any additional associated logs?

I'm really sorry, I can't provide the logs files as they contain personal information so i'd be breaking GDPR - and if i amend the files, they obviously won't be the same. What I can say though is that they're gzipped text (nginx access logs in default "combined" format).

The code which showed the errors was:

import {default as gunzip} from "gunzip-maybe";
import {Storage} from "@google-cloud/storage";
const googleCloudStorage = new Storage();
// ...
const sourceBucket = googleCloudStorage.bucket(executionDetails.log_bucket);
const sourceFile = sourceBucket.file(executionDetails.log_filename);
// ...
sourceFile.createReadStream()
                        .on("error", async (e) => {
                            await handleStreamError(e, "GCS file readstream creation", executionDetails, log, startTSwithMS, auditLogBigQueryTable);
                            return reject(new Error(executionDetails.messages));
                        })
                        .pipe(gunzip()) // Note that this is maybe-gunzip which does a noop for non-gzipped content (which allows us to use uncompresed files for e.g. e2e tests)
                        .on("error", async (e) => {

// The below is the error I was seeing: "@handleStreamError/readstream gzip error: unexpected end of file (code: Z_BUF_ERROR)."
// Note that gunzip is actually gunzip-maybe to allow non-gzipped files to be processed as well a gzipped
                            await handleStreamError(e, "readstream gzip", executionDetails, log, startTSwithMS, auditLogBigQueryTable);
                            return reject(new Error(executionDetails.messages));
                        })
                        .pipe(es.split())
                        .on("error", async (e) => {
                            await handleStreamError(e, "readstream split", executionDetails, log, startTSwithMS, auditLogBigQueryTable);
                            return reject(new Error(executionDetails.messages));
                        })
                        .pipe(es.map(async (rawLine, mapCallback) => {
// ...

I can't easily give you all of the code as it's across a few files and won't work in isolation but I am hoping that the above will be at least a pointer. If not, please let me know and i'll see what I can do to get you something better.

Cheers

@danielbankhead
Copy link
Contributor

Hey @neilstuartcraig, after extensive debugging I'm unable to reproduce this error. Here's a working sample with a variety of test scenarios:

import gunzipMaybe = require('gunzip-maybe');
import {Storage} from '@google-cloud/storage';
import {randomBytes} from 'crypto';
import {gzipSync} from 'zlib';

const storage = new Storage();
const bucket = storage.bucket(/* insert bucket name for testing */);
const testFile = bucket.file('test-gzip-object');

async function setupCase1() {
  // random data
  await testFile.save(randomBytes(1024));
  console.log('uploaded - setupCase1');
}

async function setupCase2() {
  // random data, upload pre-gzipped
  await testFile.save(gzipSync(randomBytes(1024)));
  console.log('uploaded - setupCase2');
}

async function setupCase3() {
  // random data, gzipped on upload
  await testFile.save(randomBytes(1024), {gzip: true});
  console.log('uploaded - setupCase3');
}

async function returnDecompressed() {
  console.log('reading from stream...');
  await new Promise((resolve, reject) => {
    testFile
      .createReadStream()
      .on('error', reject)
      .pipe(gunzipMaybe())
      .on('data', d => console.log('got data', d.byteLength))
      .on('end', resolve)
      .on('error', reject);
  });
  console.log('finished reading from stream.');
}

async function main() {
  await setupCase1();
  await returnDecompressed();

  await setupCase2();
  await returnDecompressed();

  await setupCase3();
  await returnDecompressed();

  console.log('done');
}

main();

Do you have any other information that would help us debug this issue for you?

@danielbankhead
Copy link
Contributor

danielbankhead commented Aug 24, 2022

Hey @neilstuartcraig, do you have any other information that would help us debug this issue for you?

Since my last message, I expanded the test cases and verified a number of additional scenarios:

import gunzipMaybe = require('gunzip-maybe');
import {Storage} from '@google-cloud/storage';
import {randomBytes} from 'crypto';
import {gzipSync} from 'zlib';

const storage = new Storage();
const bucket = storage.bucket(/* insert bucket name for testing */);
const testFile = bucket.file('test-gzip-object');

async function setupCase1() {
  // random data
  await testFile.save(randomBytes(1024));
  console.log('uploaded - setupCase1');
}

async function setupCase2() {
  // random data, upload pre-gzipped
  await testFile.save(gzipSync(randomBytes(1024)));
  console.log('uploaded - setupCase2');
}

async function setupCase3() {
  // random data, gzipped on upload
  await testFile.save(randomBytes(1024), {gzip: true});
  console.log('uploaded - setupCase3');
}

async function setupCase4() {
  // random data, upload pre-gzipped w/ Content-Encoding
  await testFile.save(gzipSync(randomBytes(1024)), {
    metadata: {
      contentEncoding: 'gzip',
    },
  });
  console.log('uploaded - setupCase4');
}

async function setupCase5() {
  // random data, upload pre-gzipped w/ Content-Encoding + Content-Type
  await testFile.save(gzipSync(randomBytes(1024)), {
    metadata: {
      contentEncoding: 'gzip',
    },
    contentType: 'application/gzip',
  });
  console.log('uploaded - setupCase5');
}

async function setupCase6() {
  // multiple buffers of random data, concat & uploaded pre-gzipped
  await testFile.save(
    Buffer.concat([gzipSync(randomBytes(1024)), gzipSync(randomBytes(1024))]),
    {
      metadata: {
        contentEncoding: 'gzip',
      },
    }
  );
  console.log('uploaded - setupCase6');
}

async function setupCase7(list = ['']) {
  // list of strings to concat, concat & uploaded pre-gzipped
  const data = Buffer.concat(list.map(str => gzipSync(str)));

  await testFile.save(data, {
    metadata: {
      contentEncoding: 'gzip',
    },
  });
  console.log('uploaded - setupCase7');
}

async function returnDecompressed() {
  console.log('reading from stream...');
  let buffer = Buffer.alloc(0);

  await new Promise((resolve, reject) => {
    testFile
      .createReadStream()
      .on('error', reject)
      .pipe(gunzipMaybe())
      .on('data', d => {
        console.log('got data', d.byteLength);
        buffer = Buffer.concat([buffer, d]);
      })
      .on('end', resolve)
      .on('error', reject);
  });
  console.log('finished reading from stream.');

  return buffer;
}

async function main() {
  await setupCase1();
  await returnDecompressed();

  await setupCase2();
  await returnDecompressed();

  await setupCase3();
  await returnDecompressed();

  await setupCase4();
  await returnDecompressed();

  await setupCase5();
  await returnDecompressed();

  await setupCase6();
  await returnDecompressed();

  const list = ['a', 'b'];
  await setupCase7(list);
  const results = await returnDecompressed();
  console.log(list, results.toString());

  console.log('done');
}

main();

@danielbankhead
Copy link
Contributor

danielbankhead commented Aug 24, 2022

A diff link for reference: v6.4.0...v6.4.1

@neilstuartcraig
Copy link
Author

Thanks so much for digging in to this @danielbankhead - I really appreciate it.
I am not really sure what I can provide to you which might help, I could give you source code but it won't work in isolation, you'd have to set up the whole environment in Terraform which is no small undertaking!
Would it help if I provided the source code and some data format examples? For the data, I am mainly thinking the format that comes in from the Eventarc/PubSub HTTP push as a result of object.finalize in GCS - I guess you're probably pretty familiar with that.
I could try deploying 6.4.1 again I guess, I'm just hesitant as this is a production system which processes our www and media CDN access logs so it's our eyes on the world to a fair extent.
For ref, here's what happened last time:

Overall log processing status - the purple dashed line indicates the time we upgraded to 6.4.1, the right-hand dashed yellow line indicates the rollback. The upgrade to 6.4.1 was the only change (see GitHub screengrab below it):
image

image

Let me know if you think the source code and/or re-trying the upgrade would help.
Cheers

@danielbankhead
Copy link
Contributor

Thanks! And no problem - no need to share source code or sensitive information 🙂

Other places to check:

  • Did any other dependencies in your dependency change between the two versions for you? This can be assessed via npm ls --all or by comparing package-lock.json before and after.
    • I wonder if removing pumpify between releases allowed another package to pick a different version of pumpify
  • The processing time per file also jumped massively (by an order of magnitude or 2).

@danielbankhead danielbankhead added priority: p2 Moderately-important priority. Fix may not be included in next release. and removed priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Aug 26, 2022
@neilstuartcraig
Copy link
Author

I'm on leave at the moment but just had a look at the dependabot PR and the package-lock.json diff looks like this:
image
So yeah, I think from a quick glance, that looks like pumpify was affected. I am not familiar enough with the details to know for sure but maybe you are? I can get you the before/after raw files if that helps?

I don't have any custom timings for the app because they're expensive to store - the app processes several billion log lines per day on a modest budget so i have to limit what i log, unfortunately.

Cheers!

@danielbankhead
Copy link
Contributor

A few other places to check:

  • How are the objects uploaded? Is it possible somehow some objects were legitimately gzipped incorrectly (and we're looking at a symptom of another bug)?
  • Are there any other variables different between your pre-live and live environments (that you can share)?
  • Is it possible any other software updates were included in the deploy? Say, any subtle dependency changes that were included in the deployed container?

Additionally, does your company have a support contract with Google? If so, we have an opportunity to dive further with additional tools & resources.

@neilstuartcraig
Copy link
Author

How are the objects uploaded? Is it possible somehow some objects were legitimately gzipped incorrectly (and we're looking at a symptom of another bug)?

Deploys to Cloud Run are done via GitHub -> Cloud Build (Terraform) -> Cloud Run. Hopefully this means things should be working as intended by Google Cloud.

Are there any other variables different between your pre-live and live environments (that you can share)?

Only some peripheral elements such as log verbosity and number of logs. The app code itself is identical (assuming we're not mid-deploy with dev being ahead of prod as we roll out).

Is it possible any other software updates were included in the deploy? Say, any subtle dependency changes that were included in the deployed container?

I don't think so, the Dockerfile is identical and I lock npm dependency versions to exact (rather than semver range) to make deploys reproducible.

Feels like it might be what you suggested previously, a quirk of dependency chains. I am wondering if I ought to try the upgrade again and keep an eye on it so I can roll back quickly if needed. Not sure there's much of a way to debug without a super deep dive.

Additionally, does your company have a support contract with Google? If so, we have an opportunity to dive further with additional tools & resources.

Yes, we do. I think I'm right in saying we have enterprise support. Happy to engage via that route if it's preferable.

Thanks again for all your time and help with this.

@danielbankhead
Copy link
Contributor

Thanks for the additional information @neilstuartcraig, I believe #2055 is related to the issue you are facing.

I will do some more digging and resolve this for you.

@danielbankhead danielbankhead added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed priority: p2 Moderately-important priority. Fix may not be included in next release. labels Aug 31, 2022
@danielbankhead
Copy link
Contributor

Hey @neilstuartcraig, thanks for your patience - we were finally able to reproduce the issue and a fix has been merged. The next release will public shortly:

To better mitigate and catch issues like this in testing in the future we're planning to implement a revamp of our stream implementation in the next major release:

@neilstuartcraig
Copy link
Author

Hey @neilstuartcraig, thanks for your patience - we were finally able to reproduce the issue and a fix has been merged. The next release will public shortly:

* [chore(main): release 6.4.2 #2050](https://github.com/googleapis/nodejs-storage/pull/2050)

To better mitigate and catch issues like this in testing in the future we're planning to implement a revamp of our stream implementation in the next major release:

* [refactor(deps)!: Remove `duplexify` dependency + Revamp Stream Implementation #2041](https://github.com/googleapis/nodejs-storage/issues/2041)

Sorry for the delay in reply...busy times!
I rolled this out to dev and prod environments a few days back and it looks good. Thanks so much for your work and time on this, I really appreciate it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/nodejs-storage API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants