v6.4.1 seems to introduce a bug with gzipped files #2044
@neilstuartcraig what version of the library did you upgrade from when you saw this issue? |
Hey @neilstuartcraig - could you provide a sample that replicates the issue and any additional associated logs? |
Oh sorry, I forgot that. I upgraded from 6.4.0. Hopefully that helps narrow down what to look at as the changes were minimal between the two (as shown in https://github.com/googleapis/nodejs-storage/releases/tag/v6.4.1) |
I'm really sorry, I can't provide the log files as they contain personal information, so I'd be breaking GDPR - and if I amend the files, they obviously won't be the same. What I can say is that they're gzipped text (nginx access logs in the default "combined" format). The code which showed the errors was:
I can't easily give you all of the code as it's spread across a few files and won't work in isolation, but I am hoping that the above will at least be a pointer. If not, please let me know and I'll see what I can do to get you something better. Cheers |
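The original snippet wasn't captured in this thread, so the following is only a hypothetical stand-in for the kind of read path described (a gzipped log streamed and decompressed); an in-memory `Readable` replaces the GCS read stream, and the sample log line is invented:

```typescript
import {Readable} from 'stream';
import {createGunzip, gzipSync} from 'zlib';

// Hypothetical sample line standing in for a gzipped nginx "combined" log.
const sample =
  '192.0.2.1 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 512\n';

// Stand-in for file.createReadStream(): an in-memory stream of gzipped bytes.
function makeSource(): Readable {
  return Readable.from([gzipSync(sample)]);
}

// Collect the decompressed output of a readable stream of gzipped data.
async function readDecompressed(source: Readable): Promise<string> {
  const chunks: Buffer[] = [];
  await new Promise<void>((resolve, reject) => {
    source
      .on('error', reject)
      .pipe(createGunzip())
      .on('data', (d: Buffer) => chunks.push(d))
      .on('end', () => resolve())
      .on('error', reject);
  });
  return Buffer.concat(chunks).toString();
}

readDecompressed(makeSource()).then(text => console.log(text.trimEnd()));
```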
Hey @neilstuartcraig, after extensive debugging I'm unable to reproduce this error. Here's a working sample with a variety of test scenarios:

```typescript
import gunzipMaybe = require('gunzip-maybe');
import {Storage} from '@google-cloud/storage';
import {randomBytes} from 'crypto';
import {gzipSync} from 'zlib';

const storage = new Storage();
const bucket = storage.bucket(/* insert bucket name for testing */);
const testFile = bucket.file('test-gzip-object');

async function setupCase1() {
  // random data
  await testFile.save(randomBytes(1024));
  console.log('uploaded - setupCase1');
}

async function setupCase2() {
  // random data, upload pre-gzipped
  await testFile.save(gzipSync(randomBytes(1024)));
  console.log('uploaded - setupCase2');
}

async function setupCase3() {
  // random data, gzipped on upload
  await testFile.save(randomBytes(1024), {gzip: true});
  console.log('uploaded - setupCase3');
}

async function returnDecompressed() {
  console.log('reading from stream...');
  await new Promise((resolve, reject) => {
    testFile
      .createReadStream()
      .on('error', reject)
      .pipe(gunzipMaybe())
      .on('data', d => console.log('got data', d.byteLength))
      .on('end', resolve)
      .on('error', reject);
  });
  console.log('finished reading from stream.');
}

async function main() {
  await setupCase1();
  await returnDecompressed();
  await setupCase2();
  await returnDecompressed();
  await setupCase3();
  await returnDecompressed();
  console.log('done');
}

main();
```

Do you have any other information that would help us debug this issue for you? |
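For context on why case 1 (raw, non-gzipped data) works in the sample above: gunzip-maybe sniffs the gzip magic bytes (0x1f 0x8b) and passes non-gzip input through untouched. A simplified, stdlib-only sketch of that decision - not the library's actual implementation, and buffer-based rather than streaming:

```typescript
import {gunzipSync, gzipSync} from 'zlib';

// Decompress only when the buffer starts with the gzip magic bytes 0x1f 0x8b;
// otherwise return it unchanged (the passthrough behavior gunzip-maybe provides).
function gunzipIfNeeded(buf: Buffer): Buffer {
  const isGzip = buf.length > 2 && buf[0] === 0x1f && buf[1] === 0x8b;
  return isGzip ? gunzipSync(buf) : buf;
}

const plain = Buffer.from('not gzipped');
const zipped = gzipSync('was gzipped');
console.log(gunzipIfNeeded(plain).toString()); // not gzipped
console.log(gunzipIfNeeded(zipped).toString()); // was gzipped
```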
Hey @neilstuartcraig, do you have any other information that would help us debug this issue for you? Since my last message, I expanded the test cases and verified a number of additional scenarios:

```typescript
import gunzipMaybe = require('gunzip-maybe');
import {Storage} from '@google-cloud/storage';
import {randomBytes} from 'crypto';
import {gzipSync} from 'zlib';

const storage = new Storage();
const bucket = storage.bucket(/* insert bucket name for testing */);
const testFile = bucket.file('test-gzip-object');

async function setupCase1() {
  // random data
  await testFile.save(randomBytes(1024));
  console.log('uploaded - setupCase1');
}

async function setupCase2() {
  // random data, upload pre-gzipped
  await testFile.save(gzipSync(randomBytes(1024)));
  console.log('uploaded - setupCase2');
}

async function setupCase3() {
  // random data, gzipped on upload
  await testFile.save(randomBytes(1024), {gzip: true});
  console.log('uploaded - setupCase3');
}

async function setupCase4() {
  // random data, upload pre-gzipped w/ Content-Encoding
  await testFile.save(gzipSync(randomBytes(1024)), {
    metadata: {
      contentEncoding: 'gzip',
    },
  });
  console.log('uploaded - setupCase4');
}

async function setupCase5() {
  // random data, upload pre-gzipped w/ Content-Encoding + Content-Type
  await testFile.save(gzipSync(randomBytes(1024)), {
    metadata: {
      contentEncoding: 'gzip',
    },
    contentType: 'application/gzip',
  });
  console.log('uploaded - setupCase5');
}

async function setupCase6() {
  // multiple buffers of random data, concat & uploaded pre-gzipped
  await testFile.save(
    Buffer.concat([gzipSync(randomBytes(1024)), gzipSync(randomBytes(1024))]),
    {
      metadata: {
        contentEncoding: 'gzip',
      },
    }
  );
  console.log('uploaded - setupCase6');
}

async function setupCase7(list = ['']) {
  // list of strings to concat, concat & uploaded pre-gzipped
  const data = Buffer.concat(list.map(str => gzipSync(str)));
  await testFile.save(data, {
    metadata: {
      contentEncoding: 'gzip',
    },
  });
  console.log('uploaded - setupCase7');
}

async function returnDecompressed() {
  console.log('reading from stream...');
  let buffer = Buffer.alloc(0);
  await new Promise((resolve, reject) => {
    testFile
      .createReadStream()
      .on('error', reject)
      .pipe(gunzipMaybe())
      .on('data', d => {
        console.log('got data', d.byteLength);
        buffer = Buffer.concat([buffer, d]);
      })
      .on('end', resolve)
      .on('error', reject);
  });
  console.log('finished reading from stream.');
  return buffer;
}

async function main() {
  await setupCase1();
  await returnDecompressed();
  await setupCase2();
  await returnDecompressed();
  await setupCase3();
  await returnDecompressed();
  await setupCase4();
  await returnDecompressed();
  await setupCase5();
  await returnDecompressed();
  await setupCase6();
  await returnDecompressed();
  const list = ['a', 'b'];
  await setupCase7(list);
  const results = await returnDecompressed();
  console.log(list, results.toString());
  console.log('done');
}

main();
```

|
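On cases 6 and 7 above: a valid gzip stream may consist of multiple concatenated members, and Node's zlib decompresses all members in sequence, which is what lets those concatenated uploads round-trip. A quick stdlib-only check of that behavior (independent of the storage library):

```typescript
import {gunzipSync, gzipSync} from 'zlib';

// Two independently gzipped members concatenated into one buffer...
const combined = Buffer.concat([gzipSync('a'), gzipSync('b')]);

// ...decompress to the concatenation of their payloads.
console.log(gunzipSync(combined).toString()); // ab
```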
A diff link for reference: v6.4.0...v6.4.1 |
Thanks so much for digging in to this @danielbankhead - I really appreciate it. Overall log processing status - the purple dashed line indicates the time we upgraded to 6.4.1, the right-hand dashed yellow line indicates the rollback. The upgrade to 6.4.1 was the only change (see GitHub screengrab below it): Let me know if you think the source code and/or re-trying the upgrade would help. |
Thanks! And no problem - no need to share source code or sensitive information 🙂 Other places to check:
|
A few other places to check:
Additionally, does your company have a support contract with Google? If so, we have an opportunity to dive further with additional tools & resources. |
Deploys to Cloud Run are done via GitHub -> Cloud Build (Terraform) -> Cloud Run. Hopefully this means things should be working as intended by Google Cloud.
Only some peripheral elements such as log verbosity and number of logs. The app code itself is identical (assuming we're not mid-deploy with dev being ahead of prod as we roll out).
I don't think so, the Dockerfile is identical and I lock npm dependency versions to exact (rather than semver range) to make deploys reproducible. Feels like it might be what you suggested previously, a quirk of dependency chains. I am wondering if I ought to try the upgrade again and keep an eye on it so I can roll back quickly if needed. Not sure there's much of a way to debug without a super deep dive.
Yes, we do. I think I'm right in saying we have enterprise support. Happy to engage via that route if it's preferable. Thanks again for all your time and help with this. |
Thanks for the additional information @neilstuartcraig, I believe #2055 is related to the issue you are facing. I will do some more digging and resolve this for you. |
Hey @neilstuartcraig, thanks for your patience - we were finally able to reproduce the issue and a fix has been merged. The next release will be published shortly: To better mitigate and catch issues like this in testing in the future, we're planning a revamp of our stream implementation in the next major release: |
Sorry for the delay in reply...busy times! |
Environment details
@google-cloud/storage version: 6.4.1
Steps to reproduce
I have a log file processing pipeline on Cloud Run/Node which is pretty well monitored and managed via CI/CD (GitHub -> Cloud Build -> Cloud Run). Dependabot raised a PR for 6.4.1 of this lib which passed all my tests and ran fine overnight on pre-live so I merged it which deploys to live. Immediately, the number of log files being rejected by the pipeline jumped and I saw that the Node app was logging errors:
gzip error: unexpected end of file (code: Z_BUF_ERROR)
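That error is what zlib raises when a gzip stream ends before its trailer, i.e. the data looks truncated mid-stream. A minimal reproduction of the error's shape with Node's zlib alone (nothing to do with the storage library's internals, just showing where `Z_BUF_ERROR` comes from):

```typescript
import {gunzipSync, gzipSync} from 'zlib';

// Truncating a gzip buffer cuts into its 8-byte trailer (CRC32 + size),
// so decompression fails with "unexpected end of file" (code Z_BUF_ERROR).
const full = gzipSync('some log line\n');
const truncated = full.subarray(0, full.length - 4);

try {
  gunzipSync(truncated);
} catch (err: any) {
  console.log(err.code, '-', err.message); // Z_BUF_ERROR - unexpected end of file
}
```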
The processing time per file also jumped massively (by an order of magnitude or 2).
Once I reverted this version of the lib, the errors disappeared and processing time returned to normal.
The Node app does this:
So I think the usage of this lib is relatively simple - I am only using a few methods (via async/await):
Please let me know if you have more Qs. Sorry this isn't a simple set of steps, I thought ^ was more practical in this case.
Thanks!