BUG: set keyword argument so zipfile actually compresses #21144
Codecov Report
@@ Coverage Diff @@
## master #21144 +/- ##
=======================================
Coverage 91.84% 91.84%
=======================================
Files 153 153
Lines 49505 49505
=======================================
Hits 45466 45466
Misses 4039 4039
         if mode in ['wb', 'rb']:
             mode = mode.replace('b', '')
-        super(BytesZipFile, self).__init__(file, mode, **kwargs)
+        super(BytesZipFile, self).__init__(file, mode, compression, **kwargs)
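For readers following the diff, here is a minimal standard-library sketch of the same pattern: a `zipfile.ZipFile` subclass that forwards a `compression` argument to the parent constructor. The class name `DeflatedZipFile`, the default value and the member name `data.csv` are assumptions made for illustration; this is not the pandas implementation.

```python
import io
import zipfile


class DeflatedZipFile(zipfile.ZipFile):
    """Hypothetical stand-in for the patched class; not the pandas code."""

    def __init__(self, file, mode, compression=zipfile.ZIP_DEFLATED, **kwargs):
        # ZipFile does not accept a binary flag, so 'wb'/'rb' become 'w'/'r'.
        if mode in ['wb', 'rb']:
            mode = mode.replace('b', '')
        # Forward compression explicitly; otherwise ZipFile uses ZIP_STORED.
        super().__init__(file, mode, compression, **kwargs)


buffer = io.BytesIO()
with DeflatedZipFile(buffer, 'wb') as archive:
    archive.writestr('data.csv', 'a,b,c\n' * 1000)

# Re-open the archive and inspect how the member was stored.
with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo('data.csv')

print(info.compress_type == zipfile.ZIP_DEFLATED)
print(info.compress_size, info.file_size)
```

Inspecting `ZipInfo.compress_type` is one way to confirm that a member was deflated rather than merely stored.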
Can we add tests and a whatsnew for this?
Also, because you are modifying the default behavior, I'm not sure if we need a deprecation cycle for this (to be safe, we should I would imagine).
no this is a bug
Fair enough, though tests and whatsnew are still needed (just to be clear).
thanks. added whatsnew and tests.
pls add a whatsnew & a test (not sure what that should look like)
pandas/tests/frame/test_to_csv.py
Outdated
@@ -943,6 +943,22 @@ def test_to_csv_compression(self, compression):
            with tm.decompress_file(filename, compression) as fh:
                assert_frame_equal(df, read_csv(fh, index_col=0))

    def test_to_csv_compression_size(self, compression):
Since these are all the same tests I think it makes more sense to put in test_common and parametrize for the different writers, rather than splitting out across the various modules
makes sense. done.
pandas/tests/test_common.py
Outdated
s = df.iloc[:, 0]

with tm.ensure_clean() as filename:
    for obj in [df, s]:
Can even parametrize the Series and Frame instead of a loop
done.
pandas/tests/test_common.py
Outdated
file_size = os.path.getsize(filename)
getattr(obj, method)(filename, compression=None)
uncompressed_file_size = os.path.getsize(filename)
if compression:
Shouldn't need this conditional
ok. though had to skip or xfail when compression==None. the fixture is shared across other tests so no need to change fixture.
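The size check discussed in this thread can be sketched with the standard library alone, as below. This is not the pandas test: the `writers` mapping, the helper names and the payload are assumptions chosen for illustration, and the `None` entry plays the role of the uncompressed baseline that the thread handles with a skip.

```python
import bz2
import gzip
import lzma
import os
import tempfile
import zipfile

# Repetitive CSV-like payload, so every codec has something to compress.
payload = b'0.123456,0.234567,0.567567\n' * 100


def write_with(opener):
    """Build a writer from an open()-style callable."""
    def write(path, data):
        with opener(path, 'wb') as handle:
            handle.write(data)
    return write


def write_zip(path, data):
    # ZIP_DEFLATED must be requested; the zipfile default is ZIP_STORED.
    with zipfile.ZipFile(path, 'w', zipfile.ZIP_DEFLATED) as archive:
        archive.writestr('data.csv', data)


writers = {
    None: write_with(open),      # uncompressed baseline
    'gzip': write_with(gzip.open),
    'bz2': write_with(bz2.open),
    'xz': write_with(lzma.open),
    'zip': write_zip,
}

sizes = {}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'out')
    for name, write in writers.items():
        write(path, payload)     # each writer truncates the previous file
        sizes[name] = os.path.getsize(path)

for name in ('gzip', 'bz2', 'xz', 'zip'):
    assert sizes[name] < sizes[None], name
```

Keeping the baseline as its own entry means the comparison needs no conditional on the compression value.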
pandas/tests/test_common.py
Outdated


@pytest.mark.parametrize('frame', [
    pd.concat(100 * [DataFrame([[0.123456, 0.234567, 0.567567],
Instead of using concat just multiply your list of lists by 100 within the constructor - will be more performant and idiomatic
good point.
pandas/tests/test_common.py
Outdated
@@ -222,3 +224,22 @@ def test_standardize_mapping():

    dd = collections.defaultdict(list)
    assert isinstance(com.standardize_mapping(dd), partial)


@pytest.mark.parametrize('frame', [
Since this is either a frame or a series, don't use frame as the variable name, as that's slightly confusing when a series is passed. obj should be fine
lgtm - will see what others say but thanks for the PR!
doc/source/whatsnew/v0.23.1.txt
Outdated
@@ -83,6 +83,7 @@ Indexing
I/O
^^^

- Bug in :class:`pandas.io.common.BytesZipFile` where zip compression produces uncompressed zip archive (:issue:`17778`)
can you reference 21144 (as well is fine) as that other issue was closed for 0.23.0
pytest.skip("only test compression case.")

with tm.ensure_clean() as filename:
    getattr(obj, method)(filename, compression=compression)
does this fail under 0.23.0? (and works here clearly)
thanks @minggli!
happy to help!
(cherry picked from commit c85ab08)
git diff upstream/master -u -- "*.py" | flake8 --diff
zipfile.ZipFile has the default compression mode zipfile.ZIP_STORED, which creates an uncompressed archive member. While this does not cause an error, it is a strange default given that users would expect zip output to be compressed. For zip compression to actually reduce file size, the keyword argument compression=zipfile.ZIP_DEFLATED is added.
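The difference between the two modes can be checked with a short standard-library sketch. The payload, the member name `frame.csv` and the helper `zipped_size` are assumptions made for illustration and do not come from the pandas code base.

```python
import io
import zipfile

# Repetitive CSV-like payload (illustrative only).
payload = b'0.123456,0.234567,0.567567\n' * 1000


def zipped_size(compression):
    """Return the size in bytes of an in-memory archive holding the payload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', compression=compression) as archive:
        archive.writestr('frame.csv', payload)
    return len(buffer.getvalue())


stored_size = zipped_size(zipfile.ZIP_STORED)      # the zipfile default
deflated_size = zipped_size(zipfile.ZIP_DEFLATED)  # what this PR passes

print(len(payload), stored_size, deflated_size)
```

With ZIP_STORED the archive is slightly larger than the raw payload because of the zip headers, whereas ZIP_DEFLATED shrinks repetitive data like this considerably.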