-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Support passing compression args to gzip and bz2 #33398
Conversation
One question to consider: Instead of providing the ability to provide compression level args on a case-by case basis depending on protocol (originally just zip, now gzip and bz2 as well), would it be better to simply provide this ability to all zip protocols? I don't see anything in the code that prevents this--we would simply add |
d3fe3fb
to
a7ee476
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable
pandas/tests/io/test_compression.py
Outdated
) | ||
@pytest.mark.parametrize("method", ["to_pickle", "to_json", "to_csv"]) | ||
def test_gzip_compression_level_path(obj, method): | ||
"""GH#33398 Ideally this test should be repeated for bz2 as well, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there another test we can do to ensure we pass the keyword args through? If even just that the call works I think OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can have a test that simply asserts that the call succeeds without asserting anything about size. I'll add.
Hello @jessefarnham! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-04-10 20:39:31 UTC |
pandas/io/common.py
Outdated
@@ -397,16 +403,18 @@ def get_handle( | |||
# GZ Compression | |||
if compression == "gzip": | |||
if is_path: | |||
f = gzip.open(path_or_buf, mode) | |||
f = gzip.open(path_or_buf, mode, **compression_args) # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I added these and the below mypy ignore flags. The problem is that we're passing kwargs to functions that don't take kwargs, which is not typesafe. Both of the alternatives I could think of seem worse:
- Changing the called function signatures to take an unused **kwargs parameter.
- Explicitly extracting each parameter from compression_args and passing it individually, thereby needing to duplicate the default parameter values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good! just a comment about docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other comments notwithstanding, LGTM. Nice work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @jreback
@jessefarnham looks like there is a merge conflict - can you fix that up? |
Fixed |
@WillAyd Conflict resolved, but check seems to be failing because linting on master is currently broken |
Gotcha; should be fixed in #33461 so can re-merge after that gets into master |
This commit closes GH#33196 but takes a more generic approach than the suggested solution. Instead of providing a 'fast' kwarg or global compression level setting, this commit extends the ability to pass compression settings as a dict to the gzip and bz2 compression methods. In this way, if the user wants faster compression, they can pass compression={'method': 'gzip', 'compresslevel'=1} rather than just compression='gzip'. Note: For the API to be consistent when passing paths vs. filelikes, GZipFile and gzip2.open() must accept the same kwargs.
All set, ready for merge to master. |
thanks @jessefarnham very nice! |
This commit closes #33196 but takes a more generic approach than the suggested
solution. Instead of providing a 'fast' kwarg or global compression level
setting, this commit extends the ability to pass compression settings as a
dict to the gzip and bz2 compression methods. In this way, if the user
wants faster compression, they can pass
compression={'method': 'gzip', 'compresslevel'=1} rather than
just compression='gzip'.
Note: For the API to be consistent when passing paths vs. filelikes, GZipFile and gzip2.open() must accept the same kwargs.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff