Add more compression types for to_json
#3551
@lhoestq, I looked into how to compress with
How |
|
Definitely, @lhoestq! I've adapted that from the original code, and it turns out to be faster than …
One small thing: currently I'm assuming that the user will provide the compression extension in …
Thanks!
I think it's fine as it is right now :) No need to check the extension of the filename passed to |
I think the default compression level of |
I found that It also has Since Let me know if you prefer using |
Just tried |
@@ -255,3 +257,15 @@ def test_dataset_to_json_orient_invalidproc(self, dataset):
         with pytest.raises(ValueError):
             with io.BytesIO() as buffer:
                 JsonDatasetWriter(dataset, buffer, num_proc=0)

     @pytest.mark.parametrize("compression, extension", [("gzip", "gz"), ("bz2", "bz2"), ("xz", "xz")])
Somehow the `gzip` test is failing due to a few mismatches.
Update: instead of reading the compressed files and comparing them directly, I decompressed them using `fsspec` and then compared the contents. The bug went away!
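One plausible reason a byte-for-byte comparison of gzip files can fail is that the gzip header stores a modification timestamp, so two archives of identical data need not be identical on disk. The sketch below illustrates this with the standard-library `gzip` module rather than `fsspec`; the payload and timestamps are made up for illustration, and this is not confirmed to be the cause of the mismatches seen in this PR.

```python
import gzip

# Illustrative JSON Lines payload (not taken from the PR's test data).
payload = b'{"a": 1}\n{"a": 2}\n'

# Compress the same payload twice with different header timestamps.
# The mtime field is written into the gzip header.
first = gzip.compress(payload, mtime=0)
second = gzip.compress(payload, mtime=1)

# Comparing the compressed bytes directly is fragile.
raw_bytes_match = first == second

# Comparing the decompressed contents is robust to header differences.
decompressed_match = gzip.decompress(first) == gzip.decompress(second) == payload

print("raw bytes match:", raw_bytes_match)
print("decompressed contents match:", decompressed_match)
```

Comparing decompressed contents, as the updated test does, checks what the writer is actually responsible for: the data, not the container metadata.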
Thanks! Just adding an error message in case someone passes `compression` for a buffer or file-like object:
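A minimal sketch of what such a guard might look like follows. The helper name `check_compression_target`, the exception type, and the message wording are all assumptions made for illustration; they are not the code that was suggested in this review.

```python
import io
import os


def check_compression_target(path_or_buf, compression=None):
    """Hypothetical guard: allow `compression` only for path-like targets.

    Returns True when the target is a path, False when it is a buffer.
    """
    is_path = isinstance(path_or_buf, (str, bytes, os.PathLike))
    if compression is not None and not is_path:
        # Exception type and message are assumptions, not the library's.
        raise NotImplementedError(
            "The `compression` parameter is not supported when writing to a "
            "buffer or file-like object; pass a path instead."
        )
    return is_path


# A path with compression is accepted.
print(check_compression_target("out.jsonl.gz", compression="gzip"))

# A buffer with compression is rejected with a clear message.
try:
    check_compression_target(io.BytesIO(), compression="gzip")
except NotImplementedError as err:
    print("rejected:", err)
```

Failing early with an explicit message avoids silently writing uncompressed data to a buffer when the caller asked for compression.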
This PR adds `bz2`, `xz`, and `zip` (WIP) for `to_json`. I also plan to add `infer`, like how `pandas` does it.
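For context, `infer` in pandas picks the compression from the file extension. A rough sketch of that idea is below; the helper `infer_compression` and the mapping `EXTENSION_TO_COMPRESSION` are hypothetical names introduced here, not part of this PR or of any library.

```python
import os

# Hypothetical mapping from file extension to compression name,
# covering the formats mentioned in this PR.
EXTENSION_TO_COMPRESSION = {
    ".gz": "gzip",
    ".bz2": "bz2",
    ".xz": "xz",
    ".zip": "zip",
}


def infer_compression(path, compression="infer"):
    """Hypothetical helper: resolve "infer" from the path's extension.

    Any explicit value (including None) is returned unchanged.
    Unknown extensions resolve to None, meaning no compression.
    """
    if compression != "infer":
        return compression
    _, ext = os.path.splitext(os.fspath(path))
    return EXTENSION_TO_COMPRESSION.get(ext.lower())


print(infer_compression("data.jsonl.gz"))
print(infer_compression("data.jsonl"))
print(infer_compression("data.jsonl.gz", compression="xz"))
```

An explicit `compression` argument always wins over the extension in this sketch, so a caller can still write, say, xz-compressed data to a file with an unrelated name.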