Feature request: add zstd compression support #1342

Closed
aborruso opened this issue Jul 21, 2023 · 4 comments

@aborruso
Contributor

aborruso commented Jul 21, 2023

Miller is the data tool I use the most. Another tool that I use a lot is duckdb.

It supports zstd- (and gzip-) compressed CSV. ZSTD compression and decompression can be extremely fast: I compress a 4.5 GB CSV file in 3 seconds (on a machine with 16 GB of RAM and a 12th Gen Intel(R) Core(TM) i7-1280P at 2.00 GHz).
The output is a 160 MB compressed csv file.
And it's possible to run a duckdb SUMMARIZE on it in 8.5 seconds.
The CSV has 1745439 rows and 199 columns.

A big credit goes to duckdb, but part of the credit goes to this compression format.

I'm opening this issue to ask whether zstd could be enabled for Miller's compressed-data support.

Thank you

@aborruso
Contributor Author

aborruso commented Aug 2, 2023

What do you think about this @johnkerl ?

Thank you

@johnkerl changed the title from "feature request: add zstd compression support" to "Feature request: add zstd compression support" on Aug 19, 2023
@johnkerl
Owner

johnkerl commented Aug 19, 2023

@aborruso for comparison let's first look at gzip. There are two ways to get gzip: --prepipe gunzip and --gzin. The first one is flexible: you get to specify the executable. The second one is done in-process and it requires support from the Go library: https://pkg.go.dev/compress/gzip

Now for zstd. If there is an executable for that, you can do --prepipe zstd. To implement --zstdin we'd need a Go library for handling zstd data. But https://pkg.go.dev/compress does not have one. There may be some other place to get a Go library that does zstd: for example https://pkg.go.dev/github.com/klauspost/compress/zstd.

@johnkerl
Owner

johnkerl commented Aug 19, 2023

@aborruso can you check out head and try this?
#1360

No worries if not; please let me know ...

Also you can take a peek at head docs here:
https://miller.readthedocs.io/en/main/reference-main-compressed-data/#compressed-data

@aborruso
Contributor Author

> @aborruso can you check out head and try this?

Wow, it works great. I was already using zstd with the prepipe, but it seemed very convenient and important for Miller to support it natively and directly.
I think it's becoming a "standard" in the context of compressed structured text data.

Thank you very much
