Feature request: add zstd compression support #1342

aborruso · 2023-07-21T13:57:03Z

Miller is the data tool I use the most. Another tool that I use a lot is duckdb.

duckdb supports zstd (and gzip) compressed CSV. ZSTD compression and decompression can be extremely fast: I can compress a 4.5 GB CSV file in 3 seconds (on a machine with 16 GB of RAM and a 12th Gen Intel(R) Core(TM) i7-1280P at 2.00 GHz).
The output is a 160 MB compressed CSV file.
And it's possible to run a duckdb SUMMARIZE on it in 8.5 seconds.
The CSV has 1745439 rows and 199 columns.

A big credit goes to duckdb, but part of the credit goes to this compression format.

This issue is to ask for zstd to be enabled in Miller's compressed-data support.

Thank you

aborruso · 2023-08-02T13:28:52Z

What do you think about this @johnkerl ?

Thank you

johnkerl · 2023-08-19T18:13:09Z

@aborruso for comparison let's first look at gzip. There are two ways to get gzip: --prepipe gunzip and --gzin. The first one is flexible: you get to specify the executable. The second one is done in-process and it requires support from the Go library: https://pkg.go.dev/compress/gzip

Now for zstd. If there is an executable for that, you can do --prepipe zstd. To implement --zstdin we'd need a Go library for handling zstd data. But https://pkg.go.dev/compress does not have one. There may be some other place to get a Go library that does zstd: for example https://pkg.go.dev/github.com/klauspost/compress/zstd.

johnkerl · 2023-08-19T19:23:39Z

@aborruso can you check out head and try this?
#1360

No worries if not; please let me know ...

Also you can take a peek at head docs here:
https://miller.readthedocs.io/en/main/reference-main-compressed-data/#compressed-data

aborruso · 2023-08-20T06:18:59Z

> @aborruso can you check out head and try this?

Wow, it works great. I was already using zstd with the prepipe, but it seemed very convenient and important for Miller to support it natively and directly.
I think it's becoming a "standard" in the context of compressed structured text data.

Thank you very much

johnkerl changed the title ~~feature request: add zstd compression support~~ Feature request: add zstd compression support Aug 19, 2023

johnkerl self-assigned this Aug 19, 2023

johnkerl added the active label Aug 19, 2023

johnkerl mentioned this issue Aug 19, 2023

Support ZSTD compression in-process #1360

Merged

johnkerl added pending feedback to close and removed active labels Aug 19, 2023

aborruso closed this as completed Aug 20, 2023

johnkerl removed the pending feedback to close label Aug 20, 2023
