Tune CADD output size #57
Comments
Going through the windows in random order (after shuffling) does not help. I suspect zstd is already smart enough about building local dictionaries. The next attempt will be to store the raw TSV lines and convert them on the fly.
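For readers following along, here is a minimal sketch of what "store the raw TSV lines and convert them on the fly" could look like. It is not the project's implementation: `zlib` from the Python standard library stands in for zstd, the helper names `pack_window` and `unpack_window` are hypothetical, and the column names and sample rows are made up for illustration.

```python
import zlib

# Hypothetical column names; the real CADD TSV header may differ.
COLUMNS = ["chrom", "pos", "ref", "alt", "raw_score", "phred"]


def pack_window(tsv_lines):
    """Join the raw TSV lines of one window and compress them as one value.

    zlib is only a stand-in here; the issue itself talks about zstd.
    """
    raw = "\n".join(tsv_lines).encode("utf-8")
    return zlib.compress(raw, 9)


def unpack_window(blob, columns=COLUMNS):
    """Decompress a stored window and parse the TSV lines at read time."""
    text = zlib.decompress(blob).decode("utf-8")
    return [
        dict(zip(columns, line.split("\t")))
        for line in text.split("\n")
        if line
    ]


# Illustrative rows only, not real CADD data.
lines = [
    "1\t10001\tT\tA\t0.088260\t4.066",
    "1\t10001\tT\tC\t0.105475\t4.353",
]
blob = pack_window(lines)          # this is what would be stored as the value
records = unpack_window(blob)      # conversion happens on lookup
```

The idea behind this layout is that the compressor sees the original text, which is highly repetitive, and the cost of parsing is paid only for the windows that are actually queried.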
The size goes down from 426GB to 358GB (the bgzip-ed TSV files are 252GB). This is still about 40% more data than the bgzip-ed files. I'll merge #58 for now but keep this ticket open.
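The percentages follow directly from the three sizes quoted in this thread. A small sketch of the arithmetic (the function name `overhead_percent` is just for illustration):

```python
def overhead_percent(size_gb, baseline_gb):
    """Return how much larger size_gb is than baseline_gb, in percent."""
    return (size_gb / baseline_gb - 1.0) * 100.0


BGZIP_GB = 252      # bgzip-ed TSV files
ROCKSDB_OLD_GB = 426  # RocksDB size before the change
ROCKSDB_NEW_GB = 358  # RocksDB size after the change

before = overhead_percent(ROCKSDB_OLD_GB, BGZIP_GB)    # about 69%
after = overhead_percent(ROCKSDB_NEW_GB, BGZIP_GB)     # about 42%
saved = (1.0 - ROCKSDB_NEW_GB / ROCKSDB_OLD_GB) * 100  # about 16% smaller

print(f"before: {before:.1f}%  after: {after:.1f}%  saved: {saved:.1f}%")
```

So the change cuts the database by roughly a sixth, but it remains well above the bgzip-ed baseline.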
Re-opening as compression does not work well enough yet.
Is your feature request related to a problem? Please describe.
With the current compression, the RocksDB database for CADD comes to 426GB, while the bgzip-ed TSV files take only 252GB. This is clearly too much.
Describe the solution you'd like
Consider other encoding strategies.
Describe alternatives you've considered
N/A
Additional context
N/A