Homo_sapiens_assembly19.fasta 3140756381 bytes (100%)
Homo_sapiens_assembly19.fasta.bgzip 870590559 bytes (27%) - samtools faidx creates 2 index files, .fai and .gzi (both are required for fast lookups)
Homo_sapiens_assembly19.fasta.gz 822227249 bytes (26%) - using best compression ratio
Homo_sapiens_assembly19.2bit 775499486 bytes (24% of fasta, 89% of bgzip)
Homo_sapiens_assembly19.2bit.bgzip 684107046 bytes (21%) - we'd probably need a new index for this
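For anyone who wants to re-check the percentages, here is a quick sketch that recomputes them from the byte counts above. The percentages in the list appear to be truncated rather than rounded (that is my assumption), so the sketch uses integer division; the dictionary and function names are just illustrative.

```python
# Byte counts copied from the list above.
SIZES = {
    "fasta": 3140756381,
    "fasta.bgzip": 870590559,
    "fasta.gz": 822227249,
    "2bit": 775499486,
    "2bit.bgzip": 684107046,
}


def percent_of(size: int, base: int) -> int:
    """Size as a whole-number percentage of base, truncated (not rounded)."""
    return size * 100 // base


# Each format relative to the uncompressed FASTA.
relative_to_fasta = {
    name: percent_of(size, SIZES["fasta"]) for name, size in SIZES.items()
}

# 2bit relative to the bgzipped FASTA.
twobit_vs_bgzip = percent_of(SIZES["2bit"], SIZES["fasta.bgzip"])

for name, pct in relative_to_fasta.items():
    print(f"{name}: {pct}% of fasta")
print(f"2bit: {twobit_vs_bgzip}% of fasta.bgzip")
```

The gap between bgzipped FASTA and plain 2bit is only a few percent of the original FASTA size, which is the basis for preferring bgzip + index.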
Based on this I think we should go for bgzip + index. @droazen @lbergelson WDYT? @tomwhite what's the right way to broadcast a block gzipped file with 2 index files?
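To make the "fast lookups" part concrete, here is a minimal sketch of how a .fai-style index turns a base position into a byte offset in an uncompressed FASTA. The fields (length, offset, linebases, linewidth) follow the faidx format; `build_fai`, `byte_offset` and `fetch` are hypothetical helper names for illustration, not samtools or htsjdk APIs. For a bgzipped file the .gzi index would additionally be needed to map that uncompressed offset to a compressed block, which is not shown here.

```python
import io


def build_fai(fasta: bytes) -> dict:
    """Build {name: (length, offset, linebases, linewidth)} for a well-formed FASTA.

    Assumes every sequence line of a record, except possibly the last,
    has the same width, as faidx itself requires.
    """
    index = {}
    pos = 0
    name = None
    for line in io.BytesIO(fasta):
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()
            # The sequence data starts right after the header line.
            index[name] = [0, pos + len(line), None, None]
        else:
            entry = index[name]
            bases = len(line.rstrip(b"\r\n"))
            if entry[2] is None:
                entry[2] = bases       # bases per line
                entry[3] = len(line)   # bytes per line, newline included
            entry[0] += bases
        pos += len(line)
    return {k: tuple(v) for k, v in index.items()}


def byte_offset(pos0: int, offset: int, linebases: int, linewidth: int) -> int:
    """Byte offset in the file of the 0-based base position pos0."""
    return offset + (pos0 // linebases) * linewidth + pos0 % linebases


def fetch(fasta: bytes, index: dict, name: str, start: int, end: int) -> str:
    """Return bases [start, end) of a sequence using only offset arithmetic."""
    _, offset, linebases, linewidth = index[name]
    first = byte_offset(start, offset, linebases, linewidth)
    last = byte_offset(end, offset, linebases, linewidth)
    f = io.BytesIO(fasta)
    f.seek(first)
    # The byte range may span line breaks, so strip newlines afterwards.
    return f.read(last - first).replace(b"\n", b"").decode()


# Tiny made-up FASTA, just to exercise the arithmetic.
EXAMPLE = b">chr1 desc\nACGTAC\nGTACGT\nAC\n>chr2\nTTTT\nGG\n"
EXAMPLE_INDEX = build_fai(EXAMPLE)
print(fetch(EXAMPLE, EXAMPLE_INDEX, "chr1", 4, 9))
```

The point is that a lookup needs only one seek and one short read, so whatever we broadcast has to keep the data file and both index files together.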
It's possible that 2-bit isn't gaining us much efficiency. If we could read gzip fasta we might be able to use that instead for less complexity.
We might have to keep it compressed in memory though, which could be complicated.