afiedler edited this page Jan 28, 2011 · 9 revisions

Note: These results are preliminary and not yet a rigorous benchmark. The source is a 1.5 GB CSV file, but many records are dropped by a TokenFilter; likely about 75% are filtered out.

| Test Number | Chunk Size | Compression | Index Step | Output Size (KB) |
| --- | --- | --- | --- | --- |
| 1 | 10,000 | Zlib 6 | 10,000 | 124,868 |
| 2 | 4,096 | Zlib 6 | 10,000 | 124,776 |
| 3 | 10,000 | Zlib 6 | 100,000 | 124,868 |
| 4 | 4,096 | Zlib 6 | 100,000 | 124,776 |
| 5 | 100,000 | Zlib 6 | 100,000 | 126,274 |
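
To put the differences in perspective, here is a small Python sketch that expresses each output size as a percentage above the smallest one. The numbers are copied from the table above; the name `relative_overhead` is only illustrative and is not part of the tool being tested.

```python
# Results copied from the table above:
# (test number, chunk size, index step, output size in KB)
RESULTS = [
    (1, 10_000, 10_000, 124_868),
    (2, 4_096, 10_000, 124_776),
    (3, 10_000, 100_000, 124_868),
    (4, 4_096, 100_000, 124_776),
    (5, 100_000, 100_000, 126_274),
]


def relative_overhead(results):
    """Map each test number to how much larger (in percent) its output
    is than the smallest output in the set."""
    smallest = min(size for _, _, _, size in results)
    return {
        test: 100.0 * (size - smallest) / smallest
        for test, _, _, size in results
    }


if __name__ == "__main__":
    for test, pct in relative_overhead(RESULTS).items():
        print(f"test {test}: +{pct:.2f}% vs smallest output")
```

In these runs the index step made no difference to output size (tests 1 and 3 match, as do tests 2 and 4), and the whole spread across chunk sizes is roughly 1.2%.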