8x speed-up by buffering the InputStream when reading uncompressed files #49
I ran some benchmarks for the BinaryCIF project/format and, by comparison, the Java implementation of the MMTF codec was surprisingly slow, especially when processing uncompressed (non-gzipped) files. Benchmark details can be found in the RCSB-internal ciftools-performance repo.

By wrapping the input in a `BufferedInputStream` with a buffer size of 65536 bytes, performance improves drastically: a traversal of the currently 154k structures takes 70 s, compared with 10 minutes with the current code. For comparison, read times for BinaryCIF and mmCIF parsing are given (these should be slower due to their higher overhead).

A performance increase for gzipped files can also be expected by using a `GZIPInputStream` with an equally sized buffer of 65536 bytes (instead of the default buffer of 512 bytes).
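A minimal sketch of the stream setup described above, assuming files are opened from a `Path` and that gzipped files are recognized by a `.gz` suffix. `BufferedMmtfInput`, `open` and `readAll` are hypothetical names for illustration, not part of the mmtf-java API. The byte-at-a-time loop in `readAll` assumes the decoder issues many small `read()` calls. That access pattern is where a 64 KiB buffer pays off.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

class BufferedMmtfInput {
    // 64 KiB, the buffer size proposed in this PR for both stream types.
    static final int BUFFER_SIZE = 65536;

    // Hypothetical helper: opens a file, choosing gzip decoding by the ".gz" suffix.
    static InputStream open(Path path) throws IOException {
        InputStream raw = Files.newInputStream(path);
        try {
            if (path.getFileName().toString().endsWith(".gz")) {
                // GZIPInputStream(in) defaults to a 512-byte input buffer; request 64 KiB instead.
                return new GZIPInputStream(raw, BUFFER_SIZE);
            }
            // Serve small reads from a 64 KiB in-memory buffer instead of the raw file stream.
            return new BufferedInputStream(raw, BUFFER_SIZE);
        } catch (IOException e) {
            raw.close(); // e.g. an invalid gzip header: don't leak the file handle
            throw e;
        }
    }

    // Reads the whole stream one byte at a time, mimicking (by assumption) a decoder
    // that issues many tiny read() calls.
    static byte[] readAll(Path path) throws IOException {
        try (InputStream in = open(path)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return out.toByteArray();
        }
    }
}
```

Only the stream construction changes. Downstream decoding code keeps receiving a plain `InputStream`, so the buffering is transparent to it.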