Make reg2bins, reg2intervals faster on whole-chromosome queries #1596

daviesrob · 2023-03-30T10:22:59Z

For whole-chromosome requests, it's faster to iterate through the index hash table than to keep the old behaviour of iterating through every bin that could exist and looking each one up in the hash table to see if it's present. The old method still works better for narrow ranges, so we choose between the two by comparing the number of bins covering the range with the number of bins in the index.

The speed-up is most notable on CSI-indexed BED files, which since #1506 have used eight-level indexes. Iterating through all the unused bins took around 0.2s in my testing. With this fix applied, the same query can be done in a few milliseconds.
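A rough count shows why enumerating every possible bin gets expensive as levels are added. Assuming the usual BAI/CSI layout, in which each level has eight times as many bins as the one above it, a query clipped to the whole addressable range covers every bin on every level, so for levels 0 to $n_\text{lvls}$ the number of candidate lookups is:

$$
N(n_\text{lvls}) = \sum_{l=0}^{n_\text{lvls}} 8^l = \frac{8^{n_\text{lvls}+1} - 1}{7}
$$

For $n_\text{lvls} = 5$ (the BAI layout) this gives $N = 37{,}449$. Each extra level multiplies $N$ by roughly 8, so $n_\text{lvls} = 7$ gives $2{,}396{,}745$ and $n_\text{lvls} = 8$ gives $19{,}173{,}961$ lookups, nearly all of them for bins that are absent from the index.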


daviesrob added 2 commits March 30, 2023 10:10

Make reg2intervals() faster on whole-chromosome queries (678c0ab)

As with reg2bins(), it may be faster to iterate through the hash table entries, depending on how many entries there are and how wide the region being searched is.

whitwham assigned jkbonfield Apr 4, 2023

jkbonfield merged commit 07638e1 into samtools:develop Apr 13, 2023

daviesrob deleted the faster_region_search branch April 18, 2023 12:46

