Make reg2bins, reg2intervals faster on whole-chromosome queries #1596

Merged
merged 2 commits into from
Apr 13, 2023

Conversation

daviesrob
Member

When a whole chromosome is requested, it's faster to iterate through the index hash table than to use the old behaviour of iterating through every bin that could exist and looking each one up in the hash table to see if it's present. The old method works better for narrow ranges, though, so we choose between the two by comparing the number of bins covering the range with the number of bins in the index.
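
As an illustration only, the choice between the two strategies can be sketched in Python. This is not the htslib C code: the function names, the use of a Python `set` in place of the index hash table, and the exact comparison (`candidates > len(index)`) are assumptions made for the example. The bin numbering follows the standard BAI/CSI scheme with `min_shift` and `n_lvls` parameters.

```python
def level_ranges(beg, end, min_shift=14, n_lvls=5):
    """Yield (first_bin, last_bin) for each level covering half-open [beg, end)."""
    end -= 1                          # work with an inclusive end coordinate
    shift = min_shift + 3 * n_lvls    # the level-0 bin spans the whole range
    offset = 0                        # number of the first bin on this level
    for level in range(n_lvls + 1):
        yield offset + (beg >> shift), offset + (end >> shift)
        offset += 1 << (3 * level)    # each level holds 8**level bins
        shift -= 3


def candidate_bin_count(beg, end, min_shift=14, n_lvls=5):
    """Number of bins that could overlap the region, present or not."""
    return sum(hi - lo + 1 for lo, hi in level_ranges(beg, end, min_shift, n_lvls))


def bins_by_lookup(index, beg, end, min_shift=14, n_lvls=5):
    """Old behaviour: try every possible bin and keep the ones in the index."""
    return [b
            for lo, hi in level_ranges(beg, end, min_shift, n_lvls)
            for b in range(lo, hi + 1)
            if b in index]


def bins_by_scan(index, beg, end, min_shift=14, n_lvls=5):
    """New behaviour: walk the index entries and keep those overlapping the region."""
    ranges = list(level_ranges(beg, end, min_shift, n_lvls))
    return sorted(b for b in index if any(lo <= b <= hi for lo, hi in ranges))


def reg2bins(index, beg, end, min_shift=14, n_lvls=5):
    """Pick whichever strategy touches fewer items (assumed threshold)."""
    if candidate_bin_count(beg, end, min_shift, n_lvls) > len(index):
        return bins_by_scan(index, beg, end, min_shift, n_lvls)
    return bins_by_lookup(index, beg, end, min_shift, n_lvls)
```

With a sparse index and a whole-chromosome region, `candidate_bin_count` far exceeds `len(index)`, so the scan is chosen; for a narrow region only a handful of bins are candidates, so direct lookup wins. Both strategies return the same bins in ascending order.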

The speed-up is most notable on CSI-indexed BED files, which since #1506 have used eight-level indexes. Iterating through all the unused bins took around 0.2s in my testing. With this fix applied, the same query can be done in a few milliseconds.
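
To give a rough sense of why deeper indexes make the old approach expensive, the following Python sketch counts how many bins can exist in the standard binning scheme, where each level has eight times as many bins as the one above it. The function name is hypothetical, and treating "eight-level" as `n_lvls = 8` in the CSI sense is an assumption.

```python
def total_bins(n_lvls):
    """Total number of possible bins for levels 0..n_lvls (8**level bins per level)."""
    # Geometric series: 1 + 8 + 8**2 + ... + 8**n_lvls == (8**(n_lvls + 1) - 1) / 7
    return ((1 << (3 * (n_lvls + 1))) - 1) // 7


print(total_bins(5))  # 37449 possible bins with n_lvls = 5
print(total_bins(8))  # 19173961 possible bins with n_lvls = 8
```

Under that assumption, a whole-chromosome query using the old method would probe on the order of 19 million possible bins, almost all of them absent from a typical index.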

As with reg2bins(), it may be faster for reg2intervals() to iterate
through the hash table entries, depending on how many entries there
are and how wide the region being searched is.
@jkbonfield jkbonfield merged commit 07638e1 into samtools:develop Apr 13, 2023
@daviesrob daviesrob deleted the faster_region_search branch April 18, 2023 12:46