Automatically generate and cache minimap2 indexes to eliminate redundant indexing overhead #39

Closed
bede opened this issue Jun 14, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@bede
Owner

bede commented Jun 14, 2024

When I first implemented long read support, using a prebuilt index was only marginally quicker than recreating the index from scratch every time hostile clean is called, which did not justify the complexity of managing indexes. However, I recently retested this and observed that running hostile clean on a small long read FASTQ drops from ~45s to ~7s when a precomputed index is used.

This behaviour should first be characterised and verified on Linux and macOS. If the performance benefit is replicated on both, index caching and reuse should be added, invisible to the user but suitably logged, unless a good reason not to do so becomes apparent.

This will dramatically reduce execution time when processing many long read samples, where the redundant indexing overhead is a nuisance.
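A minimal sketch of the caching idea described above, not the implementation that was eventually released. It assumes the index is stored next to the reference FASTA and is rebuilt whenever the reference is newer than the cached index. The function names, the `.mmi` naming scheme, and the default `map-ont` preset are hypothetical choices for illustration. The only minimap2 behaviour relied on is `minimap2 -x PRESET -d INDEX REFERENCE`, which writes an index to a file. The preset is part of the file name because presets can change indexing parameters.

```python
import logging
import subprocess
from pathlib import Path

logger = logging.getLogger(__name__)


def index_path_for(reference: Path, preset: str) -> Path:
    """Hypothetical naming scheme: keep the index beside the reference."""
    return reference.with_name(f"{reference.name}.{preset}.mmi")


def build_index(reference: Path, index: Path, preset: str) -> None:
    """Write a minimap2 index for `reference` to `index`."""
    subprocess.run(
        ["minimap2", "-x", preset, "-d", str(index), str(reference)],
        check=True,
    )


def get_or_build_index(reference, preset="map-ont", builder=build_index) -> Path:
    """Return a cached index if it is up to date, otherwise build one."""
    reference = Path(reference)
    index = index_path_for(reference, preset)

    # Reuse the cached index only if it is at least as new as the reference.
    if index.exists() and index.stat().st_mtime >= reference.stat().st_mtime:
        logger.info("Reusing cached minimap2 index %s", index)
        return index

    # Build into a temporary file first so that an interrupted build
    # never leaves a truncated index at the final path.
    logger.info("Building minimap2 index %s", index)
    tmp = index.with_name(index.name + ".tmp")
    builder(reference, tmp, preset)
    tmp.replace(index)
    return index
```

With this shape, the first call for a given reference and preset pays the indexing cost, and later calls log the reuse and return immediately. The `builder` parameter exists only so the caching logic can be exercised without minimap2 installed.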

@bede
Owner Author

bede commented Dec 16, 2024

Merged into main, pending release

@bede
Owner Author

bede commented Dec 19, 2024

Released in 2.0.0

@bede bede closed this as completed Dec 19, 2024