Optimize gtf by only parsing lazily per-refName #2927

Merged
merged 1 commit into from
Apr 21, 2022

cmdcolin

Possible fix for #2681

Timings for loading a random location on chr1 from a 19MB gtf.gz file from UCSC:

This branch: ~10s
Main branch: ~48s

Approximate tracing differences:

this branch
Screenshot from 2022-04-18 18-18-39

main branch
Screenshot from 2022-04-18 18-17-28
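
For anyone skimming, the idea in the title (only parse lazily per-refName) can be sketched roughly as below. This is not the code from this branch, just a minimal illustration under some assumptions: `LazyGtfIndex`, `GtfFeature`, and `parseGtfLine` are hypothetical names, and the whole file is assumed to already be available as one string. The initial pass only buckets raw lines by their first column, and full parsing happens the first time a given refName is requested.

```typescript
// Hypothetical feature shape, kept minimal for illustration
interface GtfFeature {
  refName: string
  source: string
  type: string
  start: number // coordinates are kept exactly as written in the file
  end: number
  strand: string
  attributes: string // left unparsed in this sketch
}

// Hypothetical helper: split one GTF line into its nine tab-separated columns
function parseGtfLine(line: string): GtfFeature {
  const cols = line.split('\t')
  return {
    refName: cols[0],
    source: cols[1],
    type: cols[2],
    start: Number(cols[3]),
    end: Number(cols[4]),
    strand: cols[6],
    attributes: cols[8] || '',
  }
}

class LazyGtfIndex {
  private rawByRef = new Map<string, string[]>()
  private parsedByRef = new Map<string, GtfFeature[]>()

  constructor(text: string) {
    // Cheap first pass: read only the first column and bucket the raw line
    for (const line of text.split('\n')) {
      if (line.length === 0 || line.charAt(0) === '#') {
        continue
      }
      const tab = line.indexOf('\t')
      const refName = tab === -1 ? line : line.slice(0, tab)
      let bucket = this.rawByRef.get(refName)
      if (!bucket) {
        bucket = []
        this.rawByRef.set(refName, bucket)
      }
      bucket.push(line)
    }
  }

  // Expensive parse runs once per refName, on first request, then is cached
  getFeatures(refName: string): GtfFeature[] {
    let parsed = this.parsedByRef.get(refName)
    if (!parsed) {
      parsed = (this.rawByRef.get(refName) || []).map(parseGtfLine)
      this.parsedByRef.set(refName, parsed)
    }
    return parsed
  }
}
```

With something like this, a request for chr1 only pays the parsing cost for chr1 lines, and the other refNames stay as raw strings until someone navigates to them.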

@cmdcolin

Note that the 19MB gzipped GTF unzips to 329MB of text, so it ends up being kinda big data.

It actually comes close to the maximum string size in Chrome (512MB). If we need to handle files bigger than 512MB, we would want to be careful not to convert the buffer directly to a string via TextDecoder (silly blog post I made on the subject a while back: https://cmdcolin.github.io/posts/2021-10-30-spooky)
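
A rough sketch of what being careful could look like, assuming the unzipped data is held as a `Uint8Array`: instead of decoding the whole buffer at once, scan for newline bytes and decode one line at a time, so no single string gets anywhere near the limit. `iterateLines` is a hypothetical helper name, not something from this branch.

```typescript
// Hypothetical helper: yield decoded lines from a buffer without ever
// building one giant string out of the entire buffer
function* iterateLines(buffer: Uint8Array): Generator<string> {
  const decoder = new TextDecoder('utf-8')
  let start = 0
  while (start < buffer.length) {
    // 10 is the byte value of '\n'
    const newline = buffer.indexOf(10, start)
    const end = newline === -1 ? buffer.length : newline
    // subarray creates a view on the same memory, not a copy
    yield decoder.decode(buffer.subarray(start, end))
    start = end + 1
  }
}
```

Each yielded string is only as long as one line, so the overall file size is limited by the buffer size rather than by the maximum string length.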

@cmdcolin added the enhancement and performance labels and removed the needs label triage label on Apr 19, 2022
@cmdcolin merged commit af18393 into main on Apr 21, 2022
@cmdcolin deleted the optimize_gtf branch on April 21, 2022 15:57
Labels
enhancement (New feature or request), performance