Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: speed up TRRecord.GetLengthGenotypes() #187

Merged
merged 10 commits into from
Dec 21, 2023
Merged

perf: speed up TRRecord.GetLengthGenotypes() #187

merged 10 commits into from
Dec 21, 2023

Conversation

aryarm
Copy link
Member

@aryarm aryarm commented Aug 11, 2023

I was playing around with GetLengthGenotypes when trying to integrate it into haptools and noticed there might be a potentially faster way of implementing it.

This implementation vectorizes the for-loop using broadcasting, improving the speed as we scale up 1) the number of samples and 2) the number of alleles in each TR of a VCF. I benchmarked haptools for 1 (but not 2) via our benchmarking script (with some small adaptations).

benchmark_GetLengthGenotypes

I didn't add any tests to this PR because it's just a refactor. GetLengthGenotypes already has tests and they all seemed to pass with this new code. But feel free to let me know if there are any tests that it would be good for me to add.

use broadcasting to improve the speed as we scale up the number of samples
@aryarm aryarm changed the title speed up for TRRecord.GetLengthGenotypes() refactor: speed up for TRRecord.GetLengthGenotypes() Aug 11, 2023
@aryarm aryarm changed the base branch from master to develop August 16, 2023 18:36
@aryarm aryarm marked this pull request as ready for review August 16, 2023 19:38
@aryarm
Copy link
Member Author

aryarm commented Oct 30, 2023

@LiterallyUniqueLogin, can I request a review from you since you were the original author of this code? What do you think of these changes?

@aryarm aryarm changed the base branch from develop to master October 30, 2023 21:25
@aryarm aryarm changed the title refactor: speed up for TRRecord.GetLengthGenotypes() perf: speed up for TRRecord.GetLengthGenotypes() Nov 9, 2023
Copy link
Contributor

@LiterallyUniqueLogin LiterallyUniqueLogin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks for the update! Just one minor comment to address.

trtools/utils/tr_harmonizer.py Outdated Show resolved Hide resolved
@aryarm aryarm changed the title perf: speed up for TRRecord.GetLengthGenotypes() perf: speed up TRRecord.GetLengthGenotypes() Dec 20, 2023
@aryarm aryarm merged commit 835ced3 into gymrek-lab:master Dec 21, 2023
5 checks passed
@aryarm aryarm deleted the ref/GetLengthGenotypes branch December 21, 2023 18:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants