
More excessively micro micro-optimizations #5616

Merged
merged 4 commits into master from je_je_HaplotypeCallerMicroOptimization4 on Feb 8, 2019

Conversation

jamesemery
Collaborator

These methods were showing up as time sinks in the profile for #5607:
[screenshots of profiler output, 2019-01-28, showing the hot methods]

I ran these a few times against the root branch; it's probably a small improvement in runtime over this chr15 snippet of an exome on my local machine:
#5607:

2m57.516s, 2m56.677s, 3m2.531s, 3m4.116s

This branch:

2m50.112s, 2m46.981s, 2m43.764s, 2m40.806s

Depends On #5607

@jamesemery jamesemery requested a review from droazen January 28, 2019 21:48
@codecov-io

codecov-io commented Jan 28, 2019

Codecov Report

Merging #5616 into master will increase coverage by 0.003%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##              master     #5616       +/-   ##
===============================================
+ Coverage     87.035%   87.038%   +0.003%     
- Complexity     31726     31730        +4     
===============================================
  Files           1943      1943               
  Lines         146193    146199        +6     
  Branches       16141     16144        +3     
===============================================
+ Hits          127239    127249       +10     
+ Misses         13067     13064        -3     
+ Partials        5887      5886        -1
Impacted Files | Coverage Δ | Complexity Δ
...kers/haplotypecaller/ReferenceConfidenceModel.java | 92.632% <100%> (+0.24%) | 70 <0> (+1) ⬆️
...nder/tools/spark/pipelines/ReadsPipelineSpark.java | 90.741% <0%> (ø) | 13% <0%> (ø) ⬇️
...ithwaterman/SmithWatermanIntelAlignerUnitTest.java | 60% <0%> (ø) | 2% <0%> (ø) ⬇️
...lotypecaller/readthreading/ReadThreadingGraph.java | 88.861% <0%> (+0.253%) | 145% <0%> (+1%) ⬆️
...utils/smithwaterman/SmithWatermanIntelAligner.java | 80% <0%> (+30%) | 3% <0%> (+2%) ⬆️

@droazen droazen self-assigned this Feb 4, 2019
@jamesemery jamesemery force-pushed the je_je_HaplotypeCallerMicroOptimization4 branch from b3f376f to b194fab on February 4, 2019 21:38
@jamesemery
Copy link
Collaborator Author

@droazen this branch wasn't STRICTLY dependent on #5607, so I removed it from this branch to make reviewing easier. It's worth noting that the performance numbers and observed speedup were seen when this branch did hang off of #5607.

Contributor

@droazen droazen left a comment


@jamesemery back to you with my comments

@@ -193,18 +193,18 @@ public ReferenceConfidenceModel(final SampleList samples,
 final String sampleName = readLikelihoods.getSample(0);

 final int globalRefOffset = refSpan.getStart() - activeRegion.getExtendedSpan().getStart();
-for ( final ReadPileup pileup : refPileups ) {
+for (int i = 0, size = refPileups.size(); i < size; i++) {
Contributor


Declare refPileupsSize as a final var before the loop instead of using multiple initialization within the loop.
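A minimal sketch of the shape being requested, using the names from the hunk above (the loop body is elided; this is illustrative rather than the exact committed code):

    // Hoist the size into a final local so the loop header is a plain counter.
    final int refPileupsSize = refPileups.size();
    for (int i = 0; i < refPileupsSize; i++) {
        final ReadPileup pileup = refPileups.get(i);
        // ... existing per-pileup work, unchanged ...
    }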

@@ -193,18 +193,18 @@ public ReferenceConfidenceModel(final SampleList samples,
 final String sampleName = readLikelihoods.getSample(0);

 final int globalRefOffset = refSpan.getStart() - activeRegion.getExtendedSpan().getStart();
-for ( final ReadPileup pileup : refPileups ) {
+for (int i = 0, size = refPileups.size(); i < size; i++) {
+    ReadPileup pileup = refPileups.get(i);
Contributor


final

 final int position = call != null ? call.getStart() : curPos.getStart();
-return priorList.stream().filter(vc -> position == vc.getStart()).collect(Collectors.toList());
+List<VariantContext> list = new ArrayList<>(priorList.size());
Contributor


final

Contributor


And choose a more descriptive name than list. How about matchingPriors?

 final int position = call != null ? call.getStart() : curPos.getStart();
-return priorList.stream().filter(vc -> position == vc.getStart()).collect(Collectors.toList());
+List<VariantContext> list = new ArrayList<>(priorList.size());
+for (int i = 0, size = priorList.size(); i < size; i++) {
Contributor


Declare priorListSize as a final var before the loop instead of using multiple initialization within the loop.

 final int position = call != null ? call.getStart() : curPos.getStart();
-return priorList.stream().filter(vc -> position == vc.getStart()).collect(Collectors.toList());
Contributor


Add a comment that this method is implemented with an old-school for loop instead of streaming for performance reasons (otherwise, someone will probably come along in 2 years and undo this).
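Taken together, the suggestions above amount to a method body roughly like the following (matchingPriors and priorListSize are the reviewer's suggested names, the enclosing method signature is assumed, and this replaces the stream shown in the hunk):

    // NOTE: an indexed loop is used instead of a stream because this method is called
    // per-pileup, per-read, and the stream version showed up in the profiler.
    final List<VariantContext> matchingPriors = new ArrayList<>(priorList.size());
    final int priorListSize = priorList.size();
    for (int i = 0; i < priorListSize; i++) {
        final VariantContext vc = priorList.get(i);
        if (position == vc.getStart()) {
            matchingPriors.add(vc);
        }
    }
    return matchingPriors;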

@droazen droazen assigned jamesemery and unassigned droazen Feb 7, 2019
@jamesemery
Collaborator Author

@droazen Responded to comments. Note that I added a further escape condition where getMatchingPriors is avoided altogether if the VCpriors list is empty. Remember that this method is in a performance-sensitive part of the code, so every little bit of speed counts.
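A sketch of the kind of guard being described, assuming the call site holds the priors in a list named VCpriors and that getMatchingPriors takes the current position, the call, and that list (the call site is not shown in this diff, so these names and the signature are assumptions):

    // Skip the per-read prior matching entirely when there are no priors to match against.
    final List<VariantContext> matchingPriors = VCpriors.isEmpty()
            ? Collections.<VariantContext>emptyList()
            : getMatchingPriors(curPos, call, VCpriors);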

@droazen
Contributor

droazen commented Feb 8, 2019

@jamesemery You have a compiler error on the latest version of this branch:

:compileJava
/home/travis/build/broadinstitute/gatk/src/main/java/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/ReferenceConfidenceModel.java:428: error: ';' expected
        final int priorsListSize = priorList.size()
                                                   ^
1 error

@jamesemery
Collaborator Author

@droazen Fixed, sorry about that.

Contributor

@droazen droazen left a comment


Two minor additional comments -- back to @jamesemery. Merge after addressing them and tests pass.

@@ -193,18 +193,19 @@ public ReferenceConfidenceModel(final SampleList samples,
 final String sampleName = readLikelihoods.getSample(0);

 final int globalRefOffset = refSpan.getStart() - activeRegion.getExtendedSpan().getStart();
-for ( final ReadPileup pileup : refPileups ) {
+final int refPileupsSize = refPileups.size();
+for (int i = 0; i < refPileupsSize; i++) {
Contributor


Add comment here that the old-school for loop is intentional (for-each was found to show up on the profiler).
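The requested comment would sit directly above the loop shown in the hunk, roughly like this (wording paraphrases the reviewer; the loop body is elided):

    final int refPileupsSize = refPileups.size();
    // Deliberately an indexed loop rather than for-each: the for-each version showed up
    // in the profiler for this hot path.
    for (int i = 0; i < refPileupsSize; i++) {
        final ReadPileup pileup = refPileups.get(i);
        // ... existing per-pileup work ...
    }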

 final int position = call != null ? call.getStart() : curPos.getStart();
-return priorList.stream().filter(vc -> position == vc.getStart()).collect(Collectors.toList());
+final List<VariantContext> matchedPriors = new ArrayList<>(priorList.size());
+// NOTE: a for loop is used here because this method ends up being called per-pileup, per-read and thus using faster alternates saves runtime
Contributor


" thus using faster alternates saves runtime" -> "using a loop instead of streaming saves runtime"

@jamesemery jamesemery merged commit 5562b25 into master Feb 8, 2019
@jamesemery jamesemery deleted the je_je_HaplotypeCallerMicroOptimization4 branch February 8, 2019 22:05