Reduced some of the repeated steps in ReferenceConfidenceModel.calcNIndelinformativeReads #5469

jamesemery · 2018-11-30T19:29:01Z

This micro-optimization fell out of profiling of the HaplotypeCaller in GVCF mode.

Profiler view over an Exome before this patch:

Profiler view over the same Exome after this patch:

I suspect the remaining 9% of runtime could be reduced further by looking more closely at the array operations in isReadInformativeAboutIndelsOfSize().

(Note that these profiler results lie within the ReferenceModelForNoVariation code path. Since this is over an Exome, we expect the overall runtime to be skewed towards no-variation blocks.)

Resolves #5648

jamesemery · 2018-11-30T20:02:08Z

As a sanity check, HaplotypeCaller run in GVCF mode over a bam subsetted to only chr15 on my laptop:
master:

real	4m28.600s
user	5m48.484s
sys	0m4.510s

this branch:

real	3m58.043s
user	5m15.625s
sys	0m4.012s
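
For scale, the reduction implied by the two runs above can be worked out directly from the timings. This is a minimal sketch: the class and method names are made up for illustration, and only the four timing values come from the output above.

```java
final class TimingComparison {
    // Converts a minutes + seconds reading into seconds.
    static double toSeconds(final int minutes, final double seconds) {
        return minutes * 60 + seconds;
    }

    // Percentage reduction going from `before` to `after`.
    static double percentReduction(final double before, final double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(final String[] args) {
        final double realMaster = toSeconds(4, 28.600);
        final double realBranch = toSeconds(3, 58.043);
        final double userMaster = toSeconds(5, 48.484);
        final double userBranch = toSeconds(5, 15.625);

        System.out.printf("real: %.1f%% lower%n", percentReduction(realMaster, realBranch));
        System.out.printf("user: %.1f%% lower%n", percentReduction(userMaster, userBranch));
    }
}
```

On these numbers that is roughly an 11% drop in wall-clock time and a 9% drop in user CPU time, from a single run on one laptop.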

ldgauthier

I was going to buy you a coffee for the performance improvements, but after looking at the code it looks like maybe I owe you a coffee for fixing my performance regression.

ldgauthier · 2018-11-30T20:20:42Z

src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java

-    }
-
-    public static byte[] getSequenceAlignedOneToOne(final GATKRead read, final Function<GATKRead, byte[]> bytesProvider, final byte padWith) {
+    public static Tuple<byte[], byte[]> getBasesAndBaseQualitiesAlginedOneToOne(final GATKRead read, final byte gapCharacter, final byte qualityPadCharacter) {


Algined -> Aligned

jonn-smith

One minor question, but looks good. Let me know what you think about pair vs tuple and when tests pass, merge away.

jonn-smith · 2018-11-30T19:53:47Z

...va/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/ReferenceConfidenceModel.java

@@ -6,6 +6,7 @@
 import htsjdk.samtools.CigarOperator;
 import htsjdk.samtools.SAMFileHeader;
 import htsjdk.samtools.util.Locatable;
+import htsjdk.samtools.util.Tuple;


Is there any advantage to using Tuple over Pair?

Huh, I'm not sure what Pair implementation you are talking about, we have one in gatk that is specific to MarkDuplicates that should probably be renamed to be less confusing anyway...

I think the different tuple implementations are mostly interchangeable but this one happens to live in htsjdk so that is generally a dependency plus.

I was talking about org.apache.commons.lang3.tuple.Pair. I've been using it a lot in Funcotator.

codecov-io · 2018-11-30T20:34:39Z

Codecov Report

Merging #5469 into master will increase coverage by 0.004%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##              master     #5469       +/-   ##
===============================================
+ Coverage     87.037%   87.041%   +0.004%     
- Complexity     31728     31731        +3     
===============================================
  Files           1943      1943               
  Lines         146193    146225       +32     
  Branches       16141     16143        +2     
===============================================
+ Hits          127242    127275       +33     
  Misses         13064     13064               
+ Partials        5887      5886        -1

Impacted Files	Coverage Δ	Complexity Δ
...kers/haplotypecaller/ReferenceConfidenceModel.java	`92.35% <100%> (-0.042%)`	`69 <0> (ø)`
...nstitute/hellbender/utils/read/AlignmentUtils.java	`77.901% <100%> (+0.372%)`	`170 <1> (+1)`	⬆️
.../hellbender/utils/read/AlignmentUtilsUnitTest.java	`98.609% <100%> (+0.04%)`	`304 <2> (+2)`	⬆️
...nder/utils/runtime/StreamingProcessController.java	`67.773% <0%> (+0.474%)`	`33% <0%> (ø)`	⬇️

lbergelson

the last commit looks good to me

droazen

@jamesemery Review complete, back to you with my comments.

droazen · 2018-12-10T20:29:00Z

src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java

@@ -200,48 +199,57 @@ public static GATKRead createReadAlignedToRef(final GATKRead originalRead,
        return Arrays.copyOfRange(bases, basesStart, basesStop + 1);
    }

-    public static byte[] getBasesAlignedOneToOne(final GATKRead read) {
-        return getSequenceAlignedOneToOne(read, r -> r.getBasesNoCopy(), GAP_CHARACTER);
+    public static Tuple<byte[], byte[]> getBasesAndBaseQualitiesAlignedOneToOne(final GATKRead read) {


Add javadoc for this method, and in particular explain the meaning of the return value. Also document the fact that the bases/base quality arrays are not copied and that changing them will alter the corresponding arrays in the read.

Also add good unit tests for the new getBasesAndBaseQualitiesAlignedOneToOne() method.

Note: this method is not actually new, it is just a refactoring of an old method that was being invoked twice.
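
The aliasing hazard that the requested javadoc should warn about can be illustrated with a small sketch. SketchRead below is a hypothetical stand-in written for this example, not the GATKRead interface; it only mimics the copy and no-copy accessors mentioned in this thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a read; not the real GATKRead interface.
final class SketchRead {
    private final byte[] bases;

    SketchRead(final byte[] bases) {
        this.bases = bases.clone();
    }

    // Defensive copy: callers may modify the result freely.
    byte[] getBases() {
        return bases.clone();
    }

    // No copy: the caller receives the read's internal array.
    byte[] getBasesNoCopy() {
        return bases;
    }
}

final class NoCopyAliasingDemo {
    public static void main(final String[] args) {
        final SketchRead read = new SketchRead(new byte[] {'A', 'C', 'G', 'T'});

        // Writing into the copy leaves the read untouched.
        read.getBases()[0] = 'N';
        System.out.println(new String(read.getBases(), StandardCharsets.US_ASCII)); // ACGT

        // Writing into the no-copy array changes the read itself.
        read.getBasesNoCopy()[0] = 'N';
        System.out.println(new String(read.getBases(), StandardCharsets.US_ASCII)); // NCGT
    }
}
```

Per the inline comment in ReferenceConfidenceModel, the no-copy path is taken when the CIGAR is all match, so that is the case in which a caller's writes would reach the read.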

droazen · 2018-12-10T20:29:06Z

src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java

-    }
-
-    public static byte[] getSequenceAlignedOneToOne(final GATKRead read, final Function<GATKRead, byte[]> bytesProvider, final byte padWith) {
+    public static Tuple<byte[], byte[]> getBasesAndBaseQualitiesAlignedOneToOne(final GATKRead read, final byte gapCharacter, final byte qualityPadCharacter) {


Add javadoc for this method, and in particular explain the meaning of the return value and method parameters. Also document the fact that the bases/base quality arrays are not copied and that changing them will alter the corresponding arrays in the read.

droazen · 2018-12-10T20:30:54Z

...va/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/ReferenceConfidenceModel.java

@@ -590,21 +591,20 @@ boolean isReadInformativeAboutIndelsOfSize(final GATKRead read,
        // We are safe to use the faster no-copy versions of getBases and getBaseQualities here,
        // since we're not modifying the returned arrays in any way. This makes a small difference
        // in the HaplotypeCaller profile, since this method is a major hotspot.
-        final byte[] readBases = AlignmentUtils.getBasesAlignedOneToOne(read);  //calls getBasesNoCopy if CIGAR is all match
-        final byte[] readQuals = AlignmentUtils.getBaseQualsAlignedOneToOne(read);
+        final Tuple<byte[], byte[]> readBasesAndBaseQualities = AlignmentUtils.getBasesAndBaseQualitiesAlignedOneToOne(read);  //calls getBasesNoCopy if CIGAR is all match


Agree with @jonn-smith on using the Apache commons Pair instead of Tuple for the return value.

droazen · 2018-12-10T20:58:45Z

src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java

        }
        else {
-            final byte[] paddedBases = new byte[CigarUtils.countRefBasesIncludingSoftClips(read, 0, cigar.numCigarElements())];
+            int numberRefBasesIncludingSoftclips = CigarUtils.countRefBasesIncludingSoftClips(read, 0, numCigarElements);


numberRefBasesIncludingSoftclips should be final

droazen · 2018-12-10T21:00:30Z

src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java

            int literalPos = 0;
            int paddedPos = 0;
-            for ( int i = 0; i < cigar.numCigarElements(); i++ ) {
-                final CigarElement ce = cigar.getCigarElement(i);
+            for ( int i = 0; i < read.numCigarElements(); i++ ) {


Use the already-initialized numCigarElements here instead of read.numCigarElements()

good catch!

droazen · 2019-01-16T21:18:31Z

@jamesemery Pinging you on this one -- do you want to get this in for the 4.1 release?

jamesemery · 2019-01-29T21:11:20Z

@droazen Responded to comments, back to you.

droazen

Back to you @jamesemery with a second round of comments

droazen · 2019-02-05T20:51:57Z

...va/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/ReferenceConfidenceModel.java


-
-        final int baselineMMSum = sumMismatchingQualities(readBases, readQuals, readStart, refBases, refStart, Integer.MAX_VALUE);
+        final int baselineMMSum = sumMismatchingQualities(readBasesAndBaseQualities.getKey(), readBasesAndBaseQualities.getValue(), readStart, refBases, refStart, Integer.MAX_VALUE);


Use getLeft() and getRight() here instead of getKey() and getValue()
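
For context on why both accessor pairs compile: org.apache.commons.lang3.tuple.Pair implements java.util.Map.Entry, so getKey()/getValue() are the Map.Entry view of the same two fields that getLeft()/getRight() expose. A minimal sketch of that relationship, using a hypothetical SketchPair class written for this example rather than the Apache class itself:

```java
import java.util.Map;

// Hypothetical stand-in that mirrors how a left/right pair can double as a Map.Entry.
final class SketchPair<L, R> implements Map.Entry<L, R> {
    private final L left;
    private final R right;

    private SketchPair(final L left, final R right) {
        this.left = left;
        this.right = right;
    }

    static <L, R> SketchPair<L, R> of(final L left, final R right) {
        return new SketchPair<>(left, right);
    }

    L getLeft() {
        return left;
    }

    R getRight() {
        return right;
    }

    // Map.Entry view: the key is the left element, the value is the right element.
    @Override
    public L getKey() {
        return getLeft();
    }

    @Override
    public R getValue() {
        return getRight();
    }

    // This sketch is immutable, so the optional setValue operation is not supported.
    @Override
    public R setValue(final R value) {
        throw new UnsupportedOperationException("immutable pair");
    }
}

final class PairAccessorDemo {
    public static void main(final String[] args) {
        final byte[] bases = {'A', 'C'};
        final byte[] quals = {30, 20};
        final SketchPair<byte[], byte[]> basesAndQuals = SketchPair.of(bases, quals);

        // Both accessor styles return the same array references.
        System.out.println(basesAndQuals.getLeft() == basesAndQuals.getKey());    // true
        System.out.println(basesAndQuals.getRight() == basesAndQuals.getValue()); // true
    }
}
```

Since the two arrays here are bases and qualities rather than a key and its value, getLeft()/getRight() read more naturally at the call sites.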

droazen · 2019-02-05T20:54:02Z

...va/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/ReferenceConfidenceModel.java


        // consider each indel size up to max in term, checking if an indel that deletes either the ref bases (deletion
        // or read bases (insertion) would fit as well as the origin baseline sum of mismatching quality scores
        for ( int indelSize = 1; indelSize <= maxIndelSize; indelSize++ ) {
            // check insertions:
-            if (sumMismatchingQualities(readBases, readQuals, readStart + indelSize, refBases, refStart, baselineMMSum) <= baselineMMSum) {
+            if (sumMismatchingQualities(readBasesAndBaseQualities.getKey(), readBasesAndBaseQualities.getValue(), readStart + indelSize, refBases, refStart, baselineMMSum) <= baselineMMSum) {


Use getLeft() and getRight() here instead of getKey() and getValue()

droazen · 2019-02-05T20:54:15Z

...va/org/broadinstitute/hellbender/tools/walkers/haplotypecaller/ReferenceConfidenceModel.java

                return false;
            }
            // check deletions:
-            if (sumMismatchingQualities(readBases, readQuals, readStart, refBases, refStart + indelSize, baselineMMSum) <= baselineMMSum) {
+            if (sumMismatchingQualities(readBasesAndBaseQualities.getKey(), readBasesAndBaseQualities.getValue(), readStart, refBases, refStart + indelSize, baselineMMSum) <= baselineMMSum) {


Use getLeft() and getRight() here instead of getKey() and getValue()
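
The check these three hunks touch can be sketched in isolation. This is a simplified illustration based only on the code comments and call sites quoted above: plain byte equality stands in for whatever base comparison the real method uses, the guards the real method applies before this point are omitted, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class IndelInformativeSketch {
    // Sums the qualities of mismatching bases, stopping early once the sum exceeds maxSum.
    static int sumMismatchingQualities(final byte[] readBases, final byte[] readQuals, final int readStart,
                                       final byte[] refBases, final int refStart, final int maxSum) {
        final int n = Math.min(readBases.length - readStart, refBases.length - refStart);
        int sum = 0;
        for (int i = 0; i < n; i++) {
            if (readBases[readStart + i] != refBases[refStart + i]) {
                sum += readQuals[readStart + i];
                if (sum > maxSum) {
                    return sum;
                }
            }
        }
        return sum;
    }

    // A read is treated as informative only if no indel of size 1..maxIndelSize
    // fits at least as well as the ungapped baseline alignment.
    static boolean isInformative(final byte[] readBases, final byte[] readQuals, final int readStart,
                                 final byte[] refBases, final int refStart, final int maxIndelSize) {
        final int baseline = sumMismatchingQualities(readBases, readQuals, readStart, refBases, refStart, Integer.MAX_VALUE);
        for (int indelSize = 1; indelSize <= maxIndelSize; indelSize++) {
            // Insertion: skip indelSize read bases.
            if (sumMismatchingQualities(readBases, readQuals, readStart + indelSize, refBases, refStart, baseline) <= baseline) {
                return false;
            }
            // Deletion: skip indelSize reference bases.
            if (sumMismatchingQualities(readBases, readQuals, readStart, refBases, refStart + indelSize, baseline) <= baseline) {
                return false;
            }
        }
        return true;
    }

    public static void main(final String[] args) {
        final byte[] ref = "ACGTACGTAC".getBytes(StandardCharsets.US_ASCII);
        final byte[] read = ref.clone();
        final byte[] quals = new byte[read.length];
        Arrays.fill(quals, (byte) 30);

        System.out.println(isInformative(read, quals, 0, ref, 0, 3)); // true
        System.out.println(isInformative(read, quals, 0, ref, 0, 4)); // false: a 4-base shift matches the repeat
    }
}
```

The bases and qualities are passed to every one of these calls, which is why fetching both arrays once per read pays off in this hotspot.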

droazen · 2019-02-05T20:57:17Z

src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java

-        return getSequenceAlignedOneToOne(read, r -> r.getBaseQualitiesNoCopy(), (byte)0);
+    /**
+     * Returns the "IGV View" of all the bases and base qualities in a read aligned to the reference according to the cigar, dropping any bases
+     * that might be in the read but don't aren't in the reference. Any bases that appear in the reference but not the read


don't aren't -> aren't
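
To make the javadoc's description concrete, here is a minimal sketch of a one-to-one alignment that fills both arrays in a single pass over the CIGAR. It is not the GATK implementation: the CIGAR is modelled as parallel arrays of operator characters and lengths, only M, S, I and D are handled, the '-' gap byte is an assumption, and the class and method names are hypothetical. The quality pad of 0 matches the value passed in the removed line above.

```java
import java.util.Arrays;

final class OneToOneAlignmentSketch {
    static final byte BASE_PAD = (byte) '-'; // assumed gap character
    static final byte QUAL_PAD = (byte) 0;   // quality pad, as in the removed call

    // Returns {alignedBases, alignedQuals}. Inserted read bases are dropped and
    // deleted reference positions are padded, so both arrays track the reference.
    static byte[][] alignOneToOne(final byte[] bases, final byte[] quals, final char[] ops, final int[] lengths) {
        final int numElements = ops.length;
        boolean hasIndel = false;
        int refLength = 0;
        for (int i = 0; i < numElements; i++) {
            if (ops[i] == 'I' || ops[i] == 'D') {
                hasIndel = true;
            }
            if (ops[i] != 'I') {
                refLength += lengths[i];
            }
        }
        // Fast path: without indels the input arrays are returned as-is (no copy).
        if (!hasIndel) {
            return new byte[][] {bases, quals};
        }

        final byte[] paddedBases = new byte[refLength];
        final byte[] paddedQuals = new byte[refLength];
        int literalPos = 0; // position in the read
        int paddedPos = 0;  // position in the output
        for (int i = 0; i < numElements; i++) {
            final int len = lengths[i];
            switch (ops[i]) {
                case 'I': // present in the read only: skip
                    literalPos += len;
                    break;
                case 'D': // present in the reference only: pad
                    Arrays.fill(paddedBases, paddedPos, paddedPos + len, BASE_PAD);
                    Arrays.fill(paddedQuals, paddedPos, paddedPos + len, QUAL_PAD);
                    paddedPos += len;
                    break;
                default: // M or S: copy bases and qualities together
                    System.arraycopy(bases, literalPos, paddedBases, paddedPos, len);
                    System.arraycopy(quals, literalPos, paddedQuals, paddedPos, len);
                    literalPos += len;
                    paddedPos += len;
                    break;
            }
        }
        return new byte[][] {paddedBases, paddedQuals};
    }
}
```

For example, read bases ACGTAC with CIGAR 2M1I1D3M come out as AC-TAC, and qualities 10,11,12,13,14,15 come out as 10,11,0,13,14,15.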

droazen · 2019-02-07T17:54:22Z

src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java

    }

-    public static byte[] getSequenceAlignedOneToOne(final GATKRead read, final Function<GATKRead, byte[]> bytesProvider, final byte padWith) {
+    private static Pair<byte[], byte[]> getBasesAndBaseQualitiesAlignedOneToOne(final GATKRead read, final byte gapCharacter, final byte qualityPadCharacter) {


I think that basePadCharacter would be a better name than gapCharacter, and more consistent with the qualityPadCharacter arg.

droazen · 2019-02-07T19:44:07Z

src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java

        }
        else {
-            final byte[] paddedBases = new byte[CigarUtils.countRefBasesIncludingSoftClips(read, 0, cigar.numCigarElements())];
+            int numberRefBasesIncludingSoftclips = CigarUtils.countRefBasesIncludingSoftClips(read, 0, numCigarElements);
+            final byte[] paddedBases = new byte[numberRefBasesIncludingSoftclips];


numberRefBasesIncludingSoftclips should be final (looks like you missed this comment from last time?)

droazen · 2019-02-07T19:45:01Z

src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java

-            for ( int i = 0; i < cigar.numCigarElements(); i++ ) {
-                final CigarElement ce = cigar.getCigarElement(i);
+            for ( int i = 0; i < read.numCigarElements(); i++ ) {
+                final CigarElement ce = read.getCigarElement(i);


Use the already-initialized numCigarElements in the for loop condition here instead of read.numCigarElements() (another comment not addressed from last time)

droazen · 2019-02-07T19:50:38Z

src/test/java/org/broadinstitute/hellbender/utils/read/AlignmentUtilsUnitTest.java

+
+
+    @Test(dataProvider = "makeGetBasesAndBaseQualitiesAlignedOneToOneTest")
+    public void testCalcNIndelInformativeReads(final String readBases, final String cigar, final String expectedBases, final byte[] expectedQuals ) {


Test case is misnamed -- should be testGetBasesAndBaseQualitiesAlignedOneToOne(), not testCalcNIndelInformativeReads()

droazen · 2019-02-07T19:52:07Z

src/test/java/org/broadinstitute/hellbender/utils/read/AlignmentUtilsUnitTest.java

+        Pair<byte[], byte[]> actual = AlignmentUtils.getBasesAndBaseQualitiesAlignedOneToOne(read);
+
+        Assert.assertEquals(new String(actual.getKey()), expectedBases);
+        Assert.assertEquals(actual.getValue(), expectedQuals);


Use getLeft() and getRight() here as well, instead of getKey()/getValue()

droazen · 2019-02-07T19:53:03Z

src/main/java/org/broadinstitute/hellbender/utils/read/AlignmentUtils.java

+     *
+     * @param read a read to return aligned to the reference
+     * @return A tuple of byte arrays where the first array corresponds to the bases aligned to the reference and second
+     *         array corresponds to the baseQualities aligned to the reference.


Description of return value is out of date -- it returns a Pair, not a Tuple.

jamesemery · 2019-02-07T20:13:39Z

@droazen Responded to your comments, thanks for the review. Is this okay to merge now?

droazen

👍 latest version looks good -- merge once tests pass

jamesemery requested review from ldgauthier and jonn-smith November 30, 2018 19:29

jamesemery assigned jonn-smith Nov 30, 2018

droazen self-assigned this Nov 30, 2018

droazen self-requested a review November 30, 2018 20:10

ldgauthier approved these changes Nov 30, 2018

jonn-smith approved these changes Nov 30, 2018

lbergelson approved these changes Dec 5, 2018

jamesemery mentioned this pull request Dec 5, 2018

Evaluate ReferenceConfidenceModel.calcNIndelInformativeReads() optimizations #5488

Closed

jamesemery force-pushed the je_HaplotypeCallerMicroOptimization1 branch from 61413bc to e26fb36 Compare December 10, 2018 17:36

droazen suggested changes Dec 10, 2018

droazen assigned jamesemery and unassigned droazen and jonn-smith Dec 10, 2018

jamesemery assigned droazen and unassigned jamesemery Jan 29, 2019

jamesemery force-pushed the je_HaplotypeCallerMicroOptimization1 branch from 9b6fed1 to 37cf03e Compare January 29, 2019 21:14

jamesemery mentioned this pull request Feb 5, 2019

AlignmentUtils.getSequenceAlignedOneToOne needs javadoc #5648

Closed

droazen suggested changes Feb 7, 2019

droazen assigned jamesemery and unassigned droazen Feb 7, 2019

jamesemery added 4 commits February 7, 2019 15:14

Reduced some of the repeated steps in ReferenceConfidenceModel.calcNIndelinformativeReads

c0fea1d

spellcheck

c79f9e5

Removed the call to getCigar() for further optimization

5da3a4a

responded to comments

286c036

jamesemery added 3 commits February 7, 2019 15:15

switched to a Pair implementation that does exist on the J9 compiler that travis uses

8688051

again

a8c685e

responding to second round of review comments

0de1938

jamesemery force-pushed the je_HaplotypeCallerMicroOptimization1 branch from 33c1ce5 to 0de1938 Compare February 7, 2019 20:15

removing x

4beeb47

droazen approved these changes Feb 7, 2019

jamesemery merged commit 566e97c into master Feb 11, 2019

jamesemery deleted the je_HaplotypeCallerMicroOptimization1 branch February 11, 2019 17:22

