choose one disjoint read to report #5926

tedsharpe · 2019-05-07T20:38:20Z

write BAM of rejected reads
trim reads to length of short fragments

codecov · 2019-05-07T21:13:51Z

Codecov Report

Merging #5926 into master will decrease coverage by 0.003%.
The diff coverage is 37.975%.

@@              Coverage Diff               @@
##              master    #5926       +/-   ##
==============================================
- Coverage     86.823%   86.82%   -0.003%     
- Complexity     32344    32348        +4     
==============================================
  Files           1993     1993               
  Lines         149460   149474       +14     
  Branches       16502    16521       +19     
==============================================
+ Hits          129766   129774        +8     
+ Misses         13679    13676        -3     
- Partials        6015     6024        +9

Impacted Files                                          Coverage Δ                    Complexity Δ
...er/tools/AnalyzeSaturationMutagenesisUnitTest.java   97.878% <100%> (+0.029%)      29 <0> (ø)        ⬇️
...hellbender/tools/AnalyzeSaturationMutagenesis.java   46.945% <30.986%> (-0.023%)   15 <5> (+4)
...lotypecaller/readthreading/ReadThreadingGraph.java   88.725% <0%> (-0.245%)        159% <0%> (ø)

SHuang-Broad

Apart from several nonessential SE-related comments, I have:

a question ("I'm not understanding why disjoint => inconsistent pair.") on line 1723, and
a suggestion regarding lines 1611-1627.

SHuang-Broad · 2019-05-13T18:28:58Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

-    private static final MoleculeCounts moleculeCountsUnpaired = new MoleculeCounts();
-    private static final MoleculeCounts moleculeCountsDisjointPair = new MoleculeCounts();
-    private static final MoleculeCounts moleculeCountsOverlappingPair = new MoleculeCounts();
+    private static final String[] unpairedLabels = {


Sounds like a candidate for an enum, but not critical.
And if you are bored, a class hierarchy for MoleculeCounts is possible. I agree it feels like overkill for now, though I do feel the last several revisions are heading in that direction.

tedsharpe · 2019-05-14T19:26:13Z

Don't see how an enum would help. A class hierarchy would work, but I'm going to duck it for now since I'm just replacing labels.

SHuang-Broad · 2019-05-13T19:00:39Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

+            } else { // mates are disjoint
+                final String ignoredMate = "ignoredMate";
+                if ( read1.isFirstOfPair() ) {
+                    processReport(read1, report1, moleculeCountsDisjointPair);


If we want to lose weight:

    final GATKRead rd = read1.isFirstOfPair() ? read1 : read2;
    final ReadReport rp = read1.isFirstOfPair() ? report1 : report2;
    processReport(rd, rp, moleculeCountsDisjointPair);
    if ( bamWriter != null ) {
        rd.setAttribute(EXCUSE_ATTRIBUTE, ignoredMate);
        bamWriter.addRead(rd);
    }

tedsharpe · 2019-05-14T20:14:43Z

We're processing the report on the selected read, but setting the attribute on the rejected read, so this doesn't quite work.
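The distinction in this thread (one mate is reported, the other is tagged and written out) can be sketched as follows. This is a minimal illustration, not the tool's code: `SimpleRead`, `DisjointPairChooser`, the two lists, and the tag name `"XX"` are hypothetical stand-ins for `GATKRead`, `processReport(...)`, `bamWriter.addRead(...)`, and `EXCUSE_ATTRIBUTE`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for GATKRead: a name, a first-of-pair flag, and string attributes.
final class SimpleRead {
    final String name;
    final boolean firstOfPair;
    final Map<String, String> attributes = new HashMap<>();

    SimpleRead(final String name, final boolean firstOfPair) {
        this.name = name;
        this.firstOfPair = firstOfPair;
    }
}

final class DisjointPairChooser {
    // Hypothetical tag name; the real tool uses its own EXCUSE_ATTRIBUTE constant.
    static final String EXCUSE_TAG = "XX";

    final List<SimpleRead> reported = new ArrayList<>();
    final List<SimpleRead> rejected = new ArrayList<>();

    // Report the first-of-pair read; tag its mate and set it aside.
    void processDisjointPair(final SimpleRead read1, final SimpleRead read2) {
        final SimpleRead selected = read1.firstOfPair ? read1 : read2;
        final SimpleRead ignored = read1.firstOfPair ? read2 : read1;
        reported.add(selected);                            // stands in for processReport(...)
        ignored.attributes.put(EXCUSE_TAG, "ignoredMate"); // the excuse goes on the rejected mate
        rejected.add(ignored);                             // stands in for bamWriter.addRead(...)
    }
}
```

Keeping two variables, one for the selected read and one for the ignored mate, is what a single `rd` variable cannot express.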

SHuang-Broad · 2019-05-13T19:02:07Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

+                        bamWriter.addRead(read1);
+                    }
+                }
+                moleculeCountsDisjointPair.bumpInconsistentPairs();


I'm not understanding why disjoint => inconsistent pair.

tedsharpe · 2019-05-14T20:15:53Z

That's because it's an ugly hack. Tried to improve it a little.

SHuang-Broad · 2019-05-13T19:10:33Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

+        }
+
+        // don't process past end of fragment when fragment length < read length
+        if ( read.isProperlyPaired() ) {


Since Interval trim is being changed in this new if-block, I would propose absorbing this block into calculateTrim(...).
I tried "Refactor -> Extract -> Method", and the IDE tells me it is impossible, but only because of the two statements that return documentLowQualityRead(read). I think it is still doable by returning a 0-sized interval from calculateTrim(...), just like what happens in lines 1606-1608.

tedsharpe · 2019-05-14T20:06:53Z

Refactored along the lines of what you suggested into one method that does the quality trim (the original calculateTrim code) and a second that does the short-fragment trim.
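A rough sketch of the two-step trim described here, with a 0-sized interval signalling that nothing usable remains. It is an illustration under simplifying assumptions, not the tool's implementation: `TrimInterval` and `TrimSketch` are hypothetical names, the quality rule is a plain threshold, and the fragment is assumed to start at read offset 0 (strand and clipping are ignored).

```java
// Hypothetical half-open interval [start, end) in read coordinates.
final class TrimInterval {
    final int start;
    final int end;

    TrimInterval(final int start, final int end) {
        this.start = start;
        this.end = end;
    }

    int size() {
        return Math.max(0, end - start);
    }
}

final class TrimSketch {
    // Step 1: keep the span from the first to the last base with quality >= minQ.
    // Returns a 0-sized interval when no base qualifies.
    static TrimInterval qualityTrim(final byte[] quals, final int minQ) {
        int start = 0;
        while (start < quals.length && quals[start] < minQ) {
            start++;
        }
        int end = quals.length;
        while (end > start && quals[end - 1] < minQ) {
            end--;
        }
        return new TrimInterval(start, end);
    }

    // Step 2: when the fragment is shorter than the read, drop bases past the fragment end.
    static TrimInterval shortFragmentTrim(final TrimInterval qualityTrim,
                                          final int readLength,
                                          final int fragmentLength) {
        if (fragmentLength >= readLength) {
            return qualityTrim; // nothing extends past the fragment
        }
        final int end = Math.min(qualityTrim.end, fragmentLength);
        final int start = Math.min(qualityTrim.start, end);
        return new TrimInterval(start, end);
    }
}
```

A caller would check `size() == 0` after either step and treat the read as rejected, which keeps both steps free of early returns.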

tedsharpe

Ah, you know what? Your valid criticism of the MoleculeCounts organization has left me feeling dirty.
It's too much of a hack. I'll rework it and let you know when I'm ready for another look.

tedsharpe

@SHuang-Broad Steve, this is ready for another look. I created an enum and redid the counters. I think it feels a little less hacky now.
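For readers wondering what enum-based counters might look like, here is a minimal sketch. Only the method name `totalCounts()` appears in the diff hunks of this thread; the enum constants, the `Sketch` prefix, `bumpCount`, and `getCount` are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical categories; the real enum's constants are not shown in this thread.
enum SketchReportType {
    WILD_TYPE("wild type"),
    CALLED_VARIANT("called variant"),
    LOW_QUALITY("low quality"),
    IGNORED_MATE("ignored mate");

    final String label;

    SketchReportType(final String label) {
        this.label = label;
    }
}

// One counter per enum constant, so labels and counts cannot drift apart.
final class SketchReportTypeCounts {
    private final Map<SketchReportType, Long> counts = new EnumMap<>(SketchReportType.class);

    void bumpCount(final SketchReportType type) {
        counts.merge(type, 1L, Long::sum);
    }

    long getCount(final SketchReportType type) {
        return counts.getOrDefault(type, 0L);
    }

    long totalCounts() {
        long total = 0L;
        for (final long n : counts.values()) {
            total += n;
        }
        return total;
    }
}
```

With this shape, separate instances (for example one each for unpaired reads, disjoint pairs, and overlapping pairs) share a single set of categories and labels instead of parallel label arrays.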

SHuang-Broad

Several trivial comments, plus one question asked out of curiosity, regarding line 1640:

just curious, is it ever possible, that both coverages are empty?

If you feel this question needs to be addressed in code, I'm happy to take a look again.

SHuang-Broad · 2019-05-18T21:27:47Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

                }
            }
+        } catch ( final Exception exception ) {
+            throw new GATKException(
+                    "Caught unexpected exception on read " + readCounts.totalCounts() + ": " + read1.getName(),


do you want to close the bam writer here? not sure.

tedsharpe · 2019-05-19T18:06:48Z

No. Better to write nothing than to write an incomplete file.

SHuang-Broad · 2019-05-18T21:57:41Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

@@ -528,31 +540,18 @@ private static void writeReadCounts() {
        }
    }

-    private static void writeMoleculeCounts( final MoleculeCounts moleculeCounts,
+    private static void writeMoleculeCounts( final ReportTypeCounts counts,


I think this method name is a relic from the previous iterations.
But non-essential.

SHuang-Broad · 2019-05-18T22:11:15Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

    @VisibleForTesting static Reference reference;
    @VisibleForTesting static CodonTracker codonTracker; // device for turning SNVs into CodonVariations
+    @VisibleForTesting static SAMFileGATKReadWriter bamWriter;


what about naming it something like rejectedReadsBamWriter, for clarity on purpose?

SHuang-Broad · 2019-05-18T22:17:47Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

@@ -1614,21 +1610,81 @@ public void updateCounts( final MoleculeCounts moleculeCounts,
        return new Interval(readStart, readEnd);
    }

-    @VisibleForTesting static void updateCountsForPair( final ReadReport report1, final ReadReport report2 ) {
+    private static Interval calculateShortFragmentTrim( final GATKRead read, final Interval qualityTrim ) {


I'd suggest giving a short explanation of the difference between the two calculate...Trim functions.

tedsharpe · 2019-05-19T18:09:15Z

Doc for trim methods added.

SHuang-Broad · 2019-05-18T22:25:58Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

        } else if ( report2.getRefCoverage().isEmpty() ) {
-            report1.updateCounts(moleculeCountsDisjointPair, codonTracker, variationCounts, reference);
+            if ( !report1.getRefCoverage().isEmpty() ) {


The IDE tells me report1.getRefCoverage().isEmpty() is always false here, which seems right given that the enclosing block is reached only when report1.getRefCoverage().isEmpty() is false.

But I can understand if it is for symmetry.

tedsharpe · 2019-05-19T18:13:23Z

Ha. Just silliness. I removed the redundant test.

SHuang-Broad · 2019-05-18T22:27:52Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

        if ( report1.getRefCoverage().isEmpty() ) {
-            report2.updateCounts(moleculeCountsDisjointPair, codonTracker, variationCounts, reference);
+            if ( !report2.getRefCoverage().isEmpty() ) {


just curious, is it ever possible, that both coverages are empty?

tedsharpe · 2019-05-19T18:11:55Z

It is possible that both coverages will be empty. The trimming might pare away the entire alignment for both reads.

SHuang-Broad

Then is that case covered somewhere further up in the logic chain? I cannot remember if it is.
I don't think it is covered in this if-else block, or am I missing something?

    if ( report1 is empty ) {
        if ( report2 is not empty ) { process it }
        /* else { do nothing } */
    }

tedsharpe

It's covered.
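The branch structure under discussion can be laid out as a small decision function covering all four combinations. This is only a sketch of the control flow visible in the diff hunks: `PairCoverageSketch`, `Action`, and `classify` are hypothetical names, and how the real tool accounts for a pair with two empty coverages is not shown in this thread.

```java
final class PairCoverageSketch {
    enum Action { SKIP_BOTH, PROCESS_SECOND_ONLY, PROCESS_FIRST_ONLY, PROCESS_PAIR }

    // Mirrors the nested if / else-if shape of the hunks, using booleans in place of
    // report.getRefCoverage().isEmpty().
    static Action classify(final boolean coverage1Empty, final boolean coverage2Empty) {
        if (coverage1Empty) {
            if (!coverage2Empty) {
                return Action.PROCESS_SECOND_ONLY;
            }
            return Action.SKIP_BOTH; // both empty: nothing is processed in this block
        } else if (coverage2Empty) {
            // coverage1 is already known to be non-empty here, so re-testing it is redundant
            return Action.PROCESS_FIRST_ONLY;
        }
        return Action.PROCESS_PAIR; // both non-empty: overlap or disjoint handling follows
    }
}
```

Written this way, the both-empty case is an explicit outcome rather than an implicit fall-through, which is the point the question was probing.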

SHuang-Broad · 2019-05-18T22:31:27Z

src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java

+    private static long totalBaseCalls;
+    private static final ReportTypeCounts readCounts = new ReportTypeCounts();
+    private static final ReportTypeCounts unpairedCounts = new ReportTypeCounts();
+    private static final ReportTypeCounts disjointPairCounts = new ReportTypeCounts();


Is disjoint the same as non-overlapping, in the strict technical sense (i.e. hard clipping not included, soft clipping included)?

tedsharpe · 2019-05-19T18:06:09Z

Unclear on what you're asking: we're only dealing with primary lines (no hard clips). But we're also doing our own quality trimming and short-fragment trimming, so it's not just the CIGAR that dictates whether the evaluated bases in a pair overlap, abut, or are disjoint.

SHuang-Broad

I understand now what disjoint means. Thanks!
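To make the three terms concrete, here is a small sketch that classifies two mates by the reference intervals of their evaluated bases, that is, the intervals left after trimming. The class and method names are hypothetical, intervals are assumed to be half-open `[start, end)` and non-empty, and the real tool may represent coverage differently.

```java
final class MateRelationSketch {
    enum Relation { OVERLAP, ABUT, DISJOINT }

    // Compare the later of the two starts with the earlier of the two ends.
    static Relation classify(final int start1, final int end1, final int start2, final int end2) {
        final int laterStart = Math.max(start1, start2);
        final int earlierEnd = Math.min(end1, end2);
        if (laterStart < earlierEnd) {
            return Relation.OVERLAP;  // at least one reference position is evaluated by both mates
        }
        if (laterStart == earlierEnd) {
            return Relation.ABUT;     // the intervals touch with no shared position and no gap
        }
        return Relation.DISJOINT;     // a gap of unevaluated reference lies between the mates
    }
}
```

Because the inputs are trimmed intervals, two mates whose alignments overlap can still come out as disjoint once low-quality ends are pared away.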

tedsharpe

Thanks for the review, Steve. I think the code is quite a bit clearer now.

SHuang-Broad · 2019-05-20T03:30:52Z

Thanks for making the tool better!

tedsharpe requested a review from SHuang-Broad May 7, 2019 20:38

SHuang-Broad requested changes May 13, 2019

tedsharpe commented May 14, 2019

tedsharpe commented May 15, 2019

SHuang-Broad approved these changes May 18, 2019

tedsharpe commented May 19, 2019

arbitrarily choose 1 read for disjoint pairs, dump rejected reads

07a03a2

tedsharpe force-pushed the tws_choose_one_disjoint_report branch from 3e4efb7 to 07a03a2 Compare May 19, 2019 18:17

tedsharpe merged commit cfc744b into master May 20, 2019

tedsharpe deleted the tws_choose_one_disjoint_report branch May 20, 2019 15:59
