-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
choose one disjoint read to report #5926
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5926 +/- ##
==============================================
- Coverage 86.823% 86.82% -0.003%
- Complexity 32344 32348 +4
==============================================
Files 1993 1993
Lines 149460 149474 +14
Branches 16502 16521 +19
==============================================
+ Hits 129766 129774 +8
+ Misses 13679 13676 -3
- Partials 6015 6024 +9
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except several nonessential SE related comments, I have
- a question, "I'm not understanding why disjoint => inconsistent pair." on line 1723
- a suggestion regarding lines 1611-1627.
private static final MoleculeCounts moleculeCountsUnpaired = new MoleculeCounts(); | ||
private static final MoleculeCounts moleculeCountsDisjointPair = new MoleculeCounts(); | ||
private static final MoleculeCounts moleculeCountsOverlappingPair = new MoleculeCounts(); | ||
private static final String[] unpairedLabels = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a candidate for enum but not critical.
And if you are bored, a class hierarchy for MoleculeCounts
is possible I agree it feels like an overkill for now, though I do feel the last several revisions are going that direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't see how an enum would help. Class hierarchy would work, but I'm going to duck it for now since I'm just replacing labels.
src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java
Outdated
Show resolved
Hide resolved
} else { // mates are disjoint | ||
final String ignoredMate = "ignoredMate"; | ||
if ( read1.isFirstOfPair() ) { | ||
processReport(read1, report1, moleculeCountsDisjointPair); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we want to loose weight:
final GATKRead rd = read1.isFirstOfPair() ? read1 : read2;
final ReadReport rp = read1.isFirstOfPair() ? report1 : report2;
processReport(rd, rp, moleculeCountsDisjointPair);
if ( bamWriter != null ) {
rd.setAttribute(EXCUSE_ATTRIBUTE, ignoredMate);
bamWriter.addRead(rd);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're processing the report on the selected read, but setting the attribute on the rejected read. This doesn't quite work.
bamWriter.addRead(read1); | ||
} | ||
} | ||
moleculeCountsDisjointPair.bumpInconsistentPairs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not understanding why disjoint => inconsistent pair.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because it's an ugly hack. Tried to improve it a little.
} | ||
|
||
// don't process past end of fragment when fragment length < read length | ||
if ( read.isProperlyPaired() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Interval trim
is being changed in this new if
-block, I would propose absorbing this block into calculateTrim(...)
.
I tried to use "Refactor -> Extract -> Method", and the IDE tells me that it is impossible only because of the two statements return documentLowQualityRead(read)
, but I think it is still do-able by returning a 0-sized interval in calculateTrim(...)
, just like what happened in lines 1606-1608.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored along the lines of what you suggested into a method that does the quality trim (the original calculateTrim code), and a 2nd that does the short fragment trim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you know what? Your valid criticism of the MoleculeCounts organization has left me feeling dirty.
It's too much of a hack. I'll rework it and let you know when I'm ready for another look.
src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java
Outdated
Show resolved
Hide resolved
private static final MoleculeCounts moleculeCountsUnpaired = new MoleculeCounts(); | ||
private static final MoleculeCounts moleculeCountsDisjointPair = new MoleculeCounts(); | ||
private static final MoleculeCounts moleculeCountsOverlappingPair = new MoleculeCounts(); | ||
private static final String[] unpairedLabels = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't see how an enum would help. Class hierarchy would work, but I'm going to duck it for now since I'm just replacing labels.
src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java
Outdated
Show resolved
Hide resolved
src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java
Outdated
Show resolved
Hide resolved
} | ||
|
||
// don't process past end of fragment when fragment length < read length | ||
if ( read.isProperlyPaired() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored along the lines of what you suggested into a method that does the quality trim (the original calculateTrim code), and a 2nd that does the short fragment trim.
src/main/java/org/broadinstitute/hellbender/tools/AnalyzeSaturationMutagenesis.java
Outdated
Show resolved
Hide resolved
} else { // mates are disjoint | ||
final String ignoredMate = "ignoredMate"; | ||
if ( read1.isFirstOfPair() ) { | ||
processReport(read1, report1, moleculeCountsDisjointPair); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're processing the report on the selected read, but setting the attribute on the rejected read. This doesn't quite work.
bamWriter.addRead(read1); | ||
} | ||
} | ||
moleculeCountsDisjointPair.bumpInconsistentPairs(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's because it's an ugly hack. Tried to improve it a little.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SHuang-Broad Steve, this is ready for another look. I created an enum and redid the counters. I think it feels a little less hacky now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several trivial comments, except one that is out of curiosity, regarding line 1640
just curious, is it ever possible, that both coverages are empty?
If you feel this question needs to be addressed in code, I'm happy to take a look again.
} | ||
} | ||
} catch ( final Exception exception ) { | ||
throw new GATKException( | ||
"Caught unexpected exception on read " + readCounts.totalCounts() + ": " + read1.getName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to close the bam writer here? not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Better to write nothing than to write an incomplete file.
@@ -528,31 +540,18 @@ private static void writeReadCounts() { | |||
} | |||
} | |||
|
|||
private static void writeMoleculeCounts( final MoleculeCounts moleculeCounts, | |||
private static void writeMoleculeCounts( final ReportTypeCounts counts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method name is a relic from the previous iterations.
But non-essential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@VisibleForTesting static Reference reference; | ||
@VisibleForTesting static CodonTracker codonTracker; // device for turning SNVs into CodonVariations | ||
@VisibleForTesting static SAMFileGATKReadWriter bamWriter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about naming it something like rejectedReadsBamWriter
, for clarity on purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -1614,21 +1610,81 @@ public void updateCounts( final MoleculeCounts moleculeCounts, | |||
return new Interval(readStart, readEnd); | |||
} | |||
|
|||
@VisibleForTesting static void updateCountsForPair( final ReadReport report1, final ReadReport report2 ) { | |||
private static Interval calculateShortFragmentTrim( final GATKRead read, final Interval qualityTrim ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest give a short explanation on the difference between the two calculate...Trim
functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc for trim methods added
} else if ( report2.getRefCoverage().isEmpty() ) { | ||
report1.updateCounts(moleculeCountsDisjointPair, codonTracker, variationCounts, reference); | ||
if ( !report1.getRefCoverage().isEmpty() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDE tells me report1.getRefCoverage().isEmpty()
is always false, which seem to be right given that the enclosing block is triggered only when report1.getRefCoverage().isEmpty()
is false.
But I can understand if it is for symmetry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha. Just silliness. I removed the redundant test.
if ( report1.getRefCoverage().isEmpty() ) { | ||
report2.updateCounts(moleculeCountsDisjointPair, codonTracker, variationCounts, reference); | ||
if ( !report2.getRefCoverage().isEmpty() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious, is it ever possible, that both coverages are empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible that both coverages will be empty. The trimming might pare away the entire alignment for both reads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then is the case covered somewhere up in the logic chain? I cannot remember if it is.
And I don't think this case is covered in these if-else
block, or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ( report1 is empty ) {
if ( report2 is not empty ) {
process it
} /* else {
do nothing
} */
It's covered.
private static long totalBaseCalls; | ||
private static final ReportTypeCounts readCounts = new ReportTypeCounts(); | ||
private static final ReportTypeCounts unpairedCounts = new ReportTypeCounts(); | ||
private static final ReportTypeCounts disjointPairCounts = new ReportTypeCounts(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is disjoint
the same as non-overlapping
, in the strict technical sense (i.e. hard clipping not included, soft clipping included)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear on what you're asking: We're only dealing with primary lines (no hard clips). But we're also doing our own quality trimming and short-fragment trimming, so it's not just the cigar that dictates whether the evaluated bases in a pair overlap, abut, or are disjoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand now what disjoint
means. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, Steve. I think the code is quite a bit clearer now.
@VisibleForTesting static Reference reference; | ||
@VisibleForTesting static CodonTracker codonTracker; // device for turning SNVs into CodonVariations | ||
@VisibleForTesting static SAMFileGATKReadWriter bamWriter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
private static long totalBaseCalls; | ||
private static final ReportTypeCounts readCounts = new ReportTypeCounts(); | ||
private static final ReportTypeCounts unpairedCounts = new ReportTypeCounts(); | ||
private static final ReportTypeCounts disjointPairCounts = new ReportTypeCounts(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclear on what you're asking: We're only dealing with primary lines (no hard clips). But we're also doing our own quality trimming and short-fragment trimming, so it's not just the cigar that dictates whether the evaluated bases in a pair overlap, abut, or are disjoint.
} | ||
} | ||
} catch ( final Exception exception ) { | ||
throw new GATKException( | ||
"Caught unexpected exception on read " + readCounts.totalCounts() + ": " + read1.getName(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Better to write nothing than to write an incomplete file.
@@ -528,31 +540,18 @@ private static void writeReadCounts() { | |||
} | |||
} | |||
|
|||
private static void writeMoleculeCounts( final MoleculeCounts moleculeCounts, | |||
private static void writeMoleculeCounts( final ReportTypeCounts counts, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -1614,21 +1610,81 @@ public void updateCounts( final MoleculeCounts moleculeCounts, | |||
return new Interval(readStart, readEnd); | |||
} | |||
|
|||
@VisibleForTesting static void updateCountsForPair( final ReadReport report1, final ReadReport report2 ) { | |||
private static Interval calculateShortFragmentTrim( final GATKRead read, final Interval qualityTrim ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc for trim methods added
if ( report1.getRefCoverage().isEmpty() ) { | ||
report2.updateCounts(moleculeCountsDisjointPair, codonTracker, variationCounts, reference); | ||
if ( !report2.getRefCoverage().isEmpty() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible that both coverages will be empty. The trimming might pare away the entire alignment for both reads.
} else if ( report2.getRefCoverage().isEmpty() ) { | ||
report1.updateCounts(moleculeCountsDisjointPair, codonTracker, variationCounts, reference); | ||
if ( !report1.getRefCoverage().isEmpty() ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ha. Just silliness. I removed the redundant test.
3e4efb7
to
07a03a2
Compare
Thanks for making the tool better! |
write BAM of rejected reads
trim reads to length of short fragments