GatherPileupSummaries #5599
Codecov Report
@@              Coverage Diff              @@
##             master    #5599      +/-   ##
===============================================
+ Coverage     87.031%   87.041%   +0.011%
- Complexity     32107     32151       +44
===============================================
  Files           1972      1975        +3
  Lines         147194    147415      +221
  Branches       16201     16225       +24
===============================================
+ Hits          128104    128312      +208
- Misses         13184     13188        +4
- Partials        5906      5915        +9
As discussed offline, let's put pileup summaries inside the M2 task, but later leave CollectF1R2 in its own separate task. Otherwise looks good.
bb1b14f to 606f04e
@davidbenjamin
Back to @takutosato with mild suggestions. Nice tests!
scripts/mutect2_wdl/mutect2.wdl
Outdated
call LearnReadOrientationModel {
    input:
        alt_tables = CollectF1R2Counts.alt_table,
        ref_histograms = CollectF1R2Counts.ref_histogram,
ref has plural "histograms" on LHS and singular "histogram" on RHS; alt has plural on both sides.
fixed
scripts/mutect2_wdl/mutect2.wdl
Outdated
gatk --java-options "-Xmx${command_mem}m" GetPileupSummaries -R ${ref_fasta} -I ${tumor_bam} ${"--interval-set-rule INTERSECTION -L " + intervals} \
    -V ${variants_for_contamination} -L ${variants_for_contamination} -O tumor-pileups.table

if [[ -f ${normal_bam} ]]; then
Does this logic work with NIO where the normal bam isn't localized? I would think [[ -f ]] would give false.
If this is the case and you have a fix I would do the same for variants for contamination. It's currently a File but we want to future-proof.
Changed to ! -z. That should work, will test.
export GATK_LOCAL_JAR=${default="/root/gatk.jar" gatk_override}

gatk --java-options "-Xmx${command_mem}m" GatherPileupSummaries \
--sequence-dictionary ${ref_dict} \
indent
gatk --java-options "-Xmx${command_mem}m" GatherPileupSummaries \
--sequence-dictionary ${ref_dict} \
-I ${sep=' -I ' input_tables} \
Double-checking: is this error-free when input tables is empty or not present?
scripts/mutect2_wdl/mutect2.wdl
Outdated
-ref-hist ${ref_histogram} \
-alt-hist ${alt_histograms} \
-O "${tumor_sample}-artifact-prior-table.tsv"
-at ${sep=" -at " alt_tables} \
indent
if (contigIndex1 != contigIndex2){
    return contigIndex1 - contigIndex2;
} else {
    return ps1.getStart() - ps2.getStart();
Likewise.
Done
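For readers following the snippet above, the same ordering (contig index first, then start position) can be written with `Comparator.comparingInt`, which compares the two integers instead of subtracting them. This is only a minimal sketch under stated assumptions: `Site` is a hypothetical stand-in for the pileup summary record, and the contig order is a plain list rather than a sequence dictionary. It is not the code merged in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a pileup summary record; not the GATK class.
final class Site {
    final String contig;
    final int start;

    Site(final String contig, final int start) {
        this.contig = contig;
        this.start = start;
    }
}

final class PileupOrderSketch {
    // Orders by the contig's position in contigOrder, then by start position.
    static Comparator<Site> byDictionaryOrder(final List<String> contigOrder) {
        return Comparator
                .comparingInt((Site s) -> contigOrder.indexOf(s.contig))
                .thenComparingInt(s -> s.start);
    }

    // Returns a sorted copy and leaves the input list untouched.
    static List<Site> sorted(final List<Site> sites, final List<String> contigOrder) {
        final List<Site> copy = new ArrayList<>(sites);
        copy.sort(byDictionaryOrder(contigOrder));
        return copy;
    }

    public static void main(final String[] args) {
        final List<String> contigOrder = Arrays.asList("chr1", "chr2");
        final List<Site> gathered = Arrays.asList(
                new Site("chr2", 5), new Site("chr1", 100), new Site("chr1", 7));
        for (final Site s : sorted(gathered, contigOrder)) {
            System.out.println(s.contig + ":" + s.start);
        }
    }
}
```

A contig missing from the list gets index -1 here and therefore sorts first; a real gather step would more likely reject such a record.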
)

public class CollectF1R2Counts extends LocusWalker {
    public static final String ALT_DATA_TABLE_LONG_NAME = "alt-table";
    public static final String ALT_DATA_TABLE_SHORT_NAME = "at";
I think it's better not to define short names for these. The long names are short enough. Note that this changes your wdls, which used the short names.
True. Done
@@ -72,25 +80,22 @@

@Override
protected void onStartup(){
    final MetricsFile<?, Integer> referenceSiteMetrics = readMetricsFile(refHistogramTable);
    refHistograms = referenceSiteMetrics.getAllHistograms();
    refHistograms = sumHistogramsFromFiles(refHistogramFiles);
You should validate that the input lists all have the same length. If there's some check to make sure the headers' orders correspond that would be good too.
Validate input list size, done
I added a check for headers of the histograms, and TableReader will ensure that the altSiteRecords are formatted properly.
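The list-size validation requested in this thread could look roughly like the sketch below. It assumes one ref histogram file and one alt table per scattered shard; `requireSameSize` and the file names are hypothetical, and the plain `IllegalArgumentException` stands in for whatever error type the tool actually raises.

```java
import java.util.Arrays;
import java.util.List;

final class InputListCheckSketch {
    // Fails fast when the scattered inputs do not pair up one-to-one.
    static void requireSameSize(final List<?> refHistogramFiles, final List<?> altTableFiles) {
        if (refHistogramFiles.size() != altTableFiles.size()) {
            throw new IllegalArgumentException(String.format(
                    "Got %d ref histogram files but %d alt tables; expected one of each per shard.",
                    refHistogramFiles.size(), altTableFiles.size()));
        }
    }

    public static void main(final String[] args) {
        // Hypothetical shard outputs: two of each, so the check passes silently.
        requireSameSize(
                Arrays.asList("shard0.ref.metrics", "shard1.ref.metrics"),
                Arrays.asList("shard0.alt.tsv", "shard1.alt.tsv"));
        System.out.println("input lists are consistent");
    }
}
```

Running the check at startup, before any file is read, keeps a mismatched scatter from producing a silently incomplete sum.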
for (final Histogram<Integer> jthHistogram : ithHistograms){
    final String refContext = jthHistogram.getValueLabel();
    final Optional<Histogram<Integer>> hist = histogramList.stream().filter(h -> h.getValueLabel().equals(refContext)).findAny();
    if (! hist.isPresent()){
Utils.validate(hist.isPresent())
done
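To make the summing step in this thread concrete, here is a self-contained sketch of adding histograms across files while checking that every file carries the same labels. It rests on several assumptions: each file is modeled as a map from label (the reference context) to bin counts instead of htsjdk `Histogram` objects, and the local `validate` helper only mimics the shape of `Utils.validate`. None of this is the PR's actual implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class HistogramSumSketch {
    // Local stand-in for a validate(condition, message) style helper.
    static void validate(final boolean condition, final String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    // Each file is modeled as label -> (bin -> count); returns the bin-wise sum.
    static Map<String, Map<Integer, Long>> sum(final List<Map<String, Map<Integer, Long>>> files) {
        validate(!files.isEmpty(), "need at least one histogram file");
        final Map<String, Map<Integer, Long>> total = new LinkedHashMap<>();
        // Seed the totals with a copy of the first file.
        files.get(0).forEach((label, bins) -> total.put(label, new LinkedHashMap<>(bins)));
        for (final Map<String, Map<Integer, Long>> file : files.subList(1, files.size())) {
            // Every later file must carry exactly the same set of labels.
            validate(file.keySet().equals(total.keySet()), "histogram labels differ between files");
            file.forEach((label, bins) ->
                    bins.forEach((bin, count) -> total.get(label).merge(bin, count, Long::sum)));
        }
        return total;
    }

    public static void main(final String[] args) {
        final Map<String, Map<Integer, Long>> shard0 =
                Collections.singletonMap("ACT", Collections.singletonMap(10, 2L));
        final Map<String, Map<Integer, Long>> shard1 =
                Collections.singletonMap("ACT", Collections.singletonMap(10, 3L));
        System.out.println(sum(Arrays.asList(shard0, shard1)));
    }
}
```

Checking the label sets up front gives a single clear failure instead of a missing-histogram error deep inside the loop.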
}

return new ImmutablePair<>(sample, records);

extra whitespace
done
7662a18 to ce633c0
@davidbenjamin at long last, back to you. I updated the NIO wdl too, and it passes the Firecloud M2 wdl validation with the HCC sample, but not with the CRAM. That's because the CRAM file is aligned to hg38, whereas the workspace uses hg19. I didn't touch anything related to the CramToBam task in the NIO wdl, so I think we're OK.
Looks good! I wouldn't test on a CRAM, either. It's just not worth our time for such a remote risk.
@takutosato I figured it was a rebase thing. You can merge once Travis passes.
@davidbenjamin will you review? Scatter-gather works as expected on the orientation bias evaluation samples.