Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GatherPileupSummaries #5599

Merged
merged 1 commit into from
Mar 21, 2019
Merged

GatherPileupSummaries #5599

merged 1 commit into from
Mar 21, 2019

Conversation

takutosato
Copy link
Contributor

@davidbenjamin will you review? Scatter-gather works as expected on the orientation bias evaluation samples.

@codecov-io
Copy link

codecov-io commented Jan 23, 2019

Codecov Report

Merging #5599 into master will increase coverage by 0.011%.
The diff coverage is 93.916%.

@@               Coverage Diff               @@
##              master     #5599       +/-   ##
===============================================
+ Coverage     87.031%   87.041%   +0.011%     
- Complexity     32107     32151       +44     
===============================================
  Files           1972      1975        +3     
  Lines         147194    147415      +221     
  Branches       16201     16225       +24     
===============================================
+ Hits          128104    128312      +208     
- Misses         13184     13188        +4     
- Partials        5906      5915        +9
Impacted Files Coverage Δ Complexity Δ
...titute/hellbender/utils/IntervalUtilsUnitTest.java 91.892% <ø> (-0.014%) 146 <0> (ø)
...ntamination/GetPileupSummariesIntegrationTest.java 95.455% <ø> (-0.29%) 4 <0> (ø)
...ols/walkers/readorientation/CollectF1R2Counts.java 75.949% <ø> (ø) 29 <0> (-1) ⬇️
...va/org/broadinstitute/hellbender/GATKBaseTest.java 100% <100%> (ø) 7 <0> (ø) ⬇️
...s/walkers/contamination/GatherPileupSummaries.java 100% <100%> (ø) 5 <5> (?)
...ientation/ReadOrientationModelIntegrationTest.java 100% <100%> (ø) 12 <10> (?)
...orientation/LearnReadOrientationModelUnitTest.java 100% <100%> (ø) 7 <7> (?)
...s/walkers/contamination/PileupSummaryUnitTest.java 100% <100%> (ø) 6 <5> (+4) ⬆️
...ation/LearnReadOrientationModelEngineUnitTest.java 95.745% <100%> (+0.074%) 45 <2> (+2) ⬆️
...ers/readorientation/LearnReadOrientationModel.java 92% <80%> (-4.078%) 36 <12> (+6)
... and 6 more

@davidbenjamin
Copy link
Contributor

As discussed offline, let's put pileup summaries inside the M2 task, but later leave CollectF1R2 in its own separate task. Otherwise looks good.

@takutosato
Copy link
Contributor Author

@davidbenjamin LearnReadOrientationModel now takes in lists of inputs, which allows for parallelization of CollectF1R2Counts. The updated wdl includes this change. I also moved GetPileupSumamries to the M2 task, as discussed offline. All the new changes are in the third commit, "update the wdl, prallelize..."

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to @takutosato with mild suggestions. Nice tests!

call LearnReadOrientationModel {
input:
alt_tables = CollectF1R2Counts.alt_table,
ref_histograms = CollectF1R2Counts.ref_histogram,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref has plural "histograms" on LHS and singular "histogram" on RHS; alt has plural on both sides.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

gatk --java-options "-Xmx${command_mem}m" GetPileupSummaries -R ${ref_fasta} -I ${tumor_bam} ${"--interval-set-rule INTERSECTION -L " + intervals} \
-V ${variants_for_contamination} -L ${variants_for_contamination} -O tumor-pileups.table

if [[ -f ${normal_bam} ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this logic work with NIO where the normal bam isn't localized? I would think [[ -f ]] would give false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the case and you have a fix I would do the same for variants for contamination. It's currently a File but we want to future-proof.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to ! -z. That should work, will test.

export GATK_LOCAL_JAR=${default="/root/gatk.jar" gatk_override}

gatk --java-options "-Xmx${command_mem}m" GatherPileupSummaries \
--sequence-dictionary ${ref_dict} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent


gatk --java-options "-Xmx${command_mem}m" GatherPileupSummaries \
--sequence-dictionary ${ref_dict} \
-I ${sep=' -I ' input_tables} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double-checking: is this error-free when input tables is empty or not present?

-ref-hist ${ref_histogram} \
-alt-hist ${alt_histograms} \
-O "${tumor_sample}-artifact-prior-table.tsv"
-at ${sep=" -at " alt_tables} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent

if (contigIndex1 != contigIndex2){
return contigIndex1 - contigIndex2;
} else {
return ps1.getStart() - ps2.getStart();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

)

public class CollectF1R2Counts extends LocusWalker {
public static final String ALT_DATA_TABLE_LONG_NAME = "alt-table";
public static final String ALT_DATA_TABLE_SHORT_NAME = "at";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better not to define short names for these. The long names are short enough. Note that this changes your wdls, which used the short names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Done

@@ -72,25 +80,22 @@

@Override
protected void onStartup(){
final MetricsFile<?, Integer> referenceSiteMetrics = readMetricsFile(refHistogramTable);
refHistograms = referenceSiteMetrics.getAllHistograms();
refHistograms = sumHistogramsFromFiles(refHistogramFiles);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should validate that the input lists all have the same length. If there's some check to make sure the headers' orders correspond that would be good too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validate input list size, done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check for headers of the histograms, and TableReader will ensure that that altSiteRecords are formatted properly.

for (final Histogram<Integer> jthHistogram : ithHistograms){
final String refContext = jthHistogram.getValueLabel();
final Optional<Histogram<Integer>> hist = histogramList.stream().filter(h -> h.getValueLabel().equals(refContext)).findAny();
if (! hist.isPresent()){
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utils.validate(hist.isPresent())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

return new ImmutablePair<>(sample, records);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra whitespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@takutosato takutosato force-pushed the ts_parallel branch 2 times, most recently from 7662a18 to ce633c0 Compare March 20, 2019 15:49
@takutosato
Copy link
Contributor Author

@davidbenjamin at long last, back to you. I updated the nio wdl too, and it passes the Firecloud M2 wdl validation with the HCC sample, but not with the cram. But that's because that cram file is aligned to hg38, whereas the workspace uses hg19. I didn't touch anything related to the CramToBam task in the nio wdl so I think we're OK.

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I wouldn't test on a CRAM, either. It's just not worth our time for such a remote risk.

@davidbenjamin
Copy link
Contributor

@takutosato I figured it was a rebase thing. You can merge once Travis passes.

@takutosato takutosato merged commit fb2b5a2 into master Mar 21, 2019
@takutosato takutosato deleted the ts_parallel branch March 21, 2019 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants