Make HaplotypeCallerSpark extend AssemblyRegionWalkerSpark #5386

tomwhite · 2018-11-02T14:20:51Z

This brings HaplotypeCallerSpark into closer alignment with the walker version.

codecov-io · 2018-11-02T15:25:03Z

Codecov Report

Merging #5386 into master will increase coverage by 0.04%.
The diff coverage is 85.047%.

@@              Coverage Diff               @@
##              master     #5386      +/-   ##
==============================================
+ Coverage     86.985%   87.025%   +0.04%     
- Complexity     30423     30439      +16     
==============================================
  Files           1850      1852       +2     
  Lines         140944    140936       -8     
  Branches       15505     15508       +3     
==============================================
+ Hits          122600    122650      +50     
+ Misses         12705     12647      -58     
  Partials        5639      5639

Impacted Files	Coverage Δ	Complexity Δ
...der/tools/HaplotypeCallerSparkIntegrationTest.java	`58.73% <ø> (-3.035%)`	`12 <0> (-1)`
...ols/examples/ExampleAssemblyRegionWalkerSpark.java	`0% <0%> (ø)`	`0 <0> (ø)`	⬇️
...engine/spark/AssemblyRegionArgumentCollection.java	`100% <100%> (ø)`	`1 <1> (?)`
...nder/tools/spark/pipelines/ReadsPipelineSpark.java	`90.566% <100%> (+0.983%)`	`13 <1> (+1)`	⬆️
...ark/AssemblyRegionReadShardArgumentCollection.java	`100% <100%> (ø)`	`1 <1> (?)`
...bender/engine/spark/AssemblyRegionWalkerSpark.java	`77.083% <85.185%> (+77.083%)`	`15 <7> (+15)`	⬆️
...stitute/hellbender/tools/HaplotypeCallerSpark.java	`88.462% <92%> (+11.234%)`	`22 <13> (-4)`	⬇️
...utils/smithwaterman/SmithWatermanIntelAligner.java	`50% <0%> (-30%)`	`1% <0%> (-2%)`
...oadinstitute/hellbender/utils/gcs/BucketUtils.java	`80.488% <0%> (-0.61%)`	`42% <0%> (ø)`
...ithwaterman/SmithWatermanIntelAlignerUnitTest.java	`60% <0%> (ø)`	`2% <0%> (ø)`	⬇️
... and 7 more

jamesemery

I don't have a whole lot to say about this branch. I think it is a good change to fold the haplotype caller into an assembly region walker, though we may end up doing some extensive work on the walker at some point .

jamesemery · 2018-11-05T21:30:43Z

src/main/java/org/broadinstitute/hellbender/engine/spark/AssemblyRegionWalkerSpark.java

-                        new FeatureContext(features, assemblyRegion.getExtendedSpan()))).iterator();
+            AssemblyRegionEvaluator assemblyRegionEvaluator = supplierBroadcast.getValue().get(); // one AssemblyRegionEvaluator instance per Spark partition
+            final ReadsDownsampler readsDownsampler = assemblyRegionArgs.maxReadsPerAlignmentStart > 0 ?
+                    new PositionalDownsampler(assemblyRegionArgs.maxReadsPerAlignmentStart, header) : null;


Does the downsampler want to be configurable at this stage?

Not sure that's needed yet since the examples and HaplotypeCaller both use the PositionalDownsampler. There would be a bit of work to ensure the downsampler is serializable.

jamesemery · 2018-11-05T21:33:28Z

src/main/java/org/broadinstitute/hellbender/engine/spark/AssemblyRegionWalkerSpark.java

+                            new DownsampleableSparkReadShard(
+                                    new ShardBoundary(shardedRead.getInterval(), shardedRead.getPaddedInterval()), shardedRead, readsDownsampler)))
+                    .map(shardedRead -> {
+                final Iterator<AssemblyRegion> assemblyRegionIter = new AssemblyRegionIterator(


A potential optimization we may want to look at for the haplotype caller is being able to generate these assembly regions and reconstruct the constituent up the reads later to save on holding an entire bam in an rdd at a time. That doesn't seem easy to do given the way this class is structured.

I've got a branch based on this one that does exactly that :-)

jamesemery · 2018-11-05T21:34:11Z

src/main/java/org/broadinstitute/hellbender/tools/HaplotypeCallerSpark.java


+    @Override
+    protected AssemblyRegionArgumentCollection getAssemblyRegionArgumentCollection() {
+        return new ExampleAssemblyRegionWalkerSpark.ExampleAssemblyRegionArgumentCollection();


Why are these arguments getting pulled in from the example tool? Why can't this be its own argument collection?

That's a mistake - I've fixed it now.

…ll allowing ReadsPipelineSpark to share common code.

tomwhite · 2018-11-07T14:08:11Z

Thanks for the review @jamesemery. I have tried these changes on a cluster with an exome-sized input, with no performance issues, so I'd be comfortable merging. As you say it will be a useful basis for the HaplotypeCallerSpark work we are doing.

jamesemery

This branch looks more or less good to me. Considering that these changes are temporary before your future changes to this machinery I would say this could be approved 👍

…titute#5386) * Factor out command line options for assembly region walkers (Spark) * Make HaplotypeCallerSpark extend AssemblyRegionWalkerSpark, while still allowing ReadsPipelineSpark to share common code.

Factor out command line options for assembly region walkers (Spark)

379f17a

jamesemery reviewed Nov 5, 2018

View reviewed changes

tomwhite force-pushed the tw_spark_files_optimization_hc_as_spark_walker branch from a1730f7 to a7caab2 Compare November 7, 2018 09:51

Make HaplotypeCallerSpark extend AssemblyRegionWalkerSpark, while sti…

9d1c3b5

…ll allowing ReadsPipelineSpark to share common code.

tomwhite force-pushed the tw_spark_files_optimization_hc_as_spark_walker branch from a7caab2 to 9d1c3b5 Compare November 7, 2018 14:05

jamesemery approved these changes Nov 7, 2018

View reviewed changes

tomwhite merged commit 626c887 into master Nov 8, 2018

tomwhite deleted the tw_spark_files_optimization_hc_as_spark_walker branch November 8, 2018 08:22

tomwhite added a commit that referenced this pull request Nov 15, 2018

Update Spark scripts to reflect changes from #5386 and #5127.

8dcf46f

tomwhite added a commit that referenced this pull request Dec 10, 2018

Update Spark scripts to reflect changes from #5386 and #5127. (#5415)

1de11f1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make HaplotypeCallerSpark extend AssemblyRegionWalkerSpark #5386

Make HaplotypeCallerSpark extend AssemblyRegionWalkerSpark #5386

tomwhite commented Nov 2, 2018

codecov-io commented Nov 2, 2018 •

edited

Loading

jamesemery left a comment

jamesemery Nov 5, 2018

tomwhite Nov 7, 2018

jamesemery Nov 5, 2018

tomwhite Nov 7, 2018

jamesemery Nov 5, 2018

tomwhite Nov 7, 2018

tomwhite commented Nov 7, 2018

jamesemery left a comment

Make HaplotypeCallerSpark extend AssemblyRegionWalkerSpark #5386

Make HaplotypeCallerSpark extend AssemblyRegionWalkerSpark #5386

Conversation

tomwhite commented Nov 2, 2018

codecov-io commented Nov 2, 2018 • edited Loading

Codecov Report

jamesemery left a comment

Choose a reason for hiding this comment

jamesemery Nov 5, 2018

Choose a reason for hiding this comment

tomwhite Nov 7, 2018

Choose a reason for hiding this comment

jamesemery Nov 5, 2018

Choose a reason for hiding this comment

tomwhite Nov 7, 2018

Choose a reason for hiding this comment

jamesemery Nov 5, 2018

Choose a reason for hiding this comment

tomwhite Nov 7, 2018

Choose a reason for hiding this comment

tomwhite commented Nov 7, 2018

jamesemery left a comment

Choose a reason for hiding this comment

codecov-io commented Nov 2, 2018 •

edited

Loading