Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ported force-active argument for HaplotypeCaller #5635

Merged
merged 2 commits into from
Feb 14, 2019
Merged

Conversation

asmirnov239
Copy link
Collaborator

@droazen Could you please review?

@codecov-io
Copy link

codecov-io commented Feb 3, 2019

Codecov Report

Merging #5635 into master will increase coverage by 0.017%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##              master     #5635       +/-   ##
===============================================
+ Coverage     87.035%   87.052%   +0.017%     
+ Complexity     31726     31712       -14     
===============================================
  Files           1943      1940        -3     
  Lines         146193    146157       -36     
  Branches       16141     16129       -12     
===============================================
- Hits          127239    127232        -7     
+ Misses         13067     13040       -27     
+ Partials        5887      5885        -2
Impacted Files Coverage Δ Complexity Δ
...aplotypecaller/HaplotypeCallerIntegrationTest.java 88.521% <100%> (+0.259%) 85 <1> (+1) ⬆️
...oadinstitute/hellbender/engine/AssemblyRegion.java 84.733% <100%> (+0.237%) 49 <1> (+1) ⬆️
...titute/hellbender/engine/AssemblyRegionWalker.java 86.869% <100%> (+0.41%) 27 <0> (+1) ⬆️
...ender/engine/spark/datasources/ReadsSparkSink.java 56.944% <0%> (-2.778%) 17% <0%> (ø)
...te/hellbender/tools/PrintReadsIntegrationTest.java 96.795% <0%> (ø) 26% <0%> (ø) ⬇️
...nder/tools/spark/pipelines/ReadsPipelineSpark.java 90.741% <0%> (ø) 13% <0%> (ø) ⬇️
...hellbender/tools/spark/pipelines/SortSamSpark.java 100% <0%> (ø) 5% <0%> (-1%) ⬇️
...ollections/IntervalsSkipListOneContigUnitTest.java
...r/utils/collections/IntervalsSkipListUnitTest.java
...broadinstitute/hellbender/engine/ContextShard.java
... and 15 more

@droazen
Copy link
Contributor

droazen commented Feb 7, 2019

@asmirnov239 Thanks for implementing this. After our conversation, I realized that the GATK3 version of the --force-active argument works a bit differently: it sets all of the individual per-locus probabilities of being active to 1.0, rather than preserving the true probabilities and just setting every region's isActive flag to true, as you're doing here.

Ie., GATK3 does this:

    private void addIsActiveResult(final ActiveRegionWalker<M, T> walker,
                                   final RefMetaDataTracker tracker, final ReferenceContext refContext,
                                   final AlignmentContext locus) {
        // must be called, even if we won't use the result, to satisfy walker contract
        final ActivityProfileState state = walker.isActive( tracker, refContext, locus );
        if ( walker.forceActive) state.isActiveProb = 1.0;
        if ( ! walkerHasPresetRegions ) {
            activityProfile.add(state);
        }
    }

I'm not convinced that the GATK3 behavior is actually better. It almost seems preferable to me to keep the true "is active" probabilities intact (so that you get realistic region boundaries), and just tell the HaplotypeCallerEngine to treat all of the resulting regions as active, as you're doing in this branch, instead of overwriting all of the isActive probabilities with 1.0 as GATK3 did. But maybe there is a good argument in favor of the GATK 3 behavior.

Let's see what other people think -- @ldgauthier @davidbenjamin could you please weigh in as to which of the two --force-active implementations described above you'd prefer? Thanks!

@davidbenjamin
Copy link
Contributor

@asmirnov239's implementation makes a lot more sense to me than the GATK 3 one.

@ldgauthier
Copy link
Contributor

I don't have a good argument for overwriting the probabilities. I like the new approach.

Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to you with my comments @asmirnov239. Seems like there's consensus that the approach in this branch is superior to the GATK3 implementation of this arg.

@@ -166,6 +166,10 @@ public boolean isActive() {
return isActive;
}

void setIsActive(final boolean value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add javadoc for this method, and explain why it's package-private (eg., "Changing the isActive state after construction is a debug-level operation that only engine classes like AssemblyRegionWalker should be able to do")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -74,6 +75,10 @@
@Argument(fullName = PROPAGATION_LONG_NAME, doc="Upper limit on how many bases away probability mass can be moved around when calculating the boundaries between active and inactive assembly regions", optional = true)
protected int maxProbPropagationDistance = defaultMaxProbPropagationDistance();

@Advanced
@Argument(fullName = FORCE_ACTIVE_REGIONS_LONG_NAME, doc = "If provided, all bases will be tagged as active", optional = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"all bases will be tagged as active" -> "all regions will be marked as active"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"-L", "20:1-5000",
"-O", out.getAbsolutePath(),
"-pairHMM", "AVX_LOGLESS_CACHING",
"--" + AssemblyRegionWalker.FORCE_ACTIVE_REGIONS_LONG_NAME, "true",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm that this test case fails if you omit the --force-active option? Ie., try running it without the --force-active option in the command line and see what happens.

Copy link
Collaborator Author

@asmirnov239 asmirnov239 Feb 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fails. It outputs bunch of "-1"-s in "assembly-region-out" output instead of "1"s

@droazen droazen assigned asmirnov239 and unassigned droazen Feb 8, 2019
@asmirnov239
Copy link
Collaborator Author

Thanks for the review @droazen! Done addressing the comments, back to you

@asmirnov239 asmirnov239 assigned droazen and unassigned asmirnov239 Feb 13, 2019
Copy link
Contributor

@droazen droazen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 looks good -- merging

@droazen droazen merged commit 6e28a92 into master Feb 14, 2019
@droazen droazen deleted the as_port_force_active branch February 14, 2019 20:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants