Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numerical consistency between python and java, transpose fix #4652

Merged
merged 1 commit into from
Apr 24, 2018

Conversation

lucidtronix
Copy link
Contributor

This PR brings pure python inference and java-python streaming inference into much better agreement. also a transpose bugfix.

@cmnbroad cmnbroad self-assigned this Apr 13, 2018
}
while (readIt.hasNext()) {
sb.append(GATKReadToString(readIt.next()));
GATKRead r = readIt.next();
if (r.getReadGroup().toLowerCase().contains("artificial")){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lucidtronix Whats the reason for this change ? It seems to be casting a pretty wide net. HC-generated artificial read groups ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah exactly, we've always excluded them from the read tensor. But I agree, I will add an argument to toggle this.

@codecov-io
Copy link

codecov-io commented Apr 13, 2018

Codecov Report

Merging #4652 into master will increase coverage by 0.121%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##              master     #4652       +/-   ##
===============================================
+ Coverage     79.852%   79.972%   +0.121%     
- Complexity     17336     17710      +374     
===============================================
  Files           1074      1076        +2     
  Lines          62938     63946     +1008     
  Branches       10186     10496      +310     
===============================================
+ Hits           50257     51139      +882     
- Misses          8701      8781       +80     
- Partials        3980      4026       +46
Impacted Files Coverage Δ Complexity Δ
...hellbender/utils/haplotype/HaplotypeBAMWriter.java 98.077% <ø> (ø) 10 <0> (ø) ⬇️
...r/engine/filters/ReadGroupBlackListReadFilter.java 83.636% <100%> (+0.303%) 17 <0> (ø) ⬇️
...ellbender/tools/walkers/vqsr/CNNScoreVariants.java 74.886% <100%> (+0.829%) 41 <1> (+1) ⬆️
...utils/smithwaterman/SmithWatermanIntelAligner.java 50% <0%> (-30%) 1% <0%> (-2%)
...decs/xsvLocatableTable/XsvLocatableTableCodec.java 78.947% <0%> (-3.769%) 69% <0%> (+9%)
...itute/hellbender/tools/walkers/bqsr/ApplyBQSR.java 90.909% <0%> (-0.758%) 11% <0%> (+5%)
...lkers/ReferenceConfidenceVariantContextMerger.java 94.979% <0%> (-0.278%) 69% <0%> (-2%)
...lugin/DefaultGATKReadFilterArgumentCollection.java 100% <0%> (ø) 8% <0%> (+4%) ⬆️
...llbender/utils/test/SimpleIntervalTestFactory.java 0% <0%> (ø) 0% <0%> (?)
.../annotatedregion/SimpleAnnotatedGenomicRegion.java 82.222% <0%> (ø) 14% <0%> (?)
... and 11 more

}
while (readIt.hasNext()) {
sb.append(GATKReadToString(readIt.next()));
GATKRead r = readIt.next();
if (!keepArtificialReadGroup && r.getReadGroup().toLowerCase().contains("artificial")){
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I was thinking it was overly broad because it matches any group name containing the string "artificial", rather than because of lack of command line control. Instead of doing this manually, I would add an override of getDefaultReadFilters that adds a ReadGroupBlackListReadFilter initialized with HaplotypeBamWriter.DEFAULT_HAPLOTYPE_READ_GROUP_ID (which is the actual string used by HC - will have to be made public), prefixed with the string "ID:". That will provide an exact match to the synthetic HC read group, the engine will filter the reads out automatically so the tool never sees them, and it can be disabled using the existing builtin argument --disable-read-filter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I also had to make the read group black list argument optional and for gatk3 compatibility also filter "ArtificialHaplotype" read groups.

Copy link
Collaborator

@cmnbroad cmnbroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor changes requested.

public ReadGroupBlackListReadFilter(final List<String> blackLists, final SAMFileHeader header) {
super.setHeader(header);
this.blackList = blackLists;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would copy the contents (this.blackList.addAll(blackLists)) rather than just the reference, in case the input changes or is immutable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -208,6 +207,16 @@ protected VariantFilter makeVariantFilter(){
}
}

@Override
public List<ReadFilter> getDefaultReadFilters() {
List<ReadFilter> readFilters = new ArrayList<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list should be initialized with the values returned from super.getDefaultReadFilters to make sure you get the engine defaults like WellFormedReadFilter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

List<ReadFilter> readFilters = new ArrayList<>();
List<String> filterList = new ArrayList<>();
filterList.add("ID:" + HaplotypeBAMWriter.DEFAULT_HAPLOTYPE_READ_GROUP_ID);
filterList.add("ID:ArtificialHaplotype"); // GATK3 Compatibility
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a DEFAULT_GATK3_HAPLOTYPE_READ_GROUP_ID constant to HaplotypeBAMWriter with a comment saying its for use with backward compatible read group read filters, and reference that here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@cmnbroad
Copy link
Collaborator

@lucidtronix This looks good now. Can you squash it down to one commit and rebase and then we can merge once the tests pass.

@lucidtronix lucidtronix force-pushed the sf_cnn_fixes branch 2 times, most recently from 10c0d87 to 9945a5c Compare April 24, 2018 12:21
@lucidtronix lucidtronix merged commit 7d25ceb into master Apr 24, 2018
@lucidtronix lucidtronix deleted the sf_cnn_fixes branch April 24, 2018 13:27
cwhelan pushed a commit to cwhelan/gatk-linked-reads that referenced this pull request May 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants