Change UpdateVCFSequenceDictionary to use the specified dictionary uniformly. #5093

Merged
cmnbroad merged 3 commits into master from cn_update_seqdict
Feb 11, 2019

Conversation

cmnbroad
Collaborator

@cmnbroad cmnbroad commented Aug 8, 2018

Fixes #5087

@codecov-io

codecov-io commented Aug 8, 2018

Codecov Report

Merging #5093 into master will decrease coverage by 0.001%.
The diff coverage is 90.476%.

@@               Coverage Diff               @@
##              master     #5093       +/-   ##
===============================================
- Coverage     87.037%   87.036%   -0.001%     
- Complexity     31728     31735        +7     
===============================================
  Files           1943      1943               
  Lines         146193    146213       +20     
  Branches       16141     16145        +4     
===============================================
+ Hits          127242    127258       +16     
- Misses         13064     13068        +4     
  Partials        5887      5887
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...institute/hellbender/engine/VariantWalkerBase.java | 100% <ø> (ø) | 14 <0> (ø) ⬇️ |
| ...kers/variantutils/UpdateVCFSequenceDictionary.java | 88.462% <85.714%> (+1.505%) | 16 <5> (+2) ⬆️ |
| ...ls/UpdateVCFSequenceDictionaryIntegrationTest.java | 92.727% <92.857%> (+0.044%) | 16 <6> (+7) ⬆️ |
| ...utils/smithwaterman/SmithWatermanIntelAligner.java | 50% <0%> (-30%) | 1% <0%> (-2%) |
| ...ithwaterman/SmithWatermanIntelAlignerUnitTest.java | 60% <0%> (ø) | 2% <0%> (ø) ⬇️ |

@droazen droazen self-requested a review August 24, 2018 18:36
@droazen droazen self-assigned this Aug 24, 2018
@droazen droazen requested review from jonn-smith and removed request for droazen August 24, 2018 19:00
@droazen droazen assigned jonn-smith and unassigned droazen Aug 24, 2018
@droazen
Contributor

droazen commented Aug 24, 2018

@jonn-smith Can you do a review on this one when you get a chance?

Collaborator

@jonn-smith jonn-smith left a comment

A couple of minor questions / comments.

final SAMSequenceDictionary masterDictionary = getMasterSequenceDictionary();
if (dictionarySource == null) {
    if (masterDictionary != null) {
        // We'll accept the master dictionary if one was specified. Using the master dictionary
Collaborator

Can you throw in a logging statement that the dictionary is being overridden? I know it'll be captured in the arguments used to call a tool, but it seems like another explicit warning would be good (particularly in cases where the specified dictionary is different from the embedded dictionary in the file / etc).

Collaborator Author

Yeah, done.
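
The fallback and the requested warning can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the class `DictionaryChoice`, the method `chooseSource`, the use of plain strings for dictionary sources, and the choice to throw when neither source is given are all assumptions made for the example.

```java
import java.util.logging.Logger;

// Hypothetical sketch; not the actual UpdateVCFSequenceDictionary implementation.
final class DictionaryChoice {
    private static final Logger LOG = Logger.getLogger(DictionaryChoice.class.getName());

    /**
     * Returns the explicit dictionary source if one was given, otherwise falls back
     * to the master dictionary and logs a warning that the input's dictionary is
     * being overridden. Throws if neither was supplied (an assumption of this sketch).
     */
    static String chooseSource(final String dictionarySource, final String masterDictionary) {
        if (dictionarySource != null) {
            return dictionarySource;
        }
        if (masterDictionary != null) {
            // The explicit warning requested in the review comment above.
            LOG.warning("No dictionary source given; the input's sequence dictionary will be "
                    + "replaced with the master dictionary: " + masterDictionary);
            return masterDictionary;
        }
        throw new IllegalArgumentException("A dictionary source or a master dictionary is required");
    }
}
```

Logging at warning level keeps the override visible in the run output even when the command-line arguments are not inspected.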

if (dictionarySource == null) {
    if (masterDictionary != null) {
        // We'll accept the master dictionary if one was specified. Using the master dictionary
        // arg will result in sequence dictionary validation.
Collaborator

In the case where invalid dictionaries for the data are used (such as in #5087), can we detect it and throw a warning?

If the user specifies a bad sequence dictionary we should let them know that something bad could happen.

(This is essentially the same request as the above comment.)

Collaborator Author

The tool does have code to validate the new SD (in the first few lines of the apply method), and there are also tests for that. The problem in #5087 was that the indexer was getting the old (invalid) SD via the standard GATK machinery that uses the SD from the input. The change in this PR was to unify those so the new one is used everywhere.
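
As a rough illustration of what validating a variant against the new SD could look like, here is a minimal sketch. It assumes validation means checking that the variant's contig exists in the new dictionary and that its end position fits within that contig; the class `ContigCheck`, the method `problemWith`, and the map-based dictionary are hypothetical stand-ins, not GATK or htsjdk types.

```java
import java.util.Map;

// Hypothetical sketch; the real tool works on htsjdk dictionary and variant types.
final class ContigCheck {
    /**
     * Checks one variant against a dictionary given as contig name -> contig length.
     * Returns null if the variant fits, otherwise a description of the problem.
     */
    static String problemWith(final String contig, final int end, final Map<String, Integer> dictionary) {
        final Integer length = dictionary.get(contig);
        if (length == null) {
            return "contig " + contig + " is not in the new dictionary";
        }
        if (end > length) {
            return "position " + end + " is past the end of " + contig + " (length " + length + ")";
        }
        return null;
    }
}
```

The point made in the comment above is that every consumer, including the indexer, should be handed the same dictionary that passed this kind of check, rather than one picked up separately from the input.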

final SAMSequenceDictionary expectedSequenceDictionary = SAMSequenceDictionaryExtractor.extractDictionary(
        Paths.get(new File(testDir, "exampleFASTA.dict").getAbsolutePath()));

// verify only the sequence names and lengths, since other attributes such as MD/UR will have been updated
Collaborator

@cmnbroad Is it worth verifying that the other attributes are different?

Collaborator Author

I don't think it is. In real-life usage, whether or not these attributes change depends on the state of the original inputs. Since this test uses a starting VCF that has no sequence dictionary at all, the MD5 and UR attributes get added on, but we just need to validate that the correct SD got propagated.
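
A comparison that looks only at sequence names and lengths, and ignores attributes such as UR, might be sketched like this. The `DictionaryCompare` and `Sequence` types are hypothetical and stand in for the real dictionary classes used by the test.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the assertion code used in the integration test.
final class DictionaryCompare {
    /** One sequence entry: a name, a length, and any other attributes (for example UR). */
    static final class Sequence {
        final String name;
        final int length;
        final Map<String, String> attributes;

        Sequence(final String name, final int length, final Map<String, String> attributes) {
            this.name = name;
            this.length = length;
            this.attributes = attributes;
        }
    }

    /** True if both lists have the same names and lengths in the same order; attributes are ignored. */
    static boolean sameNamesAndLengths(final List<Sequence> expected, final List<Sequence> actual) {
        if (expected.size() != actual.size()) {
            return false;
        }
        for (int i = 0; i < expected.size(); i++) {
            final Sequence e = expected.get(i);
            final Sequence a = actual.get(i);
            if (!e.name.equals(a.name) || e.length != a.length) {
                return false;
            }
        }
        return true;
    }
}
```

Because the attributes map never enters the comparison, the check passes whether or not extra attributes were added to the output.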

@cmnbroad
Collaborator Author

cmnbroad commented Aug 31, 2018

Answers inline, and I made a minor update based on review comments. Back to @jonn-smith. Also, this branch has conflicts now; I'll rebase and run the tests again once we're ready to merge.

@droazen droazen assigned jamesemery and unassigned jonn-smith Feb 8, 2019
Collaborator

@jamesemery jamesemery left a comment

👍 This looks good. In the interest of merging things today, I'm going to rebase this and try to merge it for you.

@cmnbroad cmnbroad dismissed jonn-smith’s stale review February 11, 2019 14:58

Comments all addressed and approved.

@cmnbroad cmnbroad merged commit e8069b7 into master Feb 11, 2019
@cmnbroad
Collaborator Author

Thanks @jonn-smith and @jamesemery.

@cmnbroad cmnbroad deleted the cn_update_seqdict branch February 11, 2019 14:58