Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changed FilterIntervals to operate on the intersection of intervals in all inputs. #5408

Merged
merged 2 commits into from
Dec 7, 2018

Conversation

samuelklee
Copy link
Contributor

Closes #5388.

@codecov-io
Copy link

codecov-io commented Nov 14, 2018

Codecov Report

Merging #5408 into master will decrease coverage by 0.001%.
The diff coverage is 97.872%.

@@              Coverage Diff               @@
##             master     #5408       +/-   ##
==============================================
- Coverage     87.07%   87.069%   -0.001%     
- Complexity    31244     31249        +5     
==============================================
  Files          1922      1922               
  Lines        144210    144262       +52     
  Branches      15916     15916               
==============================================
+ Hits         125564    125608       +44     
- Misses        12872     12881        +9     
+ Partials       5774      5773        -1
Impacted Files Coverage Δ Complexity Δ
...ols/copynumber/FilterIntervalsIntegrationTest.java 100% <100%> (ø) 27 <4> (+4) ⬆️
...e/hellbender/tools/copynumber/FilterIntervals.java 90.446% <95.918%> (+0.591%) 45 <9> (+1) ⬆️
...r/arguments/CopyNumberArgumentValidationUtils.java 66.667% <0%> (-11.111%) 19% <0%> (-1%)
...e/hellbender/engine/spark/SparkContextFactory.java 73.973% <0%> (+2.74%) 11% <0%> (ø) ⬇️

Copy link
Collaborator

@asmirnov239 asmirnov239 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @slee, thanks for making this fix

final File firstReadCountFile = inputReadCountFiles.get(0);
final SimpleCountCollection firstReadCounts = SimpleCountCollection.read(firstReadCountFile);
metadata = firstReadCounts.getMetadata();
resolved = intervalArgumentCollection.getIntervals(metadata.getSequenceDictionary());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check that the sequence dictionary is the same in counts file and intervals file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intervals file may not have a sequence dictionary (e.g., we can just pass intervals as strings from the command line). This code takes the sequence dictionary from the first read count file, which is used to check other inputs. Probably we should check for the presence of a master dictionary, which may come from the intervals---let me try that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, actually it looks like checking interval files for a sequence dictionary would require a bit of extra code. (Even getBestAvailableSequenceDictionary in GATKTool has a TODO for this.) Will punt for the time being.

countFiles, lowCountFilterCountThreshold, percentageOfSamples, 1., 99., percentageOfSamples,
IntStream.range(numIntervalsBelowCountThreshold + 1, numIntervalsBelowCountThreshold + 99).boxed().collect(Collectors.toList())},
{intervalsFile, annotatedIntervalsFile, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9,
{intervalsFile, Collections.emptyList(), annotatedIntervalsFile, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a test case where the annotatedIntervalsFile and countFiles is missing a few entries, up to you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, added a few simple tests where the -L intervals have an extra interval that should be dropped. Didn't add tests to check for the intersection of annotated intervals, count intervals, and -L intervals, though...

@samuelklee
Copy link
Contributor Author

Thanks @asmirnov239, will merge once tests pass unless you have further objections!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants