-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changed FilterIntervals to operate on the intersection of intervals in all inputs. #5408
Conversation
11ce2cf
to
3a3263f
Compare
Codecov Report
@@ Coverage Diff @@
## master #5408 +/- ##
==============================================
- Coverage 87.07% 87.069% -0.001%
- Complexity 31244 31249 +5
==============================================
Files 1922 1922
Lines 144210 144262 +52
Branches 15916 15916
==============================================
+ Hits 125564 125608 +44
- Misses 12872 12881 +9
+ Partials 5774 5773 -1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @slee, thanks for making this fix
final File firstReadCountFile = inputReadCountFiles.get(0); | ||
final SimpleCountCollection firstReadCounts = SimpleCountCollection.read(firstReadCountFile); | ||
metadata = firstReadCounts.getMetadata(); | ||
resolved = intervalArgumentCollection.getIntervals(metadata.getSequenceDictionary()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe check that the sequence dictionary is the same in counts file and intervals file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intervals file may not have a sequence dictionary (e.g., we can just pass intervals as strings from the command line). This code takes the sequence dictionary from the first read count file, which is used to check other inputs. Probably we should check for the presence of a master dictionary, which may come from the intervals---let me try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, actually it looks like checking interval files for a sequence dictionary would require a bit of extra code. (Even getBestAvailableSequenceDictionary
in GATKTool
has a TODO
for this.) Will punt for the time being.
countFiles, lowCountFilterCountThreshold, percentageOfSamples, 1., 99., percentageOfSamples, | ||
IntStream.range(numIntervalsBelowCountThreshold + 1, numIntervalsBelowCountThreshold + 99).boxed().collect(Collectors.toList())}, | ||
{intervalsFile, annotatedIntervalsFile, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9, | ||
{intervalsFile, Collections.emptyList(), annotatedIntervalsFile, 0.1, 0.9, 0.1, 0.9, 0.1, 0.9, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have a test case where the annotatedIntervalsFile and countFiles is missing a few entries, up to you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, added a few simple tests where the -L
intervals have an extra interval that should be dropped. Didn't add tests to check for the intersection of annotated intervals, count intervals, and -L
intervals, though...
3a3263f
to
bfc33e1
Compare
Thanks @asmirnov239, will merge once tests pass unless you have further objections! |
Closes #5388.