GermlineCNVCaller -XL gives error "Intervals for read-count file do not contain all specified intervals"
#5388
Comments
@samuelklee, I tested using v4.0.11.0 and see the same error with |
Can you describe the inputs you are passing to Is there any issue when you use |
Just to be clear, the gCNV code expects that the intervals obtained after resolving the inputs of |
Yes, I see the same error when using FilterIntervals in the context of the WDL when supplying the BED file with the |
Thanks for clarifying the |
Hmm, actually, I think there might be more to it. The current FilterIntervals code also assumes that the annotated bins match the bins resolved from -L/-XL (this is a bug). So if you pass the same -L/-XL upstream to AnnotateIntervals, then you should be OK. However, you might still get into trouble if your coverage bins don't match exactly (which might happen if -XL splits some bins). Let me try and clean some of this up in a future PR. (I think the underlying issue is that we are trying to leverage some of the -L/-XL machinery provided by the engine, which should hopefully be familiar to users, but some of the assumptions the engine makes aren't really in line with what we need for CNV. This is also reflected in the awkward need for |
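The bin-splitting problem described above can be illustrated with a small sketch. This is a simplified, hypothetical model, not GATK's interval-resolution code. Intervals are `(contig, start, end)` tuples, assumed to be 1-based and inclusive. The function `subtract_excluded` (a hypothetical name) removes -XL regions from -L bins and keeps the non-overlapping remainders. The function `check_counts_contain` (also hypothetical) mimics an exact-match check that produces the error message from this issue's title.

```python
from typing import List, Tuple

# Hypothetical model: an interval is (contig, start, end), 1-based and inclusive.
Interval = Tuple[str, int, int]


def subtract_excluded(bins: List[Interval], excluded: List[Interval]) -> List[Interval]:
    """Simplified model of -L/-XL resolution: each bin loses the parts that
    overlap an excluded region, and the non-overlapping remainders are kept."""
    resolved: List[Interval] = []
    for contig, start, end in bins:
        pieces = [(start, end)]
        for x_contig, x_start, x_end in excluded:
            if x_contig != contig:
                continue
            next_pieces = []
            for s, e in pieces:
                if x_end < s or x_start > e:
                    next_pieces.append((s, e))  # no overlap: keep the piece whole
                    continue
                if s < x_start:
                    next_pieces.append((s, x_start - 1))  # left remainder
                if e > x_end:
                    next_pieces.append((x_end + 1, e))  # right remainder
            pieces = next_pieces
        resolved.extend((contig, s, e) for s, e in pieces)
    return resolved


def check_counts_contain(resolved: List[Interval], count_bins: List[Interval]) -> None:
    """Require every resolved interval to match a count-file bin exactly."""
    available = set(count_bins)
    missing = [iv for iv in resolved if iv not in available]
    if missing:
        raise ValueError(
            f"Intervals for read-count file do not contain all specified intervals: {missing}"
        )


if __name__ == "__main__":
    count_bins = [("1", 1, 1000), ("1", 1001, 2000)]  # bins used when collecting counts
    resolved = subtract_excluded(count_bins, [("1", 1500, 1600)])
    print(resolved)  # [('1', 1, 1000), ('1', 1001, 1499), ('1', 1601, 2000)]
    try:
        check_counts_contain(resolved, count_bins)
    except ValueError as err:
        print(err)  # the split pieces are not bins in the count file
```

The remainders `("1", 1001, 1499)` and `("1", 1601, 2000)` do not match any bin in the count file. As a result, the exact-match check fails even though every remaining base was counted.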
Did some more thinking about this issue. Ideally, we'd drop all -L bins that overlap at all with any -XL regions, then check that the remaining bins are a subset of the annotated intervals and/or count files, if available. This seems most natural, in that -L/-XL would specify the desired set of intervals for filtering, and we'd fail if all of these are not available in the other inputs. However, due to the way intervals are resolved by the engine, I don't think it's easy to identify which bins overlap with -XL regions---the engine will instead split bins and retain the parts that don't overlap. So alternatively, if we assume that in typical use the annotated intervals and count files will contain the desired intervals as a subset, we can simply take the intersection of all intervals to drop these partial bins. However, if a user screws up and provides annotated intervals or count files with bins that don't match those specified via -L, then we don't really have a good strategy for failing---probably the only fair check we can do is fail if no bins remain after intersection. If we assume that users will typically be using or following the WDL, I think I'm OK with the second strategy. Any objections or thoughts, @sooheelee? |
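The two strategies weighed above can be sketched side by side. This sketch makes the same assumptions as before: intervals are hypothetical `(contig, start, end)` tuples, 1-based and inclusive. The function names are illustrative only and do not describe how GATK implements either option. `drop_overlapping_bins` follows the first strategy: it drops any -L bin touching an -XL region, then requires the rest to be present elsewhere. `intersect_resolved` follows the second: it takes the engine-resolved intervals, which may include partial bins, and keeps only those that exactly match a bin in every other input. It fails only if nothing remains.

```python
from typing import List, Set, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), assumed 1-based inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same contig."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def drop_overlapping_bins(
    bins: List[Interval], excluded: List[Interval], available: Set[Interval]
) -> List[Interval]:
    """Strategy 1: drop every -L bin that touches any -XL region, then fail
    if any remaining bin is missing from the other inputs."""
    kept = [b for b in bins if not any(overlaps(b, x) for x in excluded)]
    missing = [b for b in kept if b not in available]
    if missing:
        raise ValueError(f"Requested bins missing from other inputs: {missing}")
    return kept


def intersect_resolved(resolved: List[Interval], *available_sets: Set[Interval]) -> List[Interval]:
    """Strategy 2: keep only resolved intervals that exactly match a bin in
    every other input, so partial bins created by -XL fall away; the only
    failure is an empty result."""
    common = [iv for iv in resolved if all(iv in s for s in available_sets)]
    if not common:
        raise ValueError("No bins remain after intersecting all interval inputs")
    return common


if __name__ == "__main__":
    bins = [("1", 1, 1000), ("1", 1001, 2000)]
    excluded = [("1", 1500, 1600)]
    counts, annotated = set(bins), set(bins)
    # Partial bins as produced by splitting the second bin around the -XL region.
    resolved = [("1", 1, 1000), ("1", 1001, 1499), ("1", 1601, 2000)]
    print(drop_overlapping_bins(bins, excluded, counts))  # [('1', 1, 1000)]
    print(intersect_resolved(resolved, counts, annotated))  # [('1', 1, 1000)]
```

In this toy case both strategies keep the same bins. They differ when the count or annotation files contain bins that do not match -L. The first strategy then fails loudly. The second silently drops the mismatched bins and fails only when nothing is left, which is the trade-off discussed in the comment.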
Bug Report
Affected tool(s) or class(es)
GermlineCNVCaller
Affected version(s)
v4.0.4.0 and v4.0.11.0 tested with same result
Description
The command runs fine without the -XL parameter. The contents of -XL are simply:
Expected behavior
It would be great to be able to iterate GermlineCNVCaller on coverage data while excluding various regions, e.g. centromeric regions, to test the impact of such regions on the denoising. Currently, the hypothetical workaround would be to collect coverage while excluding regions, or to manually remove such intervals from the coverage data. Collecting coverage once over all of the data is preferable to collecting coverage again and again over slightly different regions.
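The manual workaround mentioned above could look roughly like the following sketch. It assumes the counts were written as tab-separated text rather than HDF5. It also assumes a particular layout: SAM-style header lines beginning with `@`, then a `CONTIG`/`START`/`END`/`COUNT` column-header line, then one row per bin. Check this against your own files. `filter_counts` is a hypothetical helper, not part of GATK. The same regions would presumably also need to be removed from the -L interval list and any annotated-intervals file so that all inputs still agree bin for bin.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (contig, start, end), assumed 1-based inclusive


def overlaps(contig: str, start: int, end: int, region: Interval) -> bool:
    """True if the bin shares at least one base with the region."""
    r_contig, r_start, r_end = region
    return contig == r_contig and start <= r_end and r_start <= end


def filter_counts(lines: List[str], excluded: List[Interval]) -> List[str]:
    """Drop count rows whose bin overlaps any excluded region.

    Assumes '@'-prefixed header lines, a CONTIG/START/END/COUNT column-header
    line, then tab-separated rows of contig, start, end, count."""
    kept: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("@") or line.startswith("CONTIG"):
            kept.append(line)  # header lines pass through unchanged
            continue
        contig, start, end = line.rstrip("\n").split("\t")[:3]
        if not any(overlaps(contig, int(start), int(end), r) for r in excluded):
            kept.append(line)  # bin does not touch any excluded region
    return kept


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6",
        "CONTIG\tSTART\tEND\tCOUNT",
        "1\t1\t1000\t57",
        "1\t1001\t2000\t61",
    ]
    for row in filter_counts(example, [("1", 1500, 1600)]):
        print(row)  # headers plus the first bin; the second bin overlaps 1500-1600
```

Whole bins are dropped here, never trimmed, so the filtered file keeps bin boundaries that match the original collection.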