-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IntervalWalker, VariantWalker enhancements, and GenomeLoc -> SimpleInterval migration in the engine #297
Conversation
Coverage increased (+0.1%) to 75.04% when pulling f1a0bc8293137afb737bc82e0aaa1e30a1cd9d3b on dr_interval_walker into ae98b8c on master. |
Coverage increased (+0.08%) to 75.03% when pulling 61cb9e15970383ed11b35449f7a8dc055838b2e2 on dr_interval_walker into ae98b8c on master. |
61cb9e1
to
fa873e6
Compare
Comments addressed, ready for a second round of review. Two new commits: one for the trivial changes, and another to satisfy the request to push more functionality up into |
Coverage increased (+0.11%) to 75.03% when pulling fa873e6c3d88d0f8eadace5bd323ac57b3829eb3 on dr_interval_walker into db029de on master. |
f6ca61f
to
6cb4dc2
Compare
Coverage increased (+0.11%) to 75.03% when pulling 6cb4dc27cda2bd4843ac4feb4025ee9dfa8eca15 on dr_interval_walker into db029de on master. |
Coverage increased (+0.11%) to 75.03% when pulling 6cb4dc27cda2bd4843ac4feb4025ee9dfa8eca15 on dr_interval_walker into db029de on master. |
6cb4dc2
to
1508c9a
Compare
Coverage increased (+0.11%) to 75.03% when pulling 1508c9a634fd458e82d34eb37da75c3bcd83d1df on dr_interval_walker into db029de on master. |
…terval migration in the engine -Created a new class of tool, IntervalWalker, that processes a single interval at a time, with the ability to query optional overlapping sources of reads, reference data, and/or features/variants. Current implementation is simple/naive with no special caching; performance issues will be addressed once we port this traversal type to dataflow. -Added the ability for VariantWalkers to access contextual reads/reference/feature data. -To enable the above changes, migrated most of the engine to use SimpleIntervals rather than GenomeLocs. This allows for the creation of Context objects in traversals where there is not necessarily a sequence dictionary available (eg., VariantWalker). -Moved shared arguments/code from Walker classes up into GATKTool. Still some issues related to marking engine-wide arguments as optional/required on a per-traversal or per-tool basis, but tickets have been created for these. -Since there isn't yet an htsjdk release that contains SimpleInterval, temporarily checked a copy of it into our repo, which we can remove the next time we rev htsjdk. TODOs: -We currently still require a sequence dictionary to actually parse intervals in IntervalArgumentCollection. This is due entirely to our support of intervals without specific stop positions (eg., "chr1" and "chr1:1+") -- for these intervals we must look up the stop position in a sequence dictionary. This means that IntervalWalkers currently require at least one input that contains a sequence dictionary (although VariantWalkers do not). We should look into ways of relaxing this restriction. Resolves #109
1508c9a
to
d814052
Compare
All comments addressed -- merging! |
IntervalWalker, VariantWalker enhancements, and GenomeLoc -> SimpleInterval migration in the engine
-Created a new class of tool, IntervalWalker, that processes a single interval at a time,
with the ability to query optional overlapping sources of reads, reference data, and/or
features/variants. Current implementation is simple/naive with no special caching;
performance issues will be addressed once we port this traversal type to dataflow.
-Added the ability for VariantWalkers to access contextual reads/reference/feature data.
-To enable the above changes, migrated most of the engine to use SimpleIntervals rather
than GenomeLocs. This allows for the creation of Context objects in traversals where there
is not necessarily a sequence dictionary available (eg., VariantWalker).
-Moved shared arguments/code from Walker classes up into GATKTool. Still some issues
related to marking engine-wide arguments as optional/required on a per-traversal or
per-tool basis, but tickets have been created for these.
-Since there isn't yet an htsjdk release that contains SimpleInterval, temporarily
checked a copy of it into our repo, which we can remove the next time we
rev htsjdk.
TODOs:
-We currently still require a sequence dictionary to actually parse intervals in
IntervalArgumentCollection. This is due entirely to our support of intervals without
specific stop positions (eg., "chr1" and "chr1:1+") -- for these intervals we must
look up the stop position in a sequence dictionary. This means that IntervalWalkers
currently require at least one input that contains a sequence dictionary (although
VariantWalkers do not). We should look into ways of relaxing this restriction.
Resolves #109