Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IntervalWalker, VariantWalker enhancements, and GenomeLoc -> SimpleInterval migration in the engine #297

Merged
merged 1 commit into from
Mar 14, 2015

Conversation

droazen
Copy link
Contributor

@droazen droazen commented Mar 12, 2015

-Created a new class of tool, IntervalWalker, that processes a single interval at a time,
with the ability to query optional overlapping sources of reads, reference data, and/or
features/variants. Current implementation is simple/naive with no special caching;
performance issues will be addressed once we port this traversal type to dataflow.

-Added the ability for VariantWalkers to access contextual reads/reference/feature data.

-To enable the above changes, migrated most of the engine to use SimpleIntervals rather
than GenomeLocs. This allows for the creation of Context objects in traversals where there
is not necessarily a sequence dictionary available (eg., VariantWalker).

-Moved shared arguments/code from Walker classes up into GATKTool. Still some issues
related to marking engine-wide arguments as optional/required on a per-traversal or
per-tool basis, but tickets have been created for these.

-Since there isn't yet an htsjdk release that contains SimpleInterval, temporarily
checked a copy of it into our repo, which we can remove the next time we
rev htsjdk.

TODOs:

-We currently still require a sequence dictionary to actually parse intervals in
IntervalArgumentCollection. This is due entirely to our support of intervals without
specific stop positions (eg., "chr1" and "chr1:1+") -- for these intervals we must
look up the stop position in a sequence dictionary. This means that IntervalWalkers
currently require at least one input that contains a sequence dictionary (although
VariantWalkers do not). We should look into ways of relaxing this restriction.

Resolves #109

@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+0.1%) to 75.04% when pulling f1a0bc8293137afb737bc82e0aaa1e30a1cd9d3b on dr_interval_walker into ae98b8c on master.

@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+0.08%) to 75.03% when pulling 61cb9e15970383ed11b35449f7a8dc055838b2e2 on dr_interval_walker into ae98b8c on master.

@droazen
Copy link
Contributor Author

droazen commented Mar 13, 2015

Comments addressed, ready for a second round of review. Two new commits: one for the trivial changes, and another to satisfy the request to push more functionality up into GATKTool. Currently there is still a bit of awkwardness (engine-wide input arguments are nominally marked as optional, and traversal types and/or individual tools override methods like requiresReads() to indicate input requirements), but I feel it's at least a step in the right direction.

@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+0.11%) to 75.03% when pulling fa873e6c3d88d0f8eadace5bd323ac57b3829eb3 on dr_interval_walker into db029de on master.

@droazen droazen force-pushed the dr_interval_walker branch 2 times, most recently from f6ca61f to 6cb4dc2 Compare March 14, 2015 14:58
@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+0.11%) to 75.03% when pulling 6cb4dc27cda2bd4843ac4feb4025ee9dfa8eca15 on dr_interval_walker into db029de on master.

@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+0.11%) to 75.03% when pulling 6cb4dc27cda2bd4843ac4feb4025ee9dfa8eca15 on dr_interval_walker into db029de on master.

@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+0.11%) to 75.03% when pulling 1508c9a634fd458e82d34eb37da75c3bcd83d1df on dr_interval_walker into db029de on master.

…terval migration in the engine

-Created a new class of tool, IntervalWalker, that processes a single interval at a time,
 with the ability to query optional overlapping sources of reads, reference data, and/or
 features/variants. Current implementation is simple/naive with no special caching;
 performance issues will be addressed once we port this traversal type to dataflow.

-Added the ability for VariantWalkers to access contextual reads/reference/feature data.

-To enable the above changes, migrated most of the engine to use SimpleIntervals rather
 than GenomeLocs. This allows for the creation of Context objects in traversals where there
 is not necessarily a sequence dictionary available (eg., VariantWalker).

-Moved shared arguments/code from Walker classes up into GATKTool. Still some issues
 related to marking engine-wide arguments as optional/required on a per-traversal or
 per-tool basis, but tickets have been created for these.

-Since there isn't yet an htsjdk release that contains SimpleInterval, temporarily
 checked a copy of it into our repo, which we can remove the next time we
 rev htsjdk.

TODOs:

-We currently still require a sequence dictionary to actually parse intervals in
 IntervalArgumentCollection. This is due entirely to our support of intervals without
 specific stop positions (eg., "chr1" and "chr1:1+") -- for these intervals we must
 look up the stop position in a sequence dictionary. This means that IntervalWalkers
 currently require at least one input that contains a sequence dictionary (although
 VariantWalkers do not). We should look into ways of relaxing this restriction.

Resolves #109
@coveralls
Copy link
Collaborator

Coverage Status

Coverage increased (+0.11%) to 75.03% when pulling d814052 on dr_interval_walker into db029de on master.

@droazen
Copy link
Contributor Author

droazen commented Mar 14, 2015

All comments addressed -- merging!

droazen added a commit that referenced this pull request Mar 14, 2015
IntervalWalker, VariantWalker enhancements, and GenomeLoc -> SimpleInterval migration in the engine
@droazen droazen merged commit d2f2590 into master Mar 14, 2015
@droazen droazen deleted the dr_interval_walker branch March 14, 2015 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement an IntervalWalker
2 participants