Ported from Classic-GATK: CombineGVCFs --convertToBasePairResolution issue #265

Closed
vdauwera opened this issue Mar 7, 2015 · 8 comments

Comments

@vdauwera
Contributor

vdauwera commented Mar 7, 2015

Original ticket by @jmthibault79

CombineGVCFs with --convertToBasePairResolution doesn't fully cover the intervals given

I'm trying to create very small files (for NAVS testing) from GVCFs.

An input file has these reference blocks. Let's call them Blocks 1,2,3:

1       10189399        .       A       <NON_REF>       .       .       END=10189558    GT:DP:GQ:MIN_DP:PL      0/0:94:99:46:0,96,1440
1       10190507        .       A       <NON_REF>       .       .       END=10190923    GT:DP:GQ:MIN_DP:PL      0/0:214:99:45:0,105,1575
1       10192376        .       C       <NON_REF>       .       .       END=10192406    GT:DP:GQ:MIN_DP:PL      0/0:14:42:8:0,21,297

This command line with an interval entirely contained in Block 2 produces nothing:

java -jar <GATK> \
-T CombineGVCFs \
-L 1:10,190,850-10,190,889 \
--convertToBasePairResolution \
-V <infile> -o <outfile>

Expanding the interval to overlap portions of Block 1 and Block 3 produces results for Block 2 and the portion of Block 3 which corresponds to my intervals.

It appears that only reference blocks which begin in the supplied intervals are processed.
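
The observation above can be reproduced with a small sketch. This is not GATK code: the function names are hypothetical, and the "start-only" rule is only the behaviour suspected in this ticket. The blocks and the narrow interval are taken from the report; the wide interval is an invented example that overlaps parts of Blocks 1 and 3.

```python
# Reference blocks from the ticket as (POS, END) pairs, 1-based inclusive.
BLOCKS = [
    (10189399, 10189558),  # Block 1
    (10190507, 10190923),  # Block 2
    (10192376, 10192406),  # Block 3
]


def starts_within(block, interval):
    """Suspected behaviour: keep a block only if its POS lies in the interval."""
    pos, _end = block
    lo, hi = interval
    return lo <= pos <= hi


def overlaps(block, interval):
    """END-aware behaviour: keep a block if any of its bases lie in the interval."""
    pos, end = block
    lo, hi = interval
    return pos <= hi and end >= lo


def select(blocks, interval, predicate):
    """Return the blocks accepted by the given predicate."""
    return [b for b in blocks if predicate(b, interval)]


narrow = (10190850, 10190889)  # interval from the ticket, inside Block 2
wide = (10189500, 10192390)    # hypothetical interval touching Blocks 1 and 3

print(select(BLOCKS, narrow, starts_within))  # []
print(select(BLOCKS, narrow, overlaps))       # [(10190507, 10190923)]
print(select(BLOCKS, wide, starts_within))    # Blocks 2 and 3 only
print(select(BLOCKS, wide, overlaps))         # all three blocks
```

Under the start-only rule the narrow interval selects nothing and the wide interval drops Block 1, which matches what was reported.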


@jmthibault79:
This may be a more general problem with processing GVCFs, and it may also relate to the CombineGVCFs bug @valentinruanorubio is working on.

@vruano:
I suspect this is rather to do with the VCF RodBinding processing code not using the END INFO field to determine whether a record overlaps a position. I guess it relies on the POS value and the length of the REF string instead. That should be fixed in the (VCF) ROD traversal code. Perhaps we could have GVCF-specific code if that helps.
Also, if it were possible to explicitly retrieve, programmatically, the previous record that does not overlap the position, that would be enough to address this issue. However, the other solution above would be cleaner.

@eitanbanks:
To fix this problem the getEnd() method of VariantContext would need to check for the presence of the "END" annotation in the INFO field. However, I'm not sure the INFO field is always decoded at this point (and doing so might be expensive). Could be a complicated fix.
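
A minimal sketch of that idea, in Python rather than in the real `VariantContext` class: `parse_info` and `record_end` are hypothetical helpers, and the fallback of POS plus REF length minus one is the rule guessed at earlier in this thread. Note that this version decodes the INFO column eagerly for every record, which is exactly the cost concern raised above.

```python
def parse_info(info):
    """Decode a VCF INFO column into a dict; flag entries map to True."""
    if info == ".":
        return {}
    decoded = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        decoded[key] = value if sep else True
    return decoded


def record_end(line):
    """Return the last reference position covered by one VCF data line.

    Uses INFO/END when present, otherwise POS + len(REF) - 1.
    Columns are split on any whitespace, which is an assumption that
    suits the records as pasted in this ticket.
    """
    fields = line.split()
    pos, ref, info = int(fields[1]), fields[3], fields[7]
    end = parse_info(info).get("END")
    if isinstance(end, str):
        return int(end)
    return pos + len(ref) - 1


block2 = ("1 10190507 . A <NON_REF> . . END=10190923 "
          "GT:DP:GQ:MIN_DP:PL 0/0:214:99:45:0,105,1575")
print(record_end(block2))                # 10190923
print(record_end("1 100 . ACG A . . ."))  # 102
```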

@vruano:
I think this must be moved to hellbender... IMO RoD walkers on VCF files should consider the END info field.

@jmthibault79
Contributor

This question was raised in #4. Was it resolved in #164?

@akiezun akiezun added the tools label Apr 15, 2015
@akiezun
Contributor

akiezun commented Apr 15, 2015

@vruano is this a bug?

@akiezun
Contributor

akiezun commented May 6, 2015

Assigning to @vruano to clarify whether this is a bug.

@jmthibault79
Contributor

This can be described more simply as:

Ensure that intervals in GVCF traversals are END-tag aware, so that every reference block overlapping an interval is included. GATK3 considers only the start position, so some reference blocks are missed.
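
As a rough illustration of the requested behaviour, the sketch below clips each reference block to the interval and lists the positions a base-pair-resolution output would be expected to cover. It is an assumption about the intended result, not a description of how CombineGVCFs is implemented, and `base_pair_positions` is a hypothetical name.

```python
def base_pair_positions(blocks, interval):
    """Yield every position in `interval` covered by a (POS, END) block.

    Blocks are clipped to the interval using both POS and END, so a block
    that starts before the interval still contributes its overlapping bases.
    """
    lo, hi = interval
    for pos, end in sorted(blocks):
        first, last = max(pos, lo), min(end, hi)
        # range() is empty when the block does not overlap the interval
        yield from range(first, last + 1)


blocks = [
    (10189399, 10189558),  # Block 1
    (10190507, 10190923),  # Block 2
    (10192376, 10192406),  # Block 3
]
positions = list(base_pair_positions(blocks, (10190850, 10190889)))
print(len(positions), positions[0], positions[-1])  # 40 10190850 10190889
```

For the interval in the original report this gives 40 positions, all drawn from Block 2, where the reported run produced none.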

@akiezun
Contributor

akiezun commented May 6, 2015

Thanks @jmthibault79, so it's a bug.
@vdauwera why is this not scheduled for GATK3? Is it too hard to fix there?

@akiezun akiezun added the bug label May 6, 2015
@vdauwera
Contributor Author

vdauwera commented May 6, 2015

@akiezun I don't know that it's necessarily hard; at the time I was migrating issues to GitHub, I was told that these types of issues would be addressed in Hellbender, and that no effort would be put into fixing them in GATK3. I would love to see this fixed in GATK3 of course.

@akiezun
Contributor

akiezun commented Jun 22, 2015

Marking as low priority unless I hear otherwise.

@droazen
Contributor

droazen commented Mar 20, 2017

Closing as obsolete

5 participants