VCF row validation error on gCNV results #8834
Comments
Hi @MattWellie
@gokalpcelik you might be my hero ❤️
If the issue is solved I am closing it. If you still observe problems you may reopen it and we can take a look.
Hi @gokalpcelik
@MattWellie What I suspect is happening is that the ploidy call from gCNV does not match your ped file. Does this sample have a sex aneuploidy, or has gCNV assigned it an incorrect ploidy? If so, try setting sex to unknown in your ped file (i.e. changing the 1 or 2 to a 0). We should probably improve the default behavior here, however.
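A minimal sketch of that ped edit, assuming a standard six-column, whitespace-delimited PED file in which column 2 is the individual ID and column 5 is sex (1 = male, 2 = female, 0 = unknown). The function name `set_sex_unknown` and the sample IDs are hypothetical and not part of GATK:

```python
def set_sex_unknown(ped_lines, sample_ids):
    """Return PED lines with sex (column 5) set to 0 for the given samples."""
    targets = set(sample_ids)
    out = []
    for line in ped_lines:
        stripped = line.rstrip("\n")
        fields = stripped.split()
        # Leave comments, blank lines, and short rows untouched.
        if stripped.startswith("#") or len(fields) < 6:
            out.append(stripped)
            continue
        if fields[1] in targets:  # column 2 is the individual ID
            fields[4] = "0"       # column 5 is sex; 0 means unknown
        out.append("\t".join(fields))
    return out


# Hypothetical two-sample PED, for illustration only.
ped = [
    "FAM1\tSAMPLE_A\t0\t0\t1\t-9",
    "FAM1\tSAMPLE_B\t0\t0\t2\t-9",
]
updated = set_sex_unknown(ped, ["SAMPLE_B"])
```

Only the listed samples are changed, so the stated sex of every other sample is still available to the tool.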
Thanks @mwalker174, we did a little digging and may have found a sample swap in the sample set being processed, but the swap is not the sample identified in the error log. I've got a couple more runs going now, hopefully I'll have more information soon.
Hi again, some updates! I've updated our PED with inferred rather than stated sex, which has resolved a number of these issues. As stated before, the samples which had incorrectly assigned sexes were not the named samples appearing in the error logs, which made troubleshooting a challenge. We're now running into issues with true sex aneuploidy, which the tool doesn't work for, as stated in your docs page. Do you have a way to cleanly handle these sex aneuploidies so that we can retain calls for the rest of the genome, rather than excluding the sample(s) entirely? This might be a process question, so I'm going to raise it separately on the GATK-SV repo. FYI @cassimons
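Because the samples named in the error log were not the ones with the wrong sex, it can help to list every disagreement between stated and inferred sex up front. A rough sketch, assuming the same six-column PED layout (column 2 = individual ID, column 5 = sex) and a hypothetical `inferred_sex` mapping from sample ID to PED sex code; how sex is inferred is outside the scope of this sketch:

```python
def find_sex_mismatches(ped_lines, inferred_sex):
    """Return (sample_id, stated, inferred) for rows that disagree with inference."""
    mismatches = []
    for line in ped_lines:
        fields = line.split()
        # Skip comments, blank lines, and short rows.
        if line.startswith("#") or len(fields) < 6:
            continue
        sample_id, stated = fields[1], fields[4]
        inferred = inferred_sex.get(sample_id)
        # Samples with no inferred value are not reported.
        if inferred is not None and stated != inferred:
            mismatches.append((sample_id, stated, inferred))
    return mismatches


# Hypothetical data, for illustration only.
ped = [
    "FAM1\tSAMPLE_A\t0\t0\t1\t-9",
    "FAM1\tSAMPLE_B\t0\t0\t2\t-9",
]
inferred = {"SAMPLE_A": "1", "SAMPLE_B": "1"}
mismatches = find_sex_mismatches(ped, inferred)
```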
We'd also be interested in finding out how the tool supports arbitrary autosomal ploidies. If you have any information on that, we'd be keen to have a look!
I've pulled the problem VCF and a couple of successful ones locally and I can confirm that when running with 4 VCFs:
Command used in my toy dataset:
gatk/src/main/java/org/broadinstitute/hellbender/tools/walkers/sv/JointGermlineCNVSegmentation.java
Line 701 in c6daf7d
I'm running into a recurrent issue in JointGermlineCNVSegmentation, running after PostprocessGermlineCNVCalls in a gCNV pipeline. A number of batches are being merged in parallel: some of those succeed, some fail. It's not clear just yet if this is a deterministic failure; I'll re-run a few times and see if I can answer that.
The VCF row in question is
The characterisation of this row as
type=NO_VARIATION alleles=[N*]
seems... partially correct? There is no variation at this locus, but I'm not sure why alleles is N*.
In this situation, as I read it, the first clause should be satisfied: 1 allele, and allele is no-call. Instead the variant process is dying in the else side of the condition. Could you clarify if I'm interpreting this correctly?
Relevant versioning: