FilterMutectCalls: errorRate must be good probability but got NaN #5821
@igordot I believe this is fixed in master and will be in the 4.1.1.0 release this Thursday. |
I tried 4.1.1.0. Although that error is fixed, now I am getting a new one:
|
@igordot Could you provide your command line? Also, could you check whether the error persists when you use a panel of normals, or nothing at all, instead of the unmatched normal? |
Also, is this hg38? |
This is the command:
And yes, this is hg38. Do you think it's related to that? |
A few recent bugs, which are all entirely my fault, came about because the liftover of gnomAD to hg38 (there is no official hg38 gnomAD yet) exposed some new edge cases, such as
Here's what I will do:
1) Correct our hg38 gnomAD to fix liftover artifacts and put this new resource in the GATK bucket.
2) Create a Firecloud workspace with a few hg38 samples in order to reproduce the error and to make sure future changes don't create new problems.
3) Try to fix the error, because even if 1) works it's sloppy to rely on the fact that gnomAD won't have these edge cases.
I hope 1) succeeds because it will be available immediately without waiting for the next release. |
Thank you for looking into it. I am curious if that is really the problem. If the reference files were causing problems, shouldn't that impact all samples? I am seeing this error with some, but not most of the samples. Even using a different matched control sample with the same tumor sample will cause or fix the error. |
What we saw recently wasn't the reference itself, but rather our AF-only gnomAD resource lifted-over to the hg38 reference. The error only came up for sites that reached genotyping, which depends on the specific tumor sample as well as the lack of evidence in the normal. That's why it only appeared in some tumor-normal combinations. That being said, it might be something else. If you are able to share the unfiltered vcf file and the vcf.stats file it would be the most direct way to debug. |
@igordot I have not yet succeeded in reproducing the error with the few hg38 samples I have tested (2) and nothing obvious showed up in various If you can share your unfiltered vcf input it would be very helpful, but if that's not possible could you post the contents of your |
By the way, I believe the same issue has come up recently on the GATK forum: https://gatkforums.broadinstitute.org/gatk/discussion/23685/issues-on-filtermutectcalls-log10-probability-must-be-0-or-less#latest. @igordot If your contamination table shows a contamination of 1 or greater don't bother sending the vcf -- contamination would definitely be the issue in that case. |
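The check suggested above (a contamination of 1 or greater, or NaN, points at contamination as the cause) can be scripted before running FilterMutectCalls. This is a minimal sketch, not part of GATK: it assumes the contamination table is tab-separated with a header row containing `sample` and `contamination` columns, which may differ between GATK versions, and the helper name `suspicious_contamination` is hypothetical.

```python
import csv
import io
import math


def suspicious_contamination(table_text):
    """Return (sample, value) pairs whose contamination is NaN or >= 1.

    Assumes a tab-separated table with 'sample' and 'contamination'
    header columns (an assumption about the table layout).
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    flagged = []
    for row in reader:
        value = float(row["contamination"])  # "NaN" parses to float nan
        if math.isnan(value) or value >= 1.0:
            flagged.append((row["sample"], value))
    return flagged


# Made-up example rows, for illustration only.
example = (
    "sample\tcontamination\terror\n"
    "tumor_A\t0.004\t0.001\n"
    "tumor_B\tNaN\tNaN\n"
    "tumor_C\t1.0\t0.0\n"
)
print(suspicious_contamination(example))
```

To check a real table, pass in the contents of the file, for example `suspicious_contamination(open(path).read())`. Any sample it flags is worth re-examining before sharing the VCF.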
You are right. The error seems to be happening when contamination is 1 or NaN. That is probably due to a non-matched normal. The same panel with a true matched normal gives much more reasonable results (<0.01), so I don't know if the panel size is entirely at fault here. Should |
@igordot Thanks very much for following up on this. Just to clarify, are you saying that the 1/NaN contamination occurs when you run Also, I would not recommend using a non-matched normal anywhere in |
Thank you for the suggestions. I realize using a non-matched normal is not ideal. I was using it to at least filter out any technical artifacts. It seems to work well for that. Where do I find the panel of normals that you mentioned? |
@igordot A few panels can be found in the GATK best practices bucket, for example: gs://gatk-best-practices/somatic-b37/Mutect2-exome-panel.vcf |
How do I access that? I thought that the GATK resources were located here: https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0/ Is there a reason this is not in the GATK resource bundle? |
They're public, so just install Google Cloud gsutil and copy with
If you install gsutil this works when running locally as well, but for speed I would recommend downloading the pon.
Not that I can think of. |
Is there also an hg38 version? |
There's gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz for hg38. |
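For anyone who prefers not to install gsutil, publicly readable Cloud Storage objects are also served over HTTPS at `https://storage.googleapis.com/<bucket>/<object>`. The sketch below only rewrites a `gs://` path into that form. The helper name `gs_to_https` is hypothetical, and whether this bucket still allows anonymous reads is an assumption based on the "They're public" comment above.

```python
from urllib.parse import quote


def gs_to_https(gs_uri):
    """Convert gs://bucket/object into its storage.googleapis.com URL."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    # Split into bucket name and object path at the first slash.
    bucket, _, obj = gs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError(f"expected gs://bucket/object, got {gs_uri!r}")
    # Percent-encode the object path, keeping '/' separators intact.
    return f"https://storage.googleapis.com/{bucket}/{quote(obj)}"


pon_url = gs_to_https("gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz")
print(pon_url)
```

The resulting URL can be passed to any HTTP download tool. If the panel is bgzipped and indexed, the matching index file would need to be fetched the same way.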
I think this issue still persists:
I am using ExAC lifted over to hg38 as the germline resource in Mutect2 with only a tumor sample, and I am getting the above error in FilterMutectCalls. I recently updated to v4.1.3.0 to have the latest changes to Mutect2. I was not having this issue with v4.0.5.1. Here is the extracted information from the VCF that caused the issue.
|
@MikeWLloyd can you try version 4.1.4.0? |
Yes, I will try. I didn't realize 4.1.4.0 was available. |
No worries; fingers crossed. |
I loaded the docker repo GATK v4.1.4.0 and had the same (or similar) error result.
Contents of *.vcf.stats
Below is output from:
|
@MikeWLloyd It seems like you are running The error seems to occur because the |
I ran GATK 4.1.0.0 Mutect2 on a small (~1Mb) targeted panel. I am using a normal control that is not from the same individual (basically to exclude technical artifacts), so I do expect to see more variants than with a proper matched normal. I was getting around 100-300 variants per sample with GATK 4.0.6.0. I am still roughly in the same range for some samples with GATK 4.1.0.0, but I am getting 0 for others.
The problem seems to be at the FilterMutectCalls stage where I am seeing the following error:
I see there was a previous similar issue (#5553), but that one was apparently resolved. Curiously, I used several versions of GATK 4 and it never came up for me before.