gCNV nan errors #4824
Comments
How did you guys generate the target list for hg38? I've been having some problems with regions near the centromeres for SNPs and indels as well. The centromeres.bed for hg38 from UCSC seems to include the computationally generated centromeres, but not the additional gross regions nearby that we excluded from b37. Laurent was excluding a fair amount of territory beyond the "official" centromeres for his QC based on the density of multi-allelic variant calls.
@mbabadi We should look into ways to be more robust against NaNs, but I think we should just go ahead and blacklist these regions. This can be done from the outset of the pipeline via the
For SVs, we are not blacklisting any regions except sometimes gaps and centromeres. Unfortunately, many of the events occur in messy areas like this, and I think it’s going to be a major issue if we can’t guarantee that the model will be robust in such regions.
@mwalker174 the region you found above is included in the 10X SV blacklist. What list is the SV team currently using? If the read-depth data is not reliable in these regions, I would not expect the model fit to be very good, even if we made the model more robust against NaNs. So I wouldn't think the results would be very useful for SV integration. Is there a way to make CNV-SV integration "Bayesian" in the sense that we could fall back on a prior in the case of missing CNV data?
@samuelklee We aren't using any blacklists currently. I am less concerned about noisy calls because they usually don't line up well with read pair evidence. That said, I did not get NaN errors with 1 kbp bins. Perhaps we could bump up the bin sizes in regions where this happens?
@samuelklee In my experience the 10X blacklist is very, very conservative and will likely exclude many regions of common copy number variation. I'd recommend something less restrictive for general use.
There were more NaNs in these other chunks, all of which overlap the UCSC centromere regions:
I'm going to try again with centromeres blacklisted.
Blacklisting centromeres resolved the NaN errors except for one block on chrY that roughly corresponds to region q11.23. I blacklisted that block and got no more errors.
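For anyone wanting to reproduce the blacklisting step outside the pipeline, here is a minimal sketch of dropping bins that overlap a blacklist. It assumes BED-style 0-based, half-open intervals; the function names and the coordinates are made up for illustration and are not taken from GATK or from the actual centromere file.

```python
from typing import List, Tuple

# (contig, start, end), 0-based half-open as in BED. Hypothetical representation.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same contig share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def drop_blacklisted(bins: List[Interval], blacklist: List[Interval]) -> List[Interval]:
    """Keep only the bins that do not overlap any blacklisted interval."""
    return [b for b in bins if not any(overlaps(b, x) for x in blacklist)]


# Illustrative coordinates only, not a real centromere annotation.
bins = [("chr3", 91_540_500, 91_540_750), ("chr3", 95_000_000, 95_000_250)]
blacklist = [("chr3", 90_000_000, 94_000_000)]
kept = drop_blacklisted(bins, blacklist)
print(kept)
```

This is quadratic in the worst case, which is fine for a quick check; for genome-wide bin lists a sorted sweep or an interval tree would be the better choice.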
My guess is that these regions have unusually high coverage, which is probably yielding NaN likelihoods. @mwalker174 any way you can check this from your previous runs? Our philosophy so far has been to keep the tools relatively agnostic by allowing generic blacklisting via
@mwalker174 can you go back and check whether high coverage was causing the NaNs?
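One rough way to run that check, sketched under the assumption that the per-bin read counts for a shard are already loaded into a plain list: flag bins whose count is far above the shard median. The function name and the factor of 10 are arbitrary choices for illustration, not a gCNV setting.

```python
from statistics import median
from typing import List, Sequence


def flag_high_coverage(counts: Sequence[int], factor: float = 10.0) -> List[int]:
    """Return indices of bins whose count exceeds factor times the median count.

    The median is floored at 1 so that an all-zero or mostly-zero shard
    still yields a usable threshold.
    """
    threshold = factor * max(median(counts), 1)
    return [i for i, c in enumerate(counts) if c > threshold]


# Made-up counts: one bin is wildly above the rest.
counts = [100, 98, 105, 5000, 102]
flagged = flag_high_coverage(counts)
print(flagged)
```

If the flagged bins line up with the shards that produced NaNs, that would support the high-coverage explanation; if not, the cause is likely elsewhere.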
@mwalker174 does this need to be addressed?
@samuelklee Yes. Looking forward, we will want to reduce the extent of our blacklist and interval filtering, which are currently needed to prevent these errors.
@mwalker174 let's check whether these NaNs were caused by high coverage. Perhaps we can address them along with those due to vanishing overdispersion (which were caused by large values of interval-psi-scale).
At least partially addressed in #6245; we can reopen if there are other NaNs that we have to patch.
@samuelklee @asmirnov239 @mbabadi I tried to run a 30-sample cohort through gCNV on all canonical chromosomes with 250bp bins sharded in 10k-interval blocks, but PostprocessGermlineCNVCalls gave the following error:
After inspecting the output from shard 225, it seems that the model starts producing NaN values after ~1600 warmup iterations (looking at the ELBO log). This shard corresponds to a pericentromeric region, chr3:91540501-94090250.
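To pin down the iteration where a trace like this goes bad, something along these lines works once the ELBO values have been parsed into a list of floats. The parsing is left out on purpose, since the log format is not shown here; the function name and the sample values are illustrative only.

```python
import math
from typing import Iterable, Optional


def first_non_finite(values: Iterable[float]) -> Optional[int]:
    """Return the 0-based index of the first NaN or infinite value, or None."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i
    return None


# Made-up trace: the third entry is where the values stop being finite.
elbo_trace = [-1.2e6, -9.8e5, float("nan"), float("nan")]
bad_at = first_non_finite(elbo_trace)
print(bad_at)
```

Running this per shard makes it easy to list which shards diverge and how early, which can then be compared against the shard's genomic coordinates.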
It would be nice to have the option to bypass this error in PostprocessGermlineCNVCalls.
Here is the model config for the shard: