Sensitivity issues #50
Hi Valentina, in general everything is good! For the human genome hg38 I think you also need to turn on the --hg38 flag, though the tool often gives an error when the wrong reference genome was used. What are the variants that were not called? Are they very short? Are they mosaic? Was this dataset described somewhere, so I could take a look at the panel design?
Thanks for the quick answer! Here is the link to the dataset publication, where you can find more about it: https://pubmed.ncbi.nlm.nih.gov/30175241/. To give you a quick overview, the samples have been analyzed with the Illumina TruSight Cancer Panel, comprising ~100 cancer-related genes. I did not use --hg38 (even though the tool did not throw any error). The variants missed are germline, and most of them are small, comprising 1 or 2 exons. If you need any other information, just ask ;) Valentina
I see =) So the variants could be discarded if they fell into a "centromere" region of hg38 (which could be a normal region in hg19). Also set --lengthG 0, which means even the smallest variants are called. Run this and let me know the sensitivity; it is not the maximum of what can be done =) (I hope you have this test run and sensitivity check as a script, so you can run it several times without a lot of human effort.) 100 genes, in particular, is not a lot of genetic material, and some parts could be filtered out, e.g. as GC-extreme, but this additional trick will improve the situation: https://github.com/GermanDemidov/segmentation_before_CNV_calling - but first try to run without it.
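The "test run and sensitivity check as a script" German mentions could look like the following minimal Python sketch. The interval representation, the truth set, and the per-exon overlap rule are illustrative assumptions, not ClinCNV output formats:

```python
# Minimal per-exon sensitivity check: what fraction of truth exons
# overlaps at least one called CNV? TP / (TP + FN), as in the benchmark.
# Interval tuples (chrom, start, end) are a hypothetical representation.

def overlaps(a, b):
    """True if two (chrom, start, end) intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def sensitivity(truth_exons, called_cnvs):
    """Fraction of truth exons hit by at least one called CNV."""
    if not truth_exons:
        return 0.0
    tp = sum(1 for exon in truth_exons
             if any(overlaps(exon, call) for call in called_cnvs))
    return tp / len(truth_exons)

# Toy example: 3 truth exons, 2 calls, only the first exon is recovered.
truth = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
calls = [("chr1", 90, 250), ("chr2", 10, 40)]
print(round(sensitivity(truth, calls), 2))
```

Re-running this after each parameter change (e.g. --lengthG 0) makes the comparison repeatable without manual counting.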
Thank you!!! By running the script with your suggestions, I got a sensitivity of 94.4%! A great improvement ;) I also checked the FP rate and it looks fine, with an average of 2.1 calls per sample. Do you have any other suggestions, besides trying out the segmentation as well? (https://github.com/GermanDemidov/segmentation_before_CNV_calling) Again thanks,
Hi Valentina, great news! You can probably also check whether the true positive calls have higher log-likelihood scores than the false positives and tune your threshold accordingly. What are the missing CNVs? Are they in the PMS2 gene? Can you open the .seg files from the false negative samples in IGV and check the regions of the missing CNVs? What do you see?
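The threshold tuning German suggests can be sketched as a simple scan over candidate cutoffs. The score lists and the gain criterion below are illustrative assumptions, not part of ClinCNV:

```python
def best_threshold(tp_scores, fp_scores):
    """Scan candidate cutoffs (keep calls with score >= cutoff) and
    return the cutoff maximizing (TPs kept) - (FPs kept).
    This gain criterion is one simple choice among many."""
    best, best_gain = None, float("-inf")
    for cutoff in sorted(set(tp_scores) | set(fp_scores)):
        gain = (sum(s >= cutoff for s in tp_scores)
                - sum(s >= cutoff for s in fp_scores))
        if gain > best_gain:
            best_gain, best = gain, cutoff
    return best

# Hypothetical log-likelihood scores of true and false positive calls:
tp = [120, 95, 300, 88]
fp = [20, 35, 60]
print(best_threshold(tp, fp))  # 88: keeps all TPs, drops all FPs
```

In practice one would also weigh how costly a lost true positive is versus an extra false positive before committing to a cutoff.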
Hi German, I have the results of the FN analysis, and it turns out that PMS2 is not the issue. The benchmark was performed using the exon as the fundamental unit, so the total TP+FN equals the total number of events tested. As you can see from the attached file, most of the uncalled exons are at the end or at the beginning of a bigger variant and are annotated as "TooShortOrNA" or "GCnormFailed" in the _cov.seg file. The attached file has the following columns:
Thank you, |
Hi Valentina, yes, I see. I think you can expand small exons to around 100 base pairs. I doubt the actual enrichment target was 50bp; I would use the original enrichment BED file, not the list of exons. You can change the 50bp threshold here: https://github.com/imgag/ClinCNV/blob/master/clinCNV.R, line 293. For GC normalization, you may relax the criteria here https://github.com/imgag/ClinCNV/blob/master/generalHelpers.R#L132 or line 134: put 10 instead of 25 and 50. It means "if you have fewer than 10 exons with this GC value, remove them; otherwise, perform GC normalization". Try to run this modified code and let me know if it is better.
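Widening small exons before calling can be done directly on the BED intervals. A sketch, assuming simple (chrom, start, end) tuples in 0-based half-open BED coordinates; the 100 bp target comes from the suggestion above, everything else is illustrative:

```python
def pad_to_min_length(intervals, min_len=100):
    """Symmetrically widen intervals shorter than min_len.
    Starts are clamped at 0, so intervals near a chromosome start
    may end up shorter than min_len."""
    out = []
    for chrom, start, end in intervals:
        length = end - start
        if length < min_len:
            pad = min_len - length
            start = max(0, start - pad // 2)
            end = end + (pad - pad // 2)
        out.append((chrom, start, end))
    return out

# A 40 bp exon is widened to 100 bp; a 500 bp exon is left untouched.
exons = [("chr1", 1000, 1040), ("chr1", 2000, 2500)]
print(pad_to_min_length(exons))  # [('chr1', 970, 1070), ('chr1', 2000, 2500)]
```

Note that widening targets this way changes which reads count toward each region, so coverage files would need to be regenerated from the padded BED.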
Hi German, thank you as always for the helpful suggestions! I applied those changes and now get a sensitivity of 96.40%, missing 13 exons out of a total of 357. Most of these exons are at the border of a larger variant; only 3 variants are completely missed (all small). Here are the variants together with the _cov.seg VALUE of the corresponding exons: SDHB Exon 1 deletion 1.56 It is a really good result! Of course, I now call double the number of variants, but I can manage filters using the log-likelihood. Do you think those variants could be recovered? Thanks!
Hi Valentina, the answer depends. Can you check the plots produced? There should be a clustering picture: do you see any clustering of the samples? 1.56 is actually closer to diploid than to deletion (1 vs 2), so this one is impossible to save unless you have clustering of the samples.
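The "1.56 is closer to diploid than to deletion" reasoning is a nearest-expected-value check. A tiny sketch, assuming the _cov.seg VALUE is scaled so that diploid sits near 2 and a heterozygous deletion near 1 (as in German's comparison):

```python
def nearest_copy_state(value, states=(1.0, 2.0, 3.0)):
    """Return the expected copy-number level closest to the observed value."""
    return min(states, key=lambda s: abs(s - value))

# |1.56 - 2| = 0.44 < |1.56 - 1| = 0.56, so this exon looks diploid:
print(nearest_copy_state(1.56))  # 2.0
```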
Hi @GermanDemidov,
Thanks for the precious tool and the great documentation on it.
I am considering using the tool for germline (and eventually mosaic) CNV calling in panel and WES scenarios.
To have a fair comparison with the tool previously used, I am testing the algorithm on the ICR639 dataset. It has 74 CNVs across 72 samples, and I am using the 72 samples as "reference". With this approach, I obtain a sensitivity of 81%, compared to around 98% for the previous tool.
Here are the steps and commands used for the benchmark:
The merged coverage file then has the following structure:
Am I missing some crucial points?
Tools versions:
ngs-bits 2019_09 (installed using bioconda)
clinCNV 1.18.3 (latest)