
A question about info score #75

Open
Antennaria opened this issue Dec 9, 2022 · 1 comment
@Antennaria
Hi!
I'm now running STITCH on plants, and I wasn't able to get a good distribution of info scores -- there are a lot of SNPs with info scores spread between 0.2 and 1. I see that you set 0.4 as the info score threshold for the allele frequency plots, so I also used 0.4 as a threshold, but I wonder how to interpret this score. Could you please share some ideas about what the info score reflects and how to choose a reasonable threshold?

@rwdavies
Owner

Hi,

Sorry for my slow reply, I've been involved with undergrad interviews here at Oxford the last few days, which has been all encompassing.

Feel free to let me know a bit more about your project, and the parameters you used to do the imputation, so that I can comment if some changes might be beneficial and potentially increase the average INFO score.

The INFO score used here is a standard one in imputation; you can read about it, for instance, here:
https://www.well.ox.ac.uk/~gav/snptest/#info_measures
Informally, it is closer to 1 if the imputation process is confident, and closer to 0 if it is less confident.
In slightly more detail, confidence comes from the distribution of the genotype posteriors. If the genotype posteriors are fully confident, i.e. always 0 or 1, then the INFO score should be close to 1. If the genotype posteriors are not confident, i.e. all close to 1/3, the INFO score should be close to 0.
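To make that concrete, here is a minimal NumPy sketch of the snptest-style INFO measure computed from per-sample genotype posteriors (the function name and interface are my own for illustration; they are not part of STITCH, and the snptest page linked above gives the exact definition):

```python
import numpy as np

def info_score(posteriors):
    """Sketch of the snptest-style INFO measure at one variant.

    posteriors: (N, 3) array; row i holds P(g=0), P(g=1), P(g=2)
    for sample i. Returns a value near 1 when the posteriors are
    confident (all mass on one genotype), lower when they are flat.
    """
    p = np.asarray(posteriors, dtype=float)
    n = p.shape[0]
    e = p[:, 1] + 2.0 * p[:, 2]        # expected allele dose per sample
    f = p[:, 1] + 4.0 * p[:, 2]        # expected squared dose per sample
    theta = e.sum() / (2.0 * n)        # estimated alternate allele frequency
    if theta <= 0.0 or theta >= 1.0:   # monomorphic site: define INFO = 1
        return 1.0
    # Ratio of posterior dose variance to the variance expected under
    # Hardy-Weinberg at frequency theta; 0 variance gives INFO = 1.
    return 1.0 - (f - e ** 2).sum() / (2.0 * n * theta * (1.0 - theta))

# Fully confident posteriors -> INFO of 1
print(info_score([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]))  # 1.0
# Uncertain posteriors -> lower INFO
print(info_score([[0.5, 0.5, 0.0]] * 4))
```

Note this measure can go slightly negative for very flat posteriors; in practice it is often clipped to [0, 1].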

Now, generally, STITCH is very well calibrated, so the INFO score at a variant should be monotonically related to the expected imputation accuracy. Ideally you'd have some truth data set that would allow you to check how an INFO score threshold correlates with accuracy. In the past, I and others have found 0.4 to be a reasonable threshold, which is why I suggest it. If you have some other way to measure accuracy, or your own truth data, you might find a different threshold more reasonable.

Best,
Robbie
