Hi!
I'm now running STITCH on plants, and I wasn't able to get a good distribution of info scores: a lot of SNPs have an info score between 0.2 and 1. I see that you set 0.4 as the info score threshold for the allele frequency plots, so I also used 0.4, but I'm not sure how to interpret this score. Can you please share some ideas about what the info score reflects and how to define a reasonable threshold?
Sorry for my slow reply. I've been involved with undergrad interviews here at Oxford the last few days, which has been all-encompassing.
Feel free to let me know a bit more about your project, and the parameters you used to do the imputation, so that I can comment if some changes might be beneficial and potentially increase the average INFO score.
The INFO score used here is a standard one used in imputation. You can read about it, for instance, here: https://www.well.ox.ac.uk/~gav/snptest/#info_measures
Informally, it is closer to 1 if the imputation process is confident, and closer to 0 if it is less confident.
In slightly more detail, confidence comes from the distribution of genotype posteriors. If the genotype posteriors are fully confident, i.e. always 0 or 1, then the INFO score should be close to 1. If the genotype posteriors are not confident, i.e. close to 1/3, the INFO score should be close to 0.
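To make the link between genotype posteriors and the score concrete, here is a minimal Python sketch of the IMPUTE-style info measure described on the SNPTEST page linked above. It is an illustration under assumptions, not STITCH's actual code: the function name `info_score` and the input layout (one `(p0, p1, p2)` tuple per sample) are made up for this example, and STITCH's own calculation may differ in details. In this formulation the score is exactly 0 when every sample's posterior simply equals the Hardy-Weinberg proportions at the estimated allele frequency, i.e. when the posteriors add nothing beyond the frequency itself.

```python
def info_score(posteriors):
    """IMPUTE-style INFO score for a single SNP (illustrative sketch).

    posteriors: list of (p0, p1, p2) tuples, one per sample, giving the
    posterior probability of carrying 0, 1 or 2 copies of the alternate
    allele. This input layout is an assumption made for the example.
    """
    n = len(posteriors)
    if n == 0:
        raise ValueError("need at least one sample")

    # e_i: expected alternate allele count (dosage) for each sample
    dosages = [p1 + 2.0 * p2 for _, p1, p2 in posteriors]
    # f_i: expected squared alternate allele count for each sample
    second_moments = [p1 + 4.0 * p2 for _, p1, p2 in posteriors]

    # Estimated alternate allele frequency across samples
    theta = sum(dosages) / (2.0 * n)
    if theta <= 0.0 or theta >= 1.0:
        # Monomorphic site: the measure is defined as 1 in this case
        return 1.0

    # Sum of per-sample posterior variances of the allele count
    posterior_variance = sum(f - e * e for e, f in zip(dosages, second_moments))
    # Variance expected if genotypes were only drawn from the frequency
    expected_variance = 2.0 * n * theta * (1.0 - theta)
    return 1.0 - posterior_variance / expected_variance


# Fully confident posteriors (every entry is 0 or 1)
confident = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]

# Posteriors equal to Hardy-Weinberg proportions at frequency 0.25
uninformative = [(0.5625, 0.375, 0.0625)] * 4

print(info_score(confident))
print(info_score(uninformative))
```

The first call should give a score of 1 and the second a score of 0, matching the two extremes described above.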
Now, generally, STITCH is very well calibrated, so the INFO score at a variant should be monotonically related to the expected imputation accuracy. Ideally you'd have a truth data set that would allow you to check how an INFO score threshold relates to accuracy. In the past, I and others have found that 0.4 seems to be a reasonable threshold, so that's why I suggest it. If you have some other way to measure accuracy, or your own truth data, you might find a different threshold to be more reasonable.
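If you do have truth data, one simple way to pick a threshold is to tabulate, for each candidate cutoff, how many SNPs survive and how accurate they are on average. The sketch below assumes you have already computed a per-SNP accuracy value (for example the squared correlation between imputed dosage and true genotype); the function name `summarize_thresholds` and the toy numbers are hypothetical and only illustrate the bookkeeping.

```python
def summarize_thresholds(info_scores, accuracies, thresholds):
    """For each INFO cutoff, report SNPs kept and their mean accuracy.

    info_scores: per-SNP INFO scores
    accuracies:  per-SNP accuracy against truth data (e.g. r^2),
                 in the same order as info_scores
    Returns a list of (threshold, n_kept, mean_accuracy) tuples, with
    mean_accuracy set to None when no SNP passes the cutoff.
    """
    if len(info_scores) != len(accuracies):
        raise ValueError("info_scores and accuracies must have the same length")

    rows = []
    for threshold in thresholds:
        # Keep the accuracy of every SNP whose INFO meets the cutoff
        kept = [acc for info, acc in zip(info_scores, accuracies) if info >= threshold]
        mean_accuracy = sum(kept) / len(kept) if kept else None
        rows.append((threshold, len(kept), mean_accuracy))
    return rows


# Toy values for illustration only, not real results
toy_info = [0.1, 0.3, 0.5, 0.9]
toy_r2 = [0.2, 0.4, 0.8, 1.0]

for threshold, n_kept, mean_r2 in summarize_thresholds(toy_info, toy_r2, [0.0, 0.4, 0.95]):
    print(threshold, n_kept, mean_r2)
```

A reasonable threshold is then one where mean accuracy is acceptable for your application without discarding too many SNPs.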