K optimization: r2 vs score #59

Open
nbedelman opened this issue Nov 19, 2021 · 2 comments

Comments

@nbedelman

Hello,

I am running STITCH on a relatively small sample (40 individuals) with ~0.5X per-sample haplotagging (linked-read) sequencing. I recognize that this sample size is pretty small for imputation, but I'm hoping STITCH will still be at least somewhat effective. I've run the program, varying K and the number of generations as suggested, and am now evaluating the output. The mean score and the number of sites with scores > 0.4 both increase as K increases from 2 to 35, where the values seem to asymptote. However, r2 peaks around K=14 (r2=0.875) and drops off on either side (K=35, r2=0.73). The number of generations has minimal effect on r2, but runs with fewer generations (10-100) consistently yield more sites with high scores than runs with more generations (300-1000). Does this make sense, and would you recommend maximizing score values or r2 values when selecting K?

Thanks!
Nate

@rwdavies
Owner

Hi Nate,

I'm honestly shocked that such large values of K perform well with N=40 samples at 0.5X (~20X total coverage).

But anyway, in general I would go with external r2 as the best measure, which here points to what you write about K=14. If, as you say, the nGen parameter has limited effect on r2 but some effect on the INFO score, I would go with the option that maximizes the INFO score anyway, just in case.

Best,
Robbie
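
The rule of thumb in the reply above (pick by external r2 first, then use the INFO score to choose among settings that barely change r2) could be sketched roughly as follows. This is an illustrative sketch only, not part of STITCH: the `Run` record, the `pick_run` helper, and the `r2_tolerance` threshold for "limited effect on r2" are all hypothetical, and the nGen and mean INFO values in the example are placeholders. Only the two r2 values (0.875 at K=14, 0.73 at K=35) come from the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One imputation run and its summary metrics (hypothetical record)."""
    k: int            # number of ancestral haplotypes (K)
    n_gen: int        # number of generations (nGen)
    r2: float         # external r2 against truth genotypes
    mean_info: float  # mean INFO score across sites


def pick_run(runs, r2_tolerance=0.01):
    """Prefer the highest external r2; among runs whose r2 is within
    r2_tolerance of the best, prefer the highest mean INFO score.

    r2_tolerance is an assumed cutoff, not a value from the thread.
    """
    if not runs:
        raise ValueError("no runs to choose from")
    best_r2 = max(run.r2 for run in runs)
    near_best = [run for run in runs if best_r2 - run.r2 <= r2_tolerance]
    return max(near_best, key=lambda run: (run.mean_info, run.r2))


# Placeholder nGen and mean_info values, for illustration only.
runs = [
    Run(k=14, n_gen=100, r2=0.875, mean_info=0.60),
    Run(k=14, n_gen=1000, r2=0.870, mean_info=0.50),
    Run(k=35, n_gen=100, r2=0.73, mean_info=0.80),
]
chosen = pick_run(runs)
print(chosen.k, chosen.n_gen)
```

With these placeholder numbers, the K=35 run is excluded despite its higher INFO score because its r2 is far below the best, and the INFO score only decides between the two K=14 runs.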

@nbedelman
Author

Perfect, thank you!
