I am running STITCH on a relatively small sample (40 individuals) with ~0.5X per-sample haplotagging (linked-read) sequencing. I recognize that this sample size is small for imputation, but I'm hoping STITCH will still be at least somewhat effective. I've run the program while varying K and the number of generations (nGen), as suggested, and am now evaluating the output. The mean INFO score and the number of sites with INFO scores > 0.4 both increase as K increases from 2 to 35, where the values seem to plateau. However, r2 peaks around K=14 (r2=0.875) and drops off on either side (e.g. K=35, r2=0.73). The number of generations has minimal effect on r2, but runs with fewer generations (10-100) consistently yield more sites with high INFO scores than runs with more generations (300-1000). Does this make sense, and would you recommend maximizing INFO score or r2 when selecting K?
Thanks!
Nate
I'm honestly shocked that such large values of K perform well with N=40 samples at 0.5X each (~20X total coverage).
In any case, I would generally treat external r2 as the best measure and choose K based on it, for instance K=14 as you describe. For the nGen parameter, if as you say it has limited effect on r2 but some effect on the INFO score, I would go with the option that maximizes the INFO score, just in case.
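
For anyone wanting to apply this rule to a grid of runs, here is a minimal Python sketch of the selection logic: take the best external r2, then among runs whose r2 is within a small tolerance of that best value, prefer the one with the most high-INFO sites. This is only an illustration, not part of STITCH. The `Run` record, the `pick_run` helper, and the `r2_tol` threshold are hypothetical names chosen here. In the example table only the r2 values for K=14 (0.875) and K=35 (0.73) come from this thread; the nGen assignments, the 0.873 value, and all site counts are made-up placeholders.

```python
from typing import NamedTuple, Sequence


class Run(NamedTuple):
    """Summary of one imputation run (hypothetical record, not a STITCH object)."""
    K: int
    nGen: int
    r2: float          # external r2 against independent truth genotypes
    n_high_info: int   # number of sites with INFO score > 0.4


def pick_run(runs: Sequence[Run], r2_tol: float = 0.01) -> Run:
    """Choose by r2 first; break near-ties in r2 by the number of high-INFO sites."""
    if not runs:
        raise ValueError("no runs to choose from")
    best_r2 = max(r.r2 for r in runs)
    # Runs whose r2 is effectively as good as the best one (r2_tol is an assumption).
    near_best = [r for r in runs if best_r2 - r.r2 <= r2_tol]
    # Among those, prefer more high-INFO sites, then higher r2.
    return max(near_best, key=lambda r: (r.n_high_info, r.r2))


# Placeholder grid: only r2=0.875 (K=14) and r2=0.73 (K=35) are from the thread.
runs = [
    Run(K=14, nGen=100, r2=0.875, n_high_info=52000),
    Run(K=14, nGen=1000, r2=0.873, n_high_info=47000),
    Run(K=35, nGen=100, r2=0.73, n_high_info=60000),
]

chosen = pick_run(runs)
print(chosen)
```

With these placeholder numbers the K=35 run is excluded despite having the most high-INFO sites, because its r2 is far below the best, and the two K=14 runs are then separated by INFO count. The tolerance is a judgment call and should reflect how noisy your r2 estimates are.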