-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
--genfile its value and generating the file #67
Comments
This would work, though perhaps it's cleanest from a software evaluation perspective to use external software. I normally use the good old fashioned GATK 3 UnifiedGenotyper (now many years old!), as it is fast, and I can tell it to genotype sites given specific reference and alt alleles. I assume HaplotypeCaller can do the same thing, but am not entirely sure. I think samtools/bcftools can do the same thing. But this should work as well (though to be clear on phrasing: are you suggesting (2 = hom alt) (NA = missing) or (2 = (hom alt or missing)), I assume the former, thought phrasing unclear |
Hi Yes it totally makes sense to use a real genotype caller rather than (mistakenly) rely on sufficient coverage at ever genotype. I actually have individual .vcf files, so I just need to merge them, I was trying to be too lazy. As far as phrasing, I took tthe text from the STITCH help, I understand it as 2 = hom alt and NA i= indicating missing genotype (pretty close to the "vcftools --012" option , but where "-1" needs to be replaced by "NA") Guy genfile |
I mean, STITCH should do what GATK3 UG does, I think exactly, though I'm probably missing some things. And yeah, STITCH won't filter sites, so if you want annotations to do that, you'll need something else. OK cool re: NA, sounds good |
Hi
I am thinking about trying the option --genfile. As of my 7500 samples 50 of them have a sequencing coverage of >x15.
I was thinking of taking the file from --outputInputInVCFFormat = TRUE. and using the data from only the 50 high coverage samples to make the required file ( tab separated column, with 0 = hom ref, 1 = het, 2 = hom alt and NA).
Does that sound like a plan that might work? The number of reads removed during the "generate inputs" look pretty modest for all samples.
Do you think that --genfile might be an option work trying? I have a large number of trios and my Mendelian error rate is very low; I was jus curious if it might be further improved - if it is easy for me to have a go--.
Thanks
Guy
The text was updated successfully, but these errors were encountered: