Subsetting donor VCF file for Vireo strategy 2 #85

Open
lucygarner opened this issue Jun 5, 2023 · 3 comments

@lucygarner

Hi,

For strategy 2, you recommend subsetting the donor VCF file as below. Specifically, how would you recommend filtering out SNPs with "too much missing values" or "genotypes too similar across donors"?

For efficient loading of the donor VCF file, we recommend subsetting it:
bcftools view donor.vcf.gz -R cellSNP.cells.vcf.gz -Oz -o sub.vcf.gz

You can also add -s or -S for subsetting samples.

Make sure you only keep informative SNPs, e.g., by filtering out SNPs with too much missing values or the genotypes too similar across donors.

Best wishes,
Lucy

@huangyh09
Collaborator

Hi, good question. You could try filtering out SNPs with missing genotypes in >50% of donors, or SNPs where >90% of donors share the same genotype. This is only a rule of thumb, though; we haven't benchmarked it.

Yuanhua
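
For readers who want to try the rule of thumb above, here is a minimal Python sketch using only the standard library. It is not part of Vireo or bcftools, and it has not been benchmarked. The function names (`genotype_key`, `is_informative`, `filter_vcf`) and the default thresholds are illustrative. It assumes a multi-sample VCF with a `GT` entry in the FORMAT column, treats phased and unphased calls as the same genotype (`0|1` equals `1|0`), and measures the "same genotype" fraction over donors with a called genotype rather than over all donors.

```python
import gzip
from collections import Counter


def genotype_key(sample_field, gt_index):
    """Return a sorted allele tuple for one sample, or None if the call is missing."""
    parts = sample_field.split(":")
    if gt_index >= len(parts):
        return None
    # Ignore phasing so that 0|1, 1|0 and 0/1 all count as the same genotype.
    alleles = parts[gt_index].replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(sorted(alleles))


def is_informative(line, max_missing=0.5, max_same=0.9):
    """Apply the two rule-of-thumb filters to one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return False
    gt_index = fmt.index("GT")
    calls = [genotype_key(s, gt_index) for s in fields[9:]]
    called = [c for c in calls if c is not None]
    if not called:
        return False
    # Filter 1: too many donors without a genotype call.
    missing_frac = 1 - len(called) / len(calls)
    if missing_frac > max_missing:
        return False
    # Filter 2: the most common genotype dominates the called donors
    # (assumption: the fraction is taken over called donors only).
    top_count = Counter(called).most_common(1)[0][1]
    return top_count / len(called) <= max_same


def filter_vcf(in_path, out_path, **thresholds):
    """Copy header lines and informative records from one gzipped VCF to another."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if line.startswith("#") or is_informative(line, **thresholds):
                dst.write(line)
```

A call such as `filter_vcf("sub.vcf.gz", "sub.informative.vcf.gz")` would apply the default thresholds to the subset file from the command quoted in the question; the output file name is made up for illustration. Note that Python's `gzip` module writes ordinary gzip rather than BGZF, so the output would need to be recompressed with `bgzip` before it can be indexed with `tabix`.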

@lucygarner
Author

Thanks, and what software would you recommend to do this?

@ghuls
Contributor

ghuls commented Aug 18, 2023

@lucygarner You can use bcftools view for that.

Some examples:
https://github.com/aertslab/popscle_helper_tools/blob/master/filter_vcf_file_for_popscle.sh
