Long running times in population mode when using raw reads #22
Hi Ido, building with reads rather than assemblies is more computationally intensive, because Bifrost has to conduct an additional filtering step to remove low-coverage k-mers. There are also more sequences to include in the graph than when using assemblies, which further increases runtime. To answer your questions:
I hope this helps. Please let me know if you have any further questions. Sam
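The extra step Sam mentions, removing low-coverage k-mers before building the graph from reads, can be illustrated with a toy in-memory counter. This is a minimal sketch only: it is not how Bifrost implements filtering (Bifrost uses far more memory-efficient structures), and the function names and the min_count=2 threshold are assumptions chosen for illustration. It shows why reads cost more than assemblies: every k-mer of every read has to be counted, and sequencing errors create many rare k-mers that must be seen and then discarded.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    comp = str.maketrans("ACGT", "TGCA")
    rc = kmer.translate(comp)[::-1]
    return min(kmer, rc)


def solid_kmers(reads: Iterable[str], k: int, min_count: int = 2) -> set[str]:
    """Count canonical k-mers over all reads and keep those seen >= min_count times.

    Hypothetical helper for illustration; min_count=2 is an assumed threshold.
    """
    counts: Counter[str] = Counter()
    valid = set("ACGT")
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    # Low-coverage k-mers (often sequencing errors) are dropped here.
    return {kmer for kmer, c in counts.items() if c >= min_count}
```

For example, solid_kmers(["ACGTACGT", "ACGTACGT", "TTTTGGGG"], k=4) keeps only the k-mers shared by the two identical reads, while every k-mer from the single "TTTTGGGG" read is discarded as low coverage. An assembly is already a consensus sequence, so this counting pass over raw reads is unnecessary there.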
Thanks for your reply, @samhorsfield96. At the moment trying to run all the samples together in

Thanks, Ido
Hi Ido, we haven't extensively tested the computational efficiency of each step, although I would predict it is dataset dependent: more variation will make graph generation take longer and use more memory, and more genomes will increase the cost of calling. I'm also not 100% sure of the effect of incrementally adding genomes. If you are happy to do so, I would be interested to hear how your approach fares with regard to memory consumption. All the best, Sam
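One way to report memory consumption back, as Sam asks, is to record the peak resident set size of each run. The sketch below assumes a Unix system (Python's resource module is Unix-only) and is a generic wrapper, not part of unitig-caller; the function name is hypothetical, and you would pass it the same command line you already use. Note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, and that RUSAGE_CHILDREN reflects the largest of all child processes waited for so far, so measure one run per Python process for clean numbers.

```python
from __future__ import annotations

import resource
import subprocess


def run_and_report_peak_rss(cmd: list[str]) -> int:
    """Run cmd to completion and return the peak RSS of finished child processes.

    Hypothetical helper. Units: kilobytes on Linux, bytes on macOS.
    Raises subprocess.CalledProcessError if the command fails.
    """
    subprocess.run(cmd, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_maxrss
```

On Linux, running the command under GNU /usr/bin/time -v gives the same figure ("Maximum resident set size") without any Python, which may be simpler when comparing a single all-samples run against an incremental approach.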
Hi,
I'm running unitig-caller in population mode so that the output can be used by pyseer for GWAS (following the documentation). I don't have assemblies for all my samples (~250 samples), so I'm using the raw reads (two gzipped paired-end FASTQ files of 1–3 GB each per sample), and it takes a long time to process.
I was wondering a couple of things:
pyseer)?

Thanks, Ido