Genotyping / Downstream Analysis ideas #40
Comments
@jfy133 @sc13-bioinf @JudithNeukamm Any ideas/thoughts/complaints on this plan?
@EisenRa might also want to comment on this one :-)
Downstream, I currently don't have any other opinions. But do you mean to not include GATK at all? Or were you focusing on just the downstream steps (for which you would turn off the dedicated genotyping step)? The people I work with and I still rely heavily on GATK, so I would be happy if it were retained.
I thought about not including it, but that is something I'd like to discuss: are you, for example, using it?
Yes, I am still using it. Unfortunately some of the downstream tools can still only accept GATK-style VCFs, for example. We've only just convinced the person to configure it so it'll accept HaplotypeCaller ;)
Which downstream tools would that be? I guess snpAD produces VCF as well, but I'm not really certain what kind of version and whether it's standardized. Do you need both the "old" UnifiedGenotyper and HaplotypeCaller?
MultiVCFAnalyzer (https://github.com/alexherbig/MultiVCFAnalyzer) - the new version isn't yet available but is coming soon.
No, HaplotypeCaller would be sufficient (and would add more pressure to at least get rid of reliance on unmaintained tools ;))
I'm ambivalent about the variant callers tailored for ancient DNA, as from what I understand, they focus more on human genomes. They still could be useful for people working on ancient human DNA, but I would not be the person to ask. I agree with James that having HaplotypeCaller would be good as it is well supported and has other tools that rely on its output. Another variant caller to consider is FreeBayes. It is quite popular, works well for both human and microbial genomes, and provides a well-annotated VCF file (can also do joint calling between samples). It is used in the SNIPPY pipeline, which focuses on modern bacterial genome variant calling.
Thanks everyone for their valuable thoughts and ideas.
That should be nice for pathogen / bacterial people :-) And additionally:
for our human genetics people. (Though they could of course also rely on GATK/FreeBayes if they want to.)
Unfortunately, I'm going to have to request that UnifiedGenotyper be included for pathogen stuff, because HC does local de novo assembly around possible SNP sites - but this doesn't work for low-coverage data. Note - downstream stats for this can be provided by
Note that we will have to package GATK 3.5, because past 3.6 (I think, possibly v4), IndelRealigner has been removed. The latest systems in GATK are not really compatible with short-read data anymore. And 3.5 doesn't accept .csi files, only .bai 😓 . Maybe we should make an option for which indexing system should be used?
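The indexing option suggested above could be sketched roughly as follows. This is only an illustration in Python, not pipeline code: the function name `samtools_index_command` and the `index_type` parameter are hypothetical, and it assumes indexing is done with `samtools index`, using its `-b` (BAI) and `-c` (CSI) flags.

```python
from typing import List

# Hypothetical helper: maps a user-facing index choice to a samtools call.
# Assumption: indexing is done with `samtools index` (-b = BAI, -c = CSI).
INDEX_FLAGS = {"bai": "-b", "csi": "-c"}


def samtools_index_command(bam_path: str, index_type: str = "bai") -> List[str]:
    """Build the argument list for indexing a BAM file.

    Defaults to BAI, since the comment above notes that GATK 3.5
    only accepts .bai indices.
    """
    if index_type not in INDEX_FLAGS:
        raise ValueError(
            f"Unknown index type {index_type!r}; expected one of {sorted(INDEX_FLAGS)}"
        )
    return ["samtools", "index", INDEX_FLAGS[index_type], bam_path]
```

The returned list could be passed to `subprocess.run(...)`; defaulting to `bai` keeps the GATK path working unless someone explicitly opts into `.csi`.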
The last version of the 3.x series (3.8-1) still has IndelRealigner.
Current GATK PR contains: #238
Will probably add more (ANGSD, snpAD, etc.) on demand in a separate release.
See closed #10 for GenConS.
The former way to do things in EAGER1.X was to use GATK to call variants on the preprocessed / filtered BAM files and then use that to recreate e.g. a consensus FastA for small genomes and/or create a VCF for downstream tools.
Nowadays, however, there are tools for downstream genotyping that are aware of ancient DNA damage etc., for example snpAD and, IIRC, angsd and sequenceTools. I'd rather rely on these, as they are specifically designed for aDNA usage.
The learning curve for these is okayish, as I think only basic functionality is required, for example producing output for downstream analysis tools.
My plan for now is to incorporate some of the functionality of:
Additionally, I'd love to incorporate:
These changes are planned features for V2.1 of the pipeline; 2.0 will "just" provide functionality for preprocessing, QC and mapping using BWA for now.