Genotyping / Downstream Analysis ideas #40

Closed
apeltzer opened this issue Oct 10, 2018 · 14 comments
Labels
question Further information is requested

Comments

@apeltzer
Member

In EAGER1.X, the approach was to call variants with GATK on the preprocessed / filtered BAM files and then use those calls to e.g. reconstruct a consensus FASTA for small genomes and/or create a VCF for downstream tools.

Nowadays, however, there are tools for downstream genotyping that are aware of ancient DNA damage etc., for example snpAD and, IIRC, ANGSD and sequenceTools. I'd rather rely on these, as they are specifically designed for aDNA usage.

The learning curve for these is okayish, as I think only basic functionality is required, for example just producing output for downstream analysis tools.

My plan for now is to incorporate some of the functionality of:

  • snpAD
  • ANGSD
  • sequenceTools

Additionally, I'd love to incorporate:

These changes are planned features for V2.1 of the pipeline; 2.0 will "just" provide functionality for preprocessing, QC and mapping using BWA for now.

@apeltzer apeltzer added the question Further information is requested label Oct 10, 2018
@apeltzer apeltzer added this to the V2.1 "Ulm" milestone Oct 10, 2018
@apeltzer
Member Author

@jfy133 @sc13-bioinf @JudithNeukamm

Any ideas/thoughts/complaints on this plan?

@apeltzer
Member Author

@EisenRa might also want to comment on this one :-)

@jfy133
Member

jfy133 commented Oct 11, 2018

Downstream, I currently don't have any other opinions. But do you mean to not include GATK at all? Or were you focusing only on the downstream steps (in which case one would turn off the dedicated genotyping step)?

For me and the people I work with we still rely heavily on GATK, so I would be happy if that would still be retained.

@apeltzer
Member Author

I thought about not including it, but that is something I'd like to discuss: are you, for example, using it?

@jfy133
Member

jfy133 commented Oct 11, 2018

Yes, I am still using it. Unfortunately, some of the downstream tools can still only accept GATK-style VCFs, for example. We've only just convinced the person to configure it so it'll accept HaplotypeCaller ;)

@apeltzer
Member Author

Which downstream tools would that be? I guess snpAD produces VCF as well, but I'm not really certain which version and whether it's standardized.

Do you need both the "old" UnifiedGenotyper and HaplotypeCaller?

@jfy133
Member

jfy133 commented Oct 11, 2018 via email

@EisenRa

EisenRa commented Oct 11, 2018

I'm ambivalent about the variant callers tailored for ancient DNA, as from what I understand, they focus more on human genomes. They still could be useful for people working on ancient human DNA, but I would not be the person to ask.

I agree with James that having HaplotypeCaller would be good as it is well supported and has other tools that rely on its output.

Another variant caller to consider is FreeBayes. It is quite popular, works well for both human and microbial genomes, and provides a well-annotated VCF file (can also do joint calling between samples). It is used in the SNIPPY pipeline, which focuses on modern bacterial genome variant calling.

@apeltzer
Member Author

apeltzer commented Oct 11, 2018

Thanks everyone for their valuable thoughts and ideas.
That means I will most likely implement something like this:

  • GATK (HaplotypeCaller + UnifiedGenotyper)
  • FreeBayes

That should be nice for pathogen / bacterial people :-)

And additionally:

  • ANGSD
  • snpAD
  • sequenceTools

for our human genetics people. (Though they could of course also rely on GATK/FreeBayes if they want to.)

@jfy133
Member

jfy133 commented Jan 9, 2019

Unfortunately, I'm going to have to request that UnifiedGenotyper be included for pathogen stuff, because HC does local de novo assembly around possible SNP sites, and this doesn't work for low-coverage data.

Note - downstream stats for this can be provided by bcftools stats, which 🎉 is already supported by MultiQC

@jfy133
Member

jfy133 commented May 10, 2019

Note that we will have to package GATK 3.5, because IndelRealigner was removed after 3.6 (I think; possibly in v4). The latest GATK versions are not really compatible with short-read data anymore.

And 3.5 doesn't accept .csi files, only .bai 😓. Maybe we should add an option for which indexing system should be used?
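
As a rough illustration of what such an option could look like (a sketch only, not the pipeline's actual implementation): the function below picks an index format from a user setting and the reference contig lengths. The names `choose_index_format`, `BAI_MAX_CONTIG_LENGTH` and the `requested` option are hypothetical; the one fact it relies on is that a .bai index can only handle individual contigs of up to 2^29 bases (512 Mbp), while .csi lifts that limit.

```python
from typing import Iterable

# A .bai index handles individual contigs of up to 2**29 bases (512 Mbp);
# .csi does not have this restriction.
BAI_MAX_CONTIG_LENGTH = 2 ** 29


def choose_index_format(contig_lengths: Iterable[int], requested: str = "auto") -> str:
    """Return "bai" or "csi" for a BAM index.

    `requested` is a hypothetical user option: "bai", "csi", or "auto".
    """
    needs_csi = any(length > BAI_MAX_CONTIG_LENGTH for length in contig_lengths)

    if requested == "auto":
        # Prefer .bai for compatibility, fall back to .csi only when required.
        return "csi" if needs_csi else "bai"
    if requested == "csi":
        return "csi"
    if requested == "bai":
        if needs_csi:
            raise ValueError("at least one contig is too long for a .bai index")
        return "bai"
    raise ValueError(f"unknown index format: {requested!r}")
```

With a tool that only reads .bai (as described above for GATK 3.5), one would pass `requested="bai"`, which fails loudly for references with very long contigs instead of silently producing an index the tool cannot use.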

@teepean

teepean commented May 10, 2019

The last version of 3.x series (3.8-1) still has IndelRealigner.

@apeltzer
Member Author

The current GATK PR (#238) contains:

  • UnifiedGenotyper (GATK 3.8.1)
  • HaplotypeCaller (latest GATK4.X)

Will probably add more (ANGSD, snpAD, etc.) on demand in a separate release.

@jfy133
Member

jfy133 commented Sep 29, 2019

See closed #10 for GenConS.

@jfy133 jfy133 removed this from the V2.1 "Ulm" milestone Dec 4, 2019
@apeltzer apeltzer added this to the Unclear Topics / Feature Requests milestone Feb 29, 2020
jfy133 added a commit that referenced this issue Apr 29, 2020
@jfy133 jfy133 closed this as completed Jun 8, 2020