Full Stack Example

Before using epa-ng, it is important to know where it sits in the bigger picture. On this page you will find an example of what a full placement pipeline might look like.

There are two major components to phylogenetic placement:

a set of sequences to place (called query sequences)
a set of sequences that represent the context within which we want to place (called reference sequences)

If you've landed at this page, you probably have your query sequences already. Typically these come from metagenomic or metabarcoding sequencing, and are already filtered to belong to some genetic region (like 16S, 18S, or other common barcodes).

The most common question is then: where do my query sequences belong taxonomically, and what is the taxonomic composition of my environmental sample?

epa-ng can answer this question, but it needs a handful of other programs to prepare the data and to perform any in-depth post analysis you might require for your research.

Step 1: Selecting the reference sequences

This is the most biologically involved step (apart from the wet-lab work), as it requires knowledge of the environment the query sequences were sampled from, the organisms that are suspected to inhabit it, possible contaminants, and so on.

Some general advice I can give here:

we found it's more robust to include a few representative sequences per species, or even per genus, rather than many: too much similarity just produces many low-certainty placements, spread across the members of such a group, and unnecessarily increases runtime.
include as much diversity as possible
keep in mind that the tree should still be small enough to visualize (unless you reduce its size during post-processing), and that there are limits to how large a tree and alignment the programs involved can handle

Step 2: Building a reference alignment and tree

This can be done using standard methods. The only requirement that epa-ng (currently) imposes is that you record, or later obtain, the model parameters with which the tree was inferred, and supply them to epa-ng when calling the main placement routine (currently only the GTRGAMMA model is supported).

Step 3: Aligning the query sequences

Now that we have a reference MSA, we need to align our queries against it. There are multiple tools to do this, but we usually recommend either hmmer/hmmalign or papara.

papara actually takes the reference tree into account when aligning sequences, and a call to it would look something like:

papara -t $TREE -s $REF_MSA -q $QRY -r -n some_name

(with $TREE in Newick, $REF_MSA in PHYLIP, and $QRY in FASTA format)

Note the -r option: this is vital to ensure comparability between different query files for the same reference tree, as it forces papara not to add any sites to the original reference alignment!
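
One quick sanity check after this step is to compare the number of sites in the reference alignment with the number of sites in the papara output: if -r did its job, they should match. The sketch below assumes both files start with a standard PHYLIP header line ("<ntaxa> <nsites>"); the function names phylip_width and same_width are made up for this example.

```shell
# Print the number of sites declared in a PHYLIP header line ("<ntaxa> <nsites>").
phylip_width() {
    awk 'NR == 1 { print $2; exit }' "$1"
}

# Succeed only if both alignments declare the same number of sites.
same_width() {
    [ "$(phylip_width "$1")" = "$(phylip_width "$2")" ]
}

# Possible usage (the output file name is an assumption about papara's naming):
#   same_width "$REF_MSA" papara_alignment.some_name || echo "alignment width changed" >&2
```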

Step 4: Placing the query sequences

Before the actual placement can commence, we need to explicitly prepare the input, as epa-ng (currently) only accepts separate query and reference alignment files, both in fasta (or bfast) format. Papara, for example, outputs the aligned queries together with the reference MSA, in phylip format.

(there will be a convenience function for this shortly)
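
Until then, the split can be sketched by hand. The following is a minimal example, not an official tool: it assumes that both PHYLIP files hold one sequence per line after the header, and that the reference sequences keep exactly the same names in the combined papara output. The function name split_papara and the output file names are hypothetical.

```shell
# split_papara REF_PHYLIP COMBINED_PHYLIP OUT_REF_FASTA OUT_QRY_FASTA
# Writes sequences whose names occur in REF_PHYLIP to OUT_REF_FASTA,
# and all remaining sequences (the queries) to OUT_QRY_FASTA.
split_papara() {
    awk -v ref_out="$3" -v qry_out="$4" '
        FNR == 1  { next }                    # skip the "<ntaxa> <nsites>" header of each file
        NF < 2    { next }                    # skip blank or malformed lines
        NR == FNR { is_ref[$1] = 1; next }    # first file: remember the reference names
        {
            # join fields 2..NF in case the sequence is written in space-separated blocks
            seq = ""
            for (i = 2; i <= NF; i++) seq = seq $i
            out = ($1 in is_ref) ? ref_out : qry_out
            printf(">%s\n%s\n", $1, seq) > out
        }
    ' "$1" "$2"
}

# Possible usage (the papara output file name is an assumption):
#   split_papara "$REF_MSA" papara_alignment.some_name reference.fasta query.fasta
```

The two resulting FASTA files can then be passed to epa-ng as the reference and query alignments.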

Finally, the actual call to epa-ng will look something like

epa-ng --tree $TREE --ref-msa $REF_MSA --query $QRY_MSA --out-dir $OUT --model $INFO

$INFO is used to pass the aforementioned model parameters of the reference tree to epa-ng. It may (currently) be one of two things: either a raxml-ng-style model descriptor, like so:

GTR{0.7/1.8/1.2/0.6/3.0/1.0}+FU{0.25/0.23/0.30/0.22}+G4{0.47}

or, alternatively, a RAxML_info file produced by calling RAxML with its -f e option, looking something like

raxmlHPC-AVX -f e -s $REF_MSA -t $TREE -n info -m GTRGAMMAX
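
If the parameters are only available as separate numbers (for example, noted down from an earlier tree inference), the descriptor form can be assembled by hand. The sketch below assumes the usual reading of the raxml-ng-style string: six GTR substitution rates, four user-defined base frequencies (FU), and the alpha shape parameter of a four-category Gamma model (G4). The function name gtr_descriptor is made up for this example.

```shell
# gtr_descriptor "r1 r2 r3 r4 r5 r6" "f1 f2 f3 f4" ALPHA
# Prints a descriptor of the form GTR{rates}+FU{frequencies}+G4{alpha}.
gtr_descriptor() {
    # turn the space-separated lists into the slash-separated form used in the descriptor
    rates=$(printf '%s' "$1" | tr -s ' ' '/')
    freqs=$(printf '%s' "$2" | tr -s ' ' '/')
    printf 'GTR{%s}+FU{%s}+G4{%s}\n' "$rates" "$freqs" "$3"
}

# Possible usage:
#   INFO=$(gtr_descriptor "0.7 1.8 1.2 0.6 3.0 1.0" "0.25 0.23 0.30 0.22" 0.47)
```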
