
Multiple sequence alignment

Ryan Wick edited this page Jul 30, 2020 · 21 revisions

Requirements

Before running this step, you'll need to have completed the previous one (reconciling contigs). That is, you should have a Trycycler output directory (which I'll assume is called trycycler) containing a subdirectory for each of your good clusters. Each cluster subdirectory should contain a 1_contigs subdirectory and a 2_all_seqs.fasta file.

Concept

This step takes the reconciled contig sequences (2_all_seqs.fasta) and runs a multiple sequence alignment.

For example, it would take sequences like this:

GGCAGAGCGACGTAAATTACGAGTAAAGGAGGGGAGAGCATTAAGCATGCCTAAACTG
GGCAGAGCGCGACGTAAATTACGAGTAAAAGGAGGGAGGAGCATTAAGCCATGCCTACTG
GGCAGAGCGCGACTAAATTTACGAGTAAAGGAGGGAGGAGCATAGCCATGCCTAAACTG

And produce an alignment like this:

GGCAGAG--CGACGTAAA-TTACGAGT-AAAGGAGGGGA-GAGCATTAAG-CATGCCTAAACTG
GGCAGAGCGCGACGTAAA-TTACGAGTAAAAGGA-GGGAGGAGCATTAAGCCATGCCT--ACTG
GGCAGAGCGCGAC-TAAATTTACGAGT-AAAGGA-GGGAGGAGCAT--AGCCATGCCTAAACTG
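Two properties make the block above a valid alignment of the three input sequences: every row has the same length, and removing the gap characters from a row gives back the corresponding input. The following is a minimal sketch of that check, using the example sequences copied from above. The function name `is_valid_alignment` is made up for this illustration and is not part of Trycycler.

```python
# The example sequences and alignment shown above, copied verbatim.
unaligned = [
    "GGCAGAGCGACGTAAATTACGAGTAAAGGAGGGGAGAGCATTAAGCATGCCTAAACTG",
    "GGCAGAGCGCGACGTAAATTACGAGTAAAAGGAGGGAGGAGCATTAAGCCATGCCTACTG",
    "GGCAGAGCGCGACTAAATTTACGAGTAAAGGAGGGAGGAGCATAGCCATGCCTAAACTG",
]
aligned = [
    "GGCAGAG--CGACGTAAA-TTACGAGT-AAAGGAGGGGA-GAGCATTAAG-CATGCCTAAACTG",
    "GGCAGAGCGCGACGTAAA-TTACGAGTAAAAGGA-GGGAGGAGCATTAAGCCATGCCT--ACTG",
    "GGCAGAGCGCGAC-TAAATTTACGAGT-AAAGGA-GGGAGGAGCAT--AGCCATGCCTAAACTG",
]


def is_valid_alignment(unaligned, aligned):
    """Return True if `aligned` is a gapped version of `unaligned`.

    Hypothetical helper for illustration only (not a Trycycler function).
    """
    # One aligned row per input sequence.
    same_count = len(unaligned) == len(aligned)
    # Every row of an alignment must have the same number of columns.
    same_length = len({len(row) for row in aligned}) == 1
    # Gaps ('-') are only padding: removing them must recover the input.
    same_bases = all(
        row.replace("-", "") == seq for seq, row in zip(unaligned, aligned)
    )
    return same_count and same_length and same_bases


print(is_valid_alignment(unaligned, aligned))
```

The alignment only inserts gaps. It never changes, reorders or removes bases.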

Running Trycycler msa

The Trycycler msa command must be run separately for each of your good clusters. Assuming your good clusters are numbers 1, 2 and 3, these are the commands you would run:

trycycler msa --cluster_dir trycycler/cluster_001
trycycler msa --cluster_dir trycycler/cluster_002
trycycler msa --cluster_dir trycycler/cluster_003
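If you have many clusters, you may prefer to script this. The sketch below is one possible approach, not something Trycycler provides. It assumes that good clusters live in directories named `cluster_*` and that a cluster is ready for this step if it contains a 2_all_seqs.fasta file. The function names are hypothetical. The only Trycycler interface it uses is the `trycycler msa --cluster_dir` command shown above.

```python
import subprocess
from pathlib import Path


def clusters_ready_for_msa(trycycler_dir):
    """Return cluster directories containing 2_all_seqs.fasta, sorted by name.

    Assumption: cluster directories are named cluster_* (as in the
    commands above). Adjust the pattern if yours are named differently.
    """
    root = Path(trycycler_dir)
    return sorted(
        d for d in root.glob("cluster_*")
        if d.is_dir() and (d / "2_all_seqs.fasta").is_file()
    )


def run_msa(cluster_dir):
    """Run `trycycler msa` on one cluster directory.

    check=True raises CalledProcessError if the command exits with an error.
    """
    subprocess.run(
        ["trycycler", "msa", "--cluster_dir", str(cluster_dir)], check=True
    )


def run_msa_on_all(trycycler_dir="trycycler"):
    """Run the msa step on each ready cluster, one after another."""
    for cluster_dir in clusters_ready_for_msa(trycycler_dir):
        run_msa(cluster_dir)
```

Calling `run_msa_on_all("trycycler")` would then be equivalent to typing the three commands above, provided clusters 1, 2 and 3 are the only ones containing a 2_all_seqs.fasta file.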

Unlike previous Trycycler steps, the msa step is hands-off: no manual intervention is required. Just run it and wait for it to finish.

Trycycler msa will typically take a few minutes. Longer sequences and larger numbers of sequences will be slower.

Settings

Trycycler msa has the following parameters you can adjust:

  • --kmer: the k-mer size used for sequence partitioning (default = 32).
  • --step: the step size used for sequence partitioning (default = 1000).
  • --lookahead: the look-ahead margin used for sequence partitioning (default = 10000).
  • --threads: the number of parallel instances of MUSCLE used to align the sequence partitions. This only affects speed, not the result, so you'll probably want to use as many threads as you have available.

You likely won't need to adjust the partitioning parameters (--kmer, --step and --lookahead) and can just leave them at the defaults. If you're curious about what they are used for, see How multiple sequence alignment partitioning works.
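As an illustration of how these options fit together, here is a small sketch that builds a `trycycler msa` command line. The helper `msa_command` is hypothetical. It leaves the partitioning options out unless you set them, so Trycycler's own defaults apply. It also assumes that the number of CPUs reported by Python is a reasonable thread count, which may not be the case on a shared machine or under a job scheduler.

```python
import os


def msa_command(cluster_dir, threads=None, kmer=None, step=None, lookahead=None):
    """Build the argument list for `trycycler msa`.

    Hypothetical helper: partitioning options are added only when set.
    """
    if threads is None:
        # os.cpu_count() can return None, so fall back to a single thread.
        threads = os.cpu_count() or 1
    command = [
        "trycycler", "msa",
        "--cluster_dir", str(cluster_dir),
        "--threads", str(threads),
    ]
    for flag, value in (("--kmer", kmer), ("--step", step), ("--lookahead", lookahead)):
        if value is not None:
            command += [flag, str(value)]
    return command


# Show the command that would be run for cluster 1 with 8 threads.
print(" ".join(msa_command("trycycler/cluster_001", threads=8)))
```

The resulting list can be passed directly to `subprocess.run`.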

Output

When finished, Trycycler msa will make a 3_msa.fasta file in the cluster directory. This is a FASTA-formatted multiple sequence alignment of the contigs, ready for use in generating a consensus. The consensus step will also need partitioned reads, so that's the next step in the process.
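If you'd like to sanity-check the result before moving on, the following sketch reads a FASTA alignment and reports the number of sequences, the alignment length and the gap count per sequence. It is not part of Trycycler, and the function names are made up for this example. It assumes gaps are written as `-` and that a sequence may span several lines.

```python
def read_fasta(path):
    """Return a list of (name, sequence) tuples from a FASTA file."""
    records, name, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def summarise_msa(path):
    """Summarise an alignment, raising ValueError if it is not rectangular."""
    records = read_fasta(path)
    if not records:
        raise ValueError(f"no sequences found in {path}")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"sequences in {path} differ in length: {sorted(lengths)}")
    return {
        "sequences": len(records),
        "alignment_length": lengths.pop(),
        "gaps": {name: seq.count("-") for name, seq in records},
    }
```

For example, `summarise_msa("trycycler/cluster_001/3_msa.fasta")` should report one sequence per reconciled contig, all with the same alignment length.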
