Partitioning reads
Before this step, you'll need to have completed Trycycler reconcile for each of your good clusters. I.e. each cluster directory should have a 2_all_seqs.fasta file. You'll also need the same long-read set you used in previous steps, which I'll assume is in reads.fastq.
Now that you have reconciled sequences for each cluster, this step will partition your reads between these clusters. I.e. each read will be assigned to whichever cluster it aligns to best and saved into a file for that cluster.
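Here is a rough bash illustration of the "best cluster" idea. It is not Trycycler's own code. It assumes you already have a hypothetical tab-separated table of read ID, cluster name and an alignment score (for example, aligned bases), and the function name best_cluster_per_read is made up for this example. For each read, it keeps the cluster with the highest score.

```shell
# Hypothetical sketch, not part of Trycycler.
# Input (stdin or files): tab-separated lines of read_id, cluster, score.
# Output: one line per read with the highest-scoring cluster (order not guaranteed).
best_cluster_per_read() {
    awk -F'\t' '
        # Keep this row if it is the first for the read or beats the current best score.
        !($1 in best) || $3 + 0 > best[$1] { best[$1] = $3 + 0; cluster[$1] = $2 }
        END { for (r in cluster) print r "\t" cluster[r] }
    ' "$@"
}
```

For example, a read with scores of 4000 against cluster_001 and 4500 against cluster_002 would be assigned to cluster_002.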
This step is run once for your entire genome (i.e. not on a per-cluster basis).
This step takes your cluster directories as input, each of which must have the 2_all_seqs.fasta file made by Trycycler reconcile.
Assuming you have deleted all of the bad cluster directories (i.e. the only cluster directories left are the good ones on which you've run Trycycler reconcile), this command would do the trick:
trycycler partition --reads reads.fastq --cluster_dirs trycycler/cluster_*
Note that the star in that command is a glob which will expand to all the cluster directories. You could also explicitly list the cluster directories like this (assuming your good clusters are numbers 1, 2 and 3):
trycycler partition --reads reads.fastq --cluster_dirs trycycler/cluster_001 trycycler/cluster_002 trycycler/cluster_003
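Because Trycycler partition needs a 2_all_seqs.fasta in every cluster directory you give it, it can be handy to check this before running. Below is a minimal bash sketch. The function check_cluster_dirs is a hypothetical helper, not part of Trycycler. It reports any directory whose 2_all_seqs.fasta is missing or empty and returns a non-zero status if any are found.

```shell
# Hypothetical helper, not part of Trycycler: confirm that each cluster
# directory has a non-empty 2_all_seqs.fasta made by Trycycler reconcile.
check_cluster_dirs() {
    local missing=0
    for dir in "$@"; do
        if [ ! -s "$dir/2_all_seqs.fasta" ]; then
            echo "Missing or empty: $dir/2_all_seqs.fasta" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: check_cluster_dirs trycycler/cluster_* && trycycler partition --reads reads.fastq --cluster_dirs trycycler/cluster_*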
Trycycler partition has the following parameters you can adjust:
- --min_aligned_len: reads with fewer than this many aligned bases (default = 1000) will be ignored.
- --min_read_cov: reads with less than this percentage of their length covered by alignments (default = 90.0) will be ignored.
- --threads: the number of threads Trycycler will use for read alignment. This only affects speed, so you'll probably want to use as many threads as you have available.
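The two filtering thresholds combine as in the bash sketch below. It is an illustration, not Trycycler's implementation. It assumes read coverage is computed as aligned bases divided by read length, times 100, and the function name read_passes_filters is hypothetical. The optional third and fourth arguments stand in for --min_aligned_len and --min_read_cov, with the documented defaults.

```shell
# Hypothetical illustration of the two partition filters, not Trycycler code.
# Args: read length, bases of the read covered by alignments,
#       [min_aligned_len, default 1000], [min_read_cov in %, default 90.0].
# Returns 0 if the read would be kept, 1 otherwise.
read_passes_filters() {
    awk -v len="$1" -v aligned="$2" \
        -v min_len="${3:-1000}" -v min_cov="${4:-90.0}" \
        'BEGIN {
            # && short-circuits, so a zero length never reaches the division.
            exit !(len > 0 && aligned >= min_len && 100.0 * aligned / len >= min_cov)
        }'
}
```

For example, with the defaults a 900 bp read fails even if it aligns end to end (it has fewer than 1000 aligned bases), while a 5000 bp read with 4800 aligned bases (96% coverage) passes.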
After Trycycler partition completes, each of the cluster directories should have a 4_reads.fastq file which contains its share of the total reads.
You may notice that when Trycycler partition finishes, not all of the reads have been assigned to a cluster. This is especially apparent when you only have a single cluster (e.g. for a one-chromosome-no-plasmids genome).
This is normal: the missing reads are simply those which failed to meet the --min_aligned_len and --min_read_cov thresholds.
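To see how many reads were left unassigned, you can compare the read counts in each cluster's 4_reads.fastq against your original read file. Here is a minimal bash sketch. The functions count_fastq_reads and report_partition are hypothetical helpers, not part of Trycycler. They assume uncompressed FASTQ with four lines per read (no wrapped sequence or quality lines), and they rely on each read going to at most one cluster.

```shell
# Hypothetical helpers, not part of Trycycler.
# Count reads in an uncompressed FASTQ, assuming four lines per record.
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Args: original reads file, then one or more cluster directories.
report_partition() {
    local reads_file=$1; shift
    local total assigned=0 n
    total=$(count_fastq_reads "$reads_file")
    for dir in "$@"; do
        n=$(count_fastq_reads "$dir/4_reads.fastq")
        echo "$dir: $n reads"
        assigned=$((assigned + n))
    done
    echo "Assigned: $assigned / $total"
    echo "Unassigned: $((total - assigned))"
}
```

Usage: report_partition reads.fastq trycycler/cluster_*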