This algorithm clusters mutations as a preprocessing step for tumor phylogenetic reconstruction. While it can be used in conjunction with any phylogenetic algorithm, it is particularly effective in contexts where the reconstruction is sensitive to the presence and absence of clones in samples.
This code partitions mutations according to the samples in which they are present, runs a clustering algorithm on each partition, then merges the resulting sets of clusters.
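The partition, cluster, and merge steps can be illustrated with a minimal Python sketch. This is not the pipeline's actual code: the function names, the `vaf` dictionary layout, and the `threshold` used to decide whether a mutation is present in a sample are all assumptions made for illustration, and `cluster_partition` is a trivial stand-in for the real clustering step (PyClone in this pipeline).

```python
from collections import defaultdict


def partition_by_presence(vaf, threshold=0.05):
    """Group mutation names by the set of samples in which they are present.

    `vaf` maps mutation name -> {sample name -> frequency}.
    `threshold` is a hypothetical presence cutoff, not a pipeline parameter.
    """
    partitions = defaultdict(list)
    for mutation, freqs in vaf.items():
        present = frozenset(s for s, f in freqs.items() if f > threshold)
        partitions[present].append(mutation)
    return dict(partitions)


def cluster_partition(mutations):
    """Placeholder clustering: put every mutation of a partition in one cluster.

    The real pipeline would run a clustering tool on each partition instead.
    """
    return [sorted(mutations)]


def cluster_all(vaf, threshold=0.05):
    """Cluster each presence/absence partition separately, then merge the results."""
    clusters = []
    for mutations in partition_by_presence(vaf, threshold).values():
        clusters.extend(cluster_partition(mutations))
    return clusters


if __name__ == "__main__":
    example = {
        "mutA": {"s1": 0.40, "s2": 0.00},
        "mutB": {"s1": 0.35, "s2": 0.00},
        "mutC": {"s1": 0.20, "s2": 0.30},
    }
    for cluster in cluster_all(example):
        print(";".join(cluster))
```

Because mutations are partitioned before clustering, two mutations with different presence/absence patterns can never end up in the same cluster, regardless of how the per-partition clustering behaves.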
- snakemake
- python3
- pyclone
- gnu parallel
A full example can be found in the `example` directory.
Input data is provided as `example/input.tsv`.
To run the example:
```
cd code
snakemake --configfile ../example/example.config --cores 30
```
This will produce an output file, `example/results/cluster_assignments.txt`.
The clustering pipeline is run using snakemake. To begin, `cd` into the `code/` directory.
Required parameters:
- `input`: The input file in SPRUCE format.
- `outdir`: The output directory. This directory will be created if it does not exist. Use a separate output directory for each dataset.
These parameters can be provided via the command line,
`snakemake --config input='../example/input.tsv' outdir='../example/results'`
or using a configuration file,
`snakemake --configfile ../example/example.config`
The output of the pipeline is `{outdir}/cluster_assignments.txt`. Each line corresponds to a cluster, and the semicolon-separated entries on each line are the character names provided in the input file.
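As a rough illustration of how this output might be consumed downstream, the sketch below reads the file into a list of clusters. The function name `read_cluster_assignments` is hypothetical and not part of the pipeline; the sketch assumes only the format described above (one cluster per line, names separated by semicolons) and skips blank lines.

```python
def read_cluster_assignments(path):
    """Return a list of clusters, each a list of character names."""
    clusters = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            # Split on semicolons and drop empty entries (e.g. from a trailing ';').
            names = [name.strip() for name in line.split(";") if name.strip()]
            clusters.append(names)
    return clusters
```

For the example above, `read_cluster_assignments("../example/results/cluster_assignments.txt")` would return one list of character names per cluster.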