Identify and count clusters across a series of .sam files.
To list the available options, run:
python run.py --help
Clone the repository:
git clone https://github.com/DomBennett/Project-cluster.git
Or download the zipped folder:
wget https://github.com/DomBennett/Project-cluster/archive/master.zip
- Input data: one .sam file per folder
- CD-HIT (cdhit)
- Python (v2 or v3)
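The input layout (one .sam file per folder) can be checked before a run. Below is a minimal sketch, assuming the sample folders sit directly inside one root directory; the helper `find_sam_files` and its error behaviour are hypothetical, not part of run.py. It uses only the standard library and runs under Python 2 or 3.

```python
import os


def find_sam_files(root):
    """Return {folder_name: path_to_sam} for each subfolder of root.

    Hypothetical helper: assumes every immediate subfolder of root is one
    sample holding exactly one .sam file, and raises ValueError otherwise.
    """
    sam_files = {}
    for name in sorted(os.listdir(root)):
        folder = os.path.join(root, name)
        if not os.path.isdir(folder):
            continue  # ignore loose files in the root directory
        sams = [f for f in os.listdir(folder) if f.endswith('.sam')]
        if len(sams) != 1:
            raise ValueError('expected one .sam file in {0}, found {1}'.format(
                folder, len(sams)))
        sam_files[name] = os.path.join(folder, sams[0])
    return sam_files
```

Raising on a folder with zero or several .sam files is a design choice for the sketch; skipping such folders with a warning would work just as well.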
- Convert .sam to .fasta by extracting the orthologous sequence identified within the .sam file.
- Run cdhit on each .fasta file
- Count clusters containing more than min_nsqs sequences
- Report the number of clusters per .sam file in a .csv file
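The .sam to .fasta conversion step can be illustrated with a simplified sketch. It assumes the sequence of interest is the SEQ column (10th tab-separated field) of each mapped read; how run.py actually identifies the orthologous sequence may differ. The function name `sam_to_fasta` is hypothetical.

```python
def sam_to_fasta(sam_path, fasta_path):
    """Write the SEQ field of every mapped read in a SAM file as FASTA.

    Simplified stand-in for the conversion step (assumption: SEQ, the 10th
    field, holds the sequence to cluster). Returns the number written.
    """
    n = 0
    with open(sam_path) as sam, open(fasta_path, 'w') as fasta:
        for line in sam:
            if line.startswith('@'):
                continue  # header lines (@HD, @SQ, @PG, ...)
            fields = line.rstrip('\n').split('\t')
            if len(fields) < 11:
                continue  # skip blank or malformed lines
            qname, flag, seq = fields[0], int(fields[1]), fields[9]
            if flag & 4 or seq == '*':
                continue  # unmapped read (FLAG bit 0x4) or no stored SEQ
            # Suffix a counter so paired reads sharing a QNAME stay unique.
            fasta.write('>{0}_{1}\n{2}\n'.format(qname, n, seq))
            n += 1
    return n
```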
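Running cdhit from Python could look like the sketch below. It assumes the nucleotide program of the CD-HIT suite, `cd-hit-est`, is on the PATH, and passes only the input (`-i`) and output (`-o`) options, leaving everything else at CD-HIT's defaults. The helpers `cdhit_command` and `run_cdhit` are hypothetical and may not match how run.py invokes cdhit.

```python
import subprocess


def cdhit_command(fasta_path, out_prefix, program='cd-hit-est'):
    """Build a CD-HIT command line (hypothetical helper).

    -i names the input FASTA and -o the output file; CD-HIT also writes
    the cluster listing to <out_prefix>.clstr.
    """
    return [program, '-i', fasta_path, '-o', out_prefix]


def run_cdhit(fasta_path, out_prefix, program='cd-hit-est'):
    """Run CD-HIT and return the path of the resulting .clstr file."""
    subprocess.check_call(cdhit_command(fasta_path, out_prefix, program))
    return out_prefix + '.clstr'
```

Keeping the command builder separate from the call makes it easy to inspect or log the exact command before running it.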
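Counting clusters with more than min_nsqs sequences can be done from the .clstr file CD-HIT writes, where each cluster starts with a `>Cluster N` line followed by one line per member sequence. A minimal sketch, assuming that layout; `cluster_sizes` and `count_clusters` are hypothetical names.

```python
def cluster_sizes(clstr_path):
    """Return the number of member sequences in each cluster of a .clstr file."""
    sizes = []
    with open(clstr_path) as handle:
        for line in handle:
            if line.startswith('>Cluster'):
                sizes.append(0)       # a new cluster begins
            elif line.strip() and sizes:
                sizes[-1] += 1        # one member line of the current cluster
    return sizes


def count_clusters(clstr_path, min_nsqs):
    """Count clusters containing more than min_nsqs sequences."""
    return sum(1 for size in cluster_sizes(clstr_path) if size > min_nsqs)
```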
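The final report can be written with the standard csv module. The sketch below assumes a two-column layout (`sample`, `n_clusters`); the column names and the helper `write_report` are hypothetical, not the project's actual output format.

```python
import csv


def write_report(counts, csv_path):
    """Write {sample_name: n_clusters} to a two-column CSV file."""
    with open(csv_path, 'w') as handle:
        # lineterminator='\n' avoids the csv default of '\r\n' line endings.
        writer = csv.writer(handle, lineterminator='\n')
        writer.writerow(['sample', 'n_clusters'])
        for sample in sorted(counts):
            writer.writerow([sample, counts[sample]])
```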
D.J. Bennett & J.S. Eriksson