Identifying pathogens in human post mortem brain.
Uses an existing pipeline converted to Snakemake.
You will need some data to run the pipeline. Test data is included, but it will not give any meaningful results.
required tools:
your system needs python3
clone project
git clone https://[email protected]/JoukeProfijt/snakemake-pipeline.git
cd snakemake-pipeline
create virtual environment with snakemake installed
virtualenv -p /usr/bin/python3 venv_location
source venv_location/bin/activate
pip3 install snakemake
run pipeline
cd src
snakemake --snakefile pathogen_run.smk
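The run step can also be scripted. The following is a minimal sketch, assuming it is started from the `src` directory with the virtual environment active. The helper name `build_snakemake_command` and the core count of 4 are illustrative and not part of this project.

```python
import shlex


def build_snakemake_command(snakefile="pathogen_run.smk", cores=None):
    """Return the snakemake invocation as an argument list (hypothetical helper)."""
    command = ["snakemake", "--snakefile", snakefile]
    if cores is not None:
        # --cores limits the total number of cores Snakemake may use
        command += ["--cores", str(cores)]
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first
    print(shlex.join(build_snakemake_command(cores=4)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` if you want to launch the pipeline from Python.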
The configuration is split into 4 sections:
- Run config
- Data locations
- Script locations
- Constants
only one option:
number of threads to use. Make sure to pass --cores in the snakemake command if you want to use the maximum number of threads.
thread_max: thread_count
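Why `--cores` matters: Snakemake scales a rule's thread count down to the number of cores it was given on the command line. The sketch below illustrates that relationship, assuming `thread_max` is used as the `threads` value of the rules. The function name `effective_threads` is hypothetical.

```python
def effective_threads(thread_max, cores):
    """Threads a rule actually gets: never more than the cores passed via --cores."""
    # Assumption: thread_max from the config is the rule's requested thread count
    return max(1, min(thread_max, cores))


if __name__ == "__main__":
    for thread_max, cores in [(8, 4), (8, 8), (2, 16)]:
        print(f"thread_max={thread_max}, --cores {cores} -> "
              f"{effective_threads(thread_max, cores)} threads")
```

In other words, setting `thread_max` higher than the value passed to `--cores` has no effect.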
defines where the input data for your run is located
sample_dir: directory where samples are located
pathogen_fasta: pathogen_fasta file
human_fasta: human fasta file location
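Before starting a long run it can help to check that these locations exist. This is a minimal sketch, assuming the three keys above are available as a flat Python dictionary (how the real config file is nested is not shown here). The function name `missing_data_locations` and the example paths are hypothetical.

```python
import os

# Keys taken from the data locations section above
DATA_KEYS = ("sample_dir", "pathogen_fasta", "human_fasta")


def missing_data_locations(config):
    """Return the data location keys that are unset or point to a missing path."""
    missing = []
    for key in DATA_KEYS:
        path = config.get(key)
        if not path or not os.path.exists(path):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Example values are placeholders, not the project's defaults
    example = {
        "sample_dir": "data/samples/",
        "pathogen_fasta": "data/pathogens.fasta",
        "human_fasta": "data/human.fasta",
    }
    print("missing:", missing_data_locations(example))
```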
defines script locations
get_genomes: path to get_genomes.py, default in scripts
genome_dict_obj: location where generated genome dict is stored
accession_to_name: path to accession_to_name.py, default in scripts
pileup: path to pileup.sh in bbmap install
pileup_mem: how much memory is allocated to the pileup runs
exclude_py: path to exclude_0_Allignment.py
get_total: path to calculate_unidentified.py
R_script: path to barplot R script
configures output paths.
workdir: main working directory, default output_dir/
human_index: where the human index is stored, default output_dir/index/human/
pathogen_index: where the pathogen index is stored, default output_dir/index/pathogen/
sam_dir: where sam files will be generated, default output_dir/sam_output/
tmp: temporary directory, default output_dir/tmp/
unmapped_dir: unmapped sample file location, default output_dir/Unmapped/
log_dir: run log location, default output_dir/log/
bam_dir: location for bam files, default output_dir/bam_output/
mapped_bam: location for mapped bam files, default output_dir/bam_output/mapped_bam/
results: location for results, default output_dir/results/
accession: part of results, accession numbers, default output_dir/results/accession_numbers/
science_names: part of results, scientific names, default output_dir/results/scientific_names/
single_names: part of results, single names, default output_dir/results/scientific_names/single_names/
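All defaults above live under the working directory, so they can be derived from `workdir` alone. The sketch below rebuilds the default layout as a dictionary, which is handy when moving the whole output tree somewhere else. The function name `default_output_paths` is hypothetical, and the pipeline itself may compose these paths differently.

```python
# Default sub-paths relative to workdir, copied from the list above
DEFAULT_SUBDIRS = {
    "human_index": "index/human/",
    "pathogen_index": "index/pathogen/",
    "sam_dir": "sam_output/",
    "tmp": "tmp/",
    "unmapped_dir": "Unmapped/",
    "log_dir": "log/",
    "bam_dir": "bam_output/",
    "mapped_bam": "bam_output/mapped_bam/",
    "results": "results/",
    "accession": "results/accession_numbers/",
    "science_names": "results/scientific_names/",
    "single_names": "results/scientific_names/single_names/",
}


def default_output_paths(workdir="output_dir/"):
    """Return all output paths for a given working directory."""
    base = workdir.rstrip("/") + "/"
    paths = {"workdir": base}
    for key, subdir in DEFAULT_SUBDIRS.items():
        paths[key] = base + subdir
    return paths


if __name__ == "__main__":
    for key, path in default_output_paths("my_run").items():
        print(f"{key}: {path}")
```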
- Snakemake - Workflow manager
- Jouke Profijt - Jouke Profijt on Bitbucket, CxJuke on GitHub
This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details
- Original Pipeline by Iris Gorter
- This project was an assignment in my Bio-informatics Dataprocessing course.