Pathogen pipeline

Identifying pathogens in human post-mortem brain samples.

Uses an existing pipeline converted to Snakemake.

Getting Started

You will need some data to run the pipeline. Test data is included, but it will not produce meaningful results.

Prerequisites

Required tools:

Your system needs Python 3. The installation steps in this README also use virtualenv and pip3.

Installing

Clone the project:

git clone https://[email protected]/JoukeProfijt/snakemake-pipeline.git
cd snakemake-pipeline

Create a virtual environment with Snakemake installed:

virtualenv -p /usr/bin/python3 venv_location
source venv_location/bin/activate

pip3 install snakemake

Run the pipeline:

cd src
snakemake --snakefile pathogen_run.smk
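
Before a full run it can help to do a dry run, which lists the jobs Snakemake would execute without running them. The sketch below only assembles the command line; the helper name snakemake_command and the core count of 4 are illustrative assumptions, while --snakefile, --cores and -n (dry run) are standard Snakemake options.

```python
def snakemake_command(snakefile="pathogen_run.smk", cores=1, dry_run=False):
    """Build the argument list for a snakemake call (hypothetical helper)."""
    cmd = ["snakemake", "--snakefile", snakefile, "--cores", str(cores)]
    if dry_run:
        # -n asks Snakemake to plan the jobs without executing them
        cmd.append("-n")
    return cmd


# Show the command for a dry run on 4 cores (the core count is an assumption)
print(" ".join(snakemake_command(cores=4, dry_run=True)))
# prints: snakemake --snakefile pathogen_run.smk --cores 4 -n
```

The resulting list can be passed to subprocess.run from inside the src directory, or the printed line can be typed directly in the shell.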

Configuration

Configuration is split into four sections:

Run config
Data locations
Script locations
Constants

Run config

Only one option:

The number of threads to use. Make sure to pass --cores to the snakemake command if you want to use the maximum number of threads.

thread_max: thread_count
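
The reason --cores matters: Snakemake never gives a rule more threads than the number of cores granted on the command line. A minimal sketch of that relationship, assuming thread_max is used as the threads value of the pipeline's rules (the function name effective_threads is hypothetical):

```python
def effective_threads(thread_max: int, cores: int) -> int:
    """Threads a rule ends up with when it requests thread_max under --cores."""
    if thread_max < 1 or cores < 1:
        raise ValueError("thread_max and cores must be positive integers")
    # Snakemake scales a rule's threads down to the cores it was given
    return min(thread_max, cores)


print(effective_threads(thread_max=8, cores=1))   # prints: 1
print(effective_threads(thread_max=8, cores=16))  # prints: 8
```

In other words, setting thread_max to 8 only has an effect when snakemake is started with --cores 8 or higher.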

Data Locations

Defines where the data for your run is located.

sample_dir: directory where samples are located
pathogen_fasta: pathogen fasta file location
human_fasta: human fasta file location
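
A wrong path in this section is an easy mistake to make, so it can be worth checking the three entries before starting a run. The sketch below is not part of the pipeline; it assumes the configuration has already been loaded into a plain dict, and the function name missing_data_paths is hypothetical.

```python
from pathlib import Path


def missing_data_paths(config: dict) -> list:
    """Return the data-location keys that are unset or point to nothing."""
    # sample_dir must be a directory, the two fasta entries must be files
    checks = {
        "sample_dir": Path.is_dir,
        "pathogen_fasta": Path.is_file,
        "human_fasta": Path.is_file,
    }
    missing = []
    for key, exists in checks.items():
        value = config.get(key)
        if not value or not exists(Path(value)):
            missing.append(key)
    return missing


# An empty configuration reports every key as missing
print(missing_data_paths({}))
# prints: ['sample_dir', 'pathogen_fasta', 'human_fasta']
```

An empty list means all three locations exist and the run can start.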

Script locations

Defines script locations.

get_genomes: path to get_genomes.py, default in scripts
genome_dict_obj: location where generated genome dict is stored
accession_to_name: path to accession_to_name.py, default in scripts
pileup: path to pileup.sh in bbmap install
pileup_mem: how much memory is allocated to the pileup runs
exclude_py: path to exclude_0_Allignment.py
get_total: path to calculate_unidentified.py
R_script: path to barplot R script

Constants

Configures output paths.

workdir: main working directory, default output_dir/
human_index: where the human index is stored, default output_dir/index/human/
pathogen_index: where the pathogen index is stored, default output_dir/index/pathogen/
sam_dir: where sam files will be generated, default output_dir/sam_output/
tmp: temporary directory, default output_dir/tmp/
unmapped_dir: unmapped sample file location, default output_dir/Unmapped/
log_dir: run log location, default output_dir/log/
bam_dir: location for bam files, default output_dir/bam_output/
mapped_bam: location for mapped bam files, default output_dir/bam_output/mapped_bam/
results: location for results, default output_dir/results/
accession: part of results, accession numbers, default output_dir/results/accession_numbers/
science_names: part of results, scientific names, default output_dir/results/scientific_names/
single_names: part of results, single names, default output_dir/results/scientific_names/single_names/
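
All defaults above sit under the working directory, so moving the output elsewhere means changing every entry consistently. The sketch below generates such a set from one root directory. It only mirrors the defaults listed above; whether the pipeline itself derives these paths from workdir is not stated here, and the name output_paths and the example root /data/run1 are assumptions.

```python
from pathlib import PurePosixPath

# Default sub-paths from the list above, relative to workdir
DEFAULT_SUBPATHS = {
    "human_index": "index/human",
    "pathogen_index": "index/pathogen",
    "sam_dir": "sam_output",
    "tmp": "tmp",
    "unmapped_dir": "Unmapped",
    "log_dir": "log",
    "bam_dir": "bam_output",
    "mapped_bam": "bam_output/mapped_bam",
    "results": "results",
    "accession": "results/accession_numbers",
    "science_names": "results/scientific_names",
    "single_names": "results/scientific_names/single_names",
}


def output_paths(workdir: str = "output_dir") -> dict:
    """Build the full set of output paths under one working directory."""
    root = PurePosixPath(workdir)
    paths = {"workdir": f"{root}/"}
    for key, sub in DEFAULT_SUBPATHS.items():
        # Keep the trailing slash used by the defaults
        paths[key] = f"{root / sub}/"
    return paths


# Print a relocated set of entries (the root is a hypothetical example)
for key, path in output_paths("/data/run1").items():
    print(f"{key}: {path}")
```

Called without arguments, output_paths() reproduces the defaults listed above.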

Built With

Snakemake - Workflow manager

Authors

Jouke Profijt - Jouke Profijt (Bitbucket), CxJuke (GitHub)

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE.md file for details.

Acknowledgments

Original pipeline by Iris Gorter.
This project was an assignment in my Bio-informatics Dataprocessing course.
