This is the Nextstrain build for Ebola, visible at nextstrain.org/ebola.
The build encompasses fetching data, preparing it for analysis, doing quality control, performing analyses, and saving the results in a format suitable for visualization (with auspice). This involves running components of Nextstrain such as fauna and augur.
All Ebola-specific steps and functionality for the Nextstrain pipeline should be housed in this repository.
If you're unfamiliar with Nextstrain builds, you may want to follow our quickstart guide first and then come back here.
The easiest way to run this pathogen build is using the Nextstrain command-line tool:
nextstrain build .
See the nextstrain-cli README for how to install the nextstrain command.
Alternatively, you should be able to run the build using snakemake within a
suitably-configured local environment. Details of setting that up are not yet
well-documented, but will be in the future.
Build output goes into the directories data/, results/ and auspice/.
Once you've run the build, you can view the results in auspice:
nextstrain view auspice/
Configuration takes place entirely within the Snakefile. It can be read top to bottom: each rule
specifies its file inputs and outputs as well as its parameters. There is little indirection, and each
rule can be reasoned about on its own.
This build starts by pulling sequences from our live fauna database (a RethinkDB instance). This
requires the environment variables RETHINK_HOST and RETHINK_AUTH_KEY to be set.
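As a minimal sketch of setting these variables in a POSIX shell before running the build: the host name and key below are placeholders, not real credentials, so substitute the values you were given. The final two lines are an optional guard (an addition for illustration, not part of the build) that stops with an error if either variable is unset or empty.

```shell
# Placeholder values (assumptions): replace with your actual host and key.
export RETHINK_HOST="rethink.example.org"
export RETHINK_AUTH_KEY="replace-with-your-auth-key"

# Optional guard: abort with a message if either variable is unset or empty.
: "${RETHINK_HOST:?RETHINK_HOST must be set}"
: "${RETHINK_AUTH_KEY:?RETHINK_AUTH_KEY must be set}"
```

Because the variables are exported, commands started afterwards from the same shell session inherit them.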
If you don't have access to our database, you can run the build using the example data provided in
this repository. Before running the build, copy the example sequences into the data/ directory
like so:
mkdir -p data/
cp example_data/ebola.fasta data/
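Before starting the build, it can be worth confirming that the copy worked. The sketch below is one way to do that, run from the repository root: count_fasta_records is a hypothetical helper defined here for illustration (it is not part of Nextstrain), and it simply counts FASTA header lines, which begin with ">".

```shell
# Hypothetical helper: count FASTA records (header lines start with ">").
count_fasta_records() {
  grep -c '^>' "$1"
}

# Report the record count if the file exists and is non-empty.
if [ -s data/ebola.fasta ]; then
  echo "data/ebola.fasta contains $(count_fasta_records data/ebola.fasta) sequences"
else
  echo "data/ebola.fasta is missing or empty; copy the example data first" >&2
fi
```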