happi
: a
Hierarchical Approach to Pangenomics Inference
happi
is a method for modeling gene presence in pangenomics that
leverages information about genome quality to improve inference. happi
models the association between an experimental condition and gene
presence where the experimental condition is the primary
predictor of interest and gene presence is the outcome while
incorporating user-chosen information on genome quality metrics
(e.g. mean coverage, contamination, completion, etc…).
You might be interested in using happi
to conduct your pangenomics
hypothesis testing if you work with fragmented genomes such as
metagenome assembled genomes (MAGs). happi
is currently distributed as
an R
package and can be installed using the instructions below.
If you’re new to shotgun metagenomics we understand that things can feel
overwhelming! On top of all the tools and names floating around you’re
probably wondering where does happi
fit into the vast suite of
bioinformatics tools for metagenomics data and how can you use it in
your work? happi
can be used after you have assembled, binned,
annotated, and refined your genomes or metagenome-assembled genomes
(MAGs) and as such it can be used with any bioinformatics workflow that
conducts assembly, binning, annotation, and refinement.
We highly recommend checking out Mike Lee’s resources on genomics to orient yourself to the breadth of metagenomics tools for building a bioinformatics pipeline/workflow from raw reads to assembled genomes. We recommend spending time to understand the pros and cons of the tools you’re using at each step of your workflow, particularly as software and methods continue to improve.
If you’re looking for a platform that wraps a lot of these tools into
its code base to conduct assembly, binning, annotation, and refinement
of MAGs we suggest checking out the anvi’o
platform: metagenomics workflow
here and
pangenomics workflow
here. Future work for
our development team is to integrate happi
into the anvi’o platform
for a seamless user experience. In the meantime, please refer to the
format of your data in the following data input section for what is
required to utilize happi
.
if (!require("devtools", quietly = TRUE))
install.packages("devtools") # check that devtools is installed
devtools::install_github("statdivlab/happi", build_vignettes = TRUE) # install happi using devtools
library(happi)
You will need two pieces of information:
- Information on the presence/absence of your genes in your MAGs/genomes; example here
- Metadata/covariate information for the MAGs/genomes that you’d like to use for hypothesis testing; example here
The vignettes provide detailed information of how to use happi
and all
its main functions through the R
interactive session. You can follow
the vignettes by running the following code in R
:
utils::browseVignettes(package = "happi")
An example snakemake workflow of happi
’s usage has been made available
under the workflows/
folder of this github directory. To run the
example workflow you’ll need to install snakemake. We recommend creating
a conda environment with your snakemake installation:
conda install -n base -c conda-forge mamba
conda activate base
mamba create -c conda-forge -c bioconda -n snakemake snakemake
You can activate your snakemake conda environment and check it:
conda activate snakemake
snakemake --help
Once your snakemake has been installed in a conda environment and
activated you can download the contest in the workflows/
folder and
run the following to execute the workflow from within the folder:
snakemake --cores 6
The Snakefile is customizable for your own input data and parameters.
Please refer to the sample data files that have been provided in
workflows
for formatting of your input data.
If you use happi
please cite our work:
An open-access preprint is available here.
If you have any issues using our software or further questions please submit an issue here.