If you are currently using bam2bakR, we strongly recommend trying fastq2EZbakR, a more flexible alternative that can do everything bam2bakR can and more. fastq2EZbakR is accompanied by EZbakR, a similarly improved and extended version of bakR. Check out our preprint for more details. We will continue to maintain bam2bakR, but all future development will occur on fastq2EZbakR. In the near future, bam2bakR will simply be a wrapper around fastq2EZbakR with reduced functionality.
There is now a single website that hosts documentation for all of the Snakemake workflows that I have developed and will develop, including bam2bakR. Information on this website relevant to bam2bakR includes:
- Introduction to bam2bakR, its required input, and its key output.
- Instructions on how to deploy and run any of these pipelines. Instructions specific to Yale HPC (and, more generally, to systems that use a SLURM scheduler) are also available.
- Details about configuring bam2bakR.
- Description of all output produced by bam2bakR.
- FAQs.
bam2bakR is a Snakemake implementation of the TimeLapse pipeline developed by the Simon lab at Yale. The contributors to the original pipeline are Matthew Simon, Jeremy Schofield, Martin Machyna, Lea Kiefer, and Joshua Zimmer. bam2bakR takes either BAM files (preferred) or FASTQ files as input and produces an easy-to-use table that is compatible with bakR, an R package developed by Isaac Vock (also the developer of bam2bakR). bakR analyzes NR-seq data (TimeLapse-seq, SLAM-seq, etc.) and supports comparative analysis of such data; see the bakR repository for more details.
Version 3.0.0 of bam2bakR comes with a major change meant to significantly cut down on pipeline runtime. In previous versions, bam2bakR used HTSeq to assign sequencing reads to annotated genomic features (e.g., genes). In version 3.0.0, HTSeq was replaced with featureCounts. While HTSeq might take over an hour to assign all reads in a BAM file with 25 million reads to features in a standard human annotation, featureCounts performs the same assignment in a couple of minutes. Unlike HTSeq, featureCounts is multi-threaded, so it can be sped up further by providing it with additional cores.
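To illustrate the multi-threading point, here is a minimal sketch of a standalone featureCounts call. It is not the command that bam2bakR runs internally; the file names (`sample.bam`, `annotation.gtf`, `counts.txt`) and the thread count are placeholders chosen for illustration. `-T` sets the number of threads, `-a` the annotation file, and `-o` the output table.

```shell
# Placeholder inputs (assumptions for illustration, not bam2bakR defaults)
THREADS=8
ANNOTATION="annotation.gtf"
BAM="sample.bam"
OUTPUT="counts.txt"

# The command that would be run: -T = threads, -a = annotation, -o = output table
CMD="featureCounts -T $THREADS -a $ANNOTATION -o $OUTPUT $BAM"
echo "$CMD"

# Run it only if featureCounts is installed and the placeholder inputs exist
if command -v featureCounts >/dev/null 2>&1 && [ -f "$BAM" ] && [ -f "$ANNOTATION" ]; then
    featureCounts -T "$THREADS" -a "$ANNOTATION" -o "$OUTPUT" "$BAM"
else
    echo "featureCounts or input files not found; command not executed" >&2
fi
```

Within bam2bakR you do not call featureCounts yourself; the number of cores available to the pipeline is controlled through the Snakemake command shown in the walk-through.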
Below is a brief walk-through of all of the steps necessary to run bam2bakR. A more detailed description of all of these steps can be found here. A description of all of the tunable parameters in the bam2bakR config file can be found here.
###
# PREREQUISITES: INSTALL MAMBA AND GIT (only need to do ONCE)
###
# CREATE ENVIRONMENT (only need to do ONCE)
mamba create -c conda-forge -c bioconda --name deploy_snakemake snakemake snakedeploy
# CREATE AND NAVIGATE TO WORKING DIRECTORY (only need to do ONCE)
mkdir -p path/to/working/directory
cd path/to/working/directory
# DEPLOY PIPELINE TO YOUR WORKING DIRECTORY (only need to do ONCE)
conda activate deploy_snakemake
snakedeploy deploy-workflow https://github.com/simonlabcode/bam2bakR.git . --branch main
###
# EDIT CONFIG FILE (need to do ONCE PER NEW DATASET)
###
# RUN PIPELINE
# See https://snakemake.readthedocs.io/en/stable/executing/cli.html for details on all of the configurable parameters
snakemake --cores all --use-conda --rerun-triggers mtime
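Before committing to a full run, it can help to preview what Snakemake plans to do. The sketch below adds Snakemake's `-n` (dry-run) flag to the run command above, then launches the real run only if the dry run succeeds. The value `CORES=4` is an arbitrary placeholder rather than a bam2bakR recommendation, and the check for `workflow/Snakefile` assumes the directory layout that snakedeploy creates in the working directory.

```shell
# Assumed placeholder: number of cores to give Snakemake (use "all" for every core)
CORES=4

# Flags shared by the dry run and the real run (same flags as the run command above)
SNAKEMAKE_ARGS="--cores $CORES --use-conda --rerun-triggers mtime"

# Only proceed if snakemake is available and the deployed Snakefile is present
if command -v snakemake >/dev/null 2>&1 && [ -f workflow/Snakefile ]; then
    # -n lists the jobs that would run without executing them;
    # the real run starts only if the dry run exits successfully
    snakemake $SNAKEMAKE_ARGS -n && snakemake $SNAKEMAKE_ARGS
else
    echo "snakemake or workflow/Snakefile not found; nothing was run" >&2
fi
```

`$SNAKEMAKE_ARGS` is left unquoted on purpose so that the shell splits it into separate flags.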
Please cite the following if you end up using bam2bakR in published work:
TimeLapse-seq paper, where the initial pipeline was introduced:
- Schofield JA, Duffy EE, Kiefer L, Sullivan MC, and Simon MD. 2018. TimeLapse-seq: adding a temporal dimension to RNA sequencing through nucleoside recoding. Nature Methods. 15:221-225. doi:10.1038/nmeth.4582.
bakR paper, where the Snakemake implementation was introduced:
- Vock IW and Simon MD. 2023. bakR: uncovering differential RNA synthesis and degradation kinetics transcriptome-wide with Bayesian hierarchical modeling. RNA:rna.079451.122. doi:10.1261/rna.079451.122.
If you have any questions or run into any problems, feel free to post them to Issues.