Commit

Merge branch 'master' into bump_versions
scanon committed Jul 6, 2022
2 parents 8160000 + e9c27b7 commit 62ba609
Showing 7 changed files with 294 additions and 58 deletions.
36 changes: 36 additions & 0 deletions Docker/Dockerfile
@@ -0,0 +1,36 @@
FROM ubuntu:20.04

LABEL base.image="ubuntu:20.04"
LABEL dockerfile.version="1"
LABEL software="BBTools"
LABEL software.version="38.96"
LABEL description="A set of tools labeled as \"Bestus Bioinformaticus\""
LABEL website="https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/"
LABEL license="https://jgi.doe.gov/disclaimer/"
LABEL maintainer="Chienchi Lo"
LABEL maintainer.email="[email protected]"

ENV DEBIAN_FRONTEND=noninteractive
ENV LANG=en_US.UTF-8
ENV JAVA_HOME=/usr/java/openjdk-13
ENV PATH=/usr/java/openjdk-13/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
ENV JAVA_VERSION=13.0.1
ENV JAVA_URL=https://download.java.net/java/GA/jdk13.0.1/cec27d702aa74d5a8630c65ae61e4305/9/GPL/openjdk-13.0.1_linux-x64_bin.tar.gz
ENV JAVA_SHA256=2e01716546395694d3fad54c9b36d1cd46c5894c06f72d156772efbcf4b41335


RUN apt-get update && apt-get install -y build-essential file python \
wget samtools curl && \
apt-get autoclean && rm -rf /var/lib/apt/lists/*

# openjdk-13-jre \
RUN set -eux; curl -fL -o /openjdk.tgz "$JAVA_URL"; echo "$JAVA_SHA256 */openjdk.tgz" | sha256sum -c -; mkdir -p "$JAVA_HOME"; tar --extract --file /openjdk.tgz --directory "$JAVA_HOME" --strip-components 1; rm /openjdk.tgz; ln -sfT "$JAVA_HOME" /usr/java/default; ln -sfT "$JAVA_HOME" /usr/java/latest; for bin in "$JAVA_HOME/bin/"*; do base="$(basename "$bin")"; [ ! -e "/usr/bin/$base" ]; update-alternatives --install "/usr/bin/$base" "$base" "$bin" 20000; done; java -Xshare:dump; java --version; javac --version

RUN wget https://sourceforge.net/projects/bbmap/files/BBMap_38.96.tar.gz && \
tar -xzf BBMap_38.96.tar.gz && \
rm BBMap_38.96.tar.gz

ENV PATH="${PATH}:/bbmap"\
LC_ALL=C

WORKDIR /data
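
The Dockerfile above pins the JDK tarball to a SHA-256 digest and checks it with `sha256sum -c` before unpacking. As a rough illustration of that same check outside the image build, here is a minimal Python sketch. The function names `sha256_of` and `verify_download` are hypothetical and not part of this repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_hex: str) -> None:
    """Raise ValueError when the file's digest differs from the pinned one."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
```

Calling `verify_download(Path("/openjdk.tgz"), expected)` with the value of `JAVA_SHA256` would mirror the pinned-digest check; any mismatch stops the process before the archive is extracted.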
14 changes: 10 additions & 4 deletions README.md
@@ -2,7 +2,7 @@

## Summary

This workflow is replicate the [QA protocol](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/data-preprocessing/) implemented at JGI for Illumina reads and use the program “rqcfilter2” from BBTools(38:44) which implements them as a pipeline.
This workflow replicates the [QA protocol](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/data-preprocessing/) implemented at JGI for Illumina reads and uses the program “rqcfilter2” from BBTools (38.96), which implements the protocol as a pipeline.

## Required Database

@@ -27,15 +27,18 @@ Description of the files:

## The Docker image and Dockerfile can be found here

[microbiomedata/bbtools:38.44](https://hub.docker.com/r/microbiomedata/bbtools)
[microbiomedata/bbtools:38.92](https://hub.docker.com/r/microbiomedata/bbtools)

## Input files

1. database path,
2. fastq (illumina paired-end interleaved fastq),
3. output path
4. memory (optional) ex: "jgi_rqcfilter.memory": "35G"
5. threads (optional) ex: "jgi_rqcfilter.threads": "16"
4. input_interleaved (boolean)
5. forward reads fastq file (when input_interleaved is false)
6. reverse reads fastq file (when input_interleaved is false)
7. memory (optional) ex: "jgi_rqcfilter.memory": "35G"
8. threads (optional) ex: "jgi_rqcfilter.threads": "16"
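
The list above implies that the read-file keys depend on `input_interleaved`. The following Python sketch shows one way to sanity-check an input JSON before submitting it. The key names come from this README; the function name `check_inputs`, the example paths, and the exact rules (non-empty `input_files` in interleaved mode, equal-length `input_fq1`/`input_fq2` otherwise) are assumptions, not behavior confirmed by the workflow itself.

```python
import json

PREFIX = "jgi_rqcfilter."


def check_inputs(inputs: dict) -> None:
    """Check that the read-file keys agree with the interleaved flag.

    The rules below are assumptions inferred from the README's input list.
    """
    if inputs[PREFIX + "input_interleaved"]:
        if not inputs.get(PREFIX + "input_files"):
            raise ValueError("interleaved mode needs at least one entry in input_files")
    else:
        fq1 = inputs.get(PREFIX + "input_fq1", [])
        fq2 = inputs.get(PREFIX + "input_fq2", [])
        if not fq1 or len(fq1) != len(fq2):
            raise ValueError("paired mode needs matching, non-empty input_fq1 and input_fq2")


# Hypothetical paired-end example; the paths are placeholders.
paired = {
    PREFIX + "input_interleaved": False,
    PREFIX + "input_files": [],
    PREFIX + "input_fq1": ["/path/to/sample_R1.fastq.gz"],
    PREFIX + "input_fq2": ["/path/to/sample_R2.fastq.gz"],
    PREFIX + "outdir": "/path/to/rqcfiltered",
}
check_inputs(paired)
print(json.dumps(paired, indent=2))
```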

```
{
@@ -46,6 +49,9 @@ Description of the files:
"/global/cfs/cdirs/m3408/ficus/8434.3.102077.ATGTCA.fastq.gz"
],
"jgi_rqcfilter.outdir": "/global/cfs/cdirs/m3408/ficus_rqcfiltered",
"jgi_rqcfilter.input_interleaved": true,
"jgi_rqcfilter.input_fq1":[],
"jgi_rqcfilter.input_fq2":[],
"jgi_rqcfilter.memory": "35G",
"jgi_rqcfilter.threads": "16"
}
34 changes: 31 additions & 3 deletions docs/index.rst
@@ -11,6 +11,28 @@ Workflow Overview

This workflow utilizes the program “rqcfilter2” from BBTools to perform quality control on raw Illumina reads. The workflow performs quality trimming, artifact removal, linker trimming, adapter trimming, and spike-in removal (using BBDuk), and performs human/cat/dog/mouse/microbe removal (using BBMap).

The following parameters are used for "rqcfilter2" in this workflow::
- qtrim=r : Quality-trim from right ends before mapping.
- trimq=0 : Trim quality threshold.
- maxns=3 : Reads with more Ns than this will be discarded.
- maq=3 : Reads with average quality (before trimming) below this will be discarded.
- minlen=51 : Reads shorter than this after trimming will be discarded. Pairs will be discarded only if both are shorter.
- mlf=0.33 : Reads shorter than this fraction of original length after trimming will be discarded.
- phix=true : Remove reads containing phiX kmers.
- khist=true : Generate a kmer-frequency histogram of the output data.
- kapa=true : Remove and quantify the kapa tag.
- trimpolyg=5 : Trim reads that start or end with a G polymer at least this long.
- clumpify=true : Run clumpify; all deduplication flags require this.
- removehuman=true : Remove human reads via mapping.
- removedog=true : Remove dog reads via mapping.
- removecat=true : Remove cat reads via mapping.
- removemouse=true : Remove mouse reads via mapping.
- barcodefilter=false : Disable the improper-barcode filter.
- chastityfilter=false: Do not remove Illumina reads failing the chastity filter.
- trimfragadapter=true: Trim all known Illumina adapter sequences, including TruSeq and Nextera.
- removemicrobes=true : Remove common contaminant microbial reads via mapping, and place them in a separate file.
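
The parameters listed above are all `key=value` settings. As a minimal sketch of how such a set could be kept in one place and rendered into argument tokens, consider the Python below. The values are copied from the list; the names `RQC_PARAMS`, `format_value`, and `rqcfilter_args` are hypothetical, and how the workflow actually assembles its command line is not shown in this diff.

```python
from typing import Dict, List, Union

ParamValue = Union[bool, int, float, str]

# Values copied from the parameter list above.
RQC_PARAMS: Dict[str, ParamValue] = {
    "qtrim": "r", "trimq": 0, "maxns": 3, "maq": 3, "minlen": 51,
    "mlf": 0.33, "phix": True, "khist": True, "kapa": True,
    "trimpolyg": 5, "clumpify": True, "removehuman": True,
    "removedog": True, "removecat": True, "removemouse": True,
    "barcodefilter": False, "chastityfilter": False,
    "trimfragadapter": True, "removemicrobes": True,
}


def format_value(value: ParamValue) -> str:
    """Render booleans as lowercase true/false and everything else via str()."""
    # bool must be tested first because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def rqcfilter_args(params: Dict[str, ParamValue]) -> List[str]:
    """Turn a parameter mapping into key=value tokens, preserving order."""
    return [f"{key}={format_value(value)}" for key, value in params.items()]


print(" ".join(rqcfilter_args(RQC_PARAMS)))
```

Keeping the settings in a single mapping makes it easier to compare a run's configuration against the documented defaults.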

Workflow Availability
---------------------

@@ -39,7 +61,7 @@ Workflow Dependencies
Third party software (This is included in the Docker image.)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- `BBTools v38.90 <https://jgi.doe.gov/data-and-tools/bbtools/>`_ (License: `BSD-3-Clause-LBNL <https://bitbucket.org/berkeleylab/jgi-bbtools/src/master/license.txt>`_)
- `BBTools v38.96 <https://jgi.doe.gov/data-and-tools/bbtools/>`_ (License: `BSD-3-Clause-LBNL <https://bitbucket.org/berkeleylab/jgi-bbtools/src/master/license.txt>`_)

Requisite database
~~~~~~~~~~~~~~~~~~
@@ -79,8 +101,11 @@ A JSON file containing the following information:
1. the path to the database
2. the path to the interleaved fastq file (input data)
3. the path to the output directory
4. (optional) parameters for memory
5. (optional) number of threads requested
4. input_interleaved (boolean)
5. forward reads fastq file (when input_interleaved is false)
6. reverse reads fastq file (when input_interleaved is false)
7. (optional) parameters for memory
8. (optional) number of threads requested


An example input JSON file is shown below:
@@ -92,6 +117,9 @@ An example input JSON file is shown below:
"jgi_rqcfilter.input_files": [
"/path/to/SRR7877884-int-0.1.fastq.gz"
],
"jgi_rqcfilter.input_interleaved": true,
"jgi_rqcfilter.input_fq1":[],
"jgi_rqcfilter.input_fq2":[],
"jgi_rqcfilter.outdir": "/path/to/rqcfiltered",
"jgi_rqcfilter.memory": "35G",
"jgi_rqcfilter.threads": "16"
7 changes: 5 additions & 2 deletions input.json
@@ -10,5 +10,8 @@
"/global/cfs/cdirs/m3408/ficus/8434.1.102069.ACAGTG.fastq.gz",
"/global/cfs/cdirs/m3408/ficus/8434.3.102077.ATGTCA.fastq.gz"
],
"jgi_rqcfilter.outdir": "/global/cfs/cdirs/m3408/ficus_rqcfiltered"
}
"jgi_rqcfilter.input_fq1":[],
"jgi_rqcfilter.input_fq2":[],
"jgi_rqcfilter.outdir": "/global/cfs/cdirs/m3408/ficus_rqcfiltered",
"jgi_rqcfilter.input_interleaved": true
}