metaT: The Metatranscriptome Workflow

Summary

This workflow is designed to analyze metatranscriptomes.

All parts of this workflow are housed in their own repositories and imported using WDL v1.0 HTTPS imports. The following repositories are used in this workflow:

Version

0.0.6

Third-party tools and packages

To run this workflow, you need a Docker instance (Docker ≥ v2.1.0.3) and Cromwell. All third-party tools are pulled from Docker Hub.

bbtools ≥ v38.94
Python ≥ v3.7.12
pandas ≥ v1.0.5 (python package)
gffutils ≥ v0.10.1 (python package)
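The minimum versions above can be checked before a run. The following is a minimal Python sketch, not part of metaT: the helper names version_tuple and meets_minimum are hypothetical, Docker is left out, and it assumes plain dotted numeric version strings (pre-release suffixes such as rc1 are not handled). How you obtain each installed version, for example from a tool's own version output, is left to the caller.

```python
import sys

# Hypothetical helper, not part of metaT: minimum versions copied from the
# list above (Docker is left out of this sketch).
MINIMUM_VERSIONS = {
    "bbtools": "38.94",
    "python": "3.7.12",
    "pandas": "1.0.5",
    "gffutils": "0.10.1",
}


def version_tuple(version):
    """Turn "1.0.5" into (1, 0, 5) so versions compare numerically.

    Assumes purely numeric dotted versions; "1.0.5rc1" would raise ValueError.
    """
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum(tool, installed):
    """Return True if the installed version is at least the listed minimum."""
    required = version_tuple(MINIMUM_VERSIONS[tool])
    found = version_tuple(installed)
    # Pad the shorter tuple with zeros so "1.1" compares like "1.1.0".
    width = max(len(required), len(found))
    required += (0,) * (width - len(required))
    found += (0,) * (width - len(found))
    return found >= required


if __name__ == "__main__":
    running = ".".join(str(part) for part in sys.version_info[:3])
    print("python", running, "ok" if meets_minimum("python", running) else "too old")
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where "0.9" sorts after "0.10.1" as text.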

Databases

metaT uses the same databases used for metagenome annotation. See the README here for the required databases. For QC databases, see here.

Running workflow

On a server with Shifter

The submit script requests a node and launches Cromwell. Cromwell manages the workflow, using Shifter to run the applications.

java -Dconfig.file=wdls/shifter.conf -jar /full/path/to/cromwell-XX.jar run -i input.json /full/path/to/wdls/metaT.wdl
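If you drive Cromwell from a script rather than typing the command, the same invocation can be assembled as an argument list. This is a minimal Python sketch, not part of metaT: build_cromwell_command and run_workflow are hypothetical names, the default paths mirror the command above, and you must supply the Cromwell jar path yourself because its version is not given here.

```python
import subprocess
from pathlib import Path


def build_cromwell_command(cromwell_jar, config_file, inputs_json, wdl_file):
    """Assemble the `java ... run` command above as a list of arguments.

    The jar and WDL paths are resolved to absolute paths, matching the
    /full/path/to/ placeholders in the example command.
    """
    return [
        "java",
        f"-Dconfig.file={config_file}",
        "-jar", str(Path(cromwell_jar).resolve()),
        "run",
        "-i", str(inputs_json),
        str(Path(wdl_file).resolve()),
    ]


def run_workflow(cromwell_jar, config_file="wdls/shifter.conf",
                 inputs_json="input.json", wdl_file="wdls/metaT.wdl"):
    """Launch Cromwell; raises CalledProcessError if it exits non-zero."""
    cmd = build_cromwell_command(cromwell_jar, config_file, inputs_json, wdl_file)
    subprocess.run(cmd, check=True)
```

Passing a list instead of a single shell string avoids quoting problems when paths contain spaces.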

Docker images

Inputs

{
    "metaT.input_files": ["./test_data/small_test/test_small_interleave.fastq.gz"],
    "metaT.project_id":"nmdc:xxxxxxx",
    "metaT.strand_type": "aRNA"
}
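The inputs file can also be generated programmatically, which helps when preparing runs for many samples. A minimal Python sketch using only the standard library; write_inputs is a hypothetical helper that emits the three metaT.* keys shown above and, as an assumption, simply omits strand_type when no value is given, since that option is optional.

```python
import json


def write_inputs(path, input_files, project_id, strand_type=None):
    """Write a Cromwell inputs file using the metaT.* keys shown above."""
    inputs = {
        "metaT.input_files": list(input_files),
        "metaT.project_id": project_id,
    }
    # strand_type is optional; leave the key out when no value is given.
    if strand_type:
        inputs["metaT.strand_type"] = strand_type
    with open(path, "w") as handle:
        json.dump(inputs, handle, indent=4)
    return inputs
```

For example, write_inputs("input.json", ["./test_data/small_test/test_small_interleave.fastq.gz"], "nmdc:xxxxxxx", "aRNA") reproduces the example above.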

Input option descriptions:

project_id: A unique name for your project or sample.
input_files: Full path to the FASTQ file(s). Each file must be interleaved paired-end FASTQ.
input_fq1 and input_fq2: Full paths to the read 1 and read 2 FASTQ files, for non-interleaved paired-end data.
strand_type: (optional) RNA strandedness: leave blank, or set to aRNA or non_stranded_RNA.
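Before submitting, the rules above can be checked in a few lines. This is a minimal Python sketch with a hypothetical validate_inputs helper. Two points are assumptions rather than documented behavior: that the non-interleaved inputs use the keys metaT.input_fq1 and metaT.input_fq2 (by analogy with the example inputs), and that a run should supply either input_files or the fq1/fq2 pair, not both.

```python
ALLOWED_STRAND_TYPES = {"aRNA", "non_stranded_RNA"}


def validate_inputs(inputs):
    """Return a list of problems in a metaT inputs dict (empty if none found)."""
    problems = []
    if not inputs.get("metaT.project_id"):
        problems.append("metaT.project_id is required")

    has_interleaved = bool(inputs.get("metaT.input_files"))
    # Assumption: the paired-file keys carry the same "metaT." prefix.
    has_pair = bool(inputs.get("metaT.input_fq1")) and bool(inputs.get("metaT.input_fq2"))
    # Exactly one input style should be given (assumed, not documented).
    if has_interleaved == has_pair:
        problems.append(
            "provide either metaT.input_files or both metaT.input_fq1 and metaT.input_fq2"
        )

    strand = inputs.get("metaT.strand_type")
    # strand_type is optional, so a missing or blank value is accepted.
    if strand and strand not in ALLOWED_STRAND_TYPES:
        problems.append(f"unknown metaT.strand_type: {strand!r}")
    return problems
```

Returning a list of messages rather than raising on the first error lets you report every problem in an inputs file at once.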

Outputs

All outputs are written to the outdir folder, which contains the following subfolders:

outdir/annotation: contains gff files from annotation run.
outdir/assembly: contains FASTA files from assembly and BAM files where reads were mapped back to the contigs.
outdir/readMapping: JSON files for sense and antisense reads, with a record for each feature containing its annotation, read counts, and associated statistics.
outdir/readsQC: contains cleaned reads and a file with associated statistics.
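After a run, a quick way to confirm the expected layout is to check for the four subfolders listed above. A minimal Python sketch; missing_subfolders is a hypothetical helper that only checks that the directories exist, not what they contain.

```python
from pathlib import Path

# Subfolder names taken from the output list above.
EXPECTED_SUBFOLDERS = ("annotation", "assembly", "readMapping", "readsQC")


def missing_subfolders(outdir):
    """Return the expected output subfolders that do not exist under outdir."""
    root = Path(outdir)
    return [name for name in EXPECTED_SUBFOLDERS if not (root / name).is_dir()]
```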

Output JSON

The output file is a JSON file called out.json; its records combine read counts with information from annotation. An example JSON record:

{
    "featuretype": "CDS",
    "seqid": "nmdc:xxxxxxx_001",
    "id": "nmdc:xxxxxxx_001_1_588",
    "source": "Prodigal v2.6.3_patched",
    "start": 1,
    "end": 588,
    "length": 588,
    "strand": "+",
    "frame": "0",
    "product": "hypothetical protein",
    "product_source": "Hypo-rule applied",
    "sense_read_count": 25,
    "mean": 5.0,
    "median": 3.0,
    "stdev": 6.1,
    "antisense_read_count": 28,
    "meanA": 7.14,
    "medianA": 7,
    "stdevA": 5.7
}
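Records like the one above can be summarized with a few lines of Python. This sketch assumes out.json holds a JSON array of such records (the file layout is not specified here) and uses only the standard library; load_records, antisense_ratio and top_expressed are hypothetical helpers. The antisense-to-sense ratio is only an example summary, not a metric the workflow reports.

```python
import json


def load_records(path):
    """Load out.json, assuming it holds a JSON array of feature records."""
    with open(path) as handle:
        return json.load(handle)


def antisense_ratio(record):
    """Antisense-to-sense read count ratio (None when there are no sense reads)."""
    sense = record.get("sense_read_count", 0)
    antisense = record.get("antisense_read_count", 0)
    return antisense / sense if sense else None


def top_expressed(records, n=10, featuretype="CDS"):
    """Return the n features of one type with the highest sense read counts."""
    features = [r for r in records if r.get("featuretype") == featuretype]
    return sorted(features, key=lambda r: r.get("sense_read_count", 0), reverse=True)[:n]
```

For the example record above, antisense_ratio returns 1.12 (28 / 25).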

Test

To test the workflow, use the small test dataset and step-by-step guide provided in the test_data folder.
