Showing 113 changed files with 8,518 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Sphinx build info version 1 | ||
# This file records the configuration used when building these files. When it is not found, a full rebuild will be done. | ||
config: 4dc3b1b5f2582bb3fc7bd1eb47eda37b | ||
tags: 645f666f9bcd5a90fca523b33c5a78b7 |
Binary file not shown (13 files).
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
.. _home-page-about: | ||
|
||
*********************** | ||
About Master of Pores 3 | ||
*********************** | ||
|
||
.. autosummary:: | ||
:toctree: generated | ||
|
||
|
||
.. |docker| image:: https://img.shields.io/badge/Docker-v20.10.8-blue | ||
.. |status| image:: https://github.com/biocorecrg/master_of_pores/actions/workflows/build.yml/badge.svg | ||
.. |license| image:: https://img.shields.io/badge/License-MIT-yellow.svg | ||
.. |nver| image:: https://img.shields.io/badge/Nextflow-21.04.1-brightgreen | ||
.. |sing| image:: https://img.shields.io/badge/Singularity-v3.2.1-green.svg | ||
|
||
.. list-table:: | ||
:widths: 10 10 10 10 10 | ||
:header-rows: 0 | ||
|
||
* - |docker| | ||
- |status| | ||
- |license| | ||
- |nver| | ||
- |sing| | ||
|
||
`Master of Pores 3 <https://github.com/biocorecrg/master_of_pores>`_ is a collection of pipelines written in Nextflow DSL2 for the analysis of Nanopore data. It can handle reads from direct RNAseq, cDNAseq, DNAseq etc. | ||
|
||
The software is composed of four pipelines: | ||
|
||
- mop_preprocess: preprocessing of input data. Basecalling, demultiplexing, alignment, read counts, and more! | ||
- mop_mod: detecting chemical modifications. It reads the output directly from mop_preprocess | ||
- mop_tail: estimating polyA tail size. It reads the output directly from mop_preprocess | ||
- mop_consensus: it generates a consensus from the predictions from mop_mod. It reads the output directly from mop_mod | ||
|
||
The name is inspired by Metallica's `Master Of Puppets <https://www.youtube.com/watch?v=S7blkui3nQc>`_ | ||
|
||
.. image:: ../img/goku3.png | ||
:width: 600 | ||
|
||
This is a joint project between the `CRG bioinformatics core <https://biocore.crg.eu/>`_ and the `Epitranscriptomics and RNA Dynamics research group <https://public-docs.crg.es/enovoa/public/website/index.html>`_. | ||
|
||
|
||
Reference | ||
====================== | ||
|
||
If you use this tool, please cite our papers: | ||
|
||
`"Nanopore Direct RNA Sequencing Data Processing and Analysis Using MasterOfPores" <https://link.springer.com/protocol/10.1007/978-1-0716-2962-8_13>`__ Cozzuto L, Delgado-Tejedor A, Hermoso Pulido T, Novoa EM, Ponomarenko J. N. Methods Mol Biol. 2023;2624:185-205. doi: 10.1007/978-1-0716-2962-8_13. | ||
|
||
`"MasterOfPores: A Workflow for the Analysis of Oxford Nanopore Direct RNA Sequencing Datasets" <https://doi.org/10.3389/fgene.2020.00211>`_ Luca Cozzuto, Huanle Liu, Leszek P. Pryszcz, Toni Hermoso Pulido, Anna Delgado-Tejedor, Julia Ponomarenko, Eva Maria Novoa. Front. Genet., 17 March 2020. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
******************* | ||
Benchmark | ||
******************* | ||
|
||
We tested MoP on two MinION runs using the CRG HPC cluster, where we can run up to 100 jobs in parallel (maximum 8 CPUs each) and use up to 10 GPU cards (GeForce RTX 2080 Ti). The test dataset is published at `ENA <https://www.ebi.ac.uk/>`_ under the accessions `ERR5296640 <https://www.ebi.ac.uk/ena/browser/view/ERR5296640>`__ for pU samples and `ERR5303454 <https://www.ebi.ac.uk/ena/browser/view/ERR5303454>`__ for Nm samples. | ||
|
||
|
||
|
||
.. list-table:: Dataset | ||
|
||
* - | ||
- MOP_PREPROCESS | ||
- MOP_MOD | ||
- MOP_TAIL | ||
- MOP_CONSENSUS | ||
* - Input data | ||
- 95 Gb | ||
- 137 Gb | ||
- 137 Gb | ||
- 14 Mb | ||
* - Execution time | ||
- 10 hours | ||
- 6 hours | ||
- 2.5 hours | ||
- 3 mins | ||
* - Work folder | ||
- 382 Gb | ||
- 595 Gb | ||
- 3 Gb | ||
- 25 Mb | ||
* - Output folder | ||
- 137 Gb | ||
- 14 Mb | ||
- 76 Mb | ||
- 13 Mb |
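As a rough reading of the table, the figures can be added up to estimate what a complete analysis costs. The sketch below assumes the four pipelines are run one after another and that 1 Gb equals 1000 Mb; both are assumptions made for illustration, not statements from the benchmark itself.

```shell
# Values copied from the benchmark table above.
# Assumption: pipelines run sequentially, and 1 Gb = 1000 Mb.

# Execution time in hours: 10 h + 6 h + 2.5 h + 3 min.
total_hours=$(awk 'BEGIN { printf "%.2f", 10 + 6 + 2.5 + 3 / 60 }')

# Work folders in Gb: 382 Gb + 595 Gb + 3 Gb + 25 Mb.
work_gb=$(awk 'BEGIN { printf "%.3f", 382 + 595 + 3 + 25 / 1000 }')

echo "Total execution time: $total_hours hours"
echo "Combined work folders: $work_gb Gb"
```

Since mop_mod and mop_tail both read the output of mop_preprocess, they could in principle be launched side by side, in which case the wall-clock time would be lower than this sequential total.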
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
.. _home-page-changelog: | ||
|
||
************** | ||
CHANGELOG | ||
************** | ||
|
||
.. autosummary:: | ||
:toctree: generated | ||
|
||
Version 3.0 | ||
================ | ||
* mop_preprocess | ||
* We added a custom model for m6A basecalling. It is automatically installed when running INSTALL.sh. To use it, specify ``--pars_tools "drna_tool_splice_m6A_opt.tsv"`` | ||
* We added support for CUDA 11 with guppy versions > 4.4.1. | ||
* Added readucks for improving demultiplexing with guppy (optional). | ||
* New parameter "barcodes" where you can specify a file with barcodes to be kept. Example in **keep_barcodes.txt** | ||
* Added a `new model for direct RNA basecalling <https://www.biorxiv.org/content/10.1101/2023.11.28.568965v1>`__. | ||
* Added support for dorado basecalling. Demultiplexing with dorado is not yet supported. | ||
* Guppy versions >= 6.5.x are also supported. There is no need to specify different command lines for different guppy versions inside tool_opts: the pipeline detects the version and acts accordingly. | ||
* pod5 files are supported for dorado and guppy >= 6.5.x. In this case no fast5 or stats files are output, which limits the other pipelines. | ||
* mop_tail | ||
* We upgraded tailfindR to version 1.3. | ||
* tailfindR can be run either in standard mode or in nano3p mode (chemistry R9 or R10) by setting *tailfindr_mode* to standard, n3ps_r9 or n3ps_r10. | ||
Version 2.0 | ||
================ | ||
* Completely rewritten using the powerful `DSL2 <https://www.nextflow.io/docs/latest/dsl2.html>`__. | ||
* Subworkflows are stored in the independent repository `BioNextflow <https://github.com/biocorecrg/BioNextflow>`__. | ||
* Global nextflow config is broken down to different profiles (cluster, cloud, local...) | ||
* Added the new module **mop_consensus** | ||
* mop_preprocess (formerly known as nanoPreprocess + nanoPreprocessSimple) | ||
* can now read multiple runs at a time using the syntax **"PATH/\*\*/*.fast5"** | ||
* can demultiplex fast5 using guppy too | ||
* deeplexicon can be run on GPU too | ||
* Parameters of each tool are stored in a tsv file. We have different ones already pre-set for cDNA, DNA and dRNA (option **--pars_tools**) | ||
* Added new process **discovery** with bambu / isoquant for discovering and quantifying new transcripts. | ||
* demultiplexing, filtering, mapping, counting and discovery can be switched off by setting "NO" as a parameter | ||
* saveSpace can be set to "YES" to reduce the amount of disk space required. **WARNING: this makes it impossible to resume the run!** | ||
* Merged the old NanoPreprocess and NanoPreprocessSimple into **mop_preprocess**. Using fastq or fast5 as input switches between the two executions. | ||
* Htseq-count now accepts alignments generated by minimap2. https://github.com/htseq/htseq/issues/33 | ||
* We can specify a **final_summary_**.txt** file in the params.config file for extracting kit and flowcell information. If it is not present, that information or a custom model must be specified via extra parameters in one of the **\*_opt.tsv** files, otherwise guppy will trigger an error. | ||
* This module can be run in AWS BATCH using the profile **awsbatch** | ||
* demultiplexing of fast5 with deeplexicon is now faster thanks to multithreading and parallelization | ||
* mop_tail (formerly known as nanoTail) | ||
* now you can launch each analysis independently | ||
* Fine tuning of parameter for each step in tools_opt.tsv | ||
* mop_mod (formerly known as nanoMod) | ||
* coming SOON! | ||
Version 1.1 | ||
================= | ||
* Added a new module called NanoPreprocessSimple that starts from fastq files instead of fast5 files. It allows the analysis of multiple files at a time. | ||
* Added support to vbz compressed fast5 https://github.com/nanoporetech/vbz_compression in NanoPreprocess, NanoMod and NanoTail | ||
* NanoPreprocess now outputs also CRAM files and can do downsampling with the parameter --downsampling | ||
* NanoPreprocess allows performing variant calling using medaka (BETA) | ||
* NanoPreprocess allows performing demultiplexing with GUPPY | ||
* Added plots for Epinano output in NanoMod | ||
* Added a conversion of Tombo results in bed format in NanoMod | ||
* Added an INSTALL.sh file that automatically retrieves guppy 3.4.5 from https://mirror.oxfordnanoportal.com/, places it in NanoPreprocess/bin and creates the required links | ||
* Added profiles for running locally and on the CRG SGE cluster | ||
Version 1.0 | ||
================ | ||
This is the original version published in the paper `MasterOfPores: A Workflow for the Analysis of Oxford Nanopore Direct RNA Sequencing Datasets <https://www.frontiersin.org/articles/10.3389/fgene.2020.00211/full>`__ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
.. _home-page-ci: | ||
|
||
********************** | ||
Continuous integration | ||
********************** | ||
|
||
.. autosummary:: | ||
:toctree: generated | ||
|
||
The following pipelines are continuously checked using GitHub actions: | ||
|
||
* mop_preprocess | ||
* mop_mod | ||
* mop_tail | ||
|
||
.. image:: https://github.com/biocorecrg/master_of_pores/actions/workflows/build.yml/badge.svg | ||
:target: https://github.com/biocorecrg/master_of_pores | ||
:alt: pipeline status |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
.. _home-page-index: | ||
|
||
************************************************* | ||
Welcome to the documentation of Master Of Pores 3 | ||
************************************************* | ||
|
||
|
||
.. autosummary:: | ||
:toctree: generated | ||
|
||
.. image:: ../img/goku3.png | ||
:width: 600 | ||
|
||
Master of Pores is a pipeline written in Nextflow DSL2 for the analysis of Nanopore data. It can handle reads from direct RNAseq, cDNAseq, DNAseq etc. | ||
|
||
The pipeline is composed of four modules: | ||
|
||
- mop_preprocess: preprocessing | ||
- mop_mod: detecting chemical modifications. It reads the output directly from mop_preprocess | ||
- mop_tail: estimating polyA tail size. It reads the output directly from mop_preprocess | ||
- mop_consensus: it generates a consensus from the predictions from mop_mod. It reads the output directly from mop_mod | ||
|
||
.. MoP3 documentation master file, created by | ||
Luca Cozzuto. | ||
You can adapt this file completely to your liking, but it should at least | ||
contain the root `toctree` directive. | ||
Contents: | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
|
||
about | ||
install | ||
mop_preprocess | ||
mop_mod | ||
mop_consensus | ||
mop_tail | ||
reporting | ||
awsbatch | ||
benchmark | ||
changelog | ||
ci | ||
troubleshooting |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
.. _home-page-install: | ||
|
||
************** | ||
Get Started | ||
************** | ||
|
||
.. autosummary:: | ||
:toctree: generated | ||
|
||
Please install `Nextflow <https://www.nextflow.io/>`_ and either `Singularity <https://sylabs.io/>`_ or `Docker <https://www.docker.com/>`_ first. | ||
|
||
To install Nextflow you need a POSIX-compatible system (Linux, OS X, etc.). It requires Bash 3.2 (or later) and Java 11 (or later, up to 17). Windows is supported through WSL. To install Nextflow, run: | ||
|
||
.. code-block:: console | ||
curl -s https://get.nextflow.io | bash | ||
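Before running the installer, it can help to confirm that the Java requirement stated above is met. The following is a minimal sketch, not part of the pipeline: the helper names `java_major` and `java_supported` are hypothetical, and the parsing assumes that `java -version` prints the version as a quoted token on its first line.

```shell
# Hypothetical helper: extract the major Java version from a version
# string such as "17.0.2" or the legacy form "1.8.0_292".
java_major() {
    case "$1" in
        1.*) major=$(echo "$1" | cut -d. -f2) ;;  # legacy scheme: 1.8 means Java 8
        *)   major=$(echo "$1" | cut -d. -f1) ;;
    esac
    # Drop any non-numeric suffix (for example "21-ea" becomes "21").
    echo "${major%%[!0-9]*}"
}

# Hypothetical helper: succeed when the major version is between 11 and 17.
java_supported() {
    major=$(java_major "$1")
    [ -n "$major" ] && [ "$major" -ge 11 ] && [ "$major" -le 17 ]
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, hence the redirection.
    found=$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)
    if java_supported "$found"; then
        echo "Java $found is within the supported range"
    else
        echo "Java $found is outside the supported range (11 to 17)"
    fi
else
    echo "Java not found: install it before installing Nextflow"
fi
```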
To install the pipeline you need to download the repo: | ||
|
||
.. code-block:: console | ||
git clone --depth 1 --recurse-submodules https://github.com/biocorecrg/master_of_pores.git | ||
Installing Guppy | ||
================ | ||
|
||
Run **INSTALL.sh** followed by the version of Guppy you want to download. | ||
|
||
.. note:: | ||
|
||
Note that support for VBZ compression of fast5 files started with Guppy version 3.4.X. | ||
|
||
|
||
.. code-block:: console | ||
cd master_of_pores; bash INSTALL.sh 6.0.1 | ||
or, to install the default version (3.4.5): | ||
|
||
.. code-block:: console | ||
cd master_of_pores; bash INSTALL.sh | ||
Guppy custom models for RNA basecalling will be downloaded from our repository https://biocore.crg.eu/public/mop3_pub/models.tar and placed automatically within the right path inside the pipeline. | ||
|
||
You can install different versions of Guppy, but only one will be run during the pipeline execution. To switch between them, rerun INSTALL.sh with the version you prefer. | ||
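Switching versions therefore comes down to calling INSTALL.sh again with a different argument. The sketch below only builds and prints that command, so it can be inspected before anything is downloaded. The helper names `is_guppy_version` and `install_cmd` are hypothetical, and the check assumes Guppy versions always have the form X.Y.Z, as in the examples above.

```shell
# Hypothetical helper: accept only version strings of the form X.Y.Z.
is_guppy_version() {
    case "$1" in
        ""|*[!0-9.]*|.*|*.|*..*|*.*.*.*) return 1 ;;  # empty, stray characters, misplaced or extra dots
        *.*.*) return 0 ;;                             # exactly two dots between digit groups
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the INSTALL.sh call for a given version.
# Without an argument, INSTALL.sh installs its default version.
install_cmd() {
    if [ -z "${1:-}" ]; then
        echo "bash INSTALL.sh"
    elif is_guppy_version "$1"; then
        echo "bash INSTALL.sh $1"
    else
        echo "unexpected Guppy version: $1" >&2
        return 1
    fi
}

install_cmd 6.0.1
install_cmd
```

Run the printed command from inside the **master_of_pores** folder, as in the examples above.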
|
||
Testing | ||
============ | ||
|
||
.. code-block:: console | ||
cd mop_preprocess | ||
nextflow run mop_preprocess.nf -params-file params.f5.yaml -with-singularity -bg -profile local > log | ||
.. tip:: | ||
|
||
You can replace ``-with-singularity`` with ``-with-docker`` if you want to use the docker engine. | ||
|
||
Profiles | ||
============ | ||
Some Nextflow configuration files are stored in the folder **conf** and can be selected using different profiles. Currently, we have: | ||
|
||
- ci: for continuous integration testing (low resources) | ||
- local: for use on a laptop without GPU support | ||
- m1mac: for running the containers in emulation on Apple M1/M2/M3 processors | ||
- sge: for use on an HPC cluster with Sun Grid Engine | ||
- cluster or crg: for use in the custom HPC environment at CRG | ||
- slurm: for use on an HPC cluster with SLURM | ||
- awsbatch: for use on the Amazon AWS cloud infrastructure | ||
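A profile is selected with the ``-profile`` option, as in the test command of the Testing section. As a minimal sketch, the hypothetical helper `mop_test_cmd` below assembles that test command for a chosen profile and container engine and prints it without running it. The profile names are taken from the list above; everything else about the helper is an assumption for illustration.

```shell
# Hypothetical helper: build (but do not run) the mop_preprocess test
# command for a given profile and container engine.
mop_test_cmd() {
    profile="$1"
    engine="${2:-singularity}"   # default engine, as in the Testing section

    case "$profile" in
        ci|local|m1mac|sge|cluster|crg|slurm|awsbatch) ;;
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
    case "$engine" in
        singularity|docker) ;;
        *) echo "unknown engine: $engine" >&2; return 1 ;;
    esac

    echo "nextflow run mop_preprocess.nf -params-file params.f5.yaml -with-$engine -bg -profile $profile"
}

mop_test_cmd slurm
mop_test_cmd local docker
```

Checking the profile name up front gives a clear error message for a typo before Nextflow is launched.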
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
.. _home-page-mopconsensus: | ||
|
||
******************* | ||
MOP_CONSENSUS | ||
******************* | ||
|
||
.. autosummary:: | ||
:toctree: generated | ||
|
||
This pipeline takes as input the output of MOP_MOD from all four workflows. It outputs the consensus of the different predictions by running the tool `NanoConsensus <https://github.com/ADelgadoT/NanoConsensus>`__ in parallel on each transcript for each comparison. | ||
|
||
|
||
Input Parameters | ||
====================== | ||
|
||
The input parameters are stored in yaml files like the one represented here: | ||
|
||
.. literalinclude:: ../mop_consensus/params.yaml | ||
:language: yaml | ||
|
||
|
||
How to run the pipeline | ||
============================= | ||
|
||
Before launching the pipeline, the user should: | ||
|
||
1. Decide which containers to use - either docker or singularity **[-with-docker / -with-singularity]**. | ||
2. Fill in both **params.config** and **tools_opt.tsv** files. | ||
|
||
To launch the pipeline, please use the following command: | ||
|
||
.. code-block:: console | ||
nextflow run mop_consensus.nf -params-file params.yaml -with-singularity > log.txt | ||
You can run the pipeline in the background adding the nextflow parameter **-bg**: | ||
|
||
.. code-block:: console | ||
nextflow run mop_consensus.nf -params-file params.yaml -with-singularity -bg > log.txt | ||
You can change the parameters either by editing the **params.config** file or by passing them on the command line: | ||
|
||
.. code-block:: console | ||
nextflow run mop_consensus.nf -params-file params.yaml -with-singularity -bg --output test2 > log.txt | ||
You can specify a different working directory with temporary files: | ||
|
||
.. code-block:: console | ||
nextflow run mop_consensus.nf -params-file params.yaml -with-singularity -bg -w /path/working_directory > log.txt | ||
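The options shown above can be combined in a single call. As a minimal sketch, the hypothetical helper `consensus_cmd` below puts together a background run with an optional output name and an optional working directory, and prints the resulting command instead of executing it. The values `test2` and `/path/working_directory` are the placeholders used in the examples above.

```shell
# Hypothetical helper: build (but do not run) a mop_consensus command.
#   $1  optional value for --output
#   $2  optional working directory for -w
consensus_cmd() {
    cmd="nextflow run mop_consensus.nf -params-file params.yaml -with-singularity -bg"
    if [ -n "${1:-}" ]; then
        cmd="$cmd --output $1"
    fi
    if [ -n "${2:-}" ]; then
        cmd="$cmd -w $2"
    fi
    echo "$cmd > log.txt"
}

consensus_cmd test2 /path/working_directory
```

Note the difference in dashes: ``--output`` is a pipeline parameter, whereas ``-bg`` and ``-w`` are options of Nextflow itself.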
Results | ||
==================== | ||
|
||
Here is an example of a result: | ||
|
||
.. image:: ../img/nanocons.png | ||
:width: 800 |