diff --git a/CHANGELOG.md b/CHANGELOG.md
index 3d48b275..c5ea8246 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,8 +7,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### `Changed`
+- [#135](https://github.com/nf-core/bacass/pull/135) Replaced nf-core MultiQC module with a custom MultiQC module.
+
### `Added`
+- [#135](https://github.com/nf-core/bacass/pull/135) Implemented the KmerFinder subworkflow, custom QUAST, and custom MultiQC reports:
+
+  - Added KmerFinder subworkflow for read quality control, purity assessment, and sample grouping based on reference genome estimation.
+  - Enhanced QUAST assembly QC to run both general and reference genome-based analyses when KmerFinder is invoked.
+  - Implemented a custom MultiQC module with assembly-mode-specific config files (`multiqc_config_short.yml`, `multiqc_config_long.yml`, `multiqc_config_hybrid.yml`).
+  - Generated a custom MultiQC HTML report consolidating metrics from KmerFinder, QUAST, and other relevant sources.
+
- [#133](https://github.com/nf-core/bacass/pull/133) Update nf-core/bacass to the new nf-core 2.14.1 `TEMPLATE`.
### `Fixed`
diff --git a/README.md b/README.md
index af950fd1..589004f5 100644
--- a/README.md
+++ b/README.md
@@ -29,11 +29,12 @@ On release, automated continuous integration tests run the pipeline on a full-si
### Short Read Assembly
-This pipeline is primarily for bacterial assembly of next-generation sequencing reads. It can be used to quality trim your reads using [FastP](https://github.com/OpenGene/fastp) and performs basic sequencing QC using [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Afterwards, the pipeline performs read assembly using [Unicycler](https://github.com/rrwick/Unicycler). Contamination of the assembly is checked using [Kraken2](https://ccb.jhu.edu/software/kraken2/) to verify sample purity.
+This pipeline is primarily for bacterial assembly of next-generation sequencing reads. It can be used to quality trim your reads using [FastP](https://github.com/OpenGene/fastp) and performs basic sequencing QC using [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Afterwards, the pipeline performs read assembly using [Unicycler](https://github.com/rrwick/Unicycler). Contamination of the assembly is checked using [Kraken2](https://ccb.jhu.edu/software/kraken2/) and [KmerFinder](https://bitbucket.org/genomicepidemiology/kmerfinder/src/master/) to verify sample purity.
### Long Read Assembly
-For users that only have Nanopore data, the pipeline quality trims these using [PoreChop](https://github.com/rrwick/Porechop) and assesses basic sequencing QC utilizing [NanoPlot](https://github.com/wdecoster/NanoPlot) and [PycoQC](https://github.com/a-slide/pycoQC).
+For users who only have Nanopore data, the pipeline quality trims the reads using [PoreChop](https://github.com/rrwick/Porechop) and assesses basic sequencing QC utilizing [NanoPlot](https://github.com/wdecoster/NanoPlot) and [PycoQC](https://github.com/a-slide/pycoQC). Contamination of the assembly is checked using [Kraken2](https://ccb.jhu.edu/software/kraken2/) and [KmerFinder](https://bitbucket.org/genomicepidemiology/kmerfinder/src/master/) to verify sample purity.
+
The pipeline can then perform long read assembly utilizing [Unicycler](https://github.com/rrwick/Unicycler), [Miniasm](https://github.com/lh3/miniasm) in combination with [Racon](https://github.com/isovic/racon), [Canu](https://github.com/marbl/canu) or [Flye](https://github.com/fenderglass/Flye) by using the [Dragonflye](https://github.com/rpetit3/dragonflye)(\*) pipeline. Long reads assembly can be polished using [Medaka](https://github.com/nanoporetech/medaka) or [NanoPolish](https://github.com/jts/nanopolish) with Fast5 files.
> [!NOTE]
@@ -47,6 +48,10 @@ For users specifying both short read and long read (NanoPore) data, the pipeline
In all cases, the assembly is assessed using [QUAST](http://bioinf.spbau.ru/quast). The resulting bacterial assembly is furthermore annotated using [Prokka](https://github.com/tseemann/prokka), [Bakta](https://github.com/oschwengers/bakta) or [DFAST](https://github.com/nigyta/dfast_core).
+If KmerFinder is run, the pipeline groups samples according to their [KmerFinder](https://bitbucket.org/genomicepidemiology/kmerfinder/src/master/)-estimated reference genomes. Two [QUAST](http://bioinf.spbau.ru/quast) steps are then carried out: an initial ('general') QUAST run on all samples without reference genomes, followed by a 'by reference genome' QUAST run that assesses each group of samples together with its reference genome.
+
+> [!NOTE]
+> This scenario is only supported when [KmerFinder](https://bitbucket.org/genomicepidemiology/kmerfinder/src/master/) analysis is performed.
+
## Usage
> [!NOTE]
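The grouping step described in the README amounts to a plain "group by key" over each sample's best KmerFinder hit. The sketch below is illustrative only: it assumes a hypothetical dict mapping sample names to reference assembly IDs, and `group_by_reference` as well as all sample and reference names are made up for the example rather than taken from the pipeline.

```python
from collections import defaultdict


def group_by_reference(best_hits):
    """Group sample names by their estimated reference genome.

    `best_hits` maps sample name -> reference assembly ID (hypothetical input
    standing in for each sample's best KmerFinder hit).
    """
    groups = defaultdict(list)
    # Sorting makes the sample order inside each group deterministic.
    for sample, reference in sorted(best_hits.items()):
        groups[reference].append(sample)
    return dict(groups)


if __name__ == "__main__":
    hits = {
        "sample_A": "reference_1",
        "sample_B": "reference_1",
        "sample_C": "reference_2",
    }
    for reference, samples in group_by_reference(hits).items():
        # One 'by reference genome' QUAST run per group; the 'general'
        # QUAST run would still cover all samples.
        print(reference, samples)
```

Each resulting group would correspond to one 'by reference genome' QUAST run, while the 'general' QUAST run still takes every sample.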
diff --git a/assets/multiqc_config_hybrid.yml b/assets/multiqc_config_hybrid.yml
new file mode 100644
index 00000000..4c036265
--- /dev/null
+++ b/assets/multiqc_config_hybrid.yml
@@ -0,0 +1,166 @@
+report_comment: >
+  This report has been generated by the nf-core/bacass
+  analysis pipeline. For information about how to interpret these results, please see the
+  documentation.
+
+data_format: "yaml"
+
+max_table_rows: 10000
+
+run_modules:
+  - custom_content
+  - fastqc
+  - fastp
+  - nanostat
+  - porechop
+  - pycoqc
+  - kraken2
+  - quast
+  - prokka
+  - bakta
+
+exclude_modules:
+  - general_stats
+
+module_order:
+  - fastqc:
+      name: "PREPROCESS: FastQC (raw reads)"
+      info: "This section of the report shows FastQC results for the raw reads before adapter trimming."
+      path_filters:
+        - "./fastqc/*.zip"
+  - fastp:
+      name: "PREPROCESS: fastp (adapter trimming)"
+      info: "This section of the report shows fastp results for reads after adapter and quality trimming."
+      path_filters:
+        - "./fastp/*.json"
+  - nanostat:
+      name: "PREPROCESS: Nanoplot"
+      info: "This section of the report shows Nanoplot results for nanopore sequencing data."
+      path_filters:
+        - "./nanoplot/*.txt"
+  - porechop:
+      name: "PREPROCESS: Porechop"
+      info: "This section of the report shows Porechop results for reads after adapter trimming."
+      path_filters:
+        - "./porechop/*.log"
+  - pycoqc:
+      name: "PREPROCESS: PycoQC"
+      info: "This section of the report shows PycoQC results for quality control of long-read sequencing data."
+      path_filters:
+        - "./pycoqc/*.txt"
+  - kraken2:
+      name: "CONTAMINATION ANALYSIS: Kraken 2"
+      info: "This section of the report shows Kraken 2 classification results for reads after adapter trimming with fastp."
+      path_filters:
+        - ".*kraken2_*/*report.txt"
+  - quast:
+      name: "ASSEMBLY: Quast"
+      info: "This section of the report shows Quast QC results for the de novo assemblies."
+      path_filters:
+        - "./quast/*/report.tsv"
+  - prokka:
+      name: "ANNOTATION: Prokka"
+      info: "This section of the report shows Prokka annotation results for the assembled genomes."
+      path_filters:
+        - "./prokka/*.txt"
+  - bakta:
+      name: "ANNOTATION: Bakta"
+      info: "This section of the report shows Bakta annotation results for the assembled genomes."
+      path_filters:
+        - "./bakta/*.txt"
+
+report_section_order:
+  fastqc:
+    after: general_stats
+  fastp:
+    after: general_stats
+  nanostat:
+    after: general_stats
+  porechop:
+    before: nanostat
+  kraken2:
+    after: general_stats
+  quast:
+    after: general_stats
+  prokka:
+    before: nf-core-bacass-methods-description
+  bakta:
+    before: nf-core-bacass-methods-description
+  nf-core-bacass-methods-description:
+    order: -1000
+  software_versions:
+    order: -1001
+  nf-core-bacass-summary:
+    order: -1002
+
+custom_data:
+  summary_assembly_metrics:
+    section_name: "De novo assembly metrics (short & long reads)"
+    description: "generated by nf-core/bacass"
+    plot_type: "table"
+    headers:
+      "Sample":
+        description: "Input sample names"
+        format: "{:,.0f}"
+      "# Input short reads":
+        description: "Total number of input short reads in raw fastq files"
+        format: "{:,.0f}"
+      "# Trimmed short reads (fastp)":
+        description: "Total number of reads remaining after adapter/quality trimming with fastp"
+        format: "{:,.0f}"
+      "# Input long reads":
+        description: "Total number of input long reads in raw fastq files"
+        format: "{:,.0f}"
+      "# Median long reads lenght":
+        description: "Median read length (bp)"
+        format: "{:,.0f}"
+      "# Median long reads quality":
+        description: "Median read quality (Phred scale)"
+        format: "{:,.0f}"
+      "# Contigs (hybrid assembly)":
+        description: "Total number of contigs calculated by QUAST"
+        format: "{:,.0f}"
+      "# Largest contig (hybrid assembly)":
+        description: "Size of largest contig calculated by QUAST"
+        format: "{:,.0f}"
+      "# N50 (hybrid assembly)":
+        description: "N50 metric for de novo assembly as calculated by QUAST"
+        format: "{:,.0f}"
+      "# % Genome fraction (hybrid assembly)":
+        description: "% genome fraction calculated by QUAST"
+        format: "{:,.2f}"
+      "# Best hit (Kmerfinder)":
+        description: "Species name of the best hit from Kmerfinder (using short reads)"
+        format: "{:,.0f}"
+      "# Best hit assembly ID (Kmerfinder)":
+        description: "Assembly ID of the best hit from Kmerfinder (using short reads)"
+        format: "{:,.0f}"
+      "# Best hit query coverage (Kmerfinder)":
+        description: "Query coverage value of the best hit from Kmerfinder (using short reads)"
+        format: "{:,.0f}"
+      "# Best hit depth (Kmerfinder)":
+        description: "Depth of the best hit from Kmerfinder (using short reads)"
+        format: "{:,.0f}"
+      "# Second hit (Kmerfinder)":
+        description: "Species name of the second hit from Kmerfinder (using short reads)"
+        format: "{:,.0f}"
+      "# Second hit assembly ID (Kmerfinder)":
+        description: "Assembly ID of the second hit from Kmerfinder (using short reads)"
+        format: "{:,.0f}"
+      "# Second hit query coverage (Kmerfinder)":
+        description: "Query coverage value of the second hit from Kmerfinder (using short reads)"
+        format: "{:,.0f}"
+      "# Second hit depth (Kmerfinder)":
+        description: "Depth of the second hit from Kmerfinder (using short reads)"
+        format: "{:,.0f}"
+
+export_plots: true
+
+# # Customise the module search patterns to speed up execution time
+# # - Skip module sub-tools that we are not interested in
+# # - Replace file-content searching with filename pattern searching
+# # - Don't add anything that is the same as the MultiQC default
+# # See https://multiqc.info/docs/#optimise-file-search-patterns for details
+sp:
+  fastp:
+    fn: "*.fastp.json"
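The `path_filters` entries under `module_order` decide which staged files each module section reads. Assuming MultiQC evaluates them as shell-style globs (Python `fnmatch` semantics), a quick sanity check against hypothetical output paths might look like the sketch below; `PATH_FILTERS` copies a subset of the patterns from the config, and `modules_for` plus every example path are made up for illustration.

```python
from fnmatch import fnmatchcase

# A subset of the path_filters from the module_order section of the config.
PATH_FILTERS = {
    "fastqc": ["./fastqc/*.zip"],
    "fastp": ["./fastp/*.json"],
    "kraken2": [".*kraken2_*/*report.txt"],
    "quast": ["./quast/*/report.tsv"],
    "prokka": ["./prokka/*.txt"],
}


def modules_for(path):
    """Return the modules whose path_filters match `path` as shell-style globs."""
    return [
        module
        for module, patterns in PATH_FILTERS.items()
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    # Hypothetical staged file paths, for illustration only.
    for path in [
        "./fastp/sample1.fastp.json",
        "./quast/sample1/report.tsv",
        "./kraken2_short/sample1.kraken2.report.txt",
        "./quast/report.tsv",
    ]:
        print(path, "->", modules_for(path))
```

With `fnmatch`, `*` also matches `/`, so `.*kraken2_*/*report.txt` behaves as a glob even though it looks like a regular expression, and `./quast/*/report.tsv` needs one extra directory level below `quast/`.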
diff --git a/assets/multiqc_config_long.yml b/assets/multiqc_config_long.yml
new file mode 100644
index 00000000..51795ec6
--- /dev/null
+++ b/assets/multiqc_config_long.yml
@@ -0,0 +1,140 @@
+report_comment: >
+  This report has been generated by the nf-core/bacass
+  analysis pipeline. For information about how to interpret these results, please see the
+  documentation.
+
+data_format: "yaml"
+
+max_table_rows: 10000
+
+run_modules:
+  - custom_content
+  - nanostat
+  - porechop
+  - pycoqc
+  - kraken2
+  - quast
+  - prokka
+  - bakta
+
+exclude_modules:
+  - general_stats
+
+module_order:
+  - nanostat:
+      name: "PREPROCESS: Nanoplot"
+      info: "This section of the report shows Nanoplot results for nanopore sequencing data."
+      path_filters:
+        - "./nanoplot/*.txt"
+  - porechop:
+      name: "PREPROCESS: Porechop"
+      info: "This section of the report shows Porechop results for reads after adapter trimming."
+      path_filters:
+        - "./porechop/*.log"
+  - pycoqc:
+      name: "PREPROCESS: PycoQC"
+      info: "This section of the report shows PycoQC results for quality control of long-read sequencing data."
+      path_filters:
+        - "./pycoqc/*.txt"
+  - kraken2:
+      name: "CONTAMINATION ANALYSIS: Kraken 2"
+      info: "This section of the report shows Kraken 2 classification results."
+      path_filters:
+        - ".*kraken2_*/*report.txt"
+  - quast:
+      name: "ASSEMBLY: Quast"
+      info: "This section of the report shows Quast QC results for the de novo assemblies."
+      path_filters:
+        - "./quast/*/report.tsv"
+  - prokka:
+      name: "ANNOTATION: Prokka"
+      info: "This section of the report shows Prokka annotation results for the assembled genomes."
+      path_filters:
+        - "./prokka/*.txt"
+  - bakta:
+      name: "ANNOTATION: Bakta"
+      info: "This section of the report shows Bakta annotation results for the assembled genomes."
+      path_filters:
+        - "./bakta/*.txt"
+
+report_section_order:
+  nanostat:
+    after: general_stats
+  porechop:
+    before: nanostat
+  kraken2:
+    after: general_stats
+  quast:
+    after: general_stats
+  prokka:
+    before: nf-core-bacass-methods-description
+  bakta:
+    before: nf-core-bacass-methods-description
+  nf-core-bacass-methods-description:
+    order: -1000
+  software_versions:
+    order: -1001
+  nf-core-bacass-summary:
+    order: -1002
+
+custom_data:
+  summary_assembly_metrics:
+    section_name: "De novo assembly metrics (long reads)"
+    description: "generated by nf-core/bacass"
+    plot_type: "table"
+    headers:
+      "Sample":
+        description: "Input sample names"
+        format: "{:,.0f}"
+      "# Input reads":
+        description: "Total number of input reads in raw fastq files"
+        format: "{:,.0f}"
+      "# Median read lenght":
+        description: "Median read length (bp)"
+        format: "{:,.0f}"
+      "# Median read quality":
+        description: "Median read quality (Phred scale)"
+        format: "{:,.0f}"
+      "# Contigs":
+        description: "Total number of contigs calculated by QUAST"
+        format: "{:,.0f}"
+      "# Largest contig":
+        description: "Size of largest contig calculated by QUAST"
+        format: "{:,.0f}"
+      "# N50":
+        description: "N50 metric for de novo assembly as calculated by QUAST"
+        format: "{:,.0f}"
+      "# % Genome fraction":
+        description: "% genome fraction calculated by QUAST"
+        format: "{:,.2f}"
+      "# Best hit (Kmerfinder)":
+        description: "Species name of the best hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Best hit assembly ID (Kmerfinder)":
+        description: "Assembly ID of the best hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Best hit query coverage (Kmerfinder)":
+        description: "Query coverage value of the best hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Best hit depth (Kmerfinder)":
+        description: "Depth of the best hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Second hit (Kmerfinder)":
+        description: "Species name of the second hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Second hit assembly ID (Kmerfinder)":
+        description: "Assembly ID of the second hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Second hit query coverage (Kmerfinder)":
+        description: "Query coverage value of the second hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Second hit depth (Kmerfinder)":
+        description: "Depth of the second hit from Kmerfinder"
+        format: "{:,.0f}"
+
+export_plots: true
+
+# # Customise the module search patterns to speed up execution time
+# # - Skip module sub-tools that we are not interested in
+# # - Replace file-content searching with filename pattern searching
+# # - Don't add anything that is the same as the MultiQC default
+# # See https://multiqc.info/docs/#optimise-file-search-patterns for details
diff --git a/assets/multiqc_config_short.yml b/assets/multiqc_config_short.yml
new file mode 100644
index 00000000..2ce2eca6
--- /dev/null
+++ b/assets/multiqc_config_short.yml
@@ -0,0 +1,135 @@
+report_comment: >
+  This report has been generated by the nf-core/bacass
+  analysis pipeline. For information about how to interpret these results, please see the
+  documentation.
+
+data_format: "yaml"
+
+max_table_rows: 10000
+
+run_modules:
+  - custom_content
+  - fastqc
+  - fastp
+  - kraken2
+  - quast
+  - prokka
+  - bakta
+
+exclude_modules:
+  - general_stats
+
+module_order:
+  - fastqc:
+      name: "PREPROCESS: FastQC (raw reads)"
+      info: "This section of the report shows FastQC results for the raw reads before adapter trimming."
+      path_filters:
+        - "./fastqc/*.zip"
+  - fastp:
+      name: "PREPROCESS: fastp (adapter trimming)"
+      info: "This section of the report shows fastp results for reads after adapter and quality trimming."
+      path_filters:
+        - "./fastp/*.json"
+  - kraken2:
+      name: "CONTAMINATION ANALYSIS: Kraken 2"
+      info: "This section of the report shows Kraken 2 classification results for reads after adapter trimming with fastp."
+      path_filters:
+        - ".*kraken2_*/*report.txt"
+  - quast:
+      name: "ASSEMBLY: Quast"
+      info: "This section of the report shows Quast QC results for genomes assembled with Unicycler."
+      path_filters:
+        - "./quast/*/report.tsv"
+  - prokka:
+      name: "ANNOTATION: Prokka"
+      info: "This section of the report shows Prokka annotation results for the assembled genomes."
+      path_filters:
+        - "./prokka/*.txt"
+  - bakta:
+      name: "ANNOTATION: Bakta"
+      info: "This section of the report shows Bakta annotation results for the assembled genomes."
+      path_filters:
+        - "./bakta/*.txt"
+
+report_section_order:
+  fastqc:
+    after: general_stats
+  fastp:
+    after: general_stats
+  kraken2:
+    after: general_stats
+  quast:
+    after: general_stats
+  prokka:
+    before: nf-core-bacass-methods-description
+  bakta:
+    before: nf-core-bacass-methods-description
+  nf-core-bacass-methods-description:
+    order: -1000
+  software_versions:
+    order: -1001
+  nf-core-bacass-summary:
+    order: -1002
+
+custom_data:
+  summary_assembly_metrics:
+    section_name: "De novo assembly metrics (short reads)"
+    description: "generated by nf-core/bacass"
+    plot_type: "table"
+    headers:
+      "Sample":
+        description: "Input sample names"
+        format: "{:,.0f}"
+      "# Input reads":
+        description: "Total number of input reads in raw fastq files"
+        format: "{:,.0f}"
+      "# Trimmed reads (fastp)":
+        description: "Total number of reads remaining after adapter/quality trimming with fastp"
+        format: "{:,.0f}"
+      "# Contigs":
+        description: "Total number of contigs calculated by QUAST"
+        format: "{:,.0f}"
+      "# Largest contig":
+        description: "Size of largest contig calculated by QUAST"
+        format: "{:,.0f}"
+      "# N50":
+        description: "N50 metric for de novo assembly as calculated by QUAST"
+        format: "{:,.0f}"
+      "# % Genome fraction":
+        description: "% genome fraction calculated by QUAST"
+        format: "{:,.2f}"
+      "# Best hit (Kmerfinder)":
+        description: "Species name of the best hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Best hit assembly ID (Kmerfinder)":
+        description: "Assembly ID of the best hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Best hit query coverage (Kmerfinder)":
+        description: "Query coverage value of the best hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Best hit depth (Kmerfinder)":
+        description: "Depth of the best hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Second hit (Kmerfinder)":
+        description: "Species name of the second hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Second hit assembly ID (Kmerfinder)":
+        description: "Assembly ID of the second hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Second hit query coverage (Kmerfinder)":
+        description: "Query coverage value of the second hit from Kmerfinder"
+        format: "{:,.0f}"
+      "# Second hit depth (Kmerfinder)":
+        description: "Depth of the second hit from Kmerfinder"
+        format: "{:,.0f}"
+
+export_plots: true
+
+# # Customise the module search patterns to speed up execution time
+# # - Skip module sub-tools that we are not interested in
+# # - Replace file-content searching with filename pattern searching
+# # - Don't add anything that is the same as the MultiQC default
+# # See https://multiqc.info/docs/#optimise-file-search-patterns for details
+sp:
+  fastp:
+    fn: "*.fastp.json"
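The keys under `custom_data.summary_assembly_metrics.headers` configure table columns by exact name, so a data column only picks up a header's `description` and `format` if its name matches the header key character for character. A minimal Python sketch of a consistency check, assuming the table data arrive as a sample-keyed dict `{sample: {column: value}}`; `SHORT_HEADERS` copies some keys from the short-read config, and `undeclared_columns` and the example data are hypothetical.

```python
# A subset of the header keys declared under
# custom_data.summary_assembly_metrics.headers in multiqc_config_short.yml.
SHORT_HEADERS = {
    "Sample",
    "# Input reads",
    "# Trimmed reads (fastp)",
    "# Contigs",
    "# Largest contig",
    "# N50",
    "# % Genome fraction",
    "# Best hit (Kmerfinder)",
}


def undeclared_columns(table, headers):
    """Return the sorted column names in `table` that have no matching header key.

    `table` is a sample-keyed mapping {sample: {column: value}}.
    """
    columns = {column for row in table.values() for column in row}
    return sorted(columns - headers)


if __name__ == "__main__":
    # Hypothetical data: "# Trimmed reads" lacks the " (fastp)" suffix.
    table = {
        "sample_1": {"# Contigs": 42, "# N50": 150000, "# Trimmed reads": 1000},
        "sample_2": {"# Contigs": 37, "# N50": 210000},
    }
    print(undeclared_columns(table, SHORT_HEADERS))
```

Comparing sets of names keeps the check independent of row order and of how many samples are present.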
diff --git a/bin/csv_to_yaml.py b/bin/csv_to_yaml.py
new file mode 100755
index 00000000..2a14249b
--- /dev/null
+++ b/bin/csv_to_yaml.py
@@ -0,0 +1,59 @@
+#!/usr/bin/env python
+import sys
+import argparse
+import csv
+import yaml
+
+
+def parse_args(args=None):
+    Description = "Create a YAML file from a CSV input file, using the values of the key column as top-level keys and the remaining fields as their values."
+
+    Epilog = "Example usage: python csv_to_yaml.py -i myfile.csv -k 'sample_name' -op converted_file"
+    parser = argparse.ArgumentParser(description=Description, epilog=Epilog)
+    parser.add_argument(
+        "-i",
+        "--input",
+        type=str,
+        required=True,
+        dest="CSV_FILE",
+        help="Input file in CSV format.",
+    )
+
+    parser.add_argument(
+        "-k",
+        "--key_field",
+        type=str,
+        required=True,
+        dest="KEY_FIELD",
+        help="Name of the CSV column used to group rows (its values become the YAML keys).",
+    )
+
+    parser.add_argument(
+        "-op",
+        "--output_prefix",
+        type=str,
+        default="output_file",
+        dest="OUT_PREFIX",
+        help="Output file prefix; '.yaml' is appended.",
+    )
+    return parser.parse_args(args)
+
+
+def parse_csv(csv_file):
+    # newline="" lets the csv module handle line endings itself, as the csv docs recommend.
+    with open(csv_file, "r", newline="") as c:
+        csv_reader = csv.DictReader(c)
+        data = [row for row in csv_reader]
+    return data
+
+
+def create_yaml(data, key, output_prefix):
+    # One entry per row, keyed by the key column; a repeated key keeps only the last row.
+    yaml_data = {
+        entry[key]: {k: v for k, v in entry.items() if k != key} for entry in data
+    }
+    with open(output_prefix + ".yaml", "w") as yaml_file:
+        yaml.dump(yaml_data, yaml_file, default_flow_style=False)
+
+
+def main(args=None):
+    args = parse_args(args)
+    file_list = parse_csv(args.CSV_FILE)
+
+    create_yaml(data=file_list, key=args.KEY_FIELD, output_prefix=args.OUT_PREFIX)
+
+
+if __name__ == "__main__":
+    sys.exit(main())
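`create_yaml()` keys each CSV row by the value in `--key_field` and keeps the remaining columns as a nested mapping. The stdlib-only sketch below mirrors that dict comprehension without writing YAML (PyYAML is left out so it runs anywhere); `csv_to_mapping` and the example columns are hypothetical.

```python
import csv
import io


def csv_to_mapping(csv_text, key):
    """Mirror create_yaml(): key each row by `key`, keep the other fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row[key]: {k: v for k, v in row.items() if k != key}
        for row in reader
    }


if __name__ == "__main__":
    # Hypothetical CSV content; real column names depend on the upstream step.
    example = (
        "sample_name,reference,depth\n"
        "s1,refA,30\n"
        "s2,refB,12\n"
        "s1,refC,8\n"
    )
    print(csv_to_mapping(example, "sample_name"))
```

If the key column repeats (as `s1` does here), later rows silently replace earlier ones, and all values stay strings because `csv.DictReader` does not convert types.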
diff --git a/bin/download_reference.py b/bin/download_reference.py
new file mode 100755
index 00000000..88e89364
--- /dev/null
+++ b/bin/download_reference.py
@@ -0,0 +1,161 @@
+#!/usr/bin/env python
+"""
+=============================================================
+HEADER
+=============================================================
+INSTITUTION: BU-ISCIII
+AUTHOR: Guillermo J. Gorines Cordero
+EDITED BY: Daniel VM
+VERSION: 0.1
+CREATED: Early 2022
+REVISED: 18-2-2022
+EDITED: 14-11-2023
+EDITED: 20-05-2024
+DESCRIPTION:
+ Given a file with the KmerFinder results and frequencies (e.g. as
+ created by find_common_reference.py) and the NCBI assembly sheet,
+ download the genome (fna), gff and protein files of the top-ranked
+ reference from the NCBI FTP server.
+
+INPUT:
+ -FILE: file containing the ranking of references from kmerfinder created by the script find_common_references
+ -REFERENCE: file with the NCBI reference list
+ -OUTDIR: name of the output dir
+
+OUTPUT:
+ - *_fna.gz: file with the top-reference genome
+ - *_gff.gz: file with the top-reference gff
+ - *_protein.gz: file with the top-reference proteins
+
+USAGE:
+ python download_reference.py
+ -file [FILE]
+ -reference [REFERENCE]
+ -out_dir [OUTDIR]
+
+REQUIREMENTS:
+ -Python >= 3.6
+ -Python requests
+
+DISCLAIMER:
+ This script has been designed for the assembly pipeline of BU-ISCIII.
+ Feel free to use it at will, however we don't guarantee its success
+ outside its purpose.
+================================================================
+END_OF_HEADER
+================================================================
+"""
+
+import sys
+import argparse
+import os
+
+# import wget
+import requests
+
+
+# TODO: Generate report
+def parse_args(args=None):
+ Description = "Download the reference files (fna, faa, gff) from the NCBI reference file."
+ Epilog = """Usage example: \
+ python download_reference.py \
+ -file \
+ -reference \
+ -out_dir