Add module for TaxonKit name2taxid #4778
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
--- | ||
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json | ||
name: "taxonkit_name2taxid" | ||
channels: | ||
- conda-forge | ||
- bioconda | ||
- defaults | ||
dependencies: | ||
- "bioconda::taxonkit=0.15.1" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
process TAXONKIT_NAME2TAXID { | ||
tag "$meta.id" | ||
label 'process_low' | ||
|
||
conda "${moduleDir}/environment.yml" | ||
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
'https://depot.galaxyproject.org/singularity/taxonkit:0.15.1--h9ee0642_0': | ||
'biocontainers/taxonkit:0.15.1--h9ee0642_0' }" | ||
|
||
input: | ||
tuple val(meta), val(name), path(names_txt) | ||
path taxdb | ||
|
||
output: | ||
tuple val(meta), path("*.tsv"), emit: tsv | ||
path "versions.yml" , emit: versions | ||
|
||
when: | ||
task.ext.when == null || task.ext.when | ||
|
||
script: | ||
def args = task.ext.args ?: '' | ||
def prefix = task.ext.prefix ?: "${meta.id}" | ||
assert (!name && names_txt) || (name && !names_txt) | ||
""" | ||
taxonkit \\ | ||
name2taxid \\ | ||
$args \\ | ||
--data-dir $taxdb \\ | ||
--threads $task.cpus \\ | ||
--out-file ${prefix}.tsv \\ | ||
${name? "<<< '$name'": names_txt} | ||
|
||
cat <<-END_VERSIONS > versions.yml | ||
"${task.process}": | ||
taxonkit: \$( taxonkit version | sed 's/.* v//' ) | ||
END_VERSIONS | ||
""" | ||
|
||
stub: | ||
def args = task.ext.args ?: '' | ||
def prefix = task.ext.prefix ?: "${meta.id}" | ||
""" | ||
touch ${prefix}.tsv | ||
|
||
cat <<-END_VERSIONS > versions.yml | ||
"${task.process}": | ||
taxonkit: \$( taxonkit version | sed 's/.* v//' ) | ||
END_VERSIONS | ||
""" | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
--- | ||
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/meta-schema.json | ||
name: "taxonkit_name2taxid" | ||
description: Convert taxon names to TaxIds | ||
keywords: | ||
- taxonomy | ||
- taxids | ||
- taxon name | ||
- conversion | ||
tools: | ||
- "taxonkit": | ||
description: "A Cross-platform and Efficient NCBI Taxonomy Toolkit" | ||
homepage: "https://bioinf.shenwei.me/taxonkit/" | ||
documentation: "https://bioinf.shenwei.me/taxonkit/usage/#name2taxid" | ||
tool_dev_url: "https://github.com/shenwei356/taxonkit" | ||
doi: "10.1016/j.jgg.2021.03.006" | ||
licence: ["MIT"] | ||
|
||
input: | ||
- meta: | ||
type: map | ||
description: | | ||
Groovy Map containing sample information | ||
e.g. `[ id:'sample1', single_end:false ]` | ||
- name: | ||
type: string | ||
description: Taxon name to look up (provide either this or names.txt, not both) | ||
- names_txt: | ||
type: file | ||
description: File with taxon names to look up, each on their own line (provide either this or name, not both) | ||
- taxdb: | ||
type: file | ||
description: Taxonomy database unpacked from ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz | ||
Maybe specify it should be the directory that gets supplied (presumably? from my knowledge of
In my case it's specifically that file that I use outside of this module. |
||
|
||
output: | ||
- meta: | ||
type: map | ||
description: | | ||
Groovy Map containing sample information | ||
e.g. `[ id:'sample1', single_end:false ]` | ||
- versions: | ||
type: file | ||
description: File containing software versions | ||
pattern: "versions.yml" | ||
- tsv: | ||
type: file | ||
description: TSV file of Taxon names and their taxon ID | ||
pattern: "*.tsv" | ||
|
||
authors: | ||
- "@mahesh-panchal" | ||
maintainers: | ||
- "@mahesh-panchal" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
nextflow_process { | ||
|
||
name "Test Process TAXONKIT_NAME2TAXID" | ||
script "../main.nf" | ||
process "TAXONKIT_NAME2TAXID" | ||
|
||
tag "modules" | ||
tag "modules_nfcore" | ||
tag "untar" | ||
tag "taxonkit" | ||
tag "taxonkit/name2taxid" | ||
|
||
setup { | ||
run("UNTAR"){ | ||
script "modules/nf-core/untar/main.nf" | ||
process { | ||
""" | ||
input[0] = [ | ||
[ id:'test' ], | ||
file("ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz", checkIfExists: true) | ||
I have a mini set of (working) taxdump files for createtaxdb: https://github.com/nf-core/test-datasets/tree/createtaxdb/data/taxonomy You're welcome to tar.gz them and add them to test datasets if you want to speed up the tests, should they be slow.
The tests don't seem slow, but let me see if there's a difference. |
||
] | ||
""" | ||
} | ||
} | ||
} | ||
|
||
test("sarscov2 - name") { | ||
|
||
when { | ||
process { | ||
""" | ||
input[0] = [ | ||
[ id:'test', single_end:false ], // meta map | ||
"SARS-CoV-2", | ||
[] | ||
] | ||
input[1] = UNTAR.out.untar.map{ it[1] } | ||
""" | ||
} | ||
} | ||
|
||
then { | ||
assertAll( | ||
{ assert process.success }, | ||
{ assert snapshot(process.out).match() } | ||
) | ||
} | ||
|
||
} | ||
|
||
test("sarscov2 - list") { | ||
|
||
when { | ||
process { | ||
""" | ||
input[0] = Channel.of( [ | ||
[ id:'test', single_end:false ], // meta map | ||
'' | ||
] ).combine( Channel.of("SARS-CoV-2").collectFile( name:'names.txt', newLine: true ) ) | ||
input[1] = UNTAR.out.untar.map{ it[1] } | ||
""" | ||
} | ||
} | ||
|
||
then { | ||
assertAll( | ||
{ assert process.success }, | ||
{ assert snapshot(process.out).match() } | ||
) | ||
} | ||
|
||
} | ||
|
||
test("sarscov2 - name - stub") { | ||
|
||
options "-stub" | ||
|
||
when { | ||
process { | ||
""" | ||
input[0] = [ | ||
[ id:'test', single_end:false ], // meta map | ||
"SARS-CoV-2", | ||
[] | ||
] | ||
input[1] = UNTAR.out.untar.map{ it[1] } | ||
""" | ||
} | ||
} | ||
|
||
then { | ||
assertAll( | ||
{ assert process.success }, | ||
{ assert snapshot(process.out).match() } | ||
) | ||
} | ||
|
||
} | ||
|
||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
taxonkit/name2taxid: | ||
- "modules/nf-core/taxonkit/name2taxid/**" |
I see why you would want the input like this, but I'm not sure if it follows the guidelines since the actual input is a file.
The file could be easily made using the `.collectFile()` operator.
I was just following the docs https://bioinf.shenwei.me/taxonkit/usage/#name2taxid.
Although what do you mean by `collectFile()`? The output of that is a `file` and not `[ meta, file ]`.
Yes true, but that's nothing some channel logic can't handle :).
@maintainers what are your opinions on this?
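For reference, a minimal sketch of the channel logic discussed here, assuming a DSL2 workflow. The taxon names and the `[ id:'test' ]` meta map are placeholder values and not part of this PR. Since `collectFile()` emits a bare file, a `map` afterwards can wrap it into the `[ meta, name, names_txt ]` shape the process expects:

```nextflow
workflow {
    // Placeholder taxon names (assumption); replace with real channel content
    ch_names = Channel.of( 'SARS-CoV-2', 'Homo sapiens' )

    // collectFile() emits only the file, so the meta map and an empty
    // name are attached afterwards to form [ meta, name, names_txt ]
    ch_input = ch_names
        .collectFile( name: 'names.txt', newLine: true )
        .map { names_txt -> [ [ id: 'test' ], '', names_txt ] }

    ch_input.view()
}
```

The `sarscov2 - list` test in this PR reaches the same tuple shape with `combine` instead of `map`.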
It's not particularly easy for a new developer though:
The value is mandatory if there's no file ( and I don't plan on generating that file )
What is the difference between `name` and `names_txt`? As in you can supply a string versus a file?
Yes, you can pipe in the name, or provide a list of names.
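To illustrate the two modes, here is a rough sketch that mirrors the `assert` and the ternary from the `script:` block of `main.nf`. The helper name `inputArg` is hypothetical and exists only for this illustration; the module itself inlines this logic:

```nextflow
// Hypothetical helper (not in the module) that returns the trailing
// argument appended to the taxonkit command line
def inputArg( name, names_txt ) {
    // Exactly one of the two inputs may be set
    assert (!name && names_txt) || (name && !names_txt)
    // A single name is fed on stdin via a here-string,
    // otherwise the names file is passed as a positional argument
    return name ? "<<< '${name}'" : "${names_txt}"
}

workflow {
    println inputArg( 'SARS-CoV-2', [] )          // single-name mode
    println inputArg( '', file( 'names.txt' ) )   // names-file mode
}
```

Passing both or neither trips the assertion, which is the mutual exclusion described in `meta.yml`.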
Could have two separate optional input channels, one for string only and one for file (but they are mutually exclusive)?
Might look a little ugly, but it would be relatively easy for a pipeline dev to put an `[]` in the channel they don't want with a (multi)Map?
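As a rough sketch of what the two-channel suggestion would ask of a pipeline developer, assuming a hypothetical variant of the module with separate `name` and `names_txt` inputs (this PR does not implement that). The branch labels `by_name` and `by_file` and the sample values are made up for illustration:

```nextflow
workflow {
    // Placeholder combined channel: [ meta, name, names_txt ]
    ch_combined = Channel.of(
        [ [ id: 'test' ], 'SARS-CoV-2', [] ]
    )

    // multiMap forwards each item to several named outputs at once;
    // here one output per hypothetical process input
    ch_split = ch_combined.multiMap { meta, name, names_txt ->
        by_name: [ meta, name ]
        by_file: [ meta, names_txt ]
    }

    ch_split.by_name.view { "by_name: $it" }
    ch_split.by_file.view { "by_file: $it" }
}
```

The unused side carries `[]`, so the process would still need a guard against both inputs being empty or both being set.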
I think that's just adding unnecessary complexity to have to use `multiMap` to make this input.
Either way the input will likely need to be formed using a `map`