-
Notifications
You must be signed in to change notification settings - Fork 736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add module for TaxonKit name2taxid #4778
Conversation
@nf-core-bot fix linting |
'biocontainers/taxonkit:0.15.1--h9ee0642_0' }" | ||
|
||
input: | ||
tuple val(meta), val(name), path(names_txt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why you would want the input like this, but I'm not sure if it follows the guidelines since the actual input is a file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file could be easily made using the .collectFile()
operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just following the docs https://bioinf.shenwei.me/taxonkit/usage/#name2taxid.
Although how do you mean with collectFile()
. The output of that is file
and not [ meta, file ]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes true, but that's nothing some channel logic can't handle :).
@maintainers what are your opinions on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not particularly easy for a new developer though:
Channel.of( [ id: 'test', taxid: 1234 ] )
.tap { ch_data }
.collectFile( newLine: true ) { meta -> [ "${meta.id}.tsv", "$meta.taxid" ] }
.map{ file -> [ file.baseName , file ] }
.join( ch_data.map { meta -> [ meta.id, meta ] }, by: 0 )
.map{ id, idfile, meta -> [ meta, idfile ] }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value is mandatory if there's no file ( and I don't plan on generating that file )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between name
and names_txt
? As in you can supply a string versus a file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you can pipe in the name, or provide a list of names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could have two separate optional input channels, one for string only and one for file (but are mutually exclusive)?
Might look alittle ugly, would be relatively easy for for a pipeline dev to put an []
in the channel they don't want with a (multi)Map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's just adding unnecessary complexity to have to use multiMap
to make this input.
Either way the input will likely need to be formed using a map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor things, but otherwise no blockers from my end now given the input channel structure guidelines appear to be a bit in flux
description: File with taxon names to look up, each on their own line (provide either this or name, not both) | ||
- taxdb: | ||
type: file | ||
description: Taxonomy database unpacked from ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe specify it should be the directory that get's supplied (presumably? from my knowledge of taxdump.tar.gz
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my case it's specifically that file that I use outside of this module.
""" | ||
input[0] = [ | ||
[ id:'test' ], | ||
file("ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz", checkIfExists: true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a mini set of (working) taxdump files for createtaxdb:
https://github.com/nf-core/test-datasets/tree/createtaxdb/data/taxonomy
You're welcome to tar.gz them and add to test datasets if you want to speed up tests if slow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests don't seem slow, but let me see if there's a difference.
Co-authored-by: James A. Fellows Yates <[email protected]>
* master: (47 commits) Update genomescope2 module and add nf-tests (#5399) PLINK/GWAS additional output (#5387) Plink linkage disequilibrium additional output (#5388) new module: Plink epistasis analysis (#5386) Add module for TaxonKit name2taxid (#4778) Update-cellranger-modules (#5016) Update: GTDBTK/CLASSIFYWF (migrate to nf-test and only execute mv when dirs actually present) (#5390) Added MetaPlAn subworkflow (#5356) port `vcftools` to nf-test (#5375) Update modules: rgi (#5383) Update condaforge/mambaforge Docker tag to v24 (#5381) Update gecco + migrate to nf-test (#5372) Update learnmsa module to work with compressed files. (#5276) New module: kraken2/build (#5212) Update kalign module to work with compressed files. (#5277) port `ensemblvep/download` to nf-test (#5373) port `manta-germline` to nf-test (#5362) port `tiddit-sv` to nf-test and add `--threads` (#5371) Decoupler module (#5317) fix optional cram support in modules and migrate to nf-test (#5329) ...
PR checklist
Adds the module TaxonKit name2taxid
Closes #XXX
versions.yml
file.label
PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware