-
Notifications
You must be signed in to change notification settings - Fork 736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add compression to MSA modules #4754
Conversation
Tested locally on conda, let's see if singularity also works. |
…cle5 and mtmalign
For MTMalign, the file is written first to disk, and then compressed on disk, as I wasn't able to cleanly isolate MTMalign's output (it has a hardcoded outfile name for the FASTA output, and pollutes stdout with debug messages). |
This reverts commit 7a0e78d.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to update the stubs. You could add tests to check them as well.
def write_output = compress ? "--force -o >(pigz -cp ${task.cpus} > ${prefix}.aln.gz)" : "> ${prefix}.aln" | ||
// using >() is necessary to preserve the return value, | ||
// so nextflow knows to display an error when it failed | ||
// the --force -o is necessary, as clustalo expands the commandline input, | ||
// causing it to treat the pipe as a parameter and fail | ||
// this way, the command expands to /dev/fd/<id>, and --force allows writing output to an already existing file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a huge fan of this but I guess most pipeline developers will leave it on true
and forget about it, so why not?
stub: | ||
def args = task.ext.args ?: '' | ||
def prefix = task.ext.prefix ?: "${meta.id}" | ||
""" | ||
touch ${prefix}.aln | ||
touch ${prefix}.aln.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should change based on the compress
value. Something like this:
stub:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def output = compress ? "${prefix}.aln.gz" : "${prefix}.aln"
"""
touch ${output}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch, didn't change that since introducing the compress input channel. Might also affect some of the other modules, I'll have a look.
$args \\ | ||
-t ${task.cpus} \\ | ||
${fasta} \\ | ||
${prefix}.aln | ||
${prefix}.aln${compress ? '.gz':''} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm coming around to this idea a bit more, it seems to be cleaner and harder to mess up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it being the most clean and straightforward to understand/document (edit: compared to the other options we came up with) is I think the main advantage.
Especially for tools like FAMSA which natively support compression its also cleaner than the output format changing based on a parameter passed via ext.args.
* add pigz to clustalo * add compression to muscle5 * enabled compression flag for famsa * added compression to mafft * compression for mtmalign * set to mulled containers * more informative test name * change mtmalign test to search after unzipping * update mtmalign tests to work with gzip, fix typo * regenerate test snaps * muscle5: zip multiple output files, if present * Change MUSCLE5 tests to the same testcase TCOFFEE is using, also fix it * add tags requested by nf-core-lint * add full url to singularity/biocontainers * fix famsa * regenerated snapshots with nf-test 0.8.3. Reenabled snapshots for muscle5 and mtmalign * forgot to regenerate mafft, also mtmalign seems to still be nondeterministic * update metas * compression support for tcoffee modules * added pigz to tools in meta * fix typo * regenerate snaps, adjust test to gzip * added mulled containers for tcoffee * implement compression switching with channel * add tags wanted by lint * regenerate snapshots * whoops, regenerated using container this time * update meta.yml * update glob in meta.yml * support compressed input in irmsd * assign more precise type in meta.yml * add tag flagged by lint to tcoffee/irmsd * set tcoffee/irmsd to use mulled container * tcoffee/irmsd: do not compress template file, and correctly uncompress for irmsd * tcoffee/align: reimplement toggling compression * tcoffee/align: use new pipe name everywhere * tcoffee/align: reenable default html output, add comment * fix escaped line at end of comment... * tcoffee/align: make tcoffee write to stdout, avoid using fifo * clustalo/align: add optional compression * muscle5/super5: add optional compression, also expand tests * update snapshot * muscle5/super5: re-add empty config file * mafft: implement optional output compression, handle compressed input * muscle5/super5: better parallelization for compressed -perm all * mtmalign/align: implement optional compression * mtmalign/align: add pigz to versions.yml * mtmalign/align: fix * regenerate snapshot * famsa/align: implement optional compression * whoops, fix tests * clustalo/align: fix * update snapshots * generate different snapshots for compressed & uncompressed tests, prettify code * updated snapshots * mtmalign/align: update input pattern * tcoffee/alncompare,irmsd: implement jose's suggestion * tcoffee/irmsd: additional test for compressed input * tcoffee/irmsd: add tag required by lint * Revert "mtmalign/align: update input pattern" This reverts commit 7a0e78d. * incorporate adams suggestion, fix stub filename extensions * apparently this requires regenerating the snapshots? * try removing test match names, as per sateesh's suggestion * Revert "try removing test match names, as per sateesh's suggestion" This reverts commit 706d05f. * tcoffee/align change snapshot names * make snapshot names unique for nf-test 0.8.4 --------- Co-authored-by: Leon Rauschning <[email protected]>
* add pigz to clustalo * add compression to muscle5 * enabled compression flag for famsa * added compression to mafft * compression for mtmalign * set to mulled containers * more informative test name * change mtmalign test to search after unzipping * update mtmalign tests to work with gzip, fix typo * regenerate test snaps * muscle5: zip multiple output files, if present * Change MUSCLE5 tests to the same testcase TCOFFEE is using, also fix it * add tags requested by nf-core-lint * add full url to singularity/biocontainers * fix famsa * regenerated snapshots with nf-test 0.8.3. Reenabled snapshots for muscle5 and mtmalign * forgot to regenerate mafft, also mtmalign seems to still be nondeterministic * update metas * compression support for tcoffee modules * added pigz to tools in meta * fix typo * regenerate snaps, adjust test to gzip * added mulled containers for tcoffee * implement compression switching with channel * add tags wanted by lint * regenerate snapshots * whoops, regenerated using container this time * update meta.yml * update glob in meta.yml * support compressed input in irmsd * assign more precise type in meta.yml * add tag flagged by lint to tcoffee/irmsd * set tcoffee/irmsd to use mulled container * tcoffee/irmsd: do not compress template file, and correctly uncompress for irmsd * tcoffee/align: reimplement toggling compression * tcoffee/align: use new pipe name everywhere * tcoffee/align: reenable default html output, add comment * fix escaped line at end of comment... * tcoffee/align: make tcoffee write to stdout, avoid using fifo * clustalo/align: add optional compression * muscle5/super5: add optional compression, also expand tests * update snapshot * muscle5/super5: re-add empty config file * mafft: implement optional output compression, handle compressed input * muscle5/super5: better parallelization for compressed -perm all * mtmalign/align: implement optional compression * mtmalign/align: add pigz to versions.yml * mtmalign/align: fix * regenerate snapshot * famsa/align: implement optional compression * whoops, fix tests * clustalo/align: fix * update snapshots * generate different snapshots for compressed & uncompressed tests, prettify code * updated snapshots * mtmalign/align: update input pattern * tcoffee/alncompare,irmsd: implement jose's suggestion * tcoffee/irmsd: additional test for compressed input * tcoffee/irmsd: add tag required by lint * Revert "mtmalign/align: update input pattern" This reverts commit 7a0e78d. * incorporate adams suggestion, fix stub filename extensions * apparently this requires regenerating the snapshots? * try removing test match names, as per sateesh's suggestion * Revert "try removing test match names, as per sateesh's suggestion" This reverts commit 706d05f. * tcoffee/align change snapshot names * make snapshot names unique for nf-test 0.8.4 --------- Co-authored-by: Leon Rauschning <[email protected]>
Reopened, this time without duplicating commits.
This PR introduces output compression to the FAMSA, MTMalign, MAFFT, CLUSTALO and MUSCLE5 (EDIT: and TCOFFEE) modules.
This is required at 7. in https://nf-co.re/docs/contributing/modules#general
Behaviour of the modules is not otherwise changed.