Add compression to MSA modules #4754

lrauschning · 2024-01-17T15:38:06Z

Reopened, this time without duplicating commits.
This PR introduces output compression to the FAMSA, MTMalign, MAFFT, CLUSTALO and MUSCLE5 (EDIT: and TCOFFEE) modules.
This is required by item 7 under "General" in https://nf-co.re/docs/contributing/modules#general
Behaviour of the modules is not otherwise changed.

lrauschning · 2024-01-17T15:39:14Z

Tested locally on conda, let's see if singularity also works.

lrauschning · 2024-01-25T19:16:58Z

For MTMalign, the file is written first to disk, and then compressed on disk, as I wasn't able to cleanly isolate MTMalign's output (it has a hardcoded outfile name for the FASTA output, and pollutes stdout with debug messages).
I think it still makes sense to implement it, so that the modules have a (somewhat) standard interface.
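A minimal sketch of the write-then-compress approach described here, in plain bash. `noisy_tool` and its `result.fasta` file name are hypothetical stand-ins for a tool with a hardcoded output name and chatty stdout, and `gzip` stands in for the `pigz` used in the modules; this is not the actual MTMalign module script.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for a tool that writes to a fixed file name
# and prints debug messages on stdout.
noisy_tool() {
    echo "debug: starting"
    printf '>seq1\nACGT\n' > result.fasta   # hardcoded output name (assumption)
    echo "debug: done"
}

workdir=$(mktemp -d)
cd "$workdir"

prefix="test"    # would come from task.ext.prefix in a module
compress=true    # would come from the compress input

# stdout cannot be piped into the compressor because it mixes in debug text,
# so keep it in a log and pick up the file the tool wrote instead.
noisy_tool > tool.log
mv result.fasta "${prefix}.aln"

# Compress on disk only when requested; gzip replaces the file with ${prefix}.aln.gz.
if [ "$compress" = true ]; then
    gzip "${prefix}.aln"
fi

ls
```

The cost compared with streaming is one extra uncompressed write to disk, in exchange for the same `.aln` / `.aln.gz` naming as the other modules.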

adamrtalbot

You need to update the stubs. You could add tests to check them as well.

adamrtalbot · 2024-02-09T16:02:47Z

modules/nf-core/clustalo/align/main.nf

+    def write_output = compress ? "--force -o >(pigz -cp ${task.cpus} > ${prefix}.aln.gz)" : "> ${prefix}.aln"
+    // using >() is necessary to preserve the return value,
+    // so nextflow knows to display an error when it failed
+    // the --force -o is necessary, as clustalo expands the commandline input,
+    // causing it to treat the pipe as a parameter and fail
+    // this way, the command expands to /dev/fd/<id>, and --force allows writing output to an already existing file


I'm not a huge fan of this but I guess most pipeline developers will leave it on true and forget about it, so why not?
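To illustrate the return-value point made in the code comments of the diff above, here is a minimal bash sketch. `fake_aligner` is a hypothetical function standing in for clustalo, `gzip` stands in for `pigz`, and the comparison assumes a shell where `pipefail` is off (the script switches it off explicitly).

```shell
#!/usr/bin/env bash
set +o pipefail   # plain bash default, made explicit for the comparison

# Hypothetical stand-in for an aligner: writes one record to the path in $1,
# then fails with exit status 3.
fake_aligner() {
    printf '>seq1\nACGT\n' > "$1"
    return 3
}

workdir=$(mktemp -d)

# Pipe: the pipeline's status is gzip's status, so the failure is hidden.
pipe_status=0
fake_aligner /dev/stdout | gzip -c > "$workdir/piped.aln.gz" || pipe_status=$?

# Process substitution: the aligner receives a /dev/fd/<id> path as its
# output argument, and the status seen by the shell is the aligner's own.
procsub_status=0
fake_aligner >(gzip -c > "$workdir/procsub.aln.gz") || procsub_status=$?

echo "pipe: ${pipe_status}, process substitution: ${procsub_status}"
```

In this sketch the pipe variant reports success even though the stand-in aligner failed, while the process substitution variant surfaces the failure. With `set -o pipefail` active, the pipe variant would also report it.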

adamrtalbot · 2024-02-09T16:05:18Z

modules/nf-core/clustalo/align/main.nf

    stub:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
-    touch ${prefix}.aln
+    touch ${prefix}.aln.gz


This should change based on the compress value. Something like this:

    stub:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def output = compress ? "${prefix}.aln.gz" : "${prefix}.aln"
    """
    touch ${output}

Ah good catch, didn't change that since introducing the compress input channel. Might also affect some of the other modules, I'll have a look.
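For reference, the same toggle at the shell level could look like the sketch below. `make_stub` is a hypothetical helper, not code from this PR. Writing an empty gzip stream rather than using `touch` for the `.gz` case is an extra assumption, made so that anything decompressing the stub still gets valid input.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# make_stub PREFIX COMPRESS: create a placeholder alignment file whose
# extension follows the compress flag (hypothetical helper).
make_stub() {
    local prefix=$1 compress=$2
    if [ "$compress" = true ]; then
        # Empty but valid gzip stream instead of a zero-byte file (assumption).
        gzip -c < /dev/null > "${prefix}.aln.gz"
    else
        touch "${prefix}.aln"
    fi
}

make_stub sampleA true
make_stub sampleB false
ls
```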

adamrtalbot · 2024-02-09T16:06:48Z

modules/nf-core/famsa/align/main.nf

        $args \\
        -t ${task.cpus} \\
        ${fasta} \\
-        ${prefix}.aln
+        ${prefix}.aln${compress ? '.gz':''}


Hmm I'm coming around to this idea a bit more, it seems to be cleaner and harder to mess up.

Yes, I think the main advantage is that it is the cleanest and most straightforward option to understand and document (edit: compared to the other options we came up with).
Especially for tools like FAMSA, which natively support compression, it's also cleaner than having the output format change based on a parameter passed via ext.args.
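Since the output name now depends on the compress value, anything consuming these alignments has to accept both forms. A minimal sketch of one way to do that in bash, keyed on the file extension; `read_aln` is a hypothetical helper and not part of any module in this PR.

```shell
#!/usr/bin/env bash
set -eu

workdir=$(mktemp -d)

# read_aln FILE: print an alignment to stdout whether or not it is gzipped,
# deciding by file extension (hypothetical helper).
read_aln() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;
        *)    cat "$1" ;;
    esac
}

# Build the same toy alignment in both forms.
printf '>seq1\nACGT\n' > "$workdir/plain.aln"
gzip -c "$workdir/plain.aln" > "$workdir/packed.aln.gz"

plain=$(read_aln "$workdir/plain.aln")
packed=$(read_aln "$workdir/packed.aln.gz")

if [ "$plain" = "$packed" ]; then
    echo "same content"
fi
```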

modules/nf-core/mafft/main.nf

* add pigz to clustalo
* add compression to muscle5
* enabled compression flag for famsa
* added compression to mafft
* compression for mtmalign
* set to mulled containers
* more informative test name
* change mtmalign test to search after unzipping
* update mtmalign tests to work with gzip, fix typo
* regenerate test snaps
* muscle5: zip multiple output files, if present
* Change MUSCLE5 tests to the same testcase TCOFFEE is using, also fix it
* add tags requested by nf-core-lint
* add full url to singularity/biocontainers
* fix famsa
* regenerated snapshots with nf-test 0.8.3. Reenabled snapshots for muscle5 and mtmalign
* forgot to regenerate mafft, also mtmalign seems to still be nondeterministic
* update metas
* compression support for tcoffee modules
* added pigz to tools in meta
* fix typo
* regenerate snaps, adjust test to gzip
* added mulled containers for tcoffee
* implement compression switching with channel
* add tags wanted by lint
* regenerate snapshots
* whoops, regenerated using container this time
* update meta.yml
* update glob in meta.yml
* support compressed input in irmsd
* assign more precise type in meta.yml
* add tag flagged by lint to tcoffee/irmsd
* set tcoffee/irmsd to use mulled container
* tcoffee/irmsd: do not compress template file, and correctly uncompress for irmsd
* tcoffee/align: reimplement toggling compression
* tcoffee/align: use new pipe name everywhere
* tcoffee/align: reenable default html output, add comment
* fix escaped line at end of comment...
* tcoffee/align: make tcoffee write to stdout, avoid using fifo
* clustalo/align: add optional compression
* muscle5/super5: add optional compression, also expand tests
* update snapshot
* muscle5/super5: re-add empty config file
* mafft: implement optional output compression, handle compressed input
* muscle5/super5: better parallelization for compressed -perm all
* mtmalign/align: implement optional compression
* mtmalign/align: add pigz to versions.yml
* mtmalign/align: fix
* regenerate snapshot
* famsa/align: implement optional compression
* whoops, fix tests
* clustalo/align: fix
* update snapshots
* generate different snapshots for compressed & uncompressed tests, prettify code
* updated snapshots
* mtmalign/align: update input pattern
* tcoffee/alncompare,irmsd: implement jose's suggestion
* tcoffee/irmsd: additional test for compressed input
* tcoffee/irmsd: add tag required by lint
* Revert "mtmalign/align: update input pattern" This reverts commit 7a0e78d.
* incorporate adams suggestion, fix stub filename extensions
* apparently this requires regenerating the snapshots?
* try removing test match names, as per sateesh's suggestion
* Revert "try removing test match names, as per sateesh's suggestion" This reverts commit 706d05f.
* tcoffee/align change snapshot names
* make snapshot names unique for nf-test 0.8.4

---------

Co-authored-by: Leon Rauschning

lrauschning and others added 12 commits January 17, 2024 16:25

add pigz to clustalo

2680830

add compression to muscle5

23f496e

enabled compression flag for famsa

2d704a7

added compression to mafft

f8ddf9f

compression for mtmalign

c0d1d50

set to mulled containers

3eeca26

more informative test name

d9666d7

change mtmalign test to search after unzipping

21f5f81

update mtmalign tests to work with gzip, fix typo

c1e26f3

regenerate test snaps

8eb641e

muscle5: zip multiple output files, if present

82ea41a

Change MUSCLE5 tests to the same testcase TCOFFEE is using, also fix it

15f5102

lrauschning requested review from alessiovignoli, MillironX, luisas and a team as code owners January 17, 2024 15:38

lrauschning requested a review from SPPearce January 17, 2024 15:38

lrauschning and others added 12 commits January 17, 2024 17:00

add tags requested by nf-core-lint

1915cc3

add full url to singularity/biocontainers

0839b98

fix famsa

b21623e

regenerated snapshots with nf-test 0.8.3. Reenabled snapshots for muscle5 and mtmalign

c790e44

forgot to regenerate mafft, also mtmalign seems to still be nondeterministic

5a6e78e

update metas

d26dae7

compression support for tcoffee modules

083fc72

added pigz to tools in meta

2497b9f

fix typo

f6cfc4a

regenerate snaps, adjust test to gzip

9c6481d

added mulled containers for tcoffee

bd443c1

implement compression switching with channel

ea6d06e

lrauschning and others added 2 commits January 25, 2024 20:02

clustalo/align: fix

a5330ef

update snapshots

1f7791b

lrauschning and others added 7 commits January 25, 2024 20:59

generate different snapshots for compressed & uncompressed tests, prettify code

838ec1b

updated snapshots

d2e05da

mtmalign/align: update input pattern

7a0e78d

tcoffee/alncompare,irmsd: implement jose's suggestion

103736a

tcoffee/irmsd: additional test for compressed input

d870e1d

tcoffee/irmsd: add tag required by lint

ddc6725

Revert "mtmalign/align: update input pattern"

e8ae161

This reverts commit 7a0e78d.

lrauschning requested a review from JoseEspinosa January 29, 2024 09:46

adamrtalbot reviewed Feb 9, 2024

incorporate adams suggestion, fix stub filename extensions

8184055

adamrtalbot reviewed Feb 9, 2024

modules/nf-core/mafft/main.nf

Leon Rauschning and others added 3 commits February 9, 2024 19:44

apparently this requires regenerating the snapshots?

1d35d3e

try removing test match names, as per sateesh's suggestion

706d05f

Revert "try removing test match names, as per sateesh's suggestion"

b338910

This reverts commit 706d05f.

adamrtalbot approved these changes Feb 14, 2024

lrauschning added 4 commits February 15, 2024 10:05

Merge branch 'master' into msa-compression

4528520

tcoffee/align change snapshot names

1f89b92

make snapshot names unique for nf-test 0.8.4

a624c58

Merge branch 'master' into msa-compression

6f6da51

lrauschning added this pull request to the merge queue Feb 15, 2024

Merged via the queue into nf-core:master with commit faf557b Feb 15, 2024
39 checks passed

lrauschning deleted the msa-compression branch February 15, 2024 10:31

This was referenced Mar 19, 2024

Update learnmsa module to work with compressed files. #5276

Merged

Update kalign module to work with compressed files. #5277

Merged
