Bug in workflow output definitions with HTTP input file publishing #5480

nvnieuwk · 2024-11-07T11:56:58Z

Bug report

Expected behavior and actual behavior

When trying to publish an HTTP file that has not been staged in a process, the pipeline fails with the following error:

ERROR ~ assert path
       |
       null

Steps to reproduce the problem

Simply running this script is enough to reproduce the error:

nextflow.preview.output = true

workflow {
    main:
    def test_out = Channel.fromPath("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/sarscov2/genome/genome.fasta")

    publish:
    test_out >> 'test'
}

output {
    'test' {
        path { _txt -> { file -> "test/${file}" }}
    }
}

The pipeline does complete successfully when I omit the output block, but the file is not published.

Program output

nextflow.log

Environment

Nextflow version: 24.10.0
Java version: 17.0.3
Operating system: Ubuntu


bentsherman · 2024-12-06T19:56:07Z

Interesting. This is happening because the workflow output syntax uses the same publishing logic as publishDir under the hood, and publishDir assumes all source files to be in the work directory. But here you are trying to publish a file that was never staged into the work directory (because it was never passed into a process).

I think we can address this edge case. But in the meantime, why do you want to publish a file that wasn't actually used by the pipeline?

nvnieuwk · 2024-12-09T08:16:38Z

Thanks for your answer! The pipeline can take GVCF input data which gets merged into the same channel as the generated GVCF files for samples that only have CRAM input. I then output all files in that channel which causes this edge case. I temporarily fixed this by filtering out the GVCFs for which the file path does not start with the work dir.
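For readers hitting the same error, the temporary workaround described above could look roughly like the sketch below. It is not the pipeline's actual code: the channel name `gvcfs` and the variables `work_dir` and `publishable` are hypothetical, and the prefix check assumes that comparing `toUriString()` values against `workflow.workDir` is enough to tell staged files from external ones.

```groovy
nextflow.preview.output = true

workflow {
    main:
    // Hypothetical channel standing in for the mix of external inputs and
    // process outputs; here it holds only the HTTP file from the reproduction script
    def gvcfs = Channel.fromPath("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/sarscov2/genome/genome.fasta")

    // Assumption: files produced by processes live under the run's work directory,
    // so anything whose URI does not start with that prefix is treated as external
    def work_dir = workflow.workDir.toUriString()
    def publishable = gvcfs.filter { gvcf -> gvcf.toUriString().startsWith(work_dir) }

    publish:
    publishable >> 'test'
}
```

With only the external HTTP file in the channel, the filter should leave nothing to publish, which sidesteps the failing assertion at the cost of not publishing external inputs at all.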

bentsherman · 2024-12-09T13:27:04Z

I see. I think for now we'll just disable publishing for these files so that your pipeline can finish, while we consider a more long-term solution. Do you actually want these extra files to be published?

nvnieuwk · 2024-12-09T13:50:12Z

Not really, since we already have these in another location, but it doesn't really hurt that much in this case since these are small files.


bentsherman added the bug label Nov 13, 2024

bentsherman linked a pull request Nov 13, 2024 that will close this issue

Fix bugs with workflow outputs #5502

Open

6 tasks

bentsherman added a commit that referenced this issue Dec 10, 2024

Don't publish external files (#5480)

bec0e93

Signed-off-by: Ben Sherman <[email protected]>
