Bug in workflow output definitions with HTTP input file publishing #5480

Open
nvnieuwk opened this issue Nov 7, 2024 · 4 comments · May be fixed by #5502

nvnieuwk commented Nov 7, 2024

Bug report

Expected behavior and actual behavior

When trying to publish an HTTP file that has not been staged in a process, the pipeline fails with the following error:

ERROR ~ assert path
       |
       null

Steps to reproduce the problem

Simply running this script is enough to reproduce the error:

nextflow.preview.output = true

workflow {
    main:
    def test_out = Channel.fromPath("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/sarscov2/genome/genome.fasta")

    publish:
    test_out >> 'test'
}

output {
    'test' {
        path { _txt -> { file -> "test/${file}" }}
    }
}

The run does complete successfully when I omit the output block, but then the file is not published.

Program output

nextflow.log

Environment

  • Nextflow version: 24.10.0
  • Java version: 17.0.3
  • Operating system: Ubuntu

bentsherman linked a pull request on Nov 13, 2024 that will close this issue

@bentsherman
Member

Interesting. This is happening because the workflow output syntax uses the same publishing logic as publishDir under the hood, and publishDir assumes all source files to be in the work directory. But here you are trying to publish a file that was never staged into the work directory (because it was never passed into a process).

I think we can address this edge case. But in the meantime, why do you want to publish a file that wasn't actually used by the pipeline?

@nvnieuwk
Author

nvnieuwk commented Dec 9, 2024

Thanks for your answer! The pipeline can take GVCF files as input, and these get merged into the same channel as the GVCF files generated for samples that only have CRAM input. I then publish all files in that channel, which triggers this edge case. As a temporary fix, I filter out the GVCFs whose file path does not start with the work dir.
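
For anyone hitting the same error, a minimal sketch of that kind of filter, reusing the reproduction script from the top of this issue. It is not the exact code from my pipeline: the `work_dir` and `publishable` names are made up for illustration, and it assumes that comparing `toUriString()` values against `workflow.workDir` is a good enough test for "lives in the work dir". The `output` block can stay as in the original script.

```groovy
nextflow.preview.output = true

workflow {
    main:
    // Same remote input as in the reproduction script; in a real pipeline
    // this channel would also contain files produced by processes
    def test_out = Channel.fromPath("https://github.com/nf-core/test-datasets/raw/refs/heads/modules/data/genomics/sarscov2/genome/genome.fasta")

    // Assumption: the work directory as a string (a local path, or a URI for remote work dirs)
    def work_dir = workflow.workDir.toUriString()

    // Keep only files located under the work directory, so remote inputs
    // such as https:// URLs never reach the publish step
    def publishable = test_out.filter { file -> file.toUriString().startsWith(work_dir) }

    publish:
    publishable >> 'test'
}
```

With only the remote file in the channel, the filter leaves nothing to publish, so the run should finish without the `assert path` error. The skipped files are not published, which is acceptable when they already exist elsewhere.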

@bentsherman
Member

I see. I think for now we'll just disable publishing for these files so that your pipeline can finish, while we consider a more long-term solution. Do you actually want these extra files to be published?

@nvnieuwk
Author

nvnieuwk commented Dec 9, 2024

Not really, since we already have these in another location, but it doesn't hurt much in this case since these are small files.

bentsherman added a commit that referenced this issue Dec 10, 2024