mkfastq fails on AWS Batch #281
Hello @KallyopeComp! I was not able to reproduce the error. Can you please try this:

nextflow run nf-core/demultiplex -latest -profile test_mkfastq,docker --skip_tools samshee,falco,fastp --outdir <your_s3_directory>

And let us know if the error persists.
Thanks for looking into it. Running the pipeline locally via the command you sent works as expected (with a small modification, I needed to manually specify
with the following config file to reproduce the error:
The config file requires an AWS Batch queue configured in your AWS account, and an AMI configured with the CLI path as described here: https://www.nextflow.io/docs/latest/aws.html. Note that even when running locally, the output on S3 does not contain a
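For anyone trying to reproduce, a minimal sketch of such a config follows; the queue name, region, CLI path, and bucket are placeholders, not the reporter's actual values:

```groovy
// Hypothetical nextflow.config for an AWS Batch run; all names below are
// placeholders to be replaced with values from your own account.
process {
    executor = 'awsbatch'
    queue    = 'my-batch-queue'                        // your Batch job queue
}
aws {
    region = 'us-east-1'
    batch {
        cliPath = '/home/ec2-user/miniconda/bin/aws'   // CLI path baked into the AMI
    }
}
workDir = 's3://my-bucket/work'                        // S3 work directory
```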
Hi @KallyopeComp @alanmmobbs93, this is an interesting error you found! I was able to reproduce it by running the pipeline on AWS with a setup similar to yours. I also see this error:
However, if I run the exact same command locally, the error doesn't occur. I verified that this folder was indeed not generated in the S3 workdir, but it was generated in the local workdir. My guess is that even though the folder gets created locally, it contains only subdirectories with no files inside, so it is probably not being created in the S3 bucket because it is empty. Since the input files are Illumina test files that only produce empty folders in the reports directory, it seems safe to mark this output as optional.
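This behavior follows from how S3 works: it is a flat key/object store, so a "directory" exists only if at least one object key lies under its prefix. The following illustrative sketch (not from the thread; the file names are made up) shows why a tree of empty subdirectories maps to zero uploadable keys:

```python
import os
import tempfile

# S3 stores objects under keys, not directories, so uploading a tree
# uploads one object per *file*. An output tree containing only empty
# subdirectories therefore contributes no keys at all.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Reports", "html"))  # empty subtree, like fastq_path/Reports
with open(os.path.join(root, "data.txt"), "w") as f:
    f.write("x")

keys = sorted(
    os.path.relpath(os.path.join(dirpath, name), root)
    for dirpath, _, filenames in os.walk(root)
    for name in filenames
)
print(keys)  # ['data.txt'] -- the empty Reports/ tree contributes no keys
```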
It is a bit unusual that the directory is there in the workdir when running locally... Note that this isn't only an issue with the test files; I also get the error with real data.
@KallyopeComp @alanmmobbs93 fixed with #283 |
Description of the bug
Thank you for preparing a helpful tool to standardize genomics workflows. There is a small issue with running the pipeline using AWS Batch with mkfastq.
Some of the required output directories from the CellRanger output are empty, leading the pipeline to raise an error. For the test data, *_outs/outs/fastq_path/Reports is empty; *_outs/outs/fastq_path/Stats may be empty in other cases (I'm not sure).
At least, line 13 should be modified to read:
tuple val(meta), path("*_outs/outs/fastq_path/Reports") , optional:true, emit: reports
Possibly line 14 should be modified as well to:
tuple val(meta), path("*_outs/outs/fastq_path/Stats") , optional:true, emit: stats
With these changes, the pipeline completes successfully.
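As background on the semantics of optional: true (a toy sketch, not the real CELLRANGER_MKFASTQ module): Nextflow fails a task when a declared output path is missing from the task directory, and marking the output optional lets the task succeed without it.

```nextflow
// Toy process illustrating optional outputs (hypothetical names).
process DEMO {
    output:
    path "always.txt",                 emit: always
    path "maybe.txt" , optional: true, emit: maybe  // tolerated if never produced

    script:
    """
    echo ok > always.txt    # maybe.txt is not created, yet the task still succeeds
    """
}
```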
Command used and terminal output
$ nextflow run demultiplex -profile docker -config ../nextflow.config --skip_tools samshee,falco,fastp --input test_pipeline_samplesheet.csv --demultiplexer mkfastq --outdir {private s3 directory}
Relevant part of the terminal output:
ERROR ~ Error executing process > 'NFCORE_DEMULTIPLEX:DEMULTIPLEX:MKFASTQ_DEMULTIPLEX:CELLRANGER_MKFASTQ (test_sample.1)'
Caused by:
Missing output file(s)
*_outs/outs/fastq_path/Reports
expected by process NFCORE_DEMULTIPLEX:DEMULTIPLEX:MKFASTQ_DEMULTIPLEX:CELLRANGER_MKFASTQ (test_sample.1)
Relevant files
The archive includes the pipeline samplesheet (which specifies test data from 10X), a nextflow.config (which specifies the AWS Batch executor), and the Nextflow log:
Archive.zip
System information
Nextflow Version: 24.04.4
Hardware: Cloud (AWS Batch with a custom AMI built as described in the Nextflow documentation)
Executor: AWS Batch
Container engine: Docker
OS: Launched from an Ubuntu 22.04 machine; the AMI uses Amazon Linux 2
Version of nf-core/demultiplex: Latest master branch, commit ebefeef