mkfastq fails on AWS Batch #281

KallyopeComp · 2024-10-25T14:30:14Z

Description of the bug

Thank you for preparing a helpful tool to standardize genomics workflows. There is a small issue when running the pipeline on AWS Batch with the mkfastq demultiplexer.

Some of the output directories that the CellRanger mkfastq process declares as required are empty, which causes the pipeline to raise an error. For the test data, *_outs/outs/fastq_path/Reports is empty. *_outs/outs/fastq_path/Stats may also be empty in other cases (I'm not sure).
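To check a local run for this condition before changing the module, a small Python sketch like the following could help. It is not part of nf-core/demultiplex: the helper names (`has_files`, `empty_outputs`) are hypothetical, and it assumes you point it at a single CELLRANGER_MKFASTQ task work directory. It reports which of the two output patterns match nothing, or match only directories with no regular files inside. The Stats pattern is included only because of the guess above.

```python
from pathlib import Path

# Output patterns as quoted in this issue; "Stats" is only a suspected culprit.
EXPECTED_DIRS = [
    "*_outs/outs/fastq_path/Reports",
    "*_outs/outs/fastq_path/Stats",
]


def has_files(directory: Path) -> bool:
    """Return True if `directory` contains at least one regular file at any depth."""
    return any(p.is_file() for p in directory.rglob("*"))


def empty_outputs(task_dir: str) -> list:
    """Hypothetical helper: list expected patterns that match no directory,
    or only directories whose whole subtree holds no regular files."""
    root = Path(task_dir)
    problems = []
    for pattern in EXPECTED_DIRS:
        # Patterns are resolved relative to the task work directory.
        matches = [m for m in root.glob(pattern) if m.is_dir()]
        if not matches or not any(has_files(m) for m in matches):
            problems.append(pattern)
    return problems
```

For example, `print(empty_outputs("<task_work_dir>"))` would list `*_outs/outs/fastq_path/Reports` for a task whose Reports folder holds only empty subdirectories.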

At a minimum, line 13 of modules/nf-core/cellranger/mkfastq/main.nf should be changed to:
tuple val(meta), path("*_outs/outs/fastq_path/Reports") , optional:true, emit: reports
Line 14 may need the same change:
tuple val(meta), path("*_outs/outs/fastq_path/Stats") , optional:true, emit: stats

With these changes, the pipeline completes successfully.

Command used and terminal output

$ nextflow run demultiplex -profile docker -config ../nextflow.config --skip_tools samshee,falco,fastp --input test_pipeline_samplesheet.csv --demultiplexer mkfastq --outdir {private s3 directory}

Relevant part of the terminal output:
ERROR ~ Error executing process > 'NFCORE_DEMULTIPLEX:DEMULTIPLEX:MKFASTQ_DEMULTIPLEX:CELLRANGER_MKFASTQ (test_sample.1)'

Caused by:
Missing output file(s) *_outs/outs/fastq_path/Reports expected by process NFCORE_DEMULTIPLEX:DEMULTIPLEX:MKFASTQ_DEMULTIPLEX:CELLRANGER_MKFASTQ (test_sample.1)

Relevant files

The archive includes the pipeline samplesheet (which points to the 10x Genomics test data), a nextflow.config (which selects the AWS Batch executor), and the Nextflow log.
Archive.zip

System information

Nextflow Version: 24.04.4
Hardware: Cloud (AWS batch with custom AMI built as described in the Nextflow documentation)
Executor: AWS Batch
Container engine: Docker
OS: Launched from an Ubuntu 22.04 machine; the AMI uses Amazon Linux 2
Version of nf-core/demultiplex: Latest master branch, commit ebefeef


alanmmobbs93 · 2024-11-01T13:07:31Z

Hello @KallyopeComp! I was not able to reproduce the error. Can you please try this:

nextflow run nf-core/demultiplex -latest -profile test_mkfastq,docker --skip_tools samshee,falco,fastp --outdir <your_s3_directory>

And let us know if the error persists.

KallyopeComp · 2024-11-01T14:51:01Z

Thanks for looking into it. Running the pipeline locally with your command works as expected (with one small modification: I had to specify -r 1.5.1 manually). The issue appears when running on AWS Batch:

nextflow run nf-core/demultiplex -r 1.5.1 -latest -profile test_mkfastq,docker -config /tmp/nextflow.config --skip_tools samshee,falco,fastp --outdir <s3_output_directory>

with the following config file to reproduce the error:

 process.executor = 'awsbatch'
 process.queue = '<aws_batch_queue>'
 aws.region = 'us-east-1'
 aws.batch.cliPath = '/home/ec2-user/miniconda/bin/aws'
 workDir = '<s3_work_bucket>'

The config requires an AWS Batch queue in your AWS account and an AMI with the AWS CLI installed at the configured path, as described here: https://www.nextflow.io/docs/latest/aws.html

Note that even when running locally, the output on S3 does not contain the cellranger-tiny-bcl-simple/L001/Reports directory that line 13 of modules/nf-core/cellranger/mkfastq/main.nf would lead you to expect, which is why I suggested making this output optional.

nschcolnicov · 2024-11-01T20:42:32Z

Hi @KallyopeComp @alanmmobbs93, this is an interesting error you found! I was able to reproduce it by running the pipeline on AWS with a setup similar to yours. I also see this error:

Missing output file(s) `*_outs/outs/fastq_path/Reports` expected by process `NFCORE_DEMULTIPLEX:DEMULTIPLEX:MKFASTQ_DEMULTIPLEX:CELLRANGER_MKFASTQ (test_sample.1)`

However, when I run the exact same command locally, the error doesn't occur. I verified that this folder was not created in the S3 workdir, although it was created in the local workdir. My guess is that the folder, even though it is created locally, contains only subdirectories with no files inside, so it probably isn't created in the S3 bucket because it is empty.
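The guess above can be illustrated with a simplified model. S3 has no real directories, only object keys, so a "folder" is visible only if at least one object sits under its prefix. The Python sketch below assumes the task outputs are uploaded file by file, the way `aws s3 cp --recursive` does (it does not create empty directories). It is not Nextflow's actual unstaging code, and the function names and bucket URI are hypothetical.

```python
import os


def s3_objects_for_tree(local_root: str, dest_prefix: str) -> list:
    """Hypothetical model of a file-by-file recursive upload.

    Emits one destination object URI per regular file. Directories never
    become objects themselves, so a directory holding only empty
    subdirectories produces nothing at all.
    """
    objects = []
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), local_root)
            objects.append(dest_prefix.rstrip("/") + "/" + rel.replace(os.sep, "/"))
    return sorted(objects)


def prefix_visible(objects: list, path: str) -> bool:
    """True if any object lives under `path`, i.e. the 'folder' shows up in S3."""
    prefix = path.rstrip("/") + "/"
    return any(obj.startswith(prefix) for obj in objects)
```

Under this model, a task directory whose `outs/fastq_path/Reports` tree contains only empty subdirectories yields no object under `.../outs/fastq_path/Reports/`, so `prefix_visible` returns False for it, which matches the "missing output" seen on AWS Batch.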

Since the input files are Illumina test files that only produce empty folders in the Reports directory, it seems safe to mark this output as optional.

FYI @apeltzer @grst @atrigila

KallyopeComp · 2024-11-03T19:26:23Z

It is a bit unusual that the directory is there in the workdir when running locally...

Note that this isn't only an issue with the test files. I also get the error with real data.

nschcolnicov · 2024-11-04T19:12:04Z

@KallyopeComp @alanmmobbs93 fixed with #283

KallyopeComp added the bug Something isn't working label Oct 25, 2024

alanmmobbs93 mentioned this issue Nov 4, 2024

Set mkfastq reports and stats outputs to optional nf-core/modules#6932

Merged


nschcolnicov closed this as completed in nf-core/modules#6932 Nov 4, 2024

nschcolnicov reopened this Nov 4, 2024

nschcolnicov mentioned this issue Nov 4, 2024

Update cellranges/mkfastq module #283

Merged


nschcolnicov closed this as completed Nov 4, 2024
