pyroe failed to return succesfully ExitStatus(unix_wait_status(256)) #191

lfperales · 2022-12-21T16:57:20Z

Description of the bug

I'm running nf-core/scrnaseq using Nextflow Tower, and I get this error. It may be related to #152 (comment).

Command used and terminal output

No response

Relevant files

No response

System information

No response

rob-p · 2023-01-19T19:06:21Z

Hi @lfperales,

I'm happy to do what I can to help out with this, but I'm not entirely familiar with how I can/should test this out. The issue seems (perhaps?) to be related to a missing command line parameter that was added to pyroe some time back. Is the tower instance using the latest versions of all of the relevant tools? How can I easily reproduce this behavior to test and try to fix it?

Thanks,
Rob

lfperales · 2023-01-19T20:54:28Z

Hello Rob, I don't know whether Tower is using the latest versions of all the tools. How can I check this? Thank you.

drpatelh · 2023-01-19T23:25:14Z

We definitely need more information to help here, @lfperales. Are you able to share the .nextflow.log file, which you can download from Tower?

lfperales · 2023-01-23T15:22:34Z

THIS IS THE COMPLETE LOG FROM BATCH

Downloading plugin [email protected]

N E X T F L O W ~ version 22.10.5
Pulling nf-core/scrnaseq ...
downloaded from https://github.com/nf-core/scrnaseq.git
Launching https://github.com/nf-core/scrnaseq [lethal_pesquet] DSL2 - revision: c86646e [2.1.0]
------------------------------------------------------
nf-core/scrnaseq v2.1.0
------------------------------------------------------
Core Nextflow options
revision : 2.1.0
runName : lethal_pesquet
launchDir : /
workDir : /fsx
projectDir : /.nextflow/assets/nf-core/scrnaseq
userName : root
profile : standard
configFiles : /.nextflow/assets/nf-core/scrnaseq/nextflow.config, /nextflow.config
Input/output options
input : https://api.tower.nf/workspaces/166189419503828/datasets/4j9Fv2PcnYwVxh3SKXE6Gu/v/1/n/nash_ramachandran_one.csv
outdir : s3://pioneeringmedicines-data/Public/sc-RNAseq/nash/test_scrna/
Mandatory arguments
protocol : 10XV3
Reference genome options
genome : GRCh38
fasta : s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa
gtf : s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf
Kallisto/BUS Options
bustools_correct: true
!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/scrnaseq for your analysis please cite:

The pipeline
https://doi.org/10.5281/zenodo.3568187
The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
Software dependencies
https://github.com/nf-core/scrnaseq/blob/master/CITATIONS.md
------------------------------------------------------
Monitor the execution with Nextflow Tower using this URL: https://tower.nf/orgs/FSP_Labs/workspaces/Pioneering_Medicines/watch/2TFXgnzzulzuGm
[78/f9c7f2] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:INPUT_CHECK:SAMPLESHEET_CHECK (nash_ramachandran_one.csv)
Staging foreign file: s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa
Staging foreign file: s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf
[2e/af98c8] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:GTF_GENE_FILTER (genome.fa)
Staging foreign file: s3://pioneeringmedicines-data/public/scrna/nash/wang/raw/SRR10009435/cirrhotic1_cd45+_R1_001.fastq.gz
Staging foreign file: s3://pioneeringmedicines-data/public/scrna/nash/wang/raw/SRR10009435/cirrhotic1_cd45+_R2_001.fastq.gz
Staging foreign file: s3://pioneeringmedicines-data/public/scrna/nash/wang/raw/SRR10009436/cirrhotic1_cd45-A_R1_001.fastq.gz
Staging foreign file: s3://pioneeringmedicines-data/public/scrna/nash/wang/raw/SRR10009436/cirrhotic1_cd45-A_R2_001.fastq.gz
Staging foreign file: s3://pioneeringmedicines-data/public/scrna/nash/wang/raw/SRR10009437/cirrhotic1_cd45-B_R1_001.fastq.gz
Staging foreign file: s3://pioneeringmedicines-data/public/scrna/nash/wang/raw/SRR10009437/cirrhotic1_cd45-B_R2_001.fastq.gz
[c7/c5b7c4] Submitted process > NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:SIMPLEAF_INDEX (genome_genes.gtf)
Error executing process > 'NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:SIMPLEAF_INDEX (genome_genes.gtf)'
Caused by:
Essential container in task exited
Command executed:

# export required var
export ALEVIN_FRY_HOME=.

# prep simpleaf
simpleaf set-paths

# run simpleaf index
simpleaf \
    index \
    --threads 6 \
    --fasta genome.fa \
    --gtf genome_genes.gtf \
    --rlen 91 \
    -o salmon

cat <<-END_VERSIONS > versions.yml
"NFCORE_SCRNASEQ:SCRNASEQ:SCRNASEQ_ALEVIN:SIMPLEAF_INDEX":
simpleaf: $(simpleaf -V | tr -d '\n' | cut -d ' ' -f 2)
salmon: $(salmon --version | sed -e "s/salmon //g")
END_VERSIONS
Command exit status:
1
Command output:
found salmon in the PATH at /usr/local/bin/salmon
found alevin-fry in the PATH at /usr/local/bin/alevin-fry
found pyroe in the PATH at /usr/local/bin/pyroe
Command error:
found salmon in the PATH at /usr/local/bin/salmon
found alevin-fry in the PATH at /usr/local/bin/alevin-fry
found pyroe in the PATH at /usr/local/bin/pyroe
Error: pyroe failed to return succesfully ExitStatus(unix_wait_status(256))
Work dir:
/fsx/c7/c5b7c4ce31c52974aec26822e1ce96
Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/scrnaseq] Pipeline completed with errors-
Waiting for file transfers to complete (6 files)
Waiting for file transfers to complete (4 files)
Waiting for file transfers to complete (3 files)
Waiting for file transfers to complete (1 files)
Saving cache: .nextflow/cache/98017613-1fc7-428d-8cbe-1fbd32410a22 => /fsx/.nextflow/cache/98017613-1fc7-428d-8cbe-1fbd32410a22
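
A note on the number in the error message: `unix_wait_status(256)` is the raw wait status reported by the operating system, not the exit code itself. On Linux, a normal exit stores the exit code in the high byte of that status, so 256 indicates that pyroe exited with code 1. The sketch below decodes such a value using only the Python standard library on a Unix system; the function name `decode_wait_status` is made up for illustration.

```python
import os


def decode_wait_status(status: int) -> str:
    """Describe a raw POSIX wait status such as the 256 in the error above."""
    if os.WIFEXITED(status):
        # Normal termination: the exit code lives in the high byte.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # Abnormal termination: the low bits hold the signal number.
        return f"killed by signal {os.WTERMSIG(status)}"
    return "stopped or otherwise not terminated"


print(decode_wait_status(256))
```

An exit code of 1 is what a Python program returns after an uncaught exception, so the status alone says little about the cause; the stderr of the failing process is what carries the useful information.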

lfperales · 2023-01-23T16:27:06Z

@drpatelh and @rob-p this is the complete log. Thank you for your help!

rob-p · 2023-01-23T16:38:47Z

It looks to me like this is a version of pyroe so old that it doesn't even support the version flag -V. How is the execution environment defined here (i.e. the versions of programs installed etc.)? The easiest thing to do would typically be to upgrade to the latest version of pyroe. It is available both via pip and via bioconda. Also, the usefulaf docker image includes the latest version of pyroe (along with alevin-fry and salmon).

lfperales · 2023-01-23T17:32:27Z

@rob-p What do you mean? Within our AWS? I thought Tower would install and upgrade everything.

rob-p · 2023-01-23T18:38:09Z

@drpatelh, I think this question is more for you! I am not aware of what the update/upgrade policy is on Tower. I've only used it in the past to monitor jobs that were launched and run locally where I had control of the execution environment. If this is running on AWS, then I would imagine the relevant images just need to bump the versions of these tools.

drpatelh · 2023-01-23T19:03:29Z

Tower just pulls the pipeline from GitHub source and runs using the versions of the containers defined for that particular version of the pipeline (unless they are overridden by a config). So it shouldn't be doing anything special. Would need to investigate this further using the specific parameters used and possibly replicating the settings used to create the Compute Environment.

rob-p · 2023-01-23T19:27:11Z

So maybe these lines are an issue?

scrnaseq/modules/local/simpleaf_index.nf

Line 5 in c86646e

conda (params.enable_conda ? 'bioconda::simpleaf=0.5.2' : null)

scrnaseq/modules/local/simpleaf_quant.nf

Line 5 in c86646e

conda (params.enable_conda ? 'bioconda::simpleaf=0.5.2' : null)

They pull in a specific simpleaf from bioconda, which I presume is itself using older versions of pyroe / alevin-fry / salmon?

drpatelh · 2023-01-23T19:43:18Z

Possibly. Nextflow will use the Docker container defined in the container definition below on AWS Batch:

scrnaseq/modules/local/simpleaf_index.nf

Line 8 in c86646e

'quay.io/biocontainers/simpleaf:0.5.2--h9f5acd7_0' }"

A quick test would be to pull that image and check the versions in the container. I've clocked off for the day, otherwise I would have done it myself.

maxulysse · 2023-01-24T08:13:25Z

docker run quay.io/biocontainers/simpleaf:0.5.2--h9f5acd7_0 pyroe -v
pyroe 0.6.2

lfperales · 2023-01-24T20:27:22Z

is that the right version? @rob-p

rob-p · 2023-01-24T21:24:09Z

Hi @lfperales, This version of simpleaf is rather old. The current version is 0.8.1. Without the ability to easily reproduce the problem (i.e. without a tower setup) I can't say if the update would fix it, but that's certainly the first thing to try.

drpatelh · 2023-02-02T18:04:46Z

We have reproduced the issue on Tower (AWS Batch) and the pipeline fails with the same error even when using v0.8.1 of simpleaf. We are looking into it now and will try to reproduce locally.

rob-p · 2023-02-02T20:10:13Z

Interesting! Please keep me posted. I'm happy to help out however I can.

robsyme · 2023-02-03T03:31:15Z

Pulling this apart was hampered a little by simpleaf's reluctance to share with us the stderr/stdout from the failing pyroe process. I've created an issue here and will try to submit a quick simpleaf PR to address it over the weekend. The absence of debugging info was not a deal-breaker to understanding the bug, but it would have been a nice-to-have.

If we pull out the pyroe command from the simpleaf source code and run it in isolation:

pyroe make-splici genome.fa genome_genes.gtf 91 salmon/ref

We see that it is trying to access gene_id on a pyranges object that has no such attribute:

  File "/usr/local/bin/pyroe", line 195, in <module>
    make_splici_txome(
  File "/usr/local/lib/python3.10/site-packages/pyroe/make_splici_txome.py", line 265, in make_splici_txome
    introns.Name = introns.gene_id
  File "/usr/local/lib/python3.10/site-packages/pyranges/pyranges.py", line 269, in __getattr__
    return _getattr(self, name)
  File "/usr/local/lib/python3.10/site-packages/pyranges/methods/attr.py", line 67, in _getattr
    raise AttributeError("PyRanges object has no attribute", name)
AttributeError: ('PyRanges object has no attribute', 'gene_id')

If we look at the relevant regions in make_splici_txome.py:

# Read the GTF into a pyranges object
gr = pr.read_gtf(gtf_path)

# get introns
introns = gr.features.introns(by="transcript")
introns.Name = introns.gene_id

We see that it reads the GTF into a pyranges object gr, but the gr.features.introns(by="transcript") call relies on "transcript" features in the GTF (pyranges docs here).

The GTF loaded by the test_full profile (s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf) does not have any transcript features:

❯ aws s3 cp s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf .
❯ awk '$3 == "transcript"' genes.gtf
❯

... which means that gr.features.introns returns an empty PyRanges object, which does not have a gene_id attribute.
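
The awk check can also be written in Python as a pre-flight test before handing a GTF to pyroe. This is a minimal sketch using only the standard library; the helper name `count_feature_types` and the three-row GTF fragment are made up for illustration and are not part of pyroe or pyranges.

```python
import io
from collections import Counter


def count_feature_types(gtf_lines):
    """Count the values of the third GTF column (the feature type)."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Made-up fragment: exon and CDS rows, but no "transcript" rows.
example = io.StringIO(
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tsrc\tCDS\t120\t180\t.\t+\t0\tgene_id "G1"; transcript_id "T1";\n'
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
)

counts = count_feature_types(example)
if counts["transcript"] == 0:
    print("no transcript features found")
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `count_feature_types(open("genes.gtf"))`.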

This can be fixed by passing the gtf through something like gffread which will add the features back in. I think that it would be sensible for pyroe to warn the user if no transcript features were detected (they are not officially part of the GTF spec), but I'll also add in a gffread process to the scrnaseq pipeline to fix the issue silently.

robsyme · 2023-02-03T03:48:00Z

I have a branch gtf-fixing and a draft PR where I'm running tests on the fix overnight. Will update here tomorrow.

rob-p · 2023-02-03T04:49:12Z

Thanks for digging into this @robsyme! I'm tagging @DongzeHE here as well (the main developer of pyroe). For the reasons you state above, I also believe it would be reasonable to have pyroe act differently if only gene records and no transcript records are present — perhaps treat the gene annotation itself as a single transcript? However, it's worth getting @DongzeHE's read on this.

DongzeHE · 2023-02-03T17:05:54Z

This is a tricky question. Our spliced+intronic reference relies on the annotation of transcripts. If this doesn't exist, how can we extract the (expanded) transcriptome from the gene annotations and the genome build? I mean, if there are no "transcripts" annotations, we cannot even extract the spliced transcripts from them, right? Did I misunderstand the problem?

rob-p · 2023-02-03T18:31:34Z

From @robsyme's comment:

This can be fixed by passing the gtf through something like gffread which will add the features back in. I think that it would be sensible for pyroe to warn the user if no transcript features were detected (they are not officially part of the GTF spec)

I agree that in this case, we could just return an error code and an informative message that e.g. it doesn't make sense to attempt to build a spliced+intronic or spliced+unspliced transcriptome given that no transcripts are annotated in the GTF.

DongzeHE · 2023-02-04T03:54:48Z

Hi @robsyme, @lfperales and @maxulysse,

So we decided to make pyroe a little smarter in dealing with missing transcript annotations, i.e. the rows in a GTF that define each transcript's range (its bounds).

We did think about some marginal cases. For example, some but not all transcripts have their range defined in the GTF, or the transcript bounds defined in the GTF differ from those indicated by their exons.

Note that when generating the reference, if pyroe finds anything inappropriate (for example, a missing gene_name or gene_id metadata field), it writes a clean GTF file with imputed values, such as gene names imputed from gene IDs.

Therefore, we propose the following. The overall idea is that if we find anything inappropriate, we fix it in the clean GTF file, but still use the bounds annotated in the original file to extract introns if we can.

First, we check the transcript_id metadata field in the input GTF file. If it is missing, we just give up and report an error.

Then, we check if there are transcript annotations (the rows defining transcripts' range).

If there is no transcript annotation, we give a warning, get transcripts' bounds manually using their exons' bounds, write those bounds to the clean gtf file, and use them as the transcripts' bounds to extract introns.
If some of them are missing, we report a warning and use the transcripts' bounds in the original GTF file to extract introns, but impute the missing annotations in the clean GTF file. We will say in the warning message that if the users want to use the annotations we generated, they should rerun pyroe using the clean GTF file.
If the transcripts' bounds defined in the original GTF file and those found manually (using exons' bounds) are different, we report a warning and extract introns using the transcripts' annotation in the original GTF file, but use manually defined transcript annotations (from their exons' bounds) in the clean GTF file. We will say in the warning message that if the users want to use the annotations we generated, they should rerun pyroe using the clean GTF file.
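
The checks described above can be sketched roughly as follows. This is an illustration of the plan under simplifying assumptions, not pyroe's actual implementation: the function names, the status strings, and the three-row example are all hypothetical, attributes are parsed with a simple regular expression, and only the first matching case is reported even if several apply.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def parse_rows(gtf_lines):
    """Yield (feature, start, end, transcript_id) for rows with a transcript_id."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            yield fields[2], int(fields[3]), int(fields[4]), match.group(1)


def check_transcript_bounds(gtf_lines):
    """Return (status, bounds derived from exons) for the cases listed above."""
    declared = {}     # bounds taken from "transcript" rows
    exon_bounds = {}  # bounds imputed as (min exon start, max exon end)
    for feature, start, end, tid in parse_rows(gtf_lines):
        if feature == "transcript":
            declared[tid] = (start, end)
        elif feature == "exon":
            lo, hi = exon_bounds.get(tid, (start, end))
            exon_bounds[tid] = (min(lo, start), max(hi, end))
    if not exon_bounds:
        # Corresponds to the "give up and report an error" step.
        raise ValueError("no exon rows with a transcript_id attribute")
    if not declared:
        status = "no transcript rows"
    elif set(exon_bounds) - set(declared):
        status = "some transcript rows missing"
    elif any(declared[t] != exon_bounds[t] for t in exon_bounds):
        status = "declared bounds differ from exon bounds"
    else:
        status = "ok"
    return status, exon_bounds


# Made-up rows: two transcripts described only by their exons.
lines = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\tsrc\texon\t50\t80\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]
status, bounds = check_transcript_bounds(lines)
print(status, bounds)
```

In this sketch the exon-derived bounds are always returned, so a caller could write them to a clean GTF file regardless of which bounds it uses for intron extraction.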

As we are not very familiar with the GTF files people use in practice, if there are other marginal cases we did not consider, please let us know! Thanks!

Best,
Dongze

robsyme · 2023-02-04T21:07:08Z

That sounds like a perfectly reasonable plan, thanks @rob-p and @DongzeHE! For testing, the iGenomes gtf for GRCh38 would be a good candidate:

s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Annotation/Genes/genes.gtf

Are you planning on joining exons into transcripts by the transcript_id attribute? Am I right in assuming that the bounds of the exons that share a transcript_id should be equal to the start+end of the (missing) transcript feature?

DongzeHE · 2023-02-05T15:27:16Z

Hi @robsyme,

You are right. We plan to join only exon features. The reason is that although there are other feature types, for example, CDSs and UTRs, their intervals are always contained within some exon features in the GTF files we have processed so far. If this is not a universal rule, please let us know!

robsyme · 2023-02-05T18:40:37Z

The universe of GTF/GFF interpretation is vast, but I think that your plan of taking the exons will catch almost all sane annotation sets. Thanks again!

DongzeHE · 2023-02-06T21:19:25Z

Hi @robsyme,

I have made the changes and am now testing them. However, I found that the iGenomes GTF for GRCh38 you shared contains gene annotations on some "special" chromosomes, that is, chromosomes that are not in the genome FASTA file (for example, 'chr1_GL383518v1_alt'). If possible, could you please share the link to the genome FASTA file that matches the iGenomes GTF you shared?

The genome build I used was downloaded from

s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/

Best,
Dongze

robsyme · 2023-02-07T01:38:19Z

Ah, this looks to be iGenomes being an unreliable resource (which is why nf-core is considering moving away from these datasets).

I'd recommend filtering the gtf. Something like

aws s3 cp s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa .
samtools faidx genome.fa
grep -f <(awk '{printf("^%s\t\n", $1)}' genome.fa.fai) genes.gtf > genes.filtered.gtf
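
The same filtering can be sketched in Python, which may be easier to adapt inside a pipeline step. The sketch assumes a samtools-style `.fai` index (sequence name in the first tab-separated column) and compares sequence names exactly rather than as regular expressions, so names containing dots are matched literally. The function names and the two-row example are made up for illustration.

```python
def contigs_from_fai(fai_lines):
    """Collect sequence names from the first column of a .fai index."""
    return {line.split("\t", 1)[0].strip() for line in fai_lines if line.strip()}


def filter_gtf(gtf_lines, contigs):
    """Yield header comments and records whose sequence name is in `contigs`."""
    for line in gtf_lines:
        if line.startswith("#") or line.split("\t", 1)[0] in contigs:
            yield line


# Made-up inputs: the index knows chr1 only, the GTF also mentions an alt contig.
fai = ["chr1\t1000\t6\t60\t61\n"]
gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1_GL383518v1_alt\tsrc\texon\t10\t20\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n',
]
kept = list(filter_gtf(gtf, contigs_from_fai(fai)))
print(len(kept), "of", len(gtf), "records kept")
```

With real files, the two helpers would be fed open handles for genome.fa.fai and genes.gtf, and the kept lines written to genes.filtered.gtf.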

rob-p · 2023-02-15T13:33:59Z

Hi @robsyme,

All of our changes to pyroe and simpleaf have now been upstreamed. Is it worth pulling in the latest versions and seeing if this is resolved?

robsyme · 2023-02-15T13:38:15Z

Yup, thanks for the prompt Rob. I'll pull them in today.

grst · 2023-02-21T12:27:07Z

Closed by #198

lfperales added the bug label Dec 21, 2022

grst closed this as completed Feb 21, 2023

jeremymsimon mentioned this issue Jul 19, 2023

Error at SIMPLEAF_INDEX for user supplied genome/annotation #253

Open
