pyroe failed to return succesfully ExitStatus(unix_wait_status(256)) #191
Comments
Hi @lfperales, I'm happy to do what I can to help out with this, but I'm not entirely familiar with how I can/should test this out. The issue seems (perhaps?) to be related to a missing command line parameter that was added to Thanks, |
hello Rob, I don't know if tower is using the latest versions of all the tools? How can I do this? thank you |
We definitely need more information to help here @lfperales . Are you able to share the |
THIS IS THE COMPLETE LOG FROM BATCH
Downloading plugin [email protected]
N E X T F L O W ~ version 22.10.5
export required var
export ALEVIN_FRY_HOME=.
prep simpleaf
simpleaf set-paths
run simpleaf index
simpleaf |
It looks to me like this is a version of pyroe so old that it doesn't even support the version flag |
@rob-p what do you mean? within our aws? I thought tower will install and upgrade everything |
@drpatelh — I think this question is more for you! I am not aware of what the update/upgrade policy is on tower. I've only used it in the past to monitor jobs that were launched and run locally where I had control of the execution environment. If this is running on aws, then I would imagine the relevant images just need to bump the versions of these tools. |
Tower just pulls the pipeline from GitHub source and runs using the versions of the containers defined for that particular version of the pipeline (unless they are overridden by a config). So it shouldn't be doing anything special. Would need to investigate this further using the specific parameters used and possibly replicating the settings used to create the Compute Environment. |
So maybe these lines are an issue?
They pull in a specific |
Possibly. Nextflow will use the Docker container defined in the
Quick test would be to pull that image and check the versions in the container. Clocked off for the day otherwise I would have done a quick test. |
|
is that the right version? @rob-p |
Hi @lfperales, This version of simpleaf is rather old. The current version is 0.8.1. Without the ability to easily reproduce the problem (i.e. without a tower setup) I can't say if the update would fix it, but that's certainly the first thing to try. |
We have reproduced the issue on Tower (AWS Batch) and the pipeline fails with the same error even when using v0.8.1 of simpleaf. We are looking into it now and will try to reproduce locally. |
Interesting! Please keep me posted. I'm happy to help out however I can. |
Pulling this apart was hampered a little by simpleaf's reluctance to share with us the stderr/stdout from the failing pyroe process. I've created an issue here and will try to submit a quick simpleaf PR to address it over the weekend. The absence of debugging info was not a deal-breaker to understanding the bug, but it would have been a nice-to-have.
If we pull out the pyroe command from the simpleaf source code and run it in isolation:
pyroe make-splici genome.fa genome_genes.gtf 91 salmon/ref
We see that it is trying to call
If we look at the relevant regions in
# Read the GTF into a pyranges object
gr = pr.read_gtf(gtf_path)
# get introns
introns = gr.features.introns(by="transcript")
introns.Name = introns.gene_id
We see that it reads the gtf into a pyranges object
The GTF loaded by the test_full profile
... which means that This can be fixed by passing the gtf through something like |
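A quick way to check whether a given GTF has any transcript rows at all is to tally the feature column. This is only a minimal sketch in plain Python, not what pyroe or pyranges does internally; the function name `count_gtf_features` and the example rows are made up for illustration.

```python
from collections import Counter


def count_gtf_features(lines):
    """Count the feature type (column 3) of every non-comment GTF line."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9:  # a well-formed GTF row has 9 tab-separated columns
            counts[fields[2]] += 1
    return counts


# Hypothetical GTF with gene and exon rows but no "transcript" rows.
example = [
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "g1";\n',
    'chr1\tsrc\texon\t1\t40\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\texon\t60\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]
counts = count_gtf_features(example)
has_transcripts = counts["transcript"] > 0
print(dict(counts), has_transcripts)
```

On a real file the same function can be fed an open file handle, e.g. `count_gtf_features(open("genes.gtf"))`.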
I have a branch gtf-fixing and a draft PR where I'm running tests on the fix overnight. Will update here tomorrow. |
Thanks for digging into this @robsyme! I'm tagging @DongzeHE here as well (the main developer of |
This is a tricky question. Our spliced+intronic reference relies on the annotation of transcripts. If this doesn't exist, how can we extract the (expanded) transcriptome from the gene annotations and the genome build? I mean, if there are no "transcripts" annotations, we cannot even extract the spliced transcripts from them, right? Did I misunderstand the problem? |
From @robsyme's comment:
I agree that in this case, we could just return an error code and an informative message that e.g. it doesn't make sense to attempt to build a spliced+intronic or spliced+unspliced transcriptome given that no transcripts are annotated in the GTF. |
Hi @robsyme, @lfperales and @maxulysse,
We decided to make pyroe a little smarter about missing transcript annotations, that is, the rows in a GTF that define each transcript's range (its bounds). We also thought about some marginal cases: for example, some but not all transcripts have their range defined in the GTF, or the transcript bounds defined in the GTF differ from those indicated by their exons.
Notice that when generating the reference, if it finds anything inappropriate, for example, the
Therefore, we propose to do the following. The overall idea is that if we find anything inappropriate, we fix it in the clean GTF file, but still use the annotations in the original bounds to extract introns if we can.
First, we check the
Then, we check if there are transcript annotations (the rows defining transcripts' range).
As we are not very familiar with the GTF files people use in practice, if there are other marginal cases we did not consider, please let us know! Thanks! Best, |
That sounds like a perfectly reasonable plan, thanks @rob-p and @DongzeHE! For testing, the iGenomes gtf for GRCh38 would be a good candidate:
Are you planning on joining exons into transcripts by the |
Hi @robsyme, You are right. We plan to join only exons features. The reason is that although there are other feature types, for example, CDSs and UTRs, their intervals are always contained within some exon features in the GTF files we have processed so far. If this is not a universal rule, please let us know! |
The universe of GTF/GFF interpretation is vast, but I think that your plan of taking the exons will catch almost all sane annotation sets. Thanks again! |
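For readers following along, joining exons into transcript bounds could look roughly like the sketch below. It assumes exons are grouped by the `transcript_id` attribute (the question above is cut off, so that grouping key is a guess) and it is not the code that went into pyroe; `transcript_bounds_from_exons` is a hypothetical helper.

```python
import re

# Assumed attribute syntax: transcript_id "XYZ"; as in standard GTF column 9.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_bounds_from_exons(lines):
    """Return {transcript_id: (chrom, strand, start, end)} spanning all its exons."""
    bounds = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features are joined
        match = TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue  # exon without a transcript_id cannot be assigned
        tid = match.group(1)
        start, end = int(fields[3]), int(fields[4])
        if tid in bounds:
            chrom, strand, lo, hi = bounds[tid]
            bounds[tid] = (chrom, strand, min(lo, start), max(hi, end))
        else:
            bounds[tid] = (fields[0], fields[6], start, end)
    return bounds


# Hypothetical exon rows for a single transcript.
exons = [
    'chr1\tsrc\texon\t60\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1\tsrc\texon\t1\t40\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]
bounds = transcript_bounds_from_exons(exons)
print(bounds)
```

The sketch ignores CDS and UTR rows, in line with the observation above that their intervals fall inside exon features.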
Hi @robsyme, I have made the changes and am now testing it. However, I found that the iGenomes gtf for GRCh38 you shared contains gene annotations in some "special" chromosomes. That is, chromosomes that are not in the genome FASTA file. For example, 'chr1_GL383518v1_alt'. If possible, could you please share the link to the genome FASTA file that matches the iGenomes GTF you shared? The genome build I used was downloaded from
Best, |
Ah, this looks to be iGenomes being an unreliable resource (which is why nf-core is considering moving away from these datasets). I'd recommend filtering the gtf. Something like
aws s3 cp s3://ngi-igenomes/igenomes/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa .
samtools faidx genome.fa
grep -f <(awk '{printf("^%s\\t\n", $1)}' genome.fa.fai) genes.gtf > genes.filtered.gtf |
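The same filtering idea can be sketched in Python for anyone who prefers not to rely on shell quoting. This is an illustrative equivalent, not part of the pipeline: it keeps only GTF rows whose first column is a sequence name listed in the `.fai` index. Unlike the grep approach it also keeps `#` header lines, and the helper names are hypothetical.

```python
def read_fai_contigs(fai_lines):
    """Collect sequence names (column 1) from a samtools .fai index."""
    return {line.split("\t", 1)[0].strip() for line in fai_lines if line.strip()}


def filter_gtf_by_contigs(gtf_lines, contigs):
    """Yield header lines plus rows whose seqname is present in the FASTA index."""
    for line in gtf_lines:
        if line.startswith("#") or line.split("\t", 1)[0] in contigs:
            yield line


# Hypothetical index and annotation; the alt contig is absent from the index.
fai = ["chr1\t248956422\t6\t60\t61\n"]
gtf = [
    'chr1\tsrc\texon\t1\t40\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'chr1_GL383518v1_alt\tsrc\texon\t1\t40\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n',
]
kept = list(filter_gtf_by_contigs(gtf, read_fai_contigs(fai)))
print(len(kept))
```

With real files, pass open handles for `genome.fa.fai` and `genes.gtf` and write the yielded lines to `genes.filtered.gtf`.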
Hi @robsyme, All of our changes to pyroe and simpleaf have now been upstreamed. Is it worth pulling in the latest versions and seeing if this is resolved? |
Yup, thanks for the prompt Rob. I'll pull them in today. |
Closed by #198 |
Description of the bug
I'm running scrna using Nextflow Tower, and I get this error. It may be related to this: #152 (comment)
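As a side note on reading the error itself: `unix_wait_status(256)` is a raw wait status rather than an exit code. Under the conventional Linux encoding (an assumption here; other platforms may differ), the low 7 bits hold a terminating signal and the next byte holds the exit code, so 256 decodes to a normal exit with code 1. The helper below is a hypothetical sketch that ignores stopped and continued states.

```python
def decode_wait_status(status):
    """Decode a raw wait status using the conventional Linux bit layout."""
    signal = status & 0x7F  # low 7 bits: terminating signal, 0 if exited normally
    if signal == 0:
        return ("exited", (status >> 8) & 0xFF)  # next byte: exit code
    return ("signaled", signal)


decoded = decode_wait_status(256)
print(decoded)
```

In other words, pyroe exited with a generic failure code rather than being killed by a signal.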
Command used and terminal output
No response
Relevant files
No response
System information
No response