Error with 'preprocess:hisat2' #116
Comments
Hi @Ahmed-Shibl, thanks for reporting! Interesting, because it seems that most of your samples were successfully running through the There is no more information in the
just to check if there was a somewhat valid
I also think that the whole mapping step failed. If so, it might also be good to check the FASTQ files after the trimming and rRNA depletion step:
Are they empty? |
Hello @hoelzer, thanks for getting back to me on this.
and
Regarding the FASTQ files,
Shouldn't they have the same number of lines? Do you think skipping the SortMeRNA step using |
The BAM file is 1.4G, so at least something was mapped. It is indeed a bit strange that the FASTQ files have different line counts. It seems R1 has 1,327,764 reads and R2 has 1,313,837 reads (line count divided by 4). You could also check the integrity of your input FASTQ files via
just to be sure that your input files are fine. You can also try to |
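The check described above (line count divided by 4, plus an archive integrity test) can be sketched in Python. This is only a minimal illustration, not part of RNAflow; the function names `count_fastq_reads` and `check_pair` are made up for this example.

```python
import gzip
import zlib


def count_fastq_reads(path):
    """Return the number of reads in a gzipped FASTQ file, or None if it looks damaged."""
    lines = 0
    try:
        with gzip.open(path, "rb") as handle:
            for _ in handle:
                lines += 1
    except (OSError, EOFError, zlib.error):
        # Truncated or corrupted gzip stream.
        return None
    if lines % 4 != 0:
        # A well-formed FASTQ file has exactly four lines per read.
        return None
    return lines // 4


def check_pair(r1_path, r2_path):
    """Compare the read counts of both mates; returns (ok, r1_reads, r2_reads)."""
    r1 = count_fastq_reads(r1_path)
    r2 = count_fastq_reads(r2_path)
    ok = r1 is not None and r1 == r2
    return ok, r1, r2
```

For an intact paired-end sample, `check_pair` should report the same read count for R1 and R2. Differing counts, as in the sample discussed here, or a `None` point to a problem in one of the files.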
Incredible... I checked all the *.other.fastq.gz files and they were all fine except this one. Is this something that |
Can you please also check your input files? So the raw FASTQ? Are they fine? If they are fine, I would suggest deleting the folder |
The raw FASTQ files were all also fine. I haven't yet deleted the folder
Now, I thought this might be an issue with headers in the
and
Looking back at previous attempts with featureCounts (stand alone) on the same dataset, I realized I used the
Thanks again! |
@Ahmed-Shibl thanks for keeping on! :) Okay, you are using annotation from Prokka. Actually, I think we did not test such an input until now @MarieLataretu . The default output from Prokka is in GFF format and not GTF I think and normally does not have Can you provide the annotation that you are trying to run as an input for RNAflow? I think you might have used the
If this is the problem, we might be able to solve this by adding a
parameter to customize the counting for special input annotations. What do you think @MarieLataretu? PS: I think this might not solve your previous problem with the corrupted FASTQ files, but maybe let's solve this one first |
Yes, I agree. It looks like you don't have the I'll implement an additional parameter for that. I'm not sure with |
@implementation of additional parameter: agree! If we use
so one would need to set |
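To see which feature types and attribute keys an annotation actually contains, a quick summary can help. As far as I know, featureCounts counts features of type `exon` grouped by the `gene_id` attribute by default (its `-t` and `-g` options), so an annotation lacking either would need custom settings. The following is a minimal sketch; `summarize_annotation` is a hypothetical helper, not part of RNAflow or featureCounts.

```python
import re
from collections import Counter


def summarize_annotation(lines):
    """Count feature types (column 3) and attribute keys (column 9) of a GFF3/GTF file."""
    feature_types = Counter()
    attribute_keys = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # Prokka GFF files append the genome sequence after this marker
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            continue
        feature_types[fields[2]] += 1
        for attribute in fields[8].split(";"):
            attribute = attribute.strip()
            if not attribute:
                continue
            # GFF3 writes key=value, GTF writes key "value"; cut at whichever comes first.
            key = re.split(r"[=\s]", attribute, maxsplit=1)[0]
            attribute_keys[key] += 1
    return feature_types, attribute_keys
```

Calling it as `summarize_annotation(open("annotation.gff"))` (file name is a placeholder) returns two counters. If `exon` is missing from the first or `gene_id` from the second, the default counting settings would likely find nothing to count.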
Hi @hoelzer and @MarieLataretu, thanks for your patience :) Indeed my Prokka output was a GFF file and so I had to change it into a GTF file using and yes when I used it before, I used This is the GFF file I used to get the GTF file in annotations.csv (changed extension to allow attachment) Sorry if it's a bit messy - let me know if there's anything not clear. |
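For illustration, a GFF3-to-GTF conversion of the kind mentioned above could look roughly like the sketch below. This is not the tool used in this thread. The function name `gff3_to_gtf` and the choice of `CDS` features with the `ID` attribute as `gene_id` are assumptions made for this example.

```python
def gff3_to_gtf(lines, feature_type="CDS", id_key="ID"):
    """Yield GTF lines for GFF3 features of one type, using `id_key` as gene_id."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # stop before the embedded genome sequence
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        # Parse the GFF3 key=value attribute column into a dictionary.
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        gene_id = attributes.get(id_key)
        if gene_id is None:
            continue  # no usable identifier, skip this feature
        # Replace the attribute column with GTF-style key "value"; pairs.
        fields[8] = f'gene_id "{gene_id}"; transcript_id "{gene_id}";'
        yield "\t".join(fields)
```

The feature type in column 3 is left unchanged, so a counting tool would still need to be told to count `CDS` rather than `exon` features.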
@Ahmed-Shibl okay. @MarieLataretu implemented a hotfix. Because we don't have your genome and read files it's difficult for us to test w/ your GTF file. But what you can try now is:
Maybe add |
partly fixes #116 - params for featurecounts
sorry, I meant: |
Alright, I ran the following as suggested:
And got the earlier error related to the deprecated fastq.gz file as a result of the sortmerna step I guess. This time, though, it pointed out the error in more detail:
And so, I re-ran the same command above but added
Got an error at
The directory
The last few lines in
|
Hi @Ahmed-Shibl ! [1]
I still don't know why SortMeRNA seems to report a corrupted FASTQ file here. I think you checked the corresponding input FASTQ and they were fine? I think fastp (trimming step, before SortMeRNA) can also work with the input FASTQ because then it is counted by featurecounts and passed to DESeq2 in your second call. [2]
I think that we can better investigate where we need to add changes to the scripts to work with your input. This also helps us to generalize the pipeline to other user inputs which is a good thing. thx! |
Hi! [1]
it returned this (also mentioned above):
All the other FASTQ files - including [2]
|
[1] And the raw input files were also fine (sorry, I can't remember, but I think you checked this)? [2] Ah! No, they should not be empty. I think that's the problem. |
[1] Yes the raw FASTQ were also fine [2] Thanks for explaining this - makes sense, because I guess currently it's looking for Happy to run it again once you give me the greenlight. Thanks |
[1] Okay, then, for some reason, SortMeRNA destroys this sample as it seems. [2] @MarieLataretu can you implement a fix please? |
at [1] What happens in the SortMeRNA step is:
unpigz -f -p 36 lung_old_rep3.R1.trimmed.fastq.gz
unpigz -f -p 36 lung_old_rep3.R2.trimmed.fastq.gz
merge-paired-reads.sh lung_old_rep3.R1.trimmed.fastq lung_old_rep3.R2.trimmed.fastq lung_old_rep3.merged.fastq
sortmerna --ref ./rRNA_databases/silva-bac-16s-id90.fasta,./rRNA_databases/silva-bac-16s-id90:./rRNA_databases/silva-bac-23s-id98.fasta,./rRNA_databases/silva-bac-23s-id98:./rRNA_databases/silva-arc-16s-id95.fasta,./rRNA_databases/silva-arc-16s-id95:./rRNA_databases/silva-arc-23s-id98.fasta,./rRNA_databases/silva-arc-23s-id98:./rRNA_databases/silva-euk-18s-id95.fasta,./rRNA_databases/silva-euk-18s-id95:./rRNA_databases/silva-euk-28s-id98.fasta,./rRNA_databases/silva-euk-28s-id98:./rRNA_databases/rfam-5s-database-id98.fasta,./rRNA_databases/rfam-5s-database-id98:./rRNA_databases/rfam-5.8s-database-id98.fasta,./rRNA_databases/rfam-5.8s-database-id98 --reads lung_old_rep3.merged.fastq --paired_in --aligned lung_old_rep3.aligned --other lung_old_rep3.other_merged --fastx --log --num_alignments 1 -v -a 36
unmerge-paired-reads.sh lung_old_rep3.other_merged.fastq lung_old_rep3.R1.other.fastq lung_old_rep3.R2.other.fastq
pigz -p 36 lung_old_rep3.R1.other.fastq
pigz -p 36 lung_old_rep3.R2.other.fastq
rm lung_old_rep3.merged.fastq lung_old_rep3.aligned.fastq lung_old_rep3.other_merged.fastq lung_old_rep3.R1.trimmed.fastq lung_old_rep3.R2.trimmed.fastq
So it could be that the problem lies in 1) the paired-end merge step, 2) the sortmerna command, or 3) the unmerge step. I don't think the pigz step could produce corrupted archives of the FASTQs? I just deleted the nextflow work dir that holds the SortMeRNA output for the corrupted sample and |
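To narrow down which of the three steps first produces a broken file, one could validate the record structure of each intermediate FASTQ before the final `rm` removes it. The following is a minimal sketch under that assumption; `first_bad_record` and `check_file` are hypothetical helpers, not part of RNAflow or SortMeRNA.

```python
import gzip


def first_bad_record(lines):
    """Return the 1-based index of the first malformed FASTQ record, or None if all are fine."""
    record = []
    index = 0
    for line in lines:
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        index += 1
        header, sequence, separator, quality = record
        if (
            not header.startswith("@")
            or not separator.startswith("+")
            or len(sequence) != len(quality)
        ):
            return index
        record = []
    if record:
        # Leftover lines: the file ends in the middle of a record.
        return index + 1
    return None


def check_file(path):
    """Validate a plain or gzipped FASTQ file by path."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return first_bad_record(handle)
```

Running `check_file` on the merged file, the `other_merged` file and the unmerged R1/R2 files in turn would show at which step the first malformed record appears.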
Update: the same error occurred again with the same input files (same sample). @MarieLataretu I'm using this sample:
and the SortMeRNA process seems to produce corrupted output FASTQs for some reason.
and for the R2 file:
Not sure what's going on and why sortmerna is failing here |
Hi @Ahmed-Shibl , with release 1.2.0 we tackled the issues regarding the non-Ensembl annotation. For the corrupted fastq files, we speculate that |
There might be some issues with the non-biomartr-compatible IDs. This should be fixed in release 1.2.1 |
Hi @MarieLataretu,
That's great, thanks - I'll use the latest v1.2.1 and re-run the command I started with above and keep you posted.
Yes, I definitely use |
great, thanks for re-running those analyses and we hope it works now!
We experienced now a few times issues when executing Nextflow in a In the meantime, maybe it is also possible for you to run it w/o |
I asked around and one of the Nextflow developers wrote:
|
Okay I think there is some progress but I still ran into an error - this time with deseq2. First I updated the pipeline with: The command I then ran (without The error read:
The difference between this time and the time before is that the directory The last few lines in
Let me know if there are files or more information I should share. Thanks!! |
Hmm, this DESeq error really sounds to be input-dependent... 🤔 Have you ever tried the small test example with |
Hi @MarieLataretu, So, I ran the command you suggested above and it went well. Here's the output:
How do you recommend I troubleshoot my data/files? Is there a way to get a template of the files used as input for DESeq2 in this test run? or would I have to trace further back? Thanks, |
Hi @Ahmed-Shibl thanks for catching up! Here you can find the test data and structure that is run via the Maybe that helps to figure out what's the difference in your data? You can also look in the work dir: |
Hi! I recently ran the pipeline and got an error where a *_summary.log file was not detected during the execution of HISAT2.
Brief background: I'm using a Prochlorococcus genome as input and RNAseq reads (4 replicates) that belong to the strain grown under two different temperatures.
Please find the details below.
This is the command I started with:
nextflow run hoelzer-lab/rnaflow --reads input.csv --genome fasta.csv --annotation gtf.csv --mode paired --cores 60 --memory 250 --permanentCacheDir ~/miniconda3/envs/rnaflow/nextflow-autodownload-databases --condaCacheDir ~/miniconda3/envs/rnaflow/conda
And this is the output/error:
More info:
The contents of /tmp/nextflow-work-as11798/9f/06b698f84634f12668a460b0102939 are as follows, but there was not much more info about the error.
I noticed that the file 22_rep4.sorted.bam, which is the output of the command causing the error, is found in there.
Please let me know if you need any more info! Thanks!!