
Input data metrics explanation, % of mtDNA reads of the total sequence reads that mapped to the whole mtDNA #216

Closed
AliBasuony2022 opened this issue Dec 23, 2023 · 5 comments


@AliBasuony2022

Dear friends,

I have received a question from a reviewer about the % of mtDNA reads (the percentage of the total sequence reads that mapped to the whole mtDNA) in NOVOPlasty. Where can I find this information in the NOVOPlasty outputs, please? Is it the 0.43 % shown under "Input data metrics" below?

Can someone explain the "Input data metrics", please? I'm just confused.

Below is the log file.

Kind regards,
Ali


NOVOPlasty: The Organelle Assembler
Version 4.3.1
Author: Nicolas Dierckxsens, (c) 2015-2020

Input parameters from the configuration file: *** Verify if everything is correct ***

Project:

Project name = mito_1_375
Type = mito
Genome range = 15000-18000
K-mer = 33
Max memory = 64
Extended log = 1
Save assembled reads = yes
Seed Input = NC_008434.1_Vv_complete_mitogenome16813bp.fasta
Extend seed directly = no
Reference sequence =
Variance detection =
Chloroplast sequence =

Dataset 1:

Read Length = 151
Insert size = 350
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /mnt/scratch/c1845371/whole_genome/data/375_R1.fastq.gz
Reverse reads = /mnt/scratch/c1845371/whole_genome/data/375_R2.fastq.gz
Store Hash =

Heteroplasmy:

Heteroplasmy =
HP exclude list =
PCR-free =

Optional:

Insert size auto = yes
Use Quality Scores =
Output path = /mnt/scratch/c1845371/whole_genome/mitochondrial_genome/mito_12/

Subsampled fraction: 24.14 %
Forward reads without pair: 13259
Reverse reads without pair: 5025

Retrieve Seed...

Initial read retrieved successfully: TCTTACACCCGCCAGATCTTGCTGTCTATCTATAGATATCATTTCCTTGATATTTTATTTTTTACCGCCTCTATAGTTCGCACCAACAAAGCCAAAAACAAAAGTTAATGTAGCTTAATTAGTAAAGCAAGGCACTGAAAATGCCAAGATG

Start Assembly...

------------Assembly 1 finished: Contigs are automatically merged in Merged_contigs file------------

Contig 01 : 16521 bp
Contig 02 : 349 bp
Contig 03 : 992 bp
Contig 04 : 385 bp
Contig 05 : 881 bp

Total contigs : 5
Largest contig : 16521 bp
Smallest contig : 349 bp
Average insert size : 337 bp

-----------------------------------------Input data metrics-----------------------------------------

Total reads : 105400318
Aligned reads : 455762
Assembled reads : 418834
Organelle genome % : 0.43 %
Average organelle coverage : 4176


@ndierckx
Owner

Hi,

Yes it is indeed 0.43%
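The numbers in the log are consistent with this percentage being aligned reads divided by total reads: 455,762 / 105,400,318 ≈ 0.43 %, whereas assembled reads divided by total reads gives ≈ 0.40 %. The minimal Python sketch below parses the "Input data metrics" lines (assuming the `Label : value` layout shown in this 4.3.1 log) and recomputes both ratios. The formula is inferred from the numbers, not taken from NOVOPlasty's documentation, and `parse_metrics` is a hypothetical helper.

```python
import re

# Excerpt of the "Input data metrics" block from the NOVOPlasty 4.3.1 log above.
LOG_EXCERPT = """
Total reads : 105400318
Aligned reads : 455762
Assembled reads : 418834
Organelle genome % : 0.43 %
Average organelle coverage : 4176
"""

# Matches "Label : number" lines; this layout is assumed from this one log only.
METRIC_LINE = re.compile(r"^[ \t]*([A-Za-z %]+?)[ \t]*:[ \t]*([\d.]+)", re.MULTILINE)


def parse_metrics(text: str) -> dict[str, float]:
    """Hypothetical helper: map each metric label to its numeric value."""
    return {label.strip(): float(value) for label, value in METRIC_LINE.findall(text)}


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


metrics = parse_metrics(LOG_EXCERPT)
aligned_pct = percent(metrics["Aligned reads"], metrics["Total reads"])
assembled_pct = percent(metrics["Assembled reads"], metrics["Total reads"])

print(f"aligned / total   = {aligned_pct:.2f} %")    # 0.43 %, same as "Organelle genome %"
print(f"assembled / total = {assembled_pct:.2f} %")  # 0.40 %
```

Because the ratio only uses values from the same run, it can be reported as-is even when the run was subsampled.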

@AliBasuony2022
Author

Thanks so much,

But the number of raw reads (pairs) for mitochondrial and nuclear DNA together is 216,237,628. What does the number 105400318 in the Input data metrics refer to? Is it the number of mitochondrial reads?

Sorry, I'm still confused.

Best regards,
Ali

@ndierckx
Owner

105400318 is the total number of reads used. You set a max memory, so it subsampled your data and only used 105400318 reads; it doesn't use the rest when you subsample. You have a large dataset, so you don't need to use the complete set.
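If the subsample is an unbiased random draw from the full dataset (an assumption; this thread does not say how NOVOPlasty selects reads), the mtDNA fraction measured on the subsample estimates the fraction in the complete data, and only the absolute counts shrink. The toy Python simulation below illustrates this with made-up numbers (a hypothetical dataset of 200,000 reads, 0.43 % of them mitochondrial); it is not NOVOPlasty's subsampling code.

```python
import random


def mito_fraction(reads: list[bool]) -> float:
    """Fraction of reads flagged as mitochondrial (True)."""
    return sum(reads) / len(reads)


def subsample(reads: list[bool], fraction: float, rng: random.Random) -> list[bool]:
    """Keep each read independently with probability `fraction`."""
    return [r for r in reads if rng.random() < fraction]


rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical full dataset: 200,000 reads, 860 of them (0.43 %) mitochondrial.
n_total, n_mito = 200_000, 860
full = [True] * n_mito + [False] * (n_total - n_mito)

for frac in (0.25, 0.5, 1.0):
    sub = subsample(full, frac, rng)
    print(f"kept {frac:>4.0%}: {len(sub):>7} reads, "
          f"{sum(sub):>4} mito, {100 * mito_fraction(sub):.2f} % mito")
```

In the simulation the read counts scale with the fraction kept while the percentage stays close to 0.43 %. If the random-subsampling assumption holds, that would explain why runs with different Max memory settings report different aligned and total reads but similar percentages.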

@AliBasuony2022
Author

Good point.
Thanks so much, Nicolas.

ndierckx closed this as completed Jan 4, 2024
@AliBasuony2022
Author

Dear Nicolas,

Just a follow up question for this issue.

How do I know the correct % of mtDNA reads, i.e. the percentage of the total sequence reads that mapped to the whole mtDNA? I'm comparing the performance of NOVOPlasty with other de novo assemblers, and this figure is important.

When I used different memory settings (all other settings fixed), I got the same length for the largest contig, but different numbers of assembled, aligned and total reads.

Is the subsampled fraction of 99.99 % that I get when setting Max memory = Null correct? If so, the number of total reads exceeds the number of reads in the raw data. I'm still confused, sorry.

max memory Null
log_mito_1_375_12_6_max memory Null.txt

max memory 100
log_mito_1_375_12_3_max memory 100.txt

memory 64
log_mito_1_375_12_max memory 64.txt

Kind regards,
Ali
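One way to check whether the log's "Total reads" exceeds the raw data is to count the records in the input files directly. A FASTQ record is four lines, and a paired-end dataset has one record per mate in each of R1 and R2, so the number of individual reads is twice the number of pairs. A minimal Python sketch follows; `count_fastq_reads` is a hypothetical helper, and whether NOVOPlasty's "Total reads" counts pairs or individual reads is not stated in this thread.

```python
from __future__ import annotations

import gzip
import sys
from pathlib import Path


def count_fastq_reads(path: str | Path) -> int:
    """Count FASTQ records (4 lines each) in a plain or gzip-compressed file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Usage: python count_reads.py 375_R1.fastq.gz 375_R2.fastq.gz
    files = sys.argv[1:]
    counts = [count_fastq_reads(p) for p in files]
    for p, n in zip(files, counts):
        print(f"{p}: {n} reads")
    print(f"total individual reads: {sum(counts)}")
```

Comparing the summed R1 + R2 count (individual reads) and the R1 count alone (pairs) with the log's "Total reads" shows which unit the log value is closer to.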
