QUAST v. 4.5 incorrectly assigning unaligned contigs #208
-
Hi there. I'm an MSc student using a pipeline developed by another group to identify contigs in human genomes that don't align to the reference genome. One step in the pipeline uses QUAST v. 4.5 (the pipeline was made in 2019, I think) to identify the fully and partially unaligned contigs using NUCmer. I recently started analysing the final set of unaligned or "non-reference" sequences identified by QUAST and found a lot of sequence (over 20 Mbp) that QUAST had identified as fully unaligned but that actually aligned quite strongly to the reference, some with as much as 90% identity and 100% coverage, which I believe should have been labelled as aligned by QUAST.

I've gone through the logs and found that NUCmer is correctly identifying the alignments, but somewhere along the way QUAST appears to misinterpret the NUCmer output and therefore assigns some aligned sequences as fully unaligned. I tested this again using QUAST v. 5.0.2 with Minimap2, and in this version the problem seems to be fixed and all unaligned contigs are correctly identified. With this version my amount of unaligned sequence decreased by 20 Mbp, meaning v. 4.5 had incorrectly assigned 20 Mbp of sequence as unaligned.

I'm just wondering whether this was a known issue in v. 4.5 or whether there is potentially something wrong with my setup of QUAST. It's not too important, as I'm just going to adapt the pipeline to use the latest version of QUAST to avoid the error, but I will need to explain the change in my dissertation and I'd like to know if this problem has been seen and identified before. Thanks!
Replies: 1 comment 3 replies
-
Hi!
We filter out alignments with identity below --min-identity (see here); the default cut-off is 95%. I assume that is why your 90% identity alignments were filtered out. Please check whether the pipeline explicitly sets this option or relies on the default value.

Still, if I remember correctly, the default cut-off has been 95% since the very beginning (the option was introduced in v.4.2; the default value was sometimes different, 90%, in the MetaQUAST mode, but that is not your case), so v.5.0.2 should also mark such contigs as "unaligned". If the real identity is close to 95%, then the switch from Nucmer to minimap2 may have resulted in the dramatic change, e.g., the contig was 94% according to Nucmer and thus unaligned, and 96% with minimap2 and thus aligned. There were some bugfixes regarding QUAST postprocessing of the raw alignments and calculation of their final identity value in v.5.*; they are mostly minor and related to the presence of long indels in the alignments.

In summary, v.5.* is a substantial update of v.4.5, so it could cause the difference. For nearly 100% IDY alignments there is not much difference, but for ~90% IDY the change of the core aligner inside QUAST could be really important.
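
To make the cut-off effect concrete, here is a minimal Python sketch of the idea described in this reply. It is not QUAST's actual code: the function name `classify_contig` is hypothetical, the real filter is applied to individual alignments rather than whole contigs, and treating the cut-off as inclusive (`>=`) is an assumption. The 94% and 96% identity values are the illustrative ones from the reply, not measured data.

```python
# Hypothetical helper, not part of QUAST: labels a contig from the identity
# (in percent) of its best alignment, using a --min-identity style cut-off.
# Assumption: the comparison is inclusive (>=); QUAST may differ in detail.
def classify_contig(best_identity, min_identity=95.0):
    return "aligned" if best_identity >= min_identity else "unaligned"


# Illustrative identities for the same contig as reported by two aligners.
for aligner, identity in [("NUCmer", 94.0), ("minimap2", 96.0)]:
    print(aligner, identity, classify_contig(identity))

# A 90%-identity contig is dropped at the default cut-off,
# but kept if the cut-off is lowered to 90.
print(classify_contig(90.0))
print(classify_contig(90.0, min_identity=90.0))
```

Under these assumptions a two-point shift in reported identity around the cut-off is enough to move a contig from "unaligned" to "aligned", which is the kind of difference that could appear when the core aligner changes.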