Explanation of how --cov-cutoff works #18

tseemann · 2017-05-09T06:42:13Z

The cov-cutoff parameter remains a mystery to the Spades user community. It used to be auto and now it is off.

Would it be possible to add an explanation of it to the documentation?

A common result is getting contigs with reported coverage < 1.0.

wangyugui · 2017-05-22T06:04:00Z

The help output describes the value only as 'a positive float number'.

what is the meaning of 0--1.0 and over 1.0?

asl · 2017-05-22T06:06:06Z

what is the meaning of 0--1.0 and over 1.0?

See http://cab.spbu.ru/files/release3.10.1/manual.html#sec3.5, which explains what the reported coverage is.

wangyugui · 2017-05-22T06:12:04Z

Is cov-cutoff used to filter the contig output? Is it not used to filter the fastq input by k-mer coverage?

tseemann · 2017-06-01T10:58:04Z

Here is what the manual section 3.5 says:

Contigs/scaffolds names in SPAdes output FASTA files have the following format: 

>NODE_3_length_237403_cov_243.207_ID_45

Here 3 is the number of the contig/scaffold, 237403 is the sequence length in nucleotides and 243.207 is the k-mer coverage for the last (largest) k value used. 

Note that the k-mer coverage is always lower than the read (per-base) coverage.

Is the only way to get k-mer coverage < 1 to have a contig which is shorter than k_max?

(which can happen in a section of a de bruijn graph when breaking into contigs)
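The header format quoted from the manual can be parsed mechanically, which makes it easy to list contigs whose reported k-mer coverage is below 1.0. This is a minimal Python sketch, assuming headers follow exactly the pattern shown in the manual excerpt. The function name `parse_spades_header` is hypothetical, and treating the `_ID_` suffix as optional is an assumption.

```python
import re

# Pattern for the header format quoted from the SPAdes 3.10.1 manual.
# Treating the trailing "_ID_<n>" part as optional is an assumption.
HEADER_RE = re.compile(
    r"^>?NODE_(\d+)_length_(\d+)_cov_(\d+(?:\.\d+)?)(?:_ID_(\d+))?$"
)


def parse_spades_header(header):
    """Return contig number, length and k-mer coverage from a FASTA header."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError(f"unrecognised SPAdes header: {header!r}")
    return {
        "node": int(match.group(1)),
        "length": int(match.group(2)),
        "kmer_cov": float(match.group(3)),
    }


info = parse_spades_header(">NODE_3_length_237403_cov_243.207_ID_45")
low_coverage = info["kmer_cov"] < 1.0  # flag contigs reported below 1.0
print(info, low_coverage)
```

Note that the `cov` field is the k-mer coverage for the largest k used, not the per-base read coverage.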

wangyugui · 2017-06-01T12:59:03Z

the --cov-cutoff of SPAdes is after assembly?
people may want a low k-mer coverage filter before assembly and to speed up the assembly.

kmer-mask is the tool that I wanted, but there are some problems:
a) meryl is slower than Jellyfish and uses too much memory (when using many threads).
b) Some bugs need to be fixed for big fastq/fasta files. (I have a dirty patch (uint32->uint64), but it does not seem to be active.)

snurk · 2017-06-01T21:57:03Z

Dear @tseemann
SPAdes iteratively increases the value of K and additionally tries to glue together potentially broken regions using paired read mapping and by searching for small overlaps.
Both of these procedures add k-mers that have coverage 0, since they are not present in the reads.
Also, if Ns are introduced into scaffolds, the total length of the scaffold may increase while its average k-mer coverage decreases.
I hope this explains the appearance of average k-mer coverage < 1.0 in the results.
On the other hand, as far as I know, SPAdes should not produce contigs shorter than k_max.
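To illustrate the dilution effect described above, here is a small Python sketch. It assumes the reported value is a plain mean over all k-mers of the sequence, which is a simplification for illustration and not a statement of how SPAdes computes it internally. The function name and the numbers are made up.

```python
def mean_kmer_coverage(segments):
    """Mean coverage over all k-mers, given (kmer_count, coverage) segments.

    Assumption: a simple unweighted mean over k-mers, used for illustration only.
    """
    total_kmers = sum(count for count, _ in segments)
    if total_kmers == 0:
        raise ValueError("no k-mers given")
    covered = sum(count * cov for count, cov in segments)
    return covered / total_kmers


# Hypothetical scaffold: 500 k-mers seen in the reads at coverage 1.5,
# plus 400 k-mers added by gap closing or Ns, which have coverage 0.
segments = [(500, 1.5), (400, 0.0)]
average = mean_kmer_coverage(segments)
print(f"average k-mer coverage: {average:.3f}")
```

With these made-up numbers the supported part of the scaffold has coverage above 1.0, yet the average over the whole sequence falls below 1.0.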

snurk · 2017-06-01T22:34:43Z

Dear @wangyugui

the --cov-cutoff of SPAdes is after assembly?

Yes and no. It happens after the assembly graph is constructed (and after most graph simplification procedures have finished). But the low-coverage edges are actually removed from the graph, which leads to the compression of the remaining unambiguous paths and keeps them from interfering with subsequent repeat resolution and scaffolding.
The value auto is compatible only with the uniform coverage model (no --meta or --mda flags).
In this case the threshold is set automatically from the probabilistic model trained on the k-mer frequency histogram, and the value is chosen independently for every iteration.
If the value is provided manually, it is interpreted as an "average nucleotide coverage" and will be multiplied by (RL - K)/RL to get a threshold on average k-mer coverage for the assembly iteration with k-mer size K.
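The conversion for a manually provided value can be sketched in a few lines of Python. This only restates the (RL - K)/RL factor given above. The function name, the read length of 100 and the k values are illustrative assumptions, not SPAdes defaults or SPAdes code.

```python
def kmer_coverage_threshold(cov_cutoff, read_length, k):
    """Scale a nucleotide-coverage cutoff by (RL - K) / RL for k-mer size K."""
    if cov_cutoff <= 0:
        raise ValueError("cov_cutoff must be a positive number")
    if not 0 < k < read_length:
        raise ValueError("k must be positive and smaller than the read length")
    return cov_cutoff * (read_length - k) / read_length


# Hypothetical run: --cov-cutoff 5.0 with 100 bp reads and three iterations.
thresholds = {k: kmer_coverage_threshold(5.0, 100, k) for k in (21, 33, 55)}
for k, threshold in thresholds.items():
    print(f"K={k}: k-mer coverage threshold ~ {threshold:.2f}")
```

Because the factor shrinks as K grows, the same manual cutoff corresponds to a lower k-mer coverage threshold in later iterations.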

Dear @tseemann, I hope this answers your initial question, and I would be glad to provide any clarifications.

people may want a low k-mer coverage filter before assembly and to speed up the assembly.

We are considering adding this option in future, but currently you would have to set up your own pre-processing pipeline.

tseemann · 2017-06-06T08:21:17Z

@snurk thank you very much for responding with such detail to our questions. I'll pass this page onto the bacterial genomics community. And thank you for continuing to develop spades.

asl · 2017-06-06T08:38:52Z

@tseemann We will try to explain the k-mer coverage model SPAdes uses, if time permits. Though it's already used inside kmergenie :)

jacarrico · 2017-06-06T13:58:28Z

Yes thanks a lot for the answers and for developing Spades! This is fundamental for the kind of work we have been doing that includes certification of pipelines using spades

snurk · 2017-06-06T14:54:31Z

@tseemann, @jacarrico you are welcome!

snurk self-assigned this May 19, 2017

snurk added the question label May 19, 2017

snurk assigned asl May 24, 2017

snurk closed this as completed Jun 9, 2017

asl added a commit that referenced this issue May 15, 2018

Merge pull request #18 from ablab/cleanup_for_3.12

69f6552

Some cleanup
