Suggestion - allow a reference to be set #17

BiologicalScientist · 2019-10-01T04:39:08Z

Hi - really liking the tool! One suggested improvement would be an option to control which sequence is used as the reference, rather than always defaulting to the largest sequence. If there is a reason this is a bad idea please let me know. I had a look at the code and, from what I could tell, the reference is chosen by the snippet below. The $biggest_size variable looked to be used only to decide $ref_idx, so it should be possible to set $ref_idx to a user-supplied reference sequence instead, when one is provided. I wasn't sure whether any of the other tools require the reference to be the largest sequence, though.

  if ($size > $biggest_size) {
    $ref_idx = $N;
    $biggest_size = $size;
  }
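A minimal Perl sketch of the suggested change, assuming the input sequence names and sizes have already been collected into parallel arrays. The helper `choose_ref_idx` and the `--ref` option it would serve are hypothetical names, not part of ekidna. When no reference is requested it keeps the largest-sequence rule from the snippet above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of ekidna): choose the reference index.
# If $want is defined (e.g. from a hypothetical --ref option) it must match
# one of the input names; otherwise fall back to picking the largest sequence.
sub choose_ref_idx {
  my ($names, $sizes, $want) = @_;
  if (defined $want) {
    for my $N (0 .. $#$names) {
      return $N if $names->[$N] eq $want;
    }
    die "Reference '$want' not found among inputs\n";
  }
  # Default behaviour: same logic as the existing snippet
  my ($ref_idx, $biggest_size) = (0, -1);
  for my $N (0 .. $#$sizes) {
    my $size = $sizes->[$N];
    if ($size > $biggest_size) {
      $ref_idx = $N;
      $biggest_size = $size;
    }
  }
  return $ref_idx;
}

my @names = qw(isolateA isolateB isolateC);
my @sizes = (4_800_000, 5_100_000, 4_950_000);
print choose_ref_idx(\@names, \@sizes), "\n";              # largest -> 1
print choose_ref_idx(\@names, \@sizes, 'isolateA'), "\n";  # user choice -> 0
```

Failing loudly on an unknown reference name, rather than silently falling back to the largest sequence, avoids results being produced against a reference the user did not intend.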

The main reason for this suggestion is for when there is a reference sequence that has better annotations/ phenotypic data so understanding what has changed in relation to that sequence is useful.


tseemann · 2019-10-01T05:27:24Z

Ultimately it should not matter which reference you use, because the SNPs it generates are "core" only. That said, it could still be useful.

This project is very early stages, but I hope to work on it this month.

BiologicalScientist · 2019-10-01T07:17:52Z

Thanks for letting me know. For now I've made a quick workaround by reversing the logic (making the reference the smaller of the two sequences), and it seems to run fine.

Another advantage of running the analysis with different references is that uncov.bed gives you the regions present only in the reference (if I am understanding the output correctly), which can be useful for finding where phages etc. might be integrated.

Looking forward to the later stages when they come.

tseemann · 2019-10-01T23:40:05Z

It's important to realise ekidna is not a variant calling pipeline. It is an experiment to see how fast a SNP/alignment-based phylogeny can be, intended to fit between sketch-based methods (mashtree) and read/SNP-based methods (snippy).

I recently came across and packaged this new tool: https://github.com/hsinnan75/MapCaller
It is very fast because it doesn't go via BAM files, and works from reads.

As for uncov.bed I can see how that would be useful with the correct reference. I will add the feature.

chrisgulvik · 2019-10-30T11:56:53Z

> Ultimately it should not matter what reference you use, because the SNPs it generates are "core" only.

One reason I'd like to specify a ref is faster subclustering. Using a sample of interest as the ref against a large panel lets the alignments be reused with ekidna -k. Once ekidna shows which primary cluster the unknown isolate falls in, a subset of the samples could be reanalyzed with the same alignments to get a refined (larger) core genome.

tseemann · 2019-10-30T22:28:18Z

Ok, that's a good use case. Thank you!

tseemann self-assigned this Oct 1, 2019

tseemann added the enhancement New feature or request label Oct 1, 2019
