Suggestion - allow a reference to be set #17

Open
BiologicalScientist opened this issue Oct 1, 2019 · 5 comments
@BiologicalScientist

Hi - really liking the tool! One suggested improvement would be an option to control which sequence is set as the reference, rather than always defaulting to the largest sequence. If there is a reason this is a bad idea, please let me know. I had a look at the code, and from what I could tell the reference is determined by the snippet below. The $biggest_size variable appears to be used only to decide $ref_idx, so it should be possible to set $ref_idx directly when a reference sequence is provided. I wasn't sure whether any of the other tools require the reference to be the largest sequence, though.

  if ($size > $biggest_size) {
    $ref_idx = $N;
    $biggest_size = $size;
  }

The main reason for this suggestion is the case where one sequence has better annotations or phenotypic data, so understanding what has changed relative to that sequence is useful.
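
To make the suggestion concrete, here is a minimal sketch of how an optional override could sit in front of the existing largest-sequence rule. It is not taken from the ekidna source: the `pick_reference` helper, the array-of-hashes layout with `name` and `size` keys, and the file names are all assumptions made for illustration. Only the inner `if ($size > $biggest_size)` block mirrors the snippet quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from ekidna): choose the index of the reference.
#   $seqs : array ref of { name => ..., size => ... } hashes (assumed layout)
#   $want : optional name of the sequence the user wants as reference
sub pick_reference {
    my ($seqs, $want) = @_;

    # If the user named a reference, use it and fail loudly if it is missing.
    if (defined $want) {
        for my $N (0 .. $#$seqs) {
            return $N if $seqs->[$N]{name} eq $want;
        }
        die "Requested reference '$want' not found among inputs\n";
    }

    # Otherwise fall back to the current behaviour: largest sequence wins.
    my ($ref_idx, $biggest_size) = (0, -1);
    for my $N (0 .. $#$seqs) {
        my $size = $seqs->[$N]{size};
        if ($size > $biggest_size) {
            $ref_idx = $N;
            $biggest_size = $size;
        }
    }
    return $ref_idx;
}

# Made-up inputs for illustration only.
my @seqs = (
    { name => 'isolateA.fa',      size => 4_900_000 },
    { name => 'annotated_ref.fa', size => 4_700_000 },
    { name => 'isolateB.fa',      size => 5_100_000 },
);

print pick_reference(\@seqs), "\n";                      # default: largest sequence
print pick_reference(\@seqs, 'annotated_ref.fa'), "\n";  # explicit reference
```

Keeping the size-based rule as the fallback means existing runs would behave the same unless a reference is explicitly requested.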

@tseemann tseemann self-assigned this Oct 1, 2019
@tseemann tseemann added the enhancement New feature or request label Oct 1, 2019
@tseemann
Owner

tseemann commented Oct 1, 2019

Ultimately it should not matter what reference you use, because the SNPs it generates are "core" only. But that said, it could still be useful.

This project is very early stages, but I hope to work on it this month.

@BiologicalScientist
Author

Thanks for letting me know. I've currently made a bit of a quick workaround by just reversing the logic (making the reference be the smaller of the two sequences) and it seems to run fine.

The other advantage of running the analysis with different references is that uncov.bed gives you the regions present only in the reference (if I am understanding the output correctly), which can be useful for finding where phage etc. might be integrated.
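
As a rough illustration of that use, the sketch below filters BED intervals by length to surface larger reference-only regions. It assumes uncov.bed follows the standard BED layout (tab-separated chrom, 0-based start, exclusive end), which I have not confirmed against the actual output. The `long_uncovered` helper, the 10 kb cut-off, and the example intervals are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return [chrom, start, end, length] for every BED
# interval that is at least $min_len bp long.
sub long_uncovered {
    my ($lines, $min_len) = @_;
    my @hits;
    for my $line (@$lines) {
        chomp(my $l = $line);
        # Skip blank lines, comments, and BED track/browser header lines.
        next if $l =~ /^\s*(?:#|$)/ or $l =~ /^(?:track|browser)\b/;
        my ($chrom, $start, $end) = split /\t/, $l;
        my $len = $end - $start;    # BED is 0-based, half-open
        push @hits, [$chrom, $start, $end, $len] if $len >= $min_len;
    }
    return \@hits;
}

# Made-up intervals standing in for the contents of uncov.bed.
my @bed = (
    "contig1\t1000\t1200",
    "contig1\t250000\t291500",
    "contig2\t0\t35",
);

for my $r (@{ long_uncovered(\@bed, 10_000) }) {
    print join("\t", @$r), "\n";
}
```

In practice the lines would be read from the uncov.bed file rather than from an in-memory list, and the length threshold would need tuning to the elements of interest.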

Looking forward to the later stages when they come.

@tseemann
Owner

tseemann commented Oct 1, 2019

It's important to realise ekidna is not a variant calling pipeline. It is an experiment to see how fast a SNP/alignment based phylogeny can be; to fit between sketch based methods (mashtree) and read/SNP methods (snippy).

I recently came across and packaged this new tool: https://github.com/hsinnan75/MapCaller
It is very fast because it doesn't go via BAM files, and works from reads.

As for uncov.bed I can see how that would be useful with the correct reference. I will add the feature.

@chrisgulvik

chrisgulvik commented Oct 30, 2019

> Ultimately it should not matter what reference you use, because the SNPs it generates are "core" only.

One reason I'd like to specify a ref is faster processing of subclustering. A sample of interest specified as ref against a large panel enables the reuse of alignments with ekidna -k. Once ekidna shows the primary cluster the unknown isolate is in, a subset of the samples could be analyzed with the same alignments for a refined (larger) core genome.

@tseemann
Owner

Ok, that's a good use case. Thank you!
