iterative search for missing genes #58

Open
milnus opened this issue May 8, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@milnus
Contributor

milnus commented May 8, 2020

The 'identify missing gene' feature of Panaroo is great! What are your thoughts on searching for missing genes iteratively? For example, suppose a region is missing annotations for some reason, but a similar region is annotated in a separate genome used in Panaroo. At the moment only the genes at the ends of the region are 'refound', but would it be possible to refind them all by searching iteratively until no neighbour is found or the entire region is annotated?
If this is too computationally heavy for normal use, could it be an optional argument that enables the feature?

@gtonkinhill
Owner

I am not sure I follow completely. At the moment we search for missing genes in the neighbourhood of genes that match between genomes. The size of this neighbourhood can be controlled with the --search_radius flag.
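To make the idea of a neighbourhood concrete, here is a minimal sketch of how a search window around a matched gene could be derived from a radius. It assumes the radius is measured in nucleotides on either side of the matched gene and is clipped to the contig; the function name `search_window` and its parameters are hypothetical and are not taken from Panaroo's code.

```python
def search_window(anchor_start, anchor_end, contig_length, search_radius):
    """Return the (start, end) interval to scan around a matched gene.

    Hypothetical helper: coordinates are 0-based nucleotide positions and
    the window is clipped so it never runs off either end of the contig.
    """
    start = max(0, anchor_start - search_radius)
    end = min(contig_length, anchor_end + search_radius)
    return start, end


# A matched gene at 12000-13000 on a 16000 bp contig, with an illustrative radius.
window = search_window(12000, 13000, 16000, search_radius=5000)
print(window)
```

A larger radius widens the window, but the search is still anchored on a gene that is already matched in both genomes.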

Are you aiming to search for genes where there are no matching neighbours? This gets a little trickier, as you would not have contextual support for the calls, so it would be getting back to the initial annotation problem. One strategy might be to take the centroids of the genes already found by Panaroo and provide them to Prokka so that it could match its annotation calls against them.

@milnus
Contributor Author

milnus commented May 9, 2020

I am not trying to search where there are no matching neighbours.
To rephrase: would Panaroo be able to identify the three missing genes below just by increasing the search radius, or does Panaroo search for a specific gene and is thus only able to refind that gene? I am curious whether it would be possible in any way to refind all three genes.

---'gene-1'------'missing-gene'------'missing-gene'------'missing-gene'------'gene-2'---

@gtonkinhill
Owner

Ah, I understand a bit better now. Yes, at the moment Panaroo will search for a gene in genome B if a neighbouring gene appears in both genomes A and B. So in the case you have described, the two outer missing genes would be re-found but the central one would not. Your suggestion of searching iteratively is a good one, and it is the approach we already take in the gene family collapsing stage. We could add this to the re-finding step, but we would need to ensure that later iterations do not search for the same genes twice. I'll keep this open as an enhancement and try to add it as an option in the next release.
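As a rough illustration of the difference between a single pass and the iterative version, here is a toy sketch on the five-gene example above. It is not Panaroo's implementation: the gene order is a plain list, "neighbour" means the adjacent entry in that list, and the actual sequence search is replaced by membership in a hypothetical `detectable` set. The function names and the `searched` set, which stops a gene from being searched twice, are likewise assumptions made for the example.

```python
def refind_pass(order, present, searched, detectable):
    """Run one pass: search for each absent gene that has a present neighbour.

    `present` is read but not modified, so genes found in this pass cannot
    act as anchors until the next pass.
    """
    found = set()
    for i, gene in enumerate(order):
        if gene in present or gene in searched:
            continue
        neighbours = order[max(0, i - 1):i] + order[i + 1:i + 2]
        if any(n in present for n in neighbours):
            searched.add(gene)        # never search for this gene again
            if gene in detectable:    # stand-in for the real sequence search
                found.add(gene)
    return found


def refind_iteratively(order, annotated, detectable, max_rounds=None):
    """Repeat passes until nothing new is found; return the genes found per round."""
    present = set(annotated)
    searched = set()
    rounds = []
    while max_rounds is None or len(rounds) < max_rounds:
        found = refind_pass(order, present, searched, detectable)
        if not found:
            break
        present |= found
        rounds.append(sorted(found))
    return rounds


order = ["gene-1", "missing-a", "missing-b", "missing-c", "gene-2"]
annotated = {"gene-1", "gene-2"}
detectable = {"missing-a", "missing-b", "missing-c"}

single = refind_iteratively(order, annotated, detectable, max_rounds=1)
full = refind_iteratively(order, annotated, detectable)
print(single)  # [['missing-a', 'missing-c']]
print(full)    # [['missing-a', 'missing-c'], ['missing-b']]
```

With `max_rounds=1` only the two outer genes are recovered, which mirrors the current behaviour described above; letting the loop run to convergence picks up the central gene in the second round.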

gtonkinhill added the enhancement label May 9, 2020