Add a new genome to a Panaroo graph #153

Open
fabgenomics opened this issue Apr 8, 2022 · 3 comments

@fabgenomics

fabgenomics commented Apr 8, 2022

Hi,
I'm using Panaroo a lot for a research project. I generated a gene_presence_absence matrix from 1500 bacterial genomes.
Then I used this matrix to train a supervised machine learning model. I got decent accuracy, so I wanted to dig in.
My goal now is to add a new genome to the generated graph, extract that genome's data, and run predictions on it.
The problem with the panaroo-integrate command is that all the groups are renamed in an order that differs from the original matrix.
For instance, in the full matrix, group_5637 represents the gene hemB, but when I add a genome with panaroo-integrate the hemB gene is now in group_695.
As I'm only keeping some of the groups for my training and prediction process, I would like them to stay identical when adding a genome. I want group_5637 to still represent the same hemB gene.
Would it be difficult for you to implement this in the panaroo-integrate code?
Thanks for all your work,
Fabien

@gtonkinhill
Owner

Hi,

This is a good point and should hopefully not be too difficult to implement. I will try and get to it as soon as I can and add it to the next release.

In the meantime you may be able to use the geneIDs to keep track of the same clusters between runs. The panaroo-integrate command should maintain the existing clusters which can be identified by the geneIDs within them.
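
A minimal sketch of that workaround, not part of Panaroo itself: it reads two presence/absence tables and maps each cluster name from the integrated run back to the original cluster that shares the most gene IDs. It assumes the tables have three leading metadata columns (`Gene`, `Non-unique Gene name`, `Annotation`) followed by one column per genome, with multiple gene IDs in a cell separated by `;`. The function names, genome names, and gene IDs below are hypothetical, and the sketch does not resolve gene IDs that occur in more than one cluster.

```python
import csv
import io
from collections import Counter

META_COLUMNS = 3  # assumed layout: Gene, Non-unique Gene name, Annotation


def read_clusters(handle):
    """Return {cluster_name: set of gene IDs} from a presence/absence CSV."""
    reader = csv.reader(handle)
    next(reader)  # skip the header row
    clusters = {}
    for row in reader:
        if not row:
            continue
        ids = set()
        for cell in row[META_COLUMNS:]:
            # A cell may hold several gene IDs separated by ';' (assumption).
            ids.update(g for g in cell.split(";") if g)
        clusters[row[0]] = ids
    return clusters


def map_clusters(old, new):
    """Map each new cluster name to the old cluster sharing the most gene IDs."""
    owner = {}
    for name, ids in old.items():
        for gene_id in ids:
            owner[gene_id] = name  # if an ID is repeated, the last cluster wins
    mapping = {}
    for name, ids in new.items():
        votes = Counter(owner[g] for g in ids if g in owner)
        if votes:  # clusters made only of new-genome genes stay unmapped
            mapping[name] = votes.most_common(1)[0][0]
    return mapping


# Hypothetical toy tables mirroring the hemB example from this thread.
OLD_CSV = (
    "Gene,Non-unique Gene name,Annotation,genomeA,genomeB\n"
    "group_5637,hemB,,A_001,B_007\n"
    "group_12,,,A_002;A_003,\n"
)
NEW_CSV = (
    "Gene,Non-unique Gene name,Annotation,genomeA,genomeB,genomeC\n"
    "group_695,hemB,,A_001,B_007,C_010\n"
    "group_3,,,A_002;A_003,,\n"
    "group_9,,,,,C_011\n"
)

mapping = map_clusters(
    read_clusters(io.StringIO(OLD_CSV)),
    read_clusters(io.StringIO(NEW_CSV)),
)
```

With real data, the two `io.StringIO` objects would be replaced by open file handles for the original and the integrated matrix, and the resulting mapping could be used to rename the new columns before prediction.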

@gtonkinhill gtonkinhill added the enhancement New feature or request label Apr 13, 2022
@gtonkinhill gtonkinhill self-assigned this Apr 13, 2022
@fabgenomics
Author

Hi,
The problem with the geneIDs is that I can have 2 different groups with the same geneID. I used the default parameters for cluster thresholds plus --clean-mode strict --remove-invalid-genes --merge_paralogs.
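
One way to see how widespread this is, as a rough sketch: list every gene ID that appears in more than one group. The input is assumed to be a dictionary from group name to the set of gene IDs in that group, built from the presence/absence matrix; the function name and the toy IDs are hypothetical.

```python
from collections import defaultdict


def shared_gene_ids(clusters):
    """Return {gene_id: sorted group names} for IDs found in more than one group."""
    seen = defaultdict(set)
    for name, ids in clusters.items():
        for gene_id in ids:
            seen[gene_id].add(name)
    return {g: sorted(names) for g, names in seen.items() if len(names) > 1}


# Hypothetical toy input: gene ID "A_002" sits in two groups.
example = {
    "group_1": {"A_001", "A_002"},
    "group_2": {"A_002", "A_003"},
}
ambiguous = shared_gene_ids(example)
```

Groups that only contain IDs from this list cannot be tracked between runs by gene ID alone.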

@daisy238

Hello, is there any update on this please? I would also like to use panaroo on a reference genome using the gene cluster ids from a previous panaroo run.
