Hi,
I use Panaroo a lot for a research project. I generated a gene_presence_absence matrix from 1500 bacterial genomes.
Then I used this matrix to train a supervised machine learning model. I got decent accuracy, so I wanted to dig in.
I would now like to use the generated graph to add a new genome, extract the data for that genome, and run predictions on it.
The problem with the panaroo-integrate command is that all the groups are renamed, in an order that differs from the original matrix.
For instance, in the full matrix group_5637 represents the gene hemB, but when I add a genome with panaroo-integrate the hemB gene ends up in group_695.
As I only keep some of the groups for training and prediction, I would like them to stay identical when a genome is added: group_5637 should still represent the same hemB gene.
Would it be difficult to implement this in the panaroo-integrate code?
Thanks for all your work,
Fabien
This is a good point and should hopefully not be too difficult to implement. I will try and get to it as soon as I can and add it to the next release.
In the meantime you may be able to use the geneIDs to keep track of the same clusters between runs. The panaroo-integrate command should maintain the existing clusters which can be identified by the geneIDs within them.
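As a rough illustration of that geneID-based workaround, here is a minimal Python sketch that maps the group names of a new run back to those of an earlier run by counting shared geneIDs. It is not part of Panaroo. It assumes that both `gene_presence_absence.csv` files start with the three metadata columns `Gene`, `Non-unique Gene name` and `Annotation`, followed by one column per genome, and that multiple geneIDs in a cell are separated by `;`. The function names and the geneIDs in the demo are made up for the example.

```python
import csv
import io
from collections import Counter, defaultdict

# Assumed metadata columns; every other column is treated as a genome.
META_COLS = ("Gene", "Non-unique Gene name", "Annotation")


def read_clusters(handle, meta_cols=META_COLS, sep=";"):
    """Return {group_name: set of geneIDs} from a presence/absence CSV."""
    clusters = {}
    for row in csv.DictReader(handle):
        ids = set()
        for col, cell in row.items():
            # Skip metadata columns, empty cells and malformed rows.
            if col in meta_cols or not isinstance(cell, str) or not cell:
                continue
            ids.update(g for g in cell.split(sep) if g)
        clusters[row["Gene"]] = ids
    return clusters


def map_groups(old, new):
    """Map each new group name to the old group sharing the most geneIDs.

    New groups that share no geneID with any old group are left out.
    """
    owners = defaultdict(list)  # geneID -> old groups containing it
    for name, ids in old.items():
        for gene_id in ids:
            owners[gene_id].append(name)
    mapping = {}
    for new_name, ids in new.items():
        votes = Counter(o for g in ids for o in owners.get(g, ()))
        if votes:
            mapping[new_name] = votes.most_common(1)[0][0]
    return mapping


# Demo with hypothetical geneIDs; in practice, open the two CSV files.
old_csv = (
    "Gene,Non-unique Gene name,Annotation,g1,g2\n"
    "group_5637,,hemB,g1_00012,g2_00040\n"
)
new_csv = (
    "Gene,Non-unique Gene name,Annotation,g1,g2,g3\n"
    "group_695,,hemB,g1_00012,g2_00040,g3_00007\n"
)
old = read_clusters(io.StringIO(old_csv))
new = read_clusters(io.StringIO(new_csv))
print(map_groups(old, new))  # {'group_695': 'group_5637'}
```

With the resulting mapping, the columns of the new matrix can be renamed to the original group names before being passed to a trained model. If a geneID turns up in more than one old group, each of those groups receives a vote and the one with the largest overlap wins, so ambiguous cases are worth checking by hand.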
Hi,
The problem with the geneIDs is that two different groups can contain the same geneID. I used the default cluster thresholds plus --clean-mode strict --remove-invalid-genes --merge_paralogs.
Hello, is there any update on this, please? I would also like to run Panaroo on a reference genome while reusing the gene cluster IDs from a previous Panaroo run.