FAQ
Yes and no. In theory, every type of pan-genome is compatible with Corekaburra, since it is in some ways just a glorified GFF and pan-genome parser. "All" you need to do is convert whatever format you have into the Roary gene_presence_absence.csv format. In this format, the essential columns are:
- Column 1: the name of the pan-genome cluster.
- Column 4: the number of isolates represented in the cluster.
- Column 5: the number of sequences in the cluster.
- Columns ≥15: the presence and absence of genes across clusters, with one isolate per column.
One additional tip to make Corekaburra accept your pan-genome is to have the ID and locus_tag be the same for each CDS feature.
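As a rough illustration of the conversion described above, the sketch below writes a Roary-like table from clusters held in a Python dictionary. The function name write_roary_like_csv and the input layout (cluster name mapped to isolate mapped to a list of gene IDs) are hypothetical and only serve the example. The 14 leading column names follow Roary's usual header, and joining several genes from one isolate with a tab is an assumption that should be checked against the files your own tools produce.

```python
import csv
import io

# The 14 leading columns of a Roary gene_presence_absence.csv header.
# Isolate columns start at column 15.
ROARY_META_COLUMNS = [
    "Gene", "Non-unique Gene name", "Annotation", "No. isolates",
    "No. sequences", "Avg sequences per isolate", "Genome Fragment",
    "Order within Fragment", "Accessory Fragment",
    "Accessory Order with Fragment", "QC", "Min group size nuc",
    "Max group size nuc", "Avg group size nuc",
]


def write_roary_like_csv(clusters, isolates, handle):
    """Write clusters as a Roary-like presence/absence table (hypothetical helper).

    clusters: dict of cluster name -> dict of isolate name -> list of gene IDs
    isolates: ordered list of isolate names, one output column each
    handle:   open text file handle (or io.StringIO) to write to
    """
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(ROARY_META_COLUMNS + list(isolates))
    for name, members in clusters.items():
        row = [""] * len(ROARY_META_COLUMNS)
        row[0] = name  # column 1: cluster name
        # Column 4: number of isolates with at least one gene in the cluster.
        row[3] = sum(1 for iso in isolates if members.get(iso))
        # Column 5: total number of sequences in the cluster.
        row[4] = sum(len(members.get(iso, [])) for iso in isolates)
        # Columns >= 15: gene IDs per isolate, empty when the gene is absent.
        # Assumption: multiple genes from one isolate are tab-separated.
        cells = ["\t".join(members.get(iso, [])) for iso in isolates]
        writer.writerow(row + cells)


# Made-up example data: two clusters across two isolates.
example_clusters = {
    "groupA": {"iso1": ["iso1_00001"], "iso2": ["iso2_00007", "iso2_00042"]},
    "groupB": {"iso1": ["iso1_00002"]},
}
buffer = io.StringIO()
write_roary_like_csv(example_clusters, ["iso1", "iso2"], buffer)
```

The remaining metadata columns are left empty here, on the basis that only columns 1, 4, 5, and ≥15 are listed as essential above.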
Some core genes may not appear in the core_gene_segments.csv file because they are connected to many other core genes that are themselves connected to many other genes. Such a gene cannot be confidently called 'stable' in its connections to other genes.
GFF files are not an easy animal to handle when it comes to indicating 'complete' and/or 'closed' genomes. The convention seems to be to add a line at the beginning of a specific segment's (contig/chromosome/plasmid/prophage/etc.) annotations, with the feature column set to region, the coordinates spanning the length of the contig, and an Is_circular=true flag in the attributes column.
Why not handle complete genomes in Corekaburra this way?
Good question, and it is something we may add in the future. As a starting point, a single text file with semi-flexible naming is easy to produce, is stable (it can be referred to later on), and makes it easy to add or remove genomes for additional runs.
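For readers who want to see which of their GFF files already carry the region convention described above, here is a minimal sketch that lists the contigs flagged as circular. It is not part of Corekaburra. The function name circular_regions is hypothetical, and the check assumes the flag is written as Is_circular=true in the attributes column, as in the GFF3 specification (the key and value are compared case-insensitively to tolerate variants).

```python
def circular_regions(gff_lines):
    """Return IDs of sequences whose 'region' feature is flagged circular.

    gff_lines: any iterable of GFF3 lines, e.g. an open file handle.
    """
    circular = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # annotations end where the embedded sequences begin
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "region":
            continue
        # Attributes are 'key=value' pairs separated by semicolons.
        attributes = {}
        for pair in fields[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attributes[key.strip().lower()] = value.strip().lower()
        if attributes.get("is_circular") == "true":
            circular.append(fields[0])
    return circular


# Made-up example: one circular chromosome and one linear contig.
example_gff = [
    "##gff-version 3\n",
    "chr1\t.\tregion\t1\t5000\t.\t+\t.\tID=chr1;Is_circular=true\n",
    "chr1\t.\tCDS\t10\t400\t.\t+\t0\tID=gene_1;locus_tag=gene_1\n",
    "contig2\t.\tregion\t1\t800\t.\t+\t.\tID=contig2\n",
]
closed_contigs = circular_regions(example_gff)
```

The resulting names could then be collected into the text file of complete genomes mentioned above.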
Yes, Corekaburra is designed so that you are not required to have a complete genome. If you use only draft genomes, let us know if you get a closed 'pseudo' structure of your genome! It would be awesome to see!
Yes, we have tested Corekaburra on Burkholderia cenocepacia, a three-chromosome species that often carries an additional plasmid. This worked well and can give information about interactions across chromosomes.