milnus edited this page Sep 19, 2022 · 4 revisions

Can I use X-pan_genome_tool with Corekaburra?

Yes and no. In theory, any type of pan-genome is compatible with Corekaburra, since Corekaburra is in some ways just a glorified GFF and pan-genome parser. "All" you need to do is convert whatever format you have into the Roary gene_presence_absence.csv format. In this format the essential columns are:

  • Column 1: the name of the pan-genome cluster.
  • Column 4: the number of isolates represented in the cluster.
  • Column 5: the number of sequences in the cluster.
  • Columns 15 and onward: gene presence and absence for each cluster, with one isolate per column.
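As a rough illustration of the conversion described above, the following Python sketch builds a Roary-style table from a simple dictionary of clusters. It is only a sketch under assumptions: the 14 fixed header names are written as recalled from Roary output and should be checked against a real gene_presence_absence.csv, the function names `to_roary_rows` and `write_roary_csv` are hypothetical, and joining several genes from one isolate with a tab inside a cell is an assumption about how paralogs are represented. Only columns 1, 4, 5 and the isolate columns are filled in.

```python
import csv
import io

# Fixed columns 1-14, as recalled from Roary output (assumption: verify
# against a real gene_presence_absence.csv before relying on these names).
ROARY_FIXED_COLUMNS = [
    "Gene", "Non-unique Gene name", "Annotation", "No. isolates",
    "No. sequences", "Avg sequences per isolate", "Genome Fragment",
    "Order within Fragment", "Accessory Fragment",
    "Accessory Order with Fragment", "QC", "Min group size nuc",
    "Max group size nuc", "Avg group size nuc",
]


def to_roary_rows(clusters, isolates):
    """Build header + data rows.

    clusters: {cluster_name: {isolate_name: [gene_id, ...]}}
    isolates: ordered list of isolate names (columns 15 and onward).
    """
    rows = [ROARY_FIXED_COLUMNS + list(isolates)]
    for name, members in clusters.items():
        row = [""] * len(ROARY_FIXED_COLUMNS)
        row[0] = name  # column 1: cluster name
        # column 4: number of isolates with at least one gene in the cluster
        row[3] = str(sum(1 for iso in isolates if members.get(iso)))
        # column 5: total number of sequences in the cluster
        row[4] = str(sum(len(members.get(iso, [])) for iso in isolates))
        # columns 15+: gene IDs per isolate, empty cell when absent
        row += ["\t".join(members.get(iso, [])) for iso in isolates]
        rows.append(row)
    return rows


def write_roary_csv(rows, handle):
    """Write rows as a fully quoted CSV to an open text handle."""
    csv.writer(handle, quoting=csv.QUOTE_ALL).writerows(rows)


# Hypothetical example data: two clusters across two isolates.
example = {
    "group_1": {"isoA": ["isoA_00001"], "isoB": ["isoB_00007", "isoB_00042"]},
    "group_2": {"isoA": ["isoA_00002"]},
}
rows = to_roary_rows(example, ["isoA", "isoB"])
buffer = io.StringIO()
write_roary_csv(rows, buffer)
```

To write to disk instead of a buffer, pass a file opened with `open("gene_presence_absence.csv", "w", newline="")` to `write_roary_csv`.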

One additional tip to make Corekaburra accept your pan-genome: give each CDS feature the same value for ID and locus_tag.
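A quick way to check the ID and locus_tag tip before a run is to scan the CDS features yourself. The sketch below is not part of Corekaburra; `parse_attributes` and `mismatched_cds` are hypothetical helper names, and it assumes standard tab-separated GFF3 with `key=value` pairs separated by `;` in the ninth column.

```python
def parse_attributes(column9):
    """Turn a GFF3 attributes string into a dict of key -> value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def mismatched_cds(gff_lines):
    """Return (ID, locus_tag) pairs for CDS features where the two differ."""
    mismatches = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more annotations
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_attributes(cols[8])
        if attrs.get("ID") != attrs.get("locus_tag"):
            mismatches.append((attrs.get("ID"), attrs.get("locus_tag")))
    return mismatches


# Hypothetical example lines: the second CDS has differing ID and locus_tag.
example_gff = [
    "##gff-version 3",
    "contig_1\tprokka\tCDS\t10\t400\t.\t+\t0\tID=isoA_00001;locus_tag=isoA_00001",
    "contig_1\tprokka\tCDS\t500\t900\t.\t-\t0\tID=cds-2;locus_tag=isoA_00002",
]
problems = mismatched_cds(example_gff)
```

For a real file, pass an open file handle, for example `mismatched_cds(open("isolate.gff"))`; an empty list means every CDS already follows the tip.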

Core genes not appearing in core_gene_segments.csv

Some core genes may not appear in the core_gene_segments.csv file because they are connected to many other core genes, which are themselves connected to many other genes. Such a gene cannot be confidently called 'stable' in its connections to other genes.

Handling of complete genomes

GFF files are not an easy animal to handle when it comes to indicating 'complete' and/or 'closed' genomes. The convention seems to be to add a line at the beginning of a given segment's (contig/chromosome/plasmid/prophage/etc.) annotations, with region in the feature column, coordinates spanning the length of the contig, and an Is_circular=true flag in the attributes column.
Why not handle complete genomes in Corekaburra this way?
Good question, and it is something we may add in the future. As a starting point, a single text file, with semi-flexible naming allowed, is easy to produce, stable (it can be referred to later on), and can easily have genomes added or removed for additional runs.
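For reference, the region-line convention described above can be detected with a few lines of Python. This is only a sketch of how such lines could be read, not something Corekaburra does; `circular_sequences` is a hypothetical name, the example lines are made up, and the attribute key is matched case-insensitively because annotation tools differ in how they capitalise it.

```python
def circular_sequences(gff_lines):
    """Return the set of sequence IDs whose region line is flagged circular."""
    circular = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section follows; no more annotations
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "region":
            continue
        # Lower-case the keys so Is_circular and is_circular both match.
        attrs = {
            key.lower(): value
            for key, value in (
                field.split("=", 1) for field in cols[8].split(";") if "=" in field
            )
        }
        if attrs.get("is_circular", "").lower() == "true":
            circular.add(cols[0])
    return circular


# Hypothetical example: one circular contig and one without the flag.
example_regions = [
    "##gff-version 3",
    "contig_1\tannotation\tregion\t1\t5000\t.\t+\t.\tID=contig_1;Is_circular=true",
    "contig_2\tannotation\tregion\t1\t1200\t.\t+\t.\tID=contig_2",
]
closed = circular_sequences(example_regions)
```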

Can Corekaburra be used without a complete genome?

Yes, Corekaburra is designed so that you are not required to have a complete genome. If you use only draft genomes, let us know if you get a closed 'pseudo' structure of your genome! It would be awesome to see!

Is it possible to use multi-chromosome species?

Yes, we have tested Corekaburra on Burkholderia cenocepacia, a three-chromosome species, often with an additional plasmid. This worked well and can give information about interactions across chromosomes.