scripts/convert_refseq_to_prokka_gff.py produces only 1 chromosome output #147

martinastoycheva · 2022-03-17T17:29:35Z

Hello,

I have a RefSeq GFF that contains two chromosomes, which I wanted to use in the Panaroo pipeline. I tried using the script to convert it to a Prokka GFF, but I get only the second chromosome in the output GFF. Is this intended behaviour?

Cheers,
Martina

gtonkinhill · 2022-03-24T05:50:28Z

Hi Martina,

Sorry for the slow reply. This is not intended behaviour. Without looking at the GFF file it is a bit challenging to work out what might be going wrong. Is it possible you could send me a small example that reproduces the problem?

martinastoycheva · 2022-03-28T10:24:50Z

Hello,

Thanks for your reply! I have solved the issue by providing the GFF and FASTA separately.

fwhelan · 2024-07-02T15:36:34Z

Hi gtonkinhill,

I have had something similar happen to me with a new dataset. Of the 441 input genomes, 146 are missing >=1 chromosome after using convert_refseq_to_prokka_gff.py. I can't give you a reproducible example at the moment, but there doesn't seem to be any consistent pattern in which chromosome is dropped, whether by order (e.g. the last chromosome being omitted), length, or content. No error message is output when this occurs.

Thank you,
Fiona
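One way to pin down which sequences go missing is to compare the sequence IDs that carry features in the original RefSeq GFF against those in the converted file. The sketch below is not part of Panaroo; the function names and the accession-style IDs in the example are hypothetical, and it assumes standard tab-separated GFF3 with an optional `##FASTA` section.

```python
def gff_seqids(lines):
    """Return the set of sequence IDs that have at least one feature row."""
    seqids = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature rows
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        seqids.add(line.split("\t", 1)[0])
    return seqids


def missing_seqids(original_lines, converted_lines):
    """Sequence IDs with features in the original but none in the converted GFF."""
    return sorted(gff_seqids(original_lines) - gff_seqids(converted_lines))


# Hypothetical two-chromosome input and a converted file that kept only one.
original = [
    "##gff-version 3\n",
    "NC_000001.1\tRefSeq\tCDS\t1\t99\t.\t+\t0\tID=cds-1\n",
    "NC_000002.1\tRefSeq\tCDS\t1\t99\t.\t+\t0\tID=cds-2\n",
]
converted = [
    "##gff-version 3\n",
    "NC_000002.1\tRefSeq\tCDS\t1\t99\t.\t+\t0\tID=cds-2\n",
    "##FASTA\n",
    ">NC_000002.1\n",
    "ATG\n",
]
print(missing_seqids(original, converted))
```

With real files, passing two open file handles instead of the lists should work the same way, since the functions only iterate over lines.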

gtonkinhill · 2024-07-03T05:02:24Z

Hi Fiona,

The convert_refseq_to_prokka_gff script can be pretty strict in throwing out annotations that don't fit within the expected output of Prokka. My guess is that this might be causing the issue. Unfortunately, it doesn't currently print which genes it's ignoring, but essentially it will ignore:

- Genes that have a premature stop codon
- Genes whose length is less than 34 nt or not a multiple of 3
- Anything that is not classed as a 'CDS'
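The three rules above can be sketched as a small check that reports why a feature would be skipped. This is an illustration only, not the script's actual code: the function name is hypothetical, the sequence is assumed to be already extracted in coding orientation, and the stop codons are assumed to be the standard TAA/TAG/TGA.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons


def rejection_reason(feature_type, cds_seq):
    """Return why a feature would be skipped under the rules above, or None if it passes."""
    if feature_type != "CDS":
        return "not a CDS"
    seq = cds_seq.upper()
    if len(seq) < 34:
        return "shorter than 34 nt"
    if len(seq) % 3 != 0:
        return "length not a multiple of 3"
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # A stop codon anywhere before the final codon counts as premature.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        return "premature stop codon"
    return None


# Hypothetical features: (type, nucleotide sequence in coding orientation).
examples = [
    ("tRNA", "ATG" * 20),
    ("CDS", "ATG" * 5),
    ("CDS", "ATG" * 12 + "A"),
    ("CDS", "ATG" * 5 + "TAA" + "ATG" * 6 + "TGA"),
    ("CDS", "ATG" * 12 + "TAA"),
]
for feature_type, seq in examples:
    print(feature_type, len(seq), rejection_reason(feature_type, seq))
```

If every CDS on a chromosome fails one of these checks, that chromosome would end up with no features in the output, which may be one way a whole sequence appears to vanish.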

As an alternative, you should be able to run Panaroo directly with the --remove-invalid-genes option, which is what I would recommend. It should then print which genes are being ignored.

If this doesn't fix things, let me know and I'll see if I can work out what's going on.

Cheers,

Gerry

martinastoycheva closed this as completed Mar 28, 2022

gtonkinhill reopened this Jul 3, 2024
