Is there a way to deactivate the overlap detection so bakta does not filter my input proteins? #295

Daniel-Tichy · 2024-06-19T23:30:50Z

This issue concerns the user-provided proteins feature.
I am trying to use bakta to annotate a file of phage proteins that were predicted with Phanotate. I expected an annotation for every protein in my input file, but it seems that bakta filters out overlapping proteins.

I would like to deactivate the overlap detection so that bakta does not filter out the previously predicted proteins that I am using as input.

Example: this is my input gbk for bakta.

 CDS             3417..3809
                 /ID="WARQSXNU_CDS_9"
                 /phrog="786"
                 /top_hit="p65745 VI_07030"
                 /locus_tag="WARQSXNU_9"
                 /function="unknown function"
                 /product="hypothetical protein"
                 /source="PHANOTATE"
                 /score="-22.41946661155013"
                 /phase="0"
                 /translation="MAAPTPEELVSQMASRGMTITTTDASGILCLVASISECLELNYPN
                 DECRQNAIMLWASILISANTAGRYVTSQSAPSGASQSFAYGSKPWVALYNQMKLLDSAG
                 CTGDLVEDPDGSGKPWFAVVRGSKCK"
 CDS             3806..4147
                 /ID="WARQSXNU_CDS_10"
                 /phrog="797"
                 /top_hit="p299466 VI_10274"
                 /locus_tag="WARQSXNU_10"
                 /function="unknown function"
                 /product="hypothetical protein"
                 /source="PHANOTATE"
                 /score="-111.69024253224252"
                 /phase="0"
                 /translation="MTSLARFSYTQPCTIWHKSGTDKYGKPTFDAPVSIMCDYGFNDDV
                 STDAKGNEIVQKNTFWTEYTGAKVGDYIMIGTMIEADPLVAGANQILNVINYGNTFQRS
                 EPPDFALVT"
 CDS             4147..4545
                 /ID="WARQSXNU_CDS_11"
                 /phrog="No_PHROG"
                 /top_hit="No_PHROG"
                 /locus_tag="WARQSXNU_11"
                 /function="unknown function"
                 /product="hypothetical protein"
                 /source="PHANOTATE"
                 /score="-52.42604964159676"
                 /phase="0"
                 /translation="MPAKLRGVRKAVERTSQIVDEIIATKAVRALKSATYIIRTESATL
                 TPIDTSTLINSQFDTVEVSGTRITGKVGYSAKYALYVHNASGKLAGKPRSNGNGTYWSP
                 GGEPQFLTKAAQRTKDLVDGVIKKEMKL"
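
To make the overlap concrete, here is a minimal sketch that computes how many bases each pair of the three CDS features above shares. The coordinates are copied from the excerpt; the helper name `overlap_bp` is only illustrative and is not part of bakta or Phanotate.

```python
from itertools import combinations

# Coordinates copied from the three CDS features above (1-based, inclusive).
cds = {
    "WARQSXNU_9": (3417, 3809),
    "WARQSXNU_10": (3806, 4147),
    "WARQSXNU_11": (4147, 4545),
}


def overlap_bp(a, b):
    """Number of bases shared by two inclusive intervals (0 if they are disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Keep only the pairs that actually share at least one base.
overlaps = {
    (x, y): overlap_bp(cds[x], cds[y])
    for x, y in combinations(cds, 2)
    if overlap_bp(cds[x], cds[y]) > 0
}

for (x, y), n in overlaps.items():
    print(f"{x} / {y}: {n} bp")
```

With these coordinates, WARQSXNU_10 shares 4 bp with WARQSXNU_9 and 1 bp with WARQSXNU_11, so it is the only feature that overlaps both of its neighbours.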

I parse it and pass it to bakta in the following format.

>WARQSXNU_9 ~~~hypothetical protein~~~
MAAPTPEELVSQMASRGMTITTTDASGILCLVASISECLELNYPNDECRQNAIMLWASIL
ISANTAGRYVTSQSAPSGASQSFAYGSKPWVALYNQMKLLDSAGCTGDLVEDPDGSGKPW
FAVVRGSKCK
>WARQSXNU_10 ~~~hypothetical protein~~~
MTSLARFSYTQPCTIWHKSGTDKYGKPTFDAPVSIMCDYGFNDDVSTDAKGNEIVQKNTF
WTEYTGAKVGDYIMIGTMIEADPLVAGANQILNVINYGNTFQRSEPPDFALVT
>WARQSXNU_11 ~~~hypothetical protein~~~
MPAKLRGVRKAVERTSQIVDEIIATKAVRALKSATYIIRTESATLTPIDTSTLINSQFDT
VEVSGTRITGKVGYSAKYALYVHNASGKLAGKPRSNGNGTYWSPGGEPQFLTKAAQRTKD
LVDGVIKKEMKL
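
A minimal sketch of how such a file could be written, assuming the short header form `id gene~~~product~~~dbxrefs` (with gene and dbxrefs left empty, as above) and a leading `>` as FASTA requires. The function name `to_bakta_fasta`, the record tuple layout, and the demo sequence are hypothetical; extracting the records from the GenBank file is not shown.

```python
def to_bakta_fasta(records, width=60):
    """Format (protein_id, gene, product, sequence) tuples as a user-proteins FASTA string.

    Assumed header layout: >id gene~~~product~~~dbxrefs (dbxrefs left empty here).
    """
    lines = []
    for protein_id, gene, product, seq in records:
        lines.append(f">{protein_id} {gene}~~~{product}~~~")
        # Wrap the sequence at a fixed column width, as in the excerpt above.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy record: the ID and the 70-residue sequence are placeholders, not real data.
demo = [("demo_1", "", "hypothetical protein", "M" + "A" * 69)]
print(to_bakta_fasta(demo), end="")
```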

But this is the output I get. The protein for WARQSXNU_10 is missing, probably because of the overlap in the genome.

 gene            complement(40007..40405)
                 /locus_tag="MKOBIG_00315"
 CDS             complement(40007..40405)
                 /db_xref="SO:0001217"
                 /db_xref="UniRef:UniRef50_W7P0V4"
                 /db_xref="UniRef:UniRef90_A0A1B1W263"
                 /db_xref="UserProtein:WARQSXNU_11"
                 /product="hypothetical protein"
                 /locus_tag="MKOBIG_00315"
                 /protein_id="gnl|Bakta|MKOBIG_00315"
                 /translation="MPAKLRGVRKAVERTSQIVDEIIATKAVRALKSATYIIRTESATL
                 TPIDTSTLINSQFDTVEVSGTRITGKVGYSAKYALYVHNASGKLAGKPRSNGNGTYWSP
                 GGEPQFLTKAAQRTKDLVDGVIKKEMKL"
                 /codon_start=1
                 /transl_table=11
                 /inference="ab initio prediction:Prodigal:2.6"
                 /inference="similar to AA
                 sequence:UniRef:UniRef90_A0A1B1W263"
 gene            complement(40743..41135)
                 /locus_tag="MKOBIG_00320"
 CDS             complement(40743..41135)
                 /db_xref="SO:0001217"
                 /db_xref="UniRef:UniRef50_A0A173GBZ4"
                 /db_xref="UniRef:UniRef90_A0A1B1W265"
                 /db_xref="UserProtein:WARQSXNU_9"
                 /product="hypothetical protein"
                 /locus_tag="MKOBIG_00320"
                 /protein_id="gnl|Bakta|MKOBIG_00320"
                 /translation="MAAPTPEELVSQMASRGMTITTTDASGILCLVASISECLELNYPN
                 DECRQNAIMLWASILISANTAGRYVTSQSAPSGASQSFAYGSKPWVALYNQMKLLDSAG
                 CTGDLVEDPDGSGKPWFAVVRGSKCK"
                 /codon_start=1
                 /transl_table=11
                 /inference="ab initio prediction:Prodigal:2.6"
                 /inference="similar to AA
                 sequence:UniRef:UniRef90_A0A1B1W265"
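
One way to list which input proteins were dropped is to collect the `UserProtein` cross-references from the output and compare them with the input IDs. The following is a rough sketch: the function names are made up for illustration, and it assumes each `/db_xref="UserProtein:..."` qualifier sits on a single line, as in the excerpt above.

```python
import re


def annotated_user_proteins(genbank_text):
    """Return the set of IDs found in /db_xref="UserProtein:<id>" qualifiers."""
    return set(re.findall(r'/db_xref="UserProtein:([^"]+)"', genbank_text))


def missing_user_proteins(input_ids, genbank_text):
    """Return the input IDs, in their original order, that never appear in the output."""
    found = annotated_user_proteins(genbank_text)
    return [pid for pid in input_ids if pid not in found]


# The two qualifiers below are copied from the output excerpt above.
output_excerpt = '''
                 /db_xref="UserProtein:WARQSXNU_11"
                 /db_xref="UserProtein:WARQSXNU_9"
'''
input_ids = ["WARQSXNU_9", "WARQSXNU_10", "WARQSXNU_11"]
print(missing_user_proteins(input_ids, output_excerpt))
```

For the excerpt shown here, only WARQSXNU_10 is reported as missing.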

I am currently running bakta with the following command inside a Docker container.
bakta --db $bakta_db/ --protein $faa_input_bakta --skip-trna --skip-tmrna --skip-rrna --skip-ncrna --skip-ncrna-region --skip-crispr --skip-pseudo --skip-gap --skip-ori --skip-plot --output ${assembly_input_bakta.simpleName}_bakta/ --threads ${params.threads} $assembly_input_bakta

oschwengers · 2024-06-26T13:18:44Z

Hi, thanks for reaching out. To make sure that I correctly understand what you're ultimately trying to achieve: you would like to annotate a phage genome sequence with Bakta using a user-provided proteins file with functional annotations from Phanotate? Is this correct?

Daniel-Tichy · 2024-07-04T20:25:30Z

Hi! Yes, I want to perform the bakta annotation using a user-provided proteins file with functional annotations from Phanotate.

oschwengers · 2024-07-11T14:39:37Z

Hmm, in principle, you can do this. However, Bakta was designed to annotate bacterial genomes, hence the overlap filters. I could add an option to deactivate all overlap filters in the next release, but I cannot make any promises as to when this will be. In the meantime, you could try pharokka.

oschwengers · 2024-10-10T16:25:10Z

Hey @Daniel-Tichy, I just added a new --skip-filter option to Bakta. It is now available in the main branch and will be released with the upcoming v1.10.0 soon.
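
As a rough sketch of how the new option might be combined with a user-proteins file, the snippet below only assembles the argument list and prints it; it does not run Bakta. The function name `bakta_command` and all paths are placeholders. `--proteins` is assumed to be the full spelling of the option written as `--protein` in the original command, and `--skip-filter` requires the main branch or v1.10.0 as stated above.

```python
import shlex


def bakta_command(db, proteins, genome, output, threads=4, skip_filter=True):
    """Build a Bakta argument list (all paths are hypothetical placeholders)."""
    cmd = [
        "bakta",
        "--db", db,
        "--proteins", proteins,  # assumed full option name for the user-proteins FASTA
        "--output", output,
        "--threads", str(threads),
    ]
    if skip_filter:
        # Option introduced for this issue; disables the feature filters.
        cmd.append("--skip-filter")
    cmd.append(genome)  # the genome FASTA is the positional argument
    return cmd


cmd = bakta_command("db/", "phanotate.faa", "phage.fasta", "phage_bakta/")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.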

I hope this fits your needs in this case. I'll close this for now. If there are any further comments, ideas, or suggestions, please do not hesitate to re-open this issue (or open a new one). Thanks again and best regards!

Daniel-Tichy · 2024-10-29T03:20:44Z

Thank you!

Daniel-Tichy added the enhancement New feature or request label Jun 19, 2024

StefDiV mentioned this issue Jul 25, 2024

Option for trusted HMMs #309

Closed

oschwengers self-assigned this Sep 24, 2024

oschwengers added this to the v1.10.0 milestone Oct 10, 2024

oschwengers added a commit that referenced this issue Oct 10, 2024

introduce skip-filter option #295

2d1ca75

oschwengers closed this as completed Oct 10, 2024

oschwengers added feature and removed enhancement New feature or request labels Oct 10, 2024
