Is there a way to deactivate the overlap detection so bakta does not filter my input proteins? #295

Closed
Daniel-Tichy opened this issue Jun 19, 2024 · 5 comments

Daniel-Tichy commented Jun 19, 2024

  • The issue is related to the user-provided proteins feature and its associated issues.

  • I am trying to use Bakta to annotate a phage genome using a file of proteins predicted with Phanotate. I expected an annotation for every protein in my input file, but it seems that overlapping proteins are being filtered out by Bakta.

  • I would like to deactivate the overlap detection so that Bakta does not filter out the previously predicted proteins that I am using as input.

Example: this is my input gbk for bakta.

 CDS             3417..3809
                 /ID="WARQSXNU_CDS_9"
                 /phrog="786"
                 /top_hit="p65745 VI_07030"
                 /locus_tag="WARQSXNU_9"
                 /function="unknown function"
                 /product="hypothetical protein"
                 /source="PHANOTATE"
                 /score="-22.41946661155013"
                 /phase="0"
                 /translation="MAAPTPEELVSQMASRGMTITTTDASGILCLVASISECLELNYPN
                 DECRQNAIMLWASILISANTAGRYVTSQSAPSGASQSFAYGSKPWVALYNQMKLLDSAG
                 CTGDLVEDPDGSGKPWFAVVRGSKCK"
 CDS             3806..4147
                 /ID="WARQSXNU_CDS_10"
                 /phrog="797"
                 /top_hit="p299466 VI_10274"
                 /locus_tag="WARQSXNU_10"
                 /function="unknown function"
                 /product="hypothetical protein"
                 /source="PHANOTATE"
                 /score="-111.69024253224252"
                 /phase="0"
                 /translation="MTSLARFSYTQPCTIWHKSGTDKYGKPTFDAPVSIMCDYGFNDDV
                 STDAKGNEIVQKNTFWTEYTGAKVGDYIMIGTMIEADPLVAGANQILNVINYGNTFQRS
                 EPPDFALVT"
 CDS             4147..4545
                 /ID="WARQSXNU_CDS_11"
                 /phrog="No_PHROG"
                 /top_hit="No_PHROG"
                 /locus_tag="WARQSXNU_11"
                 /function="unknown function"
                 /product="hypothetical protein"
                 /source="PHANOTATE"
                 /score="-52.42604964159676"
                 /phase="0"
                 /translation="MPAKLRGVRKAVERTSQIVDEIIATKAVRALKSATYIIRTESATL
                 TPIDTSTLINSQFDTVEVSGTRITGKVGYSAKYALYVHNASGKLAGKPRSNGNGTYWSP
                 GGEPQFLTKAAQRTKDLVDGVIKKEMKL"

I parse it and pass it to Bakta in the following format.

WARQSXNU_9 ~~~hypothetical protein~~~
MAAPTPEELVSQMASRGMTITTTDASGILCLVASISECLELNYPNDECRQNAIMLWASIL
ISANTAGRYVTSQSAPSGASQSFAYGSKPWVALYNQMKLLDSAGCTGDLVEDPDGSGKPW
FAVVRGSKCK
WARQSXNU_10 ~~~hypothetical protein~~~
MTSLARFSYTQPCTIWHKSGTDKYGKPTFDAPVSIMCDYGFNDDVSTDAKGNEIVQKNTF
WTEYTGAKVGDYIMIGTMIEADPLVAGANQILNVINYGNTFQRSEPPDFALVT
WARQSXNU_11 ~~~hypothetical protein~~~
MPAKLRGVRKAVERTSQIVDEIIATKAVRALKSATYIIRTESATLTPIDTSTLINSQFDT
VEVSGTRITGKVGYSAKYALYVHNASGKLAGKPRSNGNGTYWSPGGEPQFLTKAAQRTKD
LVDGVIKKEMKL
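
For readers who want to reproduce this parsing step, here is a minimal sketch of how CDS qualifiers could be turned into the header-plus-sequence layout shown above. The function name `to_user_protein_fasta` and the dictionary keys are hypothetical, and the leading `>` on each header is an assumption based on standard FASTA conventions (the headers quoted above appear without it). This is not the original poster's script.

```python
def to_user_protein_fasta(cds_features, width=60):
    """Format CDS qualifiers as FASTA entries with 'id ~~~product~~~' headers.

    cds_features: iterable of dicts with 'locus_tag', 'product' and
    'translation' keys (hypothetical structure, for illustration only).
    """
    lines = []
    for cds in cds_features:
        # Gene and dbxref fields are left empty, as in the headers above.
        # The leading '>' is assumed (standard FASTA header marker).
        lines.append(f">{cds['locus_tag']} ~~~{cds['product']}~~~")
        # GenBank wraps /translation over several lines; drop all whitespace.
        sequence = "".join(cds["translation"].split())
        # Re-wrap the sequence at a fixed width (60 residues per line).
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


example = [{
    "locus_tag": "WARQSXNU_10",
    "product": "hypothetical protein",
    "translation": (
        "MTSLARFSYTQPCTIWHKSGTDKYGKPTFDAPVSIMCDYGFNDDV\n"
        "    STDAKGNEIVQKNTFWTEYTGAKVGDYIMIGTMIEADPLVAGANQILNVINYGNTFQRS\n"
        "    EPPDFALVT"
    ),
}]
print(to_user_protein_fasta(example), end="")
```

Stripping the whitespace before re-wrapping matters because the GenBank `/translation` qualifier is itself line-wrapped and indented.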

But I get the following output. The protein for WARQSXNU_10 is missing, probably because of the overlap in the genome.

 gene            complement(40007..40405)
                 /locus_tag="MKOBIG_00315"
 CDS             complement(40007..40405)
                 /db_xref="SO:0001217"
                 /db_xref="UniRef:UniRef50_W7P0V4"
                 /db_xref="UniRef:UniRef90_A0A1B1W263"
                 /db_xref="UserProtein:WARQSXNU_11"
                 /product="hypothetical protein"
                 /locus_tag="MKOBIG_00315"
                 /protein_id="gnl|Bakta|MKOBIG_00315"
                 /translation="MPAKLRGVRKAVERTSQIVDEIIATKAVRALKSATYIIRTESATL
                 TPIDTSTLINSQFDTVEVSGTRITGKVGYSAKYALYVHNASGKLAGKPRSNGNGTYWSP
                 GGEPQFLTKAAQRTKDLVDGVIKKEMKL"
                 /codon_start=1
                 /transl_table=11
                 /inference="ab initio prediction:Prodigal:2.6"
                 /inference="similar to AA
                 sequence:UniRef:UniRef90_A0A1B1W263"
 gene            complement(40743..41135)
                 /locus_tag="MKOBIG_00320"
 CDS             complement(40743..41135)
                 /db_xref="SO:0001217"
                 /db_xref="UniRef:UniRef50_A0A173GBZ4"
                 /db_xref="UniRef:UniRef90_A0A1B1W265"
                 /db_xref="UserProtein:WARQSXNU_9"
                 /product="hypothetical protein"
                 /locus_tag="MKOBIG_00320"
                 /protein_id="gnl|Bakta|MKOBIG_00320"
                 /translation="MAAPTPEELVSQMASRGMTITTTDASGILCLVASISECLELNYPN
                 DECRQNAIMLWASILISANTAGRYVTSQSAPSGASQSFAYGSKPWVALYNQMKLLDSAG
                 CTGDLVEDPDGSGKPWFAVVRGSKCK"
                 /codon_start=1
                 /transl_table=11
                 /inference="ab initio prediction:Prodigal:2.6"
                 /inference="similar to AA
                 sequence:UniRef:UniRef90_A0A1B1W265"
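
To make the suspected overlap concrete, the sketch below checks the three input CDS coordinates from the GenBank excerpt against each other. It only shows that the Phanotate calls share bases; it is an illustration and does not reproduce Bakta's actual filtering logic, which is not described in this thread. The helper name `overlap_bp` is hypothetical.

```python
from itertools import combinations


def overlap_bp(a, b):
    """Shared bases between two 1-based, inclusive (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Coordinates copied from the input GenBank excerpt in this issue.
cds = {
    "WARQSXNU_9": (3417, 3809),
    "WARQSXNU_10": (3806, 4147),
    "WARQSXNU_11": (4147, 4545),
}

# Keep only the pairs that share at least one base.
overlaps = {
    (x, y): overlap_bp(cds[x], cds[y])
    for x, y in combinations(cds, 2)
    if overlap_bp(cds[x], cds[y]) > 0
}

for (x, y), n in overlaps.items():
    print(f"{x} / {y}: {n} bp")
# WARQSXNU_9 / WARQSXNU_10: 4 bp
# WARQSXNU_10 / WARQSXNU_11: 1 bp
```

WARQSXNU_10 is the only CDS that overlaps both of its neighbours, which is consistent with it being the one missing from the output.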

I am currently running Bakta with the following command inside a Docker container:
bakta --db $bakta_db/ --protein $faa_input_bakta --skip-trna --skip-tmrna --skip-rrna --skip-ncrna --skip-ncrna-region --skip-crispr --skip-pseudo --skip-gap --skip-ori --skip-plot --output ${assembly_input_bakta.simpleName}_bakta/ --threads ${params.threads} $assembly_input_bakta

Daniel-Tichy added the enhancement label Jun 19, 2024
@oschwengers
Owner

Hi, thanks for reaching out. To make sure I understand what you're ultimately trying to achieve: you would like to annotate a phage genome sequence with Bakta using a user-provided proteins file with functional annotations from Phanotate. Is this correct?

@Daniel-Tichy
Author

Hi! Yes, I want to run the Bakta annotation with a user-provided proteins file that carries functional annotations from Phanotate.

@oschwengers
Owner

Hmm, in principle, you can do this. However, Bakta was designed to annotate bacterial genomes, hence the overlap filters. I could add an option to deactivate all overlap filters in the next release, but I cannot promise when that will be. In the meantime, you could try Pharokka?

oschwengers self-assigned this Sep 24, 2024
oschwengers added this to the v1.10.0 milestone Oct 10, 2024
oschwengers added a commit that referenced this issue Oct 10, 2024
@oschwengers
Owner

Hey @Daniel-Tichy, I just added a new --skip-filter option to Bakta. It is now available in the main branch and will be released with the upcoming v1.10.0 soon.

I hope this fits your needs in this case. I'll close this for now. If there are any further comments, ideas, or suggestions, please do not hesitate to re-open this issue (or open a new one). Thanks again and best regards!
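
For later readers, here is a hedged sketch of how the original invocation might look with the new option added. It assembles the argument list in Python and prints the resulting shell command without running it. The file and directory names are placeholders, the helper `bakta_command` is hypothetical, and the proteins option is spelled `--proteins` here on the assumption that this is the full form of the `--protein` flag used in the original command. Only `--skip-filter` comes from the comment above.

```python
import shlex


def bakta_command(db, proteins, genome, outdir, threads=1, skip_filter=True):
    """Build a Bakta argument list (hypothetical helper, not part of Bakta)."""
    cmd = [
        "bakta",
        "--db", db,
        "--proteins", proteins,  # assumed full spelling of the option
        "--output", outdir,
        "--threads", str(threads),
    ]
    if skip_filter:
        # Option announced in this thread for the v1.10.0 release.
        cmd.append("--skip-filter")
    cmd.append(genome)  # the genome FASTA goes last, as in the original command
    return cmd


# Placeholder paths, for illustration only.
cmd = bakta_command("db/", "phanotate.faa", "phage.fasta", "phage_bakta/", threads=4)
print(shlex.join(cmd))
```

The other `--skip-*` switches from the original command can be appended to the list in the same way.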

oschwengers added the feature label and removed the enhancement label Oct 10, 2024
@Daniel-Tichy
Author

Thank you!
