Different expected and conceptual translations for some partial CDS #340
Comments
Hi Trestan, thanks for reporting.
Here are the [...]
Thank you,
OK, I took a deeper look into this. Indeed, for partial CDS that are truncated at both ends, the trailing amino acid is accidentally clipped because it is perceived as a trailing [...].
I think this is a crucial bug, but it occurs only in such rare edge cases that it might not be relevant for most applications (at least I hope so), as most analyses probably do not take into account annotated protein sequences that are truncated at both ends anyway. I committed a fix, which will be available soon with the upcoming version [...].
Again, thank you very much for reporting this! I'll close this for now, but please do not hesitate to re-open it if you have any further questions or feedback.
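To illustrate the kind of clipping described above, here is a minimal sketch in plain Python. It is not Bakta's actual code: the helper names (`translate`, `clip_unconditionally`, `clip_stop_only`) and the toy sequences are hypothetical, and it assumes the clipping step was meant to remove a trailing stop symbol (`*`). A 3'-partial CDS has no stop codon, so removing the last character unconditionally drops a real amino acid.

```python
# Illustrative sketch only; hypothetical helpers, not Bakta's implementation.
# Standard genetic code amino acids in TCAG order (table 11 assigns the same
# amino acids and differs only in its start codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(nt):
    """Translate a nucleotide string codon by codon; unknown codons become 'X'."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def clip_unconditionally(protein):
    """Assumed buggy behaviour: always drop the last character."""
    return protein[:-1]


def clip_stop_only(protein):
    """Safer behaviour: drop the last character only if it is a stop symbol."""
    return protein[:-1] if protein.endswith("*") else protein


complete_cds = "ATGGCTGTTCATTAA"  # ends with the stop codon TAA
partial_cds = "GGTGCTGTTCAT"      # truncated at both ends, no stop codon

print(clip_unconditionally(translate(complete_cds)))  # stop symbol removed
print(clip_unconditionally(translate(partial_cds)))   # last residue lost
print(clip_stop_only(translate(partial_cds)))         # last residue kept
```

With the unconditional clip, the partial CDS loses its final residue, which matches the one-residue difference shown in the examples in this issue.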
Hi @oschwengers
Lines 406 to 408 in 518253a
The origin of that specific problem was the wrong "--complete" flag, but chromosomes and plasmids can be linear. Is it really safe to try to guess the topology from the content of the header?
Best
Hi,
I'm trying to submit EMBL flat files generated by bakta to the EBI, but I am confronted with multiple errors related to the translation of some partial CDS. It seems that some partial CDS have unexpected translated sequences. The message raised by the Webin validation tool is:
ERROR: Expected and conceptual translations are different. [ line: 167841 of 4_GEN12896.embl.gz]
If I translate those CDS with biopython, I systematically get one extra amino acid compared to the translation reported by bakta. Here are a few examples (from this EMBL file):
contig_113 <0:1356
biopython LRNVESDVDALFAKTDITIGQSLTLLNNEITKFVGEAGKGSGAAQVLAGSVQTLASNLDLIADGALVVGIGYITRAILMKSAAIKEGMASTLASRQASVLNAQAEYAEATAALNAAKAHLANVRATNAETQAKFGATAAATRYAQAQAAVTAATNAQTAAQIKLNTATSIAGRLAKGAFGLIGGWAGVATLGVMGLAAAYSYFNNKAEGAKQKLAEQAKVAEKADEELKKLTGNDKAKAVNDLTTAFNAQNKALEKSSRAVGSALIDIENYARGNREVEKISQEARTGTISYTEAIERLNKIKLPTDLYENLKKQAAQYDDNASKASLSAEKLKLLRVEVKLGGNEAQNAAIQHQKQADALGNTATEAEKATKALQDYQAKQKDSVIDSIYKSGWLDKGYTVAQANAILELQKAKGMSAILSKDEIDSALRNLKIIEEQQEREDKLTEAKRK
bakta LRNVESDVDALFAKTDITIGQSLTLLNNEITKFVGEAGKGSGAAQVLAGSVQTLASNLDLIADGALVVGIGYITRAILMKSAAIKEGMASTLASRQASVLNAQAEYAEATAALNAAKAHLANVRATNAETQAKFGATAAATRYAQAQAAVTAATNAQTAAQIKLNTATSIAGRLAKGAFGLIGGWAGVATLGVMGLAAAYSYFNNKAEGAKQKLAEQAKVAEKADEELKKLTGNDKAKAVNDLTTAFNAQNKALEKSSRAVGSALIDIENYARGNREVEKISQEARTGTISYTEAIERLNKIKLPTDLYENLKKQAAQYDDNASKASLSAEKLKLLRVEVKLGGNEAQNAAIQHQKQADALGNTATEAEKATKALQDYQAKQKDSVIDSIYKSGWLDKGYTVAQANAILELQKAKGMSAILSKDEIDSALRNLKIIEEQQEREDKLTEAKR
contig_127 <0:831
biopython LQKQQELGLLKLAQEQRLFQAEQFMLGEMERIKKRYALEYDEISKITDLEERRRKMSAFQADFIRNGVGNPTIDQYDTSSQFLKSTNYTKPKQTNMQVLDEDYAQTYQKLKDNLAAVLESEKASYQERLEAERVFKEARQQMDNEYHLKAIDARKADHDSQLQLYSQMISSASSTWGGLTQIVKDARGENSRSFKAMFIAQQSFAIASAIISAHLAATQVAADATIPFFGAKIAASTAMLAMGYANAGLIAGQTIAGFSDGGFTGSGGKYQPAGIVH
bakta LQKQQELGLLKLAQEQRLFQAEQFMLGEMERIKKRYALEYDEISKITDLEERRRKMSAFQADFIRNGVGNPTIDQYDTSSQFLKSTNYTKPKQTNMQVLDEDYAQTYQKLKDNLAAVLESEKASYQERLEAERVFKEARQQMDNEYHLKAIDARKADHDSQLQLYSQMISSASSTWGGLTQIVKDARGENSRSFKAMFIAQQSFAIASAIISAHLAATQVAADATIPFFGAKIAASTAMLAMGYANAGLIAGQTIAGFSDGGFTGSGGKYQPAGIV
contig_129 <1:808
biopython TQEIEKQAKLTKRLVGISGQSGIGTGPHLDVRYGGSLSGQKVSNEHLARLQAGGKPLTSYKISSNYGPRKAPTKGASSFHKGIDFSMPEGTPITTNVAVKDIKTWYDSKGGGYVSEVIFEDGVSLKLLHQSPKMQSKVKGGASKGSDKAAGDIQSQLERQQDLQRSLENEVASEVGRINNNRKARLEDVDKANFSPERTAEIKAEINRRADNDIAIAKQALRTKLEDYKEFQKTEEQLLEESFNRKKFNAAHDLELSKFEQKQAVELLE
bakta TQEIEKQAKLTKRLVGISGQSGIGTGPHLDVRYGGSLSGQKVSNEHLARLQAGGKPLTSYKISSNYGPRKAPTKGASSFHKGIDFSMPEGTPITTNVAVKDIKTWYDSKGGGYVSEVIFEDGVSLKLLHQSPKMQSKVKGGASKGSDKAAGDIQSQLERQQDLQRSLENEVASEVGRINNNRKARLEDVDKANFSPERTAEIKAEINRRADNDIAIAKQALRTKLEDYKEFQKTEEQLLEESFNRKKFNAAHDLELSKFEQKQAVELL
contig_133 <2:638
biopython QNWGGIQADMNGTGEFFRQDQERFSRLNAANDLADSQFAATDLNEQNSLDGLNAQFEAGLIKQQDYENQKTAIIQAAQDQRNQIAAEYAQNAQDIEDKYQQDRLNTIIAFGGNMMGSLTSMFGSMFGEQSKAYKIMFAADKAYAIAAAGIAIQQNIAAASKVGFPLNLPLIAGAVAQGASIIANIRAIKDQGFAEGGYTGRGGKYEVAGAVH
bakta QNWGGIQADMNGTGEFFRQDQERFSRLNAANDLADSQFAATDLNEQNSLDGLNAQFEAGLIKQQDYENQKTAIIQAAQDQRNQIAAEYAQNAQDIEDKYQQDRLNTIIAFGGNMMGSLTSMFGSMFGEQSKAYKIMFAADKAYAIAAAGIAIQQNIAAASKVGFPLNLPLIAGAVAQGASIIANIRAIKDQGFAEGGYTGRGGKYEVAGAV
Biopython code to get the translation:
trans = feature.translate(record.seq, cds=False, stop_symbol="")
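A quick way to confirm the "one extra amino acid" pattern is to check whether the bakta translation is a strict prefix of the biopython one. The sketch below uses only plain Python; the helper name `missing_trailing_residues` is hypothetical, and the two strings are just the last residues of the contig_133 translations listed above.

```python
def missing_trailing_residues(reference, reported):
    """Return the residues at the end of `reference` that `reported` lacks.

    Returns None if `reported` is not a prefix of `reference`, i.e. the two
    translations differ somewhere other than at the C-terminal end.
    """
    if not reference.startswith(reported):
        return None
    return reference[len(reported):]


# C-terminal ends of the contig_133 translations from the examples above.
biopython_tail = "RGGKYEVAGAVH"
bakta_tail = "RGGKYEVAGAV"

print(missing_trailing_residues(biopython_tail, bakta_tail))
```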
Any idea what might be causing this problem?
Best,
Trestan