You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi Oliver,
I performed quite a few runs now on metagenomes of various sizes with the modified Bakta version from #289 and had barely any problems except for one sample and specifically with one contig.
There, a PILER-CR predicted spacer seq doesnt match the seq in the fasta at the position parsed by Bakta.
Here the stdout from a run with regular Bakta and just the one contig:
Bakta v1.9.3
Options and arguments:
input: .../bakta_devel/test/data/piler_test_contig.fna
db: .../databases/imp3/bakta/db, version 5.1, full
output: .../bakta_devel/test/sanity_check_piler_base
force: True
tmp directory: .../bakta_devel/test/tmp/tmp5fici64f
prefix: annotation
threads: 128
debug: True
meta mode: True
translation table: 11
keep contig headers: True
Bakta runs in DEBUG mode! Temporary data will not be destroyed at: .../bakta_devel/test/tmp/tmp5fici64f
parse genome sequences...
imported: 1
filtered & revised: 1
contigs: 1
start annotation...
predict tRNAs...
found: 0
predict tmRNAs...
found: 0
predict rRNAs...
found: 0
predict ncRNAs...
found: 0
predict ncRNA regions...
found: 0
predict CRISPR arrays...
array_id=1, contig_id=contig_0001, position=19605, repeat_length=47, repeat_seq=...........................T..................., spacer_seq=TTCCGGTG, spacer_length=8, gap_count=0, spacer_genome_seq=TGCGGACT
Traceback (most recent call last):
File ".../bin/bakta", line 10, in <module>
sys.exit(main())
File ".../lib/python3.10/site-packages/bakta/main.py", line 210, in main
genome['features'][bc.FEATURE_CRISPR] = crispr.predict_crispr(genome, contigs_path)
File ".../lib/python3.10/site-packages/bakta/features/crispr.py", line 108, in predict_crispr
assert spacer_seq == spacer_genome_seq # assure PILER-CR provided sequence equals sequence extracted from genome
AssertionError
I do find the seq in the fasta at index -6 from crispr_spacer['start'], but am not sure how/if its relevant:
Hi @ohickl ,
thank you very much for reporting! I just took a deeper look into this and found a wrong assumption that I made for the CRISPR parser of PILER-CR.
I wrongly assumed that all CRISPR spacers are of length N>=10. But, in your example, the CRISPR spacer is of length 8. Because of this, the regex didn't match to the result file. I pushed a PR with a fix. I'll run some tests and release a patch release, soon - hopefully by the end of this or next week.
* fix CRISPR parser #299
The assumed minimal length of CRISPR spacers was reduced from at least 10 down to at least 1 in the regex.
* polish code
* relaxe CRISPR spacer length regex to 1 or 2 digits
Hi Oliver,
I performed quite a few runs now on metagenomes of various sizes with the modified Bakta version from #289 and had barely any problems except for one sample and specifically with one contig.
There, a PILER-CR predicted spacer seq doesnt match the seq in the fasta at the position parsed by Bakta.
Here the stdout from a run with regular Bakta and just the one contig:
I do find the seq in the fasta at index -6 from
crispr_spacer['start']
, but am not sure how/if its relevant:Im not sure, if the problem is PILER itself or the result parsing, but I get this with all Bakta versions.
I attached the fasta of the contig and a manual PILER run with the same params used in Bakta.
Let me know, if I should get in touch with the PILER people directly for this.
Best
Oskar
piler_test_out.txt
piler_test_contig.txt
The text was updated successfully, but these errors were encountered: