Midas DB compatibility #12

CreatorOfMoon · 2021-06-17T13:47:58Z

Hi and thank you for your software,

I'm trying to make your Kalamari database meet the requirements in https://github.com/snayfach/MIDAS/blob/master/docs/build_db.md

It would be nice if you also downloaded:
"<genome_id>.faa": the protein sequences in FASTA format

"<genome_id>.ffn": the gene sequences in FASTA format

"<genome_id>.genes": a tab-delimited file with the genomic coordinates of genes. The file should be tab-delimited, with a header and the following fields.
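
To see which of these files a genome is still missing, something like the following could be used. This is only a minimal sketch: the function name `missing_midas_files` and the layout `<dir>/<genome_id>.<ext>` are assumptions made for illustration, not part of Kalamari or MIDAS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the annotation files (faa, ffn, genes) that are
# absent or empty for one genome.
# Assumption: files are named <dir>/<genome_id>.<ext>.
missing_midas_files() {
  local dir=$1 genome_id=$2 ext
  for ext in faa ffn genes; do
    # -s is true only if the file exists and has a size greater than zero
    if [ ! -s "$dir/$genome_id.$ext" ]; then
      printf '%s.%s\n' "$genome_id" "$ext"
    fi
  done
}
```

Calling `missing_midas_files /path/to/genomes <genome_id>` prints one missing file name per line, and prints nothing when all three files are present.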

I don't know whether those are available through esearch:

Kalamari/bin/downloadKalamari.pl, line 85 in aa25b5e:

    my $command = "esearch -db nuccore -query '$acc' | efetch -format fasta > $outfile.tmp";

But it would be a nice addition to your pipeline :)

Thank you!

lskatz · 2021-06-17T23:22:58Z

It might be helpful to know what the edirect commands would be for these. I don't think I have this exactly right. Do I need to have a step through elink -db assembly? Alternatively, would it be helpful to simply run each assembly through prokka instead and just get all these files through consistent annotation? Thank you for your feedback.

esearch -db nuccore -query '$acc' | elink -target protein | efetch -format fasta > $outfile.tmp
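
One possibility that might avoid the elink step: efetch has the formats `fasta_cds_aa` (translated CDS) and `fasta_cds_na` (CDS nucleotide sequences) for nuccore records. The sketch below only prints the pipeline for a given accession and file type, it does not run it. The function name `edirect_cmd` and the output naming `<accession>.<ext>` are made up for illustration, and this is not necessarily what Kalamari ends up doing. The .genes table is not covered here.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) an EDirect pipeline for one accession.
# fasta_cds_aa = translated CDS (protein), fasta_cds_na = CDS nucleotide.
# Hypothetical function name and output naming (<accession>.<ext>).
edirect_cmd() {
  local acc=$1 kind=$2 format
  case $kind in
    fna) format=fasta ;;
    faa) format=fasta_cds_aa ;;
    ffn) format=fasta_cds_na ;;
    *) echo "unknown file type: $kind" >&2; return 1 ;;
  esac
  printf "esearch -db nuccore -query '%s' | efetch -format %s > %s.%s\n" \
    "$acc" "$format" "$acc" "$kind"
}
```

For example, `edirect_cmd NC_000913.3 faa` prints the command that would fetch the protein translations for that accession (the accession is just an example).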

lskatz · 2021-06-18T02:29:07Z

I think I figured it out in branch more-formats. Can you try it out?

CreatorOfMoon · 2021-06-26T07:45:10Z

So, I tested it and it seems to work like a charm. It looks perfect for use with Midas; I'll probably include a script I've been working on that formats the data correctly.

I must say, though, that downloading the faa, ffn and genes files slows the download down a lot (more than 2 hours to download v3.9 here, compared with only 20 minutes for fna alone).
It might be worth adding an argument that lets the user choose whether or not to download the other files.

lskatz · 2021-06-28T18:43:57Z

> It might be worth adding an argument that lets the user choose whether or not to download the other files.

Good point!

> It looks perfect for use with Midas; I'll probably include a script I've been working on that formats the data correctly.

The data should be formatted correctly from the start with the Kalamari script. Could you let me know the right way to format it for Midas?

lskatz · 2021-06-28T19:10:48Z

I made downloading the optional files an option in a0ac20a..522c9be, using --and

CreatorOfMoon · 2021-06-30T13:05:36Z

> The data should be formatted correctly from the start with the Kalamari script. Could you let me know the right way to format it for Midas?

Well, I don't know whether formatting it the correct way would cost you compatibility with Kraken, for example.

Here is what you have to do:

Create a mapfile with 3 columns:
genome_id (CHAR): corresponds to a subdirectory within INDIR
species_id (CHAR): species identifier for genome_id
rep_genome (0 or 1): indicator of whether genome_id should be used for SNP calling

Then give each file and folder the correct name:

And this should work.
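
To illustrate the mapfile, here is a minimal sketch that builds one from a two-column list. Several things in it are assumptions on my side: the input format (genome_id and species_id separated by a tab), the function name `make_mapfile`, using the three column names as the header line, and picking the first genome listed for each species as its representative.

```shell
#!/usr/bin/env bash
# Sketch: write a mapfile (genome_id, species_id, rep_genome) to stdout.
# Input on stdin: hypothetical two-column TSV "genome_id<TAB>species_id".
# Assumption: the first genome listed for each species becomes its
# representative (rep_genome=1); all later genomes of that species get 0.
make_mapfile() {
  awk -F'\t' 'BEGIN { OFS = "\t"; print "genome_id", "species_id", "rep_genome" }
    NF >= 2 { print $1, $2, (seen[$2]++ ? 0 : 1) }'
}
```

It could be run as `make_mapfile < genomes.tsv > genomes.mapfile`, where both file names are placeholders.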

lskatz · 2021-07-22T17:03:18Z

I think I'll leave this open for now but it would be interesting to come back to.
