Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Phigaro? #88

Closed
chris-rands opened this issue Jul 27, 2020 · 18 comments · Fixed by #98
Closed

Add Phigaro? #88

chris-rands opened this issue Jul 27, 2020 · 18 comments · Fixed by #98
Assignees
Labels
enhancement New feature or request

Comments

@chris-rands
Copy link
Contributor

Nice tool and idea. Perhaps our phigaro tool could also be included (although I appreciate that adding new tools is probably not trivial):
https://academic.oup.com/bioinformatics/article/36/12/3882/5822875

https://github.com/bobeobibo/phigaro

@chris-rands
Copy link
Contributor Author

Ah closing as a duplicate of #1 and #75

@replikation
Copy link
Owner

ah yep, you were faster, thanks for the interest and suggestion :)

@replikation
Copy link
Owner

@chris-rands the most time-consuming part is writing a parser for the tools and check/validate the tool output
so we would need:

  1. a working dockerfile / container (for both docker and singularity)
  2. a parser which gets us a simple list of contigs(one contig per line) that are considered "phage positive"
    • and can be used to extract the contigs via samtools
    • some tools rename contigs which make things complicated ;)
  3. (if applicable) database download location for the tool
  • if we have these code snippets and information an inclusion is quite trivial on our side

@replikation replikation added the duplicate This issue or pull request already exists label Jul 27, 2020
@PollyTikhonova
Copy link

PollyTikhonova commented Jul 30, 2020

Hello, I am the Phigaro dev. I wanted to ask, what do you mean when you say contigs? Is it something that you transform initial data in? or is it a prophage region which phigaro detects?
I was trying to understand it on my own, but now I am meeting with some issues with your tool, so maybe it would be faster if you could consult me)))

@replikation
Copy link
Owner

replikation commented Jul 30, 2020

hi, thank you for the interest.
the error you reported (before editing) was probably from the newest nextflow release, please use for Release v0.8.1 [NEXTFLOW version 20.01.0 or 20.04.1]. nextflow 20.07.x will be supported in our next release (currently testing that).

I mean fasta entries/names in a multifasta file. WtP reports which of these fasta entries are phages.

It would help if you could give us a good "best practice" command to identify which fasta contains phages/prophages.
And of which result file we could extract easily the names of these positive phages (the fasta header, e.g. >fasta_1).

we could then include phigaro a bit more conveniently for us :)

@replikation replikation reopened this Jul 30, 2020
@PollyTikhonova
Copy link

  1. About the error - it was my local mistake - I found it and now, it seems as it's working.
  2. The way Phigaro works: it finds the coordinates of prophages in a fasta sequence: so, theoretically, Phigaro can find several phages in one fasta or detect more narrow phage region than the whole fasta. Do I understand correctly that you do not need any of this information (neither coordinates nor the number of phages for each fasta seq)?
  3. I can produce any output you need, and I would be glad if you could provide the example of the file for a specific fasta file (maybe from one of those in your test example https://github.com/mult1fractal/WtP_test-data) - so you won't need to reparse it or transform anyhow. Because now, I have so many little questions, like do you need the whole header or only id\name, should the symbol > be removed, do you need a specidfic name/extension of the file or maybe you need stdout?

@PollyTikhonova
Copy link

PollyTikhonova commented Jul 30, 2020

Oh! WTP just finished the work (with a ruin((( ) but still, it managed to produce some data, so do I understand correctly that you need a file like those in a folder identified_contigs_by_tools ?

@replikation
Copy link
Owner

replikation commented Jul 30, 2020

ad.1 glad to hear its working

ad.2 this information is currently not needed but as it seems useful, I think we will add that too at some point (i keep that in mind for now). the complete "raw tool output" is either way completely stored too so the User can check specifically the output from your tool if they want to
also we are currently discussing how to best implement prophage detection for the user to be as convenient as possible, so the coordinates of prophages in a fasta sequence would be useful for that.

ad.3. no, we don't need a specific output files, can be any file e.g. output.txt ( we need to rename it either way). This needs to contain a list of header names, without ">", one fasta header name per line (unmodified from the original input)

e.g.:

fasta_1
fasta_2
fasta_5

yep you can use https://github.com/mult1fractal/WtP_test-data

Thank you very much

@PollyTikhonova
Copy link

Ok, I understand! Thank you very much for your responses)))

@replikation replikation added enhancement New feature or request and removed duplicate This issue or pull request already exists labels Aug 1, 2020
@replikation replikation self-assigned this Aug 1, 2020
@PollyTikhonova
Copy link

PollyTikhonova commented Aug 13, 2020

Hello,
I did everything you asked for:

  1. Dockerfile and Singularity file.
    1.1 About Singularity:
    For Singularity a couple of paths (PATH and HOME) should be exported, you can find them in the Singularity file. So, you can built a Singularity image from the file or from docker repository and export the paths after. Or you can omit the exporting path step and use the full path Phigaro calling (with specifying a full path to a config file) - I represent the codes below.
  2. Parser
  • You can call Phigaro in a short way (if all paths (PATH and HOME) are installed correctly)
    phigaro -f [PATH_TO_INPUT_MULTIFASTA] -o [PATH_TO_OUTPUT_FOLDER] -t [THREADS (OPTIONAL)] --wtp
  • Or you can call Phigaro in a full path way
    /root/miniconda3/bin/phigaro -f [PATH_TO_INPUT_MULTIFASTA] -o [PATH_TO_OUTPUT_FOLDER] -t [THREADS (OPTIONAL)] --wtp --config /root/.phigaro/config.yml
    It will create the folder that you specify with -o (if it doesn't exist), and there you will find the phigaro.txt file with the names of the phage containing fasta sequences and the folder with standart phigaro outputs.

By the way, I noticed that What_The_Phage changes the names of the phage positive sequences - at least it replaces the last dot with a dash. But Phigaro does not change anything, as you asked.

  1. Database download location
    All external files, Phigaro downloads during the setup process and stores in ~/.phigaro/pvog, are available by the link.
    But I am not sure if you really need this because Phigaro is fully setuped in a Docker container with all external files.

Please, let me know if you have any problems or you need something)

@mult1fractal
Copy link
Collaborator

Hej,
many thanks for your great effort!
We will take care of the rest 👍
I will start integrating Phigaro on Monday

@mult1fractal
Copy link
Collaborator

mult1fractal commented Aug 20, 2020

@PollyTikhonova : Phigaro is basically fully integrated into WtP
I just have to update some minor issues in my pull request #98 and we will release a new version of WtP with your tool 👍
Thanks to your effort it was really easy!

considerations:

  • We are currently using your Dockerfile tikhonovapolly/phigaro:v2.2.6
  • Is it possible to send us the "Raw Dockerfile"[with all the installation commands] so we can build and maintain it?
    • why? because Docker Hub will soon have a Container Image Retention Policy (1 November 2020 ) and we have a pro account for Dockerhub
    • I was also thinking about reducing the size of of the Dockerfile by excluding the databases which we could make a one time download
    • If the Phigaro process is executed in WtP we can link the databases to the correct location in the Phigaro process

@PollyTikhonova
Copy link

PollyTikhonova commented Aug 20, 2020

@mult1fractal, Very happy to hear that! We'll wait for your release)))
Sorry, I am not a very experienced Docker user, so I need to ask what do you mean "raw docker" ? How raw should it be?
As far as I remember, the initial phigaro image was built based on somebody's hmmer docker container and I think that's not you are asking for, am I right?

  • So, should I generate a raw docker from Ubuntu image?
  • And one more question: is it urgent? or you'll release now as is and do all this Docker staff a bit later? (I just need to plan my work processes)
  • Good point about databases! But wouldn't it be really time consuming to download them every time the user runs WtP? or you will download them once with the WtP installation process and then volume them to the docker container?

@mult1fractal
Copy link
Collaborator

No Problem I can take care of it 👍
For the moment, it's fine with your Dockerfile.
When I updated my PR I think we will release the new Version
The Dockerfile we will maintain is a long term goal when Dockerhub will update its policy

  • I basically just need the commands to build the working Dockerfile eg.:
From xyz
RUN apt-get install -y wget procps python

RUN pip install phigaro

Database:

  • We download the database once.

    • First execution of WtP - download all databases
    • every other execution databases are available and you can tell WtP where the databases are located on your system
  • eg. : the database for Phigaro will be downloaded once

    • in nextflow we can basically "mount" the database into the Phigaro module which is the Dockerfile in general
  • But this is something I have to test (Dockerfile with database or Dockerfile without database and "mounting the database" )

  • A --wtp /path/to/database would be ideal for the Dockerfile without database in terms of speed (but this is sth. I have to discuss with @replikation )

@PollyTikhonova
Copy link

@mult1fractal

  • I have a thought about reducing the size of docker: the thing is that for the work with raw data Phigaro demands hmmer and Prodigal preinstallled, but there's an option to provide already computed data by hmmer and prodigal as an input for Phigaro. As I remember, WtP uses this tools too, so, theoretically, we can exclude this tools from docker and left just Phigaro. But, of course, this could affect the scalability and uniformity of your tool, that is maybe not good. But you can just take into account this option.
  • about --wtp /path/to/database: Phigaro has an option to provide a text configuration file with any custom paths. I think, this could work. But if it is not I'll add your UX scenario if you ask)

@mult1fractal
Copy link
Collaborator

@PollyTikhonova
Hey I'm currently preparing our WtP preprint for submission.
I was wondering how you filter the phage positive sequences for the --wtp output
(above which (and what) cut-off it is a phage-sequence)

e.g.

Tool Criteria Filter
Virsorter Category of detection (1, 2 or 3: intact, incomplete or questionable) Category 1 & 2
Phigaro ? ?

@PollyTikhonova
Copy link

While the wtp option Phigaro is working in its basic mode, so the thresolds for the final score are
• Minimum score threshold: 45.39
• Maximum score threshold: 46.0

Briefly, the final score is computed as a product of phage concentration score and triangled GC content score [each of the scores were filtered by indicator function].

@mult1fractal
Copy link
Collaborator

Thank you very much:)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants