
[REVIEW]: TipToft: detecting plasmids contained in uncorrected long read sequencing data #1021

Closed
54 tasks done
whedon opened this issue Oct 15, 2018 · 62 comments
Assignees
Labels
accepted published Papers published in JOSS recommend-accept Papers recommended for acceptance in JOSS. review


whedon commented Oct 15, 2018

Submitting author: @andrewjpage (Andrew Page)
Repository: https://github.com/andrewjpage/tiptoft
Version: v1.0.1
Editor: @Kevin-Mattheus-Moerman
Reviewer: @ctb, @kapsakcj, @azneto
Archive: 10.5281/zenodo.2561192

Status

[status badge]

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/94219c0e71b803fc9b5a523c37d16600"><img src="http://joss.theoj.org/papers/94219c0e71b803fc9b5a523c37d16600/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/94219c0e71b803fc9b5a523c37d16600/status.svg)](http://joss.theoj.org/papers/94219c0e71b803fc9b5a523c37d16600)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@ctb & @kapsakcj & @azneto, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @Kevin-Mattheus-Moerman know.

Please try to complete your review in the next two weeks.

Review checklist for @ctb

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: v1.0.1
  • Authorship: Has the submitting author (@andrewjpage) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @kapsakcj

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: v1.0.1
  • Authorship: Has the submitting author (@andrewjpage) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @azneto

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source code for this software available at the repository url?
  • License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
  • Version: v1.0.1
  • Authorship: Has the submitting author (@andrewjpage) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

  • Installation: Does installation proceed as outlined in the documentation?
  • Functionality: Have the functional claims of the software been confirmed?
  • Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
  • Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?
  • Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
  • Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?
whedon (Author) commented Oct 15, 2018

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @ctb, it looks like you're currently assigned as the reviewer for this paper 🎉.

⭐ Important ⭐

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that with GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

[screenshot: 'Not watching' setting]

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

[screenshot: notification settings]

For a list of things I can do to help you, just type:

@whedon commands

whedon (Author) commented Oct 15, 2018

Attempting PDF compilation. Reticulating splines etc...

whedon (Author) commented Oct 15, 2018

kapsakcj commented Nov 9, 2018

Hi everyone, I apologize for the delay in reviewing, but I've finally found a chance to give TipToft a proper review. I'm very impressed with TipToft and will likely incorporate it into the software pipelines where I work (a public health lab). Thank you for developing it! I'm your target audience! We'll be doing nanopore sequencing soon, and TipToft will be perfect.

It's not often that you find a piece of bioinformatics software that is well documented, easy to install(!), easy to use, and runs as fast as TipToft does. I've installed and run TipToft via pip, bioconda, Docker, and even bioconda on Ubuntu via Windows Subsystem for Linux on Windows 10. Each worked the first time too 👍

I tested TipToft on the supplied test data (and got the expected output), some of my own nanopore data, and some random nanopore data I pulled off the SRA. It identified all the replicons I expected except for one. I'd previously used PlasmidFinder to identify an IncA/C2 replicon from a nanopore-read-only assembly of a plasmid, but was unable to do so using those same raw reads and TipToft. I suspect this is because these aren't the highest-quality nanopore reads, and is not the fault of TipToft. @andrewjpage, if you'd like, I can probably supply you with the reads and assembly (I need to check with my PI) if you're curious to figure out what happened.

I do have one recommendation for the paper. I see the PlasmidFinder database cited in the repo, but I think it would be worth mentioning and citing the PlasmidFinder paper there too.

I think that's all I can suggest for now. Thanks for developing the tool and publishing it in an open-access way. It was fun to review! I'll open an issue on GitHub if I find any, and potentially a PR if there's any way I can improve it. That's an ACCEPT from me.

@Kevin-Mattheus-Moerman (Member)

Thanks @kapsakcj for your review contribution! 🎉

@Kevin-Mattheus-Moerman (Member)

@ctb, @azneto, can you give an update as to when you are able to review this work? Thanks! 🤖

azneto commented Nov 9, 2018 via email

azneto commented Nov 9, 2018

@andrewjpage, congratulations! This is a very clever method for finding sequences of interest in datasets with considerable error rates. TipToft relies on a k-mer matching approach to identify perfect hits, without the hassle of dealing with base quality or sequence assembly. I've run the software a few times over the last weeks using both Fedora and Ubuntu. The software is well documented and runs really fast.
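The k-mer matching idea described above can be illustrated with a toy sketch (this is illustrative only, not TipToft's actual implementation, and the function names are hypothetical):

```python
def kmers(seq, k):
    """Return the set of all k-mers (length-k substrings) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_matched(gene, reads, k=11):
    """Fraction of a gene's k-mers found exactly (error-free) in any read.

    A sequencing error destroys only the k-mers overlapping the erroneous
    base, so perfect k-mer hits elsewhere still give a detection signal
    without base-quality handling or assembly.
    """
    gene_kmers = kmers(gene, k)
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return len(gene_kmers & read_kmers) / len(gene_kmers)
```

With an error-free read covering the gene the fraction is 1.0; a single substitution removes only the k-mers spanning that base, so most of the signal survives.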

Here are a few recommendations:

Though the documentation for end users is really good and comprehensive, the code itself is not so inviting. I tried to follow the algorithm step by step, but it was really hard, since the code doesn't follow PEP 8 (the Style Guide for Python Code) or PEP 20 (the Zen of Python).

I don't understand why someone would need to create aliases to functions or have duplicate functions in the code. For example:

Fasta.sequence_to_kmers is an alias to Fasta.sequence_kmers
Fasta.kmers_to_genes is an alias to Fasta.all_kmers_to_sq_in_file
The functions Fasta.sequence_kmers_vals and Fasta.sequence_kmers are exactly the same

There are variables and libraries that are created/loaded and never used (e.g. time, operator, retcode). The tool flake8 can find those and also check whether the code is compatible with PEP 8. I also recommend using typing (https://docs.python.org/3/library/typing.html) for your functions to make sure the parameters and return values are correct, and to avoid implicitly passing parameters via global variables.
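As a small illustration of the typing suggestion (a hypothetical function, not taken from the TipToft codebase), annotated parameters and return types document the contract and let tools such as mypy flag mismatches before runtime:

```python
from typing import Dict


def count_kmers(sequence: str, k: int) -> Dict[str, int]:
    """Count occurrences of each k-mer in a sequence.

    The annotations make it explicit that callers must pass a string and
    an int and always receive a dict, with no reliance on globals.
    """
    counts: Dict[str, int] = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts
```

Running flake8 over such a module also reports unused imports and variables like those mentioned above.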

It may be worth mentioning in the paper that if the user doesn't have at least 20x sequencing coverage, it's almost certain TipToft will not find all k-mers. In any case, the sequencing companies recommend at least 40x coverage to guarantee full error correction, so rarely will anyone have this problem.

I agree with @kapsakcj regarding the citation of PlasmidFinder and have created an issue in the TipToft repo.

Thanks for contributing to the open-source software and bioinformatics communities! I'm glad I got the chance to learn about a new, useful tool. That's an ACCEPT from me too.

@andrewjpage

Thank you @kapsakcj and @azneto for the kind reviews and for taking the time to use and review the code.

I have added in the PlasmidFinder reference and mentioned it in the text.

I have reformatted the code and removed anything that's redundant or duplicated, so it now passes flake8 and pycodestyle without any errors. Thank you for the pointer to PEP 20; it's the first time I've come across it, and I will most definitely follow it for all new Python projects. I'm relatively new to Python and have unfortunately carried over bad habits from other languages.

I don't understand why someone would need to create aliases to functions. For example:
Fasta.sequence_to_kmers is an alias to Fasta.sequence_kmers
Fasta.kmers_to_genes is an alias to Fasta.all_kmers_to_sq_in_file

In this instance I wanted to run a computationally intensive method exactly once and store the result. I'm happy to discuss this offline if there is a more appropriate pattern to use in Python.
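(One idiomatic Python pattern for "run once and store the result", sketched here purely as an illustration with hypothetical names, is memoisation via functools.lru_cache rather than aliased methods:)

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def sequence_kmers(sequence, k):
    """Expensive k-mer extraction, computed once per distinct input.

    Repeated calls with the same arguments return the cached result
    without re-running the body, so no alias methods are needed.
    """
    return frozenset(sequence[i:i + k] for i in range(len(sequence) - k + 1))
```

Since functools.lru_cache has been available since Python 3.2, this pattern does not affect the minimum supported Python version.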

I also recommend using typing (https://docs.python.org/3/library/typing.html) for your functions to make sure the parameters and returns are correct and avoid implicitly passing of parameters using global variables.

Type hints look really awesome, thanks for the suggestion. As this software currently works on Python 3.4 and above, adding type hints would raise the minimum to 3.5 (probably 3.6, given the provisional status of the API). I think I will wait for it to stabilise before making the switch, to minimise any potential barriers for researchers.

Maybe it's worth mentioning in the paper that if the user doesn't have at least 20x sequencing coverage, it's almost certain TipToft will not find all kmers. In anyways, the sequencing companies recommend at least 40x coverage to guarantee full error correction and rarely someone will have such problem.

Yes, you are absolutely right: in the real world you may not see all k-mers if the coverage is too low. As it's essentially an alignment-based approach rather than de novo assembly, it can detect a signal at low coverage. I have added text to explain the depth of coverage. Unfortunately there is not enough room in a JOSS paper to explain the algorithm in depth.

@Kevin-Mattheus-Moerman (Member)

@ctb, can you give an update as to when you are able to review this work? Thanks.

@Kevin-Mattheus-Moerman (Member)

@ctb, are you able to add your review comments? Thanks 🐰

ctb commented Nov 20, 2018 via email

@andrewjpage

@Kevin-Mattheus-Moerman Just a gentle reminder that this is still pending. Hope everyone had a nice holiday

@Kevin-Mattheus-Moerman (Member)

Thanks for the reminder and apologies for the delay with your submission.

@Kevin-Mattheus-Moerman (Member)

@ctb are you able to review this work at this point? Please let us know if you are no longer able to. Thanks.

ctb commented Jan 16, 2019

Review done! Agree with all the nice comments above - good work!

I did not look at the code at all, since @azneto did such a nice job :)

The only two issues I saw that should be addressed are indicated in my unchecked boxes above:

  • the version has advanced (to 1.0!), so I left the "version matches the released version" box unchecked; a minor technicality, since the version has surpassed that :).
  • the example data file mentioned in the README no longer exists in the repo; see issue "cannot find example data file" (andrewjpage/tiptoft#18).

I did also file another two issues that are merely suggestions.

Once the example data file issue is resolved (one way or another) I'm definitely an ACCEPT!

@Kevin-Mattheus-Moerman (Member)

Thanks @ctb!

@Kevin-Mattheus-Moerman (Member)

@andrewjpage please reply to @ctb's comments. Please also add a CONTRIBUTING.md file (see https://help.github.com/articles/setting-guidelines-for-repository-contributors/).

@ctb feel free to tick that version box. We will update the final version number upon acceptance (future review issues will not have the version box any more as versions often move on).

@andrewjpage

Thanks @ctb for taking the time to review the software. I have resolved all the issues raised: added a CITATION.cff file, added back the example data file, and added a CONTRIBUTING.md and a Code of Conduct.

Thanks @Kevin-Mattheus-Moerman for editing.

@Kevin-Mattheus-Moerman (Member)

@ctb thanks again for your review work, are you happy with how @andrewjpage has responded to your comments?

ctb commented Jan 31, 2019 via email

Kevin-Mattheus-Moerman (Member) commented Jan 31, 2019

@ctb great, please tick the remaining box. Thanks for your help!

ctb commented Jan 31, 2019

done.

@Kevin-Mattheus-Moerman (Member)

@whedon generate pdf

whedon (Author) commented Jan 31, 2019

Attempting PDF compilation. Reticulating splines etc...

whedon (Author) commented Jan 31, 2019

@Kevin-Mattheus-Moerman (Member)

@whedon check references

@Kevin-Mattheus-Moerman (Member)

@whedon set 10.5281/zenodo.2561192 as archive

whedon (Author) commented Mar 1, 2019

OK. 10.5281/zenodo.2561192 is the archive.

@Kevin-Mattheus-Moerman (Member)

@whedon set v1.0.1 as version

whedon (Author) commented Mar 1, 2019

OK. v1.0.1 is the version.

@danielskatz

Thanks @Kevin-Mattheus-Moerman for editing, and @ctb & @kapsakcj & @azneto for reviewing!

@danielskatz

@whedon accept

whedon (Author) commented Mar 1, 2019

Attempting dry run of processing paper acceptance...

whedon (Author) commented Mar 1, 2019

PDF failed to compile for issue #1021 with the following error:

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed

0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 15 0 15 0 0 187 0 --:--:-- --:--:-- --:--:-- 187
sh: 0: getcwd() failed: No such file or directory
sh: 0: getcwd() failed: No such file or directory
pandoc: 10.21105.joss.01021.pdf: openBinaryFile: does not exist (No such file or directory)
Looks like we failed to compile the PDF

whedon (Author) commented Mar 1, 2019


OK DOIs

- http://doi.org/10.1038/nature22401 is OK
- http://doi.org/10.1186/s13059-015-0677-2 is OK
- http://doi.org/10.1128/jcm.02483-16 is OK
- http://doi.org/10.1038/nrg.2017.88 is OK
- http://doi.org/10.1101/gr.215087.116 is OK
- http://doi.org/10.1128/aac.02412-14 is OK

MISSING DOIs

- None

INVALID DOIs

- None

danielskatz commented Mar 1, 2019

I suspect an error in whedon's accept processing. I will check by trying to generate the PDF by itself.

@danielskatz

@whedon generate pdf

whedon (Author) commented Mar 1, 2019

Attempting PDF compilation. Reticulating splines etc...

whedon (Author) commented Mar 1, 2019

@danielskatz

👋 @arfon - please check this and see what's off in whedon's accept processing.

@danielskatz

👋 @arfon - also, the archive link on the left of the PDF itself doesn't seem to be correct.

arfon (Member) commented Mar 1, 2019

@whedon accept

whedon (Author) commented Mar 1, 2019

Attempting dry run of processing paper acceptance...

whedon (Author) commented Mar 1, 2019


OK DOIs

- http://doi.org/10.1038/nature22401 is OK
- http://doi.org/10.1186/s13059-015-0677-2 is OK
- http://doi.org/10.1128/jcm.02483-16 is OK
- http://doi.org/10.1038/nrg.2017.88 is OK
- http://doi.org/10.1101/gr.215087.116 is OK
- http://doi.org/10.1128/aac.02412-14 is OK

MISSING DOIs

- None

INVALID DOIs

- None

whedon (Author) commented Mar 1, 2019

Check final proof 👉 openjournals/joss-papers#536

If the paper PDF and Crossref deposit XML look good in openjournals/joss-papers#536, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true

@danielskatz

👋 @arfon - my issues seem to be resolved - should I go ahead and finalize the accept?
(Also, any idea what happened with the generate process as part of accept, and the faulty PDF?)

arfon (Member) commented Mar 1, 2019

👋 @arfon - my issues seem to be resolved - should I go ahead and finalize the accept

Yes, please do.

Sometimes Whedon fails to compile the PDF properly - this bug is something I've never managed to replicate.

As for the faulty PDF, @whedon generate pdf doesn't attempt to make the correct link for the archive (apologies, this is undocumented).

@danielskatz

@whedon accept deposit=true

whedon added the accepted label Mar 1, 2019

whedon (Author) commented Mar 1, 2019

Doing it live! Attempting automated processing of paper acceptance...

whedon (Author) commented Mar 1, 2019

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited 👉 Creating pull request for 10.21105.joss.01021 joss-papers#537
  2. Wait a couple of minutes to verify that the paper DOI resolves https://doi.org/10.21105/joss.01021
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? notify your editorial technical team...

whedon (Author) commented Mar 1, 2019

🎉🎉🎉 Congratulations on your paper acceptance! 🎉🎉🎉

If you would like to include a link to your paper from your README use the following code snippets:

Markdown:
[![DOI](http://joss.theoj.org/papers/10.21105/joss.01021/status.svg)](https://doi.org/10.21105/joss.01021)

HTML:
<a style="border-width:0" href="https://doi.org/10.21105/joss.01021">
  <img src="http://joss.theoj.org/papers/10.21105/joss.01021/status.svg" alt="DOI badge" >
</a>

reStructuredText:
.. image:: http://joss.theoj.org/papers/10.21105/joss.01021/status.svg
   :target: https://doi.org/10.21105/joss.01021

This is how it will look in your documentation:

[DOI badge]

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us, please consider doing either one (or both) of the following:
