Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PRE REVIEW]: ELECTRIC: Electric fields Leveraged from multipoleExpansion Calculations in Tinker Rapid Interface Code #2429

Closed
whedon opened this issue Jul 2, 2020 · 38 comments

Comments

@whedon
Copy link

whedon commented Jul 2, 2020

Submitting author: @valeriewelborn (Valerie Vaissier)
Repository: https://github.com/WelbornGroup/ELECTRIC
Version: v1.0.0
Editor: @richardjgowers
Reviewers: @amandadumi, @govarguz
Managing EiC: Daniel S. Katz

⚠️ JOSS reduced service mode ⚠️

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Author instructions

Thanks for submitting your paper to JOSS @valeriewelborn. Currently, there isn't an JOSS editor assigned to your paper.

@valeriewelborn if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, this list of people have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type:

@whedon commands
@whedon
Copy link
Author

whedon commented Jul 2, 2020

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks.

⚠️ JOSS reduced service mode ⚠️

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

@whedon
Copy link
Author

whedon commented Jul 2, 2020

Software report (experimental):

github.com/AlDanial/cloc v 1.84  T=0.34 s (800.4 files/s, 104413.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                           103            639            903          12948
TeX                             30           1245             81           4588
C                                7            506           1137           2488
Python                          31            599            539           1949
CSS                              3            335             80           1621
JavaScript                      45            101            143           1350
Fortran 90                       6            296            123            966
C/C++ Header                    11             96            117            657
CMake                           19            155             93            454
YAML                             4             30             28            243
Markdown                         4            108              0            233
C++                              4             59             33            202
Bourne Shell                     2             14             19             23
make                             1              5              0             18
-------------------------------------------------------------------------------
SUM:                           270           4188           3296          27740
-------------------------------------------------------------------------------


Statistical information for the repository '2429' was gathered on 2020/07/02.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Jessica A. Nash                  1             1              1            0.00
Jessica Nash                    27           890            432            1.93
Taylor Barnes                   26         28554          18948           69.24
taylor-a-barnes                  2             9              6            0.02
valeriewelborn                   1          9881           9881           28.81

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Jessica Nash                502           56.4          3.4               14.74
Taylor Barnes              9565           33.5          4.9               22.30

@whedon
Copy link
Author

whedon commented Jul 2, 2020

Reference check summary:

OK DOIs

- 10.1146/annurev-biophys-070317-033349 is OK
- 10.1021/ct4003702 is OK
- 10.1021/acs.jctc.8b00529 is OK
- 10.1021/acs.jctc.7b01169 is OK
- 10.1021/jacs.6b12265 is OK
- 10.1021/acscatal.7b03151 is OK
- 10.1038/s41929-018-0109-2 is OK
- 10.5281/zenodo.3659285 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@whedon
Copy link
Author

whedon commented Jul 2, 2020

@danielskatz
Copy link

👋 @jgostick, @richardjgowers, @dhhagan - Would one of you be willing to edit this submission?

@jgostick
Copy link

jgostick commented Jul 3, 2020

Sorry, I have 2 on the go and still haven't figured it all out yet.

@danielskatz
Copy link

👋 @richardjgowers, @dhhagan - Would one of you be willing to edit this submission?

@richardjgowers
Copy link

@whedon assign @richardjgowers as editor

@whedon
Copy link
Author

whedon commented Jul 19, 2020

OK, the editor is @richardjgowers

@richardjgowers
Copy link

@valeriewelborn how modified is the patched version of TINKER (https://github.com/WelbornGroup/Tinker_ELECTRIC) from the original? Also, I'm not sure but is the rehosting of modified source code allowed according to the license? (in particular points 5&6 of https://dasher.wustl.edu/tinker/downloads/license.pdf)

@danielskatz
Copy link

@whedon query scope

@whedon
Copy link
Author

whedon commented Jul 19, 2020

Submission flagged for editorial review.

@whedon whedon added the query-scope Submissions of uncertain scope for JOSS label Jul 19, 2020
@janash
Copy link

janash commented Jul 20, 2020

Hi @richardjgowers - The Tinker software has switched in recent years to be distributed via GitHub (https://github.com/TinkerTools). The modified Tinker code included here is part of a larger project at The Molecular Sciences Software Institute called the MolSSI Driver Interface (MDI). You can see the MDI (which has the same code as Tinker-ELECTRIC) here - https://github.com/MolSSI-MDI/Tinker/tree/mdi-ef . This repo has a commit history compatible with the main Tinker repository. We can change the link in the README for this repository to this fork if that is helpful.

We have had discussions with the Tinker core developers about the Tinker-MDI project and plan to eventually have this code incorporated into the main Tinker repo. We can contact the Tinker developers to ok this repo if necessary.

As far as changes to Tinker, the main changes are the addition of the MDI library (in folder mdi in Tinker-ELECTRIC). We also have modified some of the Tinker source code. We have added the file mdiserv.f to enable movement of information out of Tinker. The Tinker software calculates the electric field for each atom in order to calculate dipoles for atoms. This is usually internal and not saved. We have added the file efield.f and modified induce.f in order to capture this information. Currently, Tinker is the only software which calculates the electric field in this way. However, the software submitted here (the ELECTRIC driver), may be used eventually with other codes that can output electric field information.

Tagging @taylor-a-barnes in case I've forgotten anything.

@janash
Copy link

janash commented Aug 2, 2020

Hi @danielskatz , @richardjgowers - tagging to get the conversation going again. Is there anything you need from us to help speed up this process?

@danielskatz
Copy link

@janash - sorry for the delay - after some editor discussion, we have decided that this submission is not a large enough contribution for JOSS to review it - see https://joss.readthedocs.io/en/latest/submitting.html#substantial-scholarly-effort. We encourage you to consider JOSS at a later time if more effort and code goes into this software, or alternatively, consider publishing the software to Zenodo to get a DOI that users can cite.

@danielskatz
Copy link

@whedon reject

@whedon whedon added the rejected label Aug 3, 2020
@whedon
Copy link
Author

whedon commented Aug 3, 2020

Paper rejected.

@whedon whedon closed this as completed Aug 3, 2020
@janash
Copy link

janash commented Aug 3, 2020

Hi @danielskatz and @richardjgowers - Thank you for your consideration on this paper. In your requirements for publication, I see that you have guidelines about code length. I understand that in general a smaller number of lines of code may represent less effort. However, for this project, a small(er) number of lines indicates careful consideration and design (or effort).

I disagree that the number of lines of code for a project is reflective of the amount of scholarly effort. This project could have been much larger (more lines of code) if it did not use an efficient and systematic approach to the problem. We implemented this is a clean and compact way, and in my opinion that should not negatively reflect on the software.

This is a useful analysis tool for researchers working in molecular simulation, and enables an analysis ability that is not possible in any other existing open source software. The closed source version of the code is not openly available, actively maintained, updated, or tested.

This project, despite its appearance of small size, represents a significant effort and is not something that could not be easily replicated. This project required the use of multiple tools, programming languages, and knowledge of scientific concepts. I am outlining notable points below, though there are several other concepts and tools at play:

  1. Understanding the structure of the Tinker software package (written in Fortran). We had to decide what information was relevant, where it was stored, and how to retrieve it. The Tinker source code was modified so that information that was previously discarded was recorded and made available in a useful way.
  2. Modifying the Tinker software package to use the MolSSI Driver Interface (MDI) which allowed efficient communication of data and structures inside of the Tinker software to be available.
  3. Efficient analysis of retrieved data using pandas and parallelization of the analysis using MPI4py.

This tool adds the ability to customize that analysis by atom, molecule (as counted by Tinker) or residue (as specified in a pdb file) without the user editing code. Our implementation is robust, public, and much faster than the previous closed package which was used for analysis. Our code is also parallelized with very high efficiency, so the achievable wall times are dramatically lower than what you could get from any serial code. Our benchmarks of our parallelized code showed an order of magnitude improvement over the serial version.

In summary, we believe that this project represents significant effort appropriate for JOSS. Again, thank you for your consideration, and we hope for productive future collaboration.

@valeriewelborn
Copy link

Hi all,

I would like to second Jessica's comments and I would also argue that it follows the rule of thumb "no less than 3 months of work for an individual" (which translates to different real time depending on the skills of the coders - a good graduate student would have taken at least 6 months to produce a working code but likely not of the same quality). I also want to emphasize, as Jessica mentioned, that the calculation of electric fields from Tinker Amoeba calculations is not trivial and currently not available to the community. This would be the first time such a tool would be made open source. Using electric fields to analyze MD trajectories would benefit the community tremendously as I have demonstrated in Nature Catalysis (https://doi.org/10.1038/s41929-018-0109-2), JACS (https://doi.org/10.1021/jacs.9b05323) and ACS Catalysis (https://doi.org/10.1021/acscatal.7b03151). I also explained the underlying principles in a Chemical Reviews (https://doi.org/10.1021/acs.chemrev.8b00399).

@danielskatz
Copy link

Let's make sure we are talking about the same thing. In this repository, I see

What did you intend to submit for review?

@janash
Copy link

janash commented Aug 6, 2020

The ELECTRIC repository is what we intended to submit for review.

https://github.com/WelbornGroup/ELECTRIC/tree/master/ELECTRIC/mdi is a copy of https://github.com/MolSSI-MDI/MDI_Library. We do not intend for this to be considered in the review.

The parts of the repository specific to the ELECTRIC driver in this directory (https://github.com/WelbornGroup/ELECTRIC/tree/master/ELECTRIC), excluding MDI.

This analysis requires use of MDI-enabled Tinker (Tinker-ELECTRIC), linked in the submitted repository. You can see our version with commit history here - https://github.com/MolSSI-MDI/Tinker/tree/mdi-ef . This modification to Tinker was a large part of the work for this project. Once we were able to get the information we needed via this MDI enabled version of Tinker, the driver itself (in the ELECTRIC repository) was not too much trouble to write. We did work to make the implementation more convenient (via use of argparse, and ability to specify molecule, residue, or atom analysis) and efficient (parallelization using MPI4Py).

Concerning mentioning 'design' and LOC in my earlier comment - When we started this project, we knew in general what we wanted to calculate (the electric field), and that electric fields are calculated (in some cases) in Tinker in order to calculate multipoles. What was less clear was at what stage we retrieve information from Tinker or what calculations we should do ourselves. One option would have been to get multipole information and coordinates and do the mathematically intensive calculation outside of the software in our own code. This would have resulted in a much more complicated piece of software (probably less efficient, more prone to error, more lines of code, and yes, much more work). However, there is an existing subroutine in Tinker that does calculate the electric field as we needed it to be calculated, but it is not visited in a typical molecular dynamics run. We decided to send coordinates from a trajectory to Tinker using MDI and to force the calculation to go through this subroutine. The information we needed to retrieve was still not readily available to be passed out of Tinker using MDI as it is discarded (the electric field is typically only used as a means to calculate multipoles and not used for analysis, and is not calculated on a per atom basis under typical conditions). So, we also had to alter the Tinker source code to visit the relevant subroutine when we needed, retrieve, and pass out this information.

At this point we also had to choice to either make this into something that ran alongside the MD run or a post processing method. We chose to make this a post processing methods so that we could parallelize this extra calculation.

Once we could retrieve the electric field, it was a matter of summing the information correctly in the driver (so I understand how ELECTRIC.py doesn't look like much). Considering the size of trajectories and amount of information we retrieve from Tinker, we also made some choices in ELECTRIC.py. We read in trajectories in chunks by frame since some trajectories can be quite large. Since this is a post processing method and the trajectory frames are already generated, calculation of the electric field for each frame is independent (making it 'embarrassingly parallel'), so we have used MPI4Py to parallelize. This is, of course, much faster than doing the calculation in tandem with the simulation. The use of argparse was also strategic as these type of calculations will typically be submitted to a queuing system in a script similar to tcp.sh (included in the repository).

Just wanted to explain some of the design choices and considerations we went through during implementation that are not reflected in the LOC of the final product. None of the choices we made were clear initially. What we ended up with is a relatively small piece of code which runs quickly and delivers novel analysis for researchers in this field. As @valeriewelborn mentioned, this type of analysis has already been shown to be useful in a number of peer reviewed publications, and was previously not available to most researchers. We felt this was notable and worthy of a JOSS submission. Thanks again for your time and consideration.

@taylor-a-barnes
Copy link

@danielskatz Thank you very much for the time you’ve devoted to clarifying the nature of this submission. As the above discussion suggests, the boundaries defining exactly what constitutes this submission are not entirely trivial. To be clear, any calculation involving the ELECTRIC code involves participation of the following components:

  1. The ELECTRIC code, which we originally submitted for review.
  2. A version of the Tinker code which was modified as part of the ELECTRIC project (https://github.com/MolSSI-MDI/Tinker/tree/mdi-ef); we intend to create a PR for inclusion of these changes into the official Tinker repository. Out of the (perhaps misguided) expectation that including those modifications in the review would complicate the process, we did not originally intend for these contributions to be included in the review; however, if doing so would be useful for clarifying the scope of the project, we would be willing to have the Tinker contributions reviewed as well.
  3. The MDI Library (https://github.com/MolSSI-MDI/MDI_Library), which we use to handle efficient communication between ELECTRIC and Tinker. An internal distribution of the MDI Library is included in the ELECTRIC code, but it should not be considered as part of this submission.

I can understand that the submission, as originally presented, might not appear to fulfill the JOSS requirements for “substantial scholarly effort” (although, as @janash points out, the scholarly effort associated with the ELECTRIC code is much larger than the LOC count might imply). Nonetheless, I believe that if the ELECTRIC code and the Tinker contributions are considered in combination, we unambiguously meet the JOSS standard. Our contributions to Tinker easily exceed 1000 lines (https://github.com/MolSSI-MDI/Tinker/blob/mdi-ef/source/mdiserv.f alone is 921 lines), and both @janash and I have made meaningful contributions to the project since January, 2020. Neither the ELECTRIC code nor the Tinker contributions could be considered “one-off” modifications; in particular, the Tinker contributions have been made with the intention of supporting general communication with external codes and are not limited to supporting ELECTRIC.

The primary questions I have are as follows:

  1. Is it possible for our Tinker contributions to be considered in addition to ELECTRIC in a JOSS review? If so, are there any additional requirements we should be aware of? For example, would the Tinker contributions need to be presented in the form of a PR?
  2. Would you consider a submission that included both ELECTRIC and our Tinker contributions to be sufficient to meet the “substantial scholarly effort” requirement?

@amandadumi
Copy link

Hello @richardjgowers. Yes, this looks very interesting and well suited for my background!

@govarguz
Copy link

Hi @richardjgowers. Yes, I will be available to review this Ms./implementation. Best,

@richardjgowers
Copy link

@whedon add @amandadumi as reviewer

@whedon
Copy link
Author

whedon commented Aug 15, 2020

OK, @amandadumi is now a reviewer

@richardjgowers
Copy link

@whedon add @govaguz as reviewer

@whedon
Copy link
Author

whedon commented Aug 15, 2020

OK, @govaguz is now a reviewer

@richardjgowers
Copy link

@whedon remove @govaguz as reviewer

@whedon
Copy link
Author

whedon commented Aug 15, 2020

OK, @govaguz is no longer a reviewer

@richardjgowers
Copy link

@whedon add @govarguz as reviewer

@whedon
Copy link
Author

whedon commented Aug 15, 2020

OK, @govarguz is now a reviewer

@richardjgowers
Copy link

@whedon start review

@whedon
Copy link
Author

whedon commented Aug 15, 2020

OK, I've started the review over in #2576.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants