GSoC Blog Post: Improved Distance search methods #90

Merged: 13 commits merged Nov 29, 2018


ayushsuhane (Contributor)

Resolves #89

This is the initial draft of the blog post. I have tried to motivate why the functions implemented during GSoC were needed and to provide a template for extending their use. For all other information, I have directed the reader to the blog posts written during GSoC. We can change the draft once I receive feedback from the core developers.

@richardjgowers self-assigned this Sep 7, 2018
@richardjgowers (Member)

Hey, sorry @ayushsuhane, just saw this. I'll review it soon!

@orbeckst (Member) left a comment

@ayushsuhane sorry for the long delay. This is already looking very nice and I'd be happy to get this up soon, before 0.19.0 comes out. Please see my inline comments for some minor revisions.


One of the major bottlenecks in many analysis routines in MDAnalysis (and in molecular dynamics studies generally) is the evaluation of pairwise distances between particles. At its core, this is a fixed-radius neighbor search problem: for each particle, find all other particles within a given cutoff distance. MDAnalysis offers a suite of algorithms for such problems, including a brute-force method and tree-based binary search. While these methods suit a variety of analysis functions that use pairwise distances, an open question was whether other established neighbor search methods could improve the performance of distance calculations.
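To make the fixed-radius problem concrete, here is a minimal pure-Python sketch that solves it by brute force: every reference/configuration pair is checked, so the cost grows as N*M. It ignores periodic boundaries and is only an illustration, not the compiled MDAnalysis implementation; the name `brute_force_pairs` is hypothetical.

```python
import math


def brute_force_pairs(reference, configuration, max_cutoff):
    """Return (i, j, distance) for every reference[i] / configuration[j] pair
    no farther apart than max_cutoff, by checking all N*M pairs.

    Hypothetical illustration only: no periodic boundary conditions.
    """
    pairs = []
    for i, a in enumerate(reference):
        for j, b in enumerate(configuration):
            d = math.dist(a, b)  # Euclidean distance (Python 3.8+)
            if d <= max_cutoff:
                pairs.append((i, j, d))
    return pairs


ref = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
conf = [(1.0, 0.0, 0.0), (5.0, 5.0, 6.5), (9.0, 9.0, 9.0)]
print(brute_force_pairs(ref, conf, 2.0))  # [(0, 0, 1.0), (1, 1, 1.5)]
```

The quadratic cost of this loop is what cell lists and tree-based searches aim to avoid for large systems.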

This question led to the inception of a Google Summer of Code [project][] with [NumFOCUS][]. [Ayush Suhane][] completed the project and demonstrated performance improvements for specific cases of distance selections, identification of bonds, and the radial distribution function in the analysis module of MDAnalysis. More details on the commit history, PRs and blog posts can be found in the final [report][] submitted to GSoC. Real-time benchmarks for specific modules in MDAnalysis can be found [here](https://www.mdanalysis.org/benchmarks/).
Member

@ayushsuhane Could you add a summary of the scientific results from the report, basically what did you find out? A key graph would be great here, e.g. something that shows how the different algorithms scale with problem size. What is the most important result from your work? This should be here.

Member

Basically, you want to lead your readers to the conclusion that capped_distance is a valuable contribution.



The major highlight of the project is the introduction of ``capped_distance``, which automatically selects a method, based on a predefined set of rules, to find the pairs of atoms within a cutoff distance of each particle. It also gives developers a user-friendly interface for quickly making any new algorithm available throughout the MDAnalysis modules. To test a new algorithm, one must follow this protocol:
Member

link to the docs – use the devdocs https://www.mdanalysis.org/mdanalysis/documentation_pages/lib/distances.html#MDAnalysis.lib.distances.capped_distance

The major highlight of the project is the introduction of capped_distance(), which allows automatic selection ...

Member

First, briefly describe the methods that are available and how they are chosen, and what their performances are. That's more interesting to the user than how to add new ones.

That's it. The new method is ready to be tested across all functions that use ``capped_distance``. For any specific application, it can be called from within a function as ``capped_distance(ref, conf, max_dist, method=newmethod)``.
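As a rough illustration of this plug-in design, a front end with a `method=` keyword can dispatch to interchangeable search functions that share one signature, so a new algorithm only needs to be registered once to become available wherever the front end is used. Everything here (`capped_pairs`, `register_method`, `METHODS`, `_sweep_x`, the placeholder selection rule) is hypothetical and is not MDAnalysis's internal code.

```python
import bisect
import math


def _bruteforce(reference, configuration, max_cutoff):
    # Check every pair; simple and adequate for small systems.
    return [(i, j) for i, a in enumerate(reference)
            for j, b in enumerate(configuration)
            if math.dist(a, b) <= max_cutoff]


# Hypothetical registry: method name -> search function with a common signature.
METHODS = {"bruteforce": _bruteforce}


def register_method(name, func):
    """Make a new search function selectable through the method= keyword."""
    METHODS[name] = func


def _choose_method(reference, configuration, max_cutoff):
    # Placeholder rule; the real selection rules in MDAnalysis are different.
    return "bruteforce"


def capped_pairs(reference, configuration, max_cutoff, method=None):
    """Hypothetical stand-in for a capped_distance-style front end."""
    if method is None:
        method = _choose_method(reference, configuration, max_cutoff)
    return METHODS[method](reference, configuration, max_cutoff)


def _sweep_x(reference, configuration, max_cutoff):
    # Example "new method": sort by x, then only test candidates whose
    # x coordinate lies within max_cutoff of the reference point's x.
    order = sorted(range(len(configuration)), key=lambda j: configuration[j][0])
    xs = [configuration[j][0] for j in order]
    pairs = []
    for i, a in enumerate(reference):
        lo = bisect.bisect_left(xs, a[0] - max_cutoff)
        hi = bisect.bisect_right(xs, a[0] + max_cutoff)
        for k in range(lo, hi):
            j = order[k]
            if math.dist(a, configuration[j]) <= max_cutoff:
                pairs.append((i, j))
    return sorted(pairs)


register_method("sweep_x", _sweep_x)
ref = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
conf = [(1.0, 0.0, 0.0), (5.0, 5.0, 6.5), (9.0, 9.0, 9.0)]
print(capped_pairs(ref, conf, 2.0, method="sweep_x"))
```

Because every method returns the same sorted `(i, j)` pairs, a newly registered method can be checked against the existing ones on the same inputs.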

As mentioned above, MDAnalysis supports three different algorithms: [bruteforce][], a naive pairwise distance calculation that MDAnalysis also implements for parallel execution; [pkdtree][], a wrapper around a binary tree search algorithm; and [nsgrid][], an implementation of the cell-list algorithm. The ``nsgrid`` method was added to MDAnalysis during GSoC'18 with the help of [Sebastien Buchoux][]. For more information, readers are encouraged to read the [blog], which includes detailed information about the different algorithms and their implementation.
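The cell-list idea behind a grid search can be sketched in a few lines: bin points into cubic cells whose edge equals the cutoff, so any neighbor of a point must lie in its own cell or one of the 26 adjacent cells. This is a minimal, non-periodic pure-Python sketch with a hypothetical name (`cell_list_pairs`), not the ``nsgrid`` implementation itself.

```python
import math
import random
from collections import defaultdict
from itertools import product


def cell_list_pairs(reference, configuration, max_cutoff):
    """Fixed-radius search with a cell list (no periodic boundaries).

    Cells have edge length max_cutoff, so two points within max_cutoff of
    each other have cell indices differing by at most 1 along every axis.
    """
    def cell_of(p):
        return tuple(math.floor(c / max_cutoff) for c in p)

    # Bin configuration points by cell index.
    cells = defaultdict(list)
    for j, b in enumerate(configuration):
        cells[cell_of(b)].append(j)

    pairs = []
    for i, a in enumerate(reference):
        cx, cy, cz = cell_of(a)
        # Visit the 27 cells around (and including) the point's own cell.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                if math.dist(a, configuration[j]) <= max_cutoff:
                    pairs.append((i, j))
    return sorted(pairs)


random.seed(0)
points = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(100)]
neighbors = cell_list_pairs(points, points, 1.5)
```

Each point only inspects nearby cells, so for roughly uniform densities the work scales close to linearly with the number of particles instead of quadratically.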
Member

move it up, summarize key results, then link to details



While implementing any new algorithm for molecular dynamics trajectories, one additional requirement is handling periodic boundary conditions (PBC). The versatile functions ``augment_coordinates`` and ``undo_augment`` can be combined with any algorithm to handle PBC. The main idea is to extend the box by using ``augment_coordinates`` to generate duplicate particles near the box boundaries. These duplicates, together with the original particles, can then be passed to any algorithm to find the nearest neighbors. Afterwards, ``undo_augment`` maps the duplicate particles back to their original particle indices. These functions are available in ``MDAnalysis.lib._augment``. We encourage interested readers to try different algorithms with these functions; hopefully, your feedback can help us improve performance further. As a starting point, the skeleton to enable PBC would take the following form:
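A minimal sketch of the augment/undo idea for an orthorhombic box, assuming coordinates already wrapped into the box and a cutoff smaller than half of each box length. The function names and signatures (`augment_orthorhombic`, `undo_augment_indices`, `periodic_pairs`) are hypothetical and do not reproduce those in ``MDAnalysis.lib._augment``; the search step is a plain brute-force loop standing in for any non-periodic algorithm.

```python
import math
from itertools import product


def augment_orthorhombic(coords, box, cutoff):
    """Create shifted images of particles within `cutoff` of a box face.

    Returns (images, owner), where owner[k] is the original index of images[k].
    Assumes 0 <= x < L on each axis and cutoff < L / 2.
    """
    images, owner = [], []
    for idx, p in enumerate(coords):
        # Per axis: shift by +L if near the lower face, by -L if near the upper face.
        options = []
        for x, L in zip(p, box):
            shifts = [0.0]
            if x < cutoff:
                shifts.append(L)
            if x > L - cutoff:
                shifts.append(-L)
            options.append(shifts)
        # All combinations cover faces, edges and corners of the box.
        for shift in product(*options):
            if any(shift):  # the all-zero shift is the original particle itself
                images.append(tuple(x + s for x, s in zip(p, shift)))
                owner.append(idx)
    return images, owner


def undo_augment_indices(indices, owner, n_original):
    """Map indices into (originals + images) back to original particle indices."""
    return [i if i < n_original else owner[i - n_original] for i in indices]


def periodic_pairs(coords, box, cutoff):
    """Self-search under PBC: augment, run a plain search, then undo."""
    images, owner = augment_orthorhombic(coords, box, cutoff)
    extended = list(coords) + images
    n = len(coords)
    pairs = set()
    for i, a in enumerate(coords):
        for j, b in enumerate(extended):
            if math.dist(a, b) <= cutoff:
                jj = undo_augment_indices([j], owner, n)[0]
                if i < jj:  # keep each unordered pair once, skip self-pairs
                    pairs.add((i, jj))
    return sorted(pairs)


coords = [(0.5, 5.0, 5.0), (9.5, 5.0, 5.0), (5.0, 5.0, 5.0)]
box = (10.0, 10.0, 10.0)
print(periodic_pairs(coords, box, 2.0))  # [(0, 1)]: neighbors across the x face
```

Only particles close to a face are duplicated, so the extra work grows with the box surface rather than its volume, and the neighbor search itself never needs to know about periodicity.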
Member

Again, first summarize what's there, only then talk about extending.


This was a flavor of the work done during GSoC'18. Beyond the performance improvements, we envision that this internal functionality will relieve users of the burden of understanding all the technical details of distance search algorithms, letting them focus on their analysis, and will let future developers easily implement new algorithms that may exceed the present performance benchmarks.

As a final note, we achieved speedups of roughly 2-3 times in radial distribution function computation, ~10 times in the identification of bonds, and ~10 times in distance-based selections on the existing MDAnalysis benchmarks. Performance was also found to improve with larger datasets, although this is not reported in the benchmarks. Motivated readers are welcome to submit feedback about the performance of the above-mentioned functions on their own data, and/or a benchmark, which we would be happy to showcase to the world.
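A cutoff-based search suits RDF computation because a histogram of g(r) up to r_max only ever uses pairs closer than r_max. A minimal sketch, assuming a single species in a cubic box and that the unordered pairs `(i, j, d)` with `d < r_max` are already available from some fixed-radius neighbor search; `rdf_from_pairs` is hypothetical and is not the ``InterRDF`` implementation.

```python
import math


def rdf_from_pairs(pairs, n_atoms, box_length, r_max, n_bins):
    """g(r) for one species in a cubic box of volume box_length**3.

    pairs: iterable of unordered (i, j, d) with d < r_max; pairs farther
    apart than r_max never contribute, so a capped search is sufficient.
    Returns (bin_centers, g).
    """
    dr = r_max / n_bins
    counts = [0] * n_bins
    for _, _, d in pairs:
        b = int(d / dr)
        if b < n_bins:
            counts[b] += 1
    volume = box_length ** 3
    centers, g = [], []
    for k, c in enumerate(counts):
        r_lo, r_hi = k * dr, (k + 1) * dr
        shell = 4.0 / 3.0 * math.pi * (r_hi ** 3 - r_lo ** 3)
        # Expected number of unordered pairs in this shell for an ideal gas.
        ideal = n_atoms * (n_atoms - 1) / 2 * shell / volume
        centers.append(0.5 * (r_lo + r_hi))
        g.append(c / ideal)
    return centers, g


centers, g = rdf_from_pairs([(0, 1, 1.5)], n_atoms=2, box_length=10.0, r_max=2.0, n_bins=2)
```

The normalization only needs the particle count and box volume, so the expensive part is the pair search, which is where a faster neighbor search pays off.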
Member

link to the InterRDF module and function

@orbeckst (Member) commented Oct 8, 2018

@richardjgowers can I please ask you to shepherd the write-up to completion? Thanks!

@orbeckst mentioned this pull request Oct 9, 2018
@orbeckst (Member)

@ayushsuhane could you please try to find a little bit of time to finish the post here? It is almost there and we need it to get the 0.19.0 post out.

Have a look at the comments. You can address them by either doing what is suggested or adding a comment why you think that this is not necessary/not useful.

Many thanks!

@orbeckst (Member) commented Nov 7, 2018

ping @ayushsuhane --- could you please spare half an hour of your time to finish the post? We really want to get a release announcement out and your post should precede it.

Thanks!

@ayushsuhane (Contributor, Author)

Hi @orbeckst, message received. Sorry for the delay. When are you planning to announce the release? I am away for a few more days. However, I will try to finish it ASAP.

@orbeckst (Member) commented Nov 9, 2018 via email

richardjgowers and others added 2 commits November 16, 2018 10:53
- added sub-headings
- added links
- minor text changes

@orbeckst (Member) left a comment

I made some smallish changes.

Whatever I said before is still valid and the blog post could definitely be improved but this has been sitting here for so long that I'd rather publish it as is than wait longer.

@ayushsuhane if you want to do something then do it in the next two days.

_posts/2018-11-16-gsoc18-distancesearch.md (outdated, resolved)
@orbeckst (Member)

I am finalizing and updating the date to today and will post it in a few minutes.

@orbeckst merged commit 8c02a3f into MDAnalysis:master Nov 29, 2018
4 participants