finalized distances GSOC2018 blog post
orbeckst committed Nov 29, 2018
1 parent 2f35a47 commit 5142594
Showing 1 changed file with 17 additions and 5 deletions.
22 changes: 17 additions & 5 deletions _posts/2018-11-28-gsoc18-distancesearch.md
@@ -1,19 +1,21 @@
---
layout: post
title: GSOC 2018: Improvements in distance search methods
title: "GSOC 2018: Improvements in distance search methods"
---

We are pleased to announce another successful year of [Google Summer of Code][] with the [NumFOCUS][] organization,
thanks to [Richard Gowers][] and [Jonathan Barnoud][] for mentoring the GSoC students.
This year one of the projects was to improve the performance of pairwise distance computations, which is used quite frequently in MDAnalysis in different forms.
MDAnalysis v0.19.0 and higher include the new functions [`MDAnalysis.lib.distances.capped_distance`][] and [`MDAnalysis.lib.distances.self_capped_distance`][]
This year one of the projects was [to improve the performance of pairwise distance computations][], which is used quite frequently in MDAnalysis in different forms.

MDAnalysis v0.19.0 and higher now include the _new functions [`MDAnalysis.lib.distances.capped_distance`][] and [`MDAnalysis.lib.distances.self_capped_distance`][]_
which offer a much faster way to calculate all pairwise distances up to a certain maximum distance.
By only considering distances up to a certain maximum, we can use various algorithms to optimise the number of pairwise comparisons that are performed.
Behind the scenes, these functions are using one of three different algorithms:
[bruteforce][] which is a naive pairwise distance calculation algorithm,
[pkdtree][] which is a wrapper method around Scipy's KD tree search algorithm
and [nsgrid][] which is an implementation of cell-list algorithm.
This last algorithm uses the new ``MDAnalysis.lib.nsgrid`` module which was implemented with the help of [Sebastien Buchoux][].
This last algorithm uses the new [`MDAnalysis.lib.nsgrid`][] module which was implemented with the help of [Sebastien Buchoux][].

For more information on these algorithms the reader is encouraged to read @ayushsuhane's [blog], which includes a comparison of these approaches and their performance in different conditions.
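
To illustrate why capping the distance helps, here is a minimal pure-Python sketch comparing a brute-force search with a simple cell-list search. It is not the MDAnalysis implementation: the function names `brute_force_pairs` and `cell_list_pairs` are hypothetical, periodic boundaries are ignored, and the cell edge is assumed to equal the cutoff.

```python
import itertools
import math
import random
from collections import defaultdict


def brute_force_pairs(points, cutoff):
    """Check every unique pair (i < j): O(N^2) distance evaluations."""
    pairs = set()
    for i, j in itertools.combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= cutoff:
            pairs.add((i, j))
    return pairs


def cell_list_pairs(points, cutoff):
    """Bin points into cubic cells of edge `cutoff`, then compare each point
    only with points in its own cell and the 26 neighbouring cells."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(c / cutoff) for c in p)
        cells[key].append(idx)

    offsets = list(itertools.product((-1, 0, 1), repeat=3))
    pairs = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in offsets:
            # .get() avoids creating empty cells while iterating
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbours:
                    if i < j and math.dist(points[i], points[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs


# Both approaches must find the same pairs; only the amount of work differs.
random.seed(0)
points = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(200)]
assert brute_force_pairs(points, 1.5) == cell_list_pairs(points, 1.5)
```

Because two points within the cutoff can differ by at most one cell index along each axis, the cell-list search skips most distant pairs without ever computing their distance, which is where the savings for large systems come from.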


@@ -80,20 +82,30 @@ Finally, this function can be tested with ``capped_distance`` to check the perfo

## Performance improvements

As a final note, we managed to get a speed improvement of
- ~ 2-3 times in Radial Distribution Function computation,
- ~ 10 times in identification of bonds, and
- ~ 10 times in distance based selections for the already existing benchmarks in MDAnalysis.

Performance also improves with larger datasets, although this is not captured in the existing benchmarks. Motivated readers are welcome to submit feedback on how the above-mentioned functions perform on their data, and/or a benchmark, which we would be happy to showcase to the world.

This was a flavor of the work done during GSoC'18. Apart from the performance improvements, we envision that this internal functionality will relieve users of the need to understand all the technical details of distance search algorithms, letting them focus on their analysis, and will allow future developers to easily implement any new algorithm that can exceed the present performance benchmarks.

As a final note, we managed to get a speed improvement of ~ 2-3 times in Radial Distribution Function computation, ~ 10 times in identification of bonds, and ~ 10 times in distance based selections for the already existing benchmarks in MDAnalysis. The performance is also found to improve with larger datasets but is not reported in benchmarks. Any motivated reader is welcome to submit their feedbacks about the performance of the above-mentioned functions on their data, and/or a benchmark which we would be happy to showcase to the world.

[Ayush Suhane][], [Richard Gowers][]

[Google Summer of Code]: https://summerofcode.withgoogle.com/projects/#5050592943144960
[NumFOCUS]: https://numfocus.org/
[Ayush Suhane]: https://github.com/ayushsuhane
[to improve the performance of pairwise distance computations]: {% post_url 2018-04-26-gsoc-students %}#ayush-suhane-improve-distance-search-methods-in-mdanalysis
[`MDAnalysis.lib.distances.capped_distance`]: https://www.mdanalysis.org/docs/documentation_pages/lib/distances.html#MDAnalysis.lib.distances.capped_distance
[`MDAnalysis.lib.distances.self_capped_distance`]: https://www.mdanalysis.org/docs/documentation_pages/lib/distances.html#MDAnalysis.lib.distances.self_capped_distance
[report]: https://gist.github.com/ayushsuhane/fd114cda20e93b0f61a8acb6d25d3276
[bruteforce]: http://www.csl.mtu.edu/cs4321/www/Lectures/Lecture%206%20-%20Brute%20Force%20Closest%20Pair%20and%20Convex%20and%20Exhausive%20Search.htm
[pkdtree]: https://en.wikipedia.org/wiki/K-d_tree
[nsgrid]: https://en.wikipedia.org/wiki/Cell_lists
[blog]: https://ayushsuhane.github.io/
[`MDAnalysis.lib.nsgrid`]: https://www.mdanalysis.org/docs/documentation_pages/lib/nsgrid.html
[Sebastien Buchoux]: https://github.com/seb-buch
[Richard Gowers]: https://github.com/richardjgowers
[Jonathan Barnoud]: https://github.com/jbarnoud
