Uses faster distance evaluations in Individual methods in Capped Function #2041
Looks good overall, just a few questions
def search_tree(self, centers, radius):
    """
    Searches all the pairs within radius between ``centers``
    and ``coords``
Make it clear that ``coords`` are the coordinates already in this tree (right?)
    """
    Searches all the pairs within radius between ``centers``
    and ``coords``
Also, maybe add a note that internally this builds a new KDTree, so people don't think they can be clever and build one themselves to optimise.
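To illustrate the point about the internal tree, here is a minimal sketch of a pair search between query centers and stored coordinates. It uses `scipy.spatial.cKDTree` rather than the periodic tree in `lib.pkdtree`, ignores periodic boundaries, and the helper name `search_pairs` is hypothetical; it is not the code under review.

```python
import numpy as np
from scipy.spatial import cKDTree


def search_pairs(coords, centers, radius):
    """Return (i, j) pairs with |centers[i] - coords[j]| <= radius.

    Hypothetical, non-periodic helper for illustration only.
    """
    coords = np.asarray(coords, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Tree over the coordinates that are "already in the tree".
    tree = cKDTree(coords)
    # A second, temporary tree is built from the query points on every call,
    # so pre-building one by hand gains nothing for the caller.
    center_tree = cKDTree(centers)
    # For each center, the indices of all coords within the radius.
    neighbours = center_tree.query_ball_tree(tree, radius)
    pairs = [(i, j) for i, js in enumerate(neighbours) for j in js]
    return np.array(pairs, dtype=np.intp).reshape(-1, 2)
```

The first column indexes `centers` and the second indexes `coords`; the order of pairs within one center is not guaranteed.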
package/MDAnalysis/lib/distances.py
pairs, dist = [], []
k, count = 0, 0
for i in range(N):
This looks ugly. Isn't it better to make the NxM array, then fill it using np.triu indices and work from there?
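One way to read this suggestion, for the self-distance case, is sketched below: build the full distance matrix once, then select the upper triangle with `np.triu_indices` instead of looping. The function name `self_pairs_within` and its arguments are assumptions made for illustration, and periodic boundaries are ignored.

```python
import numpy as np


def self_pairs_within(coords, max_cutoff):
    """Return unique pairs (i < j) within max_cutoff and their distances.

    Hypothetical helper; brute force, no periodic boundary handling.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    # Full N x N distance matrix via broadcasting (memory grows as N**2).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Strict upper triangle: every unordered pair appears exactly once.
    i, j = np.triu_indices(n, k=1)
    mask = dist[i, j] <= max_cutoff
    pairs = np.column_stack((i[mask], j[mask]))
    return pairs, dist[i, j][mask]
```

This trades memory for the removal of the Python-level loop, which is the trade-off the PR description argues for.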
Codecov Report
@@             Coverage Diff             @@
##           develop    #2041      +/-   ##
===========================================
- Coverage     88.9%   88.89%   -0.01%
===========================================
  Files          143      143
  Lines        17386    17414      +28
  Branches      2665     2674       +9
===========================================
+ Hits         15457    15481      +24
- Misses        1321     1323       +2
- Partials       608      610       +2
…n pkdtree, added tests, updated Changlog
4607fb3
to
6678b59
Compare
Hi. Earlier, the methods were written to handle large data sets, for instance by looping over every coordinate in distance_array. That is not very Pythonic, and looping over a NumPy array is a poor choice when the main goal is performance. Now that the framework is in place, and since other methods perform better beyond a certain data size (which is easily handled by a sparse matrix), it is better to switch to populating the full MxN matrix without worrying about memory constraints.

Changes made in this Pull Request:
- return_distances in capped_function, which calculates the distances between pairs only when desired (reducing the number of computations when only the indices are needed)
- search_tree method in lib.pkdtree, which directly gives the pairs between a query and a search dataset (and is faster than searching over every index)
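The idea behind return_distances can be sketched as follows: pairs are selected by comparing squared distances against the squared cutoff, and the square root is taken only when the caller asks for distances. This is a brute-force illustration under assumed names (`capped_pairs`, `reference`, `configuration`, `max_cutoff`), without periodic boundaries, and not the implementation in this PR.

```python
import numpy as np


def capped_pairs(reference, configuration, max_cutoff, return_distances=True):
    """Return index pairs within max_cutoff, optionally with distances.

    Hypothetical helper for illustration only.
    """
    reference = np.asarray(reference, dtype=float)
    configuration = np.asarray(configuration, dtype=float)
    # Full M x N matrix of squared distances; no Python-level loop.
    diff = reference[:, None, :] - configuration[None, :, :]
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
    # Selecting pairs needs only the squared values.
    i, j = np.nonzero(sq_dist <= max_cutoff ** 2)
    pairs = np.column_stack((i, j))
    if return_distances:
        # The square root is computed only for the selected pairs.
        return pairs, np.sqrt(sq_dist[i, j])
    return pairs
```

Callers that need only the indices pass `return_distances=False` and skip the square-root step entirely.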