Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Uses faster distance evaluations in Individual methods in Capped Function #2041

Merged
merged 7 commits into from
Aug 13, 2018

Conversation

ayushsuhane
Copy link
Contributor

Hi, So earlier the methods were written to handle large data sets, for instance loop over every coordinate in distance_array, which is not very pythonic as well as looping over numpy array is also not a good idea if the main motive is to increase the performance. Now that we have the framework in place, and since other methods are better after a certain size of data (which is easily handled by sparse matrix), it is better to switch to populating the full MXN matrix without worrying about the memory constraints.

Changes made in this Pull Request:

  • Modified _bruteforce_capped, _pkdtree_capped, _bruteforce_capped_self for faster distance evaluations
  • Added keyword return_distances in capped_function which calculates distance between pairs only when it is desired (therefore decreasing the number of computations where we are concerned only with indices)
  • Added search_tree method in lib.pkdtree which directly gives the pairs between a query and search dataset (and is faster than searching over every indices)

PR Checklist

  • [x ] Tests?
  • [x ] Docs?
  • [x ] CHANGELOG updated?
  • Issue raised/referenced?

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just a few questions

def search_tree(self, centers, radius):
"""
Searches all the pairs within radius between ``centers``
and ``coords``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make it clear that coords are the coordinates already in this tree (right?)

"""
Searches all the pairs within radius between ``centers``
and ``coords``

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe add a note that internally this is building a new kdtree, so people don't think they can be clever and do that themselves to optimise


pairs, dist = [], []
k, count = 0, 0
for i in range(N):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks ugly. Isn't it better to make the NxM array then fill it using np.triu indices and work from there?

@codecov
Copy link

codecov bot commented Aug 10, 2018

Codecov Report

Merging #2041 into develop will decrease coverage by <.01%.
The diff coverage is 95%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2041      +/-   ##
===========================================
- Coverage     88.9%   88.89%   -0.01%     
===========================================
  Files          143      143              
  Lines        17386    17414      +28     
  Branches      2665     2674       +9     
===========================================
+ Hits         15457    15481      +24     
- Misses        1321     1323       +2     
- Partials       608      610       +2
Impacted Files Coverage Δ
package/MDAnalysis/lib/distances.py 88.11% <100%> (+0.22%) ⬆️
package/MDAnalysis/lib/pkdtree.py 91.2% <80.95%> (-3.16%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 388346d...f0f0043. Read the comment docs.

@richardjgowers richardjgowers merged commit d2e22ff into MDAnalysis:develop Aug 13, 2018
@ayushsuhane ayushsuhane deleted the modify-capped branch August 14, 2018 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants