Using Capped distance in distance selections #2035

ayushsuhane · 2018-08-09T03:59:44Z

Fixes #974 , part of #782

Increases the performance of 'around' selection using NSgrid.

PR Checklist

Tests?
Docs?
[X ] CHANGELOG updated?
Issue raised/referenced?

ayushsuhane · 2018-08-09T04:05:44Z

As benchmark for 'around' selection is already present, I compared the three methods (exactly same functions) for that specific case. It seems that the timings have improved by an order of magnitude.
https://github.com/ayushsuhane/Benchmarks_Distance/blob/master/Notebooks/AroundBenchmark.ipynb
However, bear in mind, the improvement is for this case and not for every case.

@orbeckst mentioned adding benchmarks. While this will be reflected if we run the benchmarks, should I add more cases? Maybe PBC/nonPBC as in here (Issue #1721 ). But due to lack of time, I guess I will pick it up after GSOC period is over.

ayushsuhane · 2018-08-09T04:10:23Z

@richardjgowers @jbarnoud I am not sure how to add another method here mostly because of flags. Any suggestions to get around the problem without breaking the tests (and more specifically code). I thought its better to discuss here, how do we want to proceed. Do we want to keep earlier code and optimize the methods or do we want to replace with one or two using flags/keywords. As we are on the topic, I don't think using capped function for optimization (in its current form) is suitable here, because it does extra computations of pairs/distances which we don't need in this case.

codecov · 2018-08-09T04:57:36Z

Codecov Report

Merging #2035 into develop will decrease coverage by 0.03%.
The diff coverage is 92.3%.

@@             Coverage Diff             @@
##           develop    #2035      +/-   ##
===========================================
- Coverage    88.93%   88.89%   -0.04%     
===========================================
  Files          144      144              
  Lines        17490    17449      -41     
  Branches      2693     2696       +3     
===========================================
- Hits         15554    15512      -42     
+ Misses        1323     1322       -1     
- Partials       613      615       +2

Impacted Files	Coverage Δ
package/MDAnalysis/core/selection.py	`99.31% <92.3%> (-0.05%)`	⬇️
package/MDAnalysis/lib/pkdtree.py	`90.1% <0%> (-1.1%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d2e22ff...ff1605c. Read the comment docs.

orbeckst · 2018-08-09T10:43:31Z

This is looking promising already.

We want to get rid of the flags #782 so there might not remain a way for the user to choose a method. Bearing this in mind, would you recommend to switch over to NSGrid (and take the performance hit in some cases)?

orbeckst · 2018-08-09T10:45:04Z

benchmarks: While this will be reflected if we run the benchmarks, should I add more cases? Maybe PBC/nonPBC as in here (Issue #1721 ).

PBC/non-PBC would be excellent.

richardjgowers · 2018-08-09T16:28:27Z

@ayushsuhane I think ideally we'd reuse capped_distance_array, then we could simplify this greatly. The only reason I can think not to is that it calculates indices and distances, so maybe could be faster. Is this why you didn't use that?

richardjgowers · 2018-08-09T16:29:32Z

Ah just reread, you could add a return_distances kwargs (default True) to capped_distance and then it should be identical to this right? It doesn't make sense to have so much duplicated code.

ayushsuhane · 2018-08-09T17:55:46Z

@richardjgowers Yes. While this is the case with Periodic KDTree and distance array, it would not be a problem if we are using only capped_distance(.., .., method='nsgrid') as it returns all the distances and pairs, which serves the purpose here.

However, yes, adding return distance to kwargs is a good idea. I will look into it

ayushsuhane · 2018-08-09T18:05:05Z

@orbeckst For this special case of around selections, it wouldn't hurt to take a hit and replacing with nsgrid. But I think a better approach would be to use capped_distance and optimize there. So that if anyone plan to change/modify the method later, one has to change only the capped_function, and not worry about its dependencies.

That being said, there are certain drawbacks in the capped_distance, which I wrote with different assumptions and needs to be changed. For instance, _bruteforce_capped checks distance with every atoms individually in a loop, which is a bad idea for a numpy array but it can handle arbitrary size of atoms. Now that we wouldn't want to use brute force for arbitrary size and limit it only till its the fastest, it would be better to change it. Similarly for _pkdtree_capped where we have used for loops to get individual distances and pairs, which can/should be avoided.

richardjgowers

@ayushsuhane Ok I think this looks like a case where you need to eat your own dog food. You've merged in some awesome changes in lib.distances and now you're rewriting/rephrasing them in this code. AroundSelection should be blissfully unaware of how distances are calculated, it should just call some function which does it.

Maybe this leads to a few more optimisations in capped later (ie make the distances optionally returned), but this is the thing that all the capped stuff was written for.

I'm pretty sure once we go through selections and start using the capped functions there'll be a lot less lines, but it will be a lot faster too!

ayushsuhane · 2018-08-10T02:25:10Z

@richardjgowers One more thing. It seems importing any library inside the function, for instance from . import PeriodicKDTree inside _pkdtree_capped takes a large amount of time which is reflected baldly in benchmarks. Any suggestions how can we avoid it. Also, I guess its a bad practice to use local imports, isn't it? I thought we just did it for the time being, but if we are looking for optimizations, it shouldn't really be there

ayushsuhane · 2018-08-10T09:56:02Z

It seems importing doesn't take any significant time. My bad.

ayushsuhane · 2018-08-10T19:53:23Z

I think after #2041, we can move to capped_distance directly. Here is an example of choosing the best method for around selection based on the rules specified in capped_distance.

However, I have one concern in changing the codebase of around. It seems tests are written for different flags and methods for every class i.e. apply_kdtree, apply_distmat. So there are essentially two things, one is removal of flags, and another is using a single method instead of multiple _apply methods. What do you think should be done first. Any suggestions how should I go about it?

richardjgowers · 2018-08-10T20:11:42Z

@ayushsuhane I think you can just rip the flags (& associated tests) out. If someone chose kdtree in the flags and we don't respect that, they'll still get the same distances out, so it's not going to hurt anyone.

richardjgowers · 2018-08-13T13:44:59Z

@ayushsuhane this needs a quick rebase right?

…n pkdtree, added tests, updated Changlog

…ass Distance based selection + point selection except CylindricalSelection

richardjgowers · 2018-08-13T17:43:46Z

Looks solid, iirc the coverage for selections was complete, so if tests pass this is good to go

ayushsuhane · 2018-08-13T18:59:11Z

@richardjgowers Not sure what is happening here. Can you take a look? I tried restarting the build but the appveyor still fails due to encore. Also, I dont think I can restart the appveyor build

zemanj · 2018-08-14T00:34:47Z

@ayushsuhane This looks like yet another random number issue to me... I just had the same test failing on my linux box, but it passed in a subsequent run. The failing test uses analysis.encore.bootstrap.get_ensemble_bootstrap_samples() with only 10 samples, so probably that's the culprit. Maybe just give it another try?

orbeckst · 2018-08-14T01:20:32Z

@zemanj you can go to Travis and restart the job --- I don't know if the restart button shows up for @ayushsuhane.

…

Am Aug 13, 2018 um 17:34 schrieb Johannes Zeman ***@***.***>: @ayushsuhane This looks like yet another random number issue to me... I just had the same test failing on my linux box, but it passed in a subsequent run. The failing test uses analysis.encore.bootstrap.get_ensemble_bootstrap_samples() with only 10 samples, so probably that's the culprit. Maybe just give it another try? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

zemanj · 2018-08-14T01:41:09Z

@orbeckst Ok I restarted the Travis build for this PR, but now my noob question: In how far is that related to AppVeyor?

zemanj · 2018-08-14T01:45:55Z

@richardjgowers Ok I gotta remember that shortcut 😉

richardjgowers · 2018-08-14T01:46:09Z

@zemanj it’s unrelated :) I didn’t immediately see the rerun appveyor button on mobile so I turned the PR off then on again....

zemanj · 2018-08-14T01:47:35Z

That's what i thought... I don't see any option to rerun a build on AppVeyor either. Not on mobile, though.

orbeckst · 2018-08-14T17:20:35Z

Ok I restarted the Travis build for this PR, but now my noob question: In how far is that related to AppVeyor?

Sorry, I only saw #2035 (comment) on my phone and did not realize that this referred to AppVeyor.

richardjgowers requested changes Aug 9, 2018

View reviewed changes

richardjgowers self-assigned this Aug 10, 2018

ayushsuhane changed the title ~~Using NSGrid in around selections~~ Using Capped distance in distance selections Aug 11, 2018

ayushsuhane force-pushed the aroundbench branch 3 times, most recently from 4b6a0cb to 4904d4d Compare August 12, 2018 02:32

ayushsuhane added 8 commits August 13, 2018 10:03

modified capped function for faster computations, added search_tree i…

93582da

…n pkdtree, added tests, updated Changlog

updated docs, added cutoff in determine methods

b3c03dc

safety condition added for bruteforce method

69e89a0

changed determine_method

f7f6625

minor correction for failed tests

a239db5

added _apply_nsgrid to selections

37f93a8

Removed dependency on kdtree flags, substituted capped_distance in cl…

13b6ec9

…ass Distance based selection + point selection except CylindricalSelection

updated CHANGELOG

7987084

ayushsuhane force-pushed the aroundbench branch from 4904d4d to 7987084 Compare August 13, 2018 17:26

minor flake8 corrections

ff1605c

richardjgowers approved these changes Aug 13, 2018

View reviewed changes

richardjgowers closed this Aug 14, 2018

richardjgowers reopened this Aug 14, 2018

richardjgowers merged commit e396630 into MDAnalysis:develop Aug 14, 2018

ayushsuhane deleted the aroundbench branch August 14, 2018 18:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using Capped distance in distance selections #2035

Using Capped distance in distance selections #2035

ayushsuhane commented Aug 9, 2018 •

edited

Loading

ayushsuhane commented Aug 9, 2018 •

edited

Loading

ayushsuhane commented Aug 9, 2018 •

edited

Loading

codecov bot commented Aug 9, 2018 •

edited

Loading

orbeckst commented Aug 9, 2018

orbeckst commented Aug 9, 2018

richardjgowers commented Aug 9, 2018

richardjgowers commented Aug 9, 2018

ayushsuhane commented Aug 9, 2018

ayushsuhane commented Aug 9, 2018

richardjgowers left a comment

ayushsuhane commented Aug 10, 2018

ayushsuhane commented Aug 10, 2018

ayushsuhane commented Aug 10, 2018

richardjgowers commented Aug 10, 2018

richardjgowers commented Aug 13, 2018

richardjgowers commented Aug 13, 2018

ayushsuhane commented Aug 13, 2018 •

edited

Loading

zemanj commented Aug 14, 2018

orbeckst commented Aug 14, 2018 via email

zemanj commented Aug 14, 2018

zemanj commented Aug 14, 2018

richardjgowers commented Aug 14, 2018

zemanj commented Aug 14, 2018

orbeckst commented Aug 14, 2018

Using Capped distance in distance selections #2035

Using Capped distance in distance selections #2035

Conversation

ayushsuhane commented Aug 9, 2018 • edited Loading

PR Checklist

ayushsuhane commented Aug 9, 2018 • edited Loading

ayushsuhane commented Aug 9, 2018 • edited Loading

codecov bot commented Aug 9, 2018 • edited Loading

Codecov Report

orbeckst commented Aug 9, 2018

orbeckst commented Aug 9, 2018

richardjgowers commented Aug 9, 2018

richardjgowers commented Aug 9, 2018

ayushsuhane commented Aug 9, 2018

ayushsuhane commented Aug 9, 2018

richardjgowers left a comment

Choose a reason for hiding this comment

ayushsuhane commented Aug 10, 2018

ayushsuhane commented Aug 10, 2018

ayushsuhane commented Aug 10, 2018

richardjgowers commented Aug 10, 2018

richardjgowers commented Aug 13, 2018

richardjgowers commented Aug 13, 2018

ayushsuhane commented Aug 13, 2018 • edited Loading

zemanj commented Aug 14, 2018

orbeckst commented Aug 14, 2018 via email

zemanj commented Aug 14, 2018

zemanj commented Aug 14, 2018

richardjgowers commented Aug 14, 2018

zemanj commented Aug 14, 2018

orbeckst commented Aug 14, 2018

ayushsuhane commented Aug 9, 2018 •

edited

Loading

ayushsuhane commented Aug 9, 2018 •

edited

Loading

ayushsuhane commented Aug 9, 2018 •

edited

Loading

codecov bot commented Aug 9, 2018 •

edited

Loading

ayushsuhane commented Aug 13, 2018 •

edited

Loading