Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check for too many grids in NSGrid #3441

Merged
merged 11 commits into from
Nov 9, 2021

Conversation

scal444
Copy link
Contributor

@scal444 scal444 commented Oct 16, 2021

The NSGrid cython code uses integer indexing, which is typically
32 bit. 3D grids with small cutoffs or giant box sizes can have
grids > 1000 per dimension, which leads to integer overflow when
trying to index the grids.

Fixes #3183

Changes made in this Pull Request:

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@codecov
Copy link

codecov bot commented Oct 16, 2021

Codecov Report

Merging #3441 (1be6f64) into develop (a33e303) will not change coverage.
The diff coverage is 100.00%.

❗ Current head 1be6f64 differs from pull request most recent head 6afc2dc. Consider uploading reports for the commit 6afc2dc to get more accurate results
Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #3441   +/-   ##
========================================
  Coverage    93.75%   93.75%           
========================================
  Files          176      176           
  Lines        23163    23163           
  Branches      3297     3297           
========================================
  Hits         21717    21717           
  Misses        1395     1395           
  Partials        51       51           
Impacted Files Coverage Δ
package/MDAnalysis/lib/nsgrid.pyx 97.65% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a33e303...6afc2dc. Read the comment docs.

@scal444
Copy link
Contributor Author

scal444 commented Oct 16, 2021

I'm a bit confused by the coverage output - it's not really possible that this change could have made such a delta. I'd suspect that some test might be failing now (and therefore not covering other things it would have), but the mdanalysis-CI seems to pass.

@IAlibay
Copy link
Member

IAlibay commented Oct 16, 2021

I'm a bit confused by the coverage output - it's not really possible that this change could have made such a delta. I'd suspect that some test might be failing now (and therefore not covering other things it would have), but the mdanalysis-CI seems to pass.

It's just complaining because the majority of CI hasn't run yet, so the diff is off.

Actions has this whole limit first time contributors thing and it's only recently that they introduced relaxed rules so we've not changed it back

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! One small nitpick.

testsuite/MDAnalysisTests/lib/test_nsgrid.py Outdated Show resolved Hide resolved
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on this @scal444 - just of comments from a quick review.

package/CHANGELOG Show resolved Hide resolved
testsuite/MDAnalysisTests/lib/test_nsgrid.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/lib/test_nsgrid.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/lib/test_nsgrid.py Outdated Show resolved Hide resolved
package/MDAnalysis/lib/nsgrid.pyx Show resolved Hide resolved
package/MDAnalysis/lib/nsgrid.pyx Outdated Show resolved Hide resolved
Copy link
Member

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably out of scope for PR, but we may also want to use Py_ssize_t for indexing:
https://stackoverflow.com/q/20987390/2942522

I probably need to follow that advice more often myself (using int is pretty common indeed though).

@hmacdope
Copy link
Member

hmacdope commented Oct 17, 2021 via email

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't have to raise an error. The algorithm can still work by capping the number of boxes to this limit.

@scal444
Copy link
Contributor Author

scal444 commented Oct 17, 2021

This doesn't have to raise an error. The algorithm can still work by capping the number of boxes to this limit.

Good point. I wonder if we should still fail, since it's over a billion cells and very unlikely to be performant. Not sure where this team's preference lies between automatically adjusting parameters vs failing and having the user adjust themselves. I guess it makes sense for us to do it, given that it won't affect the algorithm accuracy to adjust the grid size upwards, but now there's the question of what to cap it at? My first thought is to increase the grid size so that there's at least an average of x particles per grid, but we might need to do some research on what a good limit is.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're up at ~1,000 cells a side, and even if we're searching for bonds (so r=1A), you've got a box with 100nm sides. This probably has about 100 million atoms in it? I'm not sure brute force methods are going to be more performant here.

I think at best this is a warning that the number of cells got truncated and that the devs should finally write some templates/fused types to handle the ultra large case.

package/MDAnalysis/lib/nsgrid.pyx Show resolved Hide resolved
@richardjgowers
Copy link
Member

That said, that all assumed heterogenous positions which might not be true. If a ratio of particles per cell fell below some heuristic threshold, then switching to a different algorithm (sparse cell grid thing) might well be a good idea.

@scal444
Copy link
Contributor Author

scal444 commented Oct 19, 2021

This probably has about 100 million atoms in it

The example in the original issue is a counterexample here, where a box with a smaller number of atoms is inflated. The grid is set based off of the box size and cutoff, so you end up with more grids than particles to pack onto the grid. I was suggesting something like increasing the grid size to at least achieve some reasonable particle density.

@scal444
Copy link
Contributor Author

scal444 commented Oct 20, 2021

In the interest of moving this along, how about I implement the suggestion to cap the number of boxes, and we can revisit for optimizations. I'll see how this affects the comments regarding unit tests and apply whatever suggestions are still relevant.

@richardjgowers
Copy link
Member

richardjgowers commented Oct 20, 2021 via email

@pep8speaks
Copy link

pep8speaks commented Oct 24, 2021

Hello @scal444! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 73:80: E501 line too long (80 > 79 characters)

Comment last updated at 2021-11-09 15:09:32 UTC

@scal444
Copy link
Contributor Author

scal444 commented Oct 24, 2021

RE pep8 violations - if I run autopep8 it flags other problems in the same file. What's the project's stance on refactoring? In general, guess it's better to make a preliminary cleanup change, but here should I just add in the other style changes to the same PR? Or just change the affected lines?

@lilyminium
Copy link
Member

lilyminium commented Oct 24, 2021 via email

@hmacdope
Copy link
Member

Thanks for incorporating feedback @scal444! I will approve workflow run, its blocked because you are a first time contributor.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of extra changes on my end.

package/MDAnalysis/lib/nsgrid.pyx Show resolved Hide resolved
testsuite/MDAnalysisTests/lib/test_nsgrid.py Show resolved Hide resolved
Kevin Boyd added 4 commits October 26, 2021 07:53
The NSGrid cython code uses integer indexing, which is typically
32 bit. 3D grids with small cutoffs or giant box sizes can have
grids > 1000 per dimension, which leads to integer overflow when
trying to index the grids.

Fixes MDAnalysis#3183
Rather than throwing an exception, the number of grids is capped,
not affecting the algorithm output
@richardjgowers
Copy link
Member

richardjgowers commented Oct 26, 2021 via email

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we skip the test on CI please? Otherwise looks good.

@IAlibay
Copy link
Member

IAlibay commented Oct 27, 2021

Can we skip the test on CI please? Otherwise looks good.

We'll want to make this skip on more than just CI - we can't make an 8 GB memory test run on folk's laptops, it's going to be a nightmare for folks on lower end workstations and laptops.

@scal444
Copy link
Contributor Author

scal444 commented Oct 27, 2021

We'll want to make this skip on more than just CI - we can't make an 8 GB memory test run on folk's laptops, it's going to be a nightmare for folks on lower end workstations and laptops.

Alternatively - could we patch the max grid dim variable in the test to bring it down to something smaller? We'd still be testing the same error case, and it won't change the rest of the algorithm. Not sure how to override cython constants in python unit tests but there's probably something we could do.

@richardjgowers
Copy link
Member

We'll want to make this skip on more than just CI - we can't make an 8 GB memory test run on folk's laptops, it's going to be a nightmare for folks on lower end workstations and laptops.

Alternatively - could we patch the max grid dim variable in the test to bring it down to something smaller? We'd still be testing the same error case, and it won't change the rest of the algorithm. Not sure how to override cython constants in python unit tests but there's probably something we could do.

The constraint is going to be very baked in. Cython goes sideways into C++ then gets compiled, so the constraint is a constant somewhere in the object code. For the same reason, I'd rather not have it as a user variable.

@hmacdope
Copy link
Member

hmacdope commented Oct 28, 2021 via email

@richardjgowers
Copy link
Member

richardjgowers commented Oct 28, 2021 via email

@hmacdope
Copy link
Member

I was thinking something like:

DEF test_ns_grid_max = 1 # set on compile, only to run  on special occasions, like Christmas

IF test_ns_grid_max
 DEF MAX_DIM=reasonable_value_to_test
ELSE
 DEF MAX_DIM= 1290

but Im probably missing something / its super ugly / disabling on CI is much cleaner.

@richardjgowers
Copy link
Member

I was thinking something like:

DEF test_ns_grid_max = 1 # set on compile, only to run  on special occasions, like Christmas

IF test_ns_grid_max
 DEF MAX_DIM=reasonable_value_to_test
ELSE
 DEF MAX_DIM= 1290

but Im probably missing something / its super ugly / disabling on CI is much cleaner.

Yeah this would work, but you'd be reinstalling the entire package for a single test. I think it's probably fine not to test for this (unless someone is actively tinkering with the algorithm..).

@richardjgowers richardjgowers mentioned this pull request Oct 29, 2021
5 tasks
@IAlibay
Copy link
Member

IAlibay commented Oct 29, 2021

Yeah this would work, but you'd be reinstalling the entire package for a single test. I think it's probably fine not to test for this (unless someone is actively tinkering with the algorithm..).

I think we're overthinking this. As @richardjgowers first pointed out, let's just guard the test with an env variable, set it in our macOS runners (which for now have >10 GB ram), and document it somewhere - job done.

@hmacdope
Copy link
Member

Good point @IAlibay, sorry for muddying the waters, much better to just block test.

Enable test in macos runners which have sufficient memory
@scal444
Copy link
Contributor Author

scal444 commented Oct 31, 2021

Test is now guarded by an env variable

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the one thing on my end. Aside from @richardjgowers' comment, everything else looks good. Thanks @scal444 !

testsuite/MDAnalysisTests/lib/test_nsgrid.py Outdated Show resolved Hide resolved
Used parentheses for string concatenation and a bool check converter
Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! Thanks @scal444

IAlibay
IAlibay previously requested changes Nov 8, 2021
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a final PEP8 thing please.

testsuite/MDAnalysisTests/lib/test_nsgrid.py Show resolved Hide resolved
testsuite/MDAnalysisTests/lib/test_nsgrid.py Outdated Show resolved Hide resolved
@richardjgowers richardjgowers merged commit 735abb8 into MDAnalysis:develop Nov 9, 2021
@richardjgowers
Copy link
Member

@scal444 thanks for fixing this and congrats on getting your first PR on the board!

@IAlibay IAlibay added the defect label Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NSGrid segfaults for large systems and small cutoff
7 participants