NSGrid segfaults for large systems and small cutoff #3183
Comments
IMHO we need an intelligent solution for this, not just a simple cap on the number of grid cells.
@richardjgowers @zemanj is this issue fixed in 1.1.1/2.0.0?
Nope, the issue still persists (at least on develop).
Can confirm it still segfaults on 1.1.1 too.
To add to the conversation here: there's a really interesting thing where, if you set the cutoff to either 2 or 3.5, you get the following:

```
ValueError                                Traceback (most recent call last)
~/software/anaconda/python3.6/2019.7/envs/mda-1.1.1/lib/python3.8/site-packages/numpy/core/numeric.py in full(shape, fill_value, dtype, order, like)
    340         fill_value = asarray(fill_value)
    341         dtype = fill_value.dtype
--> 342     a = empty(shape, dtype, order)
    343     multiarray.copyto(a, fill_value, casting='unsafe')
    344     return a

ValueError: negative dimensions are not allowed

Exception ignored in: 'MDAnalysis.lib.nsgrid.FastNS._pack_grid'
Traceback (most recent call last):
  File "/biggin/b131/bioc1523/software/anaconda/python3.6/2019.7/envs/mda-1.1.1/lib/python3.8/site-packages/numpy/core/numeric.py", line 342, in full
    a = empty(shape, dtype, order)
ValueError: negative dimensions are not allowed
```
The segfault is in the initial `_pack_grid` call, at `self.next_id[i] = self.head_id[j]`. Something may be handing it a bad index, and the grid sizes seemed weird, so there may be an upstream issue. I can dig into this.
OK, I believe this is integer overflow. The inflated setup with the given cutoff leads to more cells than fit in a 32-bit integer, and the overflow arithmetic produces incorrectly small structures like `head_id`.
I started converting most of the ints in the FastNS Cython file to `long long`, but then stopped. I don't think it's reasonable to run this algorithm with these input and cutoff dimensions: for a system with 100 particles, we're attempting to build a grid with over 5 billion elements. Additionally, C subroutines like `_pbc_ortho` take `int`s at their API boundaries, so we risk truncation anyway. I suggest a bounds check that the grid dimensions stay within the range of int32 arithmetic, with a message suggesting a larger cutoff distance when the input would produce out-of-bounds grid sizes. Further down the road, we could improve the heuristics for choosing between the brute-force and nsgrid algorithms.
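To make the scale concrete, here is a back-of-the-envelope check of the cell count. The box length and cutoff are illustrative values chosen to mimic the report, not taken from it:

```python
INT32_MAX = 2**31 - 1  # 2147483647

box_lengths = (6000.0, 6000.0, 6000.0)  # illustrative box edge lengths, in Å
cutoff = 3.5                            # illustrative cutoff, in Å

# One grid cell per cutoff-length along each axis.
cells_per_dim = [int(edge // cutoff) for edge in box_lengths]  # [1714, 1714, 1714]
total_cells = cells_per_dim[0] * cells_per_dim[1] * cells_per_dim[2]

print(total_cells)              # 5035382344 — over 5 billion cells
print(total_cells > INT32_MAX)  # True: 32-bit cell indices overflow
```

The total depends only on box size and cutoff, not on the particle count, which is why a 100-particle system can still demand a multi-billion-cell grid.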
What's the box size and cutoff combination you are using to trigger this? I think an easy fix is just to limit the number of cells in any direction to the cube root of INT32_MAX.
I used the original bug description to reproduce. Your suggested fix is what I had in mind as well.
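The proposed cap can be sketched as follows. The helper name and structure are hypothetical, not the actual nsgrid.pyx code:

```python
INT32_MAX = 2**31 - 1

# Largest per-dimension cell count whose cube still fits in int32:
# 1290**3 = 2146689000 <= INT32_MAX < 1291**3.
MAX_CELLS_PER_DIM = int(round(INT32_MAX ** (1.0 / 3.0)))
while MAX_CELLS_PER_DIM ** 3 > INT32_MAX:  # guard against float rounding
    MAX_CELLS_PER_DIM -= 1

def capped_cells(box_length, cutoff):
    """Cells along one axis, capped so the total grid size can never
    overflow 32-bit indexing (hypothetical helper for illustration)."""
    return min(int(box_length // cutoff), MAX_CELLS_PER_DIM)

print(MAX_CELLS_PER_DIM)          # 1290
print(capped_cells(6000.0, 3.5))  # 1290 instead of 1714
```

Capping the count makes each cell larger than the cutoff in the pathological cases, which is safe for correctness: searching the 27 neighbouring cells still covers every pair within the cutoff.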
The NSGrid cython code uses integer indexing, which is typically 32 bit. 3D grids with small cutoffs or giant box sizes can have grids > 1000 per dimension, which leads to integer overflow when trying to index the grids. Fixes MDAnalysis#3183
Expected behavior
`FastNS` never segfaults, no matter how strange its input may be.

Actual behavior
`FastNS` segfaults if the box is large and the cutoff is small. The reason is that the number of grid cells is not limited in `nsgrid.pyx`.

Code to reproduce the behavior
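The original snippet was not preserved in this copy of the issue. A minimal reproduction in the same spirit looks like the following; the particle count, box size, and cutoff are illustrative, and the `FastNS` call is left commented out because it crashed the interpreter before the fix:

```python
import numpy as np

# Illustrative setup: 100 particles in a huge 10000 Å cubic box with a
# 3.5 Å cutoff. The grid would need (10000 // 3.5)**3 ≈ 2.3e10 cells,
# far beyond the 2**31 - 1 limit of signed 32-bit indexing.
coords = (np.random.rand(100, 3) * 10000).astype(np.float32)
box = np.array([10000, 10000, 10000, 90, 90, 90], dtype=np.float32)

cells = int(10000 // 3.5) ** 3
print(cells > 2**31 - 1)  # True

# Before the fix, constructing the grid segfaulted (or raised the
# ValueError shown above) during the initial _pack_grid call:
# from MDAnalysis.lib.nsgrid import FastNS
# FastNS(3.5, coords, box)
```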
Output:
Current version of MDAnalysis