-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Surface Interpolation Function [Fixes Issue45] #52
Conversation
Hello @ojeda-e! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-08-07 22:39:35 UTC |
membrane_curvature/surface.py
Outdated
# interpolate values in array | ||
interpolated_array = np.interp(index_array, | ||
np.flatnonzero(~mask_nans), | ||
array_surface[~mask_nans]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since mask_nans
is used twice in inverted form, and never in its original form, perhaps it is worth it to invert at the original isnan
location rather than inverting the original value twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely missed it and it's a good point. You mean, having this instead, correct?
mask_nans = ~np.isnan(array_surface) #invert original
index_array = np.arange(array_surface.shape[0])
interpolated_array = np.interp(index_array,
np.flatnonzero(mask_nans), #not inverted
array_surface[mask_nans]). #not inverted
]) | ||
def test_surface_interpolation(dummy_surface, expected_interpolated_surface): | ||
surface = surface_interpolation(dummy_surface) | ||
assert_almost_equal(surface, expected_interpolated_surface) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For arrays of floats, the docstring for assert_almost_equal
suggests using i.e., assert_allclose
these days for better consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MDAnalysis uses assert_almost_equal
in preference to assert_array_almost_equal
. assert_allclose
might be more recent than when we started using numpy tests so MDAnalysis might just be outdated and should be switching eventually. Follow @tylerjereddy 's advice :-).
The pushed changes include:
I know |
@lilyminium sorry, it's my fault because I made a bit of a mess here. The problem is in |
I think so. Even if you know yourself that the 400% difference is small because the real value is small, it communicates to future users that you expect the degree of difference to be around 0.0000001. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests here are a good start! However I think you definitely need to vary your values (maybe by doing tests on your added data files) for robust checking. I am already reasonably certain that self.results.average_interpolated_z_surface
etc. are not calculated the way you want them to, which wouldn't have been caught with current test cases (although there aren't actually tests for those yet -- they need some too!).
|
||
def _conclude(self): | ||
self.results.average_z_surface = np.nanmean(self.results.z_surface, axis=0) | ||
self.results.average_mean = np.nanmean(self.results.mean, axis=0) | ||
self.results.average_gaussian = np.nanmean(self.results.gaussian, axis=0) | ||
if self.interpolation: | ||
self.results.average_interpolated_z_surface = np.nanmean( | ||
self.results.interpolated_z_surface[self._frame_index], axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.results.interpolated_z_surface[self._frame_index], axis=0) | |
self.results.interpolated_z_surface, axis=0) |
And with the others too -- if you specify frame_index
, you'll wind up only getting the mean of the last frame.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need np.nanmean
? Do you expect np.nan
values in your interpolated surfaces? If not, using np.mean
will reveal errors in case there ever are np.nan
values.
|
||
def interpolation_by_array(array_surface): | ||
""" | ||
Interpolates values contained in `array_surface` over axis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over which axis?
mask_nans = ~np.isnan(array_surface) | ||
|
||
# index of array_surface | ||
index_array = np.arange(array_surface.shape[0]) | ||
|
||
# interpolate values in array | ||
interpolated_array = np.interp(index_array, | ||
np.flatnonzero(mask_nans), | ||
array_surface[mask_nans]) | ||
|
||
return interpolated_array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite understanding this. mask_nans
is 2D, with shape (n_x, n_y). index_array
is 1D, with shape n_x
. np.flatnonzero(mask_nans)
and array_surface[mask_nans]
will be 1D, of length between 0 to n_x * n_y. So I think that means that you are only ever operating on the first row of array_surface
?
I guess my question is then, why is array_surface
2D? Why not just pass in one row at a time? In addition, as it is a 2D surface, why use a 1D interpolation function? Why not a 2D one?
""" | ||
|
||
interpolated_surface = np.apply_along_axis(interpolation_by_array, axis=0, arr=array_surface) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see that you are applying interpolation_by_array
by row, then. In that case I suggest amending the documentation of interpolation_by_array
to specify that the input is 1-dimensional with length n_x
, instead of n_x, n_y
. Although I think doing 2D interpolation would be less arbitrary (choosing which axis is x and which is y is basically a random, right?)
(np.array(([150., 150., 150.], | ||
[150., np.nan, 150.], | ||
[150., 150., 150.])), | ||
np.full((3, 3), 150.)), | ||
# array 3x4 with all 150 and two nans | ||
(np.array([[150., 150, 150., 150.], | ||
[150., np.nan, np.nan, 150.], | ||
[150., 150., 150., 150.]]), | ||
np.full((3, 4), 150.)), | ||
# array 4x4 with all 150 and two nans | ||
(np.array([[150., 150, 150., 150.], | ||
[150., np.nan, np.nan, 150.], | ||
[150., 130., 140., 150.], | ||
[150., 150., 150., 150.]]), | ||
np.array([[150., 150, 150., 150.], | ||
[150., 140., 145., 150.], | ||
[150., 130., 140., 150.], | ||
[150., 150., 150., 150.]])), | ||
# array 3x3 with lots of nans | ||
(np.array([[np.nan, np.nan, 150.], | ||
[150, np.nan, 150.], | ||
[np.nan, 150., np.nan]]), | ||
np.full((3, 3), 150.)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add some tests with more interesting values? e.g. in the one below, the 145 number could have come from interpolating either along the x-axis or the y-axis. In order to be diagnostic of potential current and future bugs, I'd be interested in seeing one where using 1D interpolation gives different results if it's run across the x-axis, vs. the y-axis.
(np.array([[150., 150, 150., 150.],
[150., np.nan, np.nan, 150.],
[150., 130., 140., 150.],
[150., 150., 150., 150.]]),
np.array([[150., 150, 150., 150.],
[150., 140., 145., 150.],
[150., 130., 140., 150.],
[150., 150., 150., 150.]])),
@pytest.mark.parametrize('dim_x, dim_y, x_bins, y_bins, dummy_array, expected_interp_surf', [ | ||
# array 3x3 with all 150 and one nan | ||
(300, 300, 3, 3, np.array([[0., 0., 150.], [100., 0., 150.], [200., 0., 150.], | ||
[0., 100., 150.], [100., 100., np.nan], [200., 100., 150.], | ||
[0., 200., 150.], [100., 200., 150.], [200., 200., 150.]]), | ||
np.full((1, 3, 3), 150.)), | ||
# array 3x3 with all 150 and one nan | ||
(300, 300, 3, 3, np.array([[0., 0., 150.], [100., 0., 150.], [200., 0., 150.], | ||
[0., 100., 150.], [100., 100., np.nan], [200., 100., 150.], | ||
[0., 200., np.nan], [100., 200., 150.], [200., 200., 150.]]), | ||
np.full((1, 3, 3), 150.)), | ||
# array 3x4 with all 150 and three nans | ||
(300, 400, 3, 4, np.array([[0., 0., 150.], [100., 0., 150.], [200., 0., 150.], | ||
[0., 100., 150.], [100., 100., np.nan], [200., 100., 150.], | ||
[0., 200., 150], [100., 200., np.nan], [200., 200., np.nan], | ||
[0., 300., 150.], [100., 300., 150.], [200., 300., 150.]]), | ||
np.full((1, 3, 4), 150.)), | ||
# array 4x4 with all 120 and many nans | ||
(400, 400, 4, 4, np.array([[0., 0., np.nan], [100., 0., 120.], [200., 0., 120.], [300., 0., np.nan], | ||
[0., 100., 120.], [100., 100., np.nan], [200., 100., 120.], [300., 100., 120.], | ||
[0., 200., 120], [100., 200., np.nan], [200., 200., np.nan], [300., 200., 120.], | ||
[0., 300., np.nan], [100., 300., 120.], [200., 300., 120.], [300., 300., np.nan]]), | ||
np.full((1, 4, 4), 120.)) | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here as well, tests where the expected values aren't all one number are needed to really check assumptions such as:
- is it interpolating along the axis I think it is
- is it interpolating in a linear way (vs. some random other polynomial)
- is it even interpolating, or just taking the most common number?
- are we sure it's interpolating independently for every frame, or is it interpolating across different frames?
etc.
I am, incidentally, somewhat surprised that coordinates are getting wrapped with np.nan
values in them. @richardjgowers should this be happening...?
This is a starting point to solve #45
Changes in this PR:
surface_interpolation
). Includes docstrings.surface_interpolation
.As suggested by @orbeckst in this comment, the function here included uses NumPy. Thanks for encouraging numpy!
Surfaces included in tests are dummy_arrays with the following characteristics:
np.nan
.np.nan
.np.nan
.np.nan
. 💪🏽Probably I'll be asked to add more tests, but this is ok to start. Probably @lilyminium will give me lots of ideas. 😅
Thanks.