Undefined values in bins occupied by insertion in membrane. #45
Comments
pandas is slow, use numpy/scipy! In GridDataFormats we have the option to interpolate on a grid https://github.com/MDAnalysis/GridDataFormats/blob/be6132ac13041390a880061e4e873044b6c29573/gridData/core.py#L311 ; this is not the cleanest code but perhaps useful to initially look at. |
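As a rough illustration of the numpy/scipy route suggested above, the sketch below fills undefined bins of a 2D grid from their defined neighbours with `scipy.interpolate.griddata`. The function name `fill_undefined_bins` and the toy grid are hypothetical and are not taken from the project or from GridDataFormats.

```python
import numpy as np
from scipy.interpolate import griddata


def fill_undefined_bins(z_surface, method="linear"):
    """Return a copy of a 2D grid with NaN bins filled from defined bins.

    Hypothetical helper for illustration only.
    """
    z = np.asarray(z_surface, dtype=float)
    defined = ~np.isnan(z)
    filled = z.copy()
    if defined.all():
        return filled
    # Index coordinates (row, column) of every bin in the grid.
    rows, cols = np.indices(z.shape)
    known_points = np.column_stack((rows[defined], cols[defined]))
    missing_points = np.column_stack((rows[~defined], cols[~defined]))
    # Interpolate only at the undefined bins; defined bins are left untouched.
    filled[~defined] = griddata(
        known_points, z[defined], missing_points, method=method
    )
    return filled


# Toy surface z = i + j with a single undefined bin in the middle.
grid = np.add.outer(np.arange(5.0), np.arange(5.0))
grid[2, 2] = np.nan
filled = fill_undefined_bins(grid)
```

With `method="linear"`, bins outside the convex hull of the defined bins stay undefined, so holes on the grid edge may need `method="nearest"` or no interpolation at all.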
Commenting here instead of the PR so that discussion is in one place: I'm not sure that this is a good idea. There is no membrane where the protein is, so there shouldn't be any membrane curvature there either. |
Thanks @lilyminium. I was considering making the interpolation optional and adding a warning message when there are simply too many np.nan values. I didn't add anything in
In summary, untreated undefined values may result in extended regions of The user will have an option to interpolate the
But in any case, given the way I see the solution for calculating curvature, the option of interpolation has to be provided. I was considering an option to mask the protein, but I won't elaborate on it (unless I am asked to) and will leave it as an enhancement. |
Thank you for your detailed and clear explanation, @ojeda-e. However, I think I must be missing something, because I don't understand how the protein causes that nan value on the edge. How big is the membrane relative to the grid you've illustrated? I would think that the more flexible solution is to return all the data and let users make their own choices about how they want to interpolate and visualize the grid.
Is this still true with your current code that uses |
Sorry @lilyminium, my bad for not clarifying. My intention with the illustration was to show the same effect (spread of undefined values) due to separate effects:
In the first case it is easier to identify how a single
You mean, if even by using nanmean we still may have a considerable loss of information? My answer is: I think yes, it can happen, but it depends on several factors. For example, in systems with extremely short simulation time (low Edit: Maybe we can have some opinions from @MDAnalysis/gsoc-mentors ? |
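To make the nanmean point concrete, here is a minimal sketch of averaging per-frame surfaces while ignoring undefined bins. The helper `average_surface` and the two toy frames are assumptions for illustration, not the project's code. A bin that is undefined in every frame stays undefined after averaging, and the per-bin count shows how few samples back each value.

```python
import warnings

import numpy as np


def average_surface(frames):
    """Average a stack of 2D surfaces over frames, ignoring NaN bins.

    Hypothetical helper. Returns the per-bin mean and the number of frames
    that contributed a defined value to each bin.
    """
    stack = np.asarray(frames, dtype=float)
    with warnings.catch_warnings():
        # np.nanmean warns ("Mean of empty slice") for bins that are NaN in
        # every frame; those bins simply remain NaN in the result.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        mean = np.nanmean(stack, axis=0)
    n_defined = np.sum(~np.isnan(stack), axis=0)
    return mean, n_defined


frames = [
    [[1.0, np.nan], [np.nan, 4.0]],
    [[3.0, np.nan], [2.0, 4.0]],
]
mean, n_defined = average_surface(frames)
```

The count array could be used to decide when to warn the user, for example when too many bins have zero or very few contributing frames.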
I'm very impressed by your detailed analysis @ojeda-e! In general my approach is that a |
Thanks, @hmacdope. Now, the fact that we calculate second partial derivatives means that we calculate the gradient twice. In this example, we can see the effect of only one These are some of the reasons why calculating the curvature of biological membranes can get extremely hard. The formalism requires second partial derivatives, and gradients are calculated as differences between adjacent values. Now, add to this rigid formalism that we depend on the lipid composition of the membrane. So, if we have a bilayer with 1:1:1 phospholipid_1:phospholipid_2:other_lipid, where other_lipid is any type of lipid with a flip-flop rate high enough that several flip-flop events occur in the interval we are averaging over, we lose 1/3 of the elements used to derive the surface. Now, imagine that, on top of all the above, you have a simulation setup small enough that your protein covers many unit cells. In this context, and as I see the problem, having the option to interpolate is necessary. The possibility that users abuse interpolation methods is another story. That the user is calculating curvature in a simulation setup that is not suitable for calculating curvature is another story too. The user will be more than welcome to turn interpolation off. But it should be offered. |
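The spread described above can be reproduced with a small sketch. It assumes the standard mean-curvature expression for a surface z(x, y) evaluated with `np.gradient`, which is not necessarily how the project computes it; `mean_curvature` and the toy surface are hypothetical.

```python
import numpy as np


def mean_curvature(z):
    """Mean curvature of a surface z(x, y) sampled on a unit-spaced grid.

    Illustrative only: uses finite differences via np.gradient.
    """
    z_y = np.gradient(z, axis=0)
    z_x = np.gradient(z, axis=1)
    # Second derivatives are gradients of gradients, so each undefined bin
    # is propagated a second time.
    z_yy = np.gradient(z_y, axis=0)
    z_xx = np.gradient(z_x, axis=1)
    z_xy = np.gradient(z_x, axis=0)
    numerator = (
        (1 + z_x**2) * z_yy - 2 * z_x * z_y * z_xy + (1 + z_y**2) * z_xx
    )
    return numerator / (2 * (1 + z_x**2 + z_y**2) ** 1.5)


# 7 x 7 paraboloid with a single undefined bin in the centre.
x, y = np.meshgrid(np.arange(7.0), np.arange(7.0))
z = 0.1 * (x**2 + y**2)
z[3, 3] = np.nan

H = mean_curvature(z)
n_undefined = int(np.isnan(H).sum())  # far more than the one bin we removed
```

Because every derivative term enters the final expression, a bin of H is undefined as soon as any of its first or second derivatives touched the missing value, which is why one empty bin grows into a patch.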
When calculating get_z_surface, undefined values in bins inside the embedded element (i.e. protein) may arise since no z coordinates populate bins in the grid. Such undefined values spread in the array during the calculation of curvature. One possible solution is to interpolate the bins occupied by the protein. I would avoid using pandas, although that approach seems the most straightforward. Check np.interpolate.