-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AnalysisBase #43
AnalysisBase #43
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fantastic start @ojeda-e! Your intuition is spot on for what should be done in the init/prepare/single_frame/conclude methods. I have some notes on the structure, and I wonder if we can avoid using counter arrays entirely by using np.nanmean
.
membrane_curvature/base.py
Outdated
|
||
def _single_frame(self): | ||
|
||
surface_ = derive_surface(n_x_bins=self.n_cells_x, n_y_bins=self.n_cells_y, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do you give derive_surface
the coordinates? I don't see self.n_cells_x, self.n_cells_y, self.max_width_x, self.max_width_y
defined either?
membrane_curvature/base.py
Outdated
def _conclude(self): | ||
|
||
# Calculate average of np.array over trajectory normalized over counter | ||
self.results.average_surface = self.results.surface[0] / self.results.surface[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use np.nanmean
to average over your surfaces without needing to have these counter
arrays?
For example, you could set up a z_surface
array to collect all the surfaces you get from derive_surface
without modification:
def _prepare(self):
self.results.z_surface = np.full((self.n_frames, self.x_bins, self.y_bins), np.nan)
def _single_frame(self):
self.results.z_surface[self._frame_index] = derive_surface(...)
and then finally average over the frames in _conclude:
def _conclude(self):
self.results.average_z_surface = np.nanmean(self.results.z_surface, axis=0)
(and the same for the mean and gaussian surfaces).
membrane_curvature/base.py
Outdated
self.pbc = pbc | ||
self.y_bins = y_bins | ||
self.x_bins = x_bins | ||
self.x_range = x_range if x_range else (0, universe.dimensions[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like these defaults!
membrane_curvature/base.py
Outdated
import numpy as np | ||
|
||
|
||
def __init__(self, universe, select, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need universe
to get passed? Is select
an AtomGroup or a string -- if it's an AtomGroup, you could just use universe = select.universe
.
FYI most AnalysisBase classes use select
as a string, and many accept either an AtomGroup or Universe for the first argument. To safeguard against the multiple possibilities, you could do this:
def __init__(self, universe, select="all", ..., **kwargs):
super().__init__(universe.universe.trajectory, **kwargs)
self.selection = universe.select_atoms(select)
membrane_curvature/base.py
Outdated
self.results.gaussian = (cumulative, count) | ||
|
||
|
||
def _output(self, value_per_frame, cumulative_output, counter): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to ask you something about this approach, @lilyminium . To be honest, I didn't see any other sensible option to accumulate the values per frame and count. Would you please share with me some alternatives? What do you think would be a good idea?
(Not that I am looking for a change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to accumulate the count per frame as an end goal? My first thought would be to collect all the surfaces in a 3D array: (n_frames, n_x_cells, n_y_cells) and then average over them with np.nanmean
for the average surface.
Hi @lilyminium |
This design is looking really, really good @ojeda-e. I don't have much to say except that some lines are looking quite long so you may want to check that you're following PEP8. One trick could be to make I guess the proof is in the pudding -- would you like to start writing tests to see how it works? :-) |
Also great idea to switch the order of the arguments to |
Yes! :D I will open an issue for the respective tests and will work on them after solving If you are ok with it, I will skip making |
@lilyminium re-reading your comment makes me wonder if you would prefer the tests in this PR or in a separate issue. Apologies I wrongly assumed it was a separate PR. What do you think works best? |
I would personally prefer in this PR, just so we make sure that everything in the main branch always works :-)
… On Jul 7, 2021, at 6:30 PM, Estefania Barreto-Ojeda ***@***.***> wrote:
@lilyminium re-reading your comment makes me wonder if you would prefer the tests in this PR or in a separate issue. Apologies I wrongly assumed it was a separate PR. What do you think works best?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
devtools/conda-envs/test_env.yaml
Outdated
- sphinx | ||
|
||
- MDAnalysis==2.0.0-dev0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to install it explicitly via pip:
- MDAnalysis==2.0.0-dev0 | |
- pip: | |
- MDAnalysis==2.0.0-dev0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ojeda-e
Because it's relying on the Results
class, could you raise an issue as a reminder that MDA should get pinned to >=2.0.0 in setup.py
? (I wouldn't do it now, but it'd be a good thing to do before the first release)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also +1 to @RMeli's comment. The beta needs to be installed via pip
(swapping -dev0
with b0
should do the trick).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, @IAlibay, this PR demonstrates to be that the beta was worth the effort because it makes it a lot easier for packages to test against what 2.0 will look like.
Another catch is that I think it’s actually called MDAnalysis==2.0.0b0
@IAlibay could you confirm?
… On Jul 9, 2021, at 12:53 PM, Rocco Meli ***@***.***> wrote:
@RMeli commented on this pull request.
In devtools/conda-envs/test_env.yaml:
> - sphinx
-
+ - MDAnalysis==2.0.0-dev0
I think you need to install it explicitly via pip:
⬇️ Suggested change
- - MDAnalysis==2.0.0-dev0
+ - pip:
+ - MDAnalysis==2.0.0-dev0
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
@lilyminium I had this feeling and I was looking for the announcement mentioning the beta package name... Do you know where I can find it (for future reference)? It is not with the usual blogs. Was it just on Twitter? |
We never got around to it, we were going to announce it with the MDAnalysis workshop materials but unfortunately that got stalled. edit: also yes, can confirm it's MDAnalysis==2.0.0b0 Long term we should have a CI runner that directly installs the dev version though. |
Sorry I didn't mean to remove those review requests... hazards of githubbing on phone 🤦♀️ |
Thanks, @lilyminium, @IAlibay, and @RMeli for your comments. I just wanted to add an initial version of the tests with previously used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of comments, overall looks good mainly needs documentation.
membrane_curvature/base.py
Outdated
if (self.selection.n_residues) == 0: | ||
raise ValueError("Invalid selection. Empty AtomGroup.") | ||
|
||
# Raise if range doesn't cover entire dimensions of simulation box |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I missed some of the discussion on this.
Assuming the grid needs to cover the entire box, how do you deal with NPT-led box fluctuations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's something I didn't pay too much attention to (and I didn't clarify, my bad). One of the assumptions I had is that calculation of membrane curvature is performed in equilibrated systems. If the box shrinks or expands too much (one unit cell in the grid), the calculation wouldn't be accurate.
The reason why I introduced the restrictions (which I changed now to a warning message) is that (1) if the surface is derived from a portion of the membrane, the calculation of curvature in the edges will be inaccurate because the calculation of a gradient runs over adjacent values. On the other hand, (2) in the scenario in which the grid is too big, unnecessary np.nans
will be introduced. It means that, again, the undefined adjacent values will have a negative impact on the calculation of curvature.
I don't see a straightforward solution for NPT systems unless the calculation returns a different grid per frame and then results must be normalized over grids of the same size, and I don't have a clear idea, at least right now, of how to keep track of such variation over frames. Do you have any suggestions here?
membrane_curvature/base.py
Outdated
pbc=True, **kwargs): | ||
|
||
super().__init__(universe.universe.trajectory, **kwargs) | ||
self.selection = universe.select_atoms(select) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make everything clearer to future developers I would rename this from self.selection to either self.ag or self.atomgroup. That way folks can look at the name and immediately know what type of object it is (also better matches other MDA analysis methods).
membrane_curvature/base.py
Outdated
self.y_range = y_range if y_range else (0, universe.dimensions[1]) | ||
|
||
# Raise if selection doesn't exist | ||
if (self.selection.n_residues) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing wrong with this technically, but it might be good to simplify it to just len(atomgroup) == 0
purely on the the basis that it's more descriptive of your intent / more widely used way of doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By renaming selection
to ag
, your suggestion reads better, so I updated it. Thanks.
Hello @ojeda-e! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-07-14 01:04:48 UTC |
As suggested by @IAlibay, docstrings were added and suggested changes that improve the readability of the code. :) |
Hi Estefania,
I believe the DensityAnalysis in MDAnalysis deals with a similar problem. Perhaps have a look at how the box is defined there? If I recall correctly, the box doesn’t change but you may get warnings. ***@***.*** would know more as the one who wrote it!)
https://docs.mdanalysis.org/1.1.1/documentation_pages/analysis/density.html
https://docs.mdanalysis.org/1.1.1/_modules/MDAnalysis/analysis/density.html#DensityAnalysis
Cheers,
Lily
… On Jul 11, 2021, at 10:00 AM, Estefania Barreto-Ojeda ***@***.***> wrote:
@ojeda-e commented on this pull request.
In membrane_curvature/base.py:
> + y_range=None,
+ pbc=True, **kwargs):
+
+ super().__init__(universe.universe.trajectory, **kwargs)
+ self.selection = universe.select_atoms(select)
+ self.pbc = pbc
+ self.n_x_bins = n_x_bins
+ self.n_y_bins = n_y_bins
+ self.x_range = x_range if x_range else (0, universe.dimensions[0])
+ self.y_range = y_range if y_range else (0, universe.dimensions[1])
+
+ # Raise if selection doesn't exist
+ if (self.selection.n_residues) == 0:
+ raise ValueError("Invalid selection. Empty AtomGroup.")
+
+ # Raise if range doesn't cover entire dimensions of simulation box
I think that's something I didn't pay too much attention to (and I didn't clarify, my bad). One of the assumptions I had is that calculation of membrane curvature is performed in equilibrated systems. If the box shrinks or expands too much (one unit cell in the grid), the calculation wouldn't be accurate.
The reason why I introduced the restrictions (which I changed now to a warning message) is that (1) if the surface is derived from a portion of the membrane, the calculation of curvature in the edges will be inaccurate because the calculation of a gradient runs over adjacent values. On the other hand, (2) in the scenario in which the grid is too big, unnecessary np.nans will be introduced. It means that, again, the undefined adjacent values will have a negative impact on the calculation of curvature.
I don't see a straightforward solution for NPT systems unless the calculation returns a different grid per frame and then results must be normalized over grids of the same size, and I don't have a clear idea, at least right now, of how to keep track of such variation over frames. Do you have any suggestions here?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
This PR includes the AnalysisBase with respective docstrings and tests. I would appreciate a final review here @IAlibay @orbeckst @fiona-naughton |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your AnalysisBase version looks very good — clean structure, good use of functions, well done!
I have a bunch of comments on documentation and testing. In addition to the inline comments I noted that your tests always set explicitly selection="all"
. You should also test that the implicit default works (just in case this gets changed) and you should also check that other selections work.
Again: well done!
membrane_curvature/base.py
Outdated
Number of bins in grid in the `x` dimension. | ||
n_y_bins : int, optional. Default: '100' | ||
Number of bins in grid in the `y` dimension. | ||
x_range : tuple of (float, float), optional. Deafult: (0, `universe.dimensions[0]`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x_range : tuple of (float, float), optional. Deafult: (0, `universe.dimensions[0]`) | |
x_range : tuple of (float, float), optional. Default: (0, `universe.dimensions[0]`) |
membrane_curvature/base.py
Outdated
Number of bins in grid in the `y` dimension. | ||
x_range : tuple of (float, float), optional. Deafult: (0, `universe.dimensions[0]`) | ||
Range of coordinates (min, max) in the `x` dimension. | ||
y_range : tuple of (float, float), optional. Deafult: (0, `universe.dimensions[1]`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
y_range : tuple of (float, float), optional. Deafult: (0, `universe.dimensions[1]`) | |
y_range : tuple of (float, float), optional. Default: (0, `universe.dimensions[1]`) |
membrane_curvature/surface.py
Outdated
@@ -1,7 +1,7 @@ | |||
import numpy as np | |||
|
|||
|
|||
def derive_surface(n_cells_x, n_cells_y, selection, max_width_x, max_width_y): | |||
def derive_surface(selection, n_cells_x, n_cells_y, max_width_x, max_width_y): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- update the doc string
- renaming
selection
->atoms
would be clearer
MembraneCurvature(universe, select='name PO4', x_range=(0, 10)) | ||
MembraneCurvature(universe, select='name PO4', y_range=(0, 10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to test these two lines separately, otherwise you can get a passing test even if one of them does not raise the warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pytest.warns
can also match part of the warnings string, always a good idea to test that the warning looks like what you expect it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking fantastic. There's one required fix with the logger but I am sure you'll come up with a sensible solution there so I'll just approve.
membrane_curvature/base.py
Outdated
Notes | ||
----- | ||
The derived surface and calculated curvatures are available in the | ||
``.results``::atttributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mark up the results attribute:
``.results``::atttributes. | |
:attr:`results` attributes. |
membrane_curvature/base.py
Outdated
The derived surface and calculated curvatures are available in the | ||
``.results``::atttributes. | ||
|
||
The attribute :attr:`MembraneCurvature.z_surface` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this inside results
? So
The attribute :attr:`MembraneCurvature.z_surface` | |
The attribute :attr:`MembraneCurvature.results.z_surface` |
If you only want to display z_surface
use the tilde
The attribute :attr:`~MembraneCurvature.results.z_surface`
membrane_curvature/base.py
Outdated
The attribute :attr:`MembraneCurvature.z_surface` | ||
contains the derived surface averaged over the `n_frames` of the trajectory. | ||
|
||
The attributes :attr:`MembraneCurvature.mean_curvature` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
results?
membrane_curvature/base.py
Outdated
contains the derived surface averaged over the `n_frames` of the trajectory. | ||
|
||
The attributes :attr:`MembraneCurvature.mean_curvature` and | ||
:attr:`MembraneCurvature.gaussian_curvature` contain the computed values of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
results?
membrane_curvature/base.py
Outdated
Average of the array elements in `gaussian_curvature` | ||
Example | ||
----------- | ||
You can pass a universe containing your selection of reference: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs double colon to format as code
You can pass a universe containing your selection of reference: | |
You can pass a universe containing your selection of reference:: |
membrane_curvature/base.py
Outdated
|
||
import logging | ||
|
||
logger = logging.getLogger("MDAnalysis.analysis.density") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name your logger differently ... something like
logger = logging.getLogger("MDAnalysis.analysis.density") | |
logger = logging.getLogger("MDAnalysis.MDAKit.membrane_curvature") |
I started your logger's name with "MDAnalysis" so that output gets added to the default logger that can be started with
import MDAnalysis
MDAnalysis.start_logging()
and whose root logger is named "MDAnalysis". Hence any logger named "MDAnalysis.*" will also log under it.
I suggest "MDAnalysis.MDAKit" to define a namespace that all MDAKits could be using but we haven't really thought about this so this could definitely be subject to change. Maybe @IAlibay @lilyminium @richardjgowers @fiona-naughton have opinions.
In any case, for right now, pretty much anything is ok that's not "density" ;-).
That was an important merged PR, well done! |
This PR is an initial suggestion to fix #41
Methods included in the initial analysis base:
__init__
_prepare
_single_frame
_conclude
Some additional comments
This first AnalysisBase was build under the following assumptions:
AtomGroup
remain the same during then_frames
of the trajectory.H
andK
may be jeopardized by the introduction ofnp.nans
. This selection will certainly generatenp.nans
in the calculation of the gradient, which ultimately will affect the averaged grids. Hence, the attempt of warning the user (in__init__
).np.array
ofsurface
,K
, andH
over frames. (in_conclude
)surface
,H
, andK
, is performed in every frame (in_single_frame
).surface
,H
andK
and their normalized grids are initialized (in_prepare
). They are a tuple of np.arrays where the first element contains the respective values, and the second element counts the elements when defined._output
function was introduced. (But I am not sure this is something people would agree with)As discussed in #21 with @lilyminium, this analysis base will potentially replace
core.py
.