-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
groupby method for GroupBase #1112
Conversation
Not sure why my branch has stupid merge commits...hmmm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Will need some tests for different data types (object, int, float). There's MDAnalysisTests.core.groupbase.make_Universe
to mock a Universe with arbitrary attributes.
With floats, the comparison between them might be a little fragile (esp. for something like derived quantities), so you could use numpy.isclose
rather than ==
.
Once/If we have NaN values for missing values, we'll need to handle this here too as NaN != NaN
, so maybe add a comment about that so we don't forget
@@ -816,6 +816,24 @@ def wrap(self, compound="atoms", center="com", box=None): | |||
if not all(s == 0.0): | |||
o.atoms.translate(s) | |||
|
|||
def groupby(self, topattr): | |||
"""Obtain groupings of the components of this Group according the values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a one liner, so something like 'group according to a given attribute'
---------- | ||
topattr: str | ||
Topology attribute to group components by. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an example to the docstring too (just what you put in the PR would do)
Unique values of the topology attribute as keys, Groups as values. | ||
|
||
""" | ||
return {i: self[self.__getattribute__(topattr) == i] for i in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we're fooling around with __getattr__
methods in our classes, it might be safer to use getattr(self, topattr)
which is more like calling self.topattr
(rather than directly going to the class method). I'm not sure if calling the class method directly could bypass something
How come that Otherwise the groupby is a nice idea. They seems to allow a lot of sub selections on AtomGroups, selecting residues could then be done with just a groupby `ag.groupby('residues')'. |
For floating point fields lke masses we should allow a |
Where should general docs for this go? I would say they belong to selections. There we can also add some tips about common misconceptions of the segment and residues selections on the python objects (not using a |
Shouldn't this depend on the precision of the data type? You could default to maybe |
We could do that. But then you also have to compare the types and choose the epsilon of the least accurate type. Then there is also the issue that we should actually use the accuracy of the file format the value originally comes from. For example the occupancy entries in the PDB are only accurate upto 2 decimal places. I thought having a single |
@kain88-de if you have a system with 4 atoms, 2 residues of size 2, and an AtomGroup with atoms 2&3 (see picture), then you can go upwards to Residues and not leave the scope of the AtomGroup, but if you come downwards from Residues then you will leave the AtomGroup. So Or another way of thinking about it, the memory of the original selection is kept going upwards because there is only many-to-one mappings, whereas downwards there is one-to-many mappings. |
I concede the point – let the user determine the level of equivalence. For XTC coordinates, 1e-3... Note that float32 epsilon is In [4]: numpy.finfo(dtype=numpy.float32).eps
Out[4]: 1.1920929e-07
In [6]: numpy.finfo(dtype=numpy.float32).precision
Out[6]: 6 so I'd be a bit more generous with the default, say 1e-07 or even 1e-06. |
Regarding docs #1112 (comment):
The 0.15.0 docs had a pretty sizable narrative on using and working with the different levels in the container hierarchy under Fundamental building blocks — MDAnalysis.core.AtomGroup which seems to have been removed from the 0.16.0 docs during the topology reboot. It might be worthwhile putting that back in a separate reST file in the sphinx doc near the beginning (not under the new Core objects: Containers — MDAnalysis.core.groups which can stay lean and API-focused). This would then also be a good place for some the docs discussed here. |
I noticed we have a |
@dotsdl & @richardjgowers what still needs to be done here? |
Tests! |
Sorry all, been traveling for the holiday. It's everything at once these days I'm afraid. :/ |
Ok I've finished this up if someone else wants to review it |
I think you need to rebase the branch against develop because it still contains "stupid merge commits" #1112 (comment) |
I think if you just hit the magic squash and merge button it'll make it all into a single @dotsdl commit which is fine |
So this is the "minimal" groupby without |
@richardjgowers or @kain88-de I leave it to you to squash (or do any last minute fixes if needed). |
@richardjgowers You can decide. I don't have time now to review this. |
reduced number of getattr calls
435ba94
to
3deb8bd
Compare
Addresses #1105
Changes made in this Pull Request:
groupby
method toGroupBase
that allows for familiar groupby operations on single topology attributes a lapandas
ordatreant
e.g.
and then we can do:
before merging, it would be real nice if multiple attributes could be given so that groupings can be done on combinations of values.
PR Checklist