-
Notifications
You must be signed in to change notification settings - Fork 666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add arbitrary TopologyAttr selection #2927
Conversation
Hello @lilyminium! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-12-08 00:45:53 UTC |
96d3241
to
e965e9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Is it strictly equivalent to our current Selections? Ie can we also delete many of the existing ones?
Yeah we should be able to remove many of the selection classes that just subclass StringSelection or RangeSelection :-)
… On 1 Sep 2020, at 23:25, Richard Gowers ***@***.***> wrote:
@richardjgowers commented on this pull request.
This looks good. Is it strictly equivalent to our current Selections? Ie can we also delete many of the existing ones?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
package/MDAnalysis/core/selection.py
Outdated
thismask = vals >= lower | ||
thismask &= vals <= upper | ||
else: | ||
thismask = vals == lower |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Equivalence is iffy for floats. I'd be almost inclined to raise an error and tell users that they need to do a range for floats. However, I also see how users will get annoyed when "val == 1" works and "val == 1.0" raises an exception.
So maybe just document that equality is problematic...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks cool. I have some comments/questions inline (on number ranges mostly). It will need some minimal documentation in the doc strings and in the online docs (and the UG...). How can we make it easy for users to understand which TopologyAttrs are available for selections? Perhaps autogenerate all of them and add an asterisk that says that this is only available for certain topologies?
0601443
to
f2d2c5f
Compare
Codecov Report
@@ Coverage Diff @@
## develop #2927 +/- ##
========================================
Coverage 93.09% 93.09%
========================================
Files 186 186
Lines 24665 24736 +71
Branches 3196 3216 +20
========================================
+ Hits 22961 23028 +67
- Misses 1656 1660 +4
Partials 48 48
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat. My only real concern is documentation, both for developers and users. I don't have any great suggestions but would like to hear yours (and other's) thoughts.
a881f87
to
333b70e
Compare
def test_bool_sel(): | ||
pytest.importorskip("rdkit.Chem") | ||
u = MDAnalysis.Universe.from_smiles("Nc1cc(C[C@H]([O-])C=O)c[nH]1") | ||
assert len(u.select_atoms("aromaticity")) == 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this selection name, is this what it is already called? I think 'aromatic'/'not aromatic' is correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because "aromaticity" is the name of the topology attribute and it just gets automatically added. I think it's the only Boolean selection we have right now, which is why it's in the test. There is also a separately defined topology attribute with the "aromatic" token
"""Range selection for float values""" | ||
|
||
pattern = f"({FLOAT_PATTERN}){RANGE_PATTERN}({FLOAT_PATTERN})" | ||
dtype = float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe for matching floats we should be using a fuzzier match, something like np.almost_equal? Or would that just create bad results for positions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really want to encourage equality comparisons for floats. IIRC there's already a warning in the docs that for floats one should do range comparisons.
But if you think that we get less grief by having some "almost equal" matching for floats then we could do it.
What would we choose as tolerance? What algorithm for comparison (using only absolute, only relative, or whatever np.allclose() does where we set the default values?
Or we could introduce a ~
operator for "almost equal matching". Or to be very onerous, the unicode symbol U+2248 ALMOST EQUAL TO ≈
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would we choose as tolerance?
I made this the user's problem by adding rtol
and atol
keywords. I don't know how to get around things like the below though (or if we should be trying):
>>> 0.3000000000000000000000000 == 0.30000000000000001
True
>>> 0.3000000000000000000000000 is 0.30000000000000001
True
>>> 0.30000000000000001 == 3/10
True
>>> 0.30000000000000001 is 3/10
True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also brings up what to do with the PropertySelection ("prop mass == 0.3") and SameSelection ("same mass as (mass 0.3)"). For now I changed the former to use np.isclose
but the latter checks for value membership with equality (np.in1d
)
edit: except for positions, which is np.isclose
63bb636
to
9406aac
Compare
package/MDAnalysis/core/selection.py
Outdated
# thismask |= np.isclose(vals, lower, atol=self.atol, | ||
# rtol=self.rtol) | ||
# thismask |= np.isclose(vals, upper, atol=self.atol, | ||
# rtol=self.rtol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on this? The integer range selection includes the ends; should float range selections?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, then let's be consistent. (The clearer version for floats would be strict inequalities, i.e., an open interval, but my feeling is that users rather go for "this value or more" or "this value or less" and don't want to include their own epsilon... "this value minus epsilon or more".)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going back and forth on using isclose()
for the endpoints.
But I am now tending to just use <=
and >=
and then it behaves exactly as normal comparison operators in Python. I think that's less confusing than the special case where we would make the endpoints fuzzy.
Other opinions welcome! – @jbarnoud @richardjgowers @zemanj ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that solution, but I've kept it in a separate commit (fc68f03) in case people disagree :-)
c033715
to
44e9895
Compare
44e9895
to
a5e98d1
Compare
8dd5e33
to
3384126
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all the extra docs. Very helpful!
The main thing is to switch to inclusive float selections (at least I favor them, with simple <=
and >=
) – despite the iffiness with floats, I think this is more in-line with what users would try to use.
Otherwise I have various minor comments.
I am blocking it because of the question on how to finally to float ranges.
package/MDAnalysis/core/selection.py
Outdated
# thismask |= np.isclose(vals, lower, atol=self.atol, | ||
# rtol=self.rtol) | ||
# thismask |= np.isclose(vals, upper, atol=self.atol, | ||
# rtol=self.rtol) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going back and forth on using isclose()
for the endpoints.
But I am now tending to just use <=
and >=
and then it behaves exactly as normal comparison operators in Python. I think that's less confusing than the special case where we would make the endpoints fuzzy.
Other opinions welcome! – @jbarnoud @richardjgowers @zemanj ?
4759da0
to
4db7800
Compare
4db7800
to
bf81063
Compare
("mass -5--3", 2, {}), # select -5 to -3 | ||
("mass -5 --3", 2, {}), # spacing | ||
("mass -5- -3", 2, {}), # spacing | ||
("mass -3 : -5", 0, {}), # wrong way around |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this selects nothing rather than raise an error? I'm trying to think what this should do, maybe it should automatically flip the values to select -3 : -5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @orbeckst and I discussed this, and we decided to follow the behaviour of VMD etc which just select nothing when the ranges are the wrong way around.
|
||
|
||
@pytest.mark.parametrize("selstr, n_atoms", [ | ||
("aromaticity", 5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think aromaticity
is the wrong word for this, should be not aromatic
. Should there be some way of overriding the default keyword used by the TopologyAttrMeta or should exceptions just require a Selection class to be manually written (current behaviour)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aromatic
still exists, the metaclass just dumbly checks whether the attrname (aromaticity in this case) exists in _SELECTIONDICT
; possibly if we switched the contents of _SELECTIONDICT to be classes instead, we could avoid the redundancy? As it is we could just not document weird quirks like this?
To me the options are:
- change the attribute name to "aromatic"
- change _SELECTIONDICT to have classes instead
- do nothing, hope people don't notice the double up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only major issue is that the new "to" range operator is not mentioned in CHANGELOG.
I'll happily approve as soon this is added – please just ping me again.
Great addition to the selection system!!
I've merged for now, @richardjgowers, we can modify the behaviour of |
* Add arbitrary TopologyAttr selection (MDAnalysis#2927) Fixes MDAnalysis#2925 Fixes MDAnalysis#2875 Fixes MDAnalysis#3054 Changes made in this Pull Request: - added a class factory to subclass `core.selection.Selection` for each TopologyAttr - added tokens to `core.selection.SameSelection` - added `FloatRangeSelection` and `BoolSelection` - added negatives, scientific notation and "to" delimiter for ranges * Add ReadTheDocs configuration for PR builds (MDAnalysis#3060) - Adds RTD configuration - Add `environment.yml` for package installation * Remove appveyor * Install MDAnalysis on ReadTheDocs via pip (MDAnalysis#3071) Install via `pip install package/` to build current docs on ReadTheDocs * try stringio * rm metals file * pin pytest * pin pytest on gh actions * Fixes RMSF docstring (Issue MDAnalysis#2806) (MDAnalysis#3033) Fixes the RMSF docstring's align command and adds transformation to make the results accurate * MAINT: simplify guessers regex (MDAnalysis#3085) * the `SYMBOLS` regex in `guessers.py` does not require any escape sequences because the metacharacters are inactive in the character class (this includes the range metacharacter when placed at the start or end of the character class) * MAINT: char class regex improve * avoid the overhead of a regex character class when that character class has only a single character (i.e., serves no purpose) * there is only one instance of this in MDA codebase discovered by my [scraping code](https://github.com/tylerjereddy/regex-improve) * for a longer explanation see my similar changes in NumPy codebase: numpy/numpy#18083 * Fix syntax warning over comparison of literals using is. * Quick fix for atommethods to return empty residue group (MDAnalysis#3089) Returns empty residue group for _get_prev_residues_by_resid and _get_next_residues_by_resid * Add to authors list. Co-authored-by: Lily Wang <[email protected]> Co-authored-by: IAlibay <[email protected]> Co-authored-by: Tyler Reddy <[email protected]> Co-authored-by: Lily Wang <[email protected]> Co-authored-by: Irfan Alibay <[email protected]> Co-authored-by: Oliver Beckstein <[email protected]> Co-authored-by: Karthikeyan Singaravelan <[email protected]> Co-authored-by: Aditya Kamath <[email protected]>
Fixes MDAnalysis#2925 Fixes MDAnalysis#2875 Fixes MDAnalysis#3054 Changes made in this Pull Request: - added a class factory to subclass `core.selection.Selection` for each TopologyAttr - added tokens to `core.selection.SameSelection` - added `FloatRangeSelection` and `BoolSelection` - added negatives, scientific notation and "to" delimiter for ranges
Fixes #2925
Fixes #2875
Fixes #3054
Changes made in this Pull Request:
core.selection.Selection
for each TopologyAttr, providing that the token has not already been defined in_SELECTIONDICT
(i.e. explicitly defined classes inselection.py
take precedence)core.selection.SameSelection
for "same xx as" selectionFloatRangeSelection
andBoolSelection
PR Checklist