Add arbitrary TopologyAttr selection #2927

Merged
merged 17 commits into from
Dec 10, 2020

Conversation

lilyminium
Member

@lilyminium lilyminium commented Sep 1, 2020

Fixes #2925
Fixes #2875
Fixes #3054

Changes made in this Pull Request:

  • added a class factory to subclass core.selection.Selection for each TopologyAttr, provided that the token has not already been defined in _SELECTIONDICT (i.e. explicitly defined classes in selection.py take precedence)
  • added tokens to core.selection.SameSelection for "same xx as" selection
  • added FloatRangeSelection and BoolSelection
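
The precedence rule in the first bullet can be illustrated with a minimal sketch. This is not the code from this PR: `SELECTION_REGISTRY`, `make_attr_selection` and the toy atom dictionaries are hypothetical stand-ins for `_SELECTIONDICT`, the class factory and real topology data. It only shows how an explicitly defined token could win over an auto-generated one.

```python
# Hypothetical stand-in for _SELECTIONDICT: maps a selection token to a class.
SELECTION_REGISTRY = {}


class Selection:
    """Toy base class; subclasses register themselves under ``token``."""
    token = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.token is not None:
            SELECTION_REGISTRY[cls.token] = cls


class ResnameSelection(Selection):
    """Explicitly written selection; it should take precedence."""
    token = "resname"

    def apply(self, atoms, value):
        return [a for a in atoms if a["resname"] == value]


def make_attr_selection(attrname):
    """Create a Selection subclass for ``attrname`` unless the token exists."""
    if attrname in SELECTION_REGISTRY:
        return SELECTION_REGISTRY[attrname]

    def apply(self, atoms, value):
        return [a for a in atoms if a[attrname] == value]

    name = f"{attrname.capitalize()}Selection"
    # type() builds the subclass; __init_subclass__ registers its token.
    return type(name, (Selection,), {"token": attrname, "apply": apply})


atoms = [
    {"resname": "ALA", "element": "C"},
    {"resname": "GLY", "element": "N"},
]
element_sel = make_attr_selection("element")   # generated on the fly
resname_sel = make_attr_selection("resname")   # returns the explicit class
selected = element_sel().apply(atoms, "N")
```

In this sketch the factory never overwrites an existing entry, so hand-written selections keep any special parsing they need.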

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks

pep8speaks commented Sep 1, 2020

Hello @lilyminium! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 149:13: E125 continuation line with same indent as next logical line
Line 149:13: E128 continuation line under-indented for visual indent
Line 1524:1: W293 blank line contains whitespace

Line 1292:1: E302 expected 2 blank lines, found 1
Line 1318:80: E501 line too long (82 > 79 characters)
Line 1319:80: E501 line too long (82 > 79 characters)
Line 1320:80: E501 line too long (82 > 79 characters)
Line 1321:80: E501 line too long (85 > 79 characters)

Comment last updated at 2020-12-08 00:45:53 UTC

@lilyminium lilyminium force-pushed the topattr-token branch 3 times, most recently from 96d3241 to e965e9b Compare September 1, 2020 13:20
Member

@richardjgowers richardjgowers left a comment

This looks good. Is it strictly equivalent to our current Selections? Ie can we also delete many of the existing ones?

@lilyminium
Member Author

lilyminium commented Sep 1, 2020 via email

    thismask = vals >= lower
    thismask &= vals <= upper
else:
    thismask = vals == lower
Member

@orbeckst orbeckst Sep 1, 2020

Equivalence is iffy for floats. I'd be almost inclined to raise an error and tell users that they need to do a range for floats. However, I also see how users will get annoyed when "val == 1" works and "val == 1.0" raises an exception.

So maybe just document that equality is problematic...
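
A small standard-library illustration of why equality is problematic for floats, and of the two alternatives raised in this thread (a tolerance-based comparison and a range). The tolerance values are arbitrary choices for the example, not values taken from this PR.

```python
import math

# 0.1 + 0.2 is not exactly representable, so strict equality fails.
total = 0.1 + 0.2
exact_match = total == 0.3

# A tolerance-based comparison treats the two values as equal.
close_match = math.isclose(total, 0.3, rel_tol=1e-9, abs_tol=0.0)

# A range comparison also captures the value without relying on equality.
in_range = 0.29 <= total <= 0.31
```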

Member

@orbeckst orbeckst left a comment

This looks cool. I have some comments/questions inline (on number ranges mostly). It will need some minimal documentation in the doc strings and in the online docs (and the UG...). How can we make it easy for users to understand which TopologyAttrs are available for selections? Perhaps autogenerate all of them and add an asterisk that says that this is only available for certain topologies?

@lilyminium lilyminium force-pushed the topattr-token branch 6 times, most recently from 0601443 to f2d2c5f Compare September 2, 2020 05:46
@codecov

codecov bot commented Sep 2, 2020

Codecov Report

Merging #2927 (f1cb79e) into develop (cf75dd6) will increase coverage by 0.00%.
The diff coverage is 97.01%.

@@           Coverage Diff            @@
##           develop    #2927   +/-   ##
========================================
  Coverage    93.09%   93.09%           
========================================
  Files          186      186           
  Lines        24665    24736   +71     
  Branches      3196     3216   +20     
========================================
+ Hits         22961    23028   +67     
- Misses        1656     1660    +4     
  Partials        48       48           
Impacted Files Coverage Δ
package/MDAnalysis/core/selection.py 98.75% <96.55%> (-0.50%) ⬇️
package/MDAnalysis/core/groups.py 98.58% <100.00%> (-0.01%) ⬇️
package/MDAnalysis/core/topologyattrs.py 96.66% <100.00%> (+0.04%) ⬆️

@lilyminium lilyminium requested a review from orbeckst September 8, 2020 12:27
Member

@orbeckst orbeckst left a comment

Neat. My only real concern is documentation, both for developers and users. I don't have any great suggestions but would like to hear your (and others') thoughts.

def test_bool_sel():
    pytest.importorskip("rdkit.Chem")
    u = MDAnalysis.Universe.from_smiles("Nc1cc(C[C@H]([O-])C=O)c[nH]1")
    assert len(u.select_atoms("aromaticity")) == 5
Member

I don't like this selection name, is this what it is already called? I think 'aromatic'/'not aromatic' is correct

Member Author

It's because "aromaticity" is the name of the topology attribute and it just gets automatically added. I think it's the only Boolean selection we have right now, which is why it's in the test. There is also a separately defined selection with the "aromatic" token.

"""Range selection for float values"""

pattern = f"({FLOAT_PATTERN}){RANGE_PATTERN}({FLOAT_PATTERN})"
dtype = float
Member

Maybe for matching floats we should be using a fuzzier match, something like np.almost_equal? Or would that just create bad results for positions?

Member

I don't really want to encourage equality comparisons for floats. IIRC there's already a warning in the docs that for floats one should do range comparisons.

But if you think that we get less grief by having some "almost equal" matching for floats then we could do it.
What would we choose as tolerance? What algorithm for comparison (using only absolute, only relative, or whatever np.allclose() does where we set the default values)?

Or we could introduce a ~ operator for "almost equal matching". Or to be very onerous, the unicode symbol U+2248 ALMOST EQUAL TO .

Member Author

What would we choose as tolerance?

I made this the user's problem by adding rtol and atol keywords. I don't know how to get around things like the below though (or if we should be trying):

>>> 0.3000000000000000000000000 == 0.30000000000000001
True
>>> 0.3000000000000000000000000 is 0.30000000000000001
True
>>> 0.30000000000000001 == 3/10
True
>>> 0.30000000000000001 is 3/10
True

Member Author

@lilyminium lilyminium Sep 25, 2020

This also brings up what to do with the PropertySelection ("prop mass == 0.3") and SameSelection ("same mass as (mass 0.3)"). For now I changed the former to use np.isclose but the latter checks for value membership with equality (np.in1d)

edit: except for positions, which is np.isclose
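
To make the difference concrete, here is a pure-Python sketch that contrasts a tolerance-based comparison with an exact membership test. `is_close` is a hypothetical helper that reimplements the test documented for `numpy.isclose` (`|a - b| <= atol + rtol * |b|`, with NumPy's default tolerances); the defaults used in this PR may differ, and the mass values are made up.

```python
def is_close(a, b, rtol=1e-05, atol=1e-08):
    """Same test that numpy.isclose documents: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


masses = [0.1 + 0.2, 0.3, 12.011]   # first entry is 0.30000000000000004

# "prop mass == 0.3"-style comparison with a tolerance
close_hits = [m for m in masses if is_close(m, 0.3)]

# "same mass as (mass 0.3)"-style membership test with strict equality
reference = {0.3}
exact_hits = [m for m in masses if m in reference]
```

The tolerance-based test picks up both near-0.3 values, while the membership test only finds the value that is bit-for-bit equal to the reference.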

Comment on lines 875 to 878
# thismask |= np.isclose(vals, lower, atol=self.atol,
# rtol=self.rtol)
# thismask |= np.isclose(vals, upper, atol=self.atol,
# rtol=self.rtol)
Member Author

Thoughts on this? The integer range selection includes the ends; should float range selections?

Member

Yes, then let's be consistent. (The clearer version for floats would be strict inequalities, i.e., an open interval, but my feeling is that users would rather go for "this value or more" or "this value or less" and don't want to include their own epsilon... "this value minus epsilon or more".)

Member

I am going back and forth on using isclose() for the endpoints.

But I am now tending to just use <= and >= and then it behaves exactly as normal comparison operators in Python. I think that's less confusing than the special case where we would make the endpoints fuzzy.

Other opinions welcome! – @jbarnoud @richardjgowers @zemanj ?

Member Author

I like that solution, but I've kept it in a separate commit (fc68f03) in case people disagree :-)
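
For reference, the inclusive behaviour discussed in this thread amounts to something like the following sketch. `inclusive_range_mask` and the charge values are hypothetical; the point is only that plain `<=` and `>=` keep both endpoints, exactly as ordinary Python comparisons would.

```python
def inclusive_range_mask(values, lower, upper):
    """Boolean mask for lower <= value <= upper, endpoints included."""
    return [lower <= v <= upper for v in values]


charges = [-1.0, -0.5, 0.0, 0.5, 1.0]
mask = inclusive_range_mask(charges, -0.5, 0.5)
selected = [c for c, keep in zip(charges, mask) if keep]
```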

@lilyminium lilyminium force-pushed the topattr-token branch 3 times, most recently from c033715 to 44e9895 Compare October 8, 2020 05:26
Member

@orbeckst orbeckst left a comment

Thank you for all the extra docs. Very helpful!

The main thing is to switch to inclusive float selections (at least I favor them, with simple <= and >=) – despite the iffiness with floats, I think this is more in-line with what users would try to use.

Otherwise I have various minor comments.

I am blocking it because of the question on how to finally do float ranges.

("mass -5--3", 2, {}), # select -5 to -3
("mass -5 --3", 2, {}), # spacing
("mass -5- -3", 2, {}), # spacing
("mass -3 : -5", 0, {}), # wrong way around
Member

So this selects nothing rather than raising an error? I'm trying to think what this should do, maybe it should automatically flip the values to select -5 : -3

Member Author

I think @orbeckst and I discussed this, and we decided to follow the behaviour of VMD etc which just select nothing when the ranges are the wrong way around.
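
A rough sketch of that behaviour, also covering the negative numbers, scientific notation and "to" delimiter exercised by the tests above. The regular expressions and the `parse_range` and `select` helpers are assumptions made for illustration; the patterns in the PR itself may differ.

```python
import re

# Hypothetical patterns; the ones in the PR may differ.
FLOAT = r"-?\d*\.?\d+(?:[eE][-+]?\d+)?"
DELIM = r"\s*(?:to|[:-])\s*"
RANGE_RE = re.compile(rf"^\s*({FLOAT}){DELIM}({FLOAT})\s*$")


def parse_range(text):
    """Return (lower, upper) as floats, in the order they were written."""
    match = RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a range: {text!r}")
    return float(match.group(1)), float(match.group(2))


def select(values, text):
    lower, upper = parse_range(text)
    # No swapping: a reversed range simply matches nothing.
    return [v for v in values if lower <= v <= upper]


masses = [-5.0, -3.0, 0.0, 2.0]
forward = select(masses, "-5--3")
spaced = select(masses, "-5 --3")
reversed_range = select(masses, "-3 : -5")
sci = select(masses, "-1e1 to 0")
```

Because the bounds are used in the order given, the reversed range falls out as an empty selection with no special-casing.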



@pytest.mark.parametrize("selstr, n_atoms", [
("aromaticity", 5),
Member

So I think aromaticity is the wrong word for this; it should be aromatic / not aromatic. Should there be some way of overriding the default keyword used by the TopologyAttrMeta, or should exceptions just require a Selection class to be manually written (current behaviour)?

Member Author

aromatic still exists, the metaclass just dumbly checks whether the attrname (aromaticity in this case) exists in _SELECTIONDICT; possibly if we switched the contents of _SELECTIONDICT to be classes instead, we could avoid the redundancy? As it is we could just not document weird quirks like this?

To me the options are:

  • change the attribute name to "aromatic"
  • change _SELECTIONDICT to have classes instead
  • do nothing, hope people don't notice the double up
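
The double up can be reproduced with a toy registry. Here `registry` and `register_attr` are hypothetical stand-ins for `_SELECTIONDICT` and the metaclass check, respectively; the sketch assumes the check compares only the attribute name against existing tokens.

```python
# Hypothetical registry standing in for _SELECTIONDICT (token -> description).
registry = {"aromatic": "hand-written selection"}


def register_attr(attrname):
    """Auto-register a token for attrname unless that exact name exists."""
    if attrname not in registry:
        registry[attrname] = f"auto-generated selection for {attrname!r}"


register_attr("aromaticity")   # differs from "aromatic", so it is added
tokens = sorted(registry)
```

Under the first option in the list, the attribute would be named "aromatic", the check would find the existing token, and nothing new would be registered.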

Member

@orbeckst orbeckst left a comment

My only major issue is that the new "to" range operator is not mentioned in CHANGELOG.

I'll happily approve as soon as this is added – please just ping me again.

Great addition to the selection system!!

@lilyminium lilyminium merged commit 7746207 into MDAnalysis:develop Dec 10, 2020
@lilyminium lilyminium deleted the topattr-token branch December 10, 2020 01:02
@lilyminium
Member Author

I've merged for now, @richardjgowers, we can modify the behaviour of aromaticity in a future PR?

cbouy pushed a commit to cbouy/mdanalysis that referenced this pull request Jan 12, 2021
* Add arbitrary TopologyAttr selection (MDAnalysis#2927)

Fixes MDAnalysis#2925 
Fixes MDAnalysis#2875
Fixes MDAnalysis#3054 

Changes made in this Pull Request:
 - added a class factory to subclass `core.selection.Selection` for each TopologyAttr
 - added tokens to `core.selection.SameSelection`
 - added `FloatRangeSelection` and `BoolSelection`
 - added negatives, scientific notation and "to" delimiter for ranges

* Add ReadTheDocs configuration for PR builds (MDAnalysis#3060)

 - Adds RTD configuration
 - Add `environment.yml` for package installation

* Remove appveyor

* Install MDAnalysis on ReadTheDocs via pip (MDAnalysis#3071)

Install via `pip install package/` to build current docs on ReadTheDocs

* try stringio

* rm metals file

* pin pytest

* pin pytest on gh actions

* Fixes RMSF docstring (Issue MDAnalysis#2806) (MDAnalysis#3033)

Fixes the RMSF docstring's align command and adds transformation to make the results accurate

* MAINT: simplify guessers regex (MDAnalysis#3085)

* the `SYMBOLS` regex in `guessers.py` does not require
any escape sequences because the metacharacters are inactive
in the character class (this includes the range metacharacter
when placed at the start or end of the character class)

* MAINT: char class regex improve

* avoid the overhead of a regex character class
when that character class has only a single character
(i.e., serves no purpose)

* there is only one instance of this in MDA codebase
discovered by my [scraping
code](https://github.com/tylerjereddy/regex-improve)

* for a longer explanation see my similar changes in
NumPy codebase:
numpy/numpy#18083

* Fix syntax warning over comparison of literals using is.

* Quick fix for atommethods to return empty residue group (MDAnalysis#3089)

Returns empty residue group for _get_prev_residues_by_resid and _get_next_residues_by_resid

* Add to authors list.

Co-authored-by: Lily Wang
Co-authored-by: IAlibay
Co-authored-by: Tyler Reddy
Co-authored-by: Irfan Alibay
Co-authored-by: Oliver Beckstein
Co-authored-by: Karthikeyan Singaravelan
Co-authored-by: Aditya Kamath
PicoCentauri pushed a commit to PicoCentauri/mdanalysis that referenced this pull request Mar 30, 2021
@IAlibay IAlibay mentioned this pull request May 13, 2021
Development

Successfully merging this pull request may close these issues.

Handling of negative residues Allow selection with elements attribute Allow select_atoms to select chain
6 participants