Add arbitrary TopologyAttr selection #2927

lilyminium · 2020-09-01T13:08:19Z

Fixes #2925
Fixes #2875
Fixes #3054

Changes made in this Pull Request:

added a class factory to subclass core.selection.Selection for each TopologyAttr, provided that the token has not already been defined in _SELECTIONDICT (i.e. explicitly defined classes in selection.py take precedence)
added tokens to core.selection.SameSelection for "same xx as" selection
added FloatRangeSelection and BoolSelection

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

pep8speaks · 2020-09-01T13:08:25Z

Hello @lilyminium! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file package/MDAnalysis/core/selection.py:

Line 149:13: E125 continuation line with same indent as next logical line
Line 149:13: E128 continuation line under-indented for visual indent
Line 1524:1: W293 blank line contains whitespace

In the file testsuite/MDAnalysisTests/core/test_atomselections.py:

Line 1292:1: E302 expected 2 blank lines, found 1
Line 1318:80: E501 line too long (82 > 79 characters)
Line 1319:80: E501 line too long (82 > 79 characters)
Line 1320:80: E501 line too long (82 > 79 characters)
Line 1321:80: E501 line too long (85 > 79 characters)

Comment last updated at 2020-12-08 00:45:53 UTC

richardjgowers

This looks good. Is it strictly equivalent to our current Selections? I.e., can we also delete many of the existing ones?

lilyminium · 2020-09-01T13:49:32Z

Yeah we should be able to remove many of the selection classes that just subclass StringSelection or RangeSelection :-)

orbeckst · 2020-09-01T19:00:03Z

package/MDAnalysis/core/selection.py

+                thismask = vals >= lower
+                thismask &= vals <= upper
+            else:
+                thismask = vals == lower


Equality is iffy for floats. I'd almost be inclined to raise an error and tell users that they need to use a range for floats. However, I also see how users would get annoyed if "val == 1" worked and "val == 1.0" raised an exception.

So maybe just document that equality is problematic...
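To illustrate the concern: a value that was computed rather than typed in may not compare equal to its decimal literal, while a narrow range still catches it. A minimal NumPy sketch with made-up values; `vals` stands in for a per-atom float attribute, and the two-step range mask mirrors the quoted diff.

```python
import numpy as np

# Made-up per-atom float values; the first one is computed, not typed in.
vals = np.array([0.1 + 0.2, 0.3, 1.0])

# Equality, as in the single-value branch of the diff: 0.1 + 0.2 is
# 0.30000000000000004 in binary floating point, so it does not match 0.3.
eq_mask = vals == 0.3

# A narrow, end-inclusive range around the target catches both values.
lower, upper = 0.29, 0.31
range_mask = vals >= lower
range_mask &= vals <= upper
```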

orbeckst

This looks cool. I have some comments/questions inline (on number ranges mostly). It will need some minimal documentation in the docstrings and in the online docs (and the User Guide...). How can we make it easy for users to understand which TopologyAttrs are available for selections? Perhaps autogenerate all of them and add an asterisk that says that this is only available for certain topologies?

codecov · 2020-09-02T06:46:15Z

Codecov Report

Merging #2927 (f1cb79e) into develop (cf75dd6) will increase coverage by 0.00%.
The diff coverage is 97.01%.

@@           Coverage Diff            @@
##           develop    #2927   +/-   ##
========================================
  Coverage    93.09%   93.09%           
========================================
  Files          186      186           
  Lines        24665    24736   +71     
  Branches      3196     3216   +20     
========================================
+ Hits         22961    23028   +67     
- Misses        1656     1660    +4     
  Partials        48       48

Impacted Files	Coverage Δ
package/MDAnalysis/core/selection.py	`98.75% <96.55%> (-0.50%)`	⬇️
package/MDAnalysis/core/groups.py	`98.58% <100.00%> (-0.01%)`	⬇️
package/MDAnalysis/core/topologyattrs.py	`96.66% <100.00%> (+0.04%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update cf75dd6...f1cb79e.

orbeckst

Neat. My only real concern is documentation, both for developers and users. I don't have any great suggestions but would like to hear your (and others') thoughts.

richardjgowers · 2020-09-23T07:54:30Z

testsuite/MDAnalysisTests/core/test_atomselections.py

+def test_bool_sel():
+    pytest.importorskip("rdkit.Chem")
+    u = MDAnalysis.Universe.from_smiles("Nc1cc(C[C@H]([O-])C=O)c[nH]1")
+    assert len(u.select_atoms("aromaticity")) == 5


I don't like this selection name. Is this what it is already called? I think 'aromatic'/'not aromatic' is correct.

It's because "aromaticity" is the name of the topology attribute and it just gets automatically added. I think it's the only Boolean selection we have right now, which is why it's in the test. There is also a separately defined selection with the "aromatic" token.
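For reference, a Boolean selection like the one in test_bool_sel reduces to masking on the attribute's truth values. A minimal sketch with made-up values (not RDKit output); `bool_selection_mask` and its `want` parameter are hypothetical and not part of MDAnalysis.

```python
import numpy as np

# Made-up per-atom Boolean attribute, e.g. aromaticity flags.
aromaticities = np.array([True, True, False, True, True, True, False])


def bool_selection_mask(values, want=True):
    """Select atoms whose Boolean attribute equals `want`."""
    return np.asarray(values, dtype=bool) == want


selected = bool_selection_mask(aromaticities)  # what "aromaticity" would pick
inverted = ~selected                           # what "not aromaticity" would pick
```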

richardjgowers · 2020-09-23T07:56:36Z

package/MDAnalysis/core/selection.py

+    """Range selection for float values"""
+
+    pattern = f"({FLOAT_PATTERN}){RANGE_PATTERN}({FLOAT_PATTERN})"
+    dtype = float


Maybe for matching floats we should be using a fuzzier match, something like np.isclose? Or would that just create bad results for positions?

I don't really want to encourage equality comparisons for floats. IIRC there's already a warning in the docs that for floats one should do range comparisons.

But if you think that we get less grief by having some "almost equal" matching for floats then we could do it.
What would we choose as tolerance? What algorithm for comparison (using only absolute, only relative, or whatever np.allclose() does where we set the default values)?

Or we could introduce a ~ operator for "almost equal matching". Or to be very onerous, the unicode symbol U+2248 ALMOST EQUAL TO ≈.

What would we choose as tolerance?

I made this the user's problem by adding rtol and atol keywords. I don't know how to get around things like the below though (or if we should be trying):

>>> 0.3000000000000000000000000 == 0.30000000000000001
True
>>> 0.3000000000000000000000000 is 0.30000000000000001
True
>>> 0.30000000000000001 == 3/10
True
>>> 0.30000000000000001 is 3/10
True

This also brings up what to do with the PropertySelection ("prop mass == 0.3") and SameSelection ("same mass as (mass 0.3)"). For now I changed the former to use np.isclose, but the latter still checks value membership with exact equality (np.in1d).

edit: except for positions, which is np.isclose
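The difference between the two approaches above can be shown with NumPy alone. In this sketch (made-up masses), the property-style comparison uses np.isclose, which tests |a - b| <= atol + rtol * |b| elementwise, while the same-as-style check uses exact membership; np.isin is used here as the current equivalent of np.in1d for 1-D input. The tolerant membership variant is one possible alternative, not what the PR implements.

```python
import numpy as np

masses = np.array([0.1 + 0.2, 0.3, 0.31, 12.011])

# Property-style comparison ("prop mass == 0.3") with explicit tolerances;
# these are NumPy's default rtol/atol values.
prop_mask = np.isclose(masses, 0.3, rtol=1e-05, atol=1e-08)

# Same-as-style comparison: exact membership in the reference values.
reference = np.array([0.3])
same_exact = np.isin(masses, reference)

# A tolerant alternative: compare every value with every reference value.
same_fuzzy = np.isclose(masses[:, None], reference[None, :]).any(axis=1)

# A looser user-supplied tolerance also catches 0.31.
loose_mask = np.isclose(masses, 0.3, atol=0.02)
```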

lilyminium · 2020-09-25T02:51:39Z

package/MDAnalysis/core/selection.py

+                # thismask |= np.isclose(vals, lower, atol=self.atol,
+                #                        rtol=self.rtol)
+                # thismask |= np.isclose(vals, upper, atol=self.atol,
+                #                        rtol=self.rtol)


Thoughts on this? The integer range selection includes the ends; should float range selections?

Yes, then let's be consistent. (The clearer version for floats would be strict inequalities, i.e., an open interval, but my feeling is that users would rather go for "this value or more" or "this value or less" and don't want to include their own epsilon... "this value minus epsilon or more".)

I am going back and forth on using isclose() for the endpoints.

But I am now tending to just use <= and >= and then it behaves exactly as normal comparison operators in Python. I think that's less confusing than the special case where we would make the endpoints fuzzy.

Other opinions welcome! – @jbarnoud @richardjgowers @zemanj ?

I like that solution, but I've kept it in a separate commit (fc68f03) in case people disagree :-)
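The two endpoint behaviours discussed in this thread differ only for values that land a rounding error outside a bound. A minimal sketch with made-up values: `inclusive` follows the commit "change float range selection to be end-inclusive with >= and <=", and `fuzzy_ends` mirrors the commented-out np.isclose() lines, here with NumPy's default tolerances instead of self.rtol/self.atol.

```python
import numpy as np

# 0.7 - 0.4 evaluates to 0.29999999999999993, just below 0.3.
vals = np.array([0.7 - 0.4, 0.5, 0.8])
lower, upper = 0.3, 0.7

# Plain end-inclusive range: behaves like Python's own <= and >=.
inclusive = (vals >= lower) & (vals <= upper)

# Fuzzy endpoints: additionally accept values close to either bound.
fuzzy_ends = inclusive | np.isclose(vals, lower) | np.isclose(vals, upper)
```

With plain comparisons the first value is excluded, just as `0.7 - 0.4 >= 0.3` is False in Python; that predictability is the argument for the simpler rule.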

orbeckst

Thank you for all the extra docs. Very helpful!

The main thing is to switch to inclusive float selections (at least I favor them, with simple <= and >=) – despite the iffiness with floats, I think this is more in line with what users would try to use.

Otherwise I have various minor comments.

I am blocking it because of the open question on how to finally do float ranges.

orbeckst · 2020-12-04T23:14:16Z

richardjgowers · 2020-12-05T16:53:19Z

testsuite/MDAnalysisTests/core/test_atomselections.py

+    ("mass -5--3", 2, {}),  # select -5 to -3
+    ("mass -5 --3", 2, {}),  # spacing
+    ("mass -5- -3", 2, {}),  # spacing
+    ("mass -3 : -5", 0, {}),  # wrong way around


So this selects nothing rather than raising an error? I'm trying to think what this should do; maybe it should automatically flip the values to select -5 : -3.

I think @orbeckst and I discussed this, and we decided to follow the behaviour of VMD etc., which just selects nothing when the range is the wrong way around.
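A sketch of how the range part of such selections could be parsed, covering negatives, scientific notation, the ":" and "to" delimiters, optional spacing, and the VMD-style "wrong way around selects nothing" behaviour. The regular expressions and function names are hypothetical and are not FLOAT_PATTERN/RANGE_PATTERN from selection.py; the masses are made up so that the counts match the parametrized cases above.

```python
import re

import numpy as np

# Hypothetical patterns for illustration only.
FLOAT = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
# "-", ":" or "to" between two numbers, with optional whitespace around it.
RANGE = re.compile(rf"^\s*({FLOAT})\s*(?:-|:|to)\s*({FLOAT})\s*$")


def parse_range(text):
    """Return (lower, upper) for a range string, or None if it is not a range."""
    match = RANGE.match(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def range_mask(vals, text):
    """End-inclusive range mask; lower > upper is not swapped and matches nothing."""
    lower, upper = parse_range(text)
    return (vals >= lower) & (vals <= upper)


# Made-up masses chosen so that two of them lie in [-5, -3].
masses = np.array([-5.5, -4.5, -3.5, -2.0, 1.0])
```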

richardjgowers · 2020-12-05T16:58:28Z

testsuite/MDAnalysisTests/core/test_atomselections.py

+
+
+@pytest.mark.parametrize("selstr, n_atoms", [
+    ("aromaticity", 5),


So I think aromaticity is the wrong word for this; it should be 'aromatic'/'not aromatic'. Should there be some way of overriding the default keyword used by the TopologyAttrMeta, or should exceptions just require a Selection class to be written manually (current behaviour)?

"aromatic" still exists; the metaclass just dumbly checks whether the attrname (aromaticity in this case) exists in _SELECTIONDICT. Possibly, if we switched the contents of _SELECTIONDICT to be classes instead, we could avoid the redundancy? As it is, we could just not document weird quirks like this?

To me the options are:

change the attribute name to "aromatic"

change _SELECTIONDICT to have classes instead

do nothing, hope people don't notice the double up
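The second option could look roughly like this: if _SELECTIONDICT held classes, and each class recorded which attribute it reads, the generator could skip attributes that already have a hand-written selection under a different token. Everything here is hypothetical, including the `field` class attribute and the `needs_generated_selection` helper, and registration uses `__init_subclass__` rather than a metaclass for brevity.

```python
# Hypothetical registry that stores classes, each recording the attribute it reads.
_SELECTIONDICT = {}


class Selection:
    token = None  # keyword used in selection strings
    field = None  # hypothetical: name of the attribute the selection reads

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if cls.token is not None:
            _SELECTIONDICT[cls.token] = cls


class AromaticSelection(Selection):
    token = "aromatic"
    field = "aromaticity"


def needs_generated_selection(attrname):
    """Generate a selection only if neither the token nor the attribute
    is already covered by a registered class."""
    if attrname in _SELECTIONDICT:
        return False
    return all(cls.field != attrname for cls in _SELECTIONDICT.values())
```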

orbeckst

My only major issue is that the new "to" range operator is not mentioned in CHANGELOG.

I'll happily approve as soon as this is added – please just ping me again.

Great addition to the selection system!!

lilyminium · 2020-12-10T01:02:41Z

I've merged for now. @richardjgowers, can we modify the behaviour of aromaticity in a future PR?

* Add arbitrary TopologyAttr selection (MDAnalysis#2927)

  Fixes MDAnalysis#2925
  Fixes MDAnalysis#2875
  Fixes MDAnalysis#3054

  Changes made in this Pull Request:
  - added a class factory to subclass `core.selection.Selection` for each TopologyAttr
  - added tokens to `core.selection.SameSelection`
  - added `FloatRangeSelection` and `BoolSelection`
  - added negatives, scientific notation and "to" delimiter for ranges

* Add ReadTheDocs configuration for PR builds (MDAnalysis#3060)
  - Adds RTD configuration
  - Add `environment.yml` for package installation

* Remove appveyor

* Install MDAnalysis on ReadTheDocs via pip (MDAnalysis#3071)
  Install via `pip install package/` to build current docs on ReadTheDocs

* try stringio

* rm metals file

* pin pytest

* pin pytest on gh actions

* Fixes RMSF docstring (Issue MDAnalysis#2806) (MDAnalysis#3033)
  Fixes the RMSF docstring's align command and adds transformation to make the results accurate

* MAINT: simplify guessers regex (MDAnalysis#3085)
  * the `SYMBOLS` regex in `guessers.py` does not require any escape sequences because the metacharacters are inactive in the character class (this includes the range metacharacter when placed at the start or end of the character class)

* MAINT: char class regex improve
  * avoid the overhead of a regex character class when that character class has only a single character (i.e., serves no purpose)
  * there is only one instance of this in MDA codebase discovered by my [scraping code](https://github.com/tylerjereddy/regex-improve)
  * for a longer explanation see my similar changes in NumPy codebase: numpy/numpy#18083

* Fix syntax warning over comparison of literals using is.

* Quick fix for atommethods to return empty residue group (MDAnalysis#3089)
  Returns empty residue group for _get_prev_residues_by_resid and _get_next_residues_by_resid

* Add to authors list.

Co-authored-by: Lily Wang
Co-authored-by: IAlibay
Co-authored-by: Tyler Reddy
Co-authored-by: Lily Wang
Co-authored-by: Irfan Alibay
Co-authored-by: Oliver Beckstein
Co-authored-by: Karthikeyan Singaravelan
Co-authored-by: Aditya Kamath

Fixes MDAnalysis#2925
Fixes MDAnalysis#2875
Fixes MDAnalysis#3054

Changes made in this Pull Request:
- added a class factory to subclass `core.selection.Selection` for each TopologyAttr
- added tokens to `core.selection.SameSelection`
- added `FloatRangeSelection` and `BoolSelection`
- added negatives, scientific notation and "to" delimiter for ranges

lilyminium force-pushed the topattr-token branch 3 times, most recently from 96d3241 to e965e9b on September 1, 2020 13:20

richardjgowers reviewed Sep 1, 2020

orbeckst reviewed Sep 1, 2020

lilyminium force-pushed the topattr-token branch 6 times, most recently from 0601443 to f2d2c5f on September 2, 2020 05:46

lilyminium commented Sep 2, 2020

package/MDAnalysis/core/selection.py (resolved)

lilyminium requested a review from orbeckst September 8, 2020 12:27

orbeckst requested changes Sep 8, 2020

lilyminium force-pushed the topattr-token branch from a881f87 to 333b70e on September 23, 2020 04:48

richardjgowers reviewed Sep 23, 2020

lilyminium force-pushed the topattr-token branch from 63bb636 to 9406aac on September 25, 2020 02:49

lilyminium commented Sep 25, 2020

lilyminium force-pushed the topattr-token branch 3 times, most recently from c033715 to 44e9895 on October 8, 2020 05:26

lilyminium added 5 commits December 4, 2020 14:04

add arbitrary topologyattr selection

587cb13

update RangeSelection to parse spaces

978f95c

removed extra classes

2bc7958

explicitly add classes to module for pickle

eead880

add tests for warnings/errors

8945c11

lilyminium added 2 commits December 4, 2020 14:05

raw strings

7418fcc

use int selection pattern in resid selection

a5e98d1

lilyminium force-pushed the topattr-token branch from 44e9895 to a5e98d1 on December 4, 2020 03:25

pep8

3384126

lilyminium force-pushed the topattr-token branch from 8dd5e33 to 3384126 on December 4, 2020 04:13

lilyminium requested a review from orbeckst December 4, 2020 21:05

orbeckst requested changes Dec 4, 2020

lilyminium force-pushed the topattr-token branch from 4759da0 to 4db7800 on December 5, 2020 06:53

review comments

bf81063

lilyminium force-pushed the topattr-token branch from 4db7800 to bf81063 on December 5, 2020 06:54

change float range selection to be end-inclusive with >= and <=

fc68f03

richardjgowers reviewed Dec 5, 2020

testsuite/MDAnalysisTests/core/test_atomselections.py (resolved)

richardjgowers reviewed Dec 5, 2020

testsuite/MDAnalysisTests/core/test_atomselections.py (resolved)

richardjgowers reviewed Dec 5, 2020

add more tests for spacing

587fdfb

lilyminium requested review from orbeckst and richardjgowers December 7, 2020 22:45

orbeckst requested changes Dec 8, 2020

package/CHANGELOG (resolved)

package/MDAnalysis/core/groups.py (outdated, resolved)

better changelog

f1cb79e

orbeckst approved these changes Dec 8, 2020

RMeli mentioned this pull request Dec 8, 2020 in MOL2 parser populates elements attribute #3063 (merged)

lilyminium merged commit 7746207 into MDAnalysis:develop Dec 10, 2020

lilyminium deleted the topattr-token branch December 10, 2020 01:02

IAlibay mentioned this pull request May 13, 2021 in WIP: Master merge #3299 (merged)

fiona-naughton added Component-Selections enhancement labels Sep 25, 2023
