Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

different results of select_atoms in 0.17.0 and 0.16.2 #1795

Closed
tljm opened this issue Mar 1, 2018 · 13 comments
Closed

different results of select_atoms in 0.17.0 and 0.16.2 #1795

tljm opened this issue Mar 1, 2018 · 13 comments

Comments

@tljm
Copy link

tljm commented Mar 1, 2018

I call select_atoms method with following selection: '(resname WAT or resname HOH) and (sphzone 6.0 (resnum 99 147 231 261 289))'

Expected behaviour

I expect to get exactly the same results in 0.16.2 and 0.17.0 - 5 water molecules.

Actual behaviour

Version 0.16.2 gives 5 water molecules.
Version 0.17.0 gives 31 water molecules, completely different molecules.

@tylerjereddy
Copy link
Member

It would be helpful if you could provide a minimum working example for the issue -- for example a small sample input file to go along with this report.

@tljm
Copy link
Author

tljm commented Mar 2, 2018

Yes, sure. I should be more precise. You can reproduce this problem with following code::

import MDAnalysis as mda
U = mda.Universe('atom_selection_problem.pdb','atom_selection_problem.dcd')
U.select_atoms('(resname WAT or resname HOH) and (sphzone 6.0 (resnum 99 or resnum 147 or resnum 231 or resnum 261 or resnum 289))').resnums

Run this code with 0.16.2 and 0.17.0 and compare results. On my system, 0.16.2 gives 328, 411, 444, 495, 835 but 0.17.0 gives 820, 1518, 1633, 2280, 2325, 2518, 2562, 2583, 2924, 3251, 3426, 3543, 3791, 3854, 3871, 4243, 4397, 4463, 4566, 4815, 4932, 5638, 5642, 5734, 6081, 6254, 6374, 6390, 6425, 6513, 6736, 7044, 7353, 7416, 7534, 7608, 7768, 8436, 8438, 8545.

So, in case of 0.17.0 it is 40 residues (not 31). Any way, it looks very strange.

atom_selection_problem.zip

@orbeckst
Copy link
Member

orbeckst commented Mar 6, 2018

Once someone reproduces (ideally with a test file) then we need to label this as defect and fix.

@orbeckst orbeckst added this to the 0.18 milestone Mar 6, 2018
@tylerjereddy tylerjereddy self-assigned this Mar 6, 2018
@tylerjereddy
Copy link
Member

Tentatively assigning myself to take a look for a bit here.

@tylerjereddy
Copy link
Member

I've bisected over roughly two years of history to find this:

4baa8a923bd03ee24e03c32e01b74e010f1d608a is the first bad commit
commit 4baa8a923bd03ee24e03c32e01b74e010f1d608a
Author: Jose Borreguero <[email protected]>
Date:   Mon Dec 11 14:31:07 2017 -0500

    Refs #1731 Spherical zone selection test

:040000 040000 14bbda4d87e974d3da08a90cc720dad185fc93da 02e107da22c317a4820d78da67d98bb8d9547e3d M	package
:040000 040000 6d74b8b380ab473adf1fe58d1704acfee8dc9050 494c99c09554c27c2c4029ad96adfe16ce61b7c1 M	testsuite

@tylerjereddy
Copy link
Member

self.periodic is set to True via flags when _apply_distmat is called within SphericalZoneSelection at the first problematic commit hash. I can remove the "regression" by forcing the value of self.periodic to be None just prior to the usage in _apply_distmat.

I thought we were going to remove those pbc flags in favor of some kid of keywords -- I'll take a look at the same location in code at most recent commit hash.

@orbeckst
Copy link
Member

orbeckst commented Mar 7, 2018

Is the difference applying PBC by default?

Are the additionally found waters across the box's periodic boundary?

@orbeckst
Copy link
Member

orbeckst commented Mar 7, 2018

BTW, @tylerjereddy , please add the defect tag as soon as you conclude that there is bug.

Many thanks for looking into it (and +1 for bisect – did you write a script to test the condition?)

@tylerjereddy
Copy link
Member

The difference is calling center_of_geometry(pbc=periodic) where the presence of ANY box info will switch pbc to True. I haven't gone as far as a visual inspection in VMD.

Currently, we have:

periodic = box is not None

At the crucial location. At first, this looks bad, since a user might want to avoid PBC even if they have box info. However, the value of box is actually verified by checking both that there are dimensions and that self.periodic is True by self.validate_dimensions.

In the DistanceSelection parent class, self.periodic is still set by the flags, even in the latest commit hash, and my testing shows that it is indeed True by default. Does the default True setting surprise you?

As far as I can tell, the first bad commit hash vs. its first ancestor diff does not show any changes to the core pbc flags default -- so my current hypothesis is that these are somehow being set to True by deafult and the activation of the pbc feature for sphzone selections is not necessarily wrong, but is a victim of default pbc activation flags?

By the way, using the latest commit hash build of dev branch in IPython:

In [1]: import MDAnalysis

In [2]: MDAnalysis.core.flags
Out[2]: [('use_periodic_selections', True), ('use_KDTree_routines', 'fast'), ('length_unit', 'Angstrom'), ('use_pbc', False), ('convert_lengths', True), ('force_unit', 'kJ/(mol*Angstrom)'), ('time_unit', 'ps'), ('speed_unit', 'Angstrom/ps'), ('charge_unit', 'e')]

Unless I'm missing something, I propose that we write a unit test to verify that the default use_periodic_selections core flag is False, and that when flags are removed in the future, the test will be mutated accordingly.

I've informally confirmed that using mda.core.flags['use_periodic_selections'] = False at the top of my bisect debug script produces the original behavior.

I note that none of this is commenting on the correctness of either selection from a geometric standpoint, simply addressing the perceived regression.

@tylerjereddy
Copy link
Member

Note that ('use_pbc', False) is default, but that's actually a different flag that is not involved here!

tylerjereddy added a commit that referenced this issue Mar 7, 2018
* Fixes Issue #1795 by setting the
default value of use_periodic_selections
flag to False.

* Added corresponding unit test
@richardjgowers
Copy link
Member

From what I can tell, the bug here is that calling COG with pbc is wrong and doesn't do what you need. Eg for a molecule straddling the box boundaries, the COG is somewhere in the center of the box and has no relation to the position of the molecule. This is very similar to #1760

@orbeckst orbeckst added the PBC label Mar 7, 2018
tylerjereddy added a commit that referenced this issue Mar 9, 2018
* sphzone selections adjusted to address
Issue #1795

* corresponding unit test modified to
prevent regression only; proper handling
of pbc and wrapping matters is delayed
for now
tylerjereddy added a commit that referenced this issue Mar 13, 2018
@orbeckst orbeckst modified the milestones: 0.18, 0.18.x May 1, 2018
@richardjgowers richardjgowers mentioned this issue Aug 17, 2018
9 tasks
@richardjgowers
Copy link
Member

@tylerjereddy I had a quick look at this, I think this is still as issue this line shouldn't be using pbc=True ever.

If I try and do the selection manually, I get 15 molecules, which is annoyingly neither of the previously reported answers..

g1 = u.select_atoms('resname WAT HOH')
g2 = u.select_atoms('resnum 99 147 231 261 289')
center = g2.center_of_geometry()
da = mda.lib.distances.distance_array(g1.positions, center)
len(np.where(da < 6.0)[0])

Unless I'm misunderstanding what a sphzone does

I'll do some time travelling later to see where differences started happening...

@tljm
Copy link
Author

tljm commented Sep 6, 2018

@richardjgowers I think that manual selection is correct. You are getting 15 atoms which is 5 water molecules.

If you try something like:

np.unique(g1.atoms[np.where(da < 6.0)[0]].resids)

you will get:

[328, 411, 444, 495, 835]

which is, I believe, correct answer.

richardjgowers added a commit that referenced this issue Sep 14, 2018
SphericalZone and SphericalLayer no longer shift atoms to inside
primary unit cell when calculating center of reference group
@richardjgowers richardjgowers mentioned this issue Sep 14, 2018
4 tasks
richardjgowers added a commit that referenced this issue Sep 21, 2018
* fixes #1795

SphericalZone and SphericalLayer no longer shift atoms to inside
primary unit cell when calculating center of reference group

* additional tests for sphzone
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants