Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slicing fixes #916

Merged
merged 28 commits into from
Aug 2, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
28 commits
Select commit Hold shift + click to select a range
93b52b0
Reformatting commit
jdetle Jul 29, 2016
9a58b1b
RMSF fix
jdetle Jul 27, 2016
8145421
Fixed DCD defaults and documentation.
jdetle Jul 27, 2016
0649737
Fixed analysis docs
jdetle Jul 27, 2016
9e7d439
Check_slice index fix
jdetle Jul 27, 2016
e9b6586
DCD fix
jdetle Jul 27, 2016
04dc85d
Changed RMSFarray to reflect correct data
jdetle Jul 27, 2016
1983709
Updated changelog [skip ci]
jdetle Jul 27, 2016
cb5852e
Added test for slicing, slowly changing DCD source for timeseries sli…
jdetle Jul 27, 2016
1b1d4f2
DCD.py changes
jdetle Jul 27, 2016
320b1dc
Everything working.
jdetle Jul 27, 2016
55910a0
Change to ++
jdetle Jul 28, 2016
83c7e4f
Elaborated on docs
jdetle Jul 28, 2016
beffbea
Added generator but it appears to not be working, please help.
jdetle Jul 28, 2016
8e24480
Deprecated skip for step
jdetle Jul 28, 2016
d687cc7
Changed to warnings.warn() because decorator wasnt working
jdetle Jul 29, 2016
6893abd
Fixed silly correl bug
jdetle Jul 29, 2016
6b27588
Fixed issue caused by integer division, when stop-start / nframes has…
jdetle Jul 29, 2016
32fce06
Reduced number of tests by a signifcant margin, removed test of test
jdetle Jul 30, 2016
68ae897
Fixed incomplete logic
jdetle Jul 30, 2016
6f9e57d
Changes to DCD correl functions
jdetle Jul 31, 2016
e476ee5
Changed to logic to take advantage of new default, simplify code.
jdetle Jul 31, 2016
d036859
I made a silly mistake in Timeseries
jdetle Jul 31, 2016
0c268ed
More testing
jdetle Jul 31, 2016
d3d4dad
Added comments.
jdetle Jul 31, 2016
6833e3a
Undo comment removal [skip ci]
jdetle Jul 31, 2016
76ec586
Added tests that cause failures
jdetle Aug 1, 2016
0f62787
Added known failures
jdetle Aug 1, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,8 @@ Fixes
* Display of Deprecation warnings doesn't affect other modules anymore (Issue #754)
* Changed nframes to n_frames in analysis modules for consistency (Issue #890)
* fixed incompatibility with newer matplotlib in analysis.hole

* Fixed modules that improperly documented and/or used frame slicing
defaults (#914)
Changes
* Added protected variable _frame_index to to keep track of frame iteration
number in AnalysisBase
Expand Down
8 changes: 5 additions & 3 deletions package/MDAnalysis/analysis/contacts.py
Original file line number Diff line number Diff line change
Expand Up @@ -396,11 +396,13 @@ def __init__(self, u, selection, refgroup, method="hard_cut", radius=4.5,
dictionary of additional kwargs passed to `method`. Check
respective functions for reasonable values.
start : int, optional
First frame of trajectory to analyse, Default: 0
First frame of trajectory to analyse, Default: None becomes 0.
stop : int, optional
Last frame of trajectory to analyse, Default: -1
Frame index to stop analysis. Default: None becomes
n_frames. Iteration stops *before* this frame number,
which means that the trajectory would be read until the end.
step : int, optional
Step between frames to analyse, Default: 1
Step between frames to analyse, Default: None becomes 1.

"""
self.u = u
Expand Down
8 changes: 5 additions & 3 deletions package/MDAnalysis/analysis/diffusionmap.py
Original file line number Diff line number Diff line change
Expand Up @@ -227,11 +227,13 @@ def __init__(self, u, select='all', metric=rmsd, cutoff=1E0-5,
weights : array, optional
Weights to be given to coordinates for metric calculation
start : int, optional
First frame of trajectory to analyse, Default: 0
First frame of trajectory to analyse, Default: None becomes 0.
stop : int, optional
Last frame of trajectory to analyse, Default: -1
Frame index to stop analysis. Default: None becomes
n_frames. Iteration stops *before* this frame number,
which means that the trajectory would be read until the end.
step : int, optional
Step between frames to analyse, Default: 1
Step between frames to analyse, Default: None becomes 1.
"""
self._u = u
traj = self._u.trajectory
Expand Down
8 changes: 5 additions & 3 deletions package/MDAnalysis/analysis/polymer.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,11 +59,13 @@ def __init__(self, atomgroups, **kwargs):
List of atomgroups. Each atomgroup should represent a single
polymer chain, ordered in the correct order.
start : int, optional
First frame of trajectory to analyse, Default: 0
First frame of trajectory to analyse, Default: None becomes 0.
stop : int, optional
Last frame of trajectory to analyse, Default: -1
Last frame of trajectory to analyse, Default: None becomes
n_frames.
step : int, optional
Step between frames to analyse, Default: 1
Frame index to stop analysis. Default: None becomes
n_frames. Iteration stops *before* this frame number.
"""
super(PersistenceLength, self).__init__(
atomgroups[0].universe.trajectory, **kwargs)
Expand Down
12 changes: 8 additions & 4 deletions package/MDAnalysis/analysis/rms.py
Original file line number Diff line number Diff line change
Expand Up @@ -556,7 +556,7 @@ def __init__(self, atomgroup):
self.atomgroup = atomgroup
self._rmsf = None

def run(self, start=0, stop=-1, step=1, progout=10, quiet=False):
def run(self, start=None, stop=None, step=None, progout=10, quiet=False):
"""Calculate RMSF of given atoms across a trajectory.

This method implements an algorithm for computing sums of squares while
Expand All @@ -565,11 +565,13 @@ def run(self, start=0, stop=-1, step=1, progout=10, quiet=False):
Parameters
----------
start : int (optional)
starting frame [0]
starting frame, default None becomes 0.
stop : int (optional)
stopping frame [-1]
Frame index to stop analysis. Default: None becomes
n_frames. Iteration stops *before* this frame number,
which means that the trajectory would be read until the end.
step : int (optional)
step between frames [1]
step between frames, default None becomes 1.
progout : int (optional)
number of frames to iterate through between updates to progress
output; ``None`` for no updates [10]
Expand All @@ -583,6 +585,8 @@ def run(self, start=0, stop=-1, step=1, progout=10, quiet=False):
Calculating Corrected Sums of Squares and Products." Technometrics
4(3):419-420.
"""
traj = self.atomgroup.universe.trajectory
start, stop, step = traj.check_slice_indices(start, stop, step)
sumsquares = np.zeros((self.atomgroup.n_atoms, 3))
means = np.array(sumsquares)

Expand Down
43 changes: 33 additions & 10 deletions package/MDAnalysis/coordinates/DCD.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@
import numpy as np
import struct
import types
import warnings

from ..core import flags
from .. import units as mdaunits # use mdaunits instead of units to avoid a clash
Expand All @@ -85,6 +86,7 @@
from . import dcdtimeseries



class Timestep(base.Timestep):
#: Indices into :attr:`Timestep._unitcell` (``[A, gamma, B, beta, alpha,
#: C]``, provided by the :class:`DCDReader` C code) to pull out
Expand Down Expand Up @@ -498,22 +500,33 @@ def _read_frame(self, frame):
ts.frame = frame
return ts

def timeseries(self, asel, start=0, stop=-1, skip=1, format='afc'):
def timeseries(self, asel, start=None, stop=None, step=None, skip=None,
format='afc'):
"""Return a subset of coordinate data for an AtomGroup

:Arguments:
*asel*
:class:`~MDAnalysis.core.AtomGroup.AtomGroup` object
*start, stop, skip*
range of trajectory to access, start and stop are inclusive
*start, stop, step*
A range of the trajectory to access, with start being inclusive
and stop being exclusive.
*format*
the order/shape of the return data array, corresponding
to (a)tom, (f)rame, (c)oordinates all six combinations
of 'a', 'f', 'c' are allowed ie "fac" - return array
where the shape is (frame, number of atoms,
coordinates)
:Deprecated:
*skip*
Skip has been deprecated in favor of the standard keyword step.
"""
start, stop, skip = self.check_slice_indices(start, stop, skip)
if skip is not None:
Copy link
Member

@orbeckst orbeckst Jul 28, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here add a deprecation warning, skip to be removed in 1.0.

Also add a .. versionchanged:: 0.16.0 that explains how skip is now step and behaves differently; you can add a link to issue #818.

step = skip
warnings.warn("Skip is deprecated and will be removed in"
"in 1.0. Use step instead.",
category=DeprecationWarning)

start, stop, step = self.check_slice_indices(start, stop, step)
if len(asel) == 0:
raise NoDataError("Timeseries requires at least one atom to analyze")
if len(format) != 3 and format not in ['afc', 'acf', 'caf', 'cfa', 'fac', 'fca']:
Expand All @@ -522,26 +535,36 @@ def timeseries(self, asel, start=0, stop=-1, skip=1, format='afc'):
# Check if the atom numbers can be grouped for efficiency, then we can read partial buffers
# from trajectory file instead of an entire timestep
# XXX needs to be implemented
return self._read_timeseries(atom_numbers, start, stop, skip, format)
return self._read_timeseries(atom_numbers, start, stop, step, format)

def correl(self, timeseries, start=0, stop=-1, skip=1):
def correl(self, timeseries, start=None, stop=None, step=None, skip=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you test that step in correl behaves the way it should? correl has a different code base for reading, see coordinates/dcdtimeseries.pyx.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not addressed this, thanks.

Copy link
Contributor Author

@jdetle jdetle Jul 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orbeckst When you are saying 'behaves the way it should', you mean that start stop and step reflect the same indices in a trajectory as numpy array slicing? If this is what you mean, it definitely does not, I can go in and change the pyrex code to do that in the same fashion that I did for dcd.c. With that being said, it is more complicated to test, I would have to think about going about doing that for a little bit.

The second interpretation of what you said, just handing the new defaults correctly, also hasn't been addressed, and is easier to test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, all of these functions should behave consistently, eg in the same way as in timeseries and general trajectory slicing. If we are doing the cleanup everywhere else then we need to do it here, to.

On 30 Jul, 2016, at 13:39, John Detlefs [email protected] wrote:

@orbeckst When you are saying 'behaves the way it should, you mean that start stop and step reflect the same indices in a trajectory as numpy array slicing? If this is what you mean, It definitely does not, I can go in and change the pyrex code to do that in the same fashion that I did fordcd.c`. With that being said, it is more complicated to test, I would have to think about going about doing that for a little bit.

Oliver Beckstein * [email protected]
skype: orbeckst * [email protected]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. This PR was a bit of a pandora's box I guess 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 30 Jul, 2016, at 14:51, John Detlefs [email protected] wrote:

Alright. This PR was a bit of a pandora's box I guess 😃

More like a hornets' nest.

But yes, you're right. Much more than "yeah, just fix the docs".

Oliver Beckstein * [email protected]
skype: orbeckst * [email protected]

"""Populate a TimeseriesCollection object with timeseries computed from the trajectory

:Arguments:
*timeseries*
:class:`MDAnalysis.core.Timeseries.TimeseriesCollection`
*start, stop, skip*
subset of trajectory to use, with start and stop being inclusive
*start, stop, step*
A subset of the trajectory to use, with start being inclusive
and stop being exclusive.
:Deprecated:
*skip*
Skip has been deprecated in favor of the standard keyword step.
"""
start, stop, skip = self.check_slice_indices(start, stop, skip)
if skip is not None:
step = skip
warnings.warn("Skip is deprecated and will be removed in"
"in 1.0. Use step instead.",
category=DeprecationWarning)

start, stop, step = self.check_slice_indices(start, stop, step)
atomlist = timeseries._getAtomList()
format = timeseries._getFormat()
lowerb, upperb = timeseries._getBounds()
sizedata = timeseries._getDataSize()
atomcounts = timeseries._getAtomCounts()
auxdata = timeseries._getAuxData()
return self._read_timecorrel(atomlist, atomcounts, format, auxdata,
sizedata, lowerb, upperb, start, stop, skip)
sizedata, lowerb, upperb, start, stop, step)

def close(self):
if self.dcdfile is not None:
Expand Down
2 changes: 1 addition & 1 deletion package/MDAnalysis/coordinates/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -1230,7 +1230,7 @@ def check_slice_indices(self, start, stop, step):
----------
start, stop, step : int or None
Values representing the slice indices.
Can use `None` to use defaults of (0, -1, and 1)
Can use `None` to use defaults of (0, n_frames, and 1)
respectively.

Returns
Expand Down
30 changes: 22 additions & 8 deletions package/MDAnalysis/coordinates/dcdtimeseries.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@ cdef extern from "correl.h":

import numpy as np

def __read_timecorrel(object self, object atoms, object atomcounts, object format, object auxdata, int sizedata, int lowerb, int upperb, int start, int stop, int skip):
def __read_timecorrel(object self, object atoms, object atomcounts, object format, object auxdata, int sizedata, int lowerb, int upperb, int start, int stop, int step):
cdef dcdhandle* dcd
cdef numpy.ndarray atomlist, atomcountslist, auxlist
cdef numpy.ndarray data, temp
Expand All @@ -69,8 +69,9 @@ def __read_timecorrel(object self, object atoms, object atomcounts, object forma

dcd = <dcdhandle*>PyCObject_AsVoidPtr(self._dcd_C_ptr)
cdef int n_frames
if (stop == -1): stop = dcd.nsets
n_frames = (stop-start+1) / skip

n_frames = ((stop-start) / step);
if (stop-start) % step > 0 : n_frames += 1;
cdef int numdata
numdata = len(format)
if numdata==0:
Expand Down Expand Up @@ -109,18 +110,31 @@ def __read_timecorrel(object self, object atoms, object atomcounts, object forma
cdef int index, numskip
cdef int i, j
cdef float unitcell[6]
cdef int remaining_frames = stop-start
numskip = 0
for i from 0 <= i < n_frames:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment said "We should check for fixed atoms here but we have not implemented it".

I don't think that you are addressing fixed atoms anywhere (it's an obscure feature in CHARMM DCD files where only the first frame contains the positions of the frozen atoms and all later frames do not have these positions, which are unchanging.) Therefore, please leave this comment in.

if (skip > 1):
if (step > 1 and i > 0):
# Check if we have fixed atoms
# XXX not done
numskip = skip - (dcd.setsread % skip) - 1
# Figure out how many steps to step over, if step = n, np array
# slicing treats this as skip over n-1, read the nth.
numskip = step - 1
# If the number to skip is greater than the number of frames left
# to be jumped over, just take one more step to reflect np slicing
# if there is a remainder, guaranteed to have at least one more
# frame.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good, thanks.

if(remaining_frames < numskip):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Can you add a comment to explain, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a comment, thanks.

Copy link
Member

@orbeckst orbeckst Jul 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks – I am interested in the summary of thoughts that go into these few lines (not "subtract one from step and call it numskip"....). Clearly, this is one of the key points where you put thought into so for others to quickly catch up it is important to get them on the right track when reading the code.

numskip = 1

rc = skip_dcdstep(dcd.fd, dcd.natoms, dcd.nfixed, dcd.charmm, numskip)
if (rc < 0):
raise IOError("Error skipping frame from DCD file")
dcd.setsread = dcd.setsread + numskip
# on first iteration, numskip = 0, first set is always read.
dcd.setsread += numskip
rc = read_dcdsubset(dcd.fd, dcd.natoms, lowerb, upperb, tempX, tempY, tempZ, unitcell, dcd.nfixed, dcd.first, dcd.freeind, dcd.fixedcoords, dcd.reverse, dcd.charmm)
dcd.first=0
dcd.setsread = dcd.setsread + 1
dcd.first = 0
dcd.setsread += 1
remaining_frames = stop - dcd.setsread
if (rc < 0):
raise IOError("Error reading frame from DCD file")
# Copy into data array based on format
Expand Down
Loading