-
Notifications
You must be signed in to change notification settings - Fork 659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] u.transfer_to_memory also transfers the box #1537
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests!
@@ -278,10 +278,9 @@ def __init__(self, coordinate_array, order='fac', | |||
"array ({})" | |||
.format(provided_n_atoms, self.n_atoms)) | |||
|
|||
self.dimensions = dimensions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd store this as _dimensions_array
to avoid someone accidentally using it thinking it was ts.dimensions
?
@@ -397,6 +396,11 @@ def _read_next_timestep(self, ts=None): | |||
[self.ts.frame] + | |||
[slice(None)]*(2-f_index)) | |||
ts.positions = self.coordinate_array[basic_slice] | |||
if self.dimensions is not None: | |||
if len(self.dimensions) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.dimensions = [np.array([1, 2, 3, 4, 5, 6])]
this won't work
package/MDAnalysis/core/universe.py
Outdated
@@ -494,8 +495,22 @@ def transfer_to_memory(self, start=None, stop=None, step=None, | |||
coordinates = [] # TODO: use pre-allocated array | |||
for i, ts in enumerate(self.trajectory[start:stop:step]): | |||
coordinates.append(np.copy(ts.positions)) | |||
dimensions.append(np.copy(ts.dimensions)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annoyingly this will copy the box even for non NPT systems, but I guess it's a small cost compared to the number of atoms that will be floating around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reliable way to detect a NPT system? I could detect boxes that are [0, 0, 0, 90, 90, 90]
but I am not sure it is much better than copying the box.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's no big deal, and probably safer to do it this way
package/MDAnalysis/core/universe.py
Outdated
'copied to memory (frame {frame})') | ||
pm = ProgressMeter(n_frames, interval=1, | ||
verbose=verbose, format=pm_format) | ||
for i, ts in enumerate(self.trajectory[start:stop:step]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repeated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes from an issue I have with Readers that implement timeseries
(i.e. the DCD reader): the timeseries
method only returns the coordinates. So whatever the method I use for to get the coordinates, I still need to go through the trajectory to get the box.
I see 3 ways of handling this:
- a
timeseries
like method for the box to go with the one for the positions - having the box reading loginc twice: one if the
timeseries
is used that will iterate over the trajectory only to get the box, and one in the existing loop for readers that have notimeseries
- what I did: having the logic only once, but iterating twice in some cases
Solution 1 is the best but implies to add a chunk to the reader, 2 the most efficient for little new development, 3 is the simplest and avoid logic duplication.
I am not sure what is the best approach here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the box info reader to the timeseries PR #1400?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a good idea, indeed. Especially since, it would allow to just ditch the loop in transfer to memory.
I have one doubt, though. The way I see the API, I would add a ProtoReader.timeseries_dimensions
method. This would mean that in all cases transfer_to-memory
will iterate twice over the trajectory. This can be a performance issue with slow readers.
Any idea before I dive in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeseries function gets a keyword that tells it the information we would like to retrieve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Factor of 2 in trajectory reading makes a difference for processing large data sets. I/O is probably the biggest issue in speeding up analysis (and getting good parallel performance). I would very much like to have ways to use the fastest approaches that are possible and just fall back to slow solutions if we wouldn't have the feature otherwise.
I think the point of a high-level library such as MDA is that it handles these cases seamlessly. (Otherwise, what's the point vs just using the raw lib.formats readers.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the _timeseries_positions
approach should take care of taking advantage of full-speed when possible.
We must be very clear about this in both the general and the DCD-specific Reader docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if #1400 gets done, then we can just call Reader.timeseries
trajectory.timeseries
and that will use a fast version if a child class implements something fancy (DCD) or fall back onto essentially what is written here.
@kain88-de why do we need an xarray here? Everything is nice and square?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardjgowers, just for the record: organization-wise I think it makes more sense for specialized children to override Reader.timeseries
and then themselves do the choice of whether to run their own fancy code or fall back to Reader.timeseries
. This way we don't need DCD-specific checks under the base Reader
class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry, that's what I meant. 1400 implements Reader.timeseries
which DCD overrides later.
This commit makes the memory reader deal with a non-constant box dimensions and makes `Universe.transfer_to_memory copy the box in addition to the positions. Until now, only one box was stored in the memory reader, making it unpractical for NPT systems.
4c18ae9
to
2bf90a2
Compare
I added a |
This commit makes the memory reader deal with a non-constant box
dimensions and makes `Universe.transfer_to_memory copy the box in
addition to the positions.
Until now, only one box was stored in the memory reader, making it
impractical for NPT systems.
Fixes #1041
PR Checklist