frame offset files for all Readers to universaly support random access #314

orbeckst · 2015-06-16T01:01:02Z

I think the persistent frame offsets for Gromacs trajectories are a really neat addition and enable a lot of new analysis workflows that were not really feasible with XTC/TRR before. I'd like to see this framework extended to any of the other readers that do not support random access at the moment (notable exceptions are DCD and netcdf, which can do this automatically – any others?).

If random access is guaranteed to work then it becomes a lot more straight forward to parallelize over trajectory chunks, no matter what the underlying trajectory format is.

There are probably going to be problems with trajectory streams and/or compressed formats (e.g. xyz.bz2) but perhaps someone can think of a solution.

richardjgowers · 2015-06-16T09:47:29Z

I think a good thing to do for this would be to move the offset logic (reading and saving these as made) out of TRR/XTC and maybe into a mixin class which other Readers can then use

dotsdl · 2015-06-16T22:51:55Z

Could we get a checklist of formats that could do with an offsets mechanism? I'd be okay with working on pulling the offsets mechanism out of the xdrfiles TrjReader and generalizing it in a mixin for other Readers later in the summer.

richardjgowers · 2015-06-18T09:27:55Z

TRZ doesn't need to store offsets as the frames are all of equal size, I assume this is how DCD works too?

Uncompressed ASCII formats, (PDB, DL_Poly, XYZ, anything else?) can all save their offsets as they read (this is already done for DL_Poly, just it doesn't store them)

Compressed ASCII might be trickier as Oliver said, but should be possible...... I think the annoying thing is the compressed/uncompressed both go through the same Reader class so any solution needs to handle them immediately.

orbeckst · 2015-06-18T21:44:11Z

@richardjgowers : DCD frames are of fixed size. I think there's a complication with fixed coordinates and perhaps the first frame is different but judging from the comments in the code, fixed coordinates don't work anyway (and I've never used them).

orbeckst · 2015-12-14T20:11:44Z

Note the discussion at #441 regarding the best format to store the offsets on disk. Currently the best solution (according to benchmarks) appears to be numpy.savez.

kain88-de · 2015-12-17T20:34:33Z

Btw Numpy is on the same order of magnitude once you actually read the data. Just doing the dump doesn't read anything. But I'll stick with it because it can handle numpy arrays nicely

https://gist.github.com/kain88-de/5487e9d9c8f7d0e4995b

orbeckst · 2018-07-10T00:25:54Z

The new continuous chain reader code PR #1728 assumes that all readers can do random access, i.e.,

u.trajectory[frame]

I vaguely remember that @kain88-de said in a recent discussion that all our readers can do random access but I cannot find his comment for linking and perhaps I misremember it. EDIT: PR #1086 fixed issue #1081 ("sliced iterators") and it seems it adds random access to any multiframe readers that were not able to do it.

Before closing this issue I'd like to briefly hear from @MDAnalysis/coredevs if I am overlooking something here. Can all our readers do fast random access nowadays? Do we test this anywhere?

And if it is true, we should make sure the docs mention it! (#1827)

richardjgowers · 2018-07-10T00:32:01Z

I think it's true yes. The Readers that didn't use to do this were the ascii based ones, but I hacked those together. I guess it wasn't added to the API because I didn't want it to become a hard requirement even if it was true at the time?

We've also got stream readers, which I've never used, and I'm not sure they can do seeking, I wouldn't imagine so..

orbeckst · 2018-07-10T00:33:08Z

Just saw your comment while I updated mine with your PR #1086.

orbeckst · 2018-07-10T00:36:15Z

I think we can then close this issue. We certainly don't need offset files for all formats.

(We should require any file-based readers to be able to do fast random access, just so that we or rather users are not suddenly experiencing abysmal performance for one trajectory format. But we can discuss this separately.)

orbeckst added enhancement performance persistence Component-Readers labels Jun 16, 2015

orbeckst added this to the 1.0 milestone Jun 16, 2015

richardjgowers mentioned this issue Jun 16, 2015

Cleaned up Reader classes #317

Merged

orbeckst added Priority-Low and removed Priority-Medium labels Jun 20, 2015

richardjgowers mentioned this issue Jul 20, 2015

File descriptors that are opened and closed as needed for Universes #239

Open

richardjgowers mentioned this issue Sep 11, 2015

Added frame seeking to XYZReader #428

Merged

orbeckst mentioned this issue Dec 13, 2015

Cython xdrlib #441

Merged

15 tasks

richardjgowers mentioned this issue Jan 18, 2016

gromacs offsets files can't be checked for correct offsets #631

Closed

dotsdl mentioned this issue Mar 9, 2016

define a MDAnalysis.analysis user interface #719

Closed

jdetle mentioned this issue May 23, 2016

Diffusion Map Implementation in MDAnalysis #857

Closed

orbeckst mentioned this issue Sep 14, 2016

Trajectory slicing should be completely Pythonic #918

Closed

orbeckst mentioned this issue Nov 22, 2016

Support sliced iteration for all Readers #1081

Closed

orbeckst closed this as completed Jul 10, 2018

orbeckst mentioned this issue Nov 8, 2019

Outdated documentation for AMBER TRJReader #2398

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

frame offset files for all Readers to universaly support random access #314

frame offset files for all Readers to universaly support random access #314

orbeckst commented Jun 16, 2015

richardjgowers commented Jun 16, 2015

dotsdl commented Jun 16, 2015

richardjgowers commented Jun 18, 2015

orbeckst commented Jun 18, 2015

orbeckst commented Dec 14, 2015

kain88-de commented Dec 17, 2015

orbeckst commented Jul 10, 2018 •

edited

Loading

richardjgowers commented Jul 10, 2018

orbeckst commented Jul 10, 2018

orbeckst commented Jul 10, 2018

frame offset files for all Readers to universaly support random access #314

frame offset files for all Readers to universaly support random access #314

Comments

orbeckst commented Jun 16, 2015

richardjgowers commented Jun 16, 2015

dotsdl commented Jun 16, 2015

richardjgowers commented Jun 18, 2015

orbeckst commented Jun 18, 2015

orbeckst commented Dec 14, 2015

kain88-de commented Dec 17, 2015

orbeckst commented Jul 10, 2018 • edited Loading

richardjgowers commented Jul 10, 2018

orbeckst commented Jul 10, 2018

orbeckst commented Jul 10, 2018

orbeckst commented Jul 10, 2018 •

edited

Loading