XDR file seeks and tells working again on large files (closes #677) #678
```
@@ -27,6 +27,7 @@ cdef extern from 'include/xdrfile.h':
    XDRFILE* xdrfile_open (char * path, char * mode)
    int xdrfile_close (XDRFILE * xfp)
    int xdr_seek(XDRFILE *xfp, int64_t pos, int whence)
    int64_t xdr_tell(XDRFILE *xfp)
```
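Why the `int64_t` in these declarations matters: a byte offset such as 4294835832, which is printed for frame 25138 in the session below, does not fit in a 32-bit `int`. A quick ctypes illustration of what happens if such an offset is stored in a C `int` versus an `int64_t`. This is not part of the PR, and the helper names `as_int32`/`as_int64` are made up:

```python
import ctypes

# Byte offset of frame 25138, as printed in the IPython session in this thread.
offset = 4294835832

def as_int32(value):
    """Store `value` in a signed 32-bit C int; ctypes keeps only the low 32 bits."""
    return ctypes.c_int32(value).value

def as_int64(value):
    """Store `value` in a signed 64-bit C integer (int64_t)."""
    return ctypes.c_int64(value).value

print(offset > 2**31 - 1)          # True: past the largest signed 32-bit value (~2 GiB)
print(as_int32(offset))            # -131464: the position wraps to a negative number
print(as_int64(offset) == offset)  # True: int64_t holds the offset exactly
```

ctypes does no overflow checking, so the truncation is silent, which is the same failure mode a too-narrow C type produces.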
unused import
```
In [10]: u = mda.Universe(top, traj)
# Until frame 25138, it works
In [11]: target_frame = 25138
In [12]: u.trajectory[target_frame]
Out[12]: < Timestep 25138 with unit cell dimensions [ 241.50689697 243.38421631 87.10479736 90. 90. 90. ] >
In [13]: print(u.trajectory._xdr.offsets[target_frame], len(u.atoms))
(4294835832, 45904)
# From frame 25139, it fails
In [14]: target_frame = 25139
In [15]: u.trajectory[target_frame]
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-15-fa427d781897> in <module>()
----> 1 u.trajectory[target_frame]
/home/jon/dev/mdanalysis/package/MDAnalysis/coordinates/base.pyc in __getitem__(self, frame)
1162 if isinstance(frame, int):
1163 frame = apply_limits(frame)
-> 1164 return self._read_frame(frame)
1165 elif isinstance(frame, (list, np.ndarray)):
1166 def listiter(frames):
/home/jon/dev/mdanalysis/package/MDAnalysis/coordinates/XDR.pyc in _read_frame(self, i)
140 self._xdr.seek(i)
141 self._frame = i - 1
--> 142 return self._read_next_timestep()
143
144 def _read_next_timestep(self, ts=None):
/home/jon/dev/mdanalysis/package/MDAnalysis/coordinates/XDR.pyc in _read_next_timestep(self, ts)
148 if ts is None:
149 ts = self.ts
--> 150 frame = self._xdr.read()
151 self._frame += 1
152 self._frame_to_ts(frame, ts)
/home/jon/dev/mdanalysis/package/MDAnalysis/lib/formats/xdrlib.pyx in MDAnalysis.lib.formats.xdrlib.XTCFile.read (MDAnalysis/lib/formats/xdrlib.c:7672)()
590 <rvec*>xyz.data, <float*> &prec)
591 if return_code != EOK and return_code != EENDOFFILE:
--> 592 raise RuntimeError('XTC Read Error occured: {}'.format(
593 error_message[return_code]))
594
RuntimeError: XTC Read Error occured: magic
```
Never mind, I had not cleaned up the stale offsets. (Isn't the validity of the offsets supposed to be checked by trying to seek to the last frame? If so, the old offsets should not have been used.)
See #656 and #631. Currently the offsets aren't validated because I didn't find the old approach robust enough. In #656 I'm working on a new approach in which the offsets are reloaded once we notice that they fail.
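The reload-on-failure idea can be sketched roughly as below. This is a hedged illustration, not the code from #656: `read_frame`, `read_at`, and `rebuild_offsets` are hypothetical names, and catching `RuntimeError` only mirrors the `XTC Read Error occured: magic` failure shown in the session above.

```python
def read_frame(read_at, offsets, rebuild_offsets, i):
    """Read frame `i` using cached `offsets`; on failure, rebuild them once and retry.

    Hypothetical callables: `read_at(pos)` reads the frame starting at byte
    offset `pos` and raises RuntimeError if `pos` is not a valid frame start;
    `rebuild_offsets()` rescans the file and returns a fresh list of offsets.
    """
    try:
        return read_at(offsets[i])
    except (RuntimeError, IndexError):
        # The cached offsets look stale: regenerate them in place, then retry once.
        offsets[:] = rebuild_offsets()
        return read_at(offsets[i])
```

Retrying only once keeps a genuinely corrupt file from looping forever; a second failure propagates to the caller.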
On error, xtc_seek now returns the system errno value, which is printed as-is. This is more informative, and it fixes an IndexError raised while looking up the error message when xtc_seek failed and returned exdrNR. Added tests for _bytes_seek and _bytes_tell, including positions beyond the 4 GB file-size limit.
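Testing past 4 GB does not require a huge file, because seeking beyond end-of-file is legal. The sketch below shows the idea with plain Python file objects. It is not MDAnalysis' test code: it does not call `_bytes_seek`/`_bytes_tell`, and the helper name and offsets are made up.

```python
import os

FOUR_GIB = 2**32

def seek_tell_beyond_4gib(path, extra=12345, back=20000):
    """Seek past the 4 GiB mark in `path` and check that tell() reports exact 64-bit positions.

    Nothing is written, so the file stays empty on disk.
    """
    target = FOUR_GIB + extra
    with open(path, "wb") as fh:
        fh.seek(target, os.SEEK_SET)
        assert fh.tell() == target
        # Relative seek backwards, crossing back below the 4 GiB boundary.
        fh.seek(-back, os.SEEK_CUR)
        assert fh.tell() == target - back
        return fh.tell()
```

Any 32-bit truncation in the seek/tell path would show up as a wrong or negative position here.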
6eb9a77 to 4b634b2
I left the offset types as … I added the low-level … As had already been discussed in #441, … For reference, I got …
```
cdef int64_t offst

if whence == "SEEK_CUR":
    whn = SEEK_CUR
```
Use a dictionary
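A minimal sketch of the dictionary approach suggested here, written in plain Python rather than Cython. It assumes the string `whence` values map onto the C `SEEK_SET`/`SEEK_CUR`/`SEEK_END` constants; Python's `os.SEEK_*` values stand in for the C ones, and `whence_to_int` is a hypothetical name.

```python
import os

# One lookup table replaces the chain of if/elif string comparisons.
_WHENCE = {
    "SEEK_SET": os.SEEK_SET,
    "SEEK_CUR": os.SEEK_CUR,
    "SEEK_END": os.SEEK_END,
}

def whence_to_int(whence):
    """Translate a whence string into its integer constant via a dict lookup."""
    try:
        return _WHENCE[whence]
    except KeyError:
        raise ValueError("whence must be one of {}, got {!r}".format(
            sorted(_WHENCE), whence))
```

An unknown value now fails loudly with a single error path instead of falling through the comparisons.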
```
"""
cdef int offset
cdef int64_t offset
cdef int ok
```
The same goes here. We don't need to declare variables with the correct type for the return values of functions; Cython takes care of that. For arguments to C functions, however, it is good to avoid ambiguity.
Looks good to me apart from the comments.
@kain88-de, for consistency, I was going through the code and there are many other initializations such as …
Nope, I just checked: Cython generates these variables with the correct type for us, so there is no overhead.
Yes, I have been inconsistent there. It should be compared to EOK for clarity. You can do that if you want; otherwise I'll do it myself.
I doubt that. You can check the generated C code yourself by compiling the …
OK, then I'll remove all these cdefs that exist only for return codes, even where the codes are later compared to error codes.
Thank you.
@mnmelo, looks good. You can merge it when you are done.
Did not add tests, as they would require a >2GB test file (one reason this bug resurfaced after having been fixed in the past).
Also tested in a 32-bit Vagrant box. For the record, the only failing test there was:
As discussed in #677, I'd vote for restoring the old behavior of `_xdr.tell()`, mostly for debugging convenience.