-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow lazy loading of DataContainers #367
Conversation
I still have a bit of trouble to understand the full implications of this change. For example for a structure object stored in the HDF5 file, when are the positions loaded at |
Currently they would be loaded when class Job:
...
def from_hdf(self, hdf, group_name):
self._structure = HDFStub(hdf, group_name + '/structure')
...
@property
def structure(self):
if isinstance(self._structure, HDFStub):
self._structure = self._structure.load() # all HDF access for this structure occurs here
return self._structure In the current setup this change only affects how However HDFStub.register('Atoms', lambda hdf, group: Atoms.lazy_from_hdf(hdf, group))
Yeah, will prepare a small demo. |
Only thing left from my side is whether I should enable |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Instead of subclassing for each new lazily loadable type, types may register themselves to the HDF5Stub class with a simple callback. The class then checks the 'NAME' field in the HDF group at loading time against values provided in the register call and use this callback. This allows for the same amount of customization, but has the advantage that you can wrap every HDF5 group in HDF5Stub without checking which subclass is necessary.
I rebased to fix the merge conflicts and will merge will lazy loading enabled by default, if no one objects. |
Also automatically register all subclasses of DataContainer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codacy actually said something meaningful! Otherwise lgtm.
self._group_name = group_name | ||
|
||
@classmethod | ||
def register(cls, type, load): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, codacy complains that this overrides a built-in type...do you know what it's talking about? In pycharm I don't get any complaint, and I was expecting like when you try to use "id" or "dict" as a variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The argument type
is a builtin. It is the base class of all classes and also a function to create new classes (iirc). Since I'm not using this inside this short method, I think it's ok, but I can also come up with a different name.
With this data containers (and potentially other data types) can be loaded "on demand" from HDF5 instead of all at once. In my simple example below this saves ~90% time on load.
The way it works is that in
from_hdf
values of the data container are not loading right away but just the hdf object and the path are saved in a "stub" object, that is then later transparently loaded upon item access.Tests and design spec follow when we discussed this.
This should help with #364 and inspect mode in general. My vision would be that expensive to load objects should be using data container for all storage and can then be
load()
ed withlazy=True
instead ofinspect()
ed.Here's an example that can be run with
ipython -i