Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Hdf mixin that takes care of type info + general boilerplate #416

Merged
merged 26 commits into from
Sep 16, 2021

Conversation

pmrv
Copy link
Contributor

@pmrv pmrv commented Aug 26, 2021

It came up a few times that DataContainer was subclassed because of the HDF functionality, but wasn't otherwise very suitable, e.g. here and here and generally we have objects that write themselves to HDF but are not based on a general data structure like DataContainer. I've had this sketch for a while now that should make it easier to implement this kind of classes.

As a demonstration I've derived DataContainer from this new mixin, but generally it should be easy to move old classes to this format (expect GenericParameters which handles HDF in a slightly different way).

Test + docs will follow after discussion whether we find this useful.

@pmrv pmrv added the enhancement New feature or request label Aug 26, 2021
@niklassiemer
Copy link
Member

I really like this idea a lot! It might make writing new self-hdf-able classes a lot easier! Maybe you should add the @classmethod from_hdf_args(cls, hdf) somehow, since this one allows to pass in needed __init__ variables upon to_object().

Copy link
Member

@liamhuber liamhuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love this. WithHDF took me a second read-through to understand, but I find the use of __enter__ to handle the group name absolutely wonderful.

I posted a couple of minor suggestions to consider.

@@ -0,0 +1,63 @@
from abc import ABC, abstractmethod

class WithHDF:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class WithHDF:
class _WithHDF:

Or?

Comment on lines 33 to 49
@abstractmethod
def _get_hdf_group_name(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@abstractmethod
def _get_hdf_group_name(self):
@property
@abstractmethod
def _hdf_group_name(self):

At least I think that's the right order for the decorators.

Am I missing a side effect?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should work, I didn't do it before because it's a private method anyway, but I can change it now.

hdf["HDF_VERSION"] = self.__hdf_version__

def from_hdf(self, hdf, group_name=None):
group_name = group_name or self._get_hdf_group_name()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicit check for None.

self._to_hdf(hdf)
self._type_to_hdf(hdf)

def rewrite_hdf(self, hdf, group_name=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Garbage collection for HDF?

@stale
Copy link

stale bot commented Sep 13, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 13, 2021
@liamhuber liamhuber removed the stale label Sep 13, 2021
@pmrv
Copy link
Contributor Author

pmrv commented Sep 15, 2021

Fixed the test, details in the commits. The problem was that previously DataContainer.from_hdf didn't call hdf.close() if it descended into a group. HasHDF does that now, but since DataContainer passes a reference to HDFStub, this in turn reached into the wrong location in the file (since where hdf points to changes on hdf.close).

Not copying caused an error in lazy DataContainers, because with the new
HasHDF inheritance it would:
1. hdf.open(...) and pass that to DataContainer._from_hdf
2. this method would then pass references to the HDF object to the
   HDFStub
3. HasHDF.from_hdf then calls hdf.close(), this modifies the hdf object
   passed to DataContainer in place, changing where it points to in the
   file
4. At some point HDFStub.load() would be called, but since the place
   where hdf points to changed under its feet HDFStub.load() then
   reaches into the wrong location in the file
@pmrv
Copy link
Contributor Author

pmrv commented Sep 15, 2021

Rebased to resolve merge conflict.

@liamhuber
Copy link
Member

liamhuber commented Sep 15, 2021

Beautiful, tests are passing so I'm happy! If you have no further concerns, please merge it and let's get this thing live :)

Since PyironObjects are the trifecta of python, database, and file storage, we should consider if we want some sort of class Storable(HasHDF, HasStorage) as an intermediate that never needs to be located in the database.

EDIT: spelling of trifecta

@pmrv pmrv merged commit 761483f into master Sep 16, 2021
@delete-merged-branch delete-merged-branch bot deleted the hdf_mixin branch September 16, 2021 11:16
@pmrv
Copy link
Contributor Author

pmrv commented Sep 16, 2021

Beautiful, tests are passing so I'm happy! If you have no further concerns, please merge it and let's get this thing live :)

Since PyironObjects are the trifecta of python, database, and file storage, we should consider if we want some sort of class Storable(HasHDF, HasStorage) as an intermediate that never needs to be located in the database.

EDIT: spelling of trifecta

Since HasStorage just has the input/output container, I'd actually think, that it should just derive from HasHDF.

@liamhuber
Copy link
Member

Since HasStorage just has the input/output container, I'd actually think, that it should just derive from HasHDF.

Yes, this can work. We just need to think a little bit whether (1) HasStorage._to_hdf calls self._storage.to_hdf(), or if (2) HasStorage overrides to_hdf with a super call and the storage saving.

(1) Pro: the ABC is satisfied and people can directly make new classes without ever worrying about storage (as long as everything they want serialized gets slapped onto the storage attribute. Con: If someone does want to save extra stuff they need to remember to call super in _to_hdf.

(2) Pro: _to_hdf maintains its status as "that thing I should do my own work in". Con: anyone inheriting from HasStorage is still stuck implementing _to/_from_hdf every single time, even if they don't need it.

I lean towards (1). Thoughts?

@pmrv
Copy link
Contributor Author

pmrv commented Sep 16, 2021

Yeah, (1) sounds cleaner. Overloading the saving stuff should only be necessary for fundamental object and jobs anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants