-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
✨ NEW: MSONableNode
data type
#4975
Comments
So I met @mbercx in the kitchen after seeing this. First I said "shut up marnik, you don't know what you are talking about", and he had a little cry. But then we started brainstorming (because I do see where he is coming from), and possibly the aim here, is to be able to wrap a "pure python" class, such that when you call methods on it, which mutate the data, it stores this to the database as: A very rough possible implementation is: from functools import wraps
class PurePythonClass:
def to_json(self) -> dict:
return {}
def mymethod(self):
"""Something"""
return "hi"
def method_wrapper(f, wrapper: 'Wrapper'):
"""A method wrapper that stores data mutations to the aiida database."""
@wraps(f)
def func(*args, **kwds):
output = f(*args, **kwds)
new_state = wrapper._to_json(wrapper._wrapped)
if new_state != wrapper._state:
new_dict = Dict(dict=new_state)
# now create a calcfunction and link it as incoming to the new_dict
new_dict.store()
# memoize this on wrapper class
return output
return func
class Wrapper:
def __init__(self, wrapped, to_json):
self._wrapped = wrapped
self._to_json = to_json
self._state = to_json(wrapped)
def __getattr__(self, name: str):
"""Interject in the method call."""
method = getattr(self._wrapped, name)
if not method:
raise AttributeError(name)
return method_wrapper(method, self)
def __dir__(self):
"""Return method/attribute names for the wrapped class, to allow tab completion."""
return dir(self._wrapped)
pure_inst = PurePythonClass()
inst = Wrapper(pure_inst, lambda inst: inst.to_json())
new_inst = inst.mymethod() again, very rough, but thats my quick two cents |
Hey, I thought we agreed you wouldn't mention this in the issue! 😭 😉 |
Another note: a user might want to track "get" methods which do not mutate the instance that's stored in the wrapper in the provenance as well. An example here could be the |
Never trust an englishman! No jokes aside, it's good to have a mixture of "blue sky thinking" and practicality, and see where we can meet in the middle |
One more thing to consider: If we also store "get" methods in the provenance, it's possible that the output class is different from the class we're calling the method from. Consider again the In [4]: comp_node.get_atomic_fraction('Li')
Out[4]: 0.25 Getting an output node here is simple: a |
Even better would be if we could also use the constructor of a
|
We have thought about this a lot in the past and the tricky part is indeed mutability. Getting a generic class that can wrap anything as long as it implements an node = Structure()
with node.mutating_context():
for site in node.sites:
site.coords += [1, 1, 1] I think it is possible to get the source code that is within a context manager so we can store that on the |
Although I agree this is an issue if used in a loop as above, I'm not sure if it kills the idea if the feature is well-documented and the user understands what is going on. It could in fact be very powerful and easy to use as in the example of my previous comment. The problem here is that the
Indeed, @chrisjsewell had a similar idea, i.e. to store all modifications executed in one |
So I was planning to add this before; it is an iteration on #4975 (comment), whereby rather than automatically storing modifications, it simply stores them in a list, which the user can then call the from functools import wraps
class PurePythonClass:
def to_json(self) -> dict:
return {}
def mymethod(self):
"""Something"""
return "hi"
def method_wrapper(f, name: str, wrapper: 'Wrapper'):
"""A method wrapper that records data mutations."""
@wraps(f)
def func(*args, **kwds):
output = f(*args, **kwds)
new_state = wrapper._to_json(wrapper._wrapped)
if new_state != wrapper._state:
# maybe also store other data here about the mutation
wrapper._mutations.append({"method": name, "incoming": wrapper._state, "outgoing": new_state})
wrapper._state = new_state
if isinstance(output, type(wrapper._wrapped):
# wrap it and store the mutations list
output = ProvenanceWrapper(output, wrapper._to_json)
output._mutations = wrapper.mutations
return output
return func
class ProvenanceWrapper:
"""A wrapper class than remembers mutation of the wrapped class instance."""
def __init__(self, wrapped, to_json):
self._wrapped = wrapped
self._to_json = to_json
self._state = to_json(wrapped)
self._mutations = []
@property
def mutations(self):
"""Current (unstored) mutations."""
return self._mutations[:]
def store(self):
"""Store the mutations to the AiiDA profile."""
# more logic here...
new_dict = Dict(dict=new_state)
new_dict.store()
# now create a calcfunction and link it as incoming to the new_dict, etc
# either record all mutations in a single calcfunction, or have one per mutation
# reset mutations
self._mutations = []
def __getattr__(self, name: str):
"""Interject in the method call."""
method = getattr(self._wrapped, name)
if not method:
raise AttributeError(name)
return method_wrapper(method, name, self)
def __dir__(self):
"""Return method/attribute names for the wrapped class, to allow tab completion."""
return dir(self._wrapped) + ["mutations", "store"]
pure_inst = PurePythonClass()
inst = Wrapper(pure_inst, lambda inst: inst.to_json())
new_inst = inst.mymethod() |
Quick implementation for the import importlib
from aiida.orm import Data
from monty.json import MSONable
class MsonableData(Data):
"""Data plugin that allows to easily wrap objects that are MSONable.
.. note:: As the ``MSONable`` mixin class requires, the wrapped object needs to implement the methods ``as_dict``
and ``from_dict``.
"""
def __init__(self, obj, *args, **kwargs):
"""Construct the node from the pymatgen object."""
if obj is None:
raise TypeError('the `obj` argument cannot be `None`.')
if not isinstance(obj, MSONable):
raise TypeError('the `obj` argument needs to implement the ``MSONable`` class.')
for method in ['as_dict', 'from_dict']:
if not hasattr(obj, method) or not callable(getattr(obj, method)):
raise TypeError(f'the `obj` argument does not have the required `{method}` method.')
super().__init__(*args, **kwargs)
self._obj = obj
self.set_attribute_many(obj.as_dict())
def _get_object(self):
"""Return the cached wrapped MSONable object.
.. note:: If the object is not yet present in memory, for example if the node was loaded from the database,
the object will first be reconstructed from the state stored in the node attributes.
"""
try:
return self._obj
except AttributeError:
attributes = self.attributes
class_name = attributes['@class']
module_name = attributes['@module']
try:
module = importlib.import_module(module_name)
except ImportError:
raise ImportError(f'the objects module `{module_name}` can not be imported.')
try:
cls = getattr(module, class_name)
except AttributeError:
raise ImportError(f'the objects module `{module_name}` does not contain the class `{cls}`.')
self._obj = cls.from_dict(attributes)
return self._obj
@property
def obj(self):
"""Return the wrapped MSONable object."""
return self._get_object() Even without the whole mutability and provenance question, maybe this would already be useful to add to |
This sounds very useful and requires not much (fingers crossed) effort to be added to the core! |
Is your feature request related to a problem? Please describe
One of the challenges of developing for AiiDA is that in case you want to use a certain type of data, you need to be able to store it in the provenance as a node. This means that you can't just use a class representing a type of data in AiiDA without first writing a data type plugin. This can be a barrier to using AiiDA, since it requires extra work.
Below I describe an idea that I've been thinking about for a while to extend the number of data types in AiiDA with all
MSONable
s, and explore some of the consequences for provenance etc. There may still be flaws and in the end we might abandon the idea, but I wanted to write this down so others could comment.Describe the solution you'd like
Let's take
pymatgen
'sComposition
class as an example.Most of the classes in
pymatgen
areMSONable
, or Monty-JSON-serialisable. You can check themonty.json
module for more details, but essentially every class that isMSONable
needs to implement anas_dict()
method, so that via theMontyEncoder
andMontyDecoder
, these classes can be JSON-serialized and stored in a file or a JSON-like database such as MongoDB (this is heavily used by Fireworks).For the
Composition
example, this looks something like this:My suggestion is to consider if we can't leverage the JSON-serialisabilty of
MSONable
classes to extend the types of data we can store in the database with allMSONable
ones. Since we can store JSON in thepostgres
database, this should be fairly straightforward. And since themodule
andclass
are stored in the dictionary returned by theto_json()
method, we can reconstruct the originalMSONable
instance quite easily.Methods, provenance and immutability
Obtaining the original class from the node should be fairly straightforward for the user:
The user would then obtain the original instance that was stored and be able to use it in whatever way they want. This I think is already a fine first implementation. They can adapt the instance and then store a node again afterwards.
Of course, these intermediate steps will not be stored in provenance. So maybe we can go one step further, and split up the methods of the class:
get
methods that simply return data without modifying theNode
instance. These should be easily transferable to theMSONableNode
instance from theMSONable
class. Continuing from the (hypothetical) example above:set
methods that change something about the instance. These obviously cannot just adapt theMSONableNode
, since the nodes are immutable by design (and for good reason). However, we could have them return a newMSONableNode
instance. Moreover, we could store the method used as acalcfunction
, and add this operation to the provenance. Again, from the example:This call would have been added to the provenance as a
calcfunction
, so:I'm not sure how difficult it is to check if a method is a
get
orset
one. Just off the top of my head we'd probably load the originMSONable
instance, apply the method and then see if theas_dict()
methods of the instance before and after are different. I think it should be feasible, but maybe I'm missing something.Describe alternatives you've considered
Another thing I've thought about, but I'm not sure if it would be possible, is to write a
MSONable
-> AiiDA data type converter. I'd have to explore this some more to see if it's feasible (even partially, to get a template that can then be adapted). However, this might be an inferior idea, since that means that any change in e.g. theComposition
class would have to be updated in the corresponding data type class. On the other hand, that may be desirable to avoid things breaking with a breaking change inpymatgen
.EDIT: Additional idea
An additional idea would be to extend the
MSONable
concept to AiiDA nodes, creating a e.g. aNodeAble
class that can be added to any class to make instances of that class storable in an AiiDA database. I guess it would be a subclass ofMSONable
, where we rely on theas_dict()
method for the JSON stored in the database and perhaps anto_repository()
method that details how/what needs to be stored in the repository. Would be very cool if you could make any Python class storeable in an AiiDA database by simply:NodeAble
as a parent class.as_dict()
method.to_repository()
method.I think this would be easier and more maintainable than writing separate AiiDA data types for everything.
Maybe I'm missing some very important detail, and this is just a pipe dream. 🙃 Comments are most welcome!
The text was updated successfully, but these errors were encountered: