Serializing Polymorphic Hierarchy #106

Open
nathan5280 opened this issue Jul 9, 2019 · 9 comments

@nathan5280
Contributor

I have a decision tree where the nodes are subclasses to support different types of decisions. Not surprisingly, if I dump using the base Node schema, I only get the fields in that class, which don't even include the sub-nodes. Like I said, no real surprise there.

Is the idea of serializing a set of classes that use polymorphism totally out of scope or something we could work towards in a future release?

I'll poke away at some solutions. Maybe we can find something to incorporate into dcj.

import copy
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import MutableMapping

from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class Node(ABC):
    name: str

    @abstractmethod
    def decide(self, state: MutableMapping) -> bool:
        pass


@dataclass_json
@dataclass
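class DecisionNode(Node):
    true_node: Node
    false_node: Node
    # Declared as fields so that DecisionNode's own schema serializes them too.
    var_key: str
    value: int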

    def __init__(self, name: str, true_node: Node, false_node: Node, var_key: str, value: int):
        self.true_node = true_node
        self.false_node = false_node
        self.var_key = var_key
        self.value = value
        super().__init__(name=name)

    def decide(self, state: MutableMapping) -> bool:
        state = copy.deepcopy(state)
        if state[self.var_key] > self.value:
            return self.true_node.decide(state)
        else:
            return self.false_node.decide(state)


@dataclass_json
@dataclass
class LeafNode(Node):
    result: bool

    def __init__(self, name: str, result: bool):
        self.result = result
        super().__init__(name=name)

    def decide(self, state) -> bool:
        return self.result


def decide():
    true_leaf = LeafNode(name="true", result=True)
    false_leaf = LeafNode(name="false", result=False)
    decision_node = DecisionNode(
        name="int value", true_node=true_leaf, false_node=false_leaf, var_key="year", value=2000
    )
    state = {"year": 2019}
    result = decision_node.decide(state)
    assert result

    print(json.dumps(Node.schema().dump(decision_node), indent=2))

if __name__ == "__main__":
    decide()
@nathan5280
Contributor Author

nathan5280 commented Jul 15, 2019

Came across Union today in the release notes. This gets me most of the way there. I'll keep poking at it.

from abc import ABC
from dataclasses import dataclass
from typing import Union, List

from dataclasses_json import dataclass_json


@dataclass_json
@dataclass
class Animal(ABC):
    name: str


@dataclass_json
@dataclass
class Bird(Animal):
    pass


@dataclass_json
@dataclass
class Deer(Animal):
    spots: bool


@dataclass_json
@dataclass
class Animals:
    animals: List[Union[Bird, Deer]]


def run():
    bird = Bird("Malard Duck")
    deer = Deer("Mule Deer", False)
    animals = Animals([bird, deer])
    str_repr = Animals.schema().dumps(animals, indent=2)
    print(str_repr)
    obj_repr = Animals.schema().loads(str_repr)
    dict_repr = Animals.schema().dump(obj_repr)
    print(dict_repr)


if __name__ == "__main__":
    run()
{
  "animals": [
    {
      "name": "Malard Duck",
      "__type": "Bird"
    },
    {
      "name": "Mule Deer",
      "spots": false,
      "__type": "Deer"
    }
  ]
}
{'animals': [{'name': 'Malard Duck', '__type': 'Bird'}, {'name': 'Mule Deer', 'spots': False, '__type': 'Deer'}]}
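The `__type` key in the output above is what lets a Union field pick the right class on load. A minimal stdlib-only sketch of that dispatch follows. `encode_tagged` and `decode_union` are hypothetical helpers for illustration, not dataclasses-json's internal implementation, and the sketch assumes every Union member is a flat dataclass (no nested dataclass fields).

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Union, get_args


@dataclass
class Animal:
    name: str


@dataclass
class Bird(Animal):
    pass


@dataclass
class Deer(Animal):
    spots: bool


def encode_tagged(obj) -> dict:
    """Encode a flat dataclass and record its concrete class name under '__type'."""
    data = {f.name: getattr(obj, f.name) for f in fields(obj)}
    data["__type"] = type(obj).__name__
    return data


def decode_union(data: dict, union_type):
    """Build the Union member whose class name matches the '__type' tag."""
    data = dict(data)  # copy so the caller's dict is not mutated
    tag = data.pop("__type")
    for candidate in get_args(union_type):
        if is_dataclass(candidate) and candidate.__name__ == tag:
            return candidate(**data)
    raise ValueError(f"no member of {union_type} is named {tag!r}")


AnimalUnion = Union[Bird, Deer]
encoded = [encode_tagged(Bird("Mallard Duck")), encode_tagged(Deer("Mule Deer", False))]
decoded = [decode_union(d, AnimalUnion) for d in encoded]
print(decoded)
# [Bird(name='Mallard Duck'), Deer(name='Mule Deer', spots=False)]
```

Because the tag is a bare class name, this only works when the field's type lists the candidates explicitly (the Union), which is why it does not help with a field typed as the abstract base.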

@nathan5280
Contributor Author

I have a version of polymorphic serialization/deserialization working. I can't quite get it working without making some changes to core.py, and I wanted to get some feedback before I put a pull request together. I see that we recently added a parameterized version of the dataclass_json decorator to the library. This is how I think things would work when I put everything together:

User Code

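# Proposed API: the alias= keyword on dataclass_json is part of this proposal.
from abc import ABC
from dataclasses import dataclass

from dataclasses_json import dataclass_json


@dataclass_json(alias="question")
@dataclass
class Question(ABC):
    question: str

@dataclass_json(alias="question-string")
@dataclass
class QuestionString(Question):
    value: str

@dataclass_json(alias="question-integer")
@dataclass
class QuestionInteger(Question):
    value: int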

The changes to the code would be:
api.py: Extend the dataclass_json decorator so that it adds a get_alias() method to classes that have an alias defined. The example below is a separate dataclass_json_poly decorator; it could be made simpler with the recent code changes and merged into the dataclass_json decorator.

def dataclass_json_poly(alias: str):
    """
    Wrap the dataclass_json decorator with the functionality to associate an
    alias name with a dataclass.  This is used in core._asdict to add the
    _class_alias field to the serialized object so that core._decode_dataclass
    knows what class to decode the dictionary into.

    This decorator should be used in place of dataclass_json.
    :param alias: Alias for the class when it is serialized.
    :return: Actual poly decorator
    """

    def dataclass_json_poly_decorator(cls):
        """
        Actual decorator for the class; it has access to the alias free variable
        for use in the get_alias closure.

        :param cls: Class to decorate
        :return: Decorated class.
        """

        def get_alias(*args):
            return alias

        # Wrap the class with the dataclass_json standard functionality
        cls = dataclass_json(cls)
        # Attach a get_alias method; core._asdict calls it to tag the serialized object.
        cls.get_alias = get_alias
        # Add the class to the alias:class map to use when deserializing an object.
        _alias_to_class_map[alias] = cls

        return cls

    return dataclass_json_poly_decorator

core.py: add a module-level variable.

# Map to keep track of polymorphic classes and their aliases.
# This is used in conjunction with the dataclass_json_poly decorator.
_alias_to_class_map = dict()

In core._decode_dataclass(), add a check to see whether the object being deserialized has an alias and, if so, look up the actual class in _alias_to_class_map.

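    # Swap in the concrete class registered under the serialized alias.
    # An unknown alias raises KeyError instead of silently decoding as cls.
    if "_class_alias" in kvs:
        kvs = dict(kvs)  # copy so the caller's dict is not mutated
        cls = _alias_to_class_map[kvs.pop("_class_alias")]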

In core._asdict(), add the following to check whether the object being serialized is a poly class and, if so, add its alias to the fields being serialized.

        # Poly classes carry a get_alias method; add their alias as metadata.
        # getattr avoids masking unrelated AttributeErrors raised inside get_alias.
        get_alias = getattr(obj, "get_alias", None)
        if get_alias is not None:
            result.append(("_class_alias", get_alias()))

Notes:

  • This will be purely additive functionality
  • I don't have this working with QuestionInteger.schema() and the other schema methods yet. It looks like there are
    a couple of marshmallow packages that do this, but I haven't needed it yet or had time to track them down.

Let me know what you all think.

@lewfish

lewfish commented Oct 18, 2019

I was just thinking about doing something similar. I just learned about this library, but it sounds like a good approach to me.

@aronszanto

@nathan5280 this seems great. Would be great to get a review on a PR!

@aronszanto

@nathan5280 or @lidatong any update on this one? Being able to use DCJ with ABC dataclasses is a really significant use case for us, and if DCJ were able to encode the specific class that appears at runtime in a field typed as the parent class, that would be a game changer. Happy to review a PR; it seems like most of the makings of one are already done.

@nathan5280
Contributor Author

@aronszanto My team has switched to FastAPI which uses Pydantic for the validation and mapping functionality that I so love in the lightweight DCJ package. It is unlikely that I'll get back to this anytime soon. Feel free to ping me if you decide to move forward with this or want to bounce some ideas off me.

@NoamNol

NoamNol commented Sep 11, 2022

Any update? Is it possible without a Union?

@george-zubrienko
Collaborator

I really don't see implementing support for this as a big deal: Python doesn't do type erasure, so all of a subclass's fields are present in the object's dict regardless of the declared field type. We'll see if we can get this done soon.
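To illustrate the type-erasure point, here is a small stdlib-only example (the `Herd` class is made up for illustration). `dataclasses.asdict` walks the fields of the runtime object, so the subclass field survives even though `leader` is declared as the base class; what is lost is any record of which class it was.

```python
from dataclasses import asdict, dataclass


@dataclass
class Animal:
    name: str


@dataclass
class Deer(Animal):
    spots: bool


@dataclass
class Herd:
    leader: Animal  # declared as the base class


herd = Herd(leader=Deer("Mule Deer", spots=False))
as_dict = asdict(herd)
print(type(herd.leader).__name__)  # Deer
print(as_dict)
# {'leader': {'name': 'Mule Deer', 'spots': False}}
```

The `spots` value is there, but nothing in the dict says `leader` was a `Deer`, so decoding against the declared `Animal` type cannot recover it. A tag such as `__type` or the proposed `_class_alias` supplies exactly that missing piece.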

@nosachamos

I really like this lib... It's almost perfect! But this is a must-have for any serious use case or large(ish) app.
