Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Model] Make (de)serialization a first class feature #181

Open
phvalguima opened this issue Jul 5, 2024 · 6 comments
Open

[Model] Make (de)serialization a first class feature #181

phvalguima opened this issue Jul 5, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@phvalguima
Copy link

phvalguima commented Jul 5, 2024

Currently, we have two issues that look like intertwined: (1) databag format changes between charm revisions and we need to handle that at upgrades; and (2) predictable serialization, so we make sure a given class with the same data always get transformed into exactly the same string.

The later problem will happen every time we change the way we handle databags. In upgrades, we will also have to provide a way to upgrade from "model-revision-X" to "model-revision-X+1" and as @MiaAltieri highlighted, they will have to coexist while the upgrade is in-progress.

The former is an old issue that recurrently bubbles up in Juju. As we start to build more complex data structures that are serialized to the databag, we cannot guarantee anymore that the final string remains the same, even if the original struct had exactly the same data.

Proposal

To have a BaseModel class that can cover both scenarios.

This class would look like:

class RootDataPlatformModel(BaseModel):

    revision: int

    @root_validator
    def convert_data_following_rev(...):
        # Sequence of if-elif-else that format the data according to the the incoming revision code

    

    def __str__(...):
        data = self.model_dump()  # define custom serializers with @model_serializer

        # Now, iterate over the revisions this class knows about: we must create a final string that follows the `revision` field
        data_following_rev = ....

        # Now that we know the data formatted as a dict for revision X, we should iterate over its structure and start converting
        # each piece to string
        for key, value in data_following_rev.items():
              # For each literal type, the conversion to string is already done
              # For each object within the "key" or "value" that implements RootDataPlatformModel, call str(this object)
              # For structures like raw Lists, Sets, Tuples, we need to make sure they are ordered

Using Metaclasses to Implement Different Revisions

This code will rapidly evolve into a complex web of if-elif-else as the number of exceptions between revisions grow. ideally, we do not want to deal with object "RootDataPlatformModel", but rather "RootDataPlatformModelRevisionX" when we are working on revision X.

So, what we could do instead is have a metaclass to decide which final object we will create, based on the revision:

class RootModelSelector(type):
    def __new__(cls, args, kwargs):
        # This method will return a different subclass of `BaseModel` depending on which revision code has been provided


class RootDataPlatformModel(metaclass=RevisionSelector):

    revision: int

    @root_validator
    def deserializer(...):
        # Now, this method is drastrically simplified, as we will use 
    

    def __str__(...):
         # Also simplified, we always serialize for the appropriate revision

And the way we'd use it is:

class PeerRelationRev95Model(RootPeerRelationModel):
    ...


class PeerRelationRev99Model(RootPeerRelationModel):
    ...

class PeerRelationSelector(RootModelSelector):
       # Implements the __new__ to ensure we are we select anything as follows:
       # if revision <= 95: return PeerRelationRev95Model()
       # elif revision 95 < or <=99: return PeerRelationRev99Model()


class RootPeerRelationModel(BaseModel, metaclass=PeerRelationSelector):

    revision: int

    @root_validator
    def deserializer(...):
        # Now, this method is drastrically simplified, as we will use 
    

    def __str__(...):
         # Also simplified, we always serialize for the appropriate revision

Any "upper class" calls exclusively the RootPeerRelationModel.

How to Select the Revision

The actual selection of the revision is done by the upper classes that create BaseModel objects. These are the classes that have access to the databag and can also query the main charm to know which revision they should take into consideration.

In the case we are in the middle of an upgrade, databag revisions X and X+1 may coexist for sometime. Therefore, a class that decides to load the databag should first check if all its peers have finished moving to X+1. If not, then it should continue to use (de)serialization for revision X.

Once the upgrade is finished, then the units can use revision X+1 (de)serialization and persist that new format in the databag instead.

@phvalguima phvalguima added the bug Something isn't working label Jul 5, 2024
Copy link

github-actions bot commented Jul 5, 2024

@carlcsaposs-canonical
Copy link
Contributor

2 cents: I don't think we should apply a sort, but instead check for equality & don't write if equal: https://chat.canonical.com/canonical/pl/gh1f3fb1ytyzmr6qdx993sq4yc

for certain data structures, the ordering can (but doesn't have to) have meaning—and, in the generic sense, I don't think we should affect that ordering
e.g. lists are ordered, so we shouldn't resort
dicts preserve insertion order, so we shouldn't resort

@phvalguima
Copy link
Author

@carlcsaposs-canonical on your proposal:

  • pros: not massively sorting data, which can cost quite some time
  • cons: the change in this case will have to go beyond the models and we will need some "gatekeeper" before putting data in the databag.

for certain data structures, the ordering can (but doesn't have to) have meaning—and, in the generic sense, I don't think we should affect that ordering

True, I am not suggesting to disrespect that meaning. If you are using these models above, then you're aware of what they'll do. If you don't want your data structure to be reordered, then create a model for it and control the way we'd convert it to string.

@phvalguima
Copy link
Author

@carlcsaposs-canonical there is a 2nd part of this proposal: how can we load data from the databag according to a certain revision. What do you think of that?

@carlcsaposs-canonical
Copy link
Contributor

  • will have to go beyond the models

how so? I believe it would be: de-serialize what's in the databag & compare that to what you're about to write. If equal, don't write. Otherwise, write

then you don't have to worry about sorting

@carlcsaposs-canonical
Copy link
Contributor

how can we load data from the databag according to a certain revision. What do you think of that?

think it might be better to discuss over a higher bandwidth medium (e.g. on a call)

my initial thoughts: is it for peer relations only?

if so, seems helpful although I think it could be a bit simpler and I think in many cases it might be easier to do something backwards-compatible than to make a migration. Also, during rollback or downgrade, how would old code know how to migrate back?

if it's also for remote relations, I think there are additional challenges & nuances that we'd need to look at

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants