ADR0006: Where to implement model serialization #1270
Conversation
I will leave some thoughts here for the different options:

About the …

I agree with the … |
Clearly, I'm for the third option. In my experience building tuf-on-a-plane, it was very helpful to separate the two concerns, as I could: (1) cleanly separate the concerns and reason about them (a bug in one had nothing to do with the other), and (2) easily extend custom parsers/decoders/deserializers without changing anything about the intermediate Metadata representation. The problem with the first option is that both the (de)serialization and Metadata code are all in one place, and it does not help to read or reason about them at the same time. I argue that they are both conceptually and practically separate. I should also note that whatever we decide (e.g., global key databases instead of functionally local key lookup tables) does end up influencing other implementations. The easier we make it for other developers to reason about how TUF works, the better. So if that means a better separation of the concerns, then we should do that. Anyway, that's just my 0.02 BTC (even though it's something I feel relatively strongly about). What do others think? |
This is something we're seeing over and over as more implementations of TUF are being developed. Let's make "The easier we make it for other developers to reason about how TUF works, the better." our maintainer mantra. |
Thanks, Joshua. Another concern I'd like us to address is removing the optics that TUF is officially tied exclusively to canonical JSON as data transport, so if separating the two concerns (parsing vs models) helps us do that, then all the better. |
I've been thinking about this for a few days while reviewing the current … I also feel that the reference implementation has some other goals: …
Given those two things, I feel like proposal 4 in the ADR is not the way to go. As I understand it, it would necessitate a separate serialisation implementation for formats that aren't part of the default implementation. This would mean that 1) it is harder for people to implement a new serialisation method – many will want to start by copying, then modifying, the default mechanism and that won't work if JSON serialisation is baked in, and 2) there's a separate codepath for the non-default serialisation mechanism – which may lead to subtle issues around serialisation to non-default formats. Given the goals proposed above, I am in favour of a generic and abstract serialisation/deserialisation interface which is separate from the metadata model, extensible without directly modifying the code of python-tuf, and shared by the upstream and any downstream serialisation/deserialisation implementations. |
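As a rough illustration of what such an abstract interface could look like (a sketch only; the names here are hypothetical, not necessarily the final python-tuf API):

```python
from abc import ABC, abstractmethod


class MetadataDeserializer(ABC):
    """Turns wire-format bytes into a Metadata object."""

    @abstractmethod
    def deserialize(self, raw_data: bytes) -> "Metadata":
        raise NotImplementedError


class MetadataSerializer(ABC):
    """Turns a Metadata object into wire-format bytes."""

    @abstractmethod
    def serialize(self, metadata: "Metadata") -> bytes:
        raise NotImplementedError


# A downstream implementation would subclass these and pass instances to
# the model's generic load/dump entry points, without modifying python-tuf.
```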
Thanks for your comments, everyone! Based on the discussion here I have implemented option 3 in #1279. I do, however, have another option, or rather a variant of option 3, to offer. See option 5 in the just updated ADR, which I have generally fleshed out a bit. Note that #1279 is already crafted in such a way that we can easily switch to option 5 by chopping off the last commit (b786bc0). |
I had a look into option 5 and read the code without commit b786bc0. |
At first option 5 felt like a bit of smoke-and-mirrors to hide the fact that metadata still implements the json de/serialization... but on second look it doesn't bother me much. The potential issues might be: …
but I think the advantages you listed are more important than those issues (let's not optimize too much for the hypothetical case)... So LGTM. I did try to come up with a clever-er way of separating to_dict()/from_dict() from the actual metadata just a little bit. E.g. some "JsonMixin" classes that lived with the actual Metadata classes but were still clearly separate. I could not come up with anything cleaner than this version. |
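For reference, a minimal sketch of the mixin idea that was considered and dropped (hypothetical code, not what was merged): the dict-conversion logic would live in a mixin next to the model classes, keeping it visually separate from the model's own code.

```python
class JsonDictMixin:
    """Dict conversion kept next to, but visually separate from, the model."""

    def to_dict(self) -> dict:
        # Naive default: expose all public attributes.
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    @classmethod
    def from_dict(cls, data: dict):
        return cls(**data)


class Snapshot(JsonDictMixin):
    def __init__(self, version: int, expires: str) -> None:
        self.version = version
        self.expires = expires


# Usage: Snapshot.from_dict({"version": 1, "expires": "2030-01-01T00:00:00Z"})
```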
3. De/serialization separated
4. Compromise 1: Default json de/serialization on metadata classes,
   non-default de/serialization separated
5. Compromise 2: Dictitionary conversion for default json de/serialization on
- 5. Compromise 2: Dictitionary conversion for default json de/serialization on
+ 5. Compromise 2: Dictionary conversion for default json de/serialization on
I think that's only an issue in option 4. In option 5 we still serialize via the external serializer and thus don't have two different code paths. Or am I missing something?
You are right, I even briefly had that listed as a con for option 5, but removed it because it felt so weak. The inherit-from-abstract-serializer scaffolding can be copy-pasted from the default json de/serializer, and adopting the to/from_dict method stack seemed like a fair ask. |
I'm not sure I'm convinced about Option 5. Can someone explain why it's conceptually easier to go from a high-level Metadata object to an intermediate-level dictionary object and then to a low-level serialization format such as JSON or XML? Why is the intermediate step necessary? |
What about the arguments in the ADR, @trishankatdatadog? |
I just mean if we later add an internal function in Metadata and use it in to_dict()... this internal function is now really part of the serialization code but not available to someone doing it outside of Metadata. I don't think this is something we should worry about here. Maybe a related point: the way metadata.py refers to JsonDicts in a lot of places does make it a little confusing: are they really JSON or just very specific dictionaries? I'm not sure I can tell how separate metadata.py and json are at this point (maybe worth documenting things like …) |
Sorry, I'm not sure I fully follow. Why is it easier for third-party developers to write decoders from JSON/XML to Python dictionaries instead of more natural, higher-level Metadata objects? What are we saving here? I must be missing something. |
I don't think the idea was for 3rd parties to use to_dict/from_dict: they are just helpers for the default de/serializer -- it's code that could live in the de/serializer but becomes a bit simpler when you can just include it in Metadata. The status and purpose of these functions could maybe be better documented. |
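Concretely, with the interface sketched earlier, the default deserializer could be as thin as this (a sketch assuming a Metadata.from_dict helper on the model):

```python
import json


class JSONDeserializer(MetadataDeserializer):
    """Default deserializer: JSON bytes -> dict -> Metadata."""

    def deserialize(self, raw_data: bytes) -> "Metadata":
        # The dict intermediary is an implementation detail of this
        # deserializer; Metadata.from_dict is merely a helper for it.
        return Metadata.from_dict(json.loads(raw_data))
```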
JSON to dictionary is really easy:

```python
import json

json.loads(data)
```

For formats that don't have libraries with built-in dict de/serialization support it doesn't make sense to use a dictionary intermediary, but you also don't have to. |
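To illustrate the "you also don't have to" part, a contrived sketch of a serializer that skips the dict helpers and walks the model directly (hypothetical format, building on the interface sketched earlier):

```python
class KeyValueSerializer(MetadataSerializer):
    """Contrived serializer for a format without dict support."""

    def serialize(self, metadata: "Metadata") -> bytes:
        # Reads the model attributes directly; no dict intermediary.
        pairs = (f"{k}={v}" for k, v in vars(metadata).items())
        return "\n".join(pairs).encode("utf-8")
```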
Hm. If it were all implemented in …
Agreed, we should remove references to JSON in metadata.py. Removing the usage of … |
I think I'm a bit lost here. There are too many options, and I've got too many things going on to properly switch context. Would appreciate a quick half-hour meeting to discuss the options. Especially w/o quick code to glance at, I can't immediately tell what's going on, sorry. |
@trishankatdatadog, did you take a look at #1279, which implements your preferred option 3, as suggested above? Its PR description explains the alternative option 5 including pros and cons, and it even points to quick code to glance at that shows the diff between options 3 and 5 (see b786bc0). I think reading the PR description and skimming the diff shouldn't take longer than half an hour, and should give you a good idea of what's going on here. Let me know if you still need a meeting afterwards. |
It's good. I read the code, and I like how Option 3 is implemented, thanks. What I still don't understand is the difference in Option 5. Is there any good reason to expose those utility from/to dictionary methods in the base Metadata class? Would other (de)serializers benefit from it? |
Thanks for the feedback, @trishankatdatadog! Regarding your question of whether other de/serializers benefit from … Although it doesn't make a big difference whether the … So one argument is that it seems idiomatic to implement these functions as methods on the class, and the other argument is that the code seems a bit better structured when each method is scoped within the corresponding class, which in turn should make maintaining/modifying/extending the class model easier. |
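For what it's worth, any dict-friendly format could piggyback on the same helpers without touching the class model; e.g. a hypothetical YAML serializer (a sketch assuming PyYAML and the interface sketched earlier):

```python
import yaml  # PyYAML, assumed available


class YAMLSerializer(MetadataSerializer):
    """Third-party serializer reusing Metadata.to_dict()."""

    def serialize(self, metadata: "Metadata") -> bytes:
        # No changes to the class model are needed to support YAML.
        return yaml.safe_dump(metadata.to_dict()).encode("utf-8")
```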
Do what you think is best, Lukas and friends. I trust you will make the right decision. From my point of view, what is critical is that developers can easily write new (de)serializers targeting the same abstract, high-level metadata interface, and not worry about dictionaries and JSON and so on. |
Revert an earlier commit that moved to/from_dict metadata class model methods to a util module of the serialization sub-package. We keep to/from_dict methods on the metadata classes because:

- It seems **idiomatic** (see e.g. 3rd-party libraries such as attrs, pydantic, marshmallow, or built-ins that provide default or customizable dict representation for higher-level objects). The idiomatic choice should make usage more intuitive.
- It feels better **structured** when each method is encapsulated within the corresponding class, which in turn should make maintaining/modifying/extending the class model easier.
- It allows us to remove function-scope imports (see subsequent commit).

Caveat: Now that "the meat" of the sub-packaged JSON serializer is implemented on the class, it might make it harder to create a non-dict based serializer by copy-paste-amending the JSON serializer. However, the benefits from above seem to outweigh the disadvantage.

See option 5 of ADR0006 for further details (theupdateframework#1270).

Signed-off-by: Lukas Puehringer <[email protected]>
Add decision record about the design of de/serialization between TUF metadata class model and wire line metadata formats. Chosen option: Serialization and class model are decoupled, but the class model provides conversion helper methods. Signed-off-by: Lukas Puehringer <[email protected]>
I think the decision is already made - we have implemented model (de)serialization by merging #1279. What do you think, @jku? |
Fixes #-
Related to #1139 (Implements Option 5 of this ADR)
Description of the changes being introduced by the pull request:
Add decision record about the design of de/serialization between TUF metadata class model and tuf wire line metadata formats.
Please verify and check that the pull request fulfills the following requirements: