-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move metadata class model de/serialization to sub-package #1279
Move metadata class model de/serialization to sub-package #1279
Conversation
tuf/api/metadata.py
Outdated
# Function-scope import to avoid circular dependency. Yucky!!! | ||
# TODO: At least move to a _get_default_metadata_serializer helper. | ||
from tuf.api.serialization.json import JSONSerializer # pylint: disable=import-outside-toplevel | ||
serializer = JSONSerializer(True) # Pass True to compact JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serializer = JSONSerializer(True) # Pass True to compact JSON | |
serializer = JSONSerializer(compact=True) |
Using the named argument makes it clearer what's going on and obviates the need for the comment.
tuf/api/serialization/util.py
Outdated
Targets) | ||
|
||
|
||
def _get_signed_common_args_from_dict(_dict: Mapping[str, Any]) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you decide to accept typing.Mapping
instead of typing.Dict
here and elsewhere in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! The typing.Dict
docs have the answer:
To annotate arguments it is preferred to use an abstract collection type such as Mapping.
I didn't really question that advice, but it makes sense when you consider the API design principle of returning specific types, while accepting generic types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lovely PR, apologies for the delay in getting to review this @lukpueh.
I have a few minor questions and suggestions in-line; mostly around typing annotations, and variable & parameter names.
I also want to point out that the new code we are writing since introducing a new coding style seems to consistently violate the style guide's recommendations around imports, specifically:
Use
import
statements for packages and modules only, not for individual classes or functions.
New code in tuf.api is fairly consistently importing individual classes and functions from modules. Should we consider whether we want to make a style guide modification at this point?
- 79ce034, whose commit message has very specific design considerations
Thank you for the detailed commit messages. I wonder whether we might want to capture some of this detail in the ADR?
Discussion:
[snip]
Profrom/to_dict
as metadata class methods:
- better structured because encapsulated on each class
- keeps the serialization package minimal
from/to_dict
is hardly serialization, which implies bytes, but rather some sort of type casting / conversion, which seems appropriate to be implemented on the class- it would spare us the stupid cyclic imports
Having reviewed this PR, I am in favour of keeping [from|to]_dict
as metadata class methods. Curious to read what others think.
tuf/api/serialization/json.py
Outdated
from securesystemslib.formats import encode_canonical | ||
|
||
from tuf.api.metadata import Metadata, Signed | ||
from tuf.api.serialization import (MetadataSerializer, | ||
MetadataDeserializer, | ||
SignedSerializer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: the recently adopted style guide recommends against importing individual classes or functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know. I guess I just kept using the convention that we already used in this file, which we started before fully embracing our new style guide... and because PEP 8 says it's okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Joshua on this one.
We don't have a linter that would automatically catch those mistakes, but it's important we correct each other.
Also, it seems logical to forbid importing of functions and classes.
This way we prevent future problems with naming and namespaces.
tuf/api/serialization/json.py
Outdated
|
||
def deserialize(self, raw_data: bytes) -> Metadata: | ||
"""Deserialize utf-8 encoded JSON bytes into Metadata object. """ | ||
_dict = json.loads(raw_data.decode("utf-8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could we rename this variable, rather than using the _
trick? Maybe json_dict, json_object (as the return type of json.loads
is a Python object), or loaded_json.
tuf/api/metadata.py
Outdated
@@ -213,11 +227,15 @@ def sign(self, key: JsonDict, append: bool = False) -> JsonDict: | |||
return signature | |||
|
|||
|
|||
def verify(self, key: JsonDict) -> bool: | |||
def verify(self, key: JsonDict, | |||
serializer: SignedSerializer = None) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments as above, serializer is optional and might be better named signed_serializer
tuf/api/serialization/json.py
Outdated
_dict = json.loads(raw_data.decode("utf-8")) | ||
return Metadata.from_dict(_dict) | ||
try: | ||
_dict = json.loads(raw_data.decode("utf-8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here, with the code moved, can we rename _dict? Possible alternatives include json_object and json_dict.
tuf/api/metadata.py
Outdated
@@ -357,6 +261,7 @@ def bump_version(self) -> None: | |||
self.version += 1 | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we aiming for three blank lines between top-level definitions? The style guide suggests two: https://google.github.io/styleguide/pyguide.html#35-blank-lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit that I follow this secret personal convention:
Based on the guide I use "single blank lines as [I] judge appropriate within functions or methods", but then I use two blank lines between methods, to better distinguish from the single ones, and then three blank lines between top-level definitions, to distinguish from the two blank lines.
I should probably either strictly adhere to the guide, or propose a style guide amendment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we should strictly adhere to the guide or propose an amendment. I'm neither for or against this personal convention, but I do think amendments to the style guide which contradict the "upstream" Google guide should be kept to a minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Let's play by the rules we chose. :) Sorry for the noise.
tuf/api/metadata.py
Outdated
@@ -487,6 +369,7 @@ def update(self, version: int, length: int, hashes: JsonDict) -> None: | |||
} | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on blank lines between top-level definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1279 (comment)
tuf/api/metadata.py
Outdated
@@ -545,6 +419,7 @@ def update( | |||
self.meta[metadata_fn]['hashes'] = hashes | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same blank lines comment here, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1279 (comment)
tuf/api/serialization/util.py
Outdated
"""Returns dict representation of 'Signed'-subclass object. """ | ||
# Dispatch to '*_to_dict'-function based on 'Signed' subclass type. | ||
# TODO: Use if/else cascade, if easier to read! | ||
return { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa. Clever, but maybe a bit too much magic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes probably. :'(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, the if/else cascade is uglier but less magical and easier to follow.
Thanks for the detailed review and great suggestions, @joshuagl! I left a couple of replies inline and will address the rest in code. Regarding...
Maybe we should ... also in the light of #1261. Do you think we should get this straight before landing this PR?
I tried to capture the gist without making the ADR too verbose. But I'll try to revise the ADR once more. :)
Cool. I'll wait just a bit longer to see if others still want to weigh in. |
Can someone enlighten me as to why? Cos I don't see it yet. |
My personal feeling is that |
@MVrachev but are they useful to other (de)serializers? |
I am not sure where are the |
The point Trishank is making is that they look like implementation details of the json de/serializer hidden inside metadata.py. And I think we should admit that that's what they are -- e.g. On the other hand, the reason Lukas is proposing we leave them in is that it makes the code simpler -- I agree with this as well: the compromise seems reasonable and should not make other de/serializers any harder to implement. |
FYI, just left another comment on #1270 (comment) |
No need to block this PR. However, I think we should figure this out and ensure the new code conforms to the style ASAP. Otherwise we risk similar incidents unintentionally contradicting the style guide in future. At the same time as we ensure new code conforms to the style, it would be good to add style checking to CI – and perhaps have a black configuration for folks to ensure their code matches before submission.
This matches my understanding, and opinion, also. The compromise is pragmatic and reasonable, but let's be sure the comments indicate that those methods are a helper primarily for conversion to/from JSON. |
Add sub-package with 3 abstract base classes to: - serialize Metadata objects to bytes (transport) - deserialize Metadata objects from bytes (transport) - serialize Signed objects to bytes (signatures) pylint notes: - configure tox to use api/pylintrc - configure api/pylintrc to allow classes without public methods (default was 2) Design considerations --------------------- - Why not implement de/serialization on metadata classes? -> See ADR0006. - Why use separate classes for serialization and deserialization? -> Some users might only need either one, e.g. client only needs Deserializer. Maybe there are use cases where different implementations are used to serialize and deserialize. - Why use separate classes for Metadata- and Signed-Serialization? -> They require different concrete types, i.e. Metadata and Signed as parameters, and using these specific types seems to make the interface stronger. - Why are de/serialize methods not class/staticmethods? -> In reality we only use classes to namespace and define a type annotated interface, thus it would be enough to make the methods classmethods. However, to keep the de/serialize interface minimal, we move any custom format configuration to the constructor. (See e.g. "compact" for JSONSerializer in subsequent commit). Naming considerations --------------------- - Why de/serialize? -> Implies byte stream as input or output to the function, which is what our interface needs. - Why not marshaling? -> Synonym for serialize but implies transport, would be okay. - Why not encoding? -> Too abstract and too many connotations (character, a/v). - Why not parse? -> Too abstract and no good opposite terms (unparse, write, dump?) Signed-off-by: Lukas Puehringer <[email protected]>
b786bc0
to
91fca86
Compare
# Use different pylint configs for legacy and new (tuf/api) code | ||
# NOTE: Contrary to what the pylint docs suggest, ignoring full paths does | ||
# work, unfortunately each subdirectory has to be ignored explicitly. | ||
pylint {toxinidir}/tuf --ignore={toxinidir}/tuf/api,{toxinidir}/tuf/api/serialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to ignore all files under a certain folder?
That way you would be able to ignore everything under tuf/api
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to ignore all files under a certain folder?
Nope (see my comment). I even looked at the implementation of --ignore
and --ignore-patterns
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC there's a corresponding ticket on the pylint issue tracker.
Add serializer.json module with implementations to serialize and deserialize TUF role metadata to and from the JSON wireline format for transportation, and to serialize the 'signed' part of TUF role metadata to the OLPC Canonical JSON format for signature generation and verification. Signed-off-by: Lukas Puehringer <[email protected]>
- Rename Metadata methods: - to_json_file -> to_file - from_json_file -> from_file - Remove Metadata.from_json/to_json - Remove Signed.to_canonical_bytes - Accept optional de/serializer arguments: - from_file (default: JSONDeserializer) - to_file (default: JSONSerializer) - sign, verify (default: CanonicalJSONSerializer) - inline disable pylint cyclic-import checks Signed-off-by: Lukas Puehringer <[email protected]>
Re-raise all errors that happen during de/serialization as custom De/SerializationError. Whilelist 'e', which is idiomatic for error, in api/pylintrc, and inline exempt broad-except, which are okay if re-raised. Signed-off-by: Lukas Puehringer <[email protected]>
Add tuf.api.serialization.util module with functions to convert between TUF metadata class model and the corresponding dictionary representation. These functions replace the corresponding to/from_dict classmethods. Configure api/pylintrc to exempt '_type' from protected member access warning, because the underscore prefix here is only used to avoid name shadowing. Signed-off-by: Lukas Puehringer <[email protected]>
Revert an earlier commit that moved to/from_dict metadata class model methods to a util module of the serialization sub-package. We keep to/from_dict methods on the metadata classes because: - It seems **idiomatic** (see e.g. 3rd-party libaries such as attrs, pydantic, marshmallow, or built-ins that provide default or customizable dict representation for higher-level objects). The idiomatic choice should make usage more intuitive. - It feels better **structured** when each method is encapsulated within the corresponding class, which in turn should make maintaining/modifying/extending the class model easier. - It allows us to remove function-scope imports (see subsequent commit). Caveat: Now that "the meat" of the sub-packaged JSON serializer is implemented on the class, it might make it harder to create a non-dict based serializer by copy-paste-amending the JSON serializer. However, the benefits from above seem to outweigh the disadvantage. See option 5 of ADR0006 for further details (theupdateframework#1270). Signed-off-by: Lukas Puehringer <[email protected]>
91fca86
to
37afbdc
Compare
Signed-off-by: Lukas Puehringer <[email protected]>
d5fb7ed
to
8be8ec1
Compare
This could use another review. :)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the updates Lukas. I have a couple of minor nits and one surprise, why are we using six in code that was never intended to run on Python2?
tuf/api/serialization/__init__.py
Outdated
|
||
class MetadataDeserializer(): | ||
"""Abstract base class for deserialization of Metadata objects. """ | ||
__metaclass__ = abc.ABCMeta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: now that we are only supporting Python3 we can switch to less confusing metaclass syntax, either
import abc
class MetadataDeserializer(ABC):
or
import abc
class MetadataDeserializer(metaclass=ABCMeta):
we can handle this in a future PR, though.
sort_keys=True).encode("utf-8") | ||
|
||
|
||
class CanonicalJSONSerializer(SignedSerializer): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is worth clarifying, via the class name, that this is the OLPC canonical JSON?
You mention it in the header above, but as we have had several adopters surprised that our Canonical JSON isn't compatible with go-tuf's Canonical JSON, I wonder whether it is worth being very explicit?
Counter-argument, I do not like OLPCCanonicalJSONSerializer
as a class name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with both arguments. :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to avoid ambiguity would be to not call it CanonicalJSONSerializer
at all, but something like DefaultSignedSerializer
or so? Then people are forced to read the docstring to get more info. At any rate, I will mention OLPC in the class and function docstrings.
tuf/api/serialization/json.py
Outdated
@@ -7,6 +7,7 @@ | |||
|
|||
""" | |||
import json | |||
import six |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we using six in code that was never intended to support Python2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For raise_from
below, which I will change to raise ... from
.
tuf/api/serialization/json.py
Outdated
sort_keys=True).encode("utf-8") | ||
|
||
except Exception as e: # pylint: disable=broad-except | ||
six.raise_from(SerializationError, e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python3 syntax for this appears to be
raise SerializationError from e
tuf/api/serialization/json.py
Outdated
return encode_canonical(signed_dict).encode("utf-8") | ||
|
||
except Exception as e: # pylint: disable=broad-except | ||
six.raise_from(SerializationError, e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not use six here
raise SerializationError from e
# Function-scope import to avoid circular dependency. Yucky!!! | ||
# TODO: At least move to a _get_default_signed_serializer helper. | ||
from tuf.api.serialization.json import CanonicalJSONSerializer # pylint: disable=import-outside-toplevel | ||
if signed_serializer is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're missing some of the serializer
-> signed_serializer
rename in this patch (8858280)? The argument name and the variable assigned below are still serializer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the other way around, this line should have gone into b934ded. Apologies for the bad git add --patch
ing. Happy to rebase. Maybe after a final LGTM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WFM
|
||
def _common_fields_to_dict(self) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this have the @staticmethod
decorator also? Throughout the rest of this patch it is not being called on an instance of the class, but on the super()
proxy object, much like _common_fields_from_dict()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't be static because it needs self
, but thanks for pointing out that those two are called on super()
. I guess it's less confusing to call them on self
, given that they are only called in the parent class. 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, thanks.
tuf/api/serialization/json.py
Outdated
metadata_obj = Metadata.from_dict(json_dict) | ||
|
||
except Exception as e: # pylint: disable=broad-except | ||
six.raise_from(DeserializationError, e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python3 syntax for this appears to be
raise SerializationError from e
7e7187e
to
a223961
Compare
Thanks for the review! I (hope I) addressed all your comments in the last 3 commits. I chose to leave I'm happy to also fix the minor git history hiccup from #1279 (comment) once you have approved the PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for quickly addressing the review comments.
Thanks for the review! I (hope I) addressed all your comments in the last 3 commits. I chose to leave
CanonicalJSONSerializer
as is but update the docstrings. Let me know if that works for you.
Last 3 commits look great, thanks. I'm happy to leave CanonicalJSONSerializer
. Let's see if we can come up with a 'better' name before we release with an API stability promise.
I'm happy to also fix the minor git history hiccup from #1279 (comment) once you have approved the PR.
That would be good, thanks!
- Try to clarify purpose and remove unimportant TODO note - Use pylint block-level control for shorter lines, see http://pylint.pycqa.org/en/latest/user_guide/message-control.html#block-disables Signed-off-by: Lukas Puehringer <[email protected]>
Use typing.Optional for optional kwargs that default to None. Signed-off-by: Lukas Puehringer <[email protected]>
- Rename _dict to json_dict to avoid wrong semantics of leading underscore. (leading underscore was initially chosen to avoid name shadowing) - Rename 'serializer' argument of type 'SignedSerializer' to 'signed_serializer', to distinguish from 'serializer' argument of type 'MetadataSerializer'. Signed-off-by: Lukas Puehringer <[email protected]>
Signed-off-by: Lukas Puehringer <[email protected]>
Clarify that the TUF metadata class model is not bound to a JSON wireline format by: - re-wording module, class and method docstrings and code comments to add details about custom and default serialization and the purpose of from/to_dict methods, and - removing the 'JsonDict' type annotation -- instead we use generic Mapping[str, Any] for method arguments and strict Dict[str, Any] as return value as suggested in https://docs.python.org/3/library/typing.html#typing.Dict Signed-off-by: Lukas Puehringer <[email protected]>
Prior to this commit the (abstract) 'Signed' base class implemented from/to_dict methods, to be used by any subclass in addition to or instead of a custom from/to_dict method. The design led to some confusion, especially in 'Signed.from_dict' factories, which instantiated subclass objects when called on a subclass, which didn't implement its own 'from_dict' method. This commit demystifies the design, by implementing from/to_dict on all 'Signed' subclasses, and moving common from/to_dict tasks to helper functions in the 'Signed' class. The newly gained clarity and explicitness comes at the cost of slightly more lines of code. Signed-off-by: Lukas Puehringer <[email protected]>
Signed-off-by: Lukas Puehringer <[email protected]>
tuf.api is not designed for Python 2 compatibility. This commit removes the following stray compatibility constructs in its serialization subpackage: - '__metaclass__ = abc.ABCMeta' - six.raise_from Signed-off-by: Lukas Puehringer <[email protected]>
- Make class docstrings wording consistent. - Emphasize that we use the OLPC Canonical JSON specification. Signed-off-by: Lukas Puehringer <[email protected]>
Call an instance method and a static method that are only defined in a parent class from child instances using self (instance) and cls (static) instead of super(). While this doesn't make a practical difference, the new syntax is probably less confusing to the reader. Signed-off-by: Lukas Puehringer <[email protected]>
a223961
to
ef91964
Compare
Done. Given that the overall diff of the PR didn't change I suppose your approval is still valid. Merging... |
Fixes #-
Related to #1270
Description of the changes being introduced by the pull request:
This PR moves de/serialization logic of TUF metadata from the metadata class model (
api.metadata
) to a newly added serialization sub-package (api.serialization
) and adopts the metadata API accordingly, with the goal of making it easier for users to swap de/serialization implementations. In particular this PR adds:More infos can be found in ...
Discussion:
The meat of the default json and canonical json de/serializers is the
from/to_dict
logic, which is currently implemented in aserialization.util
module. I wonder if we shouldn't just keep it on the metadata class model instead (we'd only need to revert b786bc0).Pro
from/to_dict
as metadata class methods:from/to_dict
is hardly serialization, which implies bytes, but rather some sort of type casting / conversion, which seems appropriate to be implemented on the classPro
*_from/to_dict
in serialization sub-package:Please verify and check that the pull request fulfills the following
requirements: