-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New metadata API: add MetadataInfo and TargetFile classes #1223
New metadata API: add MetadataInfo and TargetFile classes #1223
Conversation
1f074f8
to
5745088
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work, @MVrachev and many thanks for taking a stab at this! Don't let the many comments discourage you, they are mostly minor nit picks that should be easy to fix. Let me know when this is ready for another round for review.
e517a86
to
fd340ad
Compare
It took me longer than expected, but I updated my pr.
|
fd340ad
to
bdcdf02
Compare
@MVrachev and @jku have rightfully pointed out that in my hierarchic Note that this mutation only occurs in classes that use inheritance, i.e. in class Parent:
def __init__(self, some_parent_attr: SomeParentAttr):
self.some_parent_attr = some_parent_attr
@classmethod
def from_dict(cls, data: Dict):
# Write back into passed 'data' dict so that we can pass it as '**data'
# to constructor, with parsed value for 'data["some_parent_attr"]',
# alongside other 'data' values that were parsed in subclasses.
# FIXME: Mutation of 'data' (passed by reference) might surprise caller
data["some_parent_attr"] = SomeParentAttr.from_dict(data["some_parent_attr"])
return cls(**data)
class Child(Parent):
def __init__(self,
some_parent_attr: SomeParentAttr,
some_child_attr: SomeChildAttr):
super().__init__(some_parent_attr)
self.some_child_attr = some_child_attr
@classmethod
def from_dict(cls, data: Dict):
# Parse and write back into passed 'data' dict so that we can hand it
# on to the parent class for any additional parsing of 'data' values.
# FIXME: Mutation of 'data' (passed by reference) might surprise caller
data["some_child_attr"] = SomeChildAttr.from_dict([data["some_child_attr"]])
return super().from_dict(data) @MVrachev proposes to class Parent:
@classmethod
def from_dict(cls, data: Dict):
# Create empty object with default or parametrized constructor with
# default arguments.
obj = cls()
# Parse 'data["some_parent_attr"]' and assign it to the empty object.
# Any child attributes are parsed and assigned later in the child
# class.
obj.some_parent_attr = SomeParentAttr.from_dict(data["some_parent_attr"])
return obj
class Child(Parent):
@classmethod
def from_dict(cls, data: Dict):
# Call parent parse to get object with parent attributes already
# assigned and ...
obj = super().from_dict(data)
# ... parse and assign any child attributes here.
obj.some_child_attr = SomeChildAttr.from_dict([data["some_child_attr"]])
return obj |
Just my 0.02 BTC as an observer: One thing I regretted adding in my original Metadata API was native JSON decoding/encoding support as class methods. While it works well, it does not not decouple our internal Metadata objects from the data transport format (which need not be JSON). In my new design, I use an abstract parser which can be overriden to implement different parsers for different data transport formats that nevertheless return the same type of internal Metadata objects we expect. My new design does not yet account for how to write those internal Metadata objects back into JSON, but I don't believe it would be too difficult while keeping separation of concerns. Anyway, please carry on, don't let me derail you, I was just making an observation based on something I caused. |
Good point, @trishankatdatadog. This would be a third, more procedural option, for our considerations in secure-systems-lab/securesystemslib#272 (comment). And it indeed completely decouples internal Metadata objects from external transport format. What I liked about my proposal is that serialization/deserialization logic is organized in small chunks that are encapsulated in the related class. But I'm really fine either way. Curious to hear what others think. |
I agree with your proposal @lukpueh and will change it. |
86aee3f
to
2beb35d
Compare
I implemented the suggestions from @lukpueh. It will be awesome if we can hear the opinions of others. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates, @MVrachev! Speaking of updates, should we add TODO comments to mark the update
methods for revision as discussed in #1223 (comment) pp? Otherwise this looks quite good to me. :)
4e26995
to
194a907
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c4f8330
to
56c82de
Compare
I addressed all comments made by @joshuagl and rebased to squash some of the commits and prepare this pr for merging. |
Closing to try and re-trigger GHA. |
There are linter failures on the new changes, could you please take a look @MVrachev? Note: it's likely that we need to update pylint configuration for the new coding style. |
56c82de
to
14ef799
Compare
I added one commit where I disable inline some of the irrelevant warnings and for warning |
# cls is most likely a child class of "Signed" and we need to setup | ||
# the appropriate fields. | ||
obj = cls() # pylint: disable=E1120 | ||
obj._type = signed_dict['_type'] # pylint: disable=W0212 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
W0212 is protected-access
, we're accessing a protected member outside of the class or a descendent. This makes sense given the pattern of creating an empty object, updating the fields, then returning the populated objet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU _type
is not actually meant to be "protected" in the Python OOP sense (the naming dates back before our transition to more OOP), but rather is a workaround to not shadow the builtin type
name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, let's add a comment to that effect?
# Warnings about rules E1120 and W0212 are not relevant here because | ||
# cls is most likely a child class of "Signed" and we need to setup | ||
# the appropriate fields. | ||
obj = cls() # pylint: disable=E1120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E1120 is no-value-for-parameter
, which is that we're creating an instance of the class without passing values for __init__
's arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Orthogonal to this PR, given the number of linter warnings we're having to disable related to our from_dict
pattern, I wonder whether we should document this (and consider other options?) with an ADR. cc @lukpueh
tuf/api/pylintrc
Outdated
@@ -1,5 +1,5 @@ | |||
[MESSAGE_CONTROL] | |||
disable=fixme | |||
disable=fixme,E1101 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E1101 is no-member
, a variable has been accessed which the type checker doesn't think is a valid member of that type. That makes sense, we're upcasting to the base Signed
type, then accessing members of the subtype.
Rather than disabling this globally, it would probably make more sense to disable it for only the tuf.api.metadata
module (by including the #pylint: disable=
comment at the top of the module's file).
Agreed. I'll put something together. |
14ef799
to
2e07883
Compare
Updated the pr. Had to rebase in order to return |
Add a use case for the root class to be tested in test_generic_read and test_read_write_read_compare tests in test_apy.py Signed-off-by: Martin Vrachev <[email protected]>
As per the specification (v1.0.1) length and hashes fields in timestamp metadata are optional. We have implement this in the older API (see theupdateframework#1031) and we should implement it in the new API. Signed-off-by: Martin Vrachev <[email protected]>
From the reST/sphinx docs: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#literal-blocks I added new lines and an identation where it was missed. Signed-off-by: Martin Vrachev <[email protected]>
Signed-off-by: Martin Vrachev <[email protected]>
Add MetaMetaFileFile class to tuf.api.metadata module. This class is be used for the "meta" field in Timestamp and Snapshot metadata files described the specification. Signed-off-by: Martin Vrachev <[email protected]>
Add a new TargetFile class to tuf.api.metadata module make Targets class to use it. This class will contain information about the "targets" field from targets.json Also, update the tests for that change. Signed-off-by: Martin Vrachev <[email protected]>
In the Signed.from_dict function we are passing signed - a dictionary. Dictionaries are passed by reference and thus we want to make sure we are not changing the dictionary passed as a function argument. I had to give default function arguments for all class arguments assigned in __init__ for Root, Snapshot, Targets, and Timestamp classes. I chose "None" as the easiest solution as a default argument, but we definitely want to add proper validation which will ensure we are not creating empty or partially populated objects. I didn't want to create a discussion for sensible defaults and argument validation here. There is already existing issue for that: theupdateframework#1140 Signed-off-by: Martin Vrachev <[email protected]>
Signed-off-by: Martin Vrachev <[email protected]>
Warnings E1120 and W0212 are irrelevant in one specific case, but rule E1101 creates a lot of false-positives in tuf/metadata/api. For example, there are 4 false-positives in this file related to "meta" and "targets" fields not existing in "signed_dict" in "from_dict" functions in Timestamp, Snapshot and Targets classes. Signed-off-by: Martin Vrachev <[email protected]>
2e07883
to
383e260
Compare
I updated this pr and had to rebase in order to replace the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your continued attention on this @MVrachev. I left a few comments, only one of which feels like a blocker – you removed a comment about converting expires
so that it can be passed to the constructor as **signed_dict
, but you've changed the code away from using that pattern. Should the conversion of expires to a datetime object be removed in favour of it being handled elsewhere?
Nice catch on the sphinx doc strings, can we add some kind of CI/linter check for these in future to prevent them being merged in an incorrect state? This feels issue worthy, rather than something we need to resolve here.
@lukpueh I'd appreciate your review on this, I don't feel quite up to speed on the new metadata API yet.
hashes: A optional dictionary containing hash algorithms and the | ||
hashes resulting from applying them over the metadata file.:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if something like the following might be clearer? Or am I nitpicking?
hashes: A optional dictionary containing hash algorithms and the | |
hashes resulting from applying them over the metadata file.:: | |
hashes: An optional dictionary mapping hash algorithms to the | |
hashes resulting from applying them over the metadata file | |
contents.:: |
|
||
|
||
def __eq__(self, other: 'MetadataInfo') -> bool: | ||
"""Compare objects by their values instead of by their addresses.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: by default Python compares object
s by their identity, which is like the address but may not be. Can we use the Pythonic term identity here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, didn't know about that.
hashes: A dictionary containing hash algorithms and the | ||
hashes resulted from applying them over the target file:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the above suggestion, I'd welcome feedback on whether this is clearer or not?
hashes: A dictionary containing hash algorithms and the | |
hashes resulted from applying them over the target file:: | |
hashes: A dictionary mapping hash algorithms to the | |
hashes resulting from applying them over the metadata file | |
contents:: |
|
||
|
||
def __eq__(self, other: 'TargetInfo') -> bool: | ||
"""Compare objects by their values instead of by their addresses.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment about identity here.
@@ -289,22 +291,27 @@ def __init__( | |||
def from_dict(cls, signed_dict: JsonDict) -> 'Signed': | |||
"""Creates Signed object from its JSON/dict representation. """ | |||
|
|||
# Create empty object with default or parametrized constructor with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Create empty object with default or parametrized constructor with | |
# Create empty object with default or parameterized constructor with |
signed_dict['expires']) | ||
# NOTE: We write the converted 'expires' back into 'signed_dict' above | ||
# so that we can pass it to the constructor as '**signed_dict' below, | ||
# along with other fields that belong to Signed subclasses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the removal of this comment indicate we should no longer be doing the conversion of the expires string to a datetime object here?
I've converted this PR to draft until it is ready for review. Mostly to save myself from checking the discussion each time to see what the status is. |
I feel like this pr has a lot of discussions and changes that are outdated compared to our current code. I made a new branch migrating the changes from this pr and found out that: c37bdd2 and the discussion we had with @lukpueh starting from here are no longer relevant to our code, commits I feel it makes more sense to close this pr and open a new one. |
Replaced by #1329. |
This pr addresses #1139, but doesn't fix it yet.
I will add a separate pr which will handle
Roles
andKeys
classes.Description of the changes being introduced by the pull request:
It includes the following changes:
length
andhashes
optional inTimestamp
as it is in our specificationMetaFile
class and use it in theTimestamp
andSnapshot
classesTargetFile
class and use it in theTargets
classupdate
method in our classes and use a more verbose name.Please verify and check that the pull request fulfills the following
requirements: