-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve signed typing #1457
Improve signed typing #1457
Conversation
To get a real-world impression of how this would help us, see e.g. https://github.com/theupdateframework/tuf/blob/74fd891677817320a2f5701d436ac9f98161bc18/tuf/ngclient/_internal/metadata_bundle.py#L432
|
tuf/api/metadata.py
Outdated
md = deserializer.deserialize(data) | ||
|
||
# Ensure deserialized signed type matches the requested type | ||
if signed_type is not None and signed_type != type(md.signed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Generic
needed for the signed_type
check to work or they serve two different purposes (independent of each other)?
- making Metadata class Generic over T for type checking
- signed_type argument for runtime check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the runtime check certainly works without any generics, yes.
They are still related in that providing signed_type
also defines what the specific type T is: if the signed_type argument is not used, some other means needs to be used to get a specific type for static typing (like the syntax md = Metadata[Snapshot].from_bytes(data)
)
Few observations to stimulate discussion, as I'm heading AFK for a few days.
|
This is probably true...
We can (I don't object to it) but
They were included along with other annotations in 3.5. I don't see anything important in the API that has changed since then. I believe the performance of some operations in |
So I wrote an ADR, but then started thinking is this really appropriate? It's a one-off decision as far as I can tell, not something that needs to be referred to to later on... Anyway, I think the text may be useful so here it is: ADR: Use Generics in Metadata API to enable better static typingContext and Problem StatementMetadata API is type annotated quite well. The annotations are a major benefit to both users and developers of the API as they enable:
The Metadata API design naturally leads to usages like this:
The issue here is that while the developer knows from context that signed type is (or should be) Root, static typing tools cannot know this (and it is not verified at runtime). Because signed type is not known, almost no types in above example are known, and the protections of static type checking are not available. Considered Options
Decision OutcomeChosen option: Make Metadata a Generic container The slight increase in code complexity is outweighed by the advantages:
For users of the API the change is either invisible (as the feature is completely optional) or easy to use: It typically requires either
Pros and Cons of the OptionsDocument other usage patternsWe could tell users to do this instead:
This does let static type checker see the types here, but unfortunately the types are completely based on the type specified by the programmer. Also, the type chosen by the programmer cannot be verified at runtime in any way. Add more Metadata-class derivativesWe could add four new classes (RootMetadata, SnapshotMetadata, ...) that derive from Metadata and that define their signed attributes with the correct type. This would allow creating objects that are type checkable:
This also allows adding code in the new classes deserialization paths that would raise if the type in "root.json" is not what was expected. The downside here is adding four new classes to the most visible part of the API for practically no runtime purpose at all. Make Metadata a Generic containerMaking Metadata a Generic container (much like List or Dict) allows static typing to automatically work. While Metadata is "generic" it is also constrained to contain only one of the four types we know are valid. Types in the example below are fully understood by static type checker:
This also allows the The downsides are
|
Added an explanation in Metadata docstring |
Based on discussion on slack, I added a commit that moves the "typed constructors" to Signed, so
Functionality is the same as before but now we do not need the ugly signed_type variable. Using the API now looks quite good IMO (and Metadata.from_file() still exists so "unknown" metadata can be loaded if the need arises) There's not any more code but unfortunately adding two abstract methods to Signed means a lot of boilerplate (10 method signatures and docstrings) ... I think this is still better than adding new Metadata-derivatives I also edited the ADR-lookalike comment above to reflect the current style |
ee98aa3
to
1758fbb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how Generics help with determining the exact "Signed" type and I also prefer the changes from the latest commit rather than having an additional argument.
Before:
root_md = Metadata._from_file(filename, signed_type=Root)
after:
root_md = Root.metadata_from_file(filename)
Unfortunately it seems like we have already started abusing the call stack when constructing a Metadata object. It now roughly looks like this:
Root.metadata_from_bytes()
-> Metadata.from_bytes()
-> deserializer.deserialize()
-> Metadata.from_dict()
...
_type = metadata["signed"]["type"]
inner_cls = Root
-> Root.from_dict()
# Here we do another check of _type
# return Metadata object
# back to Root.metadata_from_bytes()
if not isinstance (metadata.signed, cls)
raise
return Metadata
One option is to revert to the initial form of metadata
proposed while it was still under development and have all methods creating metadata objects as part of Signed
. The call stack will remain similar but we get rid of the hoping from Signed
to Metadata
and back. This means an inconsistency in the to/from methods pairs, though:
class Metadata(Generic[T]):
sign()
verify_delegate()
to_dict()
to_file()
class Signed:
metadata_from_dict() -> Metadata
metadata_from_file() -> Metadata
metadata_from_bytes() -> Metadata
...
Root.metadata_from_bytes()
-> deserializer.deserialize() # need to pass cls type to deserializer
-> Root.from_dict()
# Signed type check here
return Metadata
I haven't actually tested this proposal so maybe there are pitfalls that I haven't spotted.
Jumping over
You still can't remove the last type check in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked that we were able to remove the signed_type
argument.
Still, I agree we have soo many _type
checks...
I just remembered we added additional _type
checks in the TrustedMetadataSet
...
Maybe this is a good time to rethink how we do initialization through Metadata_from_bytes()
as Teodora suggested.
@@ -127,7 +124,8 @@ def from_dict(cls, metadata: Dict[str, Any]) -> "Metadata": | |||
signatures[sig.keyid] = sig | |||
|
|||
return cls( | |||
signed=inner_cls.from_dict(metadata.pop("signed")), | |||
# Specific type T is not known at static type check time: cast | |||
signed=cast(T, inner_cls.from_dict(metadata.pop("signed"))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand, you are using cast
here to signal the type checker that you expect the type of the returned metadata.signed to be of type T
Wondering when does T
gets actually assigned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, it's useful to remember that T
and cast()
have zero runtime effect: cast() is me promising to mypy (not cpython) that signed will be one of the types T once the code runs, and telling mypy to not worry about it. The cpython runtime doesn't know anything about T or that promise.
For the static typing to work __init__() / from_dict()
themselves do not need to define what type T is: calling methods like Root.metadata_from_file()
do define it (as the return value is Metadata[Root]
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see this in action if you use the current constructor with this branch: Metadata.from_file()
still works but return value is not statically typed with the correct type (it's just Metadata[T]) because there is nothing the static type check could use to figure it out.
conveniently at least VS Code offers me all the possible completions (Root,Targets,Snapshot,Timestamp) when T is not defined so that's a win-win.
The issues in Teodoras mypy branch are a pretty good example of what this is about: It also gives us a place to see the alternatives. There are surprisingly few places where this is an issue in ngclient after all: maybe 10-15 well placed calls like |
Given that the alternative ( The work in this PR to use generics feels like the best solution we have to a problem we want to solve, that is – ensuring that the API is as easy to use as possible. I'm not entirely fond of the contained object constructing the container, but I don't yet have a better alternative to propose. I will try to spend some time with this PR and give it a more thorough review next week. |
|
d9fbb55
to
2aeed7d
Compare
New attempt: leave out the runtime type checks, just include the minimal generics:
If this seems reasonable I can make another PR with the runtime-type-checking Metadata constructors |
When we use Metadata, it is helpful if the specific signed type (and all of the signed types attribute types are correctly annotated. Currently this is not possible. Making Metadata Generic with constraint T, where T = TypeVar("T", "Root", "Timestamp", "Snapshot", "Targets") allows these annotations. Using Generic annotations is completely optional so all existing code still works -- the changes in test code are done to make IDE annotations more useful in the test code, not because they are required. Examples: md = Metadata[Root].from_bytes(data) md:Metadata[Root] = Metadata.from_bytes(data) In both examples md.signed is now statically typed as "Root" allowing IDE annotations and static type checking by mypy. Note that it's not possible to validate that "data" actually contains a root metadata at runtime in these examples as the annotations are _not_ visible at runtime at all: new constructors would have to be added for that. from_file() is now a class method like from_bytes() to make sure both have the same definition of "T" when from_file() calls from_bytes(): This makes mypy happy. Partially fixes theupdateframework#1433 Signed-off-by: Jussi Kukkonen <[email protected]>
2aeed7d
to
13e20e9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for narrowing scope to make the PR easier to reason about. LGTM.
EDIT: the current proposal only contains generic annotations, not the runtime enforcing that is also described below. I'm leaving the original text here for posterity, but see the commit message to see current state.
This is a PR but I invite discussion about whether it is a good idea: I implemented it first because I did not know what it would end up looking like so discussion before implementation seemed futile.
Please see comment below for detailed reasoning and a comparison with alternatives.
copying description from one of the commits:
Fixes #1433
Please verify and check that the pull request fulfills the following
requirements: