Improve signed typing #1457

Merged

Conversation

jku
Member

@jku jku commented Jun 21, 2021

EDIT: the current proposal only contains generic annotations, not the runtime enforcement that is also described below. I'm leaving the original text here for posterity, but see the commit message for the current state.


This is a PR but I invite discussion about whether it is a good idea: I implemented it first because I did not know what it would end up looking like so discussion before implementation seemed futile.

Please see comment below for detailed reasoning and a comparison with alternatives.

copying description from one of the commits:

The purpose is two-fold:
1. When we deserialize metadata, we usually know what signed type we
   expect: make it easy to enforce that
2. When we use Metadata, it is helpful if the specific signed type (and
   all of the class's attribute types) is correctly annotated

Making Metadata Generic over T, where

    T = TypeVar("T", "Root", "Timestamp", "Snapshot", "Targets")

allows both of these cases to work. Using Generics is completely
optional so all existing code still works. For case 1, the following
calls will now raise a Deserialization error if the expected type is
incorrect:

    md = Metadata.from_bytes(data, signed_type=Snapshot)
    md = Metadata.from_file(filename, signed_type=Snapshot)

For case 2, the return value md of those calls is now of type
"Metadata[Snapshot]", and md.signed is now of type "Snapshot" allowing
IDE annotations and static type checking.

Adding a type argument is an unconventional way to do this: the reason
for it is that the specific type (e.g. Snapshot) is not otherwise
available at runtime. A call like this works fine and md is annotated:

    md = Metadata[Snapshot].from_bytes(data)

but it's not possible to validate that "data" contains a "Snapshot",
because the value "Snapshot" is not defined at runtime at all, it is
purely an annotation. So an actual argument is needed.
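
A minimal sketch of why the subscript alone is not enough, and how a real `signed_type` argument closes the gap. The classes are simplified stand-ins rather than the actual tuf implementation; the `_SIGNED_CLASSES` lookup table and the use of `ValueError` are assumptions made for illustration.

```python
import json
from typing import Generic, Optional, Type, TypeVar, cast


class Root:
    """Stand-in for the real Root class."""


class Snapshot:
    """Stand-in for the real Snapshot class."""


T = TypeVar("T", Root, Snapshot)

# Hypothetical lookup from the "_type" field to the signed class.
_SIGNED_CLASSES = {"root": Root, "snapshot": Snapshot}


class Metadata(Generic[T]):
    def __init__(self, signed: T) -> None:
        self.signed = signed

    @classmethod
    def from_bytes(
        cls, data: bytes, signed_type: Optional[Type[T]] = None
    ) -> "Metadata[T]":
        # cls is the plain Metadata class here, even for a call written as
        # Metadata[Snapshot].from_bytes(data): the subscript is not passed on.
        inner_cls = _SIGNED_CLASSES[json.loads(data)["signed"]["_type"]]
        signed = inner_cls()
        if signed_type is not None and signed_type is not type(signed):
            raise ValueError(f"Expected {signed_type.__name__}")
        return cls(cast(T, signed))


root_data = b'{"signed": {"_type": "root"}}'

# Annotation only: nothing checks that the data is a Snapshot.
md = Metadata[Snapshot].from_bytes(root_data)
print(type(md.signed).__name__)  # prints "Root"

# A real argument is visible at runtime, so the mismatch is caught.
try:
    Metadata.from_bytes(root_data, signed_type=Snapshot)
except ValueError as e:
    print(e)  # prints "Expected Snapshot"
```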

Fixes #1433

Please verify and check that the pull request fulfills the following requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • [] Docs have been added for the bug fix or new feature

@jku
Member Author

jku commented Jun 21, 2021

To get a real-world impression of how this would help us, see e.g. https://github.com/theupdateframework/tuf/blob/74fd891677817320a2f5701d436ac9f98161bc18/tuf/ngclient/_internal/metadata_bundle.py#L432

  • Many of the variables are currently not statically typed (so can't be type checked)
  • there are five lines in every method that raise if new_delegate.signed.type != "targets"

md = deserializer.deserialize(data)

# Ensure deserialized signed type matches the requested type
if signed_type is not None and signed_type != type(md.signed):
Contributor

Is Generic needed for the signed_type check to work, or do they serve two different purposes (independent of each other)?

  • making Metadata class Generic over T for type checking
  • signed_type argument for runtime check

Member Author

the runtime check certainly works without any generics, yes.

They are still related in that providing signed_type also defines what the specific type T is: if the signed_type argument is not used, some other means needs to be used to get a specific type for static typing (like the syntax md = Metadata[Snapshot].from_bytes(data))

@joshuagl
Member

A few observations to stimulate discussion, as I'm heading AFK for a few days.

  1. This is just complex and controversial enough that it might be worth creating an ADR
  2. T is not very descriptive, can we do better? Should we?
  3. Are generics available in all of the Python versions we want to support?

@jku
Member Author

jku commented Jun 22, 2021

  1. This is just complex and controversial enough that it might be worth creating an ADR

This is probably true...

  2. T is not very descriptive, can we do better? Should we?

We can (I don't object to it), but

  • we can't call it Signed since we already have a class of that name, so it would have to be SignedType or something similar
  • single-letter type variable names are very common (I guess because usually there is no good name for them)

  3. Are generics available in all of the Python versions we want to support?

They were included along with other annotations in 3.5. I don't see anything important in the API that has changed since then.

I believe the performance of some operations in typing has dramatically improved over time (as the implementation has moved more into CPython core) but our use cases are unlikely to ever notice that in any way.

@jku
Member Author

jku commented Jun 23, 2021

So I wrote an ADR, but then started thinking: is this really appropriate? It's a one-off decision as far as I can tell, not something that needs to be referred to later on...

Anyway, I think the text may be useful so here it is:

ADR: Use Generics in Metadata API to enable better static typing

Technical Story:
#1433
#1457

Context and Problem Statement

Metadata API is type annotated quite well. The annotations are a major benefit to both users and developers of the API as they enable:

  • static type checking run by developers and by CI: this prevents errors from getting integrated into the code base and improves the code quality
  • productivity improving IDE integrations (live type checking in IDE)

The Metadata API design naturally leads to usages like this:

root_md = Metadata.from_file("root.json")
for keyid in root_md.signed.role.keyids:
    key = root_md.signed.keys[keyid]
    key.verify_signature(unverified_metadata)

The issue here is that while the developer knows from context that the signed type is (or should be) Root, static typing tools cannot know this (and it is not verified at runtime). Because the signed type is not known, almost no types in the above example are known, and the protections of static type checking are not available.
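
A minimal sketch of the problem, using simplified stand-in classes (not the real tuf classes) and assuming `signed` is annotated with the base class only:

```python
from typing import Dict


class Signed:
    """Stand-in base class: all that a type checker knows about md.signed."""


class Root(Signed):
    def __init__(self, keys: Dict[str, str]) -> None:
        self.keys = keys


class Metadata:
    def __init__(self, signed: Signed) -> None:
        self.signed = signed


md = Metadata(Root({"abc": "key-abc"}))

# This works at runtime because signed happens to be a Root, but a static
# type checker only sees Signed and cannot verify the attribute access
# (mypy reports that "Signed" has no attribute "keys").
key = md.signed.keys["abc"]
```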

Considered Options

  • Document other usage patterns
  • Add more Metadata-class derivatives
  • Make Metadata a Generic container

Decision Outcome

Chosen option: Make Metadata a Generic container

The slight increase in code complexity is outweighed by the advantages:

  • Existing code continues to work
  • static typing starts working with minimal code changes
  • deserialized metadata types can be checked at runtime as well
  • API surface does not grow meaningfully

For users of the API the change is either invisible (as the feature is completely optional) or easy to use: It typically requires either

  • Deserializing with e.g. Root.metadata_from_*() instead of Metadata.from_*()
  • if Metadata needs to be annotated, using the format familiar from standard library containers: e.g. Metadata[Root]

Pros and Cons of the Options

Document other usage patterns

We could tell users to do this instead:

root_md = Metadata.from_file("root.json")
root: Root = root_md.signed
for keyid in root.role.keyids:
    key = root.keys[keyid]
    key.verify_signature(unverified_metadata)

This does let the static type checker see the types here, but unfortunately the types are based entirely on what the programmer specified. Also, the type chosen by the programmer cannot be verified at runtime in any way.
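
A minimal sketch of why the annotation alone gives no runtime protection. The classes are stand-ins, and `signed` is typed as `Any` here purely so that the assignment passes a type checker:

```python
from typing import Any


class Root:
    """Stand-in for the real Root class."""


class Snapshot:
    """Stand-in for the real Snapshot class."""


class Metadata:
    def __init__(self, signed: Any) -> None:
        self.signed = signed


# Pretend this came from Metadata.from_file("root.json"), but the file
# actually contained snapshot metadata.
md = Metadata(Snapshot())

# The annotation is trusted by the type checker and ignored at runtime:
# no error is raised even though the object is not a Root.
root: Root = md.signed
print(type(root).__name__)  # prints "Snapshot"
```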

Add more Metadata-class derivatives

We could add four new classes (RootMetadata, SnapshotMetadata, ...) that derive from Metadata and that define their signed attributes with the correct type. This would allow creating objects that are type checkable:

root_md = RootMetadata.from_file("root.json")
for keyid in root_md.signed.role.keyids:
    key = root_md.signed.keys[keyid]
    key.verify_signature(unverified_metadata)

This also allows adding code in the new classes' deserialization paths that would raise if the type in "root.json" is not what was expected.

The downside here is adding four new classes to the most visible part of the API for practically no runtime purpose at all.
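
A rough sketch of what one such derivative could look like. `RootMetadata`, its `from_dict()` body and the `ValueError` are hypothetical; only the idea of a subclass with a narrowed `signed` attribute comes from the text above.

```python
from typing import Any, Dict


class Signed:
    """Stand-in base class."""


class Root(Signed):
    def __init__(self, keys: Dict[str, Any]) -> None:
        self.keys = keys


class Metadata:
    signed: Signed

    def __init__(self, signed: Signed) -> None:
        self.signed = signed


class RootMetadata(Metadata):
    # Narrow the attribute type so that static checkers see Root.
    signed: Root

    @classmethod
    def from_dict(cls, metadata: Dict[str, Any]) -> "RootMetadata":
        signed = metadata["signed"]
        # Runtime check that the data is what the class name promises.
        if signed["_type"] != "root":
            raise ValueError(f"Expected root, got {signed['_type']}")
        return cls(Root(signed["keys"]))


root_md = RootMetadata.from_dict({"signed": {"_type": "root", "keys": {}}})
print(type(root_md.signed).__name__)  # prints "Root"
```

The same pattern would have to be repeated for each of the other three signed types, which is the API growth the paragraph above objects to.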

Make Metadata a Generic container

Making Metadata a Generic container (much like List or Dict) allows static typing to automatically work. While Metadata is "generic" it is also constrained to contain only one of the four types we know are valid. Types in the example below are fully understood by the static type checker:

md = Root.metadata_from_file("root.json")
for keyid in md.signed.role.keyids:
    key = md.signed.keys[keyid]
    key.verify_signature(unverified_metadata)

This also allows the Signed.metadata_from_*() constructors to verify the type at runtime.

The downsides are

  • using an advanced Python feature, generics, in a code base that is mostly quite simple: while generics are widely used in the Python standard library (all containers are Generic), they are not very common outside it.
  • The Signed.metadata_from_*() constructors do not add new code but there is a lot of boilerplate (signatures and docstrings)
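
A minimal sketch of the typed-constructor idea, assuming simplified stand-in classes and a hypothetical `_SIGNED_CLASSES` lookup in place of the real deserializer. The method name `metadata_from_bytes()` follows the discussion; its body here is illustrative only.

```python
import json
from typing import Generic, TypeVar, cast


class Signed:
    """Stand-in base class for the signed portion of metadata."""


class Root(Signed):
    @classmethod
    def metadata_from_bytes(cls, data: bytes) -> "Metadata[Root]":
        md = Metadata.from_bytes(data)
        # The generic loader returns whatever the data contains, so the
        # expected type still has to be checked here.
        if not isinstance(md.signed, cls):
            raise ValueError(f"Expected Root, got {type(md.signed).__name__}")
        return cast("Metadata[Root]", md)


class Snapshot(Signed):
    """Stand-in for the real Snapshot class."""


T = TypeVar("T", Root, Snapshot)

# Hypothetical lookup from the "_type" field to the signed class.
_SIGNED_CLASSES = {"root": Root, "snapshot": Snapshot}


class Metadata(Generic[T]):
    def __init__(self, signed: T) -> None:
        self.signed = signed

    @classmethod
    def from_bytes(cls, data: bytes) -> "Metadata[T]":
        inner_cls = _SIGNED_CLASSES[json.loads(data)["signed"]["_type"]]
        # The specific T is not known statically at this point, hence the cast.
        return cls(cast(T, inner_cls()))


md = Root.metadata_from_bytes(b'{"signed": {"_type": "root"}}')
print(type(md.signed).__name__)  # prints "Root"
```

In this sketch only `Root` gets the constructor; repeating it (with docstrings) for every signed type and every `from_*` variant is where the boilerplate mentioned above would come from.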

@jku
Member Author

jku commented Jun 28, 2021

Added an explanation in Metadata docstring

@jku
Member Author

jku commented Jun 29, 2021

Based on discussion on Slack, I added a commit that moves the "typed constructors" to Signed, so loading root metadata now looks like this:

root_md = Root.metadata_from_file("root.json")

Functionality is the same as before but now we do not need the ugly signed_type variable. Using the API now looks quite good IMO (and Metadata.from_file() still exists so "unknown" metadata can be loaded if the need arises)

There's not any more code but unfortunately adding two abstract methods to Signed means a lot of boilerplate (10 method signatures and docstrings) ... I think this is still better than adding new Metadata-derivatives

I also edited the ADR-lookalike comment above to reflect the current style

@jku jku force-pushed the use-generics-to-improve-signed-typing branch from ee98aa3 to 1758fbb Compare June 29, 2021 14:41
@joshuagl joshuagl changed the title Use generics to improve signed typing Improve signed typing Jun 30, 2021
Contributor

@sechkova sechkova left a comment

I like how Generics help with determining the exact "Signed" type and I also prefer the changes from the latest commit rather than having an additional argument.

Before:
    root_md = Metadata.from_file(filename, signed_type=Root)
After:
    root_md = Root.metadata_from_file(filename)

Unfortunately it seems like we have already started abusing the call stack when constructing a Metadata object. It now roughly looks like this:

Root.metadata_from_bytes()
    -> Metadata.from_bytes()
        -> deserializer.deserialize()
            -> Metadata.from_dict()
                ...
                _type = metadata["signed"]["_type"]
                inner_cls = Root
                -> Root.from_dict()
                    # Here we do another check of _type

            # return Metadata object

    # back to Root.metadata_from_bytes()
    if not isinstance(metadata.signed, cls):
        raise
    return Metadata

One option is to revert to the initial form of metadata proposed while it was still under development and have all methods creating metadata objects as part of Signed. The call stack will remain similar but we get rid of the hopping from Signed to Metadata and back. This means an inconsistency in the to/from method pairs, though:

class Metadata(Generic[T]):
     sign()
     verify_delegate()
     to_dict()
     to_file()
    
class Signed:
     metadata_from_dict() -> Metadata
     metadata_from_file() -> Metadata
     metadata_from_bytes() -> Metadata
     ...
Root.metadata_from_bytes()
    -> deserializer.deserialize() # need to pass cls type to deserializer
        -> Root.from_dict()
           # Signed type check here
   return Metadata

I haven't actually tested this proposal so maybe there are pitfalls that I haven't spotted.

@jku
Member Author

jku commented Jul 5, 2021

Root.metadata_from_bytes()
    -> deserializer.deserialize() # need to pass cls type to deserializer

Jumping over Metadata.from_bytes() like this is totally possible, but it does mean that there are now 5 (or 9?) places that initialize a default JsonDeserializer() instead of one place.

Root.metadata_from_bytes()
    -> deserializer.deserialize() # need to pass cls type to deserializer
        -> Root.from_dict()
           # Signed type check here
   return Metadata

You still can't remove the last type check in Root.metadata_from_bytes(): deserializer returns any type -- whatever the data contains (could be Root, but could be something else). The check in _common_fields_from_dict() is just a sanity check that the Signed type and _type match, it does not check that the type is what we wanted.

Collaborator

@MVrachev MVrachev left a comment

I liked that we were able to remove the signed_type argument.
Still, I agree we have so many _type checks...
I just remembered we added additional _type checks in the TrustedMetadataSet...
Maybe this is a good time to rethink how we do initialization through Metadata_from_bytes() as Teodora suggested.

@@ -127,7 +124,8 @@ def from_dict(cls, metadata: Dict[str, Any]) -> "Metadata":
             signatures[sig.keyid] = sig

         return cls(
-            signed=inner_cls.from_dict(metadata.pop("signed")),
+            # Specific type T is not known at static type check time: cast
+            signed=cast(T, inner_cls.from_dict(metadata.pop("signed"))),
Collaborator

From what I understand, you are using cast here to signal to the type checker that you expect the returned metadata.signed to be of type T.
I'm wondering when T actually gets assigned?

Member Author

@jku jku Jul 5, 2021

Well, it's useful to remember that T and cast() have zero runtime effect: cast() is me promising to mypy (not CPython) that signed will be one of the types allowed for T once the code runs, and telling mypy to not worry about it. The CPython runtime doesn't know anything about T or that promise.

For the static typing to work, __init__() / from_dict() themselves do not need to define what type T is: calls to methods like Root.metadata_from_file() do define it (as the return value is Metadata[Root]).
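
A tiny sketch of the "zero runtime effect" point, independent of the tuf code base: `typing.cast()` hands back its argument untouched, whatever type is named.

```python
from typing import cast

# cast() only informs the static type checker; at runtime it returns the
# value unchanged and performs no conversion or validation.
value = cast(int, "not an int")
print(type(value).__name__)  # prints "str"
```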

Member Author

You can see this in action if you use the current constructor with this branch: Metadata.from_file() still works but the return value is not statically typed with the correct type (it's just Metadata[T]) because there is nothing the static type checker could use to figure it out.

conveniently at least VS Code offers me all the possible completions (Root,Targets,Snapshot,Timestamp) when T is not defined so that's a win-win.

@jku
Member Author

jku commented Jul 12, 2021

The issues in Teodora's mypy branch are a pretty good example of what this is about:
#1489 (comment)

It also gives us a place to see the alternatives. There are surprisingly few places where this is an issue in ngclient after all: maybe 10-15 well placed calls like assert(isinstance(self._trusted_set.snapshot.signed, Snapshot)) would make the checks work... it just looks so ugly
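
A minimal sketch of the assert-based alternative, with stand-in classes and a hypothetical helper function `snapshot_version_of()`. It shows how `assert isinstance(...)` narrows the type for a static checker, and why it is a weak runtime guarantee.

```python
from typing import Dict


class Signed:
    """Stand-in base class."""


class Snapshot(Signed):
    def __init__(self, meta: Dict[str, int]) -> None:
        self.meta = meta


class Metadata:
    def __init__(self, signed: Signed) -> None:
        self.signed = signed


def snapshot_version_of(md: Metadata, name: str) -> int:
    # The assert narrows md.signed from Signed to Snapshot for the type
    # checker, and fails at runtime on a mismatch...
    assert isinstance(md.signed, Snapshot)
    # ...but asserts are removed when Python runs with -O, so this is not
    # a dependable runtime check.
    return md.signed.meta[name]
```

Every call site that touches `signed` would need its own assert, which is the "10-15 well placed calls" estimate above.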

@joshuagl
Member

The issues in Teodora's mypy branch are a pretty good example of what this is about:
#1489 (comment)

It also gives us a place to see the alternatives. There are surprisingly few places where this is an issue in ngclient after all: maybe 10-15 well placed calls like assert(isinstance(self._trusted_set.snapshot.signed, Snapshot)) would make the checks work... it just looks so ugly

Given that the alternative (assert()s) offloads responsibility to the API user, I am not a fan of that. The fact that it looks ugly makes the option even easier to argue against.

The work in this PR to use generics feels like the best solution we have to a problem we want to solve, that is – ensuring that the API is as easy to use as possible. I'm not entirely fond of the contained object constructing the container, but I don't yet have a better alternative to propose.

I will try to spend some time with this PR and give it a more thorough review next week.

@sechkova
Contributor

bandit doesn't fully agree with asserts either:
#1489 (comment)

@jku jku force-pushed the use-generics-to-improve-signed-typing branch 3 times, most recently from d9fbb55 to 2aeed7d Compare August 16, 2021 13:30
@jku
Member Author

jku commented Aug 16, 2021

New attempt: leave out the runtime type checks, just include the minimal generics:

  • this allows using static typing using multiple annotation methods:
    md = Metadata[Root].from_bytes(data)
    md: Metadata[Root] = Metadata.from_bytes(data)
  • this does not enforce that the type constraint [Root] is correct at runtime.

If this seems reasonable I can make another PR with the runtime-type-checking Metadata constructors

When we use Metadata, it is helpful if the specific signed type (and all of
the signed type's attribute types) is correctly annotated. Currently this is
not possible.

Making Metadata Generic with constraint T, where

    T = TypeVar("T", "Root", "Timestamp", "Snapshot", "Targets")

allows these annotations. Using Generic annotations is completely
optional so all existing code still works -- the changes in test code
are done to make IDE annotations more useful in the test code, not
because they are required.

Examples:

    md = Metadata[Root].from_bytes(data)
    md: Metadata[Root] = Metadata.from_bytes(data)

In both examples md.signed is now statically typed as "Root" allowing IDE
annotations and static type checking by mypy.

Note that it's not possible to validate that "data" actually contains a
root metadata at runtime in these examples as the annotations are _not_
visible at runtime at all: new constructors would have to be added for that.

from_file() is now a class method like from_bytes() to make sure both
have the same definition of "T" when from_file() calls from_bytes():
This makes mypy happy.
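
A minimal sketch of the classmethod arrangement described above, using stand-in classes and a hypothetical `_SIGNED_CLASSES` lookup instead of the real deserializer. Both annotation styles from the examples are exercised at the end.

```python
import json
import os
import tempfile
from typing import Generic, TypeVar, cast


class Root:
    """Stand-in for the real Root class."""


class Snapshot:
    """Stand-in for the real Snapshot class."""


T = TypeVar("T", Root, Snapshot)

# Hypothetical lookup from the "_type" field to the signed class.
_SIGNED_CLASSES = {"root": Root, "snapshot": Snapshot}


class Metadata(Generic[T]):
    def __init__(self, signed: T) -> None:
        self.signed = signed

    @classmethod
    def from_bytes(cls, data: bytes) -> "Metadata[T]":
        inner_cls = _SIGNED_CLASSES[json.loads(data)["signed"]["_type"]]
        return cls(cast(T, inner_cls()))

    @classmethod
    def from_file(cls, filename: str) -> "Metadata[T]":
        # Both constructors are classmethods of the same generic class, so
        # the T in this return type is the T that from_bytes() returns.
        with open(filename, "rb") as f:
            return cls.from_bytes(f.read())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "root.json")
    with open(path, "wb") as f:
        f.write(b'{"signed": {"_type": "root"}}')

    md = Metadata[Root].from_file(path)
    md2: Metadata[Root] = Metadata.from_file(path)

print(type(md.signed).__name__, type(md2.signed).__name__)  # prints "Root Root"
```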

Partially fixes theupdateframework#1433

Signed-off-by: Jussi Kukkonen <[email protected]>
@jku jku force-pushed the use-generics-to-improve-signed-typing branch from 2aeed7d to 13e20e9 Compare August 16, 2021 13:39
@jku jku requested a review from sechkova August 16, 2021 14:26
Member

@joshuagl joshuagl left a comment

Thanks for narrowing scope to make the PR easier to reason about. LGTM.

@jku jku merged commit d3441f0 into theupdateframework:develop Aug 18, 2021
@jku jku deleted the use-generics-to-improve-signed-typing branch December 30, 2024 09:10
Development

Successfully merging this pull request may close these issues.

Metadata API: Improve metadata.signed "typing"
4 participants