
Prototype descriptors and validation functions #1366

Closed

Conversation

MVrachev
Collaborator

Description of the changes being introduced by the pull request:

This experiment tries to help us envision what validation will look like
if we decide to use validation functions + descriptors.

What I found with this experiment is that Python descriptors can't
handle validation of dictionary keys and values. More precisely, we
can't validate that each of the "roles" in Root is one of the four
top-level metadata types, or that Root role keyids are unique.

The reason is that an assignment like root.roles = 3 invokes the
descriptor's validation for root.roles, but root.roles["dwad"] = 3
first fetches the roles dictionary and then assigns a new value to one
of its elements, so the descriptor is never triggered.
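The limitation above can be demonstrated with a small sketch (class and attribute names here are illustrative, not the actual tuf API):

```python
class RolesDescriptor:
    """Data descriptor that validates assignment to the whole attribute."""

    def __set_name__(self, owner, name):
        self.name = "_" + name

    def __set__(self, obj, value):
        if not isinstance(value, dict):
            raise ValueError(f"roles must be a dict, got {type(value).__name__}")
        setattr(obj, self.name, value)

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.name)


class Root:  # hypothetical, not the real tuf Root class
    roles = RolesDescriptor()

    def __init__(self, roles):
        self.roles = roles  # goes through RolesDescriptor.__set__


root = Root({"root": {"keyids": []}})

try:
    root.roles = 3          # triggers __set__, so validation runs
except ValueError:
    print("whole-attribute assignment was validated")

# Item assignment first calls __get__ (returning the plain dict) and
# then dict.__setitem__ on it -- the descriptor never sees this:
root.roles["dwad"] = 3      # no validation happens here
print(root.roles["dwad"])   # -> 3
```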

Please verify and check that the pull request fulfills the following
requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

These functions are created to be used during model initialization.
They are the stepping stone for other validation solutions (like
Python descriptors, decorators, and ValidationMixin), which will
reuse parts of them.

For more context read ADR 0007.

Signed-off-by: Martin Vrachev <[email protected]>
Add validation functions for all Root dictionary keys and values as
described in the spec.

Signed-off-by: Martin Vrachev <[email protected]>
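A minimal sketch of what such standalone validation functions might look like (names such as validate_role_names and TOP_LEVEL_ROLES are illustrative, not the actual tuf API). Keeping them as plain functions means descriptors, decorators, or a ValidationMixin can all reuse the same logic later:

```python
# Hypothetical standalone validators, called once during model
# initialization (names are illustrative, not the actual tuf API).

TOP_LEVEL_ROLES = {"root", "targets", "snapshot", "timestamp"}


def validate_role_names(roles):
    """Each key in Root's 'roles' dict must be a top-level metadata type."""
    unknown = set(roles) - TOP_LEVEL_ROLES
    if unknown:
        raise ValueError(f"unexpected role names: {sorted(unknown)}")


def validate_keyids(keyids):
    """Role keyids must be unique."""
    if len(keyids) != len(set(keyids)):
        raise ValueError("keyids must be unique")


class Root:  # hypothetical, not the real tuf Root class
    def __init__(self, roles):
        validate_role_names(roles)
        for role in roles.values():
            validate_keyids(role.get("keyids", []))
        self.roles = roles
```

With this, `Root({"root": {"keyids": ["a", "b"]}})` constructs fine, while an unknown role name or a duplicated keyid raises `ValueError` at initialization time.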
@MVrachev MVrachev marked this pull request as draft April 28, 2021 12:41
@jku
Member

jku commented Apr 29, 2021

Writing down some thoughts on this (wall of text sorry).

The task at hand is not "write validation code" -- or if it was, I now want to change it to "1. improve security against malicious json, and 2. make the API easy to use correctly and hard to use incorrectly (with the least amount of the most obvious code)". This is abstract, but I do have practical suggestions below. The reason I say this is that I fear we are concentrating too much on adding validation when there are other solutions (like modifying the implementation) that achieve the same goals.

How should we proceed with this task? I believe we need to analyze the individual items in the API one by one (starting with small bits like strings and continuing to larger items). For each item we should consider and document

  • what can go wrong with this item
  • where is this used? where is it set? Should it be used in these places?
  • what can we do to improve the safety/usability?
  • if we choose validation, define what exactly we want to validate

and only then implement chosen improvement, if any. Many of these improvements will include validation but we should not add validation if we can achieve the goals without it and validation may be only part of the solution to a specific issue. Also what exactly is validated should be spelled out somewhere -- otherwise it's impossible to tell if the validation implementation is correct or not.

This is hard work but I believe it's the path to better code. Trying to implement a lot of validation at once will not lead to good quality.

As examples I went through analysis for _type and spec_version:

  • _type should not be user-settable at all (it does not make sense), and deserialization validation of _type happens in Metadata already (the fact that type is a Signed constructor argument is IMO a bug that we should fix: each Signed implementation should know its type without it being told what it is): no validation should be required here
  • spec_version: deserialization should be validated but there does not seem any reason for spec_version attribute to be writable: we probably want to prevent modifying that instead of validating the new value. Calling a validation function from constructor only seems fine to me (if modifying the public attribute is not possible) but other options do exist. Validation should include
    • value can be parsed as three ints separated by dots (specified by semver)
    • maybe comparison to library supported version? -- this is implementation defined, not specified so we can choose. Updater currently raises on major version mismatch and logs on minor
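The spec_version analysis above might translate into a validator roughly like this (the function name and the SUPPORTED_MAJOR value are assumptions for illustration; note this simple check does not enforce every semver rule, e.g. it accepts leading zeros):

```python
# Hypothetical validator sketch for spec_version, per the analysis above.

SUPPORTED_MAJOR = 1  # assumed library-supported major version


def validate_spec_version(spec_version):
    """Check spec_version parses as three dot-separated ints (semver-like)."""
    parts = spec_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"invalid spec_version: {spec_version!r}")
    # Implementation-defined policy: reject a major-version mismatch.
    if int(parts[0]) != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported spec_version major: {parts[0]}")
```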

I would start with individual PRs with that sort of analysis and proposed fixes. I would probably start by just adding the validation calls in constructors and maybe making the attributes properties with setters and see how it goes from there -- but I can see the value in trying other integration methods as well (descriptors/decorations)... I'm assuming a single solution is not going to cover all cases and starting simple is good (so we can change our minds).

More detail comments:

  • keeping validation functions outside of metadata.py seems fine to me
  • looking at the example, descriptors IMO hide the validation and are still magic: When I'm reading metadata.py I can't see what gets validated and when, and I think that's bad. I don't need to know details of the validation but I would like to see e.g. validate_spec_version(spec_version) in the constructor or a property setter (or maybe it's a @validate_spec_version instead) -- anything that makes it easy to see what is validated, and easy to follow through to the validation code.
  • I don't like the type checks -- they may be useful in specific situations but generally speaking I suggest letting the validation fail with TypeErrors or AttributeErrors instead
  • I do like the idea of testing how validation reacts to completely wrong types though (I suggest only covering types that json can contain) -- this should give us a good idea where we might need better type checks
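One way to keep the validation visible in metadata.py, as the comment above suggests, is a constructor that names the validator explicitly plus a property with no setter, so the attribute cannot be modified after construction. This is a sketch under those assumptions, not the actual metadata.py code:

```python
def validate_spec_version(spec_version):
    # Minimal inline validator so the example is self-contained.
    parts = spec_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"invalid spec_version: {spec_version!r}")


class Signed:  # hypothetical, not the real tuf Signed class
    def __init__(self, spec_version):
        validate_spec_version(spec_version)  # visible at a glance
        self._spec_version = spec_version

    @property
    def spec_version(self):
        return self._spec_version
    # No setter: assigning to spec_version raises AttributeError.
```

Reading the constructor makes it obvious what gets validated and when, and the missing setter prevents modifying the value instead of validating every new one.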

@MVrachev
Collaborator Author

MVrachev commented Apr 29, 2021

I agree with the general idea to rethink how we store and operate with the different metadata fields.

I plan to follow this process for each of the metadata attributes:

  1. How do we use it?
  2. What might go wrong?
  3. Rethink how we store it.
  4. If, after the changes in step 3, there are still concerns, add a validation function.
  5. Based on steps 1-4, think of and propose a way to integrate the validation into the class or classes (decorators, descriptors, etc.)
  6. Create a PR for all additional changes on top of the validation functions.
  7. Document my findings in a commit message and/or somewhere else.

@MVrachev
Collaborator Author

Closing this one as it's no longer applicable.
We decided to research each attribute individually and then decide on its validation.

@MVrachev MVrachev closed this Jun 22, 2021
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jun 22, 2021
A while ago we decided that it's best to research each of the individual
attributes one by one and identify what level of validation each needs
compared to how we use it:
theupdateframework#1366 (comment).

This work is ongoing and a couple of commits have already been merged
for it:
- theupdateframework@6c5d970
- theupdateframework@f20664d
- theupdateframework@41afb1e

We want to be able to test the attribute validation against known bad
values. The way we want to do that is with the table testing we have
added using decorators for our metadata classes defined in the new API:
theupdateframework#1416.
This gives us an easy way to add new cases for each of the attributes
without depending on external files.

Signed-off-by: Martin Vrachev <[email protected]>
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jun 23, 2021
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jul 5, 2021
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jul 6, 2021
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jul 7, 2021
@MVrachev MVrachev deleted the prototype-descriptors branch July 19, 2021 13:37