
Prototype descriptors and validation functions #1366

Closed

Conversation

MVrachev
Collaborator

Description of the changes being introduced by the pull request:

This experiment tries to help us envision what validation will look like
if we decide to use validation functions + descriptors.

What I found with this experiment is that Python descriptors can't
handle validation of dictionary keys and values. More precisely, we
can't validate that each of the "roles" in Root is one of the four
top-level metadata types, or that Root role keyids are unique.

The reason is that an assignment like root.roles = 3 invokes the
descriptor's validation for root.roles, but root.roles["dwad"] = 3
first fetches the roles dictionary and then assigns a new value to one
of its elements, so the descriptor is never triggered.
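The limitation above can be demonstrated with a small sketch (class and attribute names here are illustrative, not the actual tuf API):

```python
class RolesDescriptor:
    """Data descriptor that validates assignment to the whole attribute."""

    def __set_name__(self, owner, name):
        self.name = "_" + name

    def __set__(self, obj, value):
        if not isinstance(value, dict):
            raise ValueError(f"roles must be a dict, got {type(value).__name__}")
        setattr(obj, self.name, value)

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.name)


class Root:  # hypothetical, not the real tuf Root class
    roles = RolesDescriptor()

    def __init__(self, roles):
        self.roles = roles  # goes through RolesDescriptor.__set__


root = Root({"root": {"keyids": []}})

try:
    root.roles = 3          # triggers __set__, so validation runs
except ValueError:
    print("whole-attribute assignment was validated")

# Item assignment first calls __get__ (returning the plain dict) and
# then dict.__setitem__ on it -- the descriptor never sees this:
root.roles["dwad"] = 3      # no validation happens here
print(root.roles["dwad"])   # -> 3
```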

Please verify and check that the pull request fulfills the following
requirements:

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

These functions are created to be used during model initialization.
They are the stepping stone for other validation solutions (like
Python descriptors, decorators, and ValidationMixin), which will
reuse parts of them.

For more context read ADR 0007.

Signed-off-by: Martin Vrachev <[email protected]>
Add validation functions for all Root dictionary keys and values as
described in the spec.

Signed-off-by: Martin Vrachev <[email protected]>
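A minimal sketch of what such standalone validation functions might look like (names such as validate_role_names and TOP_LEVEL_ROLES are illustrative, not the actual tuf API). Keeping them as plain functions means descriptors, decorators, or a ValidationMixin can all reuse the same logic later:

```python
# Hypothetical standalone validators, called once during model
# initialization (names are illustrative, not the actual tuf API).

TOP_LEVEL_ROLES = {"root", "targets", "snapshot", "timestamp"}


def validate_role_names(roles):
    """Each key in Root's 'roles' dict must be a top-level metadata type."""
    unknown = set(roles) - TOP_LEVEL_ROLES
    if unknown:
        raise ValueError(f"unexpected role names: {sorted(unknown)}")


def validate_keyids(keyids):
    """Role keyids must be unique."""
    if len(keyids) != len(set(keyids)):
        raise ValueError("keyids must be unique")


class Root:  # hypothetical, not the real tuf Root class
    def __init__(self, roles):
        validate_role_names(roles)
        for role in roles.values():
            validate_keyids(role.get("keyids", []))
        self.roles = roles
```

With this, `Root({"root": {"keyids": ["a", "b"]}})` constructs fine, while an unknown role name or a duplicated keyid raises `ValueError` at initialization time.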
@MVrachev MVrachev marked this pull request as draft April 28, 2021 12:41
@jku
Member

jku commented Apr 29, 2021

Writing down some thoughts on this (wall of text sorry).

The task at hand is not "write validation code" -- or if it was, I now want to change it to "1. improve security against malicious json, and 2. make the API easy to use correctly and hard to use incorrectly (with the least amount of the most obvious code)". This is abstract, but I do have practical suggestions below. The reason I say this is that I fear we are concentrating too much on adding validation when there are other solutions (like modifying the implementation) that achieve the same goals.

How should we proceed with this task? I believe we need to analyze the individual items in the API one by one (starting with small bits like strings and continuing to larger items). For each item we should consider and document

  • what can go wrong with this item
  • where is this used? where is it set? Should it be used in these places?
  • what can we do to improve the safety/usability?
  • if we choose validation, define what exactly we want to validate

and only then implement chosen improvement, if any. Many of these improvements will include validation but we should not add validation if we can achieve the goals without it and validation may be only part of the solution to a specific issue. Also what exactly is validated should be spelled out somewhere -- otherwise it's impossible to tell if the validation implementation is correct or not.

This is hard work but I believe it's the path to better code. Trying to implement a lot of validation at once will not lead to good quality.

As examples I went through analysis for _type and spec_version:

  • _type should not be user-settable at all (it does not make sense), and deserialization validation of _type happens in Metadata already (the fact that type is a Signed constructor argument is IMO a bug that we should fix: each Signed implementation should know its type without it being told what it is): no validation should be required here
  • spec_version: deserialization should be validated but there does not seem any reason for spec_version attribute to be writable: we probably want to prevent modifying that instead of validating the new value. Calling a validation function from constructor only seems fine to me (if modifying the public attribute is not possible) but other options do exist. Validation should include
    • value can be parsed as three ints separated by dots (specified by semver)
    • maybe comparison to library supported version? -- this is implementation defined, not specified so we can choose. Updater currently raises on major version mismatch and logs on minor
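The spec_version analysis above might translate into a validator roughly like this (the function name and the SUPPORTED_MAJOR value are assumptions for illustration; note this simple check does not enforce every semver rule, e.g. it accepts leading zeros):

```python
# Hypothetical validator sketch for spec_version, per the analysis above.

SUPPORTED_MAJOR = 1  # assumed library-supported major version


def validate_spec_version(spec_version):
    """Check spec_version parses as three dot-separated ints (semver-like)."""
    parts = spec_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"invalid spec_version: {spec_version!r}")
    # Implementation-defined policy: reject a major-version mismatch.
    if int(parts[0]) != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported spec_version major: {parts[0]}")
```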

I would start with individual PRs with that sort of analysis and proposed fixes. I would probably start by just adding the validation calls in constructors and maybe making the attributes properties with setters and see how it goes from there -- but I can see the value in trying other integration methods as well (descriptors/decorations)... I'm assuming a single solution is not going to cover all cases and starting simple is good (so we can change our minds).

More detail comments:

  • keeping validation functions outside of metadata.py seems fine to me
  • looking at the example, descriptors IMO hide the validation and are still magic: When I'm reading metadata.py I can't see what gets validated and when, and I think that's bad. I don't need to know details of the validation but I would like to see e.g. validate_spec_version(spec_version) in the constructor or a property setter (or maybe it's a @validate_spec_version instead) -- anything that makes it easy to see what is validated, and easy to follow through to the validation code.
  • I don't like the type checks -- they may be useful in specific situations but generally speaking I suggest letting the validation fail with TypeErrors or AttributeErrors instead
  • I do like the idea of testing how validation reacts to completely wrong types though (I suggest only covering types that json can contain) -- this should give us a good idea where we might need better type checks
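One way to keep the validation visible in metadata.py, as the comment above suggests, is a constructor that names the validator explicitly plus a property with no setter, so the attribute cannot be modified after construction. This is a sketch under those assumptions, not the actual metadata.py code:

```python
def validate_spec_version(spec_version):
    # Minimal inline validator so the example is self-contained.
    parts = spec_version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"invalid spec_version: {spec_version!r}")


class Signed:  # hypothetical, not the real tuf Signed class
    def __init__(self, spec_version):
        validate_spec_version(spec_version)  # visible at a glance
        self._spec_version = spec_version

    @property
    def spec_version(self):
        return self._spec_version
    # No setter: assigning to spec_version raises AttributeError.
```

Reading the constructor makes it obvious what gets validated and when, and the missing setter prevents modifying the value instead of validating every new one.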

@MVrachev
Collaborator Author

MVrachev commented Apr 29, 2021

I agree with the general idea to rethink how we store and operate with the different metadata fields.

I plan to follow this process for each of the metadata attributes:

  1. How do we use it?
  2. What might go wrong?
  3. Rethink how we store it.
  4. If, after the changes in step 3, there are still concerns, add a validation function.
  5. Based on steps 1-4, think of and propose a way to integrate the validation into the class or classes (decorators, descriptors, etc.)
  6. Create a PR for all additional changes on top of the validation functions.
  7. Document my findings in a commit message and/or somewhere else.

@MVrachev
Collaborator Author

Closing this one as it's no longer applicable.
We decided to research each attribute individually and then decide on its validation.

@MVrachev MVrachev closed this Jun 22, 2021
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jun 22, 2021
A while ago we decided that it's best to research each of the individual
attributes one by one and identify what level of validation each needs
compared to how we use it:
theupdateframework#1366 (comment).

This work is ongoing and a couple of commits have already been merged
for it:
- theupdateframework@6c5d970
- theupdateframework@f20664d
- theupdateframework@41afb1e

We want to be able to test the attribute validation against known bad
values. The way we want to do that is with the table testing we have
added using decorators for our metadata classes defined in the new API:
theupdateframework#1416.
This gives us an easy way to add new cases for each of the attributes
without depending on external files.

Signed-off-by: Martin Vrachev <[email protected]>
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jun 23, 2021
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jul 5, 2021
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jul 6, 2021
MVrachev added a commit to MVrachev/tuf that referenced this pull request Jul 7, 2021
@MVrachev MVrachev deleted the prototype-descriptors branch July 19, 2021 13:37