Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add input validation to simple metadata api #1140

Closed
lukpueh opened this issue Sep 10, 2020 · 12 comments
Closed

Add input validation to simple metadata api #1140

lukpueh opened this issue Sep 10, 2020 · 12 comments
Milestone

Comments

@lukpueh
Copy link
Member

lukpueh commented Sep 10, 2020

Coordinate with validation guidelines #1130

Description of issue or feature request:
Some suggestions:

  • Avoid schema, see Revise schema and formats facility secure-systems-lab/securesystemslib#183 (just don't use it)
  • Make use of type hints for simple type validation
  • Perform additional non-metadata parameter validation at user boundary
  • Provide methods to validate JSON representation at user boundary, i.e. fail on bad json metadata in from_json_file/to_json_file method, but with option to
    disable check as there might be a justified reason to read or write WIP
    metadata to json.
  • Be lenient on bad/invalid metadata objects in memory, they might be
    work in progress. E.g. it might be convenient to create empty metadata
    and assign attributes later on.
  • Consider using in-toto style ValidationMixin (see the mixin and it's usage for details).

Current behavior:
No input validation

Expected behavior:
Add input validation

@lukpueh lukpueh added this to the Foundations milestone Sep 10, 2020
lukpueh added a commit to lukpueh/tuf that referenced this issue Sep 10, 2020
See:
Add root metadata class to new TUF metadata model theupdateframework#1137
Add classes for complex metadata fields theupdateframework#1139
Add input validation to simple metadata api theupdateframework#1140

Signed-off-by: Lukas Puehringer <[email protected]>
MVrachev pushed a commit to MVrachev/tuf that referenced this issue Sep 17, 2020
See:
Add root metadata class to new TUF metadata model theupdateframework#1137
Add classes for complex metadata fields theupdateframework#1139
Add input validation to simple metadata api theupdateframework#1140

Signed-off-by: Lukas Puehringer <[email protected]>
@joshuagl joshuagl closed this as completed Dec 1, 2020
@joshuagl joshuagl reopened this Dec 1, 2020
@joshuagl
Copy link
Member

joshuagl commented Dec 1, 2020

  • Consider using in-toto style ValidationMixin (see the mixin and it's usage for details).

This looks like a nice pattern. Each class defines appropriate validation methods, which are called by the Mixin's validate() method at the end of the object's construction.

Any thoughts on whether a decorator might be a more idiomatic approach than a Mixin? We could wrap classes with a decorator which performs validation after init, before returning the instantiated object.

@trishankatdatadog
Copy link
Member

Another idea is to copy validation techniques Django/Rails use...

@joshuagl
Copy link
Member

joshuagl commented Dec 1, 2020

Nice idea. I took a brief look those look to be quite complicated and not equivalent to what we're trying to do here Django Model validation.

@lukpueh
Copy link
Member Author

lukpueh commented Dec 3, 2020

Good thinking, @trishankatdatadog. But I agree with @joshuagl that we might not need something as powerful (Django's validation is really tailored towards web forms and/or ORM, similar is true for WTForms, which I also briefly considered).

I think the in-toto approach is not so bad. It's also featured in this quite interesting blog post about different instance attribute validation techniques, at least the "individual validation functions" aspect (not the neat self-inspecting mixin part).

The blog also mentions two promising validation libraries, marshmallow and pydantic, which both are actually de/serialization libraries with validation features. Given that we want to minimize dependencies (see #1165), I'm leaning towards rolling our own validation, which doesn't even have to be that generic (see secure-systems-lab/securesystemslib#183).

The blog also mentions a Python built-in feature, i.e. Descriptors, that as per official docs seems well-suited for attribute validators. Although it looks interesting, I'm unsure if it gives us the flexibility of e.g. initializing empty objects, assigning values, and only then calling validate, which might be a desirable usage pattern (see snippet 2 in #1223 (comment)). Same reservation goes for decorators, which the blog also mentions in conjunction with descriptors. But I can check if there is a solution that involves decorators and/or descriptors that allows for said flexibility.

What I like about both the decorator and descriptor approach(es) is that they make the constraints on the attributes more visible (in the head of the class definition) than vanilla validation methods.

@joshuagl
Copy link
Member

joshuagl commented Dec 3, 2020

The blog also mentions a Python built-in feature, i.e. Descriptors, that as per official docs seems well-suited for attribute validators. Although it looks interesting, I'm unsure if it gives us the flexibility of e.g. initializing empty objects, assigning values, and only then calling validate, which might be a desirable usage pattern (see snippet 2 in #1223 (comment)).

Descriptors look nice, and I'm all for avoid additional dependencies. I believe the pattern of: initialising empty objects, assigning values, and then validating should still work – so long as our empty objects have sane defaults. Based on 5mins experimentation in the Python interpreter:

>>> class MetadataType:
...     def __get__(self, obj, objtype=None):
...         return self.value
...     def __set__(self, obj, value):
...         if value not in ['root', 'timestamp', 'snapshot', 'targets']:
...             raise ValueError('Invalid _type field')
...         self.value = value
...
>>> class Metadata:
...     type = MetadataType()
...     def __init__(self, type):
...         self.type = type
...
>>> class Targets(Metadata):
...     def __init__(self):
...       super().__init__('targets')
...
>>> md = Metadata('badger')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in __init__
  File "<stdin>", line 6, in __set__
ValueError: Invalid _type field
>>> md = Metadata('root')
>>> md.type = 'badger'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __set__
ValueError: Invalid _type field
>>> t = Targets()
>>> t.type
'targets'

@joshuagl
Copy link
Member

joshuagl commented Dec 7, 2020

The __slots__ mechanism, described in the Descriptor HowTo, could be something worth including in our new classes also. hat-tip @sechkova

@lukpueh
Copy link
Member Author

lukpueh commented Dec 7, 2020

I wonder if sane defaults will always be possible, e.g. when thinking of the newly added MetadataInfo or TargetInfo in #1223.

But maybe the usage pattern that requires initialization of empty objects is suboptimal. I must say that it would be quite nice to always have certainty about the validity of tuf objects.

MVrachev added a commit to MVrachev/tuf that referenced this issue Dec 10, 2020
I had to give default function arguments for all class arguments
assigned in __init__ for Root, Snapshot, Targets, and Timestamp classes.
I chose "None" as the easiest solution as a default argument, but
we definitely want to add proper validation which will ensure
we are not creating empty or partially populated objects.
I didn't want to create a discussion for sensible defaults
and argument validation here.
There is already existing issue for that:
theupdateframework#1140

Signed-off-by: Martin Vrachev <[email protected]>
MVrachev added a commit to MVrachev/tuf that referenced this issue Dec 14, 2020
In the Signed.from_dict function we are passing signed - a dictionary.
Dictionaries are passed by reference and thus we want to make sure
we are not changing the dictionary passed as a function argument.

I had to give default function arguments for all class arguments
assigned in __init__ for Root, Snapshot, Targets, and Timestamp classes.
I chose "None" as the easiest solution as a default argument, but
we definitely want to add proper validation which will ensure
we are not creating empty or partially populated objects.
I didn't want to create a discussion for sensible defaults
and argument validation here.
There is already existing issue for that:
theupdateframework#1140

Signed-off-by: Martin Vrachev <[email protected]>
MVrachev added a commit to MVrachev/tuf that referenced this issue Jan 5, 2021
In the Signed.from_dict function we are passing signed - a dictionary.
Dictionaries are passed by reference and thus we want to make sure
we are not changing the dictionary passed as a function argument.

I had to give default function arguments for all class arguments
assigned in __init__ for Root, Snapshot, Targets, and Timestamp classes.
I chose "None" as the easiest solution as a default argument, but
we definitely want to add proper validation which will ensure
we are not creating empty or partially populated objects.
I didn't want to create a discussion for sensible defaults
and argument validation here.
There is already existing issue for that:
theupdateframework#1140

Signed-off-by: Martin Vrachev <[email protected]>
MVrachev added a commit to MVrachev/tuf that referenced this issue Jan 18, 2021
In the Signed.from_dict function we are passing signed - a dictionary.
Dictionaries are passed by reference and thus we want to make sure
we are not changing the dictionary passed as a function argument.

I had to give default function arguments for all class arguments
assigned in __init__ for Root, Snapshot, Targets, and Timestamp classes.
I chose "None" as the easiest solution as a default argument, but
we definitely want to add proper validation which will ensure
we are not creating empty or partially populated objects.
I didn't want to create a discussion for sensible defaults
and argument validation here.
There is already existing issue for that:
theupdateframework#1140

Signed-off-by: Martin Vrachev <[email protected]>
MVrachev added a commit to MVrachev/tuf that referenced this issue Jan 20, 2021
In the Signed.from_dict function we are passing signed - a dictionary.
Dictionaries are passed by reference and thus we want to make sure
we are not changing the dictionary passed as a function argument.

I had to give default function arguments for all class arguments
assigned in __init__ for Root, Snapshot, Targets, and Timestamp classes.
I chose "None" as the easiest solution as a default argument, but
we definitely want to add proper validation which will ensure
we are not creating empty or partially populated objects.
I didn't want to create a discussion for sensible defaults
and argument validation here.
There is already existing issue for that:
theupdateframework#1140

Signed-off-by: Martin Vrachev <[email protected]>
@joshuagl
Copy link
Member

In 7cfd100#r541018751 we discussed whether JsonDict is the right name for our generic Dict[str, Any] type, and agreed to handle that type's name with this issue instead.

@MVrachev
Copy link
Collaborator

MVrachev commented Apr 26, 2021

In a discussion with @jku, we noticed that our tests for the metadata classes don't test instantiating metadata objects without their required fields.
We should add tests for that.

@MVrachev
Copy link
Collaborator

@MVrachev
Copy link
Collaborator

@lukpueh and/or @joshuagl do you agree with me?

@joshuagl
Copy link
Member

Agreed, thanks Martin. Closing because we have implemented validation on a case-by-case basis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants