-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improv: Check reference prop types for all customs #513
improv: Check reference prop types for all customs #513
Conversation
Codecov Report
@@ Coverage Diff @@
## master #513 +/- ##
==========================================
- Coverage 87.22% 87.20% -0.02%
==========================================
Files 153 153
Lines 17925 17915 -10
==========================================
- Hits 15635 15623 -12
- Misses 2290 2292 +2
Continue to review full report at Codecov.
|
stix2/registration.py
Outdated
# Properties ending in "_ref/s" need to be instances of specific types to meet | ||
# our interpretation of section 3.1 of the spec | ||
_VERSION_REF_TYPES_MAP = { | ||
# TODO: The 2.0 code seems to use `ReferenceProperty`s - is that okay? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 2.0 code seems to use
ReferenceProperty
s - is that okay?
rather than ObjectReferenceProperty
s. AFAICT ReferenceProperty
just seems to be a better version of the other? Is there some spec related reason why it would be invalid in 2.0 documents?
stix2/registration.py
Outdated
# Confirm conformance with STIX 2.1 requirements for property names | ||
# | ||
# TODO: Should this be 2.1 or anything unknown assuming 2.1+, like we do | ||
# for identifier properties above? `version != "2.0"`? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: Should this be 2.1 for anything unknown assuming 2.1+, like we do for identifier properties above?
version != "2.0"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this != "2.0"
in the current diff with the expectation that we can amend it later if we need. But LMK if you'd prefer the explicit == "2.1"
.
stix2/registration.py
Outdated
# TODO: We could probably avoid doing a regex here just for the | ||
# first char, but actually there are other constraints in 3.1 which | ||
# we don't check which a regex would be suitable for | ||
# (e.g. don't use hyphens in prop names, length min/max) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an interesting note for work that could be done here or in a future PR
4735cab
to
1f71526
Compare
Lint should be gone now |
1f71526
to
6eed586
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some thoughts on how I might be able to solve my issue with an annotated reference property as described in #511 - these don't need to be resolved for this changeset to be mergable, but it makes sense to make the commentary where the isinstance()
checks exist in the diff
stix2/registration.py
Outdated
_VERSION_REF_TYPES_MAP = { | ||
# TODO: The 2.0 code seems to use `ReferenceProperty`s - is that okay? | ||
"2.0": (ObjectReferenceProperty, ReferenceProperty), | ||
"2.1": (ReferenceProperty,), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be reasonable to make this public (remove the underscore) and turn these values from tuple
s into set
s (or something similarly mutable) so that a downstream developer like me could do something like VERSION_REF_TYPES_MAP["2.1"].add(MyCustomReferenceType)
?
stix2/registration.py
Outdated
|
||
for prop_name, prop_obj in props_map.items(): | ||
tail = prop_name.rsplit("_", 1)[-1] | ||
if tail == "ref" and not isinstance(prop_obj, ref_prop_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, if we turned ReferenceProperty
(or all Property
/STIXBase
types?) use the abc.ABCMeta
metaclass, it would be possible to do ReferenceProperty.register(MyCustomReferenceType)
and have this subclass check pass for registered virtual subclasses.
That approach would introduce a new method on those STIX object types which could arguably collide with some STIX (custom or future) attribute, which probably isn't great. Alternatively we could implement a ReferenceProperty.__subclasshook__()
where we codify what a property needs to look like to be considered valid for use as a property ending in _ref(s)
. However, that would result in some non-obvious data validation hidden behind these isinstance()
calls by being implemented in the __subclasshook__()
. Not sure I particularly like it because of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a note for myself, stix2.properties.Property
is not a subclass of _STIXBase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're suggesting having different ReferenceProperty subclasses for enforcing different ref property requirements? I don't think the requirements on ref properties are complicated enough that we need specialized custom code for different cases. The black/whitelist on STIX types and type classes seems sufficient. Can you think of a case where that is not sufficient?
A stylistic note: I don't like using exception handling for non-exceptional circumstances, or if the same thing could be easily done in a different way. Instead of the KeyError exception stuff, how about:
ref_prop_types = _VERSION_REF_TYPES_MAP.get(version, _UNKNOWN_VERSION_REF_TYPES)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're suggesting having different ReferenceProperty subclasses for enforcing different ref property requirements? I don't think the requirements on ref properties are complicated enough that we need specialized custom code for different cases. The black/whitelist on STIX types and type classes seems sufficient. Can you think of a case where that is not sufficient?
I think you're suggesting having different ReferenceProperty subclasses for enforcing different ref property requirements? I don't think the requirements on ref properties are complicated enough that we need specialized custom code for different cases. The black/whitelist on STIX types and type classes seems sufficient. Can you think of a case where that is not sufficient?
The background in #511 which this change is in support of may be illustrative, in particular this #511 (comment) . The tl;dr is that I have a thing which I consider a reference property (reference plus some "value-add") but which is a subclass of DictionaryProperty
rather that ReferenceProperty
. I could do some trickery with multiple inheritance and avoiding the use of super()
but I really don't want to do that and would prefer to be able to register my AnnotatedReference
as a virtual subclass using the ABCMeta machinery. This would be a change which is probably exclusively useful for supporting my exotic use case though, which is why I've not implemented that in this MR without us landing on a mutually acceptable approach :)
Instead of the KeyError exception stuff, how about ...
Sure, will do.
6eed586
to
373f515
Compare
I've rebased this on top of master now that #468 is in to resolve the conflict in |
373f515
to
b15c66e
Compare
I've de-drafted this and fixed/removed the todos (the threads above can still be responded to though) to make life easier if you're satisfied with merging this prior to releasing 3.0.0. I'd like to implement one of the options for adding extra valid reference types, and I have an example of the ABCMeta approach on my staging branch at maybe-sybr/cti-python-stix2@a8fe790 , but if I have to keep a staging branch on top of 3.0.0 just for that one change, I could live with that :) |
b15c66e
to
efc9ccb
Compare
stix2/registration.py
Outdated
# Need to check both toplevel and nested properties | ||
combined_props = { | ||
**new_extension._properties, | ||
**getattr(new_extension, "_toplevel_properties", dict()), | ||
} | ||
if not combined_props: | ||
raise ValueError( | ||
"Invalid extension: must define at least one property: " + | ||
ext_type, | ||
) | ||
_validate_props(combined_props, version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chisholm - per my comment above, could you quickly look over this bit and let me know if I've correctly interpreted how my extra validation should function.
Notably, your change would still raise a ValueError
if the _properties
were empty but the _top_level_properties
were not. I'm not entirely confident that my check against the combined_properties
is correct or if a spec-complicant construction of a custom extension MUST have a property (ie. the extension type top-level-property
) if any _top_level_properties
are defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_properties
musn't be empty. At the very least, you need extension_type
as a nested property, which describes the type of extension. Checking for the existence of that particular nested property could be an additional bit of validation I guess.
I suspect we don't really envision people calling _register_extension()
directly, but if they did, then checking _toplevel_properties
against extension_type
might also make sense. Or maybe toplevel properties are just ignored if extension_type is not "toplevel-property-extension". Or maybe we produce a warning. Anyway, if you use the @CustomExtension
decorator, that should be done correctly for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I didn't mean to suggest that this would be called directly either - it's private so buyer beware IMO.
I think I'd like to keep this change from absorbing too much extra logic for validation if possible and just keep it to what's necessary to enable to checks for reference properties. I think to validate extension properties and potential top-level-properties to make sure they meet section 7.3 or section 11 requirements for custom extensions would need to live in this function anyway so it could be done in a separate MR. Perhaps we can take that on notice if you care to?
What I'm taking from your reply is that I should alter this logic to always ensure that _properties
is not empty, rather than combined_props
, which will restore the logic as it currently exists on v3.0.0/master. I'll still call _validate_props(combined_props)
as the diff currently does, meaning that all property names and ReferenceProperty
(and variant) type validations will still be performed as usual.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd like to keep this change from absorbing too much extra logic for validation if possible and just keep it to what's necessary to enable to checks for reference properties. I think to validate extension properties and potential top-level-properties to make sure they meet section 7.3 or section 11 requirements for custom extensions would need to live in this function anyway so it could be done in a separate MR. Perhaps we can take that on notice if you care to?
The extra validation wasn't there before, so evidently no one has so far thought it necessary. So you wouldn't have to include it with this.
What I'm taking from your reply is that I should alter this logic to always ensure that
_properties
is not empty, rather thancombined_props
, which will restore the logic as it currently exists on v3.0.0/master. I'll still call_validate_props(combined_props)
as the diff currently does, meaning that all property names andReferenceProperty
(and variant) type validations will still be performed as usual.
Yeah. I looked to see if the spec had a requirement that a toplevel property extension must define at least one toplevel property, but I couldn't find that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually just put that into the new diff since it seemed meaningless to have a top-level-property extension that didn't include anything. Check out the new diff and see what you think.
Overall, I think doing the ref property checking consistently makes sense. It didn't make much sense to do it only for observables. And I definitely like getting rid of the mro lookups! I always thought those were ugly. It's.... approximately correct I think, but not exactly. E.g. if you registered a custom 2.0 SDO but used |
Great, hopefully with the context around my use case for
I've just had a quick skim through the 2.0 spec (specifically part 3 and 4) and I think I understand how this has evolved now. I suppose the most correct thing would be to have a a nested mapping like |
stix2/registration.py
Outdated
for prop_name in prop_names: | ||
if not re.match(PREFIX_21_REGEX, prop_name): | ||
raise ValueError("Property name '%s' must begin with an alpha character." % prop_name) | ||
tl_props = getattr(new_extension, "_toplevel_properties") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the code below would be more elegant if we changed custom.py
to set the _CustomExtension._toplevel_properties
attribute to None
rather than not defining it at all for non-top-level-property extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So instead you would have:
tl_props = new_extension._toplevel_properties
And the code below would not change? The change doesn't seem all that significant. What did you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. It'd mostly just mean that all extensions have a _top_level_properties
rather than having to do getattr()
s on it. In my head, being able to always access an expected attribute and react when it's set to a value defining "not present on the represented object" feels nice. I'm not stressed about changing this if you're satisfied with the code as it stands though
39fe48d
to
6b89520
Compare
Just marking this as a draft while it needs autosquashing. |
Nah, I think treatment would only need to depend on the "class" of object, e.g. SDO vs SCO. So it would only need to be (notionally) I don't know how important it is now to be fanatically correct w.r.t. STIX 2.0, but this is a reference implementation... :) |
If we're going to do this, let's just do it right. It should be easy to adapt your code to:
Then ensure there are unit tests to:
|
6b89520
to
2f70dc7
Compare
Check out the new diff, @chisholm . I've decided against having a more complex mapping since the only special case we have to deal with is 2.0 observables and it's much more obvious to codify that with a single conditional in the reference property validation helper. I imagine that future iterations of the 2.X spec will aim to maintain coherence for all properties that appear to be references, so building infrastructure to support future deviations is just an exercise in complexity for its own sake. That should be fairly easy to pivot back to in future if the need arises (or in this PR if that's your preference, just LMK). There are tests for custom objects and observables to ensure that reference vs object reference properties are accepted and rejected appropriately. I don't do any testing for acceptance of subclasses, LMK if you'd like to see that. Since supporting that was the point of #511, I don't mind putting it in, but since the use case is now more dubiously useful, I could also see it being valid to omit them. |
stix2/registration.py
Outdated
from .utils import PREFIX_21_REGEX | ||
|
||
|
||
def _validate_ref_props(props_map, version, is_observable20=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
version
here is no longer used. I think this param can be deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, good catch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new diff
stix2/registration.py
Outdated
elif tail == "refs" and not all(( | ||
isinstance(prop_obj, ListProperty), | ||
isinstance(getattr(prop_obj, "contained", None), ref_prop_type), | ||
)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This use of all()
undermines the normal short-circuiting behavior of boolean evaluation. Here, all args must always be evaluated before all()
itself can be called. But you could take advantage of short-circuiting here to drop the getattr()
and simplify the expression. E.g.:
elif tail == "refs" and not (
isinstance(prop_obj, ListProperty)
and isinstance(prop_obj.contained, ref_prop_type)
):
Here, prop_obj.contained
will not be evaluated unless the previous conjunct (isinstance(prop_obj, ListProperty)
) evaluated to True. Therefore, at the point of the attribute access, prop_obj must be a ListProperty and the access is guaranteed to work. So you don't need getattr()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in the new diff
Previously the requirement that properties ending in `_ref(s)` were instances of an appropriate type capable of containing a STIX `identifier` (the current interpretation of section 3.1 of the STIX 2.1 spec) was only enforced for custom observables. This change refactors the property checks for custom objects to enforce this and the STIX 2.1 property name requirement (also from section 3.1) in a common helper, therefore extending the enforcement of both requirements to all custom object types created by downstream projects. There is a special case carved out for STIX 2.0 observables which are required to contain "object references" rather than identifiers. This special logic is encoded in the reference property validation helper and enabled by a kwarg which is passed exclusively by the custom observable registration helper.
2f70dc7
to
bd897c9
Compare
Thanks @chisholm :) |
Previously the requirement that properties ending in
_ref(s)
wereinstances of an appropriate type capable of containing a STIX
identifier
(the current interpretation of section 3.1 of the spec) wasonly enforced for custom observables. This change refactors the property
checks for custom objects to enforce this and the STIX 2.1 property name
requirement (also from section 3.1) in a common helper, therefore
extending the enforcement of both requirements to all custom object
types created by downstream projects.
ref: #511