Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improv: Check reference prop types for all customs #513

Conversation

maybe-sybr
Copy link
Contributor

@maybe-sybr maybe-sybr commented Jun 25, 2021

Previously the requirement that properties ending in _ref(s) were
instances of an appropriate type capable of containing a STIX
identifier (the current interpretation of section 3.1 of the spec) was
only enforced for custom observables. This change refactors the property
checks for custom objects to enforce this and the STIX 2.1 property name
requirement (also from section 3.1) in a common helper, therefore
extending the enforcement of both requirements to all custom object
types created by downstream projects.

ref: #511

@codecov-commenter
Copy link

codecov-commenter commented Jun 25, 2021

Codecov Report

Merging #513 (6b89520) into master (eaf578f) will decrease coverage by 0.01%.
The diff coverage is 81.81%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #513      +/-   ##
==========================================
- Coverage   87.22%   87.20%   -0.02%     
==========================================
  Files         153      153              
  Lines       17925    17915      -10     
==========================================
- Hits        15635    15623      -12     
- Misses       2290     2292       +2     
Impacted Files Coverage Δ
stix2/registration.py 72.30% <77.77%> (-6.36%) ⬇️
stix2/test/v20/test_custom.py 100.00% <100.00%> (ø)
stix2/test/v21/test_custom.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eaf578f...6b89520. Read the comment docs.

# Properties ending in "_ref/s" need to be instances of specific types to meet
# our interpretation of section 3.1 of the spec
_VERSION_REF_TYPES_MAP = {
# TODO: The 2.0 code seems to use `ReferenceProperty`s - is that okay?
Copy link
Contributor Author

@maybe-sybr maybe-sybr Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2.0 code seems to use ReferencePropertys - is that okay?

rather than ObjectReferencePropertys. AFAICT ReferenceProperty just seems to be a better version of the other? Is there some spec related reason why it would be invalid in 2.0 documents?

# Confirm conformance with STIX 2.1 requirements for property names
#
# TODO: Should this be 2.1 or anything unknown assuming 2.1+, like we do
# for identifier properties above? `version != "2.0"`?
Copy link
Contributor Author

@maybe-sybr maybe-sybr Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Should this be 2.1 for anything unknown assuming 2.1+, like we do for identifier properties above? version != "2.0"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this != "2.0" in the current diff with the expectation that we can amend it later if we need. But LMK if you'd prefer the explicit == "2.1".

# TODO: We could probably avoid doing a regex here just for the
# first char, but actually there are other constraints in 3.1 which
# we don't check which a regex would be suitable for
# (e.g. don't use hyphens in prop names, length min/max)
Copy link
Contributor Author

@maybe-sybr maybe-sybr Jun 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an interesting note for work that could be done here or in a future PR

@maybe-sybr maybe-sybr force-pushed the fix/all-refs-are-identifiers-or-quack-alike branch from 4735cab to 1f71526 Compare June 25, 2021 00:38
@maybe-sybr
Copy link
Contributor Author

Lint should be gone now

@maybe-sybr maybe-sybr force-pushed the fix/all-refs-are-identifiers-or-quack-alike branch from 1f71526 to 6eed586 Compare June 25, 2021 01:10
stix2/registration.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@maybe-sybr maybe-sybr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some thoughts on how I might be able to solve my issue with an annotated reference property as described in #511 - these don't need to be resolved for this changeset to be mergable, but it makes sense to make the commentary where the isinstance() checks exist in the diff

Comment on lines 14 to 16
_VERSION_REF_TYPES_MAP = {
# TODO: The 2.0 code seems to use `ReferenceProperty`s - is that okay?
"2.0": (ObjectReferenceProperty, ReferenceProperty),
"2.1": (ReferenceProperty,),
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be reasonable to make this public (remove the underscore) and turn these values from tuples into sets (or something similarly mutable) so that a downstream developer like me could do something like VERSION_REF_TYPES_MAP["2.1"].add(MyCustomReferenceType)?


for prop_name, prop_obj in props_map.items():
tail = prop_name.rsplit("_", 1)[-1]
if tail == "ref" and not isinstance(prop_obj, ref_prop_types):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, if we turned ReferenceProperty (or all Property/STIXBase types?) use the abc.ABCMeta metaclass, it would be possible to do ReferenceProperty.register(MyCustomReferenceType) and have this subclass check pass for registered virtual subclasses.

That approach would introduce a new method on those STIX object types which could arguably collide with some STIX (custom or future) attribute, which probably isn't great. Alternatively we could implement a ReferenceProperty.__subclasshook__() where we codify what a property needs to look like to be considered valid for use as a property ending in _ref(s). However, that would result in some non-obvious data validation hidden behind these isinstance() calls by being implemented in the __subclasshook__(). Not sure I particularly like it because of that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a note for myself, stix2.properties.Property is not a subclass of _STIXBase

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're suggesting having different ReferenceProperty subclasses for enforcing different ref property requirements? I don't think the requirements on ref properties are complicated enough that we need specialized custom code for different cases. The black/whitelist on STIX types and type classes seems sufficient. Can you think of a case where that is not sufficient?

A stylistic note: I don't like using exception handling for non-exceptional circumstances, or if the same thing could be easily done in a different way. Instead of the KeyError exception stuff, how about:

ref_prop_types = _VERSION_REF_TYPES_MAP.get(version, _UNKNOWN_VERSION_REF_TYPES)

?

Copy link
Contributor Author

@maybe-sybr maybe-sybr Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're suggesting having different ReferenceProperty subclasses for enforcing different ref property requirements? I don't think the requirements on ref properties are complicated enough that we need specialized custom code for different cases. The black/whitelist on STIX types and type classes seems sufficient. Can you think of a case where that is not sufficient?
I think you're suggesting having different ReferenceProperty subclasses for enforcing different ref property requirements? I don't think the requirements on ref properties are complicated enough that we need specialized custom code for different cases. The black/whitelist on STIX types and type classes seems sufficient. Can you think of a case where that is not sufficient?

The background in #511 which this change is in support of may be illustrative, in particular this #511 (comment) . The tl;dr is that I have a thing which I consider a reference property (reference plus some "value-add") but which is a subclass of DictionaryProperty rather that ReferenceProperty. I could do some trickery with multiple inheritance and avoiding the use of super() but I really don't want to do that and would prefer to be able to register my AnnotatedReference as a virtual subclass using the ABCMeta machinery. This would be a change which is probably exclusively useful for supporting my exotic use case though, which is why I've not implemented that in this MR without us landing on a mutually acceptable approach :)

Instead of the KeyError exception stuff, how about ...

Sure, will do.

@maybe-sybr maybe-sybr force-pushed the fix/all-refs-are-identifiers-or-quack-alike branch from 6eed586 to 373f515 Compare July 8, 2021 01:06
@maybe-sybr
Copy link
Contributor Author

I've rebased this on top of master now that #468 is in to resolve the conflict in _register_extension. Everything appears to still be behaving as expected.

@maybe-sybr maybe-sybr force-pushed the fix/all-refs-are-identifiers-or-quack-alike branch from 373f515 to b15c66e Compare July 9, 2021 01:32
@maybe-sybr maybe-sybr marked this pull request as ready for review July 9, 2021 01:32
@maybe-sybr
Copy link
Contributor Author

I've de-drafted this and fixed/removed the todos (the threads above can still be responded to though) to make life easier if you're satisfied with merging this prior to releasing 3.0.0.

I'd like to implement one of the options for adding extra valid reference types, and I have an example of the ABCMeta approach on my staging branch at maybe-sybr/cti-python-stix2@a8fe790 , but if I have to keep a staging branch on top of 3.0.0 just for that one change, I could live with that :)

@maybe-sybr maybe-sybr force-pushed the fix/all-refs-are-identifiers-or-quack-alike branch from b15c66e to efc9ccb Compare July 15, 2021 06:22
@maybe-sybr
Copy link
Contributor Author

@rpiazza I've rebased this on top of v3.0.0 and resolved a conflict with the new top level property validation done for custom extensions. The tests pass but I'll make a quick comment on the diff for @chisholm to sanity check to make sure I didn't mess the expected behaviour.

Comment on lines 170 to 185
# Need to check both toplevel and nested properties
combined_props = {
**new_extension._properties,
**getattr(new_extension, "_toplevel_properties", dict()),
}
if not combined_props:
raise ValueError(
"Invalid extension: must define at least one property: " +
ext_type,
)
_validate_props(combined_props, version)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chisholm - per my comment above, could you quickly look over this bit and let me know if I've correctly interpreted how my extra validation should function.

Notably, your change would still raise a ValueError if the _properties were empty but the _top_level_properties were not. I'm not entirely confident that my check against the combined_properties is correct or if a spec-complicant construction of a custom extension MUST have a property (ie. the extension type top-level-property) if any _top_level_properties are defined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_properties musn't be empty. At the very least, you need extension_type as a nested property, which describes the type of extension. Checking for the existence of that particular nested property could be an additional bit of validation I guess.

I suspect we don't really envision people calling _register_extension() directly, but if they did, then checking _toplevel_properties against extension_type might also make sense. Or maybe toplevel properties are just ignored if extension_type is not "toplevel-property-extension". Or maybe we produce a warning. Anyway, if you use the @CustomExtension decorator, that should be done correctly for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I didn't mean to suggest that this would be called directly either - it's private so buyer beware IMO.

I think I'd like to keep this change from absorbing too much extra logic for validation if possible and just keep it to what's necessary to enable to checks for reference properties. I think to validate extension properties and potential top-level-properties to make sure they meet section 7.3 or section 11 requirements for custom extensions would need to live in this function anyway so it could be done in a separate MR. Perhaps we can take that on notice if you care to?

What I'm taking from your reply is that I should alter this logic to always ensure that _properties is not empty, rather than combined_props, which will restore the logic as it currently exists on v3.0.0/master. I'll still call _validate_props(combined_props) as the diff currently does, meaning that all property names and ReferenceProperty (and variant) type validations will still be performed as usual.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd like to keep this change from absorbing too much extra logic for validation if possible and just keep it to what's necessary to enable to checks for reference properties. I think to validate extension properties and potential top-level-properties to make sure they meet section 7.3 or section 11 requirements for custom extensions would need to live in this function anyway so it could be done in a separate MR. Perhaps we can take that on notice if you care to?

The extra validation wasn't there before, so evidently no one has so far thought it necessary. So you wouldn't have to include it with this.

What I'm taking from your reply is that I should alter this logic to always ensure that _properties is not empty, rather than combined_props, which will restore the logic as it currently exists on v3.0.0/master. I'll still call _validate_props(combined_props) as the diff currently does, meaning that all property names and ReferenceProperty (and variant) type validations will still be performed as usual.

Yeah. I looked to see if the spec had a requirement that a toplevel property extension must define at least one toplevel property, but I couldn't find that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually just put that into the new diff since it seemed meaningless to have a top-level-property extension that didn't include anything. Check out the new diff and see what you think.

@chisholm
Copy link
Contributor

Overall, I think doing the ref property checking consistently makes sense. It didn't make much sense to do it only for observables. And I definitely like getting rid of the mro lookups! I always thought those were ugly. It's.... approximately correct I think, but not exactly. E.g. if you registered a custom 2.0 SDO but used ObjectReferenceProperty for its ref properties, I think it would pass the checks. But it shouldn't, since that property class is intended only for 2.0 SCO ref properties.

@maybe-sybr
Copy link
Contributor Author

Overall, I think doing the ref property checking consistently makes sense. It didn't make much sense to do it only for observables. And I definitely like getting rid of the mro lookups! I always thought those were ugly.

Great, hopefully with the context around my use case for ReferenceProperty variants as well, my suggestions in #513 (review) will make some more sense. LMK what you think.

It's.... approximately correct I think, but not exactly. E.g. if you registered a custom 2.0 SDO but used ObjectReferenceProperty for its ref properties, I think it would pass the checks. But it shouldn't, since that property class is intended only for 2.0 SCO ref properties.

I've just had a quick skim through the 2.0 spec (specifically part 3 and 4) and I think I understand how this has evolved now. I suppose the most correct thing would be to have a a nested mapping like VALID_REF_TYPES[stix_version][stix_object_type], but that does sound pretty clunky for a single case. Alternatively, we could allow this logic to stand since I assume the 2.0 code will be relatively static in future, and the only risk is for downstream authors doing the "wrong thing". Happy to make any changes at your suggestion.

for prop_name in prop_names:
if not re.match(PREFIX_21_REGEX, prop_name):
raise ValueError("Property name '%s' must begin with an alpha character." % prop_name)
tl_props = getattr(new_extension, "_toplevel_properties")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the code below would be more elegant if we changed custom.py to set the _CustomExtension._toplevel_properties attribute to None rather than not defining it at all for non-top-level-property extensions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So instead you would have:

tl_props = new_extension._toplevel_properties

And the code below would not change? The change doesn't seem all that significant. What did you have in mind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. It'd mostly just mean that all extensions have a _top_level_properties rather than having to do getattr()s on it. In my head, being able to always access an expected attribute and react when it's set to a value defining "not present on the represented object" feels nice. I'm not stressed about changing this if you're satisfied with the code as it stands though

stix2/registration.py Outdated Show resolved Hide resolved
@maybe-sybr maybe-sybr force-pushed the fix/all-refs-are-identifiers-or-quack-alike branch from 39fe48d to 6b89520 Compare July 16, 2021 00:33
@maybe-sybr maybe-sybr marked this pull request as draft July 16, 2021 00:34
@maybe-sybr
Copy link
Contributor Author

Just marking this as a draft while it needs autosquashing.

@chisholm
Copy link
Contributor

It's.... approximately correct I think, but not exactly. E.g. if you registered a custom 2.0 SDO but used ObjectReferenceProperty for its ref properties, I think it would pass the checks. But it shouldn't, since that property class is intended only for 2.0 SCO ref properties.

I've just had a quick skim through the 2.0 spec (specifically part 3 and 4) and I think I understand how this has evolved now. I suppose the most correct thing would be to have a a nested mapping like VALID_REF_TYPES[stix_version][stix_object_type], but that does sound pretty clunky for a single case. Alternatively, we could allow this logic to stand since I assume the 2.0 code will be relatively static in future, and the only risk is for downstream authors doing the "wrong thing". Happy to make any changes at your suggestion.

Nah, I think treatment would only need to depend on the "class" of object, e.g. SDO vs SCO. So it would only need to be (notionally) VALID_REF_TYPES[stix_version][stix_type_class]. And maybe as simple as VALID_REF_TYPES[2.0 SCOs] and VALID_REF_TYPES[everything else].

I don't know how important it is now to be fanatically correct w.r.t. STIX 2.0, but this is a reference implementation... :)

@chisholm
Copy link
Contributor

If we're going to do this, let's just do it right. It should be easy to adapt your code to:

  • Check for ObjectReferenceProperty for 2.0 SCOs
  • Check for ReferenceProperty for all other objects

Then ensure there are unit tests to:

  • ensure a 2.0 SCO with ReferenceProperty errors
  • ensure a 2.0 non-SCO with ObjectReferenceProperty errors
  • ensure a 2.1 object errors with ObjectReferenceProperty

@maybe-sybr maybe-sybr force-pushed the fix/all-refs-are-identifiers-or-quack-alike branch from 6b89520 to 2f70dc7 Compare July 20, 2021 01:12
@maybe-sybr maybe-sybr marked this pull request as ready for review July 20, 2021 01:12
@maybe-sybr
Copy link
Contributor Author

Check out the new diff, @chisholm . I've decided against having a more complex mapping since the only special case we have to deal with is 2.0 observables and it's much more obvious to codify that with a single conditional in the reference property validation helper. I imagine that future iterations of the 2.X spec will aim to maintain coherence for all properties that appear to be references, so building infrastructure to support future deviations is just an exercise in complexity for its own sake. That should be fairly easy to pivot back to in future if the need arises (or in this PR if that's your preference, just LMK).

There are tests for custom objects and observables to ensure that reference vs object reference properties are accepted and rejected appropriately. I don't do any testing for acceptance of subclasses, LMK if you'd like to see that. Since supporting that was the point of #511, I don't mind putting it in, but since the use case is now more dubiously useful, I could also see it being valid to omit them.

from .utils import PREFIX_21_REGEX


def _validate_ref_props(props_map, version, is_observable20=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version here is no longer used. I think this param can be deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, good catch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new diff

elif tail == "refs" and not all((
isinstance(prop_obj, ListProperty),
isinstance(getattr(prop_obj, "contained", None), ref_prop_type),
)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This use of all() undermines the normal short-circuiting behavior of boolean evaluation. Here, all args must always be evaluated before all() itself can be called. But you could take advantage of short-circuiting here to drop the getattr() and simplify the expression. E.g.:

        elif tail == "refs" and not (
            isinstance(prop_obj, ListProperty)
            and isinstance(prop_obj.contained, ref_prop_type)
        ):

Here, prop_obj.contained will not be evaluated unless the previous conjunct (isinstance(prop_obj, ListProperty)) evaluated to True. Therefore, at the point of the attribute access, prop_obj must be a ListProperty and the access is guaranteed to work. So you don't need getattr().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the new diff

Previously the requirement that properties ending in `_ref(s)` were
instances of an appropriate type capable of containing a STIX
`identifier` (the current interpretation of section 3.1 of the STIX 2.1
spec) was only enforced for custom observables. This change refactors
the property checks for custom objects to enforce this and the STIX 2.1
property name requirement (also from section 3.1) in a common helper,
therefore extending the enforcement of both requirements to all custom
object types created by downstream projects.

There is a special case carved out for STIX 2.0 observables which are
required to contain "object references" rather than identifiers. This
special logic is encoded in the reference property validation helper and
enabled by a kwarg which is passed exclusively by the custom observable
registration helper.
@maybe-sybr maybe-sybr force-pushed the fix/all-refs-are-identifiers-or-quack-alike branch from 2f70dc7 to bd897c9 Compare July 22, 2021 04:55
@chisholm chisholm merged commit 6e7e9dd into oasis-open:master Jul 22, 2021
@maybe-sybr maybe-sybr deleted the fix/all-refs-are-identifiers-or-quack-alike branch July 22, 2021 23:41
@maybe-sybr
Copy link
Contributor Author

Thanks @chisholm :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants