-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the ability to select which functions or processes you which to extract capabilities from #2156
Add the ability to select which functions or processes you which to extract capabilities from #2156
Conversation
First, I think this is a very reasonable feature to add, especially with the Drakvuf sandbox support! I'm happy that we should be able to remove similar logic in There are two things to discuss:
For (1), I don't imagine any major problems, though we may want to consider if these arguments are commonly used enough to be shown all the time, or only in an "expert" mode, or suggested when the Drakvuf sandbox is encountered, etc. That being said, let's be careful not to get derailed by tiny details and hold up the merge. If there's debate, the arguments could always be considered "experimental" for a bit until we stabilize them. For (2), the question is: how do we filter the functions/scopes/features, particularly in a way that enables further extension, if necessary? That is, say we want to also filter on threads or basic blocks, could we build this easily? The primary place to consider is the function signature to An alternative design is to introduce wrappers around the feature extractors that can filter the scopes/features. They would act just like the underlying feature extractor, but would yield only the entries that are requested. Then, the wrapped feature extractor can be passed around as-is today. Notably, we can trivially create wrappers for different scopes/features without threading optional arguments around. Also, they could potentially be combined (though I don't think we're likely to really need this functionality). For example: class StaticFeatureExtractorFilter:
def __init__(self, inner: StaticFeatureExtractor):
self.inner = inner
def __getattr__(self, name):
if hasattr(self, name):
# if the filter has an override, use that
return getattr(self, name)
# otherwise use the inner feature extractor
return getattr(self.inner, name)
class FunctionFilter(StaticFeatureExtractorFilter):
def __init__(self, inner: StaticFeatureExtractor, functions: Set[Address]):
super().__init__(self, inner)
self.functions = functions
def get_functions(self):
yield from (f for f in self.inner.get_functions() if f.address in self.functions) Then we can use the filter like so: wanted_functions: Set[Address] = get_wanted_functions_from_cli(args)
if wanted_functions:
# if the user wants to only show specific functions, we filter them down
feature_extractor = FunctionFilter(feature_extractor, wanted_functions)
else:
# otherwise, we use the full feature extractor
pass
...
# use either the filtered extractor or the full extractor interchangably
find_capabilities(feature_extractor, ...) Of course, we can filter processes/threads/basic blocks/etc. in the same way. Would you be open to discuss alternative designs like this @yelhamer? Or any thoughts @mr-tz @mike-hunhoff |
@williballenthin I really like the design you proposed and I'm willing to implement it. Would the filter classes reside in the base_extractor.py file? Also, should we make the base "StaticFeatureExtractorFilter" a child of the "StaticFeatureExtractor" class? since I think there are some cases (if I am not mistaken) where we use the instance of the extractor we're passing to determine whether the analysis is static or dynamic. |
Good point. Also I think this will better satisfy mypy. OTOH, I'm not sure that the hasattr checks will work as written (since the base class has empty implementations of these) so that will take some tweaking. Should still be possible. We should also pass along the inner name appropriately, since I think the metadata structure includes the feature extractor name. |
@williballenthin I have also thought of the following possible implementation using a function factory: def FunctionFilter(extractor: StaticFeatureExtractor, functions: Set) -> StaticFeatureExtractor:
from types import MethodType
get_functions = extractor.get_functions # fetch original get_functions()
def filtered_get_functions(self):
yield from (f for f in get_functions() if f.address in functions)
extractor.get_functions = MethodType(filtered_get_functions, extractor)
return extractor Then we can do as you suggested:
Another question remains which is whether we want to register which filters an extractor has installed in it. If we want to do so then we might just consider storing the set of desired functions as an attribute in the extractor, then reference it internally in the get_functions() routine (without needing any wrapping or so). |
This looks like it would also work. I'm not sure of the pros/cons right now, so perhaps try one of the new implementations and let's see how it feels? |
…ing function factories
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation looks pretty good! I've left comments inline. Please address the comments tagged 👀 and optionally the remaining ones. We can merge this as soon as tomorrow assuming the points are addressed :-)
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
@williballenthin I think I have addressed all of the the comments. Let me know if there's anything else I need to do :) |
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
Co-authored-by: Willi Ballenthin <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥳
ah, CLA failure is cause GH associated the wrong email with my suggestions. well, its me, and i'm covered under the CLA. |
let's go! |
This PR adds the ability to select which function/process capa should extract capabilities from. The proposed syntax is as follows:
I haven't added a testcase for dynamic analysis. I am planning to do so once the Drakvuf feature extractor gets merged (#2143) since that's a big motive for this PR.
Nuance: I couldn't find the right names for some of the internal variables, so please feel free to set them as you wish.
Thanks :)
Checklist