-
Notifications
You must be signed in to change notification settings - Fork 564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize rule matching by better indexing rule by features #2125
Conversation
Implement the "tighten rule pre-selection" algorithm described here: #2063 (comment) In summary: > Rather than indexing all features from all rules, > we should pick and index the minimal set (ideally, one) of > features from each rule that must be present for the rule to match. > When we have multiple candidates, pick the feature that is > probably most uncommon and therefore "selective". This seems to work pretty well. Total evaluations when running against mimikatz drop from 19M to 1.1M (wow!) and capa seems to match around 3x more functions per second (wow wow). When doing large scale runs, capa is about 25% faster when using the vivisect backend (analysis heavy) or 3x faster when using the upcoming BinExport2 backend (minimal analysis).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤟
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
thanks for the detailed and constructive reviews along the way @mike-hunhoff @mr-tz @s-ff ! |
(continuation of #2080 rebased against master)
Implement the "tighten rule pre-selection" algorithm described here: #2063 (comment)
In summary:
This seems to work pretty well. Total evaluations when running against mimikatz drop from 19M to 1.1M (wow!) and capa seems to match around 3x more functions per second (wow wow).
When doing large scale runs, capa is about 25% faster when using the vivisect backend (analysis heavy) or 3x faster when using the upcoming BinExport2 backend (minimal analysis).
closes #2074