perf: add query optimizer #829
Conversation
nice, the only thing worth considering would be some test cases showing the optimizer works as expected.
I think for samples with many functions and large functions this could make a big difference (see the k32 results).
```python
from capa.features.common import Arch, Bytes, Substring

# ...

def test_optimizer_order():
```
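Such a test might look roughly like this. The `Feature`/`And`/`optimize` helpers used here are the illustrative stand-ins sketched after the PR description below, not the exact names added by this PR:

```python
def test_optimizer_order():
    # build a rule whose cheap check (os) comes last; after optimizing,
    # the cheap check should be moved ahead of the expensive regex check.
    rule = And([Feature("regex", "/https?:/"), Feature("os", "windows")])
    optimize(rule)
    assert [child.kind for child in rule.children] == ["os", "regex"]
```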
great, thanks a lot!
please review and merge #828 and #827 before this.
This PR adds a rule optimizer that re-orders the nodes in the rule logic tree to try simpler/faster cases before complex cases. For example, it prefers OS checks before mnemonic checks before regex checks.
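A minimal sketch of the idea (not the code in this PR; the node classes and cost table below are illustrative stand-ins, not capa's real classes):

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Feature:
    kind: str  # e.g. "os", "mnemonic", "regex"
    value: str


@dataclass
class And:
    children: List[Union["And", "Or", Feature]] = field(default_factory=list)


@dataclass
class Or:
    children: List[Union["And", "Or", Feature]] = field(default_factory=list)


# assumed relative costs: lower means cheaper to evaluate
COSTS = {"os": 0, "arch": 0, "mnemonic": 1, "bytes": 2, "substring": 3, "regex": 4}


def cost(node) -> int:
    if isinstance(node, Feature):
        return COSTS.get(node.kind, 2)
    # a compound node is at least as expensive as its priciest child
    return max((cost(child) for child in node.children), default=0)


def optimize(node):
    """recursively sort children so that cheaper checks are evaluated first,
    letting and/or evaluation short-circuit before reaching expensive checks."""
    if isinstance(node, (And, Or)):
        for child in node.children:
            optimize(child)
        node.children.sort(key=cost)
    return node
```

The key property is that `And`/`Or` evaluation short-circuits, so putting cheap, frequently-failing checks first can avoid evaluating expensive children at all.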
In practice, this seems to make a small but measurable difference in execution time (measured via PMA01-01, 30 iterations).
Note that originally, in 152d0f3, I had the sign of the cost function inverted, so the optimizer was actually a de-optimizer: it picked approximately the worst possible order of evaluation. This led to a 13% increase in feature evaluations, whereas the correct ordering improves evaluation performance by about 2%. The mistake demonstrates that evaluation order can have a substantial impact on performance, though our rules are already fairly well structured (e.g. we typically have OS checks as the first line).
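In terms of the sketch above, the inverted sign would amount to sorting in reverse cost order, something like:

```python
def deoptimize(node):
    # hypothetical: the effect of the inverted cost sign, expressed against
    # the sketch above. the priciest checks run first, so short-circuiting
    # saves the least possible work.
    if isinstance(node, (And, Or)):
        for child in node.children:
            deoptimize(child)
        node.children.sort(key=cost, reverse=True)
    return node
```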
My opinion is that we should probably merge this PR: it provides some performance benefit, the code is very localized, and it doesn't change any of our public APIs/behaviors.
Further perf metrics, using k32 (2 iterations): about 44% faster with 38% fewer feature evaluations.