binary ninja: optimize feature extraction #2402
0953cc3b77ed2974b09e3a00708f88de931d681e2d0cb64afbaf714610beabe6 (100KB or so) takes a huge amount of time to load into Binary Ninja. Maybe there's an infinite loop somewhere.
To run capa against 321338196a46b600ea330fc5d98d0699, it takes 2:48, but :36 of that is spent just in […]. We can also see that […].
edit: maybe we can cache the results of fetching the llil/mlil to save some time. It is still surprising that it takes 3x longer to fetch the llil than to do the complete analysis. Maybe it's Python serialization overhead?
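As a rough illustration of that idea, here is a minimal sketch of memoizing IL lookups on the capa side; `get_llil` is a hypothetical helper for illustration, not existing capa code, and an unbounded cache trades RAM for speed:

```python
# A minimal sketch (hypothetical helper, not existing capa code) of memoizing
# LLIL lookups so repeated requests for the same function do not make Binja
# regenerate the IL. Keyed by function start address; unbounded, so it trades
# memory for speed.
_llil_cache = {}

def get_llil(func):
    il = _llil_cache.get(func.start)
    if il is None:
        il = func.llil
        _llil_cache[func.start] = il
    return il
```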
I opened the file in the binja GUI and the analysis only took 4.3 seconds:
My machine is probably faster than the CI box used by GitHub, but it is still quite surprising to see such a huge difference.
@xusheng6 on my test rig it took maybe 13s to load the binary. Then it took a lot longer to extract the features (minutes). So accessing the LLIL/MLIL is taking integer multiples of the total load time 😕 Maybe 3s vs 13s comes from only having about two cores available in the test environment.
Thanks for letting me know about it; it seems either I wrote the backend in a bad way, or the Python wrapping adds significant overhead to it.
The profiler didn't expose any invocation counts, so I'm not yet sure if we're calling the API way too many times or if the API itself is slow. Given that it affects both LLIL and MLIL, I sorta suspect the latter. But, in the few minutes I looked at the bindings, it didn't seem like all that much was happening (on the Python side).
While looking into #2406, I noticed the IL of the function 0x8082d40 (the largest function in b5f0524e69b3a3cf636c7ac366ca57bf5e3a8fdc8a9f01caf196c611a7918a87.elf_) is requested multiple times. This is unexpected, since I thought the IL would be requested at most once and then cached. I will check how we cache the analysis data to ensure things are working as expected.
Here is how we handle the caching of IL functions. On the one hand, we cannot afford to cache all of them, because that could easily eat all available RAM. On the other hand, we do want to cache some of them to avoid regenerating them frequently. Here is how we actually do it: we cache the IL of 64 functions (this number is governed by a setting). A function's IL cache gets discarded when another function is added to the cache bucket and evicts it. This strategy apparently works very well for the UI usage scenario, but it is NOT very good for headless usage.
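Roughly, the behaviour described above resembles a small least-recently-used cache. Here is a toy Python analogy of that eviction behaviour, not Binja's actual implementation; the key/value types are placeholders:

```python
# A toy LRU cache illustrating the eviction behaviour described above: once a
# 65th function's IL is requested, the least recently used entry is discarded
# and must be regenerated on the next access. Not Binja's real code.
from collections import OrderedDict

class ILCache:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self._entries = OrderedDict()  # function start address -> IL object

    def get(self, addr):
        il = self._entries.get(addr)
        if il is not None:
            self._entries.move_to_end(addr)  # mark as most recently used
        return il

    def put(self, addr, il):
        self._entries[addr] = il
        self._entries.move_to_end(addr)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
```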
Ah, and from within capa, we go function by function (cache friendly) but request the IL from each caller and callee of the function (cache unfriendly). In capa, we could maybe pre-compute some analysis in a function-major direction (i.e., in a cache-friendly way). I think it's possible, but the code might become less intuitive.
I would suggest you not bother with that in capa. I will see if I can do anything from the binja side. Apparently the caching behaviour is unique to binja, and it does not really make sense to put the burden on your shoulders.
Here is a brief recap of my recent findings:
@williballenthin for awareness
Ok, I finally figured out what is happening: Vector35/binaryninja-api#6171 (comment). And indeed the two issues are related; together they contribute to what I'm seeing. That is the beauty of debugging!
@xusheng6 can you think of another way to phrase the check here:

Currently, we go:

```python
for f in functions:
    for x in f.caller_sites:
        if x.il.operation in {...}:
            ...
```

and this is cache-unfriendly, due to the order in which we access the functions' llil members. Does Binja have another way that we could ask for the call graph? Worst case, we could do something like:

```python
for f in functions:
    for op in f.llil:
        if op.operation is "call":
            calls_to[op.address].add(f)
            calls_from[f].add(op.address)
```

And build the call graph ourselves in a single pass up front, which would be cache friendly (but require a complete pass through all the IL).
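For illustration, here is a hedged sketch of that single-pass, function-major idea using Binary Ninja API calls I believe exist (`bv.functions`, `Function.llil`, `LowLevelILOperation.LLIL_CALL`); treat it as a sketch under those assumptions rather than capa's implementation:

```python
# A sketch (not capa's actual code) of building the call graph in a single
# function-major pass, touching each function's LLIL exactly once. Only direct
# calls to constant targets are resolved; indirect calls are skipped.
from collections import defaultdict
from binaryninja import LowLevelILOperation

def build_call_graph(bv):
    calls_to = defaultdict(set)    # callee address -> set of caller start addresses
    calls_from = defaultdict(set)  # caller start address -> set of callee addresses
    call_ops = (LowLevelILOperation.LLIL_CALL, LowLevelILOperation.LLIL_TAILCALL)
    const_ops = (LowLevelILOperation.LLIL_CONST_PTR, LowLevelILOperation.LLIL_CONST)

    for f in bv.functions:
        llil = f.llil
        if llil is None:
            continue
        for instr in llil.instructions:
            if instr.operation not in call_ops:
                continue
            dest = instr.dest
            if dest.operation in const_ops:
                calls_to[dest.constant].add(f.start)
                calls_from[f.start].add(dest.constant)
    return calls_to, calls_from
```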
Also, @xusheng6, can we use LLIL instead of MLIL to recover basic blocks, such as here (…)?
I am using MLIL because stack string detection is easy to do at the MLIL level. MLIL is not used for basic block recovery.
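To illustrate why MLIL is convenient for this kind of check, here is a toy heuristic that counts constant printable-character assignments in a function's MLIL; it is not capa's actual stack string algorithm, just a hedged sketch of the sort of pattern MLIL exposes directly:

```python
# A toy heuristic (not capa's actual stack string detection): count MLIL
# assignments/stores whose source is a small printable constant, the kind of
# pattern that stack string construction produces.
from binaryninja import MediumLevelILOperation

def count_constant_char_writes(func):
    mlil = func.mlil
    if mlil is None:
        return 0
    count = 0
    write_ops = (
        MediumLevelILOperation.MLIL_SET_VAR,
        MediumLevelILOperation.MLIL_STORE,
    )
    for instr in mlil.instructions:
        if instr.operation not in write_ops:
            continue
        src = instr.src
        if src.operation == MediumLevelILOperation.MLIL_CONST and 0x20 <= src.constant <= 0x7E:
            count += 1
    return count
```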
for cache friendliness. see #2402
During some initial profiling, I'm finding that the Binary Ninja backend is substantially slower than vivisect or IDA. This thread will enumerate all the things we discover. It might include: bugs in Binary Ninja, things we're doing wrong, workarounds, etc.
Given how good Binary Ninja's code analysis is, we'd really like to be able to use it widely. So, let's prepare the code for this.