Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binary ninja: optimize feature extraction #2402

Open
williballenthin opened this issue Sep 25, 2024 · 15 comments
Open

binary ninja: optimize feature extraction #2402

williballenthin opened this issue Sep 25, 2024 · 15 comments
Labels
binary-ninja performance Related to capa's performance

Comments

@williballenthin
Copy link
Collaborator

During some initial profiling, I'm finding that the Binary Ninja backend is substantially slower than vivisect or IDA. This thread will enumerate all the things we discover. It might include: bugs in Binary Ninja, things we're doing wrong, workarounds, etc.

Given how good Binary Ninja's code analysis is, we'd really like to be able to use it widely. So, let's prepare the code for this.

@williballenthin williballenthin added binary-ninja performance Related to capa's performance labels Sep 25, 2024
@williballenthin
Copy link
Collaborator Author

Vector35/binaryninja-api#5951

0953cc3b77ed2974b09e3a00708f88de931d681e2d0cb64afbaf714610beabe6 (100KB or so) takes a huge amount of time to load into Binary Ninja. Maybe there's an infinite loop somewhere.

@williballenthin
Copy link
Collaborator Author

williballenthin commented Sep 25, 2024

To run capa against 321338196a46b600ea330fc5d98d0699, it takes 2:48. But :36 is spent just in BNGetLowLevelILForInstruction to help recover calls to functions. I have expected this to be extremely fast (less than :01). Need to triage if this is a bug in the API or we're using it incorrectly and/or there's a workaround.

We can also see that BNGetFunctionMediumLevelIL takes quite a bit of time (:13, and its called twice).

  Austin  TUI   Wall Time Profile                                                                                         
   _________   Command python -m capa.main --json --backend=binja tests/data/321338196a46b600ea330fc5d98d0699.exe_                     
   ⎝__⎠ ⎝__⎠   Python 3.12.3    PID 3946467     PID:TID 3946467:0:3946467   
               Samples 1060897  ⏲️    2'48"       Threshold 1%   
  OWN    TOTAL    %OWN   %TOTAL  FUNCTION                                                                                              
  00"    2'09"     0.0%   76.6%  ├─ _run_module_as_main (<frozen runpy>:199)                                                           │
  00"    2'09"     0.0%   76.6%  │  └─ _run_code (<frozen runpy>:88)                                                                   │
  00"    2'08"     0.0%   76.3%  │     └─ <module> (~/code/public/capa/capa/main.py:1096)                                              │
  00"     14"      0.0%    8.4%  │        ├─ main (~/code/public/capa/capa/main.py:983)                                                │
  00"     14"      0.0%    8.4%  │        │  └─ get_extractor_from_cli (~/code/public/capa/capa/main.py:830)                           │
  00"     13"      0.0%    7.7%  │        │     ├─ get_extractor (~/code/public/capa/capa/loader.py:264)                               │
  00"     13"      0.0%    7.7%  │        │     │  └─ load (~/software/binaryninja/python/binaryninja/__init__.py:414)                 │
  00"     13"      0.0%    7.6%  │        │     │     └─ BinaryView.load (~/software/binaryninja/python/binaryninja/binaryview.py:2758)│
  13"     13"      7.6%    7.6%  │        │     │        └─ BNLoadFilename (~/software/binaryninja/python/binaryninja/_binaryninjacore.│
  00"    1'39"     0.0%   58.7%  │        ├─ main (~/code/public/capa/capa/main.py:987)                                                │
  00"    1'39"     0.0%   58.7%  │        │  └─ find_capabilities (~/code/public/capa/capa/capabilities/common.py:75)                  │
  00"    1'38"     0.0%   58.3%  │        │     ├─ find_static_capabilities (~/code/public/capa/capa/capabilities/static.py:184)       │
  00"     11"      0.0%    6.3%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:117)      │
  00"     10"      0.0%    6.2%  │        │     │  │  ├─ BinjaFeatureExtractor.get_basic_blocks (~/code/public/capa/capa/features/extra│
  00"     10"      0.0%    6.0%  │        │     │  │  │  ├─ Function.mlil (~/software/binaryninja/python/binaryninja/function.py:1395) │
  10"     10"      6.0%    6.0%  │        │     │  │  │  │  └─ BNGetFunctionMediumLevelIL (~/software/binaryninja/python/binaryninja/_b│
  00"     48"      0.0%   28.9%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:118)      │
  00"     08"      0.1%    4.6%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  00"     07"      0.0%    4.2%  │        │     │  │  │  ├─ BinjaFeatureExtractor.get_instructions (~/code/public/capa/capa/features/ex│
  00"     02"      0.1%    1.4%  │        │     │  │  │  │  ├─ BasicBlock.__iter__ (~/software/binaryninja/python/binaryninja/basicbloc│
  00"     05"      0.0%    2.8%  │        │     │  │  │  │  ├─ BasicBlock.__iter__ (~/software/binaryninja/python/binaryninja/basicbloc│
  00"     02"      0.0%    1.5%  │        │     │  │  │  │  │  ├─ CoreArchitecture.get_instruction_text (~/software/binaryninja/python/│
  00"     35"      0.1%   20.8%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  01"     31"      0.5%   18.6%  │        │     │  │  │  ├─ find_instruction_capabilities (~/code/public/capa/capa/capabilities/static.│
  00"     30"      0.0%   18.1%  │        │     │  │  │  │  ├─ BinjaFeatureExtractor.extract_insn_features (~/code/public/capa/capa/fea│
  01"     30"      0.8%   18.0%  │        │     │  │  │  │  │  ├─ extract_features (~/code/public/capa/capa/features/extractors/binja/i│
  00"     02"      0.0%    1.0%  │        │     │  │  │  │  │  │  ├─ extract_insn_number_features (~/code/public/capa/capa/features/ext│
  00"     02"      0.0%    1.1%  │        │     │  │  │  │  │  │  ├─ extract_insn_bytes_features (~/code/public/capa/capa/features/extr│
  00"     02"      0.0%    1.3%  │        │     │  │  │  │  │  │  ├─ extract_insn_api_features (~/code/public/capa/capa/features/extrac│
  00"     02"      0.0%    1.2%  │        │     │  │  │  │  │  │  │  ├─ is_stub_function (~/code/public/capa/capa/features/extractors/b│
  00"     02"      0.0%    1.0%  │        │     │  │  │  │  │  │  │  │  ├─ Function.llil (~/software/binaryninja/python/binaryninja/fun│
  02"     02"      1.0%    1.0%  │        │     │  │  │  │  │  │  │  │  │  ├─ BNGetFunctionLowLevelIL (~/software/binaryninja/python/bi│
  00"     03"      0.1%    2.1%  │        │     │  │  │  ├─ find_instruction_capabilities (~/code/public/capa/capa/capabilities/static.│
  00"     03"      0.0%    2.0%  │        │     │  │  │  │  ├─ RuleSet.match (~/code/public/capa/capa/rules/__init__.py:2053)          │
  02"     02"      1.4%    1.4%  │        │     │  │  │  │  │  ├─ RuleSet._match (~/code/public/capa/capa/rules/__init__.py:1879)      │
  00"     04"      0.0%    2.4%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  00"     04"      0.0%    2.4%  │        │     │  │  │  ├─ RuleSet.match (~/code/public/capa/capa/rules/__init__.py:2053)             │
  00"     37"      0.0%   21.9%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:128)      │
  00"     37"      0.0%   21.9%  │        │     │  │  ├─ BinjaFeatureExtractor.extract_function_features (~/code/public/capa/capa/featu│
  00"     37"      0.0%   21.9%  │        │     │  │  │  ├─ extract_features (~/code/public/capa/capa/features/extractors/binja/functio│
>>00"     36"      0.0%   21.5%  │        │     │  │  │  │  ├─ extract_function_calls_to (~/code/public/capa/capa/features/extractors/b│
>>00"     36"      0.0%   21.4%  │        │     │  │  │  │  │  ├─ ReferenceSource.llil (~/software/binaryninja/python/binaryninja/binar│
>>00"     36"      0.0%   21.3%  │        │     │  │  │  │  │  │  ├─ Function.get_low_level_il_at (~/software/binaryninja/python/binary│
>>36"     36"     21.3%   21.3%  │        │     │  │  │  │  │  │  │  └─ BNGetLowLevelILForInstruction (~/software/binaryninja/python/bi│
  00"     14"      0.0%    8.6%  │        ├─ main (~/code/public/capa/capa/main.py:990)                                                │
  00"     14"      0.0%    8.6%  │        │  └─ compute_layout (~/code/public/capa/capa/loader.py:662)                                 │
  00"     14"      0.0%    8.4%  │        │     ├─ compute_static_layout (~/code/public/capa/capa/loader.py:631)                       │
  00"     14"      0.0%    8.3%  │        │     │  ├─ BinjaFeatureExtractor.get_basic_blocks (~/code/public/capa/capa/features/extracto│
  00"     14"      0.0%    8.2%  │        │     │  │  ├─ Function.mlil (~/software/binaryninja/python/binaryninja/function.py:1395)    │
  14"     14"      8.2%    8.2%  │        │     │  │  │  ├─ BNGetFunctionMediumLevelIL (~/software/binaryninja/python/binaryninja/_bin

edit: maybe we can cache the results of fetching the llil/mlil to save some time. Still is surprising that it takes 3x longer to fetch the llil than do the complete analysis. Maybe its Python serialization overhead?

@xusheng6
Copy link
Contributor

To run capa against 321338196a46b600ea330fc5d98d0699, it takes 2:48. But :36 is spent just in BNGetLowLevelILForInstruction to help recover calls to functions. I have expected this to be extremely fast (less than :01). Need to triage if this is a bug in the API or we're using it incorrectly and/or there's a workaround.

We can also see that BNGetFunctionMediumLevelIL takes quite a bit of time (:13, and its called twice).

  Austin  TUI   Wall Time Profile                                                                                         
   _________   Command python -m capa.main --json --backend=binja tests/data/321338196a46b600ea330fc5d98d0699.exe_                     
   ⎝__⎠ ⎝__⎠   Python 3.12.3    PID 3946467     PID:TID 3946467:0:3946467   
               Samples 1060897  ⏲️    2'48"       Threshold 1%   
  OWN    TOTAL    %OWN   %TOTAL  FUNCTION                                                                                              
  00"    2'09"     0.0%   76.6%  ├─ _run_module_as_main (<frozen runpy>:199)                                                           │
  00"    2'09"     0.0%   76.6%  │  └─ _run_code (<frozen runpy>:88)                                                                   │
  00"    2'08"     0.0%   76.3%  │     └─ <module> (~/code/public/capa/capa/main.py:1096)                                              │
  00"     14"      0.0%    8.4%  │        ├─ main (~/code/public/capa/capa/main.py:983)                                                │
  00"     14"      0.0%    8.4%  │        │  └─ get_extractor_from_cli (~/code/public/capa/capa/main.py:830)                           │
  00"     13"      0.0%    7.7%  │        │     ├─ get_extractor (~/code/public/capa/capa/loader.py:264)                               │
  00"     13"      0.0%    7.7%  │        │     │  └─ load (~/software/binaryninja/python/binaryninja/__init__.py:414)                 │
  00"     13"      0.0%    7.6%  │        │     │     └─ BinaryView.load (~/software/binaryninja/python/binaryninja/binaryview.py:2758)│
  13"     13"      7.6%    7.6%  │        │     │        └─ BNLoadFilename (~/software/binaryninja/python/binaryninja/_binaryninjacore.│
  00"    1'39"     0.0%   58.7%  │        ├─ main (~/code/public/capa/capa/main.py:987)                                                │
  00"    1'39"     0.0%   58.7%  │        │  └─ find_capabilities (~/code/public/capa/capa/capabilities/common.py:75)                  │
  00"    1'38"     0.0%   58.3%  │        │     ├─ find_static_capabilities (~/code/public/capa/capa/capabilities/static.py:184)       │
  00"     11"      0.0%    6.3%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:117)      │
  00"     10"      0.0%    6.2%  │        │     │  │  ├─ BinjaFeatureExtractor.get_basic_blocks (~/code/public/capa/capa/features/extra│
  00"     10"      0.0%    6.0%  │        │     │  │  │  ├─ Function.mlil (~/software/binaryninja/python/binaryninja/function.py:1395) │
  10"     10"      6.0%    6.0%  │        │     │  │  │  │  └─ BNGetFunctionMediumLevelIL (~/software/binaryninja/python/binaryninja/_b│
  00"     48"      0.0%   28.9%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:118)      │
  00"     08"      0.1%    4.6%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  00"     07"      0.0%    4.2%  │        │     │  │  │  ├─ BinjaFeatureExtractor.get_instructions (~/code/public/capa/capa/features/ex│
  00"     02"      0.1%    1.4%  │        │     │  │  │  │  ├─ BasicBlock.__iter__ (~/software/binaryninja/python/binaryninja/basicbloc│
  00"     05"      0.0%    2.8%  │        │     │  │  │  │  ├─ BasicBlock.__iter__ (~/software/binaryninja/python/binaryninja/basicbloc│
  00"     02"      0.0%    1.5%  │        │     │  │  │  │  │  ├─ CoreArchitecture.get_instruction_text (~/software/binaryninja/python/│
  00"     35"      0.1%   20.8%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  01"     31"      0.5%   18.6%  │        │     │  │  │  ├─ find_instruction_capabilities (~/code/public/capa/capa/capabilities/static.│
  00"     30"      0.0%   18.1%  │        │     │  │  │  │  ├─ BinjaFeatureExtractor.extract_insn_features (~/code/public/capa/capa/fea│
  01"     30"      0.8%   18.0%  │        │     │  │  │  │  │  ├─ extract_features (~/code/public/capa/capa/features/extractors/binja/i│
  00"     02"      0.0%    1.0%  │        │     │  │  │  │  │  │  ├─ extract_insn_number_features (~/code/public/capa/capa/features/ext│
  00"     02"      0.0%    1.1%  │        │     │  │  │  │  │  │  ├─ extract_insn_bytes_features (~/code/public/capa/capa/features/extr│
  00"     02"      0.0%    1.3%  │        │     │  │  │  │  │  │  ├─ extract_insn_api_features (~/code/public/capa/capa/features/extrac│
  00"     02"      0.0%    1.2%  │        │     │  │  │  │  │  │  │  ├─ is_stub_function (~/code/public/capa/capa/features/extractors/b│
  00"     02"      0.0%    1.0%  │        │     │  │  │  │  │  │  │  │  ├─ Function.llil (~/software/binaryninja/python/binaryninja/fun│
  02"     02"      1.0%    1.0%  │        │     │  │  │  │  │  │  │  │  │  ├─ BNGetFunctionLowLevelIL (~/software/binaryninja/python/bi│
  00"     03"      0.1%    2.1%  │        │     │  │  │  ├─ find_instruction_capabilities (~/code/public/capa/capa/capabilities/static.│
  00"     03"      0.0%    2.0%  │        │     │  │  │  │  ├─ RuleSet.match (~/code/public/capa/capa/rules/__init__.py:2053)          │
  02"     02"      1.4%    1.4%  │        │     │  │  │  │  │  ├─ RuleSet._match (~/code/public/capa/capa/rules/__init__.py:1879)      │
  00"     04"      0.0%    2.4%  │        │     │  │  ├─ find_basic_block_capabilities (~/code/public/capa/capa/capabilities/static.py:│
  00"     04"      0.0%    2.4%  │        │     │  │  │  ├─ RuleSet.match (~/code/public/capa/capa/rules/__init__.py:2053)             │
  00"     37"      0.0%   21.9%  │        │     │  ├─ find_code_capabilities (~/code/public/capa/capa/capabilities/static.py:128)      │
  00"     37"      0.0%   21.9%  │        │     │  │  ├─ BinjaFeatureExtractor.extract_function_features (~/code/public/capa/capa/featu│
  00"     37"      0.0%   21.9%  │        │     │  │  │  ├─ extract_features (~/code/public/capa/capa/features/extractors/binja/functio│
>>00"     36"      0.0%   21.5%  │        │     │  │  │  │  ├─ extract_function_calls_to (~/code/public/capa/capa/features/extractors/b│
>>00"     36"      0.0%   21.4%  │        │     │  │  │  │  │  ├─ ReferenceSource.llil (~/software/binaryninja/python/binaryninja/binar│
>>00"     36"      0.0%   21.3%  │        │     │  │  │  │  │  │  ├─ Function.get_low_level_il_at (~/software/binaryninja/python/binary│
>>36"     36"     21.3%   21.3%  │        │     │  │  │  │  │  │  │  └─ BNGetLowLevelILForInstruction (~/software/binaryninja/python/bi│
  00"     14"      0.0%    8.6%  │        ├─ main (~/code/public/capa/capa/main.py:990)                                                │
  00"     14"      0.0%    8.6%  │        │  └─ compute_layout (~/code/public/capa/capa/loader.py:662)                                 │
  00"     14"      0.0%    8.4%  │        │     ├─ compute_static_layout (~/code/public/capa/capa/loader.py:631)                       │
  00"     14"      0.0%    8.3%  │        │     │  ├─ BinjaFeatureExtractor.get_basic_blocks (~/code/public/capa/capa/features/extracto│
  00"     14"      0.0%    8.2%  │        │     │  │  ├─ Function.mlil (~/software/binaryninja/python/binaryninja/function.py:1395)    │
  14"     14"      8.2%    8.2%  │        │     │  │  │  ├─ BNGetFunctionMediumLevelIL (~/software/binaryninja/python/binaryninja/_bin

edit: maybe we can cache the results of fetching the llil/mlil to save some time. Still is surprising that it takes 3x longer to fetch the llil than do the complete analysis. Maybe its Python serialization overhead?

I opened the file in binja GUI and the analysis only took 4.3 seconds:

[Analysis] Analysis update took 4.325 seconds

My machine is probably faster then the CI box used by GitHub, still quite surprising to see such a huge difference

@williballenthin
Copy link
Collaborator Author

williballenthin commented Sep 26, 2024

@xusheng6 on my test rig it took maybe 13s to load the binary. Then lots longer to extract the features (minutes). So accessing the LLIR/MLIR is taking integer multiples of the total load time 😕

Maybe 3s vs 13s comes from only having about two cores available in the test environment.

@xusheng6
Copy link
Contributor

@xusheng6 on my test rig it took maybe 13s to load the binary. Then lots longer to extract the features (minutes). So accessing the LLIR/MLIR is taking integer multiples of the total load time 😕

Maybe 3s vs 13s comes from only having about two cores available in the test environment.

thx for letting me know about it, it seems either I wrote the backend in a bad way, or the Python wrapping adds significant overhead to it

@williballenthin
Copy link
Collaborator Author

The profiler didn't expose any invocation counts, so I'm not yet sure if we're calling the API way too many times or if the API itself is slow. Given that it's both LLIR and MLIR, I sorta suspect the latter. But, in the few minutes I looked at the bindings, it didn't seem like all that much was happening (on the Python side).

@xusheng6
Copy link
Contributor

xusheng6 commented Nov 21, 2024

While looking into #2406, I noticed the IL of the function 0x8082d40 (the largest function in b5f0524e69b3a3cf636c7ac366ca57bf5e3a8fdc8a9f01caf196c611a7918a87.elf_) is requested multiple times. This is not expected, since I thought the IL should be requested at most once and then cached. I will check how we cache the analysis data to ensure things are working as expected

@xusheng6
Copy link
Contributor

While looking into #2406, I noticed the IL of the function 0x8082d40 (the largest function in b5f0524e69b3a3cf636c7ac366ca57bf5e3a8fdc8a9f01caf196c611a7918a87.elf_) is requested multiple times. This is not expected, since I thought the IL should be requested at most once and then cached. I will check how we cache the analysis data to ensure things are working as expected

Here is how we handle the caching of the IL function. On the one hand, we cannot afford to cache all of them because that can eat all RAM easily. On the other hand, we do want to cache some of them to avoid frequently regenerating them. Here is how are actually do it: we cache the IL of 64 functions. This number is governed by a setting:

image

A function's IL cache gets discarded when another function gets added to the cache bucket and evicts it from it.

This strategy apparently works very well for the UI usage scenario, but it is NOT very good for headless usage

@williballenthin
Copy link
Collaborator Author

ah, and from within capa, we go function by function (cache friendly) but request the IL from each caller and callee of the function (cache unfriendly).

In capa, we could maybe pre-compute some analysis in a function-major direction (ie. cache friendly way). I think it's possible but the code might become less intuitive.

@xusheng6
Copy link
Contributor

I would suggest you not do bother that from capa. I will see if I can do anything from binja.

Apparently the caching thing is unique to binja and it does not really make sense to put the burden on your shoulders

@xusheng6
Copy link
Contributor

xusheng6 commented Nov 22, 2024

Here is a brief recap of my recent findings:

  1. Repetitive IL retrieval caused by the binja's cache handling of the IL function. See binary ninja: optimize feature extraction #2402 (comment) for more info. This contributes to the performance issue
  2. Failure to retrieve the IL function when it should be retrievable. This used to contribute to many of the crashes in triage binary ninja backend failures #2406. I have created an upstream issue for it: On-Demand Function Analysis is Triggering Time and Update Count Limits Vector35/binaryninja-api#6171. Note, although the crash itself is fixed by the PR Various binja backend fixes #2500, it is not a perfect fix. Since it only fixes the crash with a try/except, and the IL is still irretrievable for known reasons. This means while capa can now process these samples, it will miss some features and detection. I will continue to look into it
  3. These two issues could be related. In fact, I wrote a single script to reproduce 2) by iterating over the functions but failed. It might be the case that the repetitive retrieval of the IL somehow contributed to the failure, in a way not yet understood by me

@williballenthin for awareness

@xusheng6
Copy link
Contributor

Ok I finally figured out what is happening: Vector35/binaryninja-api#6171 (comment). And indeed the two issues are related and they contribute together to what I see. That is the beauty of debugging!

@williballenthin
Copy link
Collaborator Author

@xusheng6 can you think of another way to phrase the check here:

Currently, we go:

for f in functions:
  for x in f.caller_sites:
    if x.il.operation in {...}:
      ...

and this is cache-unfriendly, due to the order that we access the function llil members.

Does Binja have another way that we could ask for the call graph?

Worst case, we could do something like:

for f in functions:
  for op in f.llil:
    if op.operation is "call":
      calls_to[op.address].add(f)
      calls_from[f].add(op.address)

And build the call graph ourselves in a single pass up front, which would be cache friendly (but require a complete pass through all the IL).

@williballenthin
Copy link
Collaborator Author

also, @xusheng6 can we use LLIL instead of MLIL to recover basic blocks, such as here (

) so that we're using LLIL everywhere?

@xusheng6
Copy link
Contributor

also, @xusheng6 can we use LLIL instead of MLIL to recover basic blocks, such as here (


) so that we're using LLIL everywhere?

I am using MLIL because the stack string detection is done with ease at MLIL. It is not used for basic block recovery

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binary-ninja performance Related to capa's performance
Projects
None yet
Development

No branches or pull requests

2 participants