YARA patterns in the static-code directory are very large #3

Closed
timokau opened this issue Oct 10, 2018 · 6 comments

@timokau

timokau commented Oct 10, 2018

The static-code directory in the compiled archive is 3.9 GiB (most of that is in static-code/pe/32/le/x86/delphi). In comparison, the static-code directory from the uncompiled YARA archive is only 710 MiB.

Is that to be expected or is there some kind of issue causing this blowup?

If I understand the wiki correctly, those files are non-essential and only used for statically-linked code removal, correct? And it sounds like RetDec could even use the uncompiled patterns, with some performance penalty?

s3rvac changed the title from "pattern size" to "YARA patterns in the static-code directory are very large" on Oct 10, 2018
@PeterMatula
Collaborator

  1. Yes, the files are not essential and decompilation would work without them. They are only used for statically-linked code removal.
  2. The size blowup is to be expected. It is caused by their compilation by YARA itself. Nothing we can do about it.
  3. Yes, I think we could use text YARA files (not the compiled ones) and it would work. But the parsing and compilation would happen at runtime and there would be some measurable performance penalty.

@timokau
Author

timokau commented Oct 17, 2018

As I laid out here, shipping uncompiled files and compiling them as needed would be a good compromise. I'm not sure whether it is better to keep the discussion here or there.

> Yes, I think we could use text YARA files (not the compiled ones) and it would work. But the parsing and compilation would happen at runtime and there would be some measurable performance penalty.

Is that possible right now without changes to the retdec source? That could be a reasonable alternative to just stripping the files altogether, which is what we currently do by default.

@PeterMatula
Collaborator

I closed this but forgot to write an explanation, so here it goes.

From now on, the YARA rules in the support package are in text form. By default, they are compiled at the installation step. You can disable the compilation by setting RETDEC_COMPILE_YARA to OFF (it is ON by default). The installation step also installs the retdec-yarac binary, which can be used later to compile the YARA rules if they were not compiled at installation. Alternatively, you can run the support/install-yara.py script later to compile them; see the comment in the script for how to use it.
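To illustrate the option described above, here is a minimal sketch of how a packaging script might build the CMake configure command. Only the RETDEC_COMPILE_YARA variable comes from this thread; the helper name `cmake_configure_command` and the `..` source directory are assumptions made for the example, and the command is only printed, not executed.

```python
from __future__ import annotations

import shlex


def cmake_configure_command(source_dir: str, compile_yara: bool = True) -> list[str]:
    """Build a CMake configure command that sets RETDEC_COMPILE_YARA.

    Hypothetical helper: only the variable name is taken from the thread.
    """
    value = "ON" if compile_yara else "OFF"
    # -D<var>=<value> is the standard way to set a CMake cache variable.
    return ["cmake", source_dir, f"-DRETDEC_COMPILE_YARA={value}"]


if __name__ == "__main__":
    # Print the command for a build that keeps the rules in text form.
    print(shlex.join(cmake_configure_command("..", compile_yara=False)))
```

Leaving out the flag, or passing `compile_yara=True`, corresponds to the default behaviour of compiling the rules at installation.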

We will release v3.3 as it is now, but if you have more suggestions on how to make this better, please let us know. I'm aware of the other suggestions about packaging RetDec, but I haven't gotten around to addressing them yet. Pull requests are welcome :-D

Btw, even in text form, the YARA rules are probably much bigger than they need to be. Most of the size comes from the Delphi signatures. The current solution is something of an experiment in using YARA for this purpose. Maybe the rules do not have to be so long, and much shorter rules would work just as well. This could be investigated further, but we do not have the resources at the moment. Maybe some university student will explore it in the future.

@timokau
Author

timokau commented Mar 20, 2019

Sounds neat, thank you!

So if RETDEC_COMPILE_YARA is set to OFF, how will that affect decompilation results? Will it somehow use the text version (with a runtime cost), will it compile on demand, or will it behave as if no YARA patterns were available?

@PeterMatula
Collaborator

If RETDEC_COMPILE_YARA is OFF, it will use the text YARA rules at a runtime cost, but the decompilation result should not be affected.
It will NOT compile on demand in the sense that the rules would stay compiled for subsequent runs: the rules are compiled inside YARA every time it is triggered.

FYI, runtime cost of parsing YARA rules on my machine (Linux, 3 runs):

  • ELF:
    • compiled [s]: 0.038, 0.037, 0.035
    • text [s]: 0.432, 0.445, 0.423
  • PE gcc:
    • compiled [s]: 0.021, 0.020, 0.020
    • text [s]: 0.250, 0.235, 0.244
  • PE msvc:
    • compiled [s]: 0.143, 0.141, 0.143
    • text [s]: 1.032, 1.061, 1.040
  • PE delphi:
    • compiled [s]: 0.195, 0.172, 0.178
    • text [s]: 1.260, 1.268, 1.273

So as you can see, it is nothing terrible, but it is a little bit annoying - especially for Delphi.
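As a quick way to read the figures above, the following sketch averages the three runs per target and computes the slowdown of text rules relative to compiled ones. The timings are copied from the list; the dictionary layout and the `summarize` name are just choices made for this example.

```python
from statistics import mean

# Timings in seconds, copied from the benchmark list above (3 runs each).
TIMINGS = {
    "ELF": {"compiled": [0.038, 0.037, 0.035], "text": [0.432, 0.445, 0.423]},
    "PE gcc": {"compiled": [0.021, 0.020, 0.020], "text": [0.250, 0.235, 0.244]},
    "PE msvc": {"compiled": [0.143, 0.141, 0.143], "text": [1.032, 1.061, 1.040]},
    "PE delphi": {"compiled": [0.195, 0.172, 0.178], "text": [1.260, 1.268, 1.273]},
}


def summarize(timings):
    """Return {target: (mean_compiled, mean_text, slowdown_factor)}."""
    summary = {}
    for target, runs in timings.items():
        compiled = mean(runs["compiled"])
        text = mean(runs["text"])
        summary[target] = (compiled, text, text / compiled)
    return summary


if __name__ == "__main__":
    for target, (compiled, text, factor) in summarize(TIMINGS).items():
        extra = text - compiled  # absolute added cost per run, in seconds
        print(f"{target:10s} compiled={compiled:.3f}s text={text:.3f}s "
              f"slowdown={factor:.1f}x extra={extra:.3f}s")
```

By these numbers, text rules take roughly 7 to 12 times longer to load than compiled ones. The relative slowdown is largest for ELF and PE gcc, while the absolute added cost is largest for Delphi, at a little over one second per run.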

@timokau
Author

timokau commented Mar 20, 2019

Awesome! Thanks for the benchmarks.

The even-nicer-to-have feature request would be caching of the just-in-time compilation, but it is perfectly workable as it is now :)
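For illustration only, the caching idea could look something like the sketch below. This is not how RetDec works: `cached_compile`, the cache directory layout, the `.yarac` suffix, and the `compile_fn` callback are all hypothetical. The callback stands in for whatever actually compiles a text rule file into a compiled one (for example, a wrapper around retdec-yarac, whose exact command line is not given in this thread).

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_compile(
    text_rules: Path,
    cache_dir: Path,
    compile_fn: Callable[[Path, Path], None],
) -> Path:
    """Return a compiled rule file for text_rules, compiling only on a cache miss.

    compile_fn(src, dst) is a hypothetical callback that must write the
    compiled form of src to dst.
    """
    # Key the cache on the rule file's content, so edited rules are recompiled.
    digest = hashlib.sha256(text_rules.read_bytes()).hexdigest()
    compiled = cache_dir / f"{digest}.yarac"
    if not compiled.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        tmp = compiled.with_suffix(".tmp")
        compile_fn(text_rules, tmp)
        # Rename into place only after a successful compile, so an
        # interrupted run does not leave a truncated cache entry.
        tmp.replace(compiled)
    return compiled
```

With such a cache, the first run would pay the text-parsing cost and later runs would load the compiled file. A real implementation would probably also need to include the YARA version in the cache key, since compiled rules are generally tied to the YARA version that produced them.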

katrinafyi added a commit to katrinafyi/nixpkgs that referenced this issue Feb 5, 2024
- Bump vendored dependencies and remove ones no longer needed.
- Since 3.3, compiled patterns are not shipped in the support file, obviating the postFetch strip. (avast/retdec-support#3)
- Now, patterns may be compiled at build time and an argument is provided to control this (on by default).
- As such, retdec-full is no longer needed and is removed. The 60MB increase seems preferable to duplicating the 500MB size.
- We use cmake _URL variables to insert dependencies and we are able to use nixpkgs googletest.
- Fix build with current gcc 13.
- Remove i686 from platforms, as derivation needs to specify lib64.
- Maintainers: remove timokau, add katrinafyi.