Implement toolchain provenance on radare #341

Open
gogo2464 opened this issue Apr 14, 2022 · 0 comments
Description

Currently radare2 only parses the executable to read the gcc version from the metadata embedded in it. This only works on ELF. I would like to implement a toolchain provenance tool in radare2. The idea is to determine the exact version of the compiler, disassemble the malware, decompile it, and then obtain the same signature to prove we recovered exactly the same source code.
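To illustrate what a metadata-based check amounts to, here is a minimal sketch in Python. It is not radare2's actual code: the function name `find_gcc_idents` and the regular expression are assumptions made for this example. It only relies on the fact that GCC normally writes an identification string such as `GCC: (GNU) 11.2.0` into the ELF `.comment` section, and it simply scans the raw bytes for that string.

```python
import re
from typing import List, Tuple

# Identification string that GCC normally writes into the ELF .comment
# section, e.g. b"GCC: (GNU) 11.2.0". The pattern is an assumption made
# for this sketch, not something taken from radare2.
GCC_IDENT = re.compile(rb"GCC: \(([^)]*)\) (\d+\.\d+\.\d+)")


def find_gcc_idents(data: bytes) -> List[Tuple[str, str]]:
    """Return unique (vendor, version) pairs found in raw bytes, in order."""
    seen: List[Tuple[str, str]] = []
    for match in GCC_IDENT.finditer(data):
        pair = (
            match.group(1).decode("ascii", "replace"),
            match.group(2).decode("ascii"),
        )
        if pair not in seen:
            seen.append(pair)
    return seen
```

If the string is missing or has been edited, a scan like this returns nothing or a wrong answer, which is the limitation this issue is about.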

Toolchain analysis is still a work in progress in academic research. Even Ghidra does not have it.

Please describe what are you missing or wanting to be improved

The command i | grep cc should report the real version of gcc, based on toolchain provenance rather than on program metadata. It should then give:
- the compiler name
- the exact commit of the gcc release if the compiler is open source, otherwise the compiler version
- the compiler options used by gcc (-O0, etc.)
- results even if the binary is stripped or is a firmware/driver binary
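The fields listed above could be carried in a small record like the one sketched below. Everything here is hypothetical: the class name `ToolchainGuess`, its fields, and the `summary` format are only one possible shape for the result, not an existing radare2 structure or output format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToolchainGuess:
    """Hypothetical result of a toolchain provenance analysis."""

    compiler: str                      # compiler name, e.g. "gcc"
    version: Optional[str] = None      # release version, if that is all we know
    commit: Optional[str] = None       # exact commit, for open source compilers
    options: List[str] = field(default_factory=list)  # e.g. ["-O0"]

    def summary(self) -> str:
        # Prefer the exact commit; fall back to the version, then "unknown".
        origin = self.commit or self.version or "unknown"
        opts = " ".join(self.options) if self.options else "unknown"
        return f"compiler {self.compiler} {origin} options {opts}"
```

For example, `ToolchainGuess("gcc", version="11.2.0", options=["-O0"]).summary()` gives one line that a command such as `i` could print.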

Provide images, ascii-art, test files and anything that may help us understand your request

Example repository:

https://github.com/dyninst/toolchain-origin

With a neural network:

https://yuede.github.io/files/21_ACNS_Vestige.pdf
https://www.youtube.com/watch?v=wdzjVfwFAPc&ab_channel=IEEESANER2021

Without a neural network:

https://www.researchgate.net/profile/Barton-Miller/publication/220854600_Recovering_the_Toolchain_Provenance_of_Binary_Code/links/0deec52ceab25bf292000000/Recovering-the-Toolchain-Provenance-of-Binary-Code.pdf?origin=publication_detail

Stack Exchange resources:

https://reverseengineering.stackexchange.com/questions/11/what-hints-in-machine-code-can-point-me-to-the-compiler-which-was-used-to-genera
