How good is the support of command set extensions? (MMX, SSE, SSE2, SSE3...) #193
Comments
This is a good question. I should probably write a few words about it on the wiki. Current support for these extensions: none. Capstone2llvmir currently translates instructions in these modes:
What I would like to do next is described in #115, along with a discussion about future extensions:
Decompilation is one thing, but someone who wants to use the RetDec framework for other purposes might want semantics for even very complex instructions. Right now, I would say that projects like QEMU or McSema are a better alternative in such a case. However, someone might add complex semantics to Capstone2llvmir on their own - we currently have no such plans. This would not be easy, but if good groundwork is laid, it might not be so bad. After all, someone had to hand-write these things in QEMU as well. Even if this happens, it would not be beneficial for decompilation (as already explained). So we would either have to keep these Capstone2llvmir translators separate, or have it all in one translator but be able to tell it what should and should not be translated - or which mode to use for which instructions.
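To make the "one translator, told which mode to use for which instructions" idea more concrete, here is a minimal sketch in Python. It is not RetDec code: the `Mode` enum, the group names, and the two policy tables are all hypothetical and only illustrate how a per-group switch could look.

```python
from enum import Enum


class Mode(Enum):
    FULL = "full"            # emit complete instruction semantics
    PSEUDO_CALL = "pseudo"   # emit an opaque call that only names the instruction


# Hypothetical policies: group names and choices are assumptions for this sketch.
DECOMPILATION = {"base": Mode.FULL, "mmx": Mode.PSEUDO_CALL, "sse": Mode.PSEUDO_CALL}
EMULATION = {"base": Mode.FULL, "mmx": Mode.FULL, "sse": Mode.FULL}


def mode_for(group, policy, default=Mode.PSEUDO_CALL):
    """Look up how an instruction group should be translated under a policy.

    Groups missing from the policy fall back to `default`, so newly added
    extensions never silently get full semantics.
    """
    return policy.get(group, default)
```

With such a table, a decompilation run and an emulation-style run could share one translator and differ only in the policy passed in, for example `mode_for("sse", DECOMPILATION)` versus `mode_for("sse", EMULATION)`.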
@PeterMatula
QEMU does not produce LLVM IR; it has its own intermediate language. However, as I understand it (which is not all that much, so I might be wrong), matters there are even more complicated: not all instructions are modeled directly in TCG. More complicated instructions (basically everything you asked about) are modeled as routines in C that somehow get compiled and used - I'm not really sure how. If I'm wrong, and someone with more insight is reading this, please correct me. I would be interested to know more. The rev.ng project uses QEMU to produce LLVM IR.

McSema produces LLVM IR. It also translates some of the extensions you asked about. However, as I understand it, it is more focused on QEMU-like use cases than on human-readable decompilation. Again, I might be wrong about this.

But like I said, translating these sets is not really beneficial to decompilation output quality. Just look at how McSema handles the x86 FPU. There are benefits to this approach if you want to emulate programs or check them with tools like KLEE, but the C produced from it would look terrible. Moreover, they use IDA for control flow recovery, so it is unclear whether this is only for convenience or whether it would be hard or impossible to write a recursive traversal disassembler on top of the LLVM IR they produce. The LLVM IR produced by our capstone2llvmir is designed with this in mind.

To conclude:
Now it's all clear to me! Thanks again for the detailed explanation. If even IDA does not really support these instruction set extensions, then I understand the situation. And you're right about floating point operations!
A few notes on how IDA does this: We should look into this and come up with a solution that lets us deal with these instructions without cluttering the output, while still giving RetDec analyses and human users enough information about what is going on. This issue might get solved as part of a bachelor thesis - see milestone.
#115 has been closed. We now generate assembly pseudo calls for all unhandled instructions. Further improvements using intrinsics or full semantic models are possible. See https://github.com/avast-tl/retdec/wiki/Capstone2LlvmIr.
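For readers unfamiliar with the pseudo-call idea, here is a rough sketch in Python of what it means in principle. It is not the actual Capstone2llvmir implementation: the `HANDLED` table, the `translate` function, the `__asm_<mnemonic>` naming, and the operand types are assumptions made for illustration, and the output is only LLVM-IR-like text.

```python
# Illustrative only: not RetDec's real API. The `__asm_<mnemonic>` naming and
# the integer operand types are assumptions made for this sketch.

# Mnemonics for which this toy translator has a real semantic model,
# mapped to the LLVM-IR-like opcode it emits.
HANDLED = {"add": "add", "sub": "sub", "xor": "xor"}


def translate(mnemonic, operands, bits=32):
    """Return one line of LLVM-IR-like text for a two-operand instruction."""
    dst, src = operands
    if mnemonic in HANDLED:
        # Known instruction: emit its semantics directly.
        return f"%{dst}.new = {HANDLED[mnemonic]} i{bits} %{dst}, %{src}"
    # Unhandled instruction: emit an opaque call named after the mnemonic,
    # so later analyses and readers still see which instruction was here.
    return (f"%{dst}.new = call i{bits} @__asm_{mnemonic}"
            f"(i{bits} %{dst}, i{bits} %{src})")
```

Under these assumptions, `translate("add", ("eax", "ebx"))` yields a plain `add`, while `translate("paddd", ("xmm0", "xmm1"), bits=128)` yields a call to a hypothetical `@__asm_paddd`. The data flow through the operands stays visible even though the instruction's exact semantics are not modeled.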
First of all, yes, I know that this is not a forum. But I still have a stupid question. What about command set extensions in general? Which ones are already supported, and which ones should be supported? So far I have mainly thought of the following ones:
Many programs and modern compilers use them automatically to speed up certain operations. I don't have a compiler that covers all the options. A wiki page on this might be worthwhile, because this question will surely come up again and again.