ARM ELF wrong detection of the mode #4357
Comments
Just making sure I fully understand the task before I attempt this, but the instruction at I'm a bit rusty on ARM, but from what I can tell it moves a value into the lower half of the APSR (which contains the CPSR) and sets bit 5 of the CPSR to enable Thumb mode.
@TheN00bBuilder no, in this particular case the whole function (as many others in that file) is Thumb:
Gotcha, makes sense. It sounds like we're more worried about finding the ARM/Thumb switches. If anyone wants to work on this, go for it; otherwise I'll give it a shot later this week.
So if anyone has any ideas (I asked in the Rizin dev Mattermost) or wants to take care of this issue, here's what I've got so far:
I'd appreciate it if someone more familiar with Rizin could tell me what has previously been done to track register values during analysis like this, or whether there's some class or member I can access that may give hints about what's happening.
Are you sure this assumption is correct? Is this defined in the ISA somewhere? Because
According to page A47 in the ARM v7 reference manual, section A2.3.2, a Thumb mode context switch happens by writing an address with the LSB set to 1. It doesn't start executing at that address, however (I should have made that clear in my first post). https://developer.arm.com/documentation/ddi0406/latest/ Also, in the binary included in the issue, look at the entry point: the first instructions load 0x81F1 into IP, which is then written to PC, but Rizin still disassembles in ARM mode.
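To make the LSB convention concrete, here is a minimal C sketch that splits a value such as the 0x81F1 from the entry point into the actual branch target and the instruction-set state it implies. `InterworkTarget` and `decode_interwork` are hypothetical names for illustration, not Rizin APIs, and the sketch assumes the value is used as an interworking branch target (for example by BX or a load into PC).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of Rizin. */
typedef struct {
    uint32_t addr; /* branch target with bit 0 cleared */
    bool thumb;    /* true when bit 0 of the raw value was set */
} InterworkTarget;

/* Splits a raw interworking value into target address and Thumb flag.
 * Assumption: raw is consumed by an interworking branch, so bit 0
 * selects the instruction set and is not part of the address. */
void decode_interwork(uint32_t raw, InterworkTarget *out) {
    assert(out != NULL);
    out->thumb = (raw & 1u) != 0;
    out->addr = raw & ~UINT32_C(1);
}
```

For 0x81F1 this gives address 0x81F0 with the Thumb flag set, which is the mode the disassembler would need to use at that target.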
I think for the first detection method, it is enough if it only works on binaries with given entry points. Otherwise, it gets too complicated (for the beginning). I think the way I would address it is roughly the following:
Now on loading the binary you can disassemble the instruction at the entry point and check if Also limit yourself to ARM for now. I just mentioned AArch64 because it is probably affected as well.
Ah, okay! Thank you so much for the guidance; having feedback from someone who's very familiar with this codebase is extremely valuable. I will get a branch up for this and start work tonight!
Currently working on this in my dev branch.
From what I know, AArch64 doesn't have Thumb mode, unless you count the mode for running 32-bit code on 64-bit CPUs. Overall I am a bit skeptical of an approach that focuses on annotating the jumps instead of the code itself. There will not always be direct jumps: there can be indirect jumps, pointers in vtables, and symbols in the symbol table (especially for dynamic libraries). From what I understand, in all of those cases the LSB could be set, indicating that the target contains Thumb code. We already have rz_analysis_hint_set_bits, which can be used to mark a given target address as Thumb code. So for me a potentially more successful strategy could be:
With regard to code xref handling, it might be necessary to clear the LSB in code xrefs afterwards; otherwise, I have seen some instances of a code xref pointing to the address with the LSB set, which is in the middle of an instruction and thus produces garbage disassembly. Not sure if it happens with all types of code xrefs or only some. I personally don't see much value in adding a flag for the instruction which performs the jump. Two cases where it might be useful:
It's more of a naming thing, but calling it RZ_ANALYSIS_OP_TYPE_CTX_SWITCH when the target is Thumb seems weird. That would mean that a thumb->thumb jump is also a ctx switch, but thumb->arm isn't. That feels like a potential source of misunderstandings. Looking at the existing code more, it seems rizin already has
After reading a bit more: the commented-out _hint_set_bits might be due to another piece of code doing it in a single place
With that said, I have no idea if all code paths which set hint.new_bits later reach the part of the code that transfers it to analysis_hint. One minor drawback of associating the bits hint with the target address instead of the jump source is that a single piece of code could in theory have dual use. But I don't think it's too much of a problem, as no normal compiler would produce such stuff. And even if you are intentionally trying to write such code, it would be an impressive challenge to do it for any nontrivial piece of code without diverging the control flow into ARM and Thumb parts. The problem is somewhat similar to how in x86 you can have overlapping instructions; I don't think we are spending any serious effort on supporting that either. The main cases in which rizin should behave reasonably without manual hints:
Cases 1) and 3) can partially be handled by setting a global asm.bits hint before analysis, but even with that I have seen some cases of an xref pointing into the middle of an instruction.
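The xref problem described above (a code xref that keeps the LSB and therefore lands in the middle of an instruction) can be illustrated with a small, self-contained C sketch. `ToyXref` and `normalize_xrefs` are hypothetical names; Rizin's real xref and hint structures differ. The sketch assumes that an odd target means Thumb, and deliberately records no mode for even targets, since a direct branch inside Thumb code also has an even target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for illustration; not Rizin's xref type. */
typedef struct {
    uint32_t from; /* address of the referencing instruction or data slot */
    uint32_t to;   /* reference target, possibly with bit 0 set */
    int bits;      /* set to 16 when bit 0 was set, otherwise left at 0 */
} ToyXref;

/* Clears bit 0 of every odd target and records 16 bits for it.
 * Even targets are left alone: they carry no mode information here.
 * Returns how many targets were marked as Thumb. */
size_t normalize_xrefs(ToyXref *xrefs, size_t n) {
    assert(xrefs != NULL || n == 0);
    size_t thumb = 0;
    for (size_t i = 0; i < n; i++) {
        if (xrefs[i].to & 1u) {
            xrefs[i].to &= ~UINT32_C(1);
            xrefs[i].bits = 16;
            thumb++;
        }
    }
    return thumb;
}
```

Keeping the mode on the target record rather than on the source matches the idea of hinting the code itself, whatever kind of reference led to it.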
The more I read the related code, the less it makes sense. Either I am misunderstanding something, or it never worked and was never tested. It looks like it's doing something similar to what it should be doing, but not quite.
One detail I might have misunderstood is how mixed-mode executables interact with the symbol table. Looking at the example XVilka gave, it seems like all the symbol entries are always even, but some of them are only 2-byte aligned. So at least in the ELF symbol table, Thumb functions aren't marked by the LSB. Supposedly ELF files use special "$t" and "$a" symbols for marking Thumb and ARM regions. No idea how mixed ARM/Thumb executables interact with dynamic linking. Supposedly rizin already has code which sets hint_bits based on "$t" and "$a" flags. And there are indeed a bunch of 16/32-bit hints set after analysis (although I'm not sure which ones are set by which source of information). But something isn't right. Even if I manually set the analysis hint bits for some part of the code to 16, it still shows the ARM mode disassembly instead of Thumb. Just a guess, but this might be closer to the true cause of the errors. If bit hints are ignored, then it is no surprise that further analysis based on the wrong disassembly mode produces even more junk.
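As a rough illustration of how "$a" and "$t" mapping symbols can drive the mode, here is a minimal C sketch that looks up the bits for an address from a symbol list. `ToySym`, `mapping_symbol_kind` and `bits_at` are hypothetical names, not Rizin code. It assumes the symbols are already sorted by address, and it also handles "$d" (the data mapping symbol) and the "$x.suffix" form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record for illustration; not Rizin's symbol type. */
typedef struct {
    const char *name; /* e.g. "$a", "$t", "$d", "$t.1", or a normal name */
    uint32_t addr;
} ToySym;

/* Returns 'a', 't' or 'd' for a mapping symbol, 0 for any other name. */
char mapping_symbol_kind(const char *name) {
    assert(name != NULL);
    if (name[0] != '$' || name[1] == '\0') {
        return 0;
    }
    if (name[2] != '\0' && name[2] != '.') {
        return 0;
    }
    return (name[1] == 'a' || name[1] == 't' || name[1] == 'd') ? name[1] : 0;
}

/* Returns 32 (ARM), 16 (Thumb) or 0 (data or unknown) for addr.
 * Assumption: syms is sorted by ascending address. */
int bits_at(const ToySym *syms, size_t n, uint32_t addr) {
    assert(syms != NULL || n == 0);
    int bits = 0;
    for (size_t i = 0; i < n && syms[i].addr <= addr; i++) {
        char kind = mapping_symbol_kind(syms[i].name);
        if (kind == 'a') {
            bits = 32;
        } else if (kind == 't') {
            bits = 16;
        } else if (kind == 'd') {
            bits = 0;
        }
        /* ordinary symbols leave the current state untouched */
    }
    return bits;
}
```

A linear scan keeps the sketch short; a real implementation would more likely turn the mapping symbols into ranged hints once, at load time.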
OK, sometimes changing the hint bits works and starts producing Thumb disassembly, but only after a few instructions of incorrect disassembly. Or is XVilka just trolling us with some weird CTF executable that does funny stuff with endianness switching, not just mixing Thumb and ARM mode?
No, this binary doesn't change endianness, at least not from what I know. In fact, I noticed similar behavior on other binaries for Cortex-M cores.
Work environment
rizin -v
full output, not truncated (mandatory)
Expected behavior
Detect instructions mode automatically
Actual behavior
Steps to reproduce the behavior
2048-P2K-AHI_EP1.elf.gz