arm thumb: addw identified as add #1630

Open
ekilmer opened this issue May 8, 2020 · 3 comments
Comments

@ekilmer
Contributor

ekilmer commented May 8, 2020

On the next branch (f7efa08), addw is given the same ID as ADD. It should have its own ID, since the mapping file contains a comment distinguishing the two: https://github.com/aquynh/capstone/blob/f7efa08ecaacf9adfbef7c8bd85b02b256a66adc/arch/ARM/ARMMappingInsnName.inc#L6-L7

./cstool -d thumb '\x0f\xf2\x2a\x00'
 0  0f f2 2a 00  addw   r0, pc, #0x2a
        ID: 2 (add)
        op_count: 3
                operands[0].type: REG = r0
                operands[0].access: WRITE
                operands[1].type: REG = pc
                operands[1].access: READ
                operands[2].type: IMM = 0x2a
        Registers read: pc
        Registers modified: r0
        Groups: thumb2

I also don't see any entries related to addw (ARM_INS_ADDW) in https://github.com/aquynh/capstone/blob/f7efa08ecaacf9adfbef7c8bd85b02b256a66adc/arch/ARM/ARMMappingInsn.inc

@ekilmer
Contributor Author

ekilmer commented May 8, 2020

I think there is a very similar issue with SUBW being identified as SUB.
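To make the distinction concrete, here is a minimal Python sketch (not Capstone code) that decodes the 12-bit-immediate Thumb-2 forms directly. The bit layouts assumed are encoding T4 of ADD (immediate) and SUB (immediate) from the ARMv7 architecture manual, which are separate encodings from the modified-immediate `add.w`/`sub.w` forms. The function name `decode_t2_imm12` is hypothetical.

```python
import struct


def decode_t2_imm12(code: bytes):
    """Decode a Thumb-2 ADDW/SUBW (12-bit immediate, encoding T4).

    Returns (mnemonic, rd, rn, imm12), or None if the bytes are not one of
    these two encodings. This is an illustrative sketch only.
    """
    if len(code) < 4:
        return None
    # A 32-bit Thumb-2 instruction is two little-endian halfwords.
    hw1, hw2 = struct.unpack("<HH", code[:4])
    if hw2 & 0x8000:
        return None  # bit 15 of the second halfword must be 0 here
    op = hw1 & 0xFBF0  # clear the i bit (bit 10) and Rn (bits 3-0)
    if op == 0xF200:
        mnemonic = "addw"  # 11110 i 100000 Rn
    elif op == 0xF2A0:
        mnemonic = "subw"  # 11110 i 101010 Rn
    else:
        return None
    i = (hw1 >> 10) & 0x1
    rn = hw1 & 0xF
    imm3 = (hw2 >> 12) & 0x7
    rd = (hw2 >> 8) & 0xF
    imm8 = hw2 & 0xFF
    # The immediate is the concatenation i:imm3:imm8 (zero-extended).
    imm12 = (i << 11) | (imm3 << 8) | imm8
    return mnemonic, rd, rn, imm12
```

For the bytes from the report, `decode_t2_imm12(b"\x0f\xf2\x2a\x00")` returns `("addw", 0, 15, 42)`: Rd = r0, Rn = 15 (pc) and imm12 = 0x2a, matching the operands cstool prints.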

@ekilmer
Contributor Author

ekilmer commented May 10, 2020

In this particular case, because addw uses pc as the base register, LLVM disassembles the bytes as a different instruction (adr.w, opcode t2ADR), while the assembler accepts addw and emits the same encoding (opcode t2ADDri12). This seems odd to me, and I'm not sure which result is correct:

$ echo "addw  r0, pc, #0x2a" | llvm-mc --assemble --show-encoding --show-inst-operands --show-inst --triple thumbv8
        .text
<stdin>:1:1: note: parsed instruction: ['addw', <ARMCC::al>, <register r0>, <register pc>, 42]
addw  r0, pc, #0x2a
^
        addw    r0, pc, #42             @ encoding: [0x0f,0xf2,0x2a,0x00]
                                        @ <MCInst #3747 t2ADDri12
                                        @  <MCOperand Reg:72>
                                        @  <MCOperand Reg:14>
                                        @  <MCOperand Imm:42>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>>


$ echo "0x0f 0xf2 0x2a 0x00" | llvm-mc --disassemble --show-encoding --show-inst-operands --show-inst --triple thumbv8
        .text
        adr.w   r0, #42                 @ encoding: [0x0f,0xf2,0x2a,0x00]
                                        @ <MCInst #3752 t2ADR
                                        @  <MCOperand Reg:72>
                                        @  <MCOperand Imm:42>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>>


$ echo "adr.w   r0, #42" | llvm-mc --assemble --show-encoding --show-inst-operands --show-inst --triple thumbv8
        .text
<stdin>:1:1: note: parsed instruction: ['adr', <ARMCC::al>, '.w', <register r0>, 42]
adr.w   r0, #42
^
        adr.w   r0, #42                 @ encoding: [0x0f,0xf2,0x2a,0x00]
                                        @ <MCInst #3752 t2ADR
                                        @  <MCOperand Reg:72>
                                        @  <MCOperand Imm:42>
                                        @  <MCOperand Imm:14>
                                        @  <MCOperand Reg:0>>

Tested with LLVM 10.

The ADR pseudo-instruction loads a program-relative or register-relative address into a register.

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Babcjaii.html
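A small Python sketch of why the two spellings are equivalent. It assumes the ARMv7 rule for encoding T4 of ADD/SUB (immediate), "if Rn == '1111' then SEE ADR", and the ADR semantics result = Align(PC, 4) ± imm, where PC reads as the instruction address + 4 in Thumb state. The function `canonical_form` and its output formatting are hypothetical; registers other than pc are printed as plain rN for simplicity.

```python
def canonical_form(mnemonic: str, rd: int, rn: int, imm12: int, address: int):
    """Choose a preferred spelling for a decoded Thumb-2 ADDW/SUBW.

    Returns (text, target). target is the address an ADR form computes,
    or None when the base register is not pc. Illustrative sketch only.
    """
    if rn != 15:
        # Ordinary register base: keep the addw/subw spelling.
        return f"{mnemonic} r{rd}, r{rn}, #{imm12}", None
    # Rn == pc: treat it as ADR. In Thumb state PC reads as address + 4,
    # and ADR word-aligns it before adding or subtracting the immediate.
    base = (address + 4) & ~3
    offset = imm12 if mnemonic == "addw" else -imm12
    return f"adr.w r{rd}, #{offset}", base + offset
```

For the instruction in this issue at address 0, `canonical_form("addw", 0, 15, 42, 0)` returns `("adr.w r0, #42", 46)`: both `addw r0, pc, #0x2a` and `adr.w r0, #42` load Align(0 + 4, 4) + 42 = 46 into r0.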

@Rot127
Collaborator

Rot127 commented Apr 3, 2023

This answer comes a little late, but:

LLVM groups several CodeGenInstruction objects (instruction objects more or less equivalent to the definitions in the target description files) into a MatchableInfo object.
The purpose of MatchableInfo is to provide a one-to-one mapping from an assembly instruction to well-defined machine code.
This is necessary because a single instruction can have several encodings.

The Capstone mappings (from LLVM instructions to their Capstone representation) are somewhat simplified.
Capstone's enum value for an instruction is determined by the mnemonic of its MatchableInfo object.

Don't ask me why, but addw probably belongs to the same MatchableInfo object as adr.

If this is still relevant to you, please take a look at the LLVM code (https://github.com/llvm/llvm-project/blob/fa95f20f98c8dfd4d35590a724eb0eb7df64146a/llvm/utils/TableGen/AsmMatcherEmitter.cpp#L7-L96).
If you find that addw is indeed associated with a MatchableInfo object where MatchableInfo::Mnemonic == "addw", please let me know.
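As a toy model of the mnemonic-keyed scheme described above (not Capstone's actual generator), this Python sketch assigns one ID per distinct mnemonic, so opcodes that share a mnemonic share an ID. The opcode names are LLVM-style (t2ADDri12 and t2ADR appear in the llvm-mc output earlier in this thread), but the opcode-to-mnemonic table `EXAMPLE` and the function `assign_ids` are hypothetical and chosen only for illustration.

```python
# Hypothetical LLVM opcode -> mnemonic table; entries are illustrative only.
EXAMPLE = {
    "tADDi8": "add",
    "t2ADDri": "add",
    "t2ADDri12": "addw",
    "t2ADR": "adr",
}


def assign_ids(opcode_mnemonics: dict[str, str]) -> dict[str, int]:
    """Map each opcode to an ID derived only from its mnemonic."""
    mnemonic_ids: dict[str, int] = {}
    opcode_ids: dict[str, int] = {}
    for opcode, mnemonic in opcode_mnemonics.items():
        # A mnemonic seen for the first time gets the next ID (starting at 1);
        # later opcodes with the same mnemonic reuse that ID.
        if mnemonic not in mnemonic_ids:
            mnemonic_ids[mnemonic] = len(mnemonic_ids) + 1
        opcode_ids[opcode] = mnemonic_ids[mnemonic]
    return opcode_ids
```

Here `assign_ids(EXAMPLE)` returns `{"tADDi8": 1, "t2ADDri": 1, "t2ADDri12": 2, "t2ADR": 3}`. Under such a scheme an opcode only gets its own ID if its mnemonic differs, so the reported ID depends entirely on which mnemonic the encoding is grouped under.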
