This repository has been archived by the owner on Oct 15, 2023. It is now read-only.

confusion around x86 "and" instructions #16

Open
robertmuth opened this issue Nov 23, 2021 · 8 comments

Comments

@robertmuth

These two seem to conflict:

["and" , "X:r32/m32, id/ud" , "MI" , "81 /4 id" , "ANY _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"],
["and" , "X:r64, ud" , "MI" , "81 /4 id" , "X64 _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"],

@kobalicek
Member

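This is on purpose. You can encode a 64-bit AND with an unsigned immediate by not promoting the instruction to 64 bits. A 32-bit AND zero-extends its result into the upper half of the register, and the upper half of a zero-extended immediate is all zeros anyway, so the result is the same. The encoding is then identical to the former entry. It's only possible when the operand is a register, though, because a 32-bit AND on memory would leave the upper four bytes untouched.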
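A minimal Python model of this argument, treating registers as plain integers. The sign extension performed by the promoted form (REX.W 81 /4 id) comes from the Intel SDM rather than from this thread, and all function names are hypothetical:

```python
import random

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def and_r64_ud(reg: int, ud: int) -> int:
    """Intended meaning of the "X:r64, ud" row: AND with a zero-extended imm32."""
    return reg & ud  # ud < 2**32, so bits 63..32 of the result are always 0

def and_r32_id(reg: int, imm: int) -> int:
    """Effect of 81 /4 id without REX.W on a 64-bit register: AND the low
    32 bits, then zero-extend the 32-bit result into bits 63..32."""
    return (reg & MASK32) & imm

def and_rex_w_id(reg: int, imm: int) -> int:
    """Effect of REX.W 81 /4 id: the imm32 is sign-extended to 64 bits first."""
    imm64 = imm | (MASK64 ^ MASK32) if imm & 0x8000_0000 else imm
    return reg & imm64

# The non-promoted (32-bit) encoding always matches the intended semantics.
rng = random.Random(0)
for _ in range(10_000):
    reg, ud = rng.getrandbits(64), rng.getrandbits(32)
    assert and_r64_ud(reg, ud) == and_r32_id(reg, ud)

# The promoted (REX.W) encoding differs as soon as bit 31 of the immediate is set.
print(hex(and_r64_ud(MASK64, 0x8000_0000)))    # 0x80000000
print(hex(and_rex_w_id(MASK64, 0x8000_0000)))  # 0xffffffff80000000
```

This is why the unsigned form needs the 32-bit encoding: the REX.W form cannot express an immediate with bit 31 set without also setting bits 63..32.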

@robertmuth
Author

Suppose I am looking at this from the perspective of a decoder and I encounter a byte sequence that matches

"81 /4 id"

How do I know which rule applies? In other words: is this a 32-bit or a 64-bit instruction?

Maybe this is dependent on the processor mode?
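On the hardware side, the operand size of an 81 /4 instruction in 64-bit mode is fixed by its prefixes: no REX.W means a 32-bit AND, REX.W means a 64-bit AND with a sign-extended imm32, and 66h alone means a 16-bit AND (with an iw immediate instead of id). A rough sketch of that rule, with a hypothetical function name, modeling only the 66h and REX prefixes in 32- and 64-bit modes:

```python
def and_81_4_operand_size(prefixes: bytes, mode_bits: int = 64) -> int:
    """Return the operand size (in bits) of an 81 /4 (AND r/m, imm) instruction,
    given the prefix bytes that precede the 81 opcode."""
    if mode_bits not in (32, 64):
        raise ValueError("this sketch only models 32- and 64-bit modes")
    rex = None
    has_66 = False
    for b in prefixes:
        if mode_bits == 64 and 0x40 <= b <= 0x4F:
            rex = b  # REX prefix (only exists in 64-bit mode)
        else:
            rex = None  # a REX is ignored unless it immediately precedes the opcode
            if b == 0x66:
                has_66 = True
    if rex is not None and rex & 0x08:
        return 64  # REX.W wins over 66h; imm32 is sign-extended to 64 bits
    return 16 if has_66 else 32

print(and_81_4_operand_size(b""))      # 32
print(and_81_4_operand_size(b"\x48"))  # 64
print(and_81_4_operand_size(b"\x66"))  # 16
```

Under this model the bytes produced for the "X:r64, ud" row carry no REX.W, so the decoder sees a plain 32-bit AND.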

I would also expect the "or" instruction to have similar (symmetric) rules, but I did not see any.

@kobalicek
Member

When decoding, you should always decode to the original instruction and treat all other forms as just aliases. The encoder may support an alias (or not, depending on how you see it), but the decoder should always produce the canonical representation.
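One way to act on this, sketched in Python: drop alias rows before building the decode table, then check that every (opcode, encoding) key is unique. The is_alias heuristic is an assumption, not something asmdb provides: it flags a row whose declared operand width disagrees with the width its opcode selects, and it assumes 64-bit forms spell REX.W in the opcode column. It only handles r/m operands such as r32/m32 or r64:

```python
import re
from typing import NamedTuple

class Row(NamedTuple):
    name: str
    operands: str
    encoding: str
    opcode: str

# The two colliding rows from the top of this issue (flags column omitted).
ROWS = [
    Row("and", "X:r32/m32, id/ud", "MI", "81 /4 id"),
    Row("and", "X:r64, ud",        "MI", "81 /4 id"),
]

def declared_width(row: Row) -> int:
    """Width named by the first operand, e.g. "X:r32/m32" -> 32, "X:r64" -> 64."""
    first = row.operands.split(",")[0].split(":", 1)[-1].strip()
    m = re.match(r"[rm](\d+)", first)
    if m is None:
        raise ValueError(f"operand {first!r} is not handled by this sketch")
    return int(m.group(1))

def encoded_width(row: Row) -> int:
    """Operand size the bytes select (64-bit mode, no 66h prefix assumed)."""
    return 64 if "REX.W" in row.opcode else 32

def is_alias(row: Row) -> bool:
    """Heuristic: the row promises a different width than its bytes select."""
    return declared_width(row) != encoded_width(row)

def build_decode_table(rows):
    table = {}
    for row in rows:
        if is_alias(row):
            continue  # encoder-only form; the decoder never produces it
        key = (row.opcode, row.encoding)
        if key in table:
            raise ValueError(f"still ambiguous after dropping aliases: {key}")
        table[key] = row
    return table

table = build_decode_table(ROWS)
print(table[("81 /4 id", "MI")].operands)  # X:r32/m32, id/ud
```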

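OR doesn't have that capability. A 32-bit operation zero-extends into the high part of the 64-bit register. That is exactly what AND r64, ud needs, because the high 32 bits of a zero-extended immediate are zero and the AND clears those bits anyway. OR r64, ud encoded as a 32-bit instruction, however, would essentially compute (r64 | ud) & 0xFFFFFFFF, discarding the high bits instead of preserving them.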
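The counterexample in numbers, as a hypothetical Python model (OR r/m32, imm32 is 81 /1 id; the function names are illustrative):

```python
MASK32 = (1 << 32) - 1

def or_r64_ud(reg: int, ud: int) -> int:
    """What an "OR r64, ud" alias would promise: OR with a zero-extended imm32."""
    return reg | ud

def or_r32_id(reg: int, imm: int) -> int:
    """What 81 /1 id without REX.W actually does to a 64-bit register:
    the 32-bit result is zero-extended, so bits 63..32 are cleared."""
    return (reg | imm) & MASK32

reg = 0x1234_5678_0000_0000
print(hex(or_r64_ud(reg, 0xFF)))  # 0x12345678000000ff
print(hex(or_r32_id(reg, 0xFF)))  # 0xff
```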

@robertmuth
Author

Ah, I see. Is there a programmatic way to determine which instructions are "original"?
I noticed some instructions have an AltForm tag, but that seems to be something slightly different.

@robertmuth
Author

I found another conflict:

  ["and"              , "X:eax, id/ud" , "I"       , "25 id"                        , "ANY AltForm      OF=0 SF=W ZF=W AF=U PF=W CF=0"],
  ["and"              , "X:rax, ud"  , "I"       , "25 id"                        , "X64 AltForm      OF=0 SF=W ZF=W AF=U PF=W CF=0"],

These are the only two such cases I found in the fairly large part of the tables that I process.

This seems like an odd exception, given that the pattern is not repeated for any other ALU-type instruction.
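Collisions like these can be found mechanically by grouping rows on (opcode, encoding), which is all a decoder sees. An illustrative sketch, not anything shipped with asmdb, run on the four rows quoted so far:

```python
from collections import defaultdict

# Rows in the shape of the asmdb tables quoted in this issue:
# [name, operands, encoding, opcode, flags]
ROWS = [
    ["and", "X:r32/m32, id/ud", "MI", "81 /4 id", "ANY _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"],
    ["and", "X:r64, ud",        "MI", "81 /4 id", "X64 _XLock OF=0 SF=W ZF=W AF=U PF=W CF=0"],
    ["and", "X:eax, id/ud",     "I",  "25 id",    "ANY AltForm OF=0 SF=W ZF=W AF=U PF=W CF=0"],
    ["and", "X:rax, ud",        "I",  "25 id",    "X64 AltForm OF=0 SF=W ZF=W AF=U PF=W CF=0"],
]

def find_decode_conflicts(rows):
    """Return {(opcode, encoding): [forms]} for every key shared by 2+ rows."""
    groups = defaultdict(list)
    for name, operands, encoding, opcode, _flags in rows:
        groups[(opcode, encoding)].append(f"{name} {operands}")
    return {key: forms for key, forms in groups.items() if len(forms) > 1}

for (opcode, encoding), forms in find_decode_conflicts(ROWS).items():
    print(f"{opcode} [{encoding}]: {forms}")
```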

@robertmuth
Author

I spoke too soon. Here is another ambiguity of a slightly different flavor:

    ["movss"            , "w:xmm[31:0], xmm[31:0]"                          , "RM"      , "F3 0F 10 /r"                  , "SSE"],
    ["movss"            , "W:xmm[31:0], m32"                                , "RM"      , "F3 0F 10 /r"                  , "SSE"],

    ["movsd"            , "w:xmm[63:0], xmm[63:0]"                          , "RM"      , "F2 0F 10 /r"                  , "SSE2"],
    ["movsd"            , "W:xmm[63:0], m64"                                , "RM"      , "F2 0F 10 /r"                  , "SSE2"],

@kobalicek
Member

kobalicek commented Nov 24, 2021

Can you describe what is ambiguous in movss / movsd case?

The instructions really do what is described: movss/movsd from memory clears the rest of the register, while movss/movsd between registers does not (that's the W vs. w). x86 is full of such little differences. You can also see this in the AVX case with vmovss and vmovsd: there are basically two versions of each instruction, depending on whether it has a memory operand or not.
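A minimal model of the W vs. w difference for movss, treating an XMM register as four 32-bit lanes (raw bit patterns, lane 0 first); the function names are hypothetical:

```python
Lanes = list[int]  # an XMM register as four 32-bit lanes, lane 0 first

def movss_reg_reg(dst: Lanes, src: Lanes) -> Lanes:
    """movss xmm1, xmm2: writes lane 0 only; lanes 1..3 of dst survive ("w")."""
    return [src[0]] + dst[1:]

def movss_reg_mem(dst: Lanes, mem32: int) -> Lanes:
    """movss xmm1, m32: loads lane 0 and zeroes lanes 1..3 ("W").
    dst is accepted only for symmetry; its old contents do not matter."""
    return [mem32, 0, 0, 0]

old = [0x1111, 0x2222, 0x3333, 0x4444]
print(movss_reg_reg(old, [0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD]))  # lanes 1..3 kept
print(movss_reg_mem(old, 0xAAAA))                            # lanes 1..3 zeroed
```

movsd behaves the same way with two 64-bit lanes instead of four 32-bit ones.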

@robertmuth
Author

robertmuth commented Nov 24, 2021

I see. I think the problem is that I am currently focused mostly on decoding, while asmdb is more focused on encoding.

If I encounter F3 0F 10 xx xx ... I do not know which rule to choose based only on the bytes and the format ("RM").
This is similar to the ambiguity I reported with the "and" instructions further up.

What I have done on my side to deal with this is:

  1. Ignore the rules for
       "and", "X:rax, ud"
       "and", "X:r64, ud"
  2. Change the movss/movsd rules slightly:
       movss  "w:xmm[31:0], xmm[31:0]"  "RM"  =>  ......  "Rr"
       movss  "W:xmm[31:0], m32"        "RM"  =>  ......  "Rm"

where r = the M format restricted to a register operand, and m = the M format restricted to a memory operand.

This gets rid of the ambiguity for me. Not sure if this makes sense for asmdb, though.
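The Rr/Rm split maps directly onto the ModRM byte: mod == 0b11 selects a register operand, anything else a memory operand. A sketch of that dispatch for the movss/movsd rows, assuming the mandatory F3/F2 prefix is immediately followed by 0F 10 and the ModRM byte (no REX or other prefixes); the table and function names are hypothetical:

```python
def modrm_selects_register(modrm: int) -> bool:
    """ModRM.mod (bits 7..6) == 0b11 means the r/m field names a register."""
    return (modrm >> 6) == 0b11

# Hypothetical decode table keyed by (opcode bytes, "Rr" or "Rm").
MOVSS_MOVSD = {
    (b"\xF3\x0F\x10", "Rr"): 'movss "w:xmm[31:0], xmm[31:0]"',
    (b"\xF3\x0F\x10", "Rm"): 'movss "W:xmm[31:0], m32"',
    (b"\xF2\x0F\x10", "Rr"): 'movsd "w:xmm[63:0], xmm[63:0]"',
    (b"\xF2\x0F\x10", "Rm"): 'movsd "W:xmm[63:0], m64"',
}

def pick_rule(code: bytes) -> str:
    """Select the rule for a 3-byte opcode (prefix + 0F 10) followed by ModRM."""
    opcode, modrm = code[:3], code[3]
    kind = "Rr" if modrm_selects_register(modrm) else "Rm"
    return MOVSS_MOVSD[(opcode, kind)]

print(pick_rule(bytes([0xF3, 0x0F, 0x10, 0xC1])))  # movss xmm0, xmm1: register form
print(pick_rule(bytes([0xF3, 0x0F, 0x10, 0x00])))  # movss xmm0, [rax]: memory form
```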
