From b6a15c90b96533b636e529327d4681b3d831897b Mon Sep 17 00:00:00 2001
From: BeaEngine
Date: Fri, 20 Nov 2020 10:31:14 +0100
Subject: [PATCH] update doc

---
 doc/beaengine.md | 170 +++++++++++++++++++++++++----------------------
 1 file changed, 91 insertions(+), 79 deletions(-)

diff --git a/doc/beaengine.md b/doc/beaengine.md
index 3937994..d746ba3 100644
--- a/doc/beaengine.md
+++ b/doc/beaengine.md
@@ -1,12 +1,12 @@
 ![beaengine](./beaengine-logo.png){ width=50px }
 **BeaEngine documentation**
 
-- BEAENGINE_VERSION : 5.2
-- DOC_RELEASE : 1.2
+- BEAENGINE_VERSION: 5.2
+- DOC_RELEASE: 1.2
 
 # 1. Disasm function
 
-The Disasm function allows to decode all instructions coded according to the rules of IA-32 and Intel 64 architectures. It makes a precise analysis of the focused instruction and sends back a complete structure that is usable to make data-flow and control-flow studies.
+The Disasm function disassembles one instruction of the Intel ISA. It makes a precise analysis of the focused instruction and sends back a complete structure that can be used for data-flow and control-flow analysis.
 
 **Syntax**
 
@@ -18,12 +18,12 @@ int Disasm(
 
 **Parameters**
 
- - **&infos** : Pointer to a structure [PDISASM](#2-disasm-infos)
+ - **&infos**: Pointer to a [PDISASM](#2-disasm-infos) structure
 
 **Return**
 
-The function may sends you back 3 values. if it has analyzed an invalid opcode, it sends back UNKNOWN_OPCODE (-1). If it tried to read a byte located outside the Security Block, it sends back OUT_OF_BLOCK (-2). In others cases, it sends back the length instruction. Thus, you can use it as a LDE. To have a detailed status, use **infos.Error** field.
+The function may return 3 values. If the analyzed byte sequence is an invalid opcode, it returns UNKNOWN_OPCODE (-1). If it tried to read a byte located outside the Security Block, it returns OUT_OF_BLOCK (-2). In other cases, it returns the instruction length, so it can also be used as an LDE (length disassembler engine). For a detailed status, use the **infos.Error** field.
 
 # 2. Disasm infos
 
@@ -56,29 +56,34 @@ struct PDISASM {
 
 **Members**
 
- - **EIP** : *[in]* The address where the engine will make the decoding.
- - **VirtualAddr** : *[in]* optional - (For instructions CALL - JMP - conditional JMP - LOOP) By default, this value is 0 (disable). The disassembler calculates the destination address of the branch instruction by using VirtualAddr (not EIP). This address can be 64 bits long. This option allows us to decode instructions located anywhere in memory even if they are not at their original place.
- - **CompleteInstr** : *[out]* String used to store the representation of instruction.
- - **SecurityBlock** : *[in]* By default, this value is 0. (disabled option). In other cases, this number is the number of bytes the engine is allowed to read since EIP. Thus, we can make a read block to avoid some Access violation. On INTEL processors, (in IA-32 or intel 64 modes) , instruction never exceeds 15 bytes. A SecurityBlock value outside this range is useless.
- - **Archi** : *[in]* This field is used to specify the architecture used for the decoding. If it is set to 0 or 64 (0x20), the architecture used is 64 bits. If it is set to 32 (0x20), the architecture used is IA-32. If set to 16 (0x10), architecture is 16 bits.
- - **Options** : *[in]* This field allows to define some display options. You can specify the syntax : masm, nasm ,goasm. You can specify the number format you want to use : prefixed numbers or suffixed ones. You can even add a tabulation between the mnemonic and the first operand or display the segment registers used by the memory addressing. Constants used are the following :
-    - **Tabulation** : add a tabulation between mnemonic and first operand (default has no tabulation)
-    - **GoAsmSyntax / NasmSyntax** : change the intel syntax (default is Masm syntax)
-    - **PrefixedNumeral** : 200h is written 0x200 (default is suffixed numeral)
-    - **ShowSegmentRegs** : show segment registers used (default is hidden)
-    - **ShowEVEXMasking** : show opmask and merging/zeroing applyed on first operand for AVX512 instructions (default is hidden)
- - **Instruction** : *[out]* Structure **[INSTRTYPE](#3-instruction-infos)**.
- - **Operand1** : *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the first operand.
- - **Operand2** : *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the second operand.
- - **Operand3** : *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the third operand.
- - **Operand4** : *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the fourth operand.
- - **Prefix** : *[out]* Structure **[PREFIXINFO](#5-prefixes-infos)** containing an exhaustive list of used prefixes.
- - **Error** : *[out]* This field returns the status of the disassemble process :
-    - **Success** : (0) instruction has been recognized by the engine
-    - **Out of block** : (-2) instruction length is out of SecurityBlock
-    - **Unknown opcode** : (-1) instruction is not recognized by the engine
-    - **Exception #UD** : (2) instruction has been decoded properly but sends #UD exception if executed.
-    - **Exception #DE** : (3) instruction has been decoded properly but sends #DE exception if executed
+ - **EIP**: *[in]* Offset of the byte sequence to disassemble.
+ - **VirtualAddr**: *[in]* Optional, used by CALL, JMP, conditional JMP and LOOP instructions. By default, this value is 0 (disabled). The disassembler calculates the destination address of the branch instruction using VirtualAddr (not EIP). This address can be 64 bits long. This option makes it possible to decode instructions located anywhere in memory, even if they are not at their original place.
+ - **CompleteInstr**: *[out]* String used to store the instruction representation.
+ - **SecurityBlock**: *[in]* By default, this value is 0 (disabled option). Otherwise, it is the number of bytes the engine is allowed to read starting from EIP, which bounds the read and helps avoid access violations. On Intel processors (in IA-32 or Intel 64 modes), an instruction never exceeds 15 bytes, so a SecurityBlock value above 15 is useless.
+ - **Archi**: *[in]* This field specifies the architecture used for the decoding. If it is set to 0 or 64 (0x40), the architecture used is 64 bits. If it is set to 32 (0x20), the architecture used is IA-32. If it is set to 16 (0x10), the architecture used is 16 bits.
+ - **Options**: *[in]* This field defines some display options. You can specify the syntax (masm, nasm, goasm) and the number format (prefixed or suffixed numbers). You can also add a tabulation between the mnemonic and the first operand, or display the segment registers used by memory addressing. The following constants are used:
+    - **Tabulation**: add a tabulation between the mnemonic and the first operand (default has no tabulation)
+    - **GoAsmSyntax / NasmSyntax**: change the Intel syntax (default is Masm syntax)
+    - **PrefixedNumeral**: 200h is written 0x200 (default is suffixed numeral)
+    - **ShowSegmentRegs**: show the segment registers used (default is hidden)
+    - **ShowEVEXMasking**: show the opmask and merging/zeroing applied on the first operand for AVX512 instructions (default is hidden)
+ - **Instruction**: *[out]* Structure **[INSTRTYPE](#3-instruction-infos)**.
+ - **Operand1**: *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the first operand.
+ - **Operand2**: *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the second operand.
+ - **Operand3**: *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the third operand.
+ - **Operand4**: *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the fourth operand.
+ - **Operand5**: *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the fifth operand.
+ - **Operand6**: *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the sixth operand.
+ - **Operand7**: *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the seventh operand.
+ - **Operand8**: *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the eighth operand.
+ - **Operand9**: *[out]* Structure **[OPTYPE](#4-operand-infos)** that concerns the ninth operand.
+ - **Prefix**: *[out]* Structure **[PREFIXINFO](#5-prefixes-infos)** containing an exhaustive list of used prefixes.
+ - **Error**: *[out]* This field returns the status of the disassembly process:
+    - **Success**: (0) instruction has been recognized by the engine
+    - **Out of block**: (-2) instruction length is out of SecurityBlock
+    - **Unknown opcode**: (-1) instruction is not recognized by the engine
+    - **Exception #UD**: (2) instruction has been decoded properly but raises a #UD exception if executed.
+    - **Exception #DE**: (3) instruction has been decoded properly but raises a #DE exception if executed.
 
 # 3. Instruction infos
 this structure gives informations on the analyzed instruction.
@@ -98,14 +103,14 @@ struct INSTRTYPE {
 
 **Members**
 
- - **Category** : *[out]* Specify the family instruction . More precisely, (infos.Instruction.Category & 0xFFFF0000) is used to know if the instruction is a standard one or comes from one of the following technologies : MMX, FPU, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, VMX or SYSTEM. LOWORD(infos.Instruction.Category) is used to know if the instruction is an arithmetic instruction, a logical one, a data transfer one ... To see the complete list of constants used by BeaEngine, go [HERE](#9-constants) .
- - **Opcode** : *[out]* This field contains the opcode on 1, 2 or 3 bytes. If the instruction uses a mandatory prefix, this last one is not present here. For that, you have to use the structure infos.Prefix.
- - **Mnemonic** : *[out]* This field sends back the instruction mnemonic with an ASCII format. You must know that all mnemonics are followed by a space (0x20). For example , the instruction "add" is written "add ".
- - **BranchType** : *[out]* If the decoded instruction is a branch instruction, this field is set to indicate what kind of jump it is (call, ret, unconditional jump, conditional jump). To get a complete list of constants used by BeaEngine, go [HERE](#9-constants)
- - **Flags** : *[out]* Structure [EFLStruct](#6-eflags-infos) that specifies the used flags.
- - **AddrValue** : *[out]* If the decoded instruction is a branch instruction and if the destination address can be calculated, the result is stored in that field. A "jmp eax" or a "jmp [eax]" will set this field to 0 .
- - **Immediat** : *[out]* If the instruction uses a constant, this immediat value is stored here.
- - **ImplicitModifiedRegs** : *[out]* Some instructions modify registers implicitly. For example, "push 0" modifies the register RSP. In that case, infos.Instruction.ImplicitModifiedRegs.gpr == REG4. Find more useful informations on that field looking at the Structure [REGISTERTYPE](#8-registers-infos)
+ - **Category**: *[out]* Specifies the instruction family. More precisely, (infos.Instruction.Category & 0xFFFF0000) tells whether the instruction is a standard one or comes from one of the following technologies: MMX, FPU, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, VMX or SYSTEM. LOWORD(infos.Instruction.Category) tells whether the instruction is an arithmetic instruction, a logical one, a data transfer one, and so on. To see the complete list of constants used by BeaEngine, go [HERE](#9-constants).
+ - **Opcode**: *[out]* This field contains the opcode on 1, 2 or 3 bytes. If the instruction uses a mandatory prefix, that prefix is not included here; use the infos.Prefix structure instead.
+ - **Mnemonic**: *[out]* This field sends back the instruction mnemonic in ASCII format. Note that all mnemonics are followed by a space (0x20). For example, the instruction "add" is written "add ".
+ - **BranchType**: *[out]* If the decoded instruction is a branch instruction, this field indicates what kind of jump it is (call, ret, unconditional jump, conditional jump). For a complete list of constants used by BeaEngine, go [HERE](#9-constants).
+ - **Flags**: *[out]* Structure [EFLStruct](#6-eflags-infos) that specifies the flags used.
+ - **AddrValue**: *[out]* If the decoded instruction is a branch instruction and the destination address can be calculated, the result is stored in this field. A "jmp eax" or a "jmp [eax]" sets this field to 0.
+ - **Immediat**: *[out]* If the instruction uses a constant, this immediate value is stored here.
+ - **ImplicitModifiedRegs**: *[out]* Some instructions modify registers implicitly. For example, "push 0" modifies the register RSP. In that case, infos.Instruction.ImplicitModifiedRegs.gpr == REG4. More information on this field is given in the structure [REGISTERTYPE](#8-registers-infos).
 
 # 4. Operand infos
 
@@ -127,17 +132,17 @@ struct OPTYPE {
 
 **Members**
 
- - **OpMnemonic** : *[out]* This field sends back, when it is possible, the operand in ASCII format.
- - **OpType** : *[out]* This field specifies the operand type. infos.Operandxx.OpType indicates if it is one of the following :
+ - **OpMnemonic**: *[out]* This field sends back, when possible, the operand in ASCII format.
+ - **OpType**: *[out]* This field specifies the operand type. infos.Operandxx.OpType indicates whether it is one of the following:
     - REGISTER_TYPE
     - MEMORY_TYPE
     - CONSTANT_TYPE+ABSOLUTE_
     - CONSTANT_TYPE+RELATIVE_
- - **OpSize** : *[out]* This field sends back the size of the operand.
- - **AccessMode** : *[out]* This field indicates if the operand is modified or not (READ=0x1) or (WRITE=0x2).
- - **Memory** : *[out]* Structure [MEMORYTYPE](#7-memory-infos) , filled only if infos.Operandxx.OpType == MEMORY_TYPE.
- - **Registers** : *[out]* Structure [REGISTERTYPE](#8-registers-infos) , filled only if infos.Operandxx.OpType == REGISTER_TYPE.
- - **SegmentReg** : *[out]* This field indicates, in the case of memory addressing mode, the segment register used :
+ - **OpSize**: *[out]* This field sends back the size of the operand.
+ - **AccessMode**: *[out]* This field indicates whether the operand is read (READ=0x1) or modified (WRITE=0x2).
+ - **Memory**: *[out]* Structure [MEMORYTYPE](#7-memory-infos), filled only if infos.Operandxx.OpType == MEMORY_TYPE.
+ - **Registers**: *[out]* Structure [REGISTERTYPE](#8-registers-infos), filled only if infos.Operandxx.OpType == REGISTER_TYPE.
+ - **SegmentReg**: *[out]* For memory addressing modes, this field indicates the segment register used:
     - ESReg
     - DSReg
     - FSReg
@@ -173,27 +178,27 @@ struct PREFIXINFO {
 
 **Membres**
 
- - **Number** : *[out]* Indicates the number of prefixes used.
- - **NbUndefined** : *[out]* Indicates the number of prefixes used in a wrong way (illegal use).
- - **LockPrefix** : *[out]* Concerns the LOCK prefix. It can takes one of the following values :
+ - **Number**: *[out]* Indicates the number of prefixes used.
+ - **NbUndefined**: *[out]* Indicates the number of prefixes used in a wrong way (illegal use).
+ - **LockPrefix**: *[out]* Concerns the LOCK prefix. It can take one of the following values:
     - NotUsedPrefix = 0
     - InUsePrefix = 1
     - SuperfluousPrefix = 2
     - InvalidPrefix = 4
     - MandatoryPrefix = 8
- - **OperandSize** : *[out]* Concerns the prefix used to define the size of operands.
- - **AddressSize** : *[out]* Concerns the prefix used to define the AddressSize
- - **RepnePrefix** : *[out]* Concerns the prefix used to define the REPNE.
- - **RepPrefix** : *[out]* Concerns the prefix used to define the REP.
- - **FSPrefix** : *[out]* Concerns the prefix used to define the FS segment .
- - **SSPrefix** : *[out]* Concerns the prefix used to define the SS segment .
- - **GSPrefix** : *[out]* Concerns the prefix used to define the GS segment .
- - **ESPrefix** : *[out]* Concerns the prefix used to define the ES segment .
- - **CSPrefix** : *[out]* Concerns the prefix used to define the CS segment .
- - **DSPrefix** : *[out]* Concerns the prefix used to define the DS segment .
- - **BranchTaken** : *[out]* Concerns branch hint prefix 0x3E (taken).
- - **BranchNotTaken** : *[out]* Concerns branch hint prefix 0x2E (not taken).
- - **REX** : *[out]* Concerns the prefix used to define the REX in 64 bits mode. The structure sended back is :
+ - **OperandSize**: *[out]* Concerns the prefix used to define the operand size.
+ - **AddressSize**: *[out]* Concerns the prefix used to define the address size.
+ - **RepnePrefix**: *[out]* Concerns the REPNE prefix.
+ - **RepPrefix**: *[out]* Concerns the REP prefix.
+ - **FSPrefix**: *[out]* Concerns the FS segment override prefix.
+ - **SSPrefix**: *[out]* Concerns the SS segment override prefix.
+ - **GSPrefix**: *[out]* Concerns the GS segment override prefix.
+ - **ESPrefix**: *[out]* Concerns the ES segment override prefix.
+ - **CSPrefix**: *[out]* Concerns the CS segment override prefix.
+ - **DSPrefix**: *[out]* Concerns the DS segment override prefix.
+ - **BranchTaken**: *[out]* Concerns the branch hint prefix 0x3E (taken).
+ - **BranchNotTaken**: *[out]* Concerns the branch hint prefix 0x2E (not taken).
+ - **REX**: *[out]* Concerns the REX prefix in 64-bit mode. The structure sent back is:
 
 ~~~~ {.c}
 struct REX_Struct {
@@ -255,14 +260,17 @@ struct MEMORYTYPE {
 
 **Members**
 
- - **BaseRegister** : *[out]* Indicate the base register in the formula : [BaseRegister + IndexRegister*Scale + Displacement].
- - **IndexRegister** : *[out]* Indicate the index register in the formula : [BaseRegister + IndexRegister*Scale + Displacement].
- - **Scale** : *[out]* Indicate the scale : 1, 2, 4 ou 8.
- - **Displacement** : *[out]* Value of the displacement in the formula : [BaseRegister + IndexRegister*Scale + Displacement].
+ - **BaseRegister**: *[out]* Indicates the base register in the formula: [BaseRegister + IndexRegister*Scale + Displacement].
+ - **IndexRegister**: *[out]* Indicates the index register in the formula: [BaseRegister + IndexRegister*Scale + Displacement].
+ - **Scale**: *[out]* Indicates the scale: 1, 2, 4 or 8.
+ - **Displacement**: *[out]* Value of the displacement in the formula: [BaseRegister + IndexRegister*Scale + Displacement].
 
 # 8. Registers infos
 
-This structure gives informations on operands if `infos.Operandxx.OpType == REGISTER_TYPE` or on `infos.Instruction.ImplicitModifiedRegs`.
+This structure gives information on registers:
+
+- used by an operand, if `infos.Operandxx.OpType == REGISTER_TYPE`
+- implicitly modified by the instruction, in `infos.Instruction.ImplicitModifiedRegs`
 
 
 ~~~~ {.c}
@@ -281,40 +289,42 @@ struct REGISTERTYPE{
   Int64 opmask;
   Int64 segment;
   Int64 fpu;
+  Int64 tmm;
 };
 ~~~~
 
 **Members**
 
-- **type** : *[out]* set of flags to define which type of registers are used. For instance, to test if operand1 is a general purpose register, use `infos.Operand1.Registers.type & GENERAL_REG`.
-- **gpr** : *[out]* set of flags to define which general purpose register is used. For instance, to test if operand 1 uses RAX, test `infos.Operand1.Registers.gpr & REG0`
-- **mmx** : *[out]* set of flags to define which MMX register is used. For instance, to test if operand 1 uses MM0, test `infos.Operand1.Registers.mmx & REG0`
-- **xmm** : *[out]* set of flags to define which XMM register is used. For instance, to test if operand 1 uses XMM0, test `infos.Operand1.Registers.xmm & REG0`
-- **ymm** : *[out]* set of flags to define which YMM register is used. For instance, to test if operand 1 uses YMM0, test `infos.Operand1.Registers.ymm & REG0`
-- **zmm** : *[out]* set of flags to define which ZMM register is used. For instance, to test if operand 1 uses ZMM0, test `infos.Operand1.Registers.zmm & REG0`.
-- **special** : *[out]* set of flags to define which special register is used. Special Registers are following :
+- **type**: *[out]* set of flags defining which types of registers are used. For instance, to test whether operand 1 is a general purpose register, use `infos.Operand1.Registers.type & GENERAL_REG`.
+- **gpr**: *[out]* set of flags defining which general purpose register is used. For instance, to test whether operand 1 uses RAX, test `infos.Operand1.Registers.gpr & REG0`.
+- **mmx**: *[out]* set of flags defining which MMX register is used. For instance, to test whether operand 1 uses MM0, test `infos.Operand1.Registers.mmx & REG0`.
+- **xmm**: *[out]* set of flags defining which XMM register is used. For instance, to test whether operand 1 uses XMM0, test `infos.Operand1.Registers.xmm & REG0`.
+- **ymm**: *[out]* set of flags defining which YMM register is used. For instance, to test whether operand 1 uses YMM0, test `infos.Operand1.Registers.ymm & REG0`.
+- **zmm**: *[out]* set of flags defining which ZMM register is used. For instance, to test whether operand 1 uses ZMM0, test `infos.Operand1.Registers.zmm & REG0`.
+- **special**: *[out]* set of flags defining which special register is used. The special registers are the following:
     - RFLAGS/EFLAGS (REG0)
     - MXCSR (REG1)
     - SSP (REG2)
     - PKRU (REG3)
     - UIF (REG4) User Interrupt Flag (1 bit in the user interrupt state) is not a MSR but actually, no MSR is used to set/read this flag
-- **cr** : *[out]* set of flags to define which CR register is used. For instance, to test if operand 1 uses CR0, test `infos.Operand1.Registers.cr & REG0`.
-- **dr** : *[out]* set of flags to define which DR register is used. For instance, to test if operand 1 uses DR0, test `infos.Operand1.Registers.dr & REG0`.
-- **mem_management** : *[out]* set of flags to define which memory management register is used.
+- **cr**: *[out]* set of flags defining which CR register is used. For instance, to test whether operand 1 uses CR0, test `infos.Operand1.Registers.cr & REG0`.
+- **dr**: *[out]* set of flags defining which DR register is used. For instance, to test whether operand 1 uses DR0, test `infos.Operand1.Registers.dr & REG0`.
+- **mem_management**: *[out]* set of flags defining which memory management register is used:
     - GDTR (REG0)
     - LDTR (REG1)
     - IDTR (REG2)
     - TR (REG3)
-- **mpx** : *[out]* set of flags to define which bound register is used. For instance, to test if operand 1 uses *BND0*, test `infos.Operand1.Registers.mpx & REG0`.
-- **opmask** : *[out]* set of flags to define which opmask register is used. For instance, to test if operand 1 uses *k0*, test `infos.Operand1.Registers.opmask & REG0`.
-- **segment** : *[out]* set of flags to define which segment register is used.
+- **mpx**: *[out]* set of flags defining which bound register is used. For instance, to test whether operand 1 uses *BND0*, test `infos.Operand1.Registers.mpx & REG0`.
+- **opmask**: *[out]* set of flags defining which opmask register is used. For instance, to test whether operand 1 uses *k0*, test `infos.Operand1.Registers.opmask & REG0`.
+- **segment**: *[out]* set of flags defining which segment register is used:
     - ES (REG0)
     - CS (REG1)
     - SS (REG2)
     - DS (REG3)
     - FS (REG4)
     - GS (REG5)
-- **fpu** : *[out]* set of flags to define which FPU register is used. For instance, to test if operand 1 uses *st(0)*, test `infos.Operand1.Registers.fpu & REG0`.
+- **fpu**: *[out]* set of flags defining which FPU register is used. For instance, to test whether operand 1 uses *st(0)*, test `infos.Operand1.Registers.fpu & REG0`.
+- **tmm**: *[out]* set of flags defining which TMM register is used (Intel AMX extension). For instance, to test whether operand 1 uses *tmm0*, test `infos.Operand1.Registers.tmm & REG0`.
 
 # 9. Constants
 
@@ -355,7 +365,9 @@ FXSR_INSTRUCTION = 0x1d0000,
 XSAVE_INSTRUCTION = 0x1e0000,
 SGX_INSTRUCTION = 0x1f0000,
 PCONFIG_INSTRUCTION = 0x200000,
-
+UINTR_INSTRUCTION = 0x210000,
+KL_INSTRUCTION = 0x220000,
+AMX_INSTRUCTION = 0x230000,
 ~~~~
 
 Values taken by LOWORD(infos.Instruction.Category)