Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add architecture and imphash for PE field set #763

Merged
merged 4 commits into from
Apr 6, 2020

Conversation

andrewstucki
Copy link
Contributor

So, this PR adds the fields imphash and architecture to the PE field set. Both are commonly used in PE parsing tools and in the security industry (see fields for Imphash and Target Machine). A couple of things to throw out there that people may have in mind:

  1. Imphash isn't really a pure "hash" per se, it's more of a feature fingerprint, and it's highly PE-specific, so I didn't extend hash with it.
  2. Architecture could also go under dll or process, but:
    a. we'd have to dup the field
    b. most of the time under process it's going to be the same as the host architecture unless you're running in some sort of execution subsystem like WSL or WINE for linux or something like that.
    c. there's some differences between file formats that support multiple architectures (i.e. fat binaries) and those that don't

So, the thought was due to the above reasons, these fields should exist as a subset of pe since they are tied to the file format itself. Thoughts on getting this in for 1.5?

type: keyword
ignore_above: 1024
description: CPU architecture target for the file.
example: x64
Copy link

@gabriellandau gabriellandau Feb 26, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we care if we make our own values here or should we use the ones that Microsoft defined? For example, in the sensor outputs x64 but Microsoft uses the nomenclature AMD64 (IMAGE_FILE_MACHINE_AMD64)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was to normalize it like VirusTotal does, but not entirely sure if we'd have to be strict about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a clear set of instructions we can give on how this should be normalized (e.g. linking to another source) we should do that now.

If there isn't, we can leave this up to the source, and only address later, only if needed.

The thinking: we have to balance the amount of work required by sources to get the normalization right. So I think it's fine to tighten this later, only if needed.

@marshallmain
Copy link
Contributor

The PE field set is windows specific though, right? Wouldn't we want architecture to be OS agnostic and not tied to the PE fields?

@gabriellandau
Copy link

b. most of the time under process it's going to be the same as the host architecture

In many cases yes. In every WOW64 Windows process, however, you will have a combination of 32- and 64-bit DLLs loaded. The x64 DLLs implement the WOW64 emulation layer, among other things. Some security products will inject x64 hook DLLs into WOW64 processes.

Here's an example WOW64 process where you can see several x64 DLLs loaded from C:\Windows\System32

C:\Users\user\Desktop>listdlls 7064

Listdlls v3.2 - Listdlls
Copyright (C) 1997-2016 Mark Russinovich
Sysinternals

------------------------------------------------------------------------------
powershell.exe pid: 7064
Command line: "C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe"

Base                Size      Path
0x0000000000fc0000  0x6c000   C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe
0x0000000081cf0000  0x1e1000  C:\Windows\SYSTEM32\ntdll.dll
0x0000000077310000  0x52000   C:\Windows\System32\wow64.dll
0x0000000077370000  0x78000   C:\Windows\System32\wow64win.dll
0x0000000077300000  0xa000    C:\Windows\System32\wow64cpu.dll
0x0000000000fc0000  0x6c000   C:\Windows\SysWOW64\WindowsPowerShell\v1.0\powershell.exe
0x00000000773f0000  0x190000  C:\Windows\SysWOW64\ntdll.dll
0x00000000751b0000  0xe0000   C:\Windows\SysWOW64\KERNEL32.DLL
0x0000000073e40000  0x1e4000  C:\Windows\SysWOW64\KERNELBASE.dll
0x0000000074b80000  0xbf000   C:\Windows\SysWOW64\msvcrt.dll
0x0000000074ff0000  0x96000   C:\Windows\SysWOW64\OLEAUT32.dll
0x00000000747a0000  0x7d000   C:\Windows\SysWOW64\msvcp_win.dll
0x0000000075090000  0x11d000  C:\Windows\SysWOW64\ucrtbase.dll
0x0000000075400000  0x25c000  C:\Windows\SysWOW64\combase.dll
0x0000000074080000  0xc0000   C:\Windows\SysWOW64\RPCRT4.dll
0x0000000073cb0000  0x20000   C:\Windows\SysWOW64\SspiCli.dll
0x0000000073ca0000  0xa000    C:\Windows\SysWOW64\CRYPTBASE.dll
0x0000000073d50000  0x57000   C:\Windows\SysWOW64\bcryptPrimitives.dll
0x0000000074e10000  0x44000   C:\Windows\SysWOW64\sechost.dll
0x0000000075660000  0x78000   C:\Windows\SysWOW64\ADVAPI32.dll
0x0000000074820000  0xfc000   C:\Windows\SysWOW64\OLE32.dll
0x0000000074030000  0x22000   C:\Windows\SysWOW64\GDI32.dll
0x0000000074e80000  0x164000  C:\Windows\SysWOW64\gdi32full.dll
0x0000000075b70000  0x18d000  C:\Windows\SysWOW64\USER32.dll
0x0000000074e60000  0x17000   C:\Windows\SysWOW64\win32u.dll
0x000000006a040000  0x18000   C:\Windows\SysWOW64\ATL.DLL
0x0000000069fe0000  0x55000   C:\Windows\SysWOW64\mscoree.dll
0x0000000075d80000  0x26000   C:\Windows\SysWOW64\IMM32.DLL
0x0000000074c40000  0xf000    C:\Windows\SysWOW64\kernel.appcore.dll
0x00000000731a0000  0x7c000   C:\Windows\SysWOW64\uxtheme.dll
0x0000000073db0000  0x83000   C:\Windows\SysWOW64\clbcatq.dll
0x00000000741e0000  0x5bc000  C:\Windows\SysWOW64\windows.storage.dll
0x0000000074c50000  0x45000   C:\Windows\SysWOW64\shlwapi.dll
0x0000000074af0000  0x88000   C:\Windows\SysWOW64\shcore.dll
0x0000000073ce0000  0x18000   C:\Windows\SysWOW64\profapi.dll
0x0000000077250000  0x45000   C:\Windows\SysWOW64\powrprof.dll
0x0000000074ac0000  0x8000    C:\Windows\SysWOW64\FLTLIB.DLL
0x0000000073960000  0x181000  C:\Windows\SysWOW64\propsys.dll
0x0000000075f00000  0x134e000  C:\Windows\SysWOW64\shell32.dll
0x00000000741a0000  0x39000   C:\Windows\SysWOW64\cfgmgr32.dll
0x0000000069ec0000  0xb000    C:\Windows\SysWOW64\LINKINFO.dll
0x000000006d490000  0xc8000   C:\Windows\SysWOW64\ntshrui.dll
0x000000006f9d0000  0x1c000   C:\Windows\SysWOW64\srvcli.dll
0x000000006d480000  0xf000    C:\Windows\SysWOW64\cscapi.dll
0x000000006d340000  0x73000   C:\Windows\SysWOW64\policymanager.dll
0x000000006ea70000  0x67000   C:\Windows\SysWOW64\msvcp110_win.dll
0x0000000073370000  0x19000   C:\Windows\SysWOW64\bcrypt.dll
0x0000000069f50000  0x8d000   C:\Windows\Microsoft.NET\Framework\v4.0.30319\mscoreei.dll
0x0000000073940000  0x8000    C:\Windows\SysWOW64\VERSION.dll
0x0000000068f90000  0x7b0000  C:\Windows\Microsoft.NET\Framework\v4.0.30319\clr.dll
0x0000000069f30000  0x14000   C:\Windows\SysWOW64\VCRUNTIME140_CLR0400.dll
0x0000000069e10000  0xab000   C:\Windows\SysWOW64\ucrtbase_clr0400.dll
0x0000000067b80000  0x140e000  C:\Windows\assembly\NativeImages_v4.0.30319_32\mscorlib\92d30f3dd1d092e15dd783b14354d8ea\mscorlib.ni.dll
0x00000000733c0000  0x13000   C:\Windows\SysWOW64\CRYPTSP.dll
0x0000000073390000  0x2f000   C:\Windows\SysWOW64\rsaenh.dll
0x0000000069d80000  0x89000   C:\Windows\Microsoft.NET\Framework\v4.0.30319\clrjit.dll
0x0000000067120000  0xa55000  C:\Windows\assembly\NativeImages_v4.0.30319_32\System\0eeb857977bf8c4a1fcc54528b8f5853\System.ni.dll
0x0000000066900000  0x818000  C:\Windows\assembly\NativeImages_v4.0.30319_32\System.Core\ee717223ade9f8db56b8e1279d794680\System.Core.ni.dll
0x0000000073cd0000  0x6000    C:\Windows\SysWOW64\psapi.dll
0x0000000073d00000  0x47000   C:\Windows\SysWOW64\wintrust.dll
0x0000000074060000  0xe000    C:\Windows\SysWOW64\MSASN1.dll
0x0000000074920000  0x195000  C:\Windows\SysWOW64\CRYPT32.dll
0x0000000066180000  0x774000  C:\Windows\assembly\NativeImages_v4.0.30319_32\System.Xml\8424b694e88098301691974f4abc2099\System.Xml.ni.dll
0x0000000069f10000  0x1e000   C:\Windows\SysWOW64\gpapi.dll
0x000000006d3c0000  0x15000   C:\Windows\SysWOW64\wldp.dll
0x0000000075b10000  0x60000   C:\Windows\SysWOW64\coml2.dll
0x0000000069ef0000  0xf000    C:\Windows\SysWOW64\amsi.dll
0x0000000073430000  0x22000   C:\Windows\SysWOW64\USERENV.dll
0x0000000069a90000  0x80000   C:\Windows\assembly\NativeImages_v4.0.30319_32\Microsoft.Mf49f6405#\8e60287f3ad05ee5e34d1c308fbf7e27\Microsoft.Management.Infrastructure.ni.dll
0x00000000658b0000  0x354000  C:\Windows\Microsoft.Net\assembly\GAC_32\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll
0x0000000075d10000  0x67000   C:\Windows\SysWOW64\WS2_32.dll
0x0000000069990000  0x4c000   C:\Windows\Microsoft.Net\assembly\GAC_32\System.Transactions\v4.0_4.0.0.0__b77a5c561934e089\System.Transactions.dll
0x0000000073950000  0xa000    C:\Windows\SysWOW64\secur32.dll
0x00000000657a0000  0x105000  C:\Windows\assembly\NativeImages_v4.0.30319_32\System.Configuration\076fae3efb073c01a3a155a96928574b\System.Configuration.ni.dll

@andrewstucki
Copy link
Contributor Author

@marshallmain So that's what I was getting at with:

  1. Architecture could also go under dll or process, but:
    a. we'd have to dup the field
    b. most of the time under process it's going to be the same as the host architecture unless you're running in some sort of execution subsystem like WSL or WINE for linux or something like that.
    c. there's some differences between file formats that support multiple architectures (i.e. fat binaries) and those that don't

Basically shared libraries/"dll"s under linux and Mac systems can have multiple architectures tied to them, in Windows, they're single-valued. In the case of process there's not necessarily a one-to-one mapping between the architecture in the pe file itself and what mode the process is actually executing in.

So, the story around architecture gets kind of confusing between what exists at a higher level construct like a "dll" and what is a particular feature of an object file format itself. If for some reason we want to add something like architecture to "dll" instead of here then when the DLL is a PE file we'd just always have a single valued array that we'd be passing back.

@rw-access
Copy link
Contributor

i agree. since PE already exists, anything that belongs in a PE header is fair game to add.
architecture in any other place (other than host) could introduce some ambiguity.

@andrewstucki
Copy link
Contributor Author

going to wait for @webmat 's sign-off since I believe this would add to the 1.5 release scope. Ah, that reminds me... Changelog entry linking this PR...

@webmat webmat added the 1.6.0 label Mar 25, 2020
@webmat
Copy link
Contributor

webmat commented Apr 1, 2020

@elasticmachine, run elasticsearch-ci/docs

- name: architecture
level: extended
type: keyword
description: CPU architecture target for the file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth adding a note that this is not necessarily the architecture of the machine itself?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave this one up to you guys.

From the schema POV, there's sometimes a reflex to over-explain, as if we were telling people how to implement the source itself (e.g. a compiler actually populating PE headers in an executable), when in fact the schema's role is simply to explain where to get the data (e.g. getting the "architecture" header from the PE headers) and how to interpret it when looking at events that populate these.

But if you think there's a disconnect to explain or point out between pe.architecture and host.architecture, for example, yeah I think this may make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rw-access I think that the fact that it says, 'for the file' seems fine to me, and I don't think we should over-explain as I would imagine people who were filling in this field via parsing pe headers would know how to do it. But, if you're thinking it's still vague, we can tighten it up.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @andrewstucki

I noted a few things to adjust or discuss further, but nothing big.
I'll trust Endpoint's instinct on whether .architecture should be normalized or where to add it (sounds like pe.architecture is fine and straightforward, so 👍). But from the schema POV, I think it's fine in some cases to not normalize at first, and add instructions to normalize only if it becomes needed.

CHANGELOG.next.md Outdated Show resolved Hide resolved
type: keyword
ignore_above: 1024
description: CPU architecture target for the file.
example: x64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a clear set of instructions we can give on how this should be normalized (e.g. linking to another source) we should do that now.

If there isn't, we can leave this up to the source, and only address later, only if needed.

The thinking: we have to balance the amount of work required by sources to get the normalization right. So I think it's fine to tighten this later, only if needed.

schemas/pe.yml Show resolved Hide resolved
schemas/pe.yml Outdated Show resolved Hide resolved
@andrewstucki
Copy link
Contributor Author

@webmat updated some of the verbage like you requested and merged master, so if you're 👍 I'll merge

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

You had requested @dcode's review as well so I'm not merging right away. I'll let you merge whenever you're ready (remove @dcode or after his review).

If you want me to do the merge for any reason, let me know.

@andrewstucki andrewstucki removed the request for review from dcode April 6, 2020 18:26
@andrewstucki andrewstucki merged commit 1f7fa10 into elastic:master Apr 6, 2020
@andrewstucki andrewstucki deleted the imphash branch April 6, 2020 18:27
dcode pushed a commit to dcode/ecs that referenced this pull request Apr 15, 2020
* Add architecture and imphash for PE field set

* Add changelog entry

* Update schemas/pe.yml

Co-Authored-By: Mathieu Martin <[email protected]>

Co-authored-by: Mathieu Martin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants