Add architecture and imphash for PE field set #763
type: keyword
ignore_above: 1024
description: CPU architecture target for the file.
example: x64
Do we care if we make our own values here, or should we use the ones that Microsoft defined? For example, the sensor outputs x64, but Microsoft uses the nomenclature AMD64 (IMAGE_FILE_MACHINE_AMD64).
My thought was to normalize it like VirusTotal does, but I'm not entirely sure we'd have to be strict about this.
If there's a clear set of instructions we can give on how this should be normalized (e.g. linking to another source) we should do that now.
If there isn't, we can leave this up to the source, and only address later, only if needed.
The thinking: we have to balance the amount of work required by sources to get the normalization right. So I think it's fine to tighten this later, only if needed.
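If we did decide to normalize, one possible shape is a plain lookup from the Machine value in the PE/COFF header to a short name. This is only a sketch: the numeric constants are the IMAGE_FILE_MACHINE_* values from Microsoft's PE format documentation, while the names on the right (other than x64, which is what the sensor already outputs) and the function name `normalize_architecture` are assumptions made up for illustration, not an agreed ECS convention.

```python
# Machine values as defined by Microsoft (IMAGE_FILE_MACHINE_*).
# The normalized names are hypothetical; only "x64" comes from the discussion.
MACHINE_TO_ARCHITECTURE = {
    0x014C: "x86",    # IMAGE_FILE_MACHINE_I386
    0x8664: "x64",    # IMAGE_FILE_MACHINE_AMD64
    0x01C0: "arm",    # IMAGE_FILE_MACHINE_ARM
    0xAA64: "arm64",  # IMAGE_FILE_MACHINE_ARM64
    0x0200: "ia64",   # IMAGE_FILE_MACHINE_IA64
}


def normalize_architecture(machine: int) -> str:
    """Map a raw Machine value to a short name, keeping unknown values visible."""
    return MACHINE_TO_ARCHITECTURE.get(machine, f"unknown (0x{machine:04x})")
```

Falling back to a string that still carries the raw value means a source that sees an unlisted machine type does not silently lose information.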
The PE field set is Windows-specific though, right? Wouldn't we want architecture to be OS agnostic and not tied to the PE fields?
In many cases yes. In every WOW64 Windows process, however, you will have a combination of 32- and 64-bit DLLs loaded. The x64 DLLs implement the WOW64 emulation layer, among other things. Some security products will inject x64 hook DLLs into WOW64 processes. Here's an example WOW64 process where you can see several x64 DLLs loaded from
@marshallmain So that's what I was getting at with:
Basically shared libraries/"dll"s under Linux and Mac systems can have multiple architectures tied to them; in Windows, they're single-valued. In the case of So, the story around
I agree. Since PE already exists, anything that belongs in a PE header is fair game to add.
Going to wait for @webmat's sign-off since I believe this would add to the 1.5 release scope. Ah, that reminds me... Changelog entry linking this PR...
@elasticmachine, run elasticsearch-ci/docs
- name: architecture
  level: extended
  type: keyword
  description: CPU architecture target for the file.
Is it worth adding a note that this is not necessarily the architecture of the machine itself?
I'll leave this one up to you guys.
From the schema POV, there's sometimes a reflex to over-explain, as if we were telling people how to implement the source itself (e.g. a compiler actually populating PE headers in an executable), when in fact the schema's role is simply to explain where to get the data (e.g. getting the "architecture" header from the PE headers) and how to interpret it when looking at events that populate these.
But if you think there's a disconnect to explain or point out between pe.architecture and host.architecture, for example, yeah I think this may make sense.
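To make "getting the architecture header from the PE headers" concrete, here is a rough sketch of how a source might read the raw value. It assumes the whole file is available as bytes and relies only on the documented layout: the DOS header's e_lfanew at offset 0x3C points to the "PE\0\0" signature, and the COFF header's 16-bit Machine field follows immediately. The function name `read_pe_machine` is hypothetical. The value describes the file only, which is why it can differ from host.architecture.

```python
import struct


def read_pe_machine(data: bytes) -> int:
    """Return the raw COFF Machine value from a PE image held in memory."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: file offset of the PE signature, stored at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    if len(data) < e_lfanew + 6:
        raise ValueError("truncated COFF header")
    # Machine is the first field of the COFF header, right after the signature.
    (machine,) = struct.unpack_from("<H", data, e_lfanew + 4)
    return machine
```

For an x64 DLL loaded into a WOW64 process this would return 0x8664 (IMAGE_FILE_MACHINE_AMD64) regardless of the bitness of the process it is loaded into.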
@rw-access I think that the fact that it says, 'for the file' seems fine to me, and I don't think we should over-explain as I would imagine people who were filling in this field via parsing pe headers would know how to do it. But, if you're thinking it's still vague, we can tighten it up.
Thanks @andrewstucki
I noted a few things to adjust or discuss further, but nothing big.
I'll trust Endpoint's instinct on whether .architecture should be normalized or where to add it (sounds like pe.architecture is fine and straightforward, so 👍). But from the schema POV, I think it's fine in some cases to not normalize at first, and add instructions to normalize only if it becomes needed.
Co-Authored-By: Mathieu Martin <[email protected]>
@webmat updated some of the verbiage like you requested and merged master, so if you're 👍 I'll merge
* Add architecture and imphash for PE field set
* Add changelog entry
* Update schemas/pe.yml
Co-Authored-By: Mathieu Martin <[email protected]>
Co-authored-by: Mathieu Martin <[email protected]>
So, this PR adds the fields imphash and architecture to the PE field set. Both are commonly used in PE parsing tools and in the security industry (see fields for Imphash and Target Machine). A couple of things to throw out there that people may have in mind:
hash with it.
dll or process, but:
a. we'd have to dup the field
b. most of the time under process it's going to be the same as the host architecture unless you're running in some sort of execution subsystem like WSL or WINE for Linux or something like that.
c. there are some differences between file formats that support multiple architectures (i.e. fat binaries) and those that don't
So, the thought was that, for the reasons above, these fields should exist as a subset of pe, since they are tied to the file format itself. Thoughts on getting this in for 1.5?