PE header metadata #676

Closed
rw-access opened this issue Dec 5, 2019 · 7 comments · Fixed by #731
Labels: endpoint (Relevant to elastic endpoint security)

rw-access commented Dec 5, 2019

Another thing that Sysmon and Endgame both collect for process and DLL (#675) events is PE metadata.

There may be an OS-agnostic way to represent this, but I think this has value even if it's Windows only. This field set could be nested within process or dll/lib/module. Some of the fields already enumerated by Sysmon (see an example event in OSSEM) are below:

  • original_file_name: wuauclt.exe
  • file_version: 10.0.17134.1 (WinBuild.160101.0800)
  • description: Windows Update
  • product: Microsoft® Windows® Operating System
  • company: Microsoft Corporation
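
To make the nesting idea concrete, here is a minimal sketch of a process event carrying these fields. The container key `pe`, the helper `with_pe_metadata`, and the overall event shape are assumptions for illustration only; none of this naming has been agreed on.

```python
import json

# Field names come from the list above; the "pe" container key is hypothetical.
PE_FIELDS = ("original_file_name", "file_version", "description", "product", "company")


def with_pe_metadata(parent: dict, metadata: dict) -> dict:
    """Return a copy of a parent field set (process, dll, ...) with a nested `pe` object."""
    unknown = set(metadata) - set(PE_FIELDS)
    if unknown:
        raise ValueError(f"unexpected PE metadata fields: {sorted(unknown)}")
    nested = dict(parent)
    # Keep a stable field order and skip anything the collector did not report.
    nested["pe"] = {name: metadata[name] for name in PE_FIELDS if name in metadata}
    return nested


event = {
    "process": with_pe_metadata(
        {"name": "wuauclt.exe"},
        {
            "original_file_name": "wuauclt.exe",
            "file_version": "10.0.17134.1 (WinBuild.160101.0800)",
            "description": "Windows Update",
            "product": "Microsoft® Windows® Operating System",
            "company": "Microsoft Corporation",
        },
    )
}
print(json.dumps(event, indent=2))
```

The same helper could wrap a dll/lib/module field set instead of process, which is the point of keeping the metadata in its own nested object.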

There's also Authenticode information for signed binaries. That field set could be nested here, but we first need to solve that issue (#681).

  • signer: Microsoft Corporation
  • status: trusted / untrusted / no_signature, etc. We might want to enumerate some of these values ahead of time
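
One possible way to enumerate the status values ahead of time is sketched below. Only the three values named above are included; the class name `SignatureStatus` and the helper `signature_fields` are hypothetical, and the real list of statuses is still open.

```python
from enum import Enum
from typing import Optional


class SignatureStatus(str, Enum):
    # Only the values mentioned in this issue; the full set is undecided.
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"
    NO_SIGNATURE = "no_signature"


def signature_fields(signer: Optional[str], status: str) -> dict:
    """Build a signature field set, rejecting statuses that were not enumerated."""
    checked = SignatureStatus(status)  # raises ValueError for unknown values
    fields = {"status": checked.value}
    if signer is not None:
        fields["signer"] = signer
    return fields


signed = signature_fields("Microsoft Corporation", "trusted")
unsigned = signature_fields(None, "no_signature")
print(signed, unsigned)
```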

rw-access self-assigned this Dec 5, 2019

andrewstucki commented Dec 5, 2019

I was going to open an issue for signature data initially, @rw-access, so thanks for adding it here already. For signatures, I can totally see this being useful in contexts other than PE headers, so as you allude to, I'd love to see a signature field set that's embeddable in multiple contexts:

  • packages
  • pe headers
  • containers (for something like DTR?)
  • wherever else makes sense

Should we split out the issue or just figure it out here?

webmat commented Dec 9, 2019

The specifics may differ from one OS to another (and I'm not sure this even exists for Linux), but I'd really like to see if the concepts can be defined in an OS-agnostic way.

From what I can tell, Windows' PE probably contains information analogous to the equivalent format on OS X. If that's the case, we should take both into account.

@andrewstucki

We could make this more of an object_file field set, but I'm not sure how well notions like the enumerated fields

  • original_file_name
  • file_version
  • description
  • product
  • company

would actually map to Mach-O/ELF header concepts.

I'm wondering whether an alternative would be to split this in two: common parsed object-file concepts would go into one field set for behavior-like things (i.e. linkage/symbol tables, PIC flags, versions, etc.), and metadata-level stuff like this (which is more of an implementation-specific detail of PE files) would go into something like a vendor field set?

@andrewstucki

We could even make something like vendor embeddable in multiple places: things like packages, OS info, etc.

rw-access commented Dec 9, 2019

I've been looking at some Mach-O metadata, and can't find any worthwhile overlaps. I didn't find any sections that were pure metadata, but both formats have signatures, which we've covered in #681.

Since we don't have any additional Mach-O metadata in our process events yet, we could always rename this field set to something a bit more generic and future proof (if we can come up with a good name), and populate it as we go. That'll leave room for cross-platform support in the future without letting it get in the way much for now.

Windows also has the concept of an application manifest. Should that be encompassed? Feels a bit like scope creep.

I'm struggling to come up with good names. And it's hard to be flexible and future proof while limiting scope creep and keeping the names of field sets well defined.

We could have a file_meta name, for example. But then we have to define the distinction between file.* and file_meta.*, and that sounds like its scope is unending.

rw-access commented Dec 9, 2019

I looked at VirusTotal to see how it deals with file metadata.
Here's an example of a Windows binary and another for a Mach-O binary.

Some of the metadata I see (not including signer information, because that's covered separately in #681):

Mach-O

Signature Information: File Version Information

| name | value |
| --- | --- |
| Identifier | com.adguard.mac.adguard-install |
| Authority | Apple Root CA |
| Date Signed | Dec 9, 2019 at 8:30:45 AM |
| Team Identifier | TC3Q7MAJXF |

Mac OS X Executable Info: File Header

| name | value |
| --- | --- |
| File Type | executable file |
| Magic | 0xfeedfacf |
| Required Architecture | x86_64 |
| Sub-architecture | X86_64_ALL |
| Entry Point | 0x291a |
| Reserved | 0x0 |
| Contained Load Commands | 24 |
| Load Commands Size | 3712 |
| Flags | DYLDLINK NOUNDEFS PIE TWOLEVEL |
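
For reference, most of the File Header values above live in the fixed 64-bit Mach-O header, so they are cheap to read. The sketch below decodes that header with the standard library only. The function name `parse_mach_header_64` and the output keys are made up for illustration, only four flag bits are mapped, and the sample bytes are synthetic values chosen to match the table, not the actual binary. The entry point is not part of this fixed header, so it is not extracted here.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF
# Subset of Mach-O header flag bits (MH_* constants), ordered by bit value.
MH_FLAGS = {0x1: "NOUNDEFS", 0x4: "DYLDLINK", 0x80: "TWOLEVEL", 0x200000: "PIE"}
CPU_TYPES = {0x01000007: "x86_64"}  # illustrative subset
FILE_TYPES = {0x2: "executable file"}  # MH_EXECUTE only


def parse_mach_header_64(data: bytes) -> dict:
    """Decode the eight little-endian 32-bit fields of a 64-bit Mach-O header."""
    (magic, cputype, cpusubtype, filetype,
     ncmds, sizeofcmds, flags, reserved) = struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError(f"not a little-endian 64-bit Mach-O file: {magic:#x}")
    return {
        "magic": hex(magic),
        "architecture": CPU_TYPES.get(cputype, hex(cputype)),
        "file_type": FILE_TYPES.get(filetype, hex(filetype)),
        "load_commands": ncmds,
        "load_commands_size": sizeofcmds,
        "flags": [name for bit, name in MH_FLAGS.items() if flags & bit],
        "reserved": hex(reserved),
    }


# Synthetic header built to mirror the values in the table above.
sample = struct.pack("<8I", MH_MAGIC_64, 0x01000007, 3, 2, 24, 3712, 0x200085, 0)
header = parse_mach_header_64(sample)
print(header)
```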

Windows (PE)

Signature info: file version information

Strangely, this is lumped under the signature, but they are generally considered separate.

| name | value |
| --- | --- |
| Copyright | © Microsoft Corporation. All rights reserved. |
| Product | Microsoft® Windows® Operating System |
| Description | Microsoft T2Embed Font Embedding |
| Original Name | T2EMBED.DLL |
| Internal Name | T2EMBED.DLL |
| File Version | 6.1.7601.17514 (win7sp1_rtm.101119-1850) |

PE info: header

| name | value |
| --- | --- |
| Target Machine | Intel 386 or later processors and compatible processors |
| Compilation Timestamp | 2016-12-23 05:59:02 |
| Entry Point | 15120 |
| Contained Sections | 7 |
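
The four header values above can be read straight from the DOS stub pointer, the COFF file header, and the start of the optional header. Below is a rough standard-library sketch; `parse_pe_header` and its output keys are hypothetical names, the machine map is a small subset, and the sample bytes are a synthetic header built to match the table rather than the real T2EMBED.DLL.

```python
import struct
from datetime import datetime, timezone

MACHINES = {0x014C: "i386", 0x8664: "x86_64"}  # illustrative subset


def parse_pe_header(data: bytes) -> dict:
    """Read machine, section count, timestamp and entry point from a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # COFF file header: Machine, NumberOfSections, TimeDateStamp, ...
    machine, sections, timestamp = struct.unpack_from("<HHI", data, pe_offset + 4)
    # Optional header follows the 20-byte COFF header; AddressOfEntryPoint is at +16.
    (entry_point,) = struct.unpack_from("<I", data, pe_offset + 24 + 16)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "sections": sections,
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat(),
        "entry_point": entry_point,
    }


# Synthetic image: DOS header, PE signature, COFF header, start of optional header.
stamp = int(datetime(2016, 12, 23, 5, 59, 2, tzinfo=timezone.utc).timestamp())
dos = bytearray(0x40)
dos[0:2] = b"MZ"
struct.pack_into("<I", dos, 0x3C, 0x40)
coff = struct.pack("<HHIIIHH", 0x014C, 7, stamp, 0, 0, 0xE0, 0x2102)
optional = struct.pack("<HBBIIII", 0x010B, 9, 0, 13312, 301568, 0, 0x3B10)
pe_info = parse_pe_header(bytes(dos) + b"PE\x00\x00" + coff + optional)
print(pe_info)
```

Note that the entry point comes back as 15120, which is the same value ExifTool reports as 0x3b10.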

Exif data (extracted from ExifTool)

| name | value |
| --- | --- |
| CharacterSet | Unicode |
| CodeSize | 13312 |
| CompanyName | Microsoft Corporation |
| EntryPoint | 0x3b10 |
| FileDescription | Microsoft T2Embed Font Embedding |
| FileFlagsMask | 0x003f |
| FileOS | Windows NT 32-bit |
| FileSubtype | 0 |
| FileType | Win32 EXE |
| FileTypeExtension | exe |
| FileVersion | 6.1.7601.17514 (win7sp1_rtm.101119-1850) |
| FileVersionNumber | 6.1.7601.17514 |
| ImageFileCharacteristics | Executable, 32-bit |
| ImageVersion | 0 |
| InitializedDataSize | 301568 |
| InternalName | T2EMBED.DLL |
| LanguageCode | English (U.S.) |
| LegalCopyright | Microsoft Corporation. All rights reserved. |
| LinkerVersion | 9 |
| MIMEType | application/octet-stream |
| MachineType | Intel 386 or later, and compatibles |
| OSVersion | 5 |
| ObjectFileType | Dynamic link library |
| OriginalFileName | T2EMBED.DLL |
| PEType | PE32 |
| ProductName | Microsoft Windows Operating System |
| ProductVersion | 6.1.7601.17514 |
| ProductVersionNumber | 6.1.7601.17514 |
| Subsystem | Windows GUI |
| SubsystemVersion | 5 |
| TimeStamp | 2016:12:23 06:59:02+01:00 |
| UninitializedDataSize | 0 |
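
Five of the ExifTool tags above line up with the fields proposed at the top of this issue. A small sketch of that mapping follows; the pairing of tags to field names is my reading of the two lists, and `to_proposed_fields` is a hypothetical helper, not an existing API.

```python
# ExifTool tag name -> field name proposed earlier in this issue (assumed pairing).
EXIFTOOL_TO_PROPOSED = {
    "OriginalFileName": "original_file_name",
    "FileVersion": "file_version",
    "FileDescription": "description",
    "ProductName": "product",
    "CompanyName": "company",
}


def to_proposed_fields(exif: dict) -> dict:
    """Keep only the tags that have a proposed field, renamed to snake_case."""
    return {
        field: exif[tag]
        for tag, field in EXIFTOOL_TO_PROPOSED.items()
        if tag in exif
    }


exif_sample = {
    "CompanyName": "Microsoft Corporation",
    "FileDescription": "Microsoft T2Embed Font Embedding",
    "FileVersion": "6.1.7601.17514 (win7sp1_rtm.101119-1850)",
    "OriginalFileName": "T2EMBED.DLL",
    "ProductName": "Microsoft Windows Operating System",
    "PEType": "PE32",  # no proposed counterpart yet, so it is dropped
}
mapped = to_proposed_fields(exif_sample)
print(mapped)
```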

andrewstucki commented Dec 9, 2019

@webmat -- I was just thinking some more about this, and I'm wondering how the idea of unifying field definitions scales to feature extraction for other file formats. Object files are just one example of a "class" of file formats, and each format in the class has a fairly different structure. The same goes for image file formats: how would a unified image field set describe something like a jpeg's JFIFthumbJPEG extensionCode value? When it comes to format-specific feature extraction, I'm wondering if we do indeed need to introduce format-specific fields in their own field sets.

That said--I'm really not a fan of having a super flat structure where we open the door for having potentially thousands of top-level field sets--it makes both documentation and, more importantly, payloads harder to grok. So I'm wondering if it makes sense to somehow group field sets that exist in this space? Whether it's somewhere under file.features.parsed_format_features_here.* or something like that?
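
As a rough picture of what grouping under file would look like, the sketch below nests a format-specific field set under `file.features` and flattens the event to dotted paths. The `pe` key stands in for the placeholder name above, and `flatten` is a hypothetical helper; the only point is that the number of top-level field sets stays at one however many formats get added.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted field paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


event = {
    "file": {
        "name": "T2EMBED.DLL",
        "features": {
            # One nested field set per parsed format; "pe" is a placeholder name.
            "pe": {
                "original_file_name": "T2EMBED.DLL",
                "company": "Microsoft Corporation",
            },
        },
    }
}
flat = flatten(event)
top_level = {path.split(".")[0] for path in flat}
print(sorted(flat), top_level)
```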

rw-access added the endpoint (Relevant to elastic endpoint security) label Jan 24, 2020