This guide details the process of data dictionary authoring, by describing the structure and organization of events.
Lets recap what a data dictionary event is, before going into the nitty-gritty details.
Data dictionaries are atomic structures of base events emitted by log sources on a platform. These structures contain the definition of a base event and its fields.
The event definition enables the entity to be fully functional by itself. It describes the title, code, platform, log source, and other meta-data that provides context. The event definition plays a major role on how the event is consumed.
The event fields contain the list of fields available. Each field have properties that provide context about the field, and ultimately enable it to be correlated with other OSSEM data like the Common Data Model and Detection Data Model.
OSSEM data dictionaries are structured to be as lean as possible, the reason is twofold: not only to avoid redundant information between events, but also to promote the adoption of external references (pointers).
OSSEM data dictionaries are represented in YAML. Again, the goal is to find the best balance between human readability, and ease of automation, hence YAML was a relatively easy pick when choosing an OSSEM data language.
The event definion fields are:
- Title: the event title if any, otherwise use the event code
- Description: the description of the event
- Platform: the platform where the log source is hosted
- Log source: the log source that generates the event
- Event code: the code or ID of the event
- Event version: the version of the event
- Reference list: text and link for external references relevant to the event
- Tag list: tags applicable to the event
For every field in the event, the properties are:
- Standard name: the standard name assigned to the field name, if applicable
- Name: the field name as per vendor documentation
- Type: the field type as per vendor documentation
- Description: the field description
- Sample value: the field sample value, if applicable
An example of an Windows Security Auditing Event 4616 follows:
title: 'Event ID 4616: The system time was changed.'
description: This event generates every time system time was changed.
platform: windows
log_source: Microsoft-Windows-Security-Auditing
event_code: '4616'
event_version: '1'
event_fields:
- standard_name: user_sid
name: SubjectUserSid
type: SID
description: SID of account that requested the "change system time" operation.
sample_value: S-1-5-21-3457937927-2839227994-823803824-1104
- standard_name: user_name
name: SubjectUserName
type: UnicodeString
description: the name of the account that requested the "change system time" operation.
sample_value: dadmin
- standard_name: user_domain
name: SubjectDomainName
type: UnicodeString
description: subject's domain or computer name.
sample_value: CONTOSO
- standard_name: user_logon_id
name: SubjectLogonId
type: HexInt64
description: 'hexadecimal value that can help you correlate this event with recent
events that might contain the same Logon ID, for example, "4624: An account was
successfully logged on".'
sample_value: '0x48f29'
- standard_name: TBD
name: PreviousTime
type: FILETIME
description: previous time in UTC time zone.
sample_value: '2015-10-09T05:04:30.000941900Z'
- standard_name: TBD
name: NewTime
type: FILETIME
description: new time that was set in UTC time zone.
sample_value: '2015-10-09T05:04:30.000000000Z'
- standard_name: process_id
name: ProcessId
type: Pointer
description: hexadecimal Process ID of the process that changed the system time.
Process ID (PID) is a number used by the operating system to uniquely identify
an active process.
sample_value: '0x1074'
- standard_name: process_path
name: ProcessName
type: UnicodeString
description: full path and the name of the executable for the process.
sample_value: C:\Windows\WinSxS\amd64_microsoft-windows-com-surrogate-core_31bf3856ad364e35_6.3.9600.16384_none_25a8f00faa8f185c\dllhost.exe
references:
- text: MS SOURCE
link: https://github.com/MicrosoftDocs/windows-itpro-docs/blob/public/windows/security/threat-protection/auditing/event-4616.md
- text: MS Security Auditing Category - System
link: https://docs.microsoft.com/en-us/windows/security/threat-protection/auditing/advanced-security-audit-policy-settings#system
- text: MS Security Auditing Sub-category - Audit Security State Change
link: https://github.com/MicrosoftDocs/windows-itpro-docs/tree/master/windows/security/threat-protection/auditing/audit-security-state-change.md
tags:
- etw_level_Informational
- etw_task_task_0
- version_1
- System
- Audit Security State Change
The standard_name is a special property of event fields, as it represents the first layer of data standardization on the event.
In the example above (event 4616), the SubjectUserSid
name was translated to user_logon_id
standard name. This translation ensures the data dictionary is aligned with the Common Data Model (CDM) User schema, reduces complexity, and enhances the development of detection analytics.
Note that its not mandatory that you define a standard name for every field on your event. Some good practices when defining standard names include:
- Search for the field name in other OSSEM events. Its not uncommon that you can apply the same standard name to identical field names, specially if the log source is the same.
- Check if the standard name already exists in one of the Common Data Model entities schema.
OSSEM built-in data dictionaries are primarily organized in a file system folder structure, that ensures the grouping according to data dictionaries characteristics. While there is no limit to the folder depth, the platform and log sources folders must follow a predefined structure.
Data dictionaries are located in the OSSEM-DD sub-repository. The first level of organization is by platform.
An example of a build-in Sysmon data dictionary follows:
.
├── OSSEM-DD <--------- sub-repository
│ ├── README.yml
│ ├── windows <--------- platform (operating system/sensor folder)
│ │ ├── README.yml
│ │ ├── sysmon <--------- log source folder
│ │ │ ├── README.yml
│ │ │ └── events <--------- events folder
│ │ │ ├── event-1.yml
│ │ │ ├── event-7.yml
│ │ │ ├── event-8.yml <- data dictionary entry
...
Each platform folder contain sub-folder for the log source, which in turn always contain an events/
folder where the events are stored.
Since the platform and log source properties are already defined in the event, is fairly straightforward to figure out where to store your events.
Because ensuring the consistency of this folder structure can be tricky, specially when dealing with dozens of log sources, OSSEM provides README files that provide additional information about the current folder. These files are particularly helpful when converting OSSEM to markdown, where they are used as indexes.
Similarly to data dictionaries, README files are also defined in YAML, contain the following properties:
- title: for example the log source title
- description: for example the log source description
- images: text and path to images relevant to the readme
- references: text and link for external references relevant to the readme
An example of a README follows:
title: Sysmon Event Logs
description: System Monitor (Sysmon) is a Windows system service and device
driver that, once installed on a system, remains resident across system
reboots to monitor and log system activity to the Windows event log. It
provides detailed information about process creations, network connections,
and changes to file creation time. By collecting the events it generates
using Windows Event Collection or SIEM agents and subsequently analyzing
them, you can identify malicious or anomalous activity and understand how
intruders and malware operate on your network.
images:
- title: Data model
source: /resources/images/SysmonDataModel.png
references:
- text: Sysmon Source
link: https://docs.microsoft.com/en-us/sysinternals/downloads/sysmon
- text: TrustedSec Sysinternals Sysmon Community Guide
link: https://github.com/trustedsec/SysmonCommunityGuide