From 7e3b0a684be36155467d1ba94a8f702c90642a9b Mon Sep 17 00:00:00 2001 From: Tigran Najaryan Date: Mon, 13 Apr 2020 17:26:02 -0400 Subject: [PATCH 1/8] Define Log Data model This is a proposal of a data model and semantic conventions that allow to represent logs from various sources: application log files, machine generated events, system logs, etc. Existing log formats can be unambiguously mapped to this data model. Reverse mapping from this data model is also possible to the extent that the target log format has equivalent capabilities. The purpose of the data model is to have a common understanding of what a log record is, what data needs to be recorded, transferred, stored and interpreted by a logging system. --- text/0097-log-data-model.md | 952 ++++++++++++++++++++++++++++++++++++ 1 file changed, 952 insertions(+) create mode 100644 text/0097-log-data-model.md diff --git a/text/0097-log-data-model.md b/text/0097-log-data-model.md new file mode 100644 index 000000000..4bb155688 --- /dev/null +++ b/text/0097-log-data-model.md @@ -0,0 +1,952 @@ +# Log Data Model + +Introduce Data Model for Log Records as it is understood by OpenTelemetry. 
+
+* [Motivation](#motivation)
+* [Design Notes](#design-notes)
+  * [Requirements](#requirements)
+  * [Field Kinds](#field-kinds)
+* [Logical Data Model vs Physical Format](#logical-data-model-vs-physical-format)
+* [Log and Event Record Definition](#log-and-event-record-definition)
+  * [Field: `timestamp`](#field-timestamp)
+  * [Trace Context:](#trace-context)
+    * [Field: `trace_id`](#field-traceid)
+    * [Field: `span_id`](#field-spanid)
+    * [Field: `trace_flags`](#field-traceflags)
+  * [Severity](#severity)
+    * [Field: `severity_text`](#field-severitytext)
+    * [Field: `severity_number`](#field-severitynumber)
+    * [Displaying Severity](#displaying-severity)
+    * [Comparing Severity](#comparing-severity)
+  * [Field: `short_name`](#field-shortname)
+  * [Field: `body`](#field-body)
+  * [Field: `resource`](#field-resource)
+  * [Field: `attributes`](#field-attributes)
+* [Representation](#representation)
+* [Prior Art](#prior-art)
+  * [RFC5424 Syslog](#rfc5424-syslog)
+  * [Fluentd Forward Protocol Model](#fluentd-forward-protocol-model)
+* [Appendix A. Example Mappings.](#appendix-a-example-mappings)
+  * [RFC5424 Syslog](#rfc5424-syslog-1)
+  * [Windows Event Log](#windows-event-log)
+  * [SignalFx Events](#signalfx-events)
+  * [Splunk HEC](#splunk-hec)
+  * [Log4j](#log4j)
+  * [Zap](#zap)
+  * [Apache](#apache)
+  * [CloudTrail Log Event](#cloudtrail-log-event)
+* [Appendix B: `severity_number` example mappings.](#appendix-b-severitynumber-example-mappings)
+* [Reference](#reference)
+
+## Motivation
+
+This is a proposal of a data model and semantic conventions that allow
+representing logs from various sources: application log files, machine-generated
+events, system logs, etc. Existing log formats can be unambiguously mapped to
+this data model. Reverse mapping from this data model is also possible to the
+extent that the target log format has equivalent capabilities. 
+
+The purpose of the data model is to have a common understanding of what a log
+record is, and what data needs to be recorded, transferred, stored and
+interpreted by a logging system.
+
+## Design Notes
+
+### Requirements
+
+This Data Model was designed to satisfy the following requirements:
+
+- It should be possible to unambiguously map existing log formats to this Data
+  Model. Translating log data from an arbitrary log format to this Data Model
+  and back should ideally result in identical data.
+
+- Mappings of other log formats to this Data Model should be semantically
+  meaningful. The Data Model must preserve the semantics of particular elements
+  of existing log formats.
+
+- Translating log data from an arbitrary log format A to this Data Model and
+  then translating from the Data Model to another log format B should ideally
+  result in a meaningful, lossless translation of log data that is no worse than
+  a reasonable direct translation from log format A to log format B.
+
+- It should be possible to efficiently represent the Data Model in concrete
+  implementations that require the data to be stored or transmitted. We
+  primarily care about 2 aspects of efficiency: CPU usage for
+  serialization/deserialization and space requirements in serialized form. This
+  is an indirect requirement that is affected by the specific representation of
+  the Data Model rather than the Data Model itself, but is still useful to keep
+  in mind.
+
+The data model aims to successfully represent 3 sorts of logs and events:
+
+- System Formats. These are logs and events generated by the operating system
+  and over which we have no control - we cannot change the format or affect what
+  information is included (unless the data is generated by an application which
+  we can modify). An example of a system format is Syslog.
+
+- Third-party Applications. These are generated by third-party applications. We
+  may have certain control over what information is included, e.g. 
customize the
+  format. An example is an Apache log file.
+
+- First-party Applications. These are applications that we develop and we have
+  some control over how the logs and events are generated and what information
+  we include in the logs. We can likely modify the source code of the
+  application if needed.
+
+### Field Kinds
+
+This data model defines a logical model for a log record (irrespective of the
+physical format and encoding of the record). Each record contains 2 kinds of
+fields:
+
+- Named top-level fields of specific type and meaning.
+
+- Fields stored in the key/value pair lists, which can contain arbitrary values
+  of different types. The keys and values for well-known fields follow semantic
+  conventions for key names and possible values that allow all parties that work
+  with the field to have the same interpretation of the data (see references to
+  semantic conventions for Resource and Attributes fields and examples in
+  Appendix A).
+
+The reasons for having these 2 kinds of fields are:
+
+- Ability to efficiently represent named top-level fields, which are almost
+  always present (e.g. when using encodings like Protocol Buffers where fields
+  are enumerated but not named on the wire), and to enforce their types and
+  domain of values, which is very useful for compiled languages with type
+  checks.
+
+- Flexibility to represent less frequent data via key/value pair lists. This
+  includes well-known data that has standardized semantics as well as arbitrary
+  custom data that the application may want to include in the logs.
+
+When designing this data model I followed the following reasoning to make a
+decision about when to use a top-level named field:
+
+- The field needs to be either mandatory for all records or be frequently
+  present in well-known log and event formats (such as timestamp) or is expected
+  to be often present in log records in upcoming logging systems (such as
+  trace_id). 
+
+- The field’s semantics must be the same for all known log and event formats and
+  can be mapped directly and unambiguously to this data model.
+
+Both of the above conditions were required to give the field a place in the
+top-level structure of the record.
+
+## Logical Data Model vs Physical Format
+
+The data model does not define the actual encoding and format of the log record
+representation. Format definitions will be done in separate OTEPs (e.g. the log
+records may be represented as msgpack, JSON, Protocol Buffer messages, etc.).
+
+## Log and Event Record Definition
+
+Note: below we use type any, which can be a scalar value (number, string or
+boolean), or an array or map of values. Arbitrarily deep nesting of values for
+arrays and maps is allowed (essentially allowing representation of an
+equivalent of a JSON object).
+
+Appendix A contains many examples that show how existing log formats map to the
+fields defined below. If there are questions about the meaning of a field,
+reviewing the examples may be helpful.
+
+Here is the list of fields in a log record:
+
+Field Name     |
+---------------|
+timestamp      |
+trace_id       |
+span_id        |
+trace_flags    |
+severity_text  |
+severity_number|
+short_name     |
+body           |
+resource       |
+attributes     |
+
+Below is the detailed description of each field.
+
+
+### Field: `timestamp`
+
+Type: Timestamp, uint64 nanoseconds since Unix epoch
+
+Description: Time when the event occurred measured by the origin clock. This
+field is optional, it may be missing if the timestamp is unknown.
+
+### Trace Context:
+
+#### Field: `trace_id`
+
+Type: byte sequence
+
+Description: Optional request trace id. Can be set for logs that are part of
+request processing and have an assigned trace id.
+
+#### Field: `span_id`
+
+Type: byte sequence
+
+Description: Optional span id. Can be set for logs that are part of a particular
+processing span. If span_id is present, trace_id SHOULD also be present. 
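The trace context rules above can be sketched in code. The following is a minimal illustration, not part of the data model: the dict-based record shape and the `trace_context_warnings` helper are hypothetical, and only the "span_id implies trace_id" recommendation comes from this document.

```python
def trace_context_warnings(record: dict) -> list:
    """Return warnings for trace context fields that violate the
    recommendation that span_id SHOULD be accompanied by trace_id."""
    warnings = []
    if "span_id" in record and "trace_id" not in record:
        warnings.append("span_id is present but trace_id is missing")
    return warnings


# A record carrying both ids (byte sequences, shown here decoded from hex):
ok = {
    "trace_id": bytes.fromhex("f4dbb3edd765f620"),
    "span_id": bytes.fromhex("43222c2d51a7abe3"),
}
# A record violating the recommendation:
bad = {"span_id": bytes.fromhex("43222c2d51a7abe3")}
```

A log collector could emit such warnings while still accepting the record, since the rule is a SHOULD rather than a MUST.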
+ +#### Field: `trace_flags` + +Type: byte + +Description: Optional trace flag as defined in W3C trace context specification. +At the time of writing the specification defines one flag - the SAMPLED flag. + +### Severity + +#### Field: `severity_text` + +Type: string + +Description: the severity text (also known as log level). This is an optional +field and is the original string representation as it is known at the source. If +this field is missing and `severity_number` is present then the short name that +corresponds to the `severity_number` can be used as a substitution. + +#### Field: `severity_number` + +Type: number + +Description: numerical value of the severity, normalized to values described in +this document. This is an optional field. If `severity_number` is missing and +severity_text is present then it may be assumed that `severity_number` is equal +to INFO (numeric 9) (see the meaning below). + +`severity_number` is an integer number. Smaller numerical values correspond to +less severe events (such as debug events), larger numerical values correspond to +more severe events (such as errors and critical events). The following table +defines the meaning of `severity_number` value: + +severity_number range|Range name|Meaning +---------------------|----------|------- +1-4 |TRACE |A fine-grained debugging event. Typically disabled in default configurations. +5-8 |DEBUG |A debugging event. Often is not emitted in default configurations. +9-12 |INFO |An informational event. Indicates that an event happened. +13-16 |WARN |A warning event. Not an error but is likely more important than an informational event. +17-20 |ERROR |An error event. Something went wrong. +21-24 |FATAL |A fatal error such as application or system crash. + +Smaller numerical values in each range represent less important (less severe) +events. Larger numerical values in each range represent more important (more +severe) events. 
For example `severity_number=17` describes an error that is less
+critical than an error with `severity_number=20`.
+
+*Mapping of `severity_number`*
+
+Mappings from existing logging systems and formats (or source formats for short)
+must define how severity (or log level) of that particular format corresponds to
+`severity_number` of this data model based on the meaning listed for each range in
+the above table.
+
+If the source format has more than one severity that matches a single range in
+this table then the severities of the source format must be assigned numerical
+values from that range according to how severe (important) the source severity
+is.
+
+For example, if the source format defines "Error" and "Critical" as error events
+and "Critical" is a more important and more severe situation, then we can choose
+the following `severity_number` values for the mapping: "Error"->17,
+"Critical"->18.
+
+If the source format has only a single severity that matches the meaning of the
+range then it is recommended to assign that severity the initial value of the
+range.
+
+For example, if the source format has an "Informational" log level and no other
+log levels with similar meaning then it is recommended to use `severity_number=9`
+for "Informational".
+
+Source formats that do not define a concept of severity or log level MAY omit
+`severity_number` and severity_text fields. Backends and UIs may represent log
+records with missing severity information distinctly or may interpret log
+records with missing `severity_number` and `severity_text` fields as if the
+`severity_number` was set equal to INFO (numeric value of 9).
+
+*Reverse Mapping*
+
+When performing a reverse mapping from `severity_number` to a specific format and
+the `severity_number` has no corresponding mapping entry for that format then it
+is recommended to choose the target severity that is in the same severity range
+and is closest numerically. 
+
+For example Zap has only one severity in the INFO range, called "Info". When
+doing reverse mapping, all values in the INFO range (numeric 9-12) will be mapped
+to Zap’s "Info" level.
+
+*Error Semantics*
+
+If `severity_number` is present and has a value of ERROR (numeric 17) or higher
+then it is an indication that the log record represents an erroneous situation.
+It is up to the reader of this value to make a decision on how to use this fact
+(e.g. UIs may display such errors in a different color or have a feature to find
+all "errors").
+
+If the log record represents an erroneous event and the source format does not
+define a severity or log level field it is recommended to set severity_number
+to ERROR (numeric 17) during the mapping process. If the log record represents a
+non-erroneous event the `severity_number` field may be omitted or may be set to
+any numeric value less than ERROR (numeric 17). The recommended value in this
+case is INFO (numeric 9). See Appendix A for more mapping examples.
+
+#### Displaying Severity
+
+The following table defines the recommended short name for each
+`severity_number` value (this can be used for example for representing the
+`severity_number` in the UI):
+
+severity_number|Short Name
+---------------|----------
+1              |TRACE
+2              |TRACE2
+3              |TRACE3
+4              |TRACE4
+5              |DEBUG
+6              |DEBUG2
+7              |DEBUG3
+8              |DEBUG4
+9              |INFO
+10             |INFO2
+11             |INFO3
+12             |INFO4
+13             |WARN
+14             |WARN2
+15             |WARN3
+16             |WARN4
+17             |ERROR
+18             |ERROR2
+19             |ERROR3
+20             |ERROR4
+21             |FATAL
+22             |FATAL2
+23             |FATAL3
+24             |FATAL4
+
+When an individual log record is displayed it is recommended to show both
+severity_text and severity_number values. A recommended combined string in this
+case begins with the short name followed by severity_text in parentheses.
+
+For example, an "Informational" Syslog record will be displayed as INFO
When for a particular log record the `severity_number` is +defined but the severity_text is missing it is recommended to only show the +short name, e.g. INFO. + +When drop down lists or other UI elements that are intended to represent the +possible set of values are used for representing the severity it is likely +preferable to use the short names. + +For example a dropdown list of severities that allows filtering log records by +severities is likely to be more usable if it contains the short names of +`severity_number` (and thus has a limited upper bound of elements) compared to a +dropdown list which lists all distinct severity_text values that are known to +the system (which can be a large number of elements, often differing only in +capitalization or abbreviated, e.g. "Info" vs "Information"). + +#### Comparing Severity + +In the contexts where severity participates less-than / greater-than comparisons +`severity_number` field should be used. `severity_number` can be compared to +another `severity_number` or to numbers in the 1..24 range (or to the +corresponding short names). + +When severity is used in equality or inequality comparisons (for example in +filters in the UIs) the recommendation is to attempt to use both severity_text +and short name of `severity_number` to perform matches. For example if we have a +record with severity_text field equal to "Informational" and `severity_number` +field equal to INFO then it may be preferable from the user experience +perspective to ensure that severity="Informational" and severity="INFO" +conditions both to are TRUE for that record. + +### Field: `short_name` + +Type: string + +Description: Short event identifier that does not contain varying parts. +`short_name` describes what happened (e.g. "ProcessStarted"). Recommended to be +no longer than 50 characters. Optional. Not guaranteed to be unique in any way. +Typically used for filtering and grouping purposes in backends. 
+
+### Field: `body`
+
+Type: any
+
+Description: A value containing the body of the log record (see the description
+of any type above). Can be for example a human-readable string message
+(including multi-line) describing the event in free form, or it can be
+structured data composed of arrays and maps of other values. Can vary for each
+occurrence of the event coming from the same source.
+
+### Field: `resource`
+
+Type: key/value pair list
+
+Description: Describes the source of the log, aka
+[resource](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/overview.md#resources).
+"value" is of any type. Multiple occurrences of events coming from the same
+event source can happen across time and they all have the same value of
+resource. Can contain for example information about the application that emits
+the record or about the infrastructure where the application runs. Data formats
+that represent this data model may be designed in a manner that allows the
+`resource` field to be recorded only once per batch of log records that come
+from the same source. SHOULD follow OpenTelemetry
+[semantic conventions](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource/semantic_conventions)
+for Resources.
+
+### Field: `attributes`
+
+Type: key/value pair list
+
+Description: Additional information about the specific event occurrence. "value"
+is of any type. Unlike the resource field, which is fixed for a particular
+source, attributes can vary for each occurrence of the event coming from the same
+source. Can contain information about the request context (other than
+TraceId/SpanId). SHOULD follow OpenTelemetry
+[semantic conventions](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/trace/semantic_conventions)
+for Attributes.
+
+## Representation
+
+Representation of Log Data over the wire and the transmission protocol is to be
+defined. 
+
+Below are examples that show one possible representation in JSON. Note: this is
+just an example to help understand the data model; do not treat it as the
+definitive way to represent this data model in JSON, which is still to be
+defined.
+
+Example 1
+
+```json
+{
+  "timestamp": 1586960586000, // JSON needs to make a decision about
+                              // how to represent nanos.
+  "attributes": {
+    "http.status_code": 500,
+    "http.url": "http://example.com",
+    "my.custom.application.tag": "hello"
+  },
+  "resource": {
+    "service.name": "donut_shop",
+    "service.version": "semver:2.0.0",
+    "k8s.pod.name": "1138528c-c36e-11e9-a1a7-42010a800198"
+  },
+  "trace_id": "f4dbb3edd765f620", // this is a byte sequence
+                                  // (hex-encoded in JSON)
+  "span_id": "43222c2d51a7abe3",
+  "severity": "INFO",
+  "body": "20200415T072306-0700 INFO I like donuts"
+}
+```
+
+Example 2
+
+```json
+{
+  "timestamp": 1586960586000,
+  ...
+  "body": {
+    "i": "am",
+    "an": "event",
+    "of": {
+      "some": "complexity"
+    }
+  }
+}
+```
+
+## Prior Art
+
+### RFC5424 Syslog
+
+RFC5424 defines a structured log data format and protocol. The protocol is
+ubiquitous (although unfortunately many implementations don’t follow structured
+data recommendations). Here are some drawbacks that prevent Syslog from being a
+serious contender for the data model:
+
+- While it allows structured attributes, the body of the message can be only a
+  string.
+
+- Severity is hard-coded to 8 possible numeric values, and does not allow custom
+  severity labels.
+
+- Structured data does not allow arbitrary nesting and is 2-level only.
+
+- No clear separate place to specify data source (aka resource). There are a
+  couple hard-coded fields that serve this purpose in a limited way (HOSTNAME,
+  APP-NAME, FACILITY).
+
+### Fluentd Forward Protocol Model
+
+Forward protocol defines a log Entry concept as a timestamped record. The record
+consists of 2 elements: a tag and a map of arbitrary key/value pairs.
+
+The model is universal enough to represent any log record. 
However, here are
+some drawbacks:
+
+- All attributes of a record are represented via generic key/value pairs (except
+  tag and timestamp). This misses optimization opportunities (see [Design
+  Notes](#design-notes)).
+
+- There is no clear separate place to specify the data source (aka resource).
+
+- There is no mention of how exactly keys should be named and what the expected
+  values are. This lack of any naming convention or standardization of key/value
+  pairs makes interoperability difficult.
+
+
+## Appendix A. Example Mappings.
+
+This section contains examples of mappings of legacy event and log formats to
+the new universal data model.
+
+### RFC5424 Syslog
Property | Type | Description | Maps to Unified Model Field
---------|------|-------------|----------------------------
TIMESTAMP | timestamp | Time when an event occurred measured by the origin clock. | timestamp
SEVERITY | enum | Defines the importance of the event. Example: `Debug` | severity
FACILITY | enum | Describes where the event originated. A predefined list of Unix processes. Part of event source identity. Example: `mail system` | attributes["syslog.facility"]
VERSION | number | Meta: protocol version, orthogonal to the event. | attributes["syslog.version"]
HOSTNAME | string | Describes the location where the event originated. Possible values are FQDN, IP address, etc. | resource["host.hostname"]
APP-NAME | string | User-defined app name. Part of event source identity. | resource["service.name"]
PROCID | string | Not well defined. May be used as a meta field for protocol operation purposes or may be part of event source identity. | attributes["syslog.procid"]
MSGID | string | Defines the type of the event. Part of event source identity. Example: "TCPIN" | short_name
STRUCTURED-DATA | array of maps of string to string | A variety of use cases depending on the SDID: can describe event source identity, can include data that describes a particular occurrence of the event, or can be meta-information, e.g. quality of timestamp value. | SDID origin.swVersion maps to resource["service.version"]; SDID origin.ip maps to attributes["net.host.ip"]; rest of SDIDs map to attributes["syslog.*"]
MSG | string | Free-form text message about the event. Typically human readable. | body
### Windows Event Log

Property | Type | Description | Maps to Unified Model Field
---------|------|-------------|----------------------------
TimeCreated | timestamp | The time stamp that identifies when the event was logged. | timestamp
Level | enum | Contains the severity level of the event. | severity
Computer | string | The name of the computer on which the event occurred. | resource["host.hostname"]
EventID | uint | The identifier that the provider used to identify the event. | short_name
Message | string | The message string. | body
Rest of the fields | any | All other fields in the event. | attributes["winlog.*"]
### SignalFx Events

Field | Type | Description | Maps to Unified Model Field
------|------|-------------|----------------------------
Timestamp | timestamp | Time when the event occurred measured by the origin clock. | timestamp
EventType | string | Short machine-understandable string describing the event type. SignalFx-specific concept. Non-namespaced. Example: k8s Event Reason field. | short_name
Category | enum | Describes where the event originated and why. SignalFx-specific concept. Example: AGENT. | attributes["com.splunk.signalfx.event_category"]
Dimensions | map of string to string | Helps to define the identity of the event source together with EventType and Category. Multiple occurrences of events coming from the same event source can happen across time and they all have the same value of Dimensions. | resource
Properties | map of string to any | Additional information about the specific event occurrence. Unlike Dimensions, which are fixed for a particular event source, Properties can have different values for each occurrence of the event coming from the same event source. | attributes
### Splunk HEC

Field | Type | Description | Maps to Unified Model Field
------|------|-------------|----------------------------
time | numeric, string | The event time in epoch time format, in seconds. | timestamp
host | string | The host value to assign to the event data. This is typically the host name of the client that you are sending data from. | resource["host.hostname"]
source | string | The source value to assign to the event data. For example, if you are sending data from an app you are developing, you could set this key to the name of the app. | resource["service.name"]
sourcetype | string | The sourcetype value to assign to the event data. | attributes["source.type"]
event | any | The JSON representation of the raw body of the event. It can be a string, number, string array, number array, JSON object, or a JSON array. | body
fields | map of any | Specifies a JSON object that contains explicit custom fields. | attributes
index | string | The name of the index by which the event data is to be indexed. The index you specify here must be within the list of allowed indexes if the token has the indexes parameter set. | TBD, most likely will go to attributes
### Log4j

Field | Type | Description | Maps to Unified Model Field
------|------|-------------|----------------------------
Instant | timestamp | Time when an event occurred measured by the origin clock. | timestamp
Level | enum | Log level. | severity
Message | string | Human readable message. | body
All other fields | any | Structured data. | attributes
### Zap

Field | Type | Description | Maps to Unified Model Field
------|------|-------------|----------------------------
ts | timestamp | Time when an event occurred measured by the origin clock. | timestamp
level | enum | Logging level. | severity
caller | string | Calling function's filename and line number. | attributes, key=TBD
msg | string | Human readable message. | body
All other fields | any | Structured data. | attributes
### Apache

Field | Type | Description | Maps to Unified Model Field
------|------|-------------|----------------------------
%t | timestamp | Time when an event occurred measured by the origin clock. | timestamp
%a | string | Client IP | attributes["net.peer.ip"]
%A | string | Server IP | attributes["net.host.ip"]
%h | string | Remote hostname. | attributes["net.peer.name"]
%m | string | The request method. | attributes["http.method"]
%v,%p,%U,%q | string | Multiple fields that can be composed into the URL. | attributes["http.url"]
%>s | string | Response status. | attributes["http.status_code"]
All other fields | any | Structured data. | attributes, key=TBD
### CloudTrail Log Event

Field | Type | Description | Maps to Unified Model Field
------|------|-------------|----------------------------
eventTime | string | The date and time the request was made, in coordinated universal time (UTC). | timestamp
eventSource | string | The service that the request was made to. This name is typically a short form of the service name without spaces plus .amazonaws.com. | resource["service.name"]?
awsRegion | string | The AWS region that the request was made to, such as us-east-2. | resource["cloud.region"]
sourceIPAddress | string | The IP address that the request was made from. | resource["net.peer.ip"] or resource["net.host.ip"]? TBD
errorCode | string | The AWS service error if the request returns an error. | short_name
errorMessage | string | If the request returns an error, the description of the error. | body
All other fields | * | | attributes["cloudtrail.*"]
+ + +## Appendix B: `severity_number` example mappings. + +Syslog |WinEvtLog |Log4j |Zap |severity_number +-------------|-----------|------|------|--------------- +- |- |TRACE |- |TRACE +Debug |Verbose |DEBUG |Debug |DEBUG +Informational|Information|INFO |Info |INFO +Notice | | | |INFO2 +Warning |Warning |WARN |Warn |WARN +Error |Error |ERROR |Error |ERROR +Critical |Critical |- |Dpanic|ERROR2 +- |- |- |Panic |ERROR3 +Alert |- |FATAL |Fatal |FATAL + +## Reference + +- Draft discussion of Data Model: + https://docs.google.com/document/d/1ix9_4TQO3o-qyeyNhcOmqAc1MTyr-wnXxxsdWgCMn9c/edit# + +- Example mappings of existing log formats to this Data Model: + https://docs.google.com/document/d/1ix9_4TQO3o-qyeyNhcOmqAc1MTyr-wnXxxsdWgCMn9c/edit?ts=5e990fe2#heading=h.ud60xroz7j2n + +- Discussion of Severity field: + https://docs.google.com/document/d/1WQDz1jF0yKBXe3OibXWfy3g6lor9SvjZ4xT-8uuDCiA/edit# \ No newline at end of file From 60a6255b39940940a126233e7eff6d6ef006e907 Mon Sep 17 00:00:00 2001 From: Tigran Najaryan Date: Tue, 28 Apr 2020 00:38:16 -0400 Subject: [PATCH 2/8] Move content from Google Doc to markdown here --- text/0097-log-data-model.md | 278 +++++++++++++++++++----------------- 1 file changed, 147 insertions(+), 131 deletions(-) diff --git a/text/0097-log-data-model.md b/text/0097-log-data-model.md index 4bb155688..35a16cd31 100644 --- a/text/0097-log-data-model.md +++ b/text/0097-log-data-model.md @@ -6,14 +6,13 @@ Introduce Data Model for Log Records as it is understood by OpenTelemetry. 
* [Design Notes](#design-notes) * [Requirements](#requirements) * [Field Kinds](#field-kinds) -* [Logical Data Model vs Physical Format](#logical-data-model-vs-physical-format) * [Log and Event Record Definition](#log-and-event-record-definition) * [Field: `timestamp`](#field-timestamp) - * [Trace Context:](#trace-context) + * [Trace Context Fields](#trace-context-fields) * [Field: `trace_id`](#field-traceid) * [Field: `span_id`](#field-spanid) * [Field: `trace_flags`](#field-traceflags) - * [Severity](#severity) + * [Severity Fields](#severity-fields) * [Field: `severity_text`](#field-severitytext) * [Field: `severity_number`](#field-severitynumber) * [Displaying Severity](#displaying-severity) @@ -22,7 +21,9 @@ Introduce Data Model for Log Records as it is understood by OpenTelemetry. * [Field: `body`](#field-body) * [Field: `resource`](#field-resource) * [Field: `attributes`](#field-attributes) -* [Representation](#representation) +* [Example Log Records](#example-log-records) +* [Open Questions](#open-questions) +* [Alternate Design](#alternate-design) * [Prior Art](#prior-art) * [RFC5424 Syslog](#rfc5424-syslog) * [Fluentd Forward Protocol Model](#fluentd-forward-protocol-model) @@ -36,7 +37,7 @@ Introduce Data Model for Log Records as it is understood by OpenTelemetry. * [Apache](#apache) * [CloudTrail Log Event](#cloudtrail-log-event) * [Appendix B: `severity_number` example mappings.](#appendix-b-severitynumber-example-mappings) -* [Reference](#reference) +* [References](#references) ## Motivation @@ -54,7 +55,7 @@ by a logging system. ### Requirements -This Data Model was designed to satisfy the following requirements: +The Data Model was designed to satisfy the following requirements: - It should be possible to unambiguously map existing log formats to this Data Model. 
Translating log data from an arbitrary log format to this Data Model @@ -66,8 +67,8 @@ This Data Model was designed to satisfy the following requirements: - Translating log data from an arbitrary log format A to this Data Model and then translating from the Data Model to another log format B ideally must - result in a meaningful, lossless translation of log data that is no worse than - a reasonable direct translation from log format A to log format B. + result in a meaningful translation of log data that is no worse than a + reasonable direct translation from log format A to log format B. - It should be possible to efficiently represent the Data Model in concrete implementations that require the data to be stored or transmitted. We @@ -77,7 +78,7 @@ This Data Model was designed to satisfy the following requirements: the Data Model rather than the Data Model itself, but is still useful to keep in mind. -The data model aims to successfully represent 3 sorts of logs and events: +The Data Model aims to successfully represent 3 sorts of logs and events: - System Formats. These are logs and events generated by the operating system and over which we have no control - we cannot change the format or affect what @@ -95,7 +96,7 @@ The data model aims to successfully represent 3 sorts of logs and events: ### Field Kinds -This data model defines a logical model for a log record (irrespective of the +This Data Model defines a logical model for a log record (irrespective of the physical format and encoding of the record). Each record contains 2 kinds of fields: @@ -104,17 +105,18 @@ fields: - Fields stored in the key/value pair lists, which can contain arbitrary values of different types. 
The keys and values for well-known fields follow semantic conventions for key names and possible values that allow all parties that work - with the field to have the same interpretation of the data (see references to - semantic conventions for Resource and Attributes fields and examples in - Appendix A). + with the field to have the same interpretation of the data. See references to + semantic conventions for `resource` and `attributes` fields and examples in + [Appendix A](#appendix-a-example-mappings). -The reasons for having these 2 kinds of fields is: +The reasons for having these 2 kinds of fields are: - Ability to efficiently represent named top-level fields, which are almost always present (e.g. when using encodings like Protocol Buffers where fields - are enumerated but not named on the wire), and to enforce their types and - domain of values, which is very useful for compiled languages with type - checks. + are enumerated but not named on the wire). + +- Ability to enforce types of named fields, which is very useful for compiled + languages with type checks. - Flexibility to represent less frequent data via key/value pair lists. This includes well-known data that has standardized semantics as well as arbitrary @@ -124,9 +126,9 @@ When designing this data model I followed the following reasoning to make a decision about when to use use a top-level named field: - The field needs to be either mandatory for all records or be frequently - present in well-known log and event formats (such as timestamp) or is expected - to be often present in log records in upcoming logging systems (such as - trace_id). + present in well-known log and event formats (such as `timestamp`) or is + expected to be often present in log records in upcoming logging systems (such + as `trace_id`). - The field’s semantics must be the same for all known log and event formats and can be mapped directly and unambiguously to this data model. 
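To make the two field kinds concrete, here is a rough sketch (Python chosen purely for illustration; the data model itself is language-neutral and nothing here is normative): named top-level fields whose types can be checked, plus two free-form key/value lists.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Illustrative sketch only: top-level named fields with enforceable
# types, plus the two key/value pair lists (resource and attributes)
# that hold arbitrary values of the "any" type.
@dataclass
class LogRecord:
    timestamp: Optional[int] = None        # uint64 nanoseconds since Unix epoch
    trace_id: Optional[bytes] = None
    span_id: Optional[bytes] = None
    trace_flags: Optional[int] = None
    severity_text: Optional[str] = None
    severity_number: Optional[int] = None
    short_name: Optional[str] = None
    body: Any = None                       # scalar, array or map of values
    resource: Dict[str, Any] = field(default_factory=dict)
    attributes: Dict[str, Any] = field(default_factory=dict)

record = LogRecord(
    timestamp=1586960586000000000,
    severity_text="Informational",
    severity_number=9,
    body="Something happened",
    resource={"service.name": "donut_shop"},
    attributes={"http.status_code": 500},
)
```

A wire format would typically enumerate the named fields (e.g. as Protocol Buffer field numbers) while encoding `resource` and `attributes` as generic key/value lists.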
@@ -134,22 +136,16 @@ decision about when to use use a top-level named field: Both of the above conditions were required to give the field a place in the top-level structure of the record. -## Logical Data Model vs Physical Format - -The data model does not define the actual encoding and format of the log record -representation. Format definitions will be done in separate OTEPs (e.g the log -records may be represented as msgpack, JSON, Protocol Buffer messages, etc). - ## Log and Event Record Definition -Note: below we use type any, which can be a scalar value (number, string or +Note: below we use type `any`, which can be a scalar value (number, string or boolean), or an array or map of values. Arbitrary deep nesting of values for arrays and maps is allowed (essentially allow to represent an equivalent of a JSON object). -Appendix A contains many examples that show how existing log formats map to the -fields defined below. If there are questions about the meaning of the field -reviewing the examples may be helpful. +[Appendix A](#appendix-a-example-mappings) contains many examples that show how +existing log formats map to the fields defined below. If there are questions +about the meaning of the field reviewing the examples may be helpful. Here is the list of fields in a log record: @@ -168,42 +164,43 @@ attributes | Below is the detailed description of each field. - ### Field: `timestamp` -Type: Timestamp, uint64 nanosecods since Unix epoch +Type: Timestamp, uint64 nanosecods since Unix epoch. Description: Time when the event occurred measured by the origin clock. This field is optional, it may be missing the timestamp is unknown. -### Trace Context: +### Trace Context Fields #### Field: `trace_id` -Type: byte sequence +Type: byte sequence. Description: Optional request trace id. Can be set for logs that are part of request processing and have an assigned trace id. #### Field: `span_id` -Type: byte sequence +Type: byte sequence. Description: Optional span id. 
Can be set for logs that are part of a particular processing span. If span_id is present trace_id SHOULD be also present. #### Field: `trace_flags` -Type: byte +Type: byte. -Description: Optional trace flag as defined in W3C trace context specification. -At the time of writing the specification defines one flag - the SAMPLED flag. +Description: Optional trace flag as defined in +[W3C trace context](https://www.w3.org/TR/trace-context/#trace-flags) +specification. At the time of writing the specification defines one flag - the +SAMPLED flag. -### Severity +### Severity Fields #### Field: `severity_text` -Type: string +Type: string. Description: the severity text (also known as log level). This is an optional field and is the original string representation as it is known at the source. If @@ -212,7 +209,7 @@ corresponds to the `severity_number` can be used as a substitution. #### Field: `severity_number` -Type: number +Type: number. Description: numerical value of the severity, normalized to values described in this document. This is an optional field. If `severity_number` is missing and @@ -227,7 +224,7 @@ defines the meaning of `severity_number` value: severity_number range|Range name|Meaning ---------------------|----------|------- 1-4 |TRACE |A fine-grained debugging event. Typically disabled in default configurations. -5-8 |DEBUG |A debugging event. Often is not emitted in default configurations. +5-8 |DEBUG |A debugging event. 9-12 |INFO |An informational event. Indicates that an event happened. 13-16 |WARN |A warning event. Not an error but is likely more important than an informational event. 17-20 |ERROR |An error event. Something went wrong. @@ -240,10 +237,10 @@ critical than an error with `severity_number=20`. 
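Since each range in the table above is a contiguous block of four values, the range name for a given `severity_number` can be computed directly. A minimal illustrative helper (hypothetical, not part of the proposal):

```python
# Resolve a severity_number (1..24) to the range name defined in the
# table above. Each range covers 4 consecutive values: 1-4, 5-8, ... 21-24.
RANGE_NAMES = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]

def severity_range_name(severity_number: int) -> str:
    if not 1 <= severity_number <= 24:
        raise ValueError("severity_number must be in the range 1..24")
    return RANGE_NAMES[(severity_number - 1) // 4]
```

Both `severity_range_name(17)` and `severity_range_name(20)` resolve to ERROR; per the rule above, the record with the larger value describes the more critical error.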
*Mapping of `severity_number`* -Mappings from existing logging systems and formats (of source format for short) -must define how severity (or log level) of that particular format corresponds to -`severity_number` of this data model based on the meaning listed for each range in -the above table. +Mappings from existing logging systems and formats (or **source format** for +short) must define how severity (or log level) of that particular format +corresponds to `severity_number` of this data model based on the meaning given +for each range in the above table. If the source format has more than one severity that matches a single range in this table then the severities of the source format must be assigned numerical @@ -256,29 +253,29 @@ the following `severity_number` values for the mapping: "Error"->17, "Critical"->18. If the source format has only a single severity that matches the meaning of the -range then it is recommended to assign that severity the initial value of the -range. +range then it is recommended to assign that severity the smallest value of the +range. For example if the source format has an "Informational" log level and no other -log levels with similar meaning then it is recommended to use `severity_number=9` -for "Informational". +log levels with similar meaning then it is recommended to use +`severity_number=9` for "Informational". Source formats that do not define a concept of severity or log level MAY omit -`severity_number` and severity_text fields. Backend and UIs may represent log +`severity_number` and `severity_text` fields. Backend and UI may represent log records with missing severity information distinctly or may interpret log records with missing `severity_number` and `severity_text` fields as if the `severity_number` was set equal to INFO (numeric value of 9). 
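The mapping rules above can be sketched for a hypothetical source format that defines "Informational", "Warning", "Error" and "Critical" levels. The numeric choices follow the examples in the text (consecutive values within a matching range, smallest value when only one severity matches a range); the function name is made up for illustration:

```python
# Illustrative severity mapping for a hypothetical source format.
SEVERITY_MAP = {
    "Informational": 9,   # only severity matching the INFO range -> smallest value
    "Warning": 13,        # only severity matching the WARN range -> smallest value
    "Error": 17,          # less severe of the two ERROR-range levels
    "Critical": 18,       # more severe -> the next numerical value in the range
}

def map_severity_number(source_level):
    # Returns None when the source has no severity concept; the
    # severity_number field MAY then be omitted from the record.
    if source_level is None:
        return None
    return SEVERITY_MAP.get(source_level)
```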
*Reverse Mapping* -When performing a reverse mapping from `severity_number` to a specific format and -the `severity_number` has no corresponding mapping entry for that format then it -is recommended to choose the target severity that is in the same severity range -and is closest numerically. +When performing a reverse mapping from `severity_number` to a specific format +and the `severity_number` has no corresponding mapping entry for that format +then it is recommended to choose the target severity that is in the same +severity range and is closest numerically. For example Zap has only one severity in the INFO range, called "Info". When -doing reverse mapping all values in INFO range (numeric 9-12) will be mapped to -Log4J’s "Info" level. +doing reverse mapping all `severity_number` values in INFO range (numeric 9-12) +will be mapped to Zap’s "Info" level. *Error Semantics* @@ -286,20 +283,22 @@ If `severity_number` is present and has a value of ERROR (numeric 17) or higher then it is an indication that the log record represents an erroneous situation. It is up to the reader of this value to make a decision on how to use this fact (e.g. UIs may display such errors in a different color or have a feature to find -all "errors"). +all erroneous log records). If the log record represents an erroneous event and the source format does not -define a severity or log record field it is recommended to set severity_number -to ERROR (numeric 17) during the mapping process. If the log record represents a -non-erroneous event the `severity_number` field may be omitted or may be set to -any numeric value less than ERROR (numeric 17). The recommended value in this -case is INFO (numeric 9). See Appendix for more mapping examples. +define a severity or log level concept then it is recommended to set +`severity_number` to ERROR (numeric 17) during the mapping process. 
If the log
+record represents a non-erroneous event the `severity_number` field may be
+omitted or may be set to any numeric value less than ERROR (numeric 17). The
+recommended value in this case is INFO (numeric 9). See
+[Appendix B](#appendix-b-severitynumber-example-mappings) for more mapping
+examples.

#### Displaying Severity

The following table defines the recommended short name for each
-`severity_number` value (this can be used for example for representing the
-`severity_number` in the UI):
+`severity_number` value. The short name can be used for example for representing
+the `severity_number` in the UI:

severity_number|Short Name
---------------|----------
@@ -329,22 +328,22 @@ severity_number|Short Name
24             |FATAL4

When an individual log record is displayed it is recommended to show both
-severity_text and seveirty_number values. A recommended combined string in this
-case begins with the short name followed by severity_text in parenthesis.
+`severity_text` and `severity_number` values. A recommended combined string in
+this case begins with the short name followed by `severity_text` in parenthesis.

-For example "Informational" Syslog record will be displayed as INFO
-(Informational). When for a particular log record the `severity_number` is
-defined but the severity_text is missing it is recommended to only show the
-short name, e.g. INFO.
+For example "Informational" Syslog record will be displayed as **INFO
+(Informational)**. When for a particular log record the `severity_number` is
+defined but the `severity_text` is missing it is recommended to only show the
+short name, e.g. **INFO**.

-When drop down lists or other UI elements that are intended to represent the
-possible set of values are used for representing the severity it is likely
-preferable to use the short names. 
+When drop down lists (or other UI elements that are intended to represent the +possible set of values) are used for representing the severity it is preferable +to display the short name in such UI elements. For example a dropdown list of severities that allows filtering log records by severities is likely to be more usable if it contains the short names of `severity_number` (and thus has a limited upper bound of elements) compared to a -dropdown list which lists all distinct severity_text values that are known to +dropdown list, which lists all distinct `severity_text` values that are known to the system (which can be a large number of elements, often differing only in capitalization or abbreviated, e.g. "Info" vs "Information"). @@ -356,16 +355,16 @@ another `severity_number` or to numbers in the 1..24 range (or to the corresponding short names). When severity is used in equality or inequality comparisons (for example in -filters in the UIs) the recommendation is to attempt to use both severity_text +filters in the UIs) the recommendation is to attempt to use both `severity_text` and short name of `severity_number` to perform matches. For example if we have a -record with severity_text field equal to "Informational" and `severity_number` +record with `severity_text` field equal to "Informational" and `severity_number` field equal to INFO then it may be preferable from the user experience -perspective to ensure that severity="Informational" and severity="INFO" +perspective to ensure that **severity="Informational"** and **severity="INFO"** conditions both to are TRUE for that record. ### Field: `short_name` -Type: string +Type: string. Description: Short event identifier that does not contain varying parts. `short_name` describes what happened (e.g. "ProcessStarted"). Recommended to be @@ -374,57 +373,56 @@ Typically used for filtering and grouping purposes in backends. ### Field: `body` -Type: any +Type: any. 
Description: A value containing the body of the log record (see the description -of any type above). Can be for example a human-readable string message +of `any` type above). Can be for example a human-readable string message (including multi-line) describing the event in a free form or it can be a structured data composed of arrays and maps of other values. Can vary for each occurrence of the event coming from the same source. ### Field: `resource` -Type: key/value pair list +Type: key/value pair list. Description: Describes the source of the log, aka [resource](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/overview.md#resources). -"value" is of any type. Multiple occurrences of events coming from the same -event source can happen across time and they all have the same value of -resource. Can contain for example information about the application that emits -the record or about the infrastructure where the application runs. Data formats -that represent this data model may be designed in a manner that allows the -`resource` field to be recorded only once per batch of log records that come -from the same source. SHOULD follow OpenTelemetry -[semantic conventions](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource/semantic_conventions) -for Resources. +"value" of each pair is of `any` type. Multiple occurrences of events coming +from the same event source can happen across time and they all have the same +value of `resource`. Can contain for example information about the application +that emits the record or about the infrastructure where the application runs. +Data formats that represent this data model may be designed in a manner that +allows the `resource` field to be recorded only once per batch of log records +that come from the same source. 
SHOULD follow OpenTelemetry +[semantic conventions for Resources](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource/semantic_conventions). ### Field: `attributes` -Type: key/value pair list +Type: key/value pair list. Description: Additional information about the specific event occurrence. "value" -is of any type. Unlike the resource field, which is fixed for a particular -source, attributes can vary for each occurrence of the event coming from the same -source. Can contain information about the request context (other than -TraceId/SpanId). SHOULD follow OpenTelemetry -[semantic conventions](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/trace/semantic_conventions) -for Attributes. +of each pair is of `any` type. Unlike the `resource` field, which is fixed for a +particular source, `attributes` can vary for each occurrence of the event coming +from the same source. Can contain information about the request context (other +than TraceId/SpanId). SHOULD follow OpenTelemetry +[semantic conventions for Attributes](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/trace/semantic_conventions). -## Representation +## Example Log Records -Representation of Log Data over the wire and the transmission protocol is to be -defined. +Below are examples that show one possible representation of log records in JSON. +These are just examples to help understand the data model. Don’t treat the +examples as _the_ way to represent this data model in JSON. -Below are examples that show one possible representation in JSON. Note: this is -just an example to help understand the data model, don’t treat it as the way to -represent this data model in JSON, that is still to be defined. +This document does not define the actual encoding and format of the log record +representation. 
Format definitions will be done in separate OTEPs (e.g. the log
+records may be represented as msgpack, JSON, Protocol Buffer messages, etc).

Example 1

```json
{
  "timestamp": 1586960586000, // JSON needs to make a decision about
-                             // how to represent nanos.
+                             // how to represent nanoseconds.
  "attributes": {
    "http.status_code": 500,
    "http.url": "http://example.com",
@@ -433,7 +431,7 @@ Example 1
  "resource": {
    "service.name": "donut_shop",
    "service.version": "semver:2.0.0",
-    "k8s.pod.name": "1138528c-c36e-11e9-a1a7-42010a800198",
+    "k8s.pod.uid": "1138528c-c36e-11e9-a1a7-42010a800198",
  },
  "trace_id": "f4dbb3edd765f620", // this is a byte sequence
                                  // (hex-encoded in JSON)
@@ -459,20 +457,41 @@ Example 2
}
```

+## Open Questions
+
+- Should we store entire
+  [W3C Trace Context](https://www.w3.org/TR/trace-context/), including
+  `traceparent` and `tracestate` fields instead of only `trace_flags`?
+
+- Is the design of the `severity_text`/`severity_number` fields good enough?
+
+- Early draft of this proposal specified that `timestamp` should be populated
+  from a monotonic, NTP-synchronized source. I removed this requirement to avoid
+  confusion. Do we need any requirements for timestamp sources?
+
+- Is there a need for special treatment of security logs?
+
+## Alternate Design
+
+An
+[alternate design](https://docs.google.com/document/d/1ix9_4TQO3o-qyeyNhcOmqAc1MTyr-wnXxxsdWgCMn9c/edit?ts=5e990fe2#heading=h.cw69q2ga62p6)
+that used an envelope approach was considered but I did not find it to be overall
+better than this one.
+
## Prior Art

### RFC5424 Syslog

-RFC5424 defines structure log data format and protocol. The protocol is
-ubiquitous (although unfortunately many implementations don’t follow structured
-data recommendations). Here are some drawbacks that do not make Syslog a serious
-contender for a data model:
+[RFC5424](https://tools.ietf.org/html/rfc5424) defines structured log data
+format and protocol. 
The protocol is ubiquitous (although unfortunately many
+implementations don’t follow structured data recommendations). Here are some
+drawbacks that keep Syslog from being a serious contender for a data model:

- While it allows structured attributes the body of the message can be only a
  string.

- Severity is hard-coded to 8 possible numeric values, and does not allow custom
-  severity labels.
+  severity texts.

- Structured data does not allow arbitrary nesting and is 2-level only.

@@ -482,8 +501,9 @@ contender for a data model:

### Fluentd Forward Protocol Model

-Forward protocol defines a log Entry concept as a timestamped record. The record
-consists of 2 elements: a tag and a map of arbitrary key/value pairs.
+[Forward protocol](https://github.com/fluent/fluentd/wiki/Forward-Protocol-Specification-v1)
+defines a log Entry concept as a timestamped record. The record consists of 2
+elements: a tag and a map of arbitrary key/value pairs.

The model is universal enough to represent any log record. However, here are
some drawbacks:
@@ -498,7 +518,6 @@ some drawbacks:
  values. This lack of any naming convention or standardization of key/value
  pairs makes interoperability difficult.

-
## Appendix A. Example Mappings.

This section contains examples of mapping of legacy events and logs formats to
@@ -566,7 +585,7 @@ the new universal data model.
array of maps of string to string
A variety of use cases depending on the SDID:
Can describe event source identity
-Can include data that describes particular occurence of the event.
+Can include data that describes particular occurrence of the event.
Can be meta-information, e.g. quality of timestamp value.
SDID origin.swVersion map to resource["service.version"]
@@ -667,7 +686,7 @@ Rest of SDIDs -> attributes["syslog.*"]
Properties
map of string to any
- Additional information about the specific event occurrence. 
Unlike Dimensions which are fixed for a particular event source, Properties can have different values for each occurence of the event coming from the same event source. + Additional information about the specific event occurrence. Unlike Dimensions which are fixed for a particular event source, Properties can have different values for each occurrence of the event coming from the same event source. attributes @@ -928,25 +947,22 @@ Rest of SDIDs -> attributes["syslog.*"] ## Appendix B: `severity_number` example mappings. -Syslog |WinEvtLog |Log4j |Zap |severity_number --------------|-----------|------|------|--------------- -- |- |TRACE |- |TRACE -Debug |Verbose |DEBUG |Debug |DEBUG -Informational|Information|INFO |Info |INFO -Notice | | | |INFO2 -Warning |Warning |WARN |Warn |WARN -Error |Error |ERROR |Error |ERROR -Critical |Critical |- |Dpanic|ERROR2 -- |- |- |Panic |ERROR3 -Alert |- |FATAL |Fatal |FATAL +|Syslog |WinEvtLog |Log4j |Zap |severity_number| +|-------------|-----------|------|------|---------------| +| | |TRACE | |TRACE | +|Debug |Verbose |DEBUG |Debug |DEBUG | +|Informational|Information|INFO |Info |INFO | +|Notice | | | |INFO2 | +|Warning |Warning |WARN |Warn |WARN | +|Error |Error |ERROR |Error |ERROR | +|Critical |Critical | |Dpanic|ERROR2 | +| | | |Panic |ERROR3 | +|Alert | |FATAL |Fatal |FATAL | -## Reference +## References - Draft discussion of Data Model: https://docs.google.com/document/d/1ix9_4TQO3o-qyeyNhcOmqAc1MTyr-wnXxxsdWgCMn9c/edit# -- Example mappings of existing log formats to this Data Model: - https://docs.google.com/document/d/1ix9_4TQO3o-qyeyNhcOmqAc1MTyr-wnXxxsdWgCMn9c/edit?ts=5e990fe2#heading=h.ud60xroz7j2n - - Discussion of Severity field: https://docs.google.com/document/d/1WQDz1jF0yKBXe3OibXWfy3g6lor9SvjZ4xT-8uuDCiA/edit# \ No newline at end of file From 784e409408ad50b35d99772e6192004b8a3385bd Mon Sep 17 00:00:00 2001 From: Tigran Najaryan Date: Tue, 28 Apr 2020 17:52:52 -0400 Subject: [PATCH 3/8] Address PR comments --- 
text/0097-log-data-model.md | 346 +++++++++++++++++++----------------- 1 file changed, 183 insertions(+), 163 deletions(-) diff --git a/text/0097-log-data-model.md b/text/0097-log-data-model.md index 35a16cd31..853ed0c14 100644 --- a/text/0097-log-data-model.md +++ b/text/0097-log-data-model.md @@ -7,36 +7,39 @@ Introduce Data Model for Log Records as it is understood by OpenTelemetry. * [Requirements](#requirements) * [Field Kinds](#field-kinds) * [Log and Event Record Definition](#log-and-event-record-definition) - * [Field: `timestamp`](#field-timestamp) + * [Field: `Timestamp`](#field-timestamp) * [Trace Context Fields](#trace-context-fields) - * [Field: `trace_id`](#field-traceid) - * [Field: `span_id`](#field-spanid) - * [Field: `trace_flags`](#field-traceflags) + * [Field: `TraceId`](#field-traceid) + * [Field: `SpanId`](#field-spanid) + * [Field: `TraceFlags`](#field-traceflags) * [Severity Fields](#severity-fields) - * [Field: `severity_text`](#field-severitytext) - * [Field: `severity_number`](#field-severitynumber) + * [Field: `SeverityText`](#field-severitytext) + * [Field: `SeverityNumber`](#field-severitynumber) + * [Mapping of `SeverityNumber`](#mapping-of-severitynumber) + * [Reverse Mapping](#reverse-mapping) + * [Error Semantics](#error-semantics) * [Displaying Severity](#displaying-severity) * [Comparing Severity](#comparing-severity) - * [Field: `short_name`](#field-shortname) - * [Field: `body`](#field-body) - * [Field: `resource`](#field-resource) - * [Field: `attributes`](#field-attributes) + * [Field: `ShortName`](#field-shortname) + * [Field: `Body`](#field-body) + * [Field: `Resource`](#field-resource) + * [Field: `Attributes`](#field-attributes) * [Example Log Records](#example-log-records) * [Open Questions](#open-questions) * [Alternate Design](#alternate-design) * [Prior Art](#prior-art) * [RFC5424 Syslog](#rfc5424-syslog) * [Fluentd Forward Protocol Model](#fluentd-forward-protocol-model) -* [Appendix A. 
Example Mappings.](#appendix-a-example-mappings) +* [Appendix A. Example Mappings](#appendix-a-example-mappings) * [RFC5424 Syslog](#rfc5424-syslog-1) * [Windows Event Log](#windows-event-log) * [SignalFx Events](#signalfx-events) * [Splunk HEC](#splunk-hec) * [Log4j](#log4j) * [Zap](#zap) - * [Apache](#apache) + * [Apache HTTP Server access log](#apache-http-server-access-log) * [CloudTrail Log Event](#cloudtrail-log-event) -* [Appendix B: `severity_number` example mappings.](#appendix-b-severitynumber-example-mappings) +* [Appendix B: `SeverityNumber` example mappings](#appendix-b-severitynumber-example-mappings) * [References](#references) ## Motivation @@ -51,6 +54,12 @@ The purpose of the data model is to have a common understanding of what a log record is, what data needs to be recorded, transferred, stored and interpreted by a logging system. +This proposal defines a data model for [Standalone +Logs](https://github.com/open-telemetry/oteps/blob/master/text/logs/0091-logs-vocabulary.md#standalone-log). +Relevant parts of it may be adopted for +[Embedded Logs](https://github.com/open-telemetry/oteps/blob/master/text/logs/0091-logs-vocabulary.md#embedded-log) +in a future OTEP. + ## Design Notes ### Requirements @@ -106,7 +115,7 @@ fields: of different types. The keys and values for well-known fields follow semantic conventions for key names and possible values that allow all parties that work with the field to have the same interpretation of the data. See references to - semantic conventions for `resource` and `attributes` fields and examples in + semantic conventions for `Resource` and `Attributes` fields and examples in [Appendix A](#appendix-a-example-mappings). 
The reasons for having these 2 kinds of fields are: @@ -126,9 +135,9 @@ When designing this data model I followed the following reasoning to make a decision about when to use use a top-level named field: - The field needs to be either mandatory for all records or be frequently - present in well-known log and event formats (such as `timestamp`) or is + present in well-known log and event formats (such as `Timestamp`) or is expected to be often present in log records in upcoming logging systems (such - as `trace_id`). + as `TraceId`). - The field’s semantics must be the same for all known log and event formats and can be mapped directly and unambiguously to this data model. @@ -149,97 +158,98 @@ about the meaning of the field reviewing the examples may be helpful. Here is the list of fields in a log record: -Field Name | ----------------| -timestamp | -trace_id | -span_id | -trace_flags | -severity_text | -severity_number| -short_name | -body | -resource | -attributes | +Field Name |Description +---------------|-------------------------------------------- +Timestamp |Time when the event occurred. +TraceId |Request trace id. +SpanId |Request span id. +TraceFlags |W3C trace flag. +SeverityText |The severity text (also known as log level). +SeverityNumber |Numerical value of the severity. +ShortName |Short event identifier. +Body |The body of the log record. +Resource |Describes the source of the log. +Attributes |Additional information about the event. Below is the detailed description of each field. -### Field: `timestamp` +### Field: `Timestamp` -Type: Timestamp, uint64 nanosecods since Unix epoch. +Type: Timestamp, uint64 nanoseconds since Unix epoch. Description: Time when the event occurred measured by the origin clock. This field is optional, it may be missing the timestamp is unknown. ### Trace Context Fields -#### Field: `trace_id` +#### Field: `TraceId` Type: byte sequence. -Description: Optional request trace id. 
Can be set for logs that are part of -request processing and have an assigned trace id. +Description: Optional request trace id as defined in +[W3C Trace Context](https://www.w3.org/TR/trace-context/#trace-id). Can be set +for logs that are part of request processing and have an assigned trace id. -#### Field: `span_id` +#### Field: `SpanId` Type: byte sequence. Description: Optional span id. Can be set for logs that are part of a particular -processing span. If span_id is present trace_id SHOULD be also present. +processing span. If SpanId is present TraceId SHOULD be also present. -#### Field: `trace_flags` +#### Field: `TraceFlags` Type: byte. Description: Optional trace flag as defined in -[W3C trace context](https://www.w3.org/TR/trace-context/#trace-flags) +[W3C Trace Context](https://www.w3.org/TR/trace-context/#trace-flags) specification. At the time of writing the specification defines one flag - the SAMPLED flag. ### Severity Fields -#### Field: `severity_text` +#### Field: `SeverityText` Type: string. Description: the severity text (also known as log level). This is an optional field and is the original string representation as it is known at the source. If -this field is missing and `severity_number` is present then the short name that -corresponds to the `severity_number` can be used as a substitution. +this field is missing and `SeverityNumber` is present then the short name that +corresponds to the `SeverityNumber` can be used as a substitution. -#### Field: `severity_number` +#### Field: `SeverityNumber` Type: number. Description: numerical value of the severity, normalized to values described in -this document. This is an optional field. If `severity_number` is missing and -severity_text is present then it may be assumed that `severity_number` is equal +this document. This is an optional field. 
If `SeverityNumber` is missing and +SeverityText is present then it may be assumed that `SeverityNumber` is equal to INFO (numeric 9) (see the meaning below). -`severity_number` is an integer number. Smaller numerical values correspond to +`SeverityNumber` is an integer number. Smaller numerical values correspond to less severe events (such as debug events), larger numerical values correspond to more severe events (such as errors and critical events). The following table -defines the meaning of `severity_number` value: +defines the meaning of `SeverityNumber` value: -severity_number range|Range name|Meaning ----------------------|----------|------- -1-4 |TRACE |A fine-grained debugging event. Typically disabled in default configurations. -5-8 |DEBUG |A debugging event. -9-12 |INFO |An informational event. Indicates that an event happened. -13-16 |WARN |A warning event. Not an error but is likely more important than an informational event. -17-20 |ERROR |An error event. Something went wrong. -21-24 |FATAL |A fatal error such as application or system crash. +SeverityNumber range|Range name|Meaning +--------------------|----------|------- +1-4 |TRACE |A fine-grained debugging event. Typically disabled in default configurations. +5-8 |DEBUG |A debugging event. +9-12 |INFO |An informational event. Indicates that an event happened. +13-16 |WARN |A warning event. Not an error but is likely more important than an informational event. +17-20 |ERROR |An error event. Something went wrong. +21-24 |FATAL |A fatal error such as application or system crash. Smaller numerical values in each range represent less important (less severe) events. Larger numerical values in each range represent more important (more -severe) events. For example `severity_number=17` describes an error that is less -critical than an error with `severity_number=20`. +severe) events. For example `SeverityNumber=17` describes an error that is less +critical than an error with `SeverityNumber=20`. 
-*Mapping of `severity_number`* +#### Mapping of `SeverityNumber` Mappings from existing logging systems and formats (or **source format** for short) must define how severity (or log level) of that particular format -corresponds to `severity_number` of this data model based on the meaning given +corresponds to `SeverityNumber` of this data model based on the meaning given for each range in the above table. If the source format has more than one severity that matches a single range in @@ -249,7 +259,7 @@ is. For example if the source format defines "Error" and "Critical" as error events and "Critical" is a more important and more severe situation then we can choose -the following `severity_number` values for the mapping: "Error"->17, +the following `SeverityNumber` values for the mapping: "Error"->17, "Critical"->18. If the source format has only a single severity that matches the meaning of the @@ -258,28 +268,28 @@ range. For example if the source format has an "Informational" log level and no other log levels with similar meaning then it is recommended to use -`severity_number=9` for "Informational". +`SeverityNumber=9` for "Informational". Source formats that do not define a concept of severity or log level MAY omit -`severity_number` and `severity_text` fields. Backend and UI may represent log +`SeverityNumber` and `SeverityText` fields. Backend and UI may represent log records with missing severity information distinctly or may interpret log -records with missing `severity_number` and `severity_text` fields as if the -`severity_number` was set equal to INFO (numeric value of 9). +records with missing `SeverityNumber` and `SeverityText` fields as if the +`SeverityNumber` was set equal to INFO (numeric value of 9). 
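As an illustration of these mapping rules, a source format with several severities matching one range assigns them increasing values, a lone match takes the smallest value of its range, and records without severity may be treated as INFO. The sketch below is hypothetical and simply mirrors the Syslog example mapping given in Appendix B:

```python
# Hypothetical sketch of a Syslog-to-SeverityNumber mapping that follows the
# rules above; the values are taken from the example mapping in Appendix B.
SYSLOG_TO_SEVERITY_NUMBER = {
    "Debug": 5,          # sole DEBUG-range match: smallest value in the range
    "Informational": 9,  # sole plain-INFO match: smallest value in the range
    "Notice": 10,        # more important than Informational, so INFO2
    "Warning": 13,       # WARN
    "Error": 17,         # least severe of the error-like levels
    "Critical": 18,      # more severe than "Error", so the next value up
    "Emergency": 19,     # ERROR3
    "Alert": 21,         # FATAL
}

def severity_number(source_severity):
    # A record with a missing or unknown severity may be interpreted as INFO.
    return SYSLOG_TO_SEVERITY_NUMBER.get(source_severity, 9)

assert severity_number("Critical") == 18  # "Error"->17, "Critical"->18
assert severity_number(None) == 9         # missing severity treated as INFO
```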
-*Reverse Mapping* +#### Reverse Mapping -When performing a reverse mapping from `severity_number` to a specific format -and the `severity_number` has no corresponding mapping entry for that format +When performing a reverse mapping from `SeverityNumber` to a specific format +and the `SeverityNumber` has no corresponding mapping entry for that format then it is recommended to choose the target severity that is in the same severity range and is closest numerically. For example Zap has only one severity in the INFO range, called "Info". When -doing reverse mapping all `severity_number` values in INFO range (numeric 9-12) +doing reverse mapping all `SeverityNumber` values in INFO range (numeric 9-12) will be mapped to Zap’s "Info" level. -*Error Semantics* +#### Error Semantics -If `severity_number` is present and has a value of ERROR (numeric 17) or higher +If `SeverityNumber` is present and has a value of ERROR (numeric 17) or higher then it is an indication that the log record represents an erroneous situation. It is up to the reader of this value to make a decision on how to use this fact (e.g. UIs may display such errors in a different color or have a feature to find @@ -287,53 +297,53 @@ all erroneous log records). If the log record represents an erroneous event and the source format does not define a severity or log level concept then it is recommended to set -`severity_number` to ERROR (numeric 17) during the mapping process. If the log -record represents a non-erroneous event the `severity_number` field may be +`SeverityNumber` to ERROR (numeric 17) during the mapping process. If the log +record represents a non-erroneous event the `SeverityNumber` field may be omitted or may be set to any numeric value less than ERROR (numeric 17). The recommended value in this case is INFO (numeric 9). See [Appendix B](#appendix-b-severitynumber-example-mappings) for more mapping examples. 
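The reverse-mapping rule ("same severity range, closest numerically") can be sketched as follows. This is illustrative only: the Zap level table reflects the example mapping in Appendix B, and the fallback for ranges containing no matching level is an assumption, not part of the proposal:

```python
# Illustrative reverse mapping from SeverityNumber to Zap levels.
# Each Zap level is keyed by the SeverityNumber chosen for it in Appendix B.
ZAP_LEVELS = {5: "Debug", 9: "Info", 13: "Warn", 17: "Error",
              18: "Dpanic", 19: "Panic", 21: "Fatal"}

def to_zap_level(severity_number):
    """Pick the target severity in the same range, closest numerically."""
    range_start = ((severity_number - 1) // 4) * 4 + 1
    same_range = [n for n in ZAP_LEVELS if range_start <= n <= range_start + 3]
    if same_range:
        return ZAP_LEVELS[min(same_range, key=lambda n: abs(n - severity_number))]
    # Assumption: if the range has no matching level at all, fall back to the
    # numerically closest level overall.
    return ZAP_LEVELS[min(ZAP_LEVELS, key=lambda n: abs(n - severity_number))]

assert to_zap_level(12) == "Info"   # all of INFO range 9..12 maps to "Info"
assert to_zap_level(20) == "Panic"  # closest error-range level to 20
```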
-#### Displaying Severity +#### Displaying Severity The following table defines the recommended short name for each -`severity_number` value. The hosrt name can be used for example for representing -the `severity_number` in the UI: - -severity_number|Short Name ----------------|---------- -1 |TRACE -2 |TRACE2 -3 |TRACE3 -4 |TRACE4 -5 |DEBUG -6 |DEBUG2 -7 |DEBUG3 -8 |DEBUG4 -9 |INFO -10 |INFO2 -11 |INFO3 -12 |INFO4 -13 |WARN -14 |WARN2 -15 |WARN3 -16 |WARN4 -17 |ERROR -18 |ERROR2 -19 |ERROR3 -20 |ERROR4 -21 |FATAL -22 |FATAL2 -23 |FATAL3 -24 |FATAL4 +`SeverityNumber` value. The short name can be used for example for representing +the `SeverityNumber` in the UI: + +SeverityNumber|Short Name +--------------|---------- +1 |TRACE +2 |TRACE2 +3 |TRACE3 +4 |TRACE4 +5 |DEBUG +6 |DEBUG2 +7 |DEBUG3 +8 |DEBUG4 +9 |INFO +10 |INFO2 +11 |INFO3 +12 |INFO4 +13 |WARN +14 |WARN2 +15 |WARN3 +16 |WARN4 +17 |ERROR +18 |ERROR2 +19 |ERROR3 +20 |ERROR4 +21 |FATAL +22 |FATAL2 +23 |FATAL3 +24 |FATAL4 When an individual log record is displayed it is recommended to show both -`severity_text` and `severity_number` values. A recommended combined string in -this case begins with the short name followed by `severity_text` in parenthesis. +`SeverityText` and `SeverityNumber` values. A recommended combined string in +this case begins with the short name followed by `SeverityText` in parenthesis. For example "Informational" Syslog record will be displayed as **INFO -(Informational)**. When for a particular log record the `severity_number` is -defined but the `severity_text` is missing it is recommended to only show the +(Informational)**. When for a particular log record the `SeverityNumber` is +defined but the `SeverityText` is missing it is recommended to only show the short name, e.g. **INFO**. When drop down lists (or other UI elements that are intended to represent the @@ -342,36 +352,36 @@ to display the short name in such UI elements. 
For example a dropdown list of severities that allows filtering log records by
severities is likely to be more usable if it contains the short names of
-`severity_number` (and thus has a limited upper bound of elements) compared to a
-dropdown list, which lists all distinct `severity_text` values that are known to
+`SeverityNumber` (and thus has a limited upper bound of elements) compared to a
+dropdown list, which lists all distinct `SeverityText` values that are known to
the system (which can be a large number of elements, often differing only in
capitalization or abbreviated, e.g. "Info" vs "Information").

#### Comparing Severity

In the contexts where severity participates less-than / greater-than comparisons
-`severity_number` field should be used. `severity_number` can be compared to
-another `severity_number` or to numbers in the 1..24 range (or to the
+`SeverityNumber` field should be used. `SeverityNumber` can be compared to
+another `SeverityNumber` or to numbers in the 1..24 range (or to the
corresponding short names).

When severity is used in equality or inequality comparisons (for example in
-filters in the UIs) the recommendation is to attempt to use both `severity_text`
-and short name of `severity_number` to perform matches. For example if we have a
-record with `severity_text` field equal to "Informational" and `severity_number`
+filters in the UIs) the recommendation is to attempt to use both `SeverityText`
+and short name of `SeverityNumber` to perform matches. For example if we have a
+record with `SeverityText` field equal to "Informational" and `SeverityNumber`
field equal to INFO then it may be preferable from the user experience
perspective to ensure that **severity="Informational"** and **severity="INFO"**
conditions are both TRUE for that record.

-### Field: `short_name`
+### Field: `ShortName`

Type: string.

Description: Short event identifier that does not contain varying parts.
-`short_name` describes what happened (e.g. 
"ProcessStarted"). Recommended to be +`ShortName` describes what happened (e.g. "ProcessStarted"). Recommended to be no longer than 50 characters. Optional. Not guaranteed to be unique in any way. Typically used for filtering and grouping purposes in backends. -### Field: `body` +### Field: `Body` Type: any. @@ -381,7 +391,7 @@ of `any` type above). Can be for example a human-readable string message structured data composed of arrays and maps of other values. Can vary for each occurrence of the event coming from the same source. -### Field: `resource` +### Field: `Resource` Type: key/value pair list. @@ -389,20 +399,20 @@ Description: Describes the source of the log, aka [resource](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/overview.md#resources). "value" of each pair is of `any` type. Multiple occurrences of events coming from the same event source can happen across time and they all have the same -value of `resource`. Can contain for example information about the application +value of `Resource`. Can contain for example information about the application that emits the record or about the infrastructure where the application runs. Data formats that represent this data model may be designed in a manner that -allows the `resource` field to be recorded only once per batch of log records +allows the `Resource` field to be recorded only once per batch of log records that come from the same source. SHOULD follow OpenTelemetry [semantic conventions for Resources](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource/semantic_conventions). -### Field: `attributes` +### Field: `Attributes` Type: key/value pair list. Description: Additional information about the specific event occurrence. "value" -of each pair is of `any` type. 
Unlike the `resource` field, which is fixed for a -particular source, `attributes` can vary for each occurrence of the event coming +of each pair is of `any` type. Unlike the `Resource` field, which is fixed for a +particular source, `Attributes` can vary for each occurrence of the event coming from the same source. Can contain information about the request context (other than TraceId/SpanId). SHOULD follow OpenTelemetry [semantic conventions for Attributes](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/trace/semantic_conventions). @@ -419,35 +429,36 @@ records may be represented as msgpack, JSON, Protocol Buffer messages, etc). Example 1 -```json +```javascript { - "timestamp": 1586960586000, // JSON needs to make a decision about + "Timestamp": 1586960586000, // JSON needs to make a decision about // how to represent nanoseconds. - "attributes": { + "Attributes": { "http.status_code": 500, "http.url": "http://example.com", "my.custom.application.tag": "hello", }, - "resource": { + "Resource": { "service.name": "donut_shop", "service.version": "semver:2.0.0", "k8s.pod.uid": "1138528c-c36e-11e9-a1a7-42010a800198", }, - "trace_id": "f4dbb3edd765f620", // this is a byte sequence + "TraceId": "f4dbb3edd765f620", // this is a byte sequence // (hex-encoded in JSON) - "span_id": "43222c2d51a7abe3", - "severity": "INFO", - "body": "20200415T072306-0700 INFO I like donuts" + "SpanId": "43222c2d51a7abe3", + "SeverityText": "INFO", + "SeverityNumber": 9, + "Body": "20200415T072306-0700 INFO I like donuts" } ``` Example 2 -```json +```javascript { - "timestamp": 1586960586000, + "Timestamp": 1586960586000, ... 
- "body": { + "Body": { "i": "am", "an": "event", "of": { @@ -457,15 +468,32 @@ Example 2 } ``` +Example 3 + +```javascript +{ + "Timestamp": 1586960586000, + "Attributes":{ + "http.scheme":"https", + "http.host":"donut.mycie.com", + "http.target":"/order", + "http.method":"post", + "http.status_code":500, + "http.flavor":"1.1", + "http.user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36", + } +} +``` + ## Open Questions - Should we store entire [W3C Trace Context](https://www.w3.org/TR/trace-context/), including - `traceparent` and `tracestate` fields instead of only `trace_flags`? + `traceparent` and `tracestate` fields instead of only `TraceFlags`? -- Is `severity_text`/`severity_number` fields design good enough? +- Is `SeverityText`/`SeverityNumber` fields design good enough? -- Early draft of this proposal specified that `timestamp` should be populated +- Early draft of this proposal specified that `Timestamp` should be populated from a monotonic, NTP-synchronized source. I removed this requirement to avoid confusion. Do we need any requirements for timestamp sources? @@ -518,10 +546,10 @@ some drawbacks: values. This lack of any naming convention or standardization of key/value pairs makes interoperability difficult. -## Appendix A. Example Mappings. +## Appendix A. Example Mappings -This section contains examples of mapping of legacy events and logs formats to -the new universal data model. +This section contains examples of mapping of other events and logs formats to +this data model. ### RFC5424 Syslog @@ -601,7 +629,6 @@ Rest of SDIDs -> attributes["syslog.*"] - ### Windows Event Log @@ -649,7 +676,6 @@ Rest of SDIDs -> attributes["syslog.*"]
- ### SignalFx Events @@ -691,7 +717,6 @@ Rest of SDIDs -> attributes["syslog.*"]
- ### Splunk HEC @@ -751,7 +776,6 @@ Rest of SDIDs -> attributes["syslog.*"]
- ### Log4j @@ -787,7 +811,6 @@ Rest of SDIDs -> attributes["syslog.*"]
- ### Zap @@ -830,8 +853,7 @@ Rest of SDIDs -> attributes["syslog.*"]
- -### Apache +### Apache HTTP Server access log @@ -890,7 +912,6 @@ Rest of SDIDs -> attributes["syslog.*"]
- ### CloudTrail Log Event @@ -944,25 +965,24 @@ Rest of SDIDs -> attributes["syslog.*"]
- -## Appendix B: `severity_number` example mappings. - -|Syslog |WinEvtLog |Log4j |Zap |severity_number| -|-------------|-----------|------|------|---------------| -| | |TRACE | |TRACE | -|Debug |Verbose |DEBUG |Debug |DEBUG | -|Informational|Information|INFO |Info |INFO | -|Notice | | | |INFO2 | -|Warning |Warning |WARN |Warn |WARN | -|Error |Error |ERROR |Error |ERROR | -|Critical |Critical | |Dpanic|ERROR2 | -| | | |Panic |ERROR3 | -|Alert | |FATAL |Fatal |FATAL | +## Appendix B: `SeverityNumber` example mappings + +|Syslog |WinEvtLog |Log4j |Zap |java.util.logging|SeverityNumber| +|-------------|-----------|------|------|-----------------|--------------| +| | |TRACE | | TRACE |TRACE | +|Debug |Verbose |DEBUG |Debug | FINER |DEBUG | +| | | | | FINE |DEBUG2 | +| | | | | CONFIG |DEBUG3 | +|Informational|Information|INFO |Info | INFO |INFO | +|Notice | | | | |INFO2 | +|Warning |Warning |WARN |Warn | WARNING |WARN | +|Error |Error |ERROR |Error | SEVERE |ERROR | +|Critical |Critical | |Dpanic| |ERROR2 | +|Emergency | | |Panic | |ERROR3 | +|Alert | |FATAL |Fatal | |FATAL | ## References -- Draft discussion of Data Model: - https://docs.google.com/document/d/1ix9_4TQO3o-qyeyNhcOmqAc1MTyr-wnXxxsdWgCMn9c/edit# +- [Draft discussion of Data Model](https://docs.google.com/document/d/1ix9_4TQO3o-qyeyNhcOmqAc1MTyr-wnXxxsdWgCMn9c/edit#) -- Discussion of Severity field: - https://docs.google.com/document/d/1WQDz1jF0yKBXe3OibXWfy3g6lor9SvjZ4xT-8uuDCiA/edit# \ No newline at end of file +- [Discussion of Severity field](https://docs.google.com/document/d/1WQDz1jF0yKBXe3OibXWfy3g6lor9SvjZ4xT-8uuDCiA/edit#) From da165a1abe730921428271ed3967e2b6c1d52845 Mon Sep 17 00:00:00 2001 From: Tigran Najaryan Date: Mon, 11 May 2020 12:52:40 -0400 Subject: [PATCH 4/8] Add Google Cloud Logging mapping --- text/0097-log-data-model.md | 133 ++++++++++++++++++++---------------- 1 file changed, 75 insertions(+), 58 deletions(-) diff --git a/text/0097-log-data-model.md b/text/0097-log-data-model.md 
index 853ed0c14..c2134a26e 100644 --- a/text/0097-log-data-model.md +++ b/text/0097-log-data-model.md @@ -39,6 +39,7 @@ Introduce Data Model for Log Records as it is understood by OpenTelemetry. * [Zap](#zap) * [Apache HTTP Server access log](#apache-http-server-access-log) * [CloudTrail Log Event](#cloudtrail-log-event) + * [Google Cloud Logging](#google-cloud-logging) * [Appendix B: `SeverityNumber` example mappings](#appendix-b-severitynumber-example-mappings) * [References](#references) @@ -562,51 +563,51 @@ this data model. TIMESTAMP - timestamp + Timestamp Time when an event occurred measured by the origin clock. - timestamp + Timestamp SEVERITY enum Defines the importance of the event. Example: `Debug` - severity + Severity FACILITY enum Describes where the event originated. A predefined list of Unix processes. Part of event source identity. Example: `mail system` - attributes["syslog.facility"] + Attributes["syslog.facility"] VERSION number Meta: protocol version, orthogonal to the event. - attributes["syslog.version"] + Attributes["syslog.version"] HOSTNAME string Describes the location where the event originated. Possible values are FQDN, IP address, etc. - resource["host.hostname"] + Resource["host.hostname"] APP-NAME string User-defined app name. Part of event source identity. - resource["service.name"] + Resource["service.name"] PROCID string Not well defined. May be used as a meta field for protocol operation purposes or may be part of event source identity. - attributes["syslog.procid"] + Attributes["syslog.procid"] MSGID string Defines the type of the event. Part of event source identity. Example: "TCPIN" - short_description + ShortName STRUCTURED-DATA @@ -615,17 +616,17 @@ this data model. Can describe event source identity Can include data that describes particular occurrence of the event. Can be meta-information, e.g. quality of timestamp value. 
- SDID origin.swVersion map to resource["service.version"] + SDID origin.swVersion map to Resource["service.version"] SDID origin.ip map to attribute[net.host.ip"] -Rest of SDIDs -> attributes["syslog.*"] +Rest of SDIDs -> Attributes["syslog.*"] MSG string Free-form text message about the event. Typically human readable. - body + Body @@ -640,39 +641,39 @@ Rest of SDIDs -> attributes["syslog.*"] TimeCreated - timestamp + Timestamp The time stamp that identifies when the event was logged. - timestamp + Timestamp Level enum Contains the severity level of the event. - severity + Severity Computer string The name of the computer on which the event occurred. - resource["host.hostname"] + Resource["host.hostname"] EventID uint The identifier that the provider used to identify the event. - short_description + ShortName Message string The message string. - body + Body Rest of the fields. any All other fields in the event. - attributes["winlog.*"] + Attributes["winlog.*"] @@ -687,33 +688,33 @@ Rest of SDIDs -> attributes["syslog.*"] Timestamp - timestamp + Timestamp Time when the event occurred measured by the origin clock. - timestamp + Timestamp EventType string Short machine understandable string describing the event type. SignalFx specific concept. Non-namespaced. Example: k8s Event Reason field. - short_description + ShortName Category enum Describes where the event originated and why. SignalFx specific concept. Example: AGENT. - attributes["com.splunk.signalfx.event_category"] + Attributes["com.splunk.signalfx.event_category"] Dimensions map of string to string Helps to define the identity of the event source together with EventType and Category. Multiple occurrences of events coming from the same event source can happen across time and they all have the value of Dimensions. - resource + Resource Properties map of string to any Additional information about the specific event occurrence. 
Unlike Dimensions which are fixed for a particular event source, Properties can have different values for each occurrence of the event coming from the same event source. - attributes + Attributes @@ -730,7 +731,7 @@ Rest of SDIDs -> attributes["syslog.*"] time numeric, string The event time in epoch time format, in seconds. - timestamp + Timestamp @@ -742,31 +743,31 @@ Rest of SDIDs -> attributes["syslog.*"] host string The host value to assign to the event data. This is typically the host name of the client that you are sending data from. - resource["host.hostname"] + Resource["host.hostname"] source string The source value to assign to the event data. For example, if you are sending data from an app you are developing, you could set this key to the name of the app. - resource["service.name"] + Resource["service.name"] sourcetype string The sourcetype value to assign to the event data. - attributes["source.type"] + Attributes["source.type"] event any The JSON representation of the raw body of the event. It can be a string, number, string array, number array, JSON object, or a JSON array. - body + Body fields Map of any Specifies a JSON object that contains explicit custom fields. - attributes + Attributes index @@ -787,27 +788,27 @@ Rest of SDIDs -> attributes["syslog.*"] Instant - timestamp + Timestamp Time when an event occurred measured by the origin clock. - timestamp + Timestamp Level enum Log level. - severity + Severity Message string Human readable message. - body + Body All other fields any Structured data. - attributes + Attributes @@ -822,34 +823,34 @@ Rest of SDIDs -> attributes["syslog.*"] ts - timestamp + Timestamp Time when an event occurred measured by the origin clock. - timestamp + Timestamp level enum Logging level. - severity + Severity caller string Calling function's filename and line number. - attributes, key=TBD + Attributes, key=TBD msg string Human readable message. - body + Body All other fields any Structured data. 
- attributes + Attributes @@ -864,51 +865,51 @@ Rest of SDIDs -> attributes["syslog.*"] %t - timestamp + Timestamp Time when an event occurred measured by the origin clock. - timestamp + Timestamp %a string Client IP - attributes["net.peer.ip"] + Attributes["net.peer.ip"] %A string Server IP - attributes["net.host.ip"] + Attributes["net.host.ip"] %h string Remote hostname. - attributes["net.peer.name"] + Attributes["net.peer.name"] %m string The request method. - attributes["http.method"] + Attributes["http.method"] %v,%p,%U,%q string Multiple fields that can be composed into URL. - attributes["http.url"] + Attributes["http.url"] %>s string Response status. - attributes["http.status_code"] + Attributes["http.status_code"] All other fields any Structured data. - attributes, key=TBD + Attributes, key=TBD @@ -925,46 +926,62 @@ Rest of SDIDs -> attributes["syslog.*"] eventTime string The date and time the request was made, in coordinated universal time (UTC). - timestamp + Timestamp eventSource string The service that the request was made to. This name is typically a short form of the service name without spaces plus .amazonaws.com. - resource["service.name"]? + Resource["service.name"]? awsRegion string The AWS region that the request was made to, such as us-east-2. - resource["cloud.region"] + Resource["cloud.region"] sourceIPAddress string The IP address that the request was made from. - resource["net.peer.ip"] or resource["net.host.ip"]? TBD + Resource["net.peer.ip"] or Resource["net.host.ip"]? TBD errorCode string The AWS service error if the request returns an error. - short_description + ShortName errorMessage string If the request returns an error, the description of the error. 
- body + Body All other fields * - attributes["cloudtrail.*"] + Attributes["cloudtrail.*"] +### Google Cloud Logging + +Field | Type | Description | Maps to Unified Model Field +-----------------|--------------------| ------------------------------------------------------- | --------------------------- +timestamp | string | The time the event described by the log entry occurred. | Timestamp +resource | MonitoredResource | The monitored resource that produced this log entry. | Resource +log_name | string | The URL-encoded LOG_ID suffix of the log_name field identifies which log stream this entry belongs to. | ShortName +json_payload | google.protobuf.Struct | The log entry payload, represented as a structure that is expressed as a JSON object. | Body +proto_payload | google.protobuf.Any | The log entry payload, represented as a protocol buffer. | Body +text_payload | string | The log entry payload, represented as a Unicode string (UTF-8). | Body +severity | LogSeverity | The severity of the log entry. | Severity +trace | string | The trace associated with the log entry, if any. | TraceId +span_id | string | The span ID within the trace associated with the log entry. | SpanId +labels | map | A set of user-defined (key, value) data that provides additional information about the log entry. 
| Attributes +All other fields | | | Attributes["google.*"] + ## Appendix B: `SeverityNumber` example mappings |Syslog |WinEvtLog |Log4j |Zap |java.util.logging|SeverityNumber| From 2bd16360b86d6b441adc8b0bfa8f55b5d2e8ef9e Mon Sep 17 00:00:00 2001 From: Tigran Najaryan Date: Mon, 11 May 2020 18:57:03 -0400 Subject: [PATCH 5/8] Address PR comments --- text/0097-log-data-model.md | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/text/0097-log-data-model.md b/text/0097-log-data-model.md index c2134a26e..fb13339ca 100644 --- a/text/0097-log-data-model.md +++ b/text/0097-log-data-model.md @@ -132,7 +132,7 @@ The reasons for having these 2 kinds of fields are: includes well-known data that has standardized semantics as well as arbitrary custom data that the application may want to include in the logs. -When designing this data model I followed the following reasoning to make a +When designing this data model we followed the following reasoning to make a decision about when to use use a top-level named field: - The field needs to be either mandatory for all records or be frequently @@ -216,16 +216,14 @@ Type: string. Description: the severity text (also known as log level). This is an optional field and is the original string representation as it is known at the source. If this field is missing and `SeverityNumber` is present then the short name that -corresponds to the `SeverityNumber` can be used as a substitution. +corresponds to the `SeverityNumber` may be used as a substitution. #### Field: `SeverityNumber` Type: number. Description: numerical value of the severity, normalized to values described in -this document. This is an optional field. If `SeverityNumber` is missing and -SeverityText is present then it may be assumed that `SeverityNumber` is equal -to INFO (numeric 9) (see the meaning below). +this document. This is an optional field. `SeverityNumber` is an integer number. 
Smaller numerical values correspond to less severe events (such as debug events), larger numerical values correspond to @@ -360,9 +358,9 @@ capitalization or abbreviated, e.g. "Info" vs "Information"). #### Comparing Severity -In the contexts where severity participates less-than / greater-than comparisons -`SeverityNumber` field should be used. `SeverityNumber` can be compared to -another `SeverityNumber` or to numbers in the 1..24 range (or to the +In the contexts where severity participates in less-than / greater-than +comparisons `SeverityNumber` field should be used. `SeverityNumber` can be +compared to another `SeverityNumber` or to numbers in the 1..24 range (or to the corresponding short names). When severity is used in equality or inequality comparisons (for example in @@ -986,7 +984,7 @@ All other fields | | |Syslog |WinEvtLog |Log4j |Zap |java.util.logging|SeverityNumber| |-------------|-----------|------|------|-----------------|--------------| -| | |TRACE | | TRACE |TRACE | +| | |TRACE | | FINEST |TRACE | |Debug |Verbose |DEBUG |Debug | FINER |DEBUG | | | | | | FINE |DEBUG2 | | | | | | CONFIG |DEBUG3 | From 186433615a96a2dad8d5066caa4d6b17592efcde Mon Sep 17 00:00:00 2001 From: Tigran Najaryan Date: Tue, 12 May 2020 11:18:43 -0400 Subject: [PATCH 6/8] Resolve Open Questions --- text/0097-log-data-model.md | 42 ++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 10 deletions(-) diff --git a/text/0097-log-data-model.md b/text/0097-log-data-model.md index fb13339ca..5a8072adc 100644 --- a/text/0097-log-data-model.md +++ b/text/0097-log-data-model.md @@ -484,19 +484,41 @@ Example 3 } ``` -## Open Questions +## Questions Resolved during OTEP discussion -- Should we store entire - [W3C Trace Context](https://www.w3.org/TR/trace-context/), including - `traceparent` and `tracestate` fields instead of only `TraceFlags`? - -- Is `SeverityText`/`SeverityNumber` fields design good enough? 
+These were Open Questions that were discussed and resolved in
+the [OTEP Pull Request](https://github.com/open-telemetry/oteps/pull/97).
+
+### TraceFlags vs TraceParent and TraceState
+
+Question: Should we store the entire
+[W3C Trace Context](https://www.w3.org/TR/trace-context/), including the
+`traceparent` and `tracestate` fields, instead of only `TraceFlags`?
+
+Answer: The discussion did not reveal any evidence that `traceparent` and
+`tracestate` are needed.
+
+### Severity Fields
+
+Question: Is the design of the `SeverityText`/`SeverityNumber` fields good
+enough?
+
+Answer: Discussions have shown that the design is reasonable.
+
+### Timestamp Requirements
+
+Question: An early draft of this proposal specified that `Timestamp` should be
+populated from a monotonic, NTP-synchronized source. We removed this requirement
+to avoid confusion. Do we need any requirements for timestamp sources?
+
+Answer: Discussions revealed that it is not the data model's responsibility to
+specify such requirements.
+
+### Security Logs

-- Early draft of this proposal specified that `Timestamp` should be populated
- from a monotonic, NTP-synchronized source. I removed this requirement to avoid
- confusion. Do we need any requirements for timestamp sources?
+Question: Is there a need for special treatment of security logs?

-- Is there a need for special treatment of security logs?
+Answer: Discussions in the OTEP did not reveal the need for any special
+treatment of security logs in the context of the data model proposal.
## Alternate Design From 0f6b887a23b435815570875aa6574b774a51c0be Mon Sep 17 00:00:00 2001 From: Tigran Najaryan Date: Wed, 20 May 2020 13:25:03 -0400 Subject: [PATCH 7/8] Add ECS mapping --- text/0097-log-data-model.md | 331 +++++++++++++++++++++++++++++++++++- 1 file changed, 323 insertions(+), 8 deletions(-) diff --git a/text/0097-log-data-model.md b/text/0097-log-data-model.md index 5a8072adc..6b3ecd285 100644 --- a/text/0097-log-data-model.md +++ b/text/0097-log-data-model.md @@ -179,7 +179,7 @@ Below is the detailed description of each field. Type: Timestamp, uint64 nanoseconds since Unix epoch. Description: Time when the event occurred measured by the origin clock. This -field is optional, it may be missing the timestamp is unknown. +field is optional, it may be missing if the timestamp is unknown. ### Trace Context Fields @@ -423,7 +423,7 @@ These are just examples to help understand the data model. Don’t treat the examples as _the_ way to represent this data model in JSON. This document does not define the actual encoding and format of the log record -representation. Format definitions will be done in separate OTEPs (e.g the log +representation. Format definitions will be done in separate OTEPs (e.g. the log records may be represented as msgpack, JSON, Protocol Buffer messages, etc). Example 1 @@ -753,12 +753,6 @@ Rest of SDIDs -> Attributes["syslog.*"] The event time in epoch time format, in seconds. Timestamp - - - - - - host string @@ -1002,6 +996,327 @@ span_id | string | The span ID within the trace associated labels | map | A set of user-defined (key, value) data that provides additional information about the log entry. 
| Attributes
All other fields | | | Attributes["google.*"]

+### Elastic Common Schema
+
+Field | Type | Description | Maps to Unified Model Field
+------------------------|-----------------|---------------------------------------------------------|----------------------------
+@timestamp | datetime | Time the event was recorded | Timestamp
+message | string | Any type of message | Body
+labels | key/value | Arbitrary labels related to the event | Attributes[*]
+tags | array of string | List of values related to the event | ?
+trace.id | string | Trace ID | TraceId
+span.id* | string | Span ID | SpanId
+agent.ephemeral_id | string | Ephemeral ID created by agent** | Resource
+agent.id | string | Unique identifier of this agent** | Resource
+agent.name | string | Name given to the agent | Resource["telemetry.sdk.name"]
+agent.type | string | Type of agent | Resource["telemetry.sdk.language"]
+agent.version | string | Version of agent | Resource["telemetry.sdk.version"]
+source.ip, client.ip | string | The IP address that the request was made from. | Attributes["net.peer.ip"] or Attributes["net.host.ip"]
+cloud.account.id | string | ID of the account in the given cloud | Resource["cloud.account.id"]
+cloud.availability_zone | string | Availability zone in which this host is running. | Resource["cloud.zone"]
+cloud.instance.id | string | Instance ID of the host machine.** | Resource
+cloud.instance.name | string | Instance name of the host machine.** | Resource
+cloud.machine.type | string | Machine type of the host machine.** | Resource
+cloud.provider | string | Name of the cloud provider. Example values are aws, azure, gcp, or digitalocean. | Resource["cloud.provider"]
+cloud.region | string | Region in which this host is running. | Resource["cloud.region"]
+cloud.image.id* | string | | Resource["host.image.name"]
+container.id | string | Unique container id | Resource["container.id"]
+container.image.name | string | Name of the image the container was built on. | Resource["container.image.name"]
+container.image.tag | array of string | Container image tags.** | Resource
+container.labels | key/value | Image labels. | Attributes[*]
+container.name | string | Container name. | Resource["container.name"]
+container.runtime | string | Runtime managing this container. Example: "docker"** | Resource
+destination.address | string | Destination address for the event | Attributes["destination.address"]
+error.code | string | Error code describing the error. | Attributes["error.code"]
+error.id | string | Unique identifier for the error. | Attributes["error.id"]
+error.message | string | Error message. | Attributes["error.message"]
error.stack_tracestringThe stack trace of this error in plain text.attributes["error.stack_trace]
host.architecturestringOperating system architecture**resource
host.domainstringName of the domain of which the host is a member. + +For example, on Windows this could be the host’s Active Directory domain or NetBIOS domain name. For Linux this could be the domain of the host’s LDAP provider.**resource
host.hostnamestringHostname of the host. + +It normally contains what the hostname command returns on the host machine.resource["host.hostname"]
host.idstringUnique host id.resource["host.id"]
host.ipArray of stringHost IPresource["host.ip"]
host.macarray of stringMAC addresses of the hostresource["host.mac"]
host.namestringName of the host. + +It may contain what hostname returns on Unix systems, the fully qualified, or a name specified by the user. resource["host.name"]
host.typestringType of host.resource["host.type"]
host.uptimestringSeconds the host has been up.?
service.ephemeral_id + +stringEphemeral identifier of this service**resource
service.idstringUnique identifier of the running service. If the service is comprised of many nodes, the service.id should be the same for all nodes.**resource
service.namestringName of the service data is collected from.resource["service.name"]
service.node.namestringSpecific node serving that serviceresource["service.instance.id"]
service.statestringCurrent state of the service.attributes["service.state"]
service.typestringThe type of the service data is collected from.**resource
service.versionstringVersion of the service the data was collected from.resource["service.version"]
+
+\* Not yet formalized into ECS.
+
+\*\* A resource that doesn’t exist in the
+[OpenTelemetry resource semantic convention](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource/semantic_conventions).
+
+This is a selection of the most relevant fields. See the
+[ECS field reference](https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html)
+for an exhaustive list.
+
 ## Appendix B: `SeverityNumber` example mappings

 |Syslog |WinEvtLog |Log4j |Zap |java.util.logging|SeverityNumber|

From 9b721eeaa83024004f4909d1611bc32f1621eaea Mon Sep 17 00:00:00 2001
From: Tigran Najaryan
Date: Fri, 22 May 2020 08:14:03 -0400
Subject: [PATCH 8/8] Address PR comments

---
 text/0097-log-data-model.md | 78 ++++++++++++++++++++-----------------
 1 file changed, 43 insertions(+), 35 deletions(-)

diff --git a/text/0097-log-data-model.md b/text/0097-log-data-model.md
index 6b3ecd285..e01fa8b1e 100644
--- a/text/0097-log-data-model.md
+++ b/text/0097-log-data-model.md
@@ -133,7 +133,7 @@ The reasons for having these 2 kinds of fields are:
   custom data that the application may want to include in the logs.

 When designing this data model we followed the following reasoning to make a
-decision about when to use use a top-level named field:
+decision about when to use a top-level named field:

 - The field needs to be either mandatory for all records or be frequently
   present in well-known log and event formats (such as `Timestamp`) or is

@@ -187,25 +189,27 @@ field is optional, it may be missing if the timestamp is unknown.

 Type: byte sequence.

-Description: Optional request trace id as defined in
+Description: Request trace id as defined in
 [W3C Trace Context](https://www.w3.org/TR/trace-context/#trace-id). Can be set
-for logs that are part of request processing and have an assigned trace id.
+for logs that are part of request processing and have an assigned trace id. This
+field is optional.

 #### Field: `SpanId`

 Type: byte sequence.
-Description: Optional span id. Can be set for logs that are part of a particular
-processing span. If SpanId is present TraceId SHOULD be also present.
+Description: Span id. Can be set for logs that are part of a particular
+processing span. If SpanId is present TraceId SHOULD be also present. This field
+is optional.

 #### Field: `TraceFlags`

 Type: byte.

-Description: Optional trace flag as defined in
+Description: Trace flag as defined in
 [W3C Trace Context](https://www.w3.org/TR/trace-context/#trace-flags)
 specification. At the time of writing the specification defines one flag - the
-SAMPLED flag.
+SAMPLED flag. This field is optional.

 ### Severity Fields

@@ -213,17 +215,18 @@ SAMPLED flag.

 #### Field: `SeverityText`

 Type: string.

-Description: the severity text (also known as log level). This is an optional
-field and is the original string representation as it is known at the source. If
-this field is missing and `SeverityNumber` is present then the short name that
-corresponds to the `SeverityNumber` may be used as a substitution.
+Description: severity text (also known as log level). This is the original
+string representation of the severity as it is known at the source. If this
+field is missing and `SeverityNumber` is present then the short name that
+corresponds to the `SeverityNumber` may be used as a substitution. This field is
+optional.

 #### Field: `SeverityNumber`

 Type: number.

 Description: numerical value of the severity, normalized to values described in
-this document. This is an optional field.
+this document. This field is optional.

 `SeverityNumber` is an integer number. Smaller numerical values correspond to
 less severe events (such as debug events), larger numerical values correspond to

@@ -365,11 +368,12 @@ corresponding short names).

 When severity is used in equality or inequality comparisons (for example in
 filters in the UIs) the recommendation is to attempt to use both `SeverityText`
-and short name of `SeverityNumber` to perform matches. For example if we have a
-record with `SeverityText` field equal to "Informational" and `SeverityNumber`
-field equal to INFO then it may be preferable from the user experience
-perspective to ensure that **severity="Informational"** and **severity="INFO"**
-conditions both to are TRUE for that record.
+and short name of `SeverityNumber` to perform matches (i.e. equality with either
+of these fields should be considered a match). For example if we have a record
+with `SeverityText` field equal to "Informational" and `SeverityNumber` field
+equal to INFO then it may be preferable from the user experience perspective to
+ensure that **severity="Informational"** and **severity="INFO"** conditions
+both evaluate to TRUE for that record.

 ### Field: `ShortName`

@@ -377,8 +381,8 @@ Type: string.

 Description: Short event identifier that does not contain varying parts.
 `ShortName` describes what happened (e.g. "ProcessStarted"). Recommended to be
-no longer than 50 characters. Optional. Not guaranteed to be unique in any way.
-Typically used for filtering and grouping purposes in backends.
+no longer than 50 characters. Not guaranteed to be unique in any way. Typically
+used for filtering and grouping purposes in backends. This field is optional.

 ### Field: `Body`

@@ -388,7 +392,7 @@ Type: any.

 Description: A value containing the body of the log record (see the description
 of `any` type above). Can be for example a human-readable string message
 (including multi-line) describing the event in a free form or it can be a
 structured data composed of arrays and maps of other values. Can vary for each
-occurrence of the event coming from the same source.
+occurrence of the event coming from the same source. This field is optional.

 ### Field: `Resource`

@@ -396,25 +400,29 @@ Type: key/value pair list.

 Description: Describes the source of the log, aka
 [resource](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/overview.md#resources).
-"value" of each pair is of `any` type. Multiple occurrences of events coming
-from the same event source can happen across time and they all have the same
-value of `Resource`. Can contain for example information about the application
-that emits the record or about the infrastructure where the application runs.
-Data formats that represent this data model may be designed in a manner that
-allows the `Resource` field to be recorded only once per batch of log records
-that come from the same source. SHOULD follow OpenTelemetry
+"key" of each pair is a `string` and "value" is of `any` type. Multiple
+occurrences of events coming from the same event source can happen across time
+and they all have the same value of `Resource`. Can contain for example
+information about the application that emits the record or about the
+infrastructure where the application runs. Data formats that represent this data
+model may be designed in a manner that allows the `Resource` field to be
+recorded only once per batch of log records that come from the same source.
+SHOULD follow OpenTelemetry
 [semantic conventions for Resources](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/resource/semantic_conventions).
+This field is optional.

 ### Field: `Attributes`

 Type: key/value pair list.

-Description: Additional information about the specific event occurrence. "value"
-of each pair is of `any` type. Unlike the `Resource` field, which is fixed for a
-particular source, `Attributes` can vary for each occurrence of the event coming
-from the same source. Can contain information about the request context (other
-than TraceId/SpanId). SHOULD follow OpenTelemetry
+Description: Additional information about the specific event occurrence. "key"
+of each pair is a `string` and "value" is of `any` type. Unlike the `Resource`
+field, which is fixed for a particular source, `Attributes` can vary for each
+occurrence of the event coming from the same source. Can contain information
+about the request context (other than TraceId/SpanId). SHOULD follow
+OpenTelemetry
 [semantic conventions for Attributes](https://github.com/open-telemetry/opentelemetry-specification/tree/master/specification/trace/semantic_conventions).
+This field is optional.

 ## Example Log Records

@@ -443,7 +451,7 @@ Example 1

     "k8s.pod.uid": "1138528c-c36e-11e9-a1a7-42010a800198",
   },
   "TraceId": "f4dbb3edd765f620", // this is a byte sequence
-  // (hex-encoded in JSON)
+                                 // (hex-encoded in JSON)
   "SpanId": "43222c2d51a7abe3",
   "SeverityText": "INFO",
   "SeverityNumber": 9,

@@ -1033,13 +1041,13 @@ All other fields | |

-trace.id | string | Trace ID | trace_id
+trace.id | string | Trace ID | TraceId

-span.id* | string | Span ID | span_id
+span.id* | string | Span ID | SpanId

 agent.ephemeral_id
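The ECS mapping table added in PATCH 7 can be exercised with a small translation sketch. This code is not part of the OTEP; the function and constant names are illustrative assumptions, and only a subset of the mapped fields is shown:

```python
# Hypothetical sketch: translate a flat ECS event (dotted keys) into the
# unified log data model fields, following the mapping table above.
# ECS_TO_RESOURCE covers a few of the fields that map into `resource`.
ECS_TO_RESOURCE = {
    "agent.name": "telemetry.sdk.name",
    "agent.version": "telemetry.sdk.version",
    "cloud.provider": "cloud.provider",
    "cloud.region": "cloud.region",
    "container.id": "container.id",
    "host.name": "host.name",
    "service.name": "service.name",
    "service.node.name": "service.instance.id",
}

def ecs_to_log_record(ecs: dict) -> dict:
    """Map one ECS record onto the top-level fields of the data model."""
    record = {
        "Timestamp": ecs.get("@timestamp"),         # @timestamp -> timestamp
        "Body": ecs.get("message"),                 # message    -> body
        "TraceId": ecs.get("trace.id"),             # trace.id   -> trace_id
        "SpanId": ecs.get("span.id"),               # span.id    -> span_id
        "Resource": {},
        "Attributes": dict(ecs.get("labels", {})),  # labels -> attributes[*]
    }
    for ecs_key, resource_key in ECS_TO_RESOURCE.items():
        if ecs_key in ecs:
            record["Resource"][resource_key] = ecs[ecs_key]
    return record

example = ecs_to_log_record({
    "@timestamp": "2020-05-20T13:25:03Z",
    "message": "user logged in",
    "trace.id": "f4dbb3edd765f620",
    "service.name": "checkout",
    "labels": {"env": "prod"},
})
```

Unmapped ECS fields (the `**resource` rows in the table) would need new resource semantic conventions before a lossless round trip is possible.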
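The "Comparing Severity" recommendation clarified in PATCH 8 can be sketched as a filter predicate. This is an illustrative sketch, not code from the OTEP: the names are hypothetical, and the 1–24 range-to-short-name mapping is an assumption based on the data model's severity numbering (e.g. `SeverityNumber` 9 corresponds to INFO, as in Example 1):

```python
# Assumed short names for SeverityNumber ranges: 1-4 TRACE, 5-8 DEBUG,
# 9-12 INFO, 13-16 WARN, 17-20 ERROR, 21-24 FATAL.
_SHORT_NAMES = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]

def short_name(severity_number: int) -> str:
    """Return the short name for a SeverityNumber in the 1..24 range."""
    return _SHORT_NAMES[(severity_number - 1) // 4]

def severity_matches(record: dict, wanted: str) -> bool:
    """Equality with either SeverityText or the short name of
    SeverityNumber is considered a match."""
    if record.get("SeverityText") == wanted:
        return True
    number = record.get("SeverityNumber")
    return number is not None and short_name(number) == wanted

# A record with SeverityText "Informational" and SeverityNumber 9 (INFO)
# satisfies both the severity="Informational" and severity="INFO" filters.
record = {"SeverityText": "Informational", "SeverityNumber": 9}
assert severity_matches(record, "Informational")
assert severity_matches(record, "INFO")
```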