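Index sorting is what makes LogsDB so storage efficient. The better the sort fields are chosen, the more storage is saved, because metadata fields that are identical across a set of logs can be encoded efficiently. We should leverage the rich metadata provided by OTel to determine which fields to include in the default sorting. OTel has the concept of a resource, which describes the entity producing telemetry through resource attributes. This includes things like host.name, service.name, and other metadata fields.

Resource attributes seem like the perfect fit to use as a default for the index sorting, as these are the fields that don't change for all logs that are related to that resource. There are also instrumentation scope attributes that are common among all logs that belong to that scope. See also https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/logs/v1/logs.proto for more details on OTel's logs data model.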
We're currently working on adding mappings for OpenTelemetry (#104455) and we'd like to configure the index sorting to resource.attributes.* (and possibly scope.attributes.* in addition to that). This would essentially allow us to normalize the data at the storage layer, while still operating with a denormalized mental model, where each metadata field is available on each record, which doesn't require joins at query time.
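To illustrate why sorting on resource attributes helps, here is a minimal sketch that counts runs of identical adjacent values in a metadata column before and after sorting. It is a toy model, not how LogsDB or Lucene actually encode data: the documents, the `count_runs` helper, and the host names are made up for illustration. Fewer, longer runs are the kind of input that run-length-style encodings handle well.

```python
import random


def count_runs(values):
    """Count maximal runs of equal adjacent values in a sequence."""
    runs = 0
    previous = object()  # sentinel that never equals a real value
    for value in values:
        if value != previous:
            runs += 1
            previous = value
    return runs


# Hypothetical log documents: five hosts emitting interleaved logs.
random.seed(0)
hosts = [f"host-{i}" for i in range(5)]
logs = [{"host.name": random.choice(hosts), "@timestamp": t} for t in range(1000)]

# Arrival order: host.name changes frequently from one document to the next.
unsorted_runs = count_runs([doc["host.name"] for doc in logs])

# Sorted by host.name, then newest first: one run per distinct host.
sorted_logs = sorted(logs, key=lambda doc: (doc["host.name"], -doc["@timestamp"]))
sorted_runs = count_runs([doc["host.name"] for doc in sorted_logs])

print(f"runs in arrival order: {unsorted_runs}")
print(f"runs after sorting:    {sorted_runs}")
```

Each document still carries its own host.name, which matches the denormalized model described above. Only the physical order changes.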
Currently, it's not possible to sort by fields that aren't already in the mapping, nor to specify a wildcard pattern of fields to include in the sorting.
Depending on the implementation, this may affect how index sorting gets configured for LogsDB, which is why I think we should come up with a plan for this before GA.
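Until wildcards are supported, one conceivable workaround is to expand the pattern on the client side against the fields that are already mapped and pass the result as an explicit list. The sketch below assumes this approach. The `build_sort_settings` helper and the example field names are hypothetical, and only the `index.sort.field` and `index.sort.order` setting names come from Elasticsearch. Fields that appear after index creation would still not be covered.

```python
from fnmatch import fnmatchcase


def build_sort_settings(mapped_fields, patterns, timestamp_field="@timestamp"):
    """Expand wildcard patterns into an explicit index.sort.* settings dict.

    mapped_fields: names of fields that already exist in the mapping.
    patterns: shell-style wildcard patterns such as "resource.attributes.*".
    """
    sort_fields = sorted(
        name
        for name in mapped_fields
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )
    # Metadata fields ascending, then the timestamp descending (newest first).
    orders = ["asc"] * len(sort_fields) + ["desc"]
    sort_fields.append(timestamp_field)
    return {"index.sort.field": sort_fields, "index.sort.order": orders}


# Hypothetical set of fields known at index creation time.
mapped = [
    "body.text",
    "resource.attributes.service.name",
    "resource.attributes.host.name",
    "scope.attributes.library",
]
settings = build_sort_settings(mapped, ["resource.attributes.*"])
print(settings)
```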
After talking to the team, I no longer think that a breaking change is needed, and we should be able to evolve toward incorporating all resource attributes into the sorting. We can start by specifying a set of common resource attributes in index.sort.field and adding them to the mappings upfront. We can then either evolve the index.sort.field property to accept a wildcard definition, or introduce another option that converts matching fields to a hash (lsh) and adds a single sort field (aside from @timestamp), which could be something like a _sorting_hash meta field.
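The hash-based option could look roughly like the sketch below. It makes several assumptions: documents are flat dicts with dotted keys, the `sorting_hash` function and its parameters are hypothetical, and it uses a plain SHA-256 digest rather than a locality-sensitive hash. That means documents with identical resource attributes share a value, but merely similar attribute sets do not end up near each other.

```python
import hashlib
from fnmatch import fnmatchcase


def sorting_hash(doc, patterns=("resource.attributes.*",)):
    """Derive one sort key from all fields matching the wildcard patterns.

    Uses an exact SHA-256 digest as a stand-in; this is not an LSH.
    """
    matched = sorted(
        (key, str(value))
        for key, value in doc.items()
        if any(fnmatchcase(key, pattern) for pattern in patterns)
    )
    digest = hashlib.sha256()
    for key, value in matched:
        # NUL separators keep ("ab", "c") distinct from ("a", "bc").
        digest.update(key.encode("utf-8") + b"\x00")
        digest.update(value.encode("utf-8") + b"\x00")
    return digest.hexdigest()[:16]


# Two hypothetical logs from the same resource, one from a different host.
log_a = {
    "@timestamp": 1,
    "body.text": "started",
    "resource.attributes.host.name": "web-1",
    "resource.attributes.service.name": "checkout",
}
log_b = {
    "@timestamp": 2,
    "body.text": "stopped",
    "resource.attributes.service.name": "checkout",
    "resource.attributes.host.name": "web-1",
}
log_c = dict(log_a, **{"resource.attributes.host.name": "web-2"})

# Sort by the derived hash first, then by timestamp descending.
ordered = sorted(
    [log_a, log_c, log_b],
    key=lambda doc: (sorting_hash(doc), -doc["@timestamp"]),
)
```

With this ordering, logs from the same resource sit next to each other regardless of which attributes the resource carries, so no attribute has to be listed in the sort configuration ahead of time.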