
LogsDB support for sorting by OTel resource attributes #110792

Open
felixbarny opened this issue Jul 11, 2024 · 2 comments

Comments

@felixbarny
Member

Index sorting is what makes LogsDB so storage-efficient. The better the choice of sort fields, the more storage can be saved, because metadata fields that share the same value across a set of logs can be encoded efficiently. We should leverage the rich metadata provided by OTel to determine which fields to include in the default sorting. OTel has the concept of a resource, which describes the entity producing the telemetry through resource attributes. These include host.name, service.name, and other metadata fields.

Resource attributes seem like the perfect default for index sorting, because their values stay the same for all logs related to a given resource. There are also instrumentation scope attributes that are common to all logs belonging to that scope. See https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/logs/v1/logs.proto for more details on OTel's logs data model.
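To illustrate why sorting by resource attributes helps, here is a minimal toy model. It uses the number of runs of identical consecutive values per column as a rough proxy for encoded size. The records and field values are hypothetical, and this is not how LogsDB or Lucene actually encode doc values; it only shows that grouping records by resource collapses repeated metadata into fewer runs.

```python
from itertools import groupby

# Hypothetical toy log records: the resource attributes repeat for every
# record emitted by the same resource, while @timestamp interleaves them.
logs = [
    {"@timestamp": 1, "host.name": "host-a", "service.name": "checkout"},
    {"@timestamp": 2, "host.name": "host-b", "service.name": "cart"},
    {"@timestamp": 3, "host.name": "host-a", "service.name": "checkout"},
    {"@timestamp": 4, "host.name": "host-b", "service.name": "cart"},
    {"@timestamp": 5, "host.name": "host-a", "service.name": "checkout"},
    {"@timestamp": 6, "host.name": "host-b", "service.name": "cart"},
]

RESOURCE_FIELDS = ["host.name", "service.name"]


def run_count(records, field):
    """Count runs of identical consecutive values in one column.

    A run-length encoder stores each run once, so fewer runs is a rough
    proxy for a smaller encoded column.
    """
    return sum(1 for _ in groupby(r[field] for r in records))


def total_runs(records):
    """Sum the run counts over all resource-attribute columns."""
    return sum(run_count(records, f) for f in RESOURCE_FIELDS)


# Order by time only: records from different resources alternate.
by_time = sorted(logs, key=lambda r: r["@timestamp"])

# Order by resource attributes first, then time: identical values cluster.
by_resource = sorted(
    logs,
    key=lambda r: (tuple(r[f] for f in RESOURCE_FIELDS), r["@timestamp"]),
)

print(total_runs(by_time), total_runs(by_resource))  # prints: 12 4
```

With two resources interleaved in time, every record starts a new run; sorting by the resource attributes first leaves one run per resource and column.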

We're currently working on adding mappings for OpenTelemetry (#104455), and we'd like to configure index sorting on resource.attributes.* (and possibly scope.attributes.* as well). This would essentially let us normalize the data at the storage layer while keeping a denormalized mental model, in which every metadata field is available on every record, so no joins are needed at query time.

Currently, it's not possible to configure sort fields that aren't already in the mapping, nor to specify a wildcard pattern of fields to include.
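Until wildcards are supported, a wildcard like resource.attributes.* has to be expanded by hand into an explicit list of already-mapped fields. The sketch below does that client-side and builds index sort settings from the result. The field names, the helper functions, and the choice of putting @timestamp last in descending order are all hypothetical. Only fields known when the settings are computed can be included, so an attribute that first appears later would be missed. That is exactly the limitation described above.

```python
from fnmatch import fnmatchcase

# Hypothetical field names, standing in for fields already present in the mapping.
mapped_fields = [
    "@timestamp",
    "body.text",
    "attributes.http.method",
    "resource.attributes.service.name",
    "resource.attributes.host.name",
    "scope.name",
]


def resolve_sort_fields(patterns, fields):
    """Expand wildcard patterns into concrete field names.

    Patterns are applied in order; matches within one pattern are sorted so
    the result is deterministic, and duplicates are dropped. fnmatch's '*'
    also matches '.', so 'resource.attributes.*' covers nested names such as
    'resource.attributes.host.name'.
    """
    resolved = []
    for pattern in patterns:
        for field in sorted(f for f in fields if fnmatchcase(f, pattern)):
            if field not in resolved:
                resolved.append(field)
    return resolved


def index_sort_settings(patterns, fields):
    """Build explicit sort.field / sort.order index settings.

    Matched fields sort ascending, followed by @timestamp descending; this
    ordering is an arbitrary choice for the sketch.
    """
    matched = [f for f in resolve_sort_fields(patterns, fields) if f != "@timestamp"]
    sort_fields = matched + ["@timestamp"]
    sort_order = ["asc"] * len(matched) + ["desc"]
    return {"index": {"sort.field": sort_fields, "sort.order": sort_order}}


settings = index_sort_settings(["resource.attributes.*"], mapped_fields)
print(settings)
```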

Depending on the implementation, this may affect how index sorting gets configured for LogsDB, which is why I think we should come up with a plan for this before GA.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@felixbarny
Member Author

After talking to the team, I no longer think that a breaking change is needed; we should be able to evolve toward incorporating all resource attributes into the sorting. We can start by listing a set of common resource attributes in index.sort.field and adding them to the mappings upfront. Later, we can either extend the index.sort.field setting to accept a wildcard pattern, or introduce another option that converts the matching fields into a hash (lsh) and adds a single sort field (besides @timestamp), which could be something like a _sorting_hash meta field.
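The hash-based option can be sketched roughly as follows. A hypothetical `sorting_hash` function collapses every field under a resource.attributes. prefix into one fixed-width value, and documents are then sorted by that value followed by @timestamp. The comment mentions lsh. If that refers to locality-sensitive hashing, note that this sketch does not implement it. It swaps in a plain SHA-256 digest over a canonical JSON encoding, which only groups documents with identical attribute sets. A locality-sensitive hash would additionally try to place similar attribute sets next to each other.

```python
import hashlib
import json

RESOURCE_PREFIX = "resource.attributes."


def sorting_hash(doc, prefix=RESOURCE_PREFIX):
    """Derive one fixed-width sort key from all fields under `prefix`.

    The matching fields are serialized as canonical JSON (sorted keys, no
    whitespace), so documents with the same resource attributes get the same
    key regardless of field order. The SHA-256 digest is truncated to 16 hex
    characters (64 bits).
    """
    attrs = {k: v for k, v in doc.items() if k.startswith(prefix)}
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical documents; the first two share resource attributes but list
# them in a different order.
docs = [
    {"@timestamp": 3, "resource.attributes.host.name": "host-a",
     "resource.attributes.service.name": "checkout", "body": "c"},
    {"@timestamp": 1, "resource.attributes.service.name": "checkout",
     "resource.attributes.host.name": "host-a", "body": "a"},
    {"@timestamp": 2, "resource.attributes.host.name": "host-b",
     "resource.attributes.service.name": "cart", "body": "b"},
]

# Sort by (hash ascending, @timestamp descending), mirroring a single hash
# sort field followed by @timestamp.
ordered = sorted(docs, key=lambda d: (sorting_hash(d), -d["@timestamp"]))

for d in ordered:
    print(sorting_hash(d), d["@timestamp"], d["body"])
```

The trade-off in this design is that the hash keeps the number of sort fields constant no matter how many resource attributes exist. The relative order of different resources, however, becomes arbitrary, and truncating the digest leaves a small chance of collisions.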
