Add an option to treat non-metric fields as a dimension by default #98384

Closed
felixbarny opened this issue Aug 11, 2023 · 9 comments

@felixbarny
Member

felixbarny commented Aug 11, 2023

This is important to be able to integrate well with OTel and Prometheus metrics.

In OTel, all attributes are dimensions. Other than attributes, there’s the metric name and the value.
In ES, you have to explicitly mark every property that you want to be a dimension. That opt-in default isn’t compatible with OTel, where everything that isn’t a metric is a dimension. If a field that distinguishes time series isn’t defined as a dimension, TSDB will reject documents.
That’s because of the duplicate detection: we reject documents if there are multiple data points for the same timestamp and the same time series. The duplicate detection makes sense and OTel also defines the same logic.
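To illustrate why an undeclared dimension leads to rejections, here is a minimal Python sketch of the duplicate check described above. It is not how Elasticsearch implements it; `series_key` and `find_duplicates` are hypothetical helpers, and the timestamps and values are made up for the example.

```python
from typing import Any


def series_key(doc: dict[str, Any], dimensions: set[str]) -> tuple:
    # A time series is identified only by the values of its dimension fields.
    return tuple(sorted((name, doc[name]) for name in dimensions if name in doc))


def find_duplicates(docs: list[dict[str, Any]], dimensions: set[str]) -> list[dict[str, Any]]:
    # Return the documents that collide with an earlier data point,
    # i.e. same timestamp and same time series.
    seen: set[tuple] = set()
    duplicates = []
    for doc in docs:
        point = (doc["@timestamp"], series_key(doc, dimensions))
        if point in seen:
            duplicates.append(doc)
        else:
            seen.add(point)
    return duplicates


# Illustrative data: two hosts reporting at the same instant.
docs = [
    {"@timestamp": "2023-08-11T00:00:00Z", "host": "foo", "uptime.us": 42},
    {"@timestamp": "2023-08-11T00:00:00Z", "host": "bar", "uptime.us": 7},
]

# host is not declared as a dimension: both documents land in the same series.
undeclared = find_duplicates(docs, dimensions=set())
# host is declared as a dimension: two distinct series, nothing collides.
declared = find_duplicates(docs, dimensions={"host"})
```

With `host` undeclared, the second document is flagged as a duplicate even though it belongs to a different host; once `host` is a dimension, both data points are accepted.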

To overcome that mismatch, we’ll need a mode in ES where all properties of a document are treated as a dimension by default unless they’re metrics.

So for a document like this:

{
  "@timestamp": "...",
  "host": "foo",
  "uptime.us": 42
}

We’ll tell ES that uptime.us is a metric of type gauge. We then need Elasticsearch to automatically treat host as a dimension.

This could be solved by introducing a setting, configurable on the template, that controls whether a field is marked as a dimension when it isn't mapped as a metric (metric_type), isn't a histogram field type, and isn't one of the other reserved fields.

It should still be possible to explicitly configure time_series_dimension=false on specific properties, though. time_series_dimension=true would just be the implicit default.
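As a rough sketch of the requested behavior, the following Python derives field mappings for the example document on the client side. This is only an illustration of the intended default, not the proposed Elasticsearch setting: `build_properties`, `infer_type`, `RESERVED`, and the `opt_out` parameter are hypothetical names, the type inference is deliberately naive, and `time_series_metric` is used as the mapping parameter that carries the metric type.

```python
from typing import Any

# Assumption: reserved fields are never treated as dimensions.
RESERVED = {"@timestamp"}


def infer_type(value: Any) -> str:
    # Naive type inference, only good enough for this example.
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "keyword"


def build_properties(
    doc: dict[str, Any],
    metrics: dict[str, str],
    opt_out: frozenset[str] = frozenset(),
) -> dict[str, dict[str, Any]]:
    # Every field that is neither reserved nor a metric becomes a dimension,
    # unless it is explicitly opted out.
    properties: dict[str, dict[str, Any]] = {}
    for name, value in doc.items():
        if name in RESERVED:
            properties[name] = {"type": "date"}
        elif name in metrics:
            properties[name] = {
                "type": infer_type(value),
                "time_series_metric": metrics[name],
            }
        else:
            properties[name] = {
                "type": infer_type(value),
                "time_series_dimension": name not in opt_out,
            }
    return properties


doc = {"@timestamp": "...", "host": "foo", "uptime.us": 42}
properties = build_properties(doc, metrics={"uptime.us": "gauge"})
```

Only `uptime.us` is declared up front; `host` picks up `time_series_dimension: true` implicitly, and passing `opt_out=frozenset({"host"})` would set it to false instead.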

@felixbarny felixbarny added the :StorageEngine/TSDB You know, for Metrics label Aug 11, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Aug 11, 2023
@aku019

aku019 commented Aug 13, 2023

Is it open? Can I take it up? I am a beginner so will it be too overwhelming for me?

@lalit-satapathy

Thanks @felixbarny for filing it. I see a couple of high-level usages:

  • Provide a path for enabling dimensions automatically for OTel and Prometheus ingestion.
  • Low priority metrics packages, where we can enable TSDB, even when dimensions are not explicitly defined.

This can be an opt-in configuration: some packages can opt in as needed, while most other packages continue to define dimensions as they do today and should work as-is.

One open question is what the performance impact is, if any, when all non-metric fields are mapped as dimensions. I hope we will get some clarity on that.

@salvatore-campagna salvatore-campagna pinned this issue Sep 21, 2023
@DaveCTurner DaveCTurner unpinned this issue Sep 23, 2023
@kkrik-es kkrik-es self-assigned this Nov 21, 2023
kkrik-es added a commit to kkrik-es/elasticsearch that referenced this issue Dec 5, 2023
Dynamic templates allow defining fields with `time_series_dimension`
annotations. When these fields are referenced in dynamic template specs
at indexing time, they can be used for routing instead of the routing
path; the latter can be empty in case all dimensions are dynamically
specified.

Related to elastic#98384
@felixbarny
Member Author

Closed by #103648

@felixbarny
Member Author

I need to re-open this, unfortunately. At the moment, the passthrough field type only supports keyword dimensions. There's a PR to support non-keyword fields (#105073), but it needs a major overhaul to ensure that synthesizing the _id still works. A workaround is to map all attributes in the passthrough field type as keywords. However, that has implications for the range queries that we can support on non-keyword dimensions, such as number and IP fields.
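To make the range-query concern concrete, here is a small Python sketch. It assumes that values stored as keywords are compared as strings in lexicographic order, so the plain string comparison below stands in for that behavior; the port numbers are made up for the example.

```python
# Illustrative numeric dimension values, e.g. port numbers.
ports = [9, 10, 80, 443, 8080]

# Range filter on real numbers: 10 <= port <= 500.
numeric_matches = [p for p in ports if 10 <= p <= 500]

# The same values mapped as keywords are compared as strings,
# so the "same" range selects a different set.
keyword_ports = [str(p) for p in ports]
keyword_matches = [s for s in keyword_ports if "10" <= s <= "500"]

print(numeric_matches)  # [10, 80, 443]
print(keyword_matches)  # ['10', '443']
```

The value 80 falls inside the numeric range but outside the string range, because "80" sorts after "500".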

@felixbarny felixbarny reopened this Feb 9, 2024
@wchaparro wchaparro removed the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Mar 20, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@kkrik-es
Contributor

Non-keyword dimensions are now supported, after submitting #106080 and #105501.

@StephanErb

Does that imply APM can now leverage TSDB?

@felixbarny
Member Author

One challenge we still have is that this requires a change in the document structure so that all dimension fields are under a certain namespace, such as attributes.*. While the passthrough field type makes queries compatible, changing the document structure may still be considered a breaking change because it affects ingest processing, for example.
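As a sketch of what that change in document structure could look like, the following Python moves every non-metric, non-reserved field under an `attributes` object. `to_attributes_layout` is a hypothetical helper, and leaving metrics and `@timestamp` at the top level is an assumption made for brevity, not a description of the final layout.

```python
from typing import Any


def to_attributes_layout(
    doc: dict[str, Any],
    metrics: set[str],
    reserved: frozenset[str] = frozenset({"@timestamp"}),
) -> dict[str, Any]:
    # Move every dimension candidate under the "attributes" namespace.
    # Assumption: metrics and reserved fields stay where they are.
    restructured: dict[str, Any] = {"attributes": {}}
    for name, value in doc.items():
        if name in reserved or name in metrics:
            restructured[name] = value
        else:
            restructured["attributes"][name] = value
    return restructured


original = {"@timestamp": "...", "host": "foo", "uptime.us": 42}
restructured = to_attributes_layout(original, metrics={"uptime.us"})
```

Anything that reads `host` directly from the document source, such as an ingest processor, would now have to read `attributes.host`, which is why the new layout can be a breaking change even if queries keep working.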

Having said that, we're planning to provide an OTel-based ingestion path that has an attributes namespace. Both the existing and the OTel-based ingestion path will live alongside each other and users can choose the one they prefer.

Disclaimer that this reflects the current state of discussions and things might change.
