Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support array values for dimension fields #110387

Closed
felixbarny opened this issue Jul 2, 2024 · 5 comments · Fixed by #112645
Closed

Support array values for dimension fields #110387

felixbarny opened this issue Jul 2, 2024 · 5 comments · Fixed by #112645
Labels

Comments

@felixbarny
Copy link
Member

felixbarny commented Jul 2, 2024

At the moment, TSDB dimension fields must be single-valued, as documented for the keyword, ip and numeric field types. When providing an array, the following error is returned:

Error extracting routing: Routing values must be strings but found [START_ARRAY]

throw new ParsingException(
source.getTokenLocation(),
"Routing values must be strings but found [{}]",
source.currentToken()
);

This limitation is problematic for OpenTelemetry mappings, where all attributes are mapped as dimensions. The problematic thing is that attributes can be multi-valued (for example host.ip).

attributes:
type: passthrough
dynamic: true
priority: 10
time_series_dimension: true

As a workaround, we could leverage the type information in OTel's data model to create a comma separated list for array-typed attributes, like host.ip, while retaining the type for scalar attributes. However, we then can't create a dynamic template to map *.ip fields to the IP field type. As there's no native type for ip fields in OTel, we therefore don't have a way to map IP fields for OTel metrics.

Alternatively, we could just send the first value but that would lead to silent data loss and we may lose information that uniquely identifies the time series.

@felixbarny felixbarny added the :StorageEngine/TSDB You know, for Metrics label Jul 2, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@kkrik-es
Copy link
Contributor

kkrik-es commented Jul 3, 2024

Stepping back, mapping all attributes as dimensions seems problematic, since they're also used in tsid calculations etc. What's the reasoning behind storing all these IPs along with metric values? Do we anticipate filtering for a particular IP, as opposed to hostname etc? More so, how much value do we get by storing an array of IPs, instead of the first one as suggested, without additional info on what each IP represents?

@felixbarny
Copy link
Member Author

felixbarny commented Jul 3, 2024

OTel's metrics data model define a time series by an entity consisting of several metadata properties, which includes all attributes.

We can't make a generic assumption that the time series can still be uniquely identified when we just store the first value of an attribute array.

I do agree that in the specific case of the host.ip field, it may not always be an identifying attribute. But at this time, there's no concept of a non-identifying attribute in OpenTelemetry. This may change in the future and in that case we can store these non-identifying attributes in a namespace that is not a container for dimensions (time_series_dimension: false). However, that doesn't change the fact that there can still be multi-valued identifying attributes. It's something that the OTel data model allows for but the TSDS data model doesn't, which creates some kind of impedance mismatch when trying to map OTel metrics to TSDS.

While I do think that we need to allow for multi-valued dimensions at some point, I also think that we can live with the workaround of storing a comma-separated string and having no support for automatically mapping IP fields as the ip field type.

@kkrik-es
Copy link
Contributor

kkrik-es commented Jul 3, 2024

Makes sense, thanks. While we do want to eventually support all corner cases, having concrete data about how much of an issue this is in practice can help us prioritize it.

@felixbarny
Copy link
Member Author

I don't have data on that at the moment but I don't expect this to be a critical issue. We'll probably learn more along the way and maybe users can chime in here when and if this becomes a bigger problem than anticipated. As long as we're aligned that we want to fix this eventually, I'm good with putting this on the shelf for the time being.

felixbarny added a commit to felixbarny/elasticsearch that referenced this issue Sep 9, 2024
Closes elastic#110387

Having this in now affords us not having to introduce version checks in the ES exporter later.
elasticsearchmachine pushed a commit that referenced this issue Sep 23, 2024
Closes #110387

Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.
felixbarny added a commit to felixbarny/elasticsearch that referenced this issue Sep 23, 2024
Closes elastic#110387

Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.

(cherry picked from commit 8d223cb)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java
elasticsearchmachine pushed a commit that referenced this issue Sep 23, 2024
* Add support for multi-value dimensions (#112645)

Closes #110387

Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.

(cherry picked from commit 8d223cb)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/mapper/TimeSeriesIdFieldMapper.java

* Remove skip test for 8.x

This was just needed for 8.x to 9.0 compatibility tests
andrzej-stencel pushed a commit to open-telemetry/opentelemetry-collector-contrib that referenced this issue Sep 25, 2024
…etrics OTel mode (#35291)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
Remove the workaround in #35009 to stringify array dimensions as the
limitation has been lifted in Elasticsearch in
elastic/elasticsearch#110387

This change is not breaking as array will be coerced as string, meaning
that there will be no indexing error. Additionally, the changes are
acceptable as OTel mapping mode explicitly marked as in development.

**Link to tracking Issue:** <Issue number if applicable>

**Testing:** <Describe what testing was performed and which tests were
added.>

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Vishal Raj <[email protected]>
jriguera pushed a commit to springernature/opentelemetry-collector-contrib that referenced this issue Oct 4, 2024
…etrics OTel mode (open-telemetry#35291)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
Remove the workaround in open-telemetry#35009 to stringify array dimensions as the
limitation has been lifted in Elasticsearch in
elastic/elasticsearch#110387

This change is not breaking as array will be coerced as string, meaning
that there will be no indexing error. Additionally, the changes are
acceptable as OTel mapping mode explicitly marked as in development.

**Link to tracking Issue:** <Issue number if applicable>

**Testing:** <Describe what testing was performed and which tests were
added.>

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: Vishal Raj <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants