Support per-field metadata #33267

jpountz · 2018-08-30T09:49:37Z

It would sometimes be useful for ingestion tools to be able to associate metadata with fields that could later be leveraged by visualization tools such as Kibana to provide a better out-of-the-box experience. One example that has come up a number of times is the ability to know the unit of a field. One way to do it would be to give Elasticsearch the ability to associate metadata per field in the mappings, something like this:

{
  "mappings": {
    "_doc": {
      "properties": {
        "response_time": {
          "type": "float",
          "_meta": {
            "unit": "s"
          }
        },
        "response_size": {
          "type": "long",
          "_meta": {
            "unit": "b"
          }
        }
      }
    }
  }
}

This metadata wouldn't be validated by Elasticsearch: any key-value pairs would be accepted, so there would need to be conventions on key names and values.

Even though things like units are not expected to change on an existing index, preventing updates seems a bit too restrictive since it would be a pity to require reindexing for something that doesn't affect the way that data is indexed. So I propose that updates are merged with existing _meta and that null values may be used to remove existing keys.
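The merge behaviour proposed above can be sketched in a few lines of Python. This is only an illustration of the proposed semantics, not Elasticsearch's implementation: `merge_meta` is a hypothetical helper name, and Python's `None` stands in for a JSON null.

```python
from typing import Any, Dict, Optional


def merge_meta(existing: Dict[str, Any],
               update: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Merge an update into existing field metadata (hypothetical helper).

    Keys present in `update` overwrite keys in `existing`; a value of
    None (JSON null) removes the key instead of storing it.
    """
    merged = dict(existing)  # copy so the caller's dict is not mutated
    for key, value in update.items():
        if value is None:
            merged.pop(key, None)  # null removes the key if it exists
        else:
            merged[key] = value
    return merged


current = {"unit": "s", "display": "duration"}
print(merge_meta(current, {"unit": "ms"}))      # overwrite one key
print(merge_meta(current, {"display": None}))   # remove one key
```

Under this sketch an update never needs to resend the full `_meta` object, which is what makes in-place corrections cheap compared with reindexing.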

Known limitation: For users that already have lots of fields and/or indices, using this feature extensively won't be recommended as it will further increase the size of the cluster state.


elasticmachine · 2018-08-30T09:49:38Z

Pinging @elastic/es-search-aggs

jpountz · 2018-09-05T09:07:26Z

Something that just occurred to me is that given that mappings are not available from remote clusters, Kibana would probably want this information to be included in the field capabilities API.

ruflin · 2019-09-09T08:52:13Z

Mentioning elastic/kibana#44955 and elastic/kibana#35481 from the Kibana side here, as I think having this concept could simplify things on the Kibana side.

mattkime · 2019-10-30T22:15:54Z

I think this is an interesting idea and I'm thinking through the implications.

There's an inherent tension in what we're trying to accomplish. We need to know the field list and associated metadata for fields across potentially thousands of indices. That list is potentially dynamic but static most of the time.

The current solution is for Kibana to look up and store the field list. It's reset manually by the user when the field list changes. It would be preferable to close the gap between Kibana's view of the field list and the field list in Elasticsearch. The lifecycle of the Kibana index pattern field mapping better matches that of an index template (random thought).

Would querying field level metadata be scalable?

This PR adds per-field metadata that can be set in the mappings and is later returned by the field capabilities API. This metadata is completely opaque to Elasticsearch but may be used by tools that index data in Elasticsearch to communicate metadata about fields with tools that then search this data. A typical example that has been requested in the past is the ability to attach a unit to a numeric field.

In order to not bloat the cluster state, Elasticsearch requires that this metadata be small:
- keys can't be longer than 20 chars,
- values can only be numbers or strings of no more than 50 chars - no inner arrays or objects,
- the metadata can't have more than 5 keys in total.

Given that metadata is opaque to Elasticsearch, field capabilities don't try to do anything smart when merging metadata about multiple indices: the union of all field metadata is returned.

Here is what the meta might look like in mappings:

```json
{
  "properties": {
    "latency": {
      "type": "long",
      "meta": {
        "unit": "ms"
      }
    }
  }
}
```

And then in the field capabilities response:

```json
{
  "latency": {
    "long": {
      "searchable": true,
      "aggregatable": true,
      "meta": {
        "unit": [ "ms" ]
      }
    }
  }
}
```

When there are no conflicts, values are arrays of size 1, but when there are conflicts, Elasticsearch includes all unique values in this array, without giving ways to know which index has which metadata value:

```json
{
  "latency": {
    "long": {
      "searchable": true,
      "aggregatable": true,
      "meta": {
        "unit": [ "ms", "ns" ]
      }
    }
  }
}
```

Closes elastic#33267
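The size limits and the union behaviour described in the PR text can be sketched roughly as follows. This is a minimal Python illustration, not the Elasticsearch code: `validate_meta` and `union_meta` are hypothetical names, the 50-character limit is assumed to apply to string values only, and returning values in first-seen order is also an assumption.

```python
from typing import Dict, Iterable, List, Union

MetaValue = Union[int, float, str]

MAX_KEYS = 5           # at most 5 keys in total
MAX_KEY_LENGTH = 20    # keys no longer than 20 chars
MAX_VALUE_LENGTH = 50  # string values no longer than 50 chars (assumed)


def validate_meta(meta: Dict[str, object]) -> None:
    """Raise ValueError if `meta` breaks the limits listed in the PR text."""
    if len(meta) > MAX_KEYS:
        raise ValueError(f"meta has {len(meta)} keys, at most {MAX_KEYS} allowed")
    for key, value in meta.items():
        if len(key) > MAX_KEY_LENGTH:
            raise ValueError(f"key {key!r} is longer than {MAX_KEY_LENGTH} chars")
        # bool is a subclass of int in Python, so reject it explicitly;
        # lists and dicts fail the isinstance check (no inner arrays/objects).
        if isinstance(value, bool) or not isinstance(value, (int, float, str)):
            raise ValueError(f"value for {key!r} must be a number or a string")
        if isinstance(value, str) and len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"value for {key!r} is longer than {MAX_VALUE_LENGTH} chars")


def union_meta(per_index_meta: Iterable[Dict[str, MetaValue]]) -> Dict[str, List[MetaValue]]:
    """Collect the unique values seen for each key across indices."""
    merged: Dict[str, List[MetaValue]] = {}
    for meta in per_index_meta:
        for key, value in meta.items():
            values = merged.setdefault(key, [])
            if value not in values:  # keep each distinct value once
                values.append(value)
    return merged


validate_meta({"unit": "ms"})  # passes silently
print(union_meta([{"unit": "ms"}, {"unit": "ns"}, {"unit": "ms"}]))
```

As in the response examples, the merged result says nothing about which index contributed which value, so a consumer that sees more than one entry for a key has to treat the field as conflicting.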

robcowart · 2019-12-15T16:49:39Z

This could be very useful if it allowed for new data to enjoy a better initial presentation in Kibana. I would want to see two things...

There should be some kind of namespacing, or at least a reserved prefix like kibana_, to avoid conflicts with anything Kibana will use.
It must still be possible within Kibana Index Patterns to modify/override any attributes autogenerated from this metadata.


jpountz added the >feature, :Search Foundations/Mapping, and team-discuss labels Aug 30, 2018

markharwood mentioned this issue Aug 30, 2018

New field metadata - Entity and role types elastic/kibana#22486

Closed

cbuescher removed the team-discuss label Sep 18, 2018

ruflin mentioned this issue Oct 25, 2018

Consul metricbeat module elastic/beats#8631

Merged

ruflin mentioned this issue Mar 15, 2019

Guidance on anonymization/pseudonymization elastic/ecs#68

Open

jtibshirani added the high hanging fruit label May 7, 2019

axw mentioned this issue May 27, 2019

Schema for metrics elastic/ecs#474

Open

ruflin mentioned this issue Sep 9, 2019

Proposal to rename Kibana "Index Patterns" elastic/kibana#44955

Closed

jpountz self-assigned this Nov 21, 2019

jpountz mentioned this issue Nov 21, 2019

Add per-field metadata. #49419

Merged

jpountz closed this as completed in #49419 Dec 18, 2019

jpountz mentioned this issue Dec 18, 2019

Add per-field metadata. #50333

Merged

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

axw mentioned this issue Feb 26, 2020

Feature request: structural object type matching in dynamic templates #51341

Closed

ruflin mentioned this issue Apr 14, 2022

Field descriptions in Kibana based on integrations data elastic/kibana#130238

Open

ruflin mentioned this issue Jan 4, 2023

[Fleet] Kibana data views missing field format mapping for integrations elastic/kibana#148361

Open

javanna added the Team:Search Foundations label Jul 16, 2024
