Return an aggregated view of all mappings/properties of all types #15728

ggrossetie · 2015-12-31T14:04:34Z

Currently we can retrieve all mappings and properties of all types with _cluster/state but the response size is proportional to the number of indices.

Most of the time (and I think this is mandatory in Elasticsearch 2.x), field mappings are consistent between types, so we should be able to create an aggregated view. That way, the response size will be proportional to the number of properties and types, not to the number of indices.

API proposal to retrieve an aggregated view of all mappings:

http://localhost:9200/_cluster/state/metadata?level=mappings

{
  "cluster_name": "superheroes",
  "metadata": {
    "mappings": {
      "hadoop-hdfs": {
        "_source": {},
        "dynamic_templates": [],
        "_size": {},
        "properties": {},
        "_all": {}
      },
      "jboss-boot": {
        "_source": {},
        "dynamic_templates": [],
        "_size": {},
        "properties": {},
        "_all": {}
      },
      "haproxy-error": {
        "_source": {},
        "dynamic_templates": [],
        "_size": {},
        "properties": {},
        "_all": {}
      },
      "haproxy-info": {
        "_source": {},
        "dynamic_templates": [],
        "_size": {},
        "properties": {},
        "_all": {}
      }
    }
  }
}

Following the same principle, here is the query to create an aggregated view of all properties:

API proposal to retrieve an aggregated view of all properties:

http://localhost:9200/_cluster/state/metadata?level=properties

{
  "cluster_name": "superheroes",
  "metadata": {
    "properties": {
      "tags": {
        "type": "string",
        "copy_to": [
          "tag"
        ]
      },
      "application": {
        "analyzer": "application_analyzer",
        "type": "string"
      },
      "class": {
        "index": "not_analyzed",
        "type": "string"
      },
      "@version": {
        "index": "no",
        "type": "string"
      },
      "message": {
        "analyzer": "log_analyzer",
        "type": "string"
      },
      "@timestamp": {
        "format": "date_time",
        "doc_values": true,
        "type": "date"
      }
    }
  }
}
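
To illustrate the idea, here is a minimal Python sketch of the aggregation the proposal describes. It assumes the input has the shape of a GET _mapping response (index, then "mappings", then type, then "properties"). The function name aggregate_properties and the sample index names are hypothetical, and the first definition seen for a field wins, so conflicts are silently ignored in this sketch.

```python
from typing import Any, Dict

Definition = Dict[str, Any]


def aggregate_properties(mappings_by_index: Dict[str, Any]) -> Dict[str, Definition]:
    """Collapse per-index, per-type properties into one dict keyed by field name."""
    aggregated: Dict[str, Definition] = {}
    for index_body in mappings_by_index.values():
        for type_mapping in index_body.get("mappings", {}).values():
            for field, definition in type_mapping.get("properties", {}).items():
                # Assumption: the first definition seen wins; conflicts are ignored here.
                aggregated.setdefault(field, definition)
    return aggregated


# Hypothetical sample: two daily indices that share the same type and fields.
shared_type = {
    "properties": {
        "message": {"type": "string", "analyzer": "log_analyzer"},
        "@timestamp": {"type": "date", "format": "date_time"},
    }
}
sample = {
    "logs-2015.12.30": {"mappings": {"haproxy-info": shared_type}},
    "logs-2015.12.31": {"mappings": {"haproxy-info": shared_type}},
}

view = aggregate_properties(sample)
print(sorted(view))  # each field name appears once, however many indices exist
```

The size of the result depends only on the number of distinct fields, which is the property the proposal is after.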

If necessary, we could also report field/property conflicts between types (or even indices):

Conflict on the application property between two types:

"conflicts": [
  {
    "property": "application",
    "first_type": "hadoop-hdfs",
    "other_type": "jboss-boot",
    "parameter": ["analyzer"]
  }
]
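
A conflict report of this shape could be computed roughly as follows. This is only a sketch under assumptions: the input is a dict of type name to properties, the helper name find_conflicts is hypothetical, the "standard" analyzer in the sample is made up for illustration, and each field is compared only against the first type in which it was seen.

```python
from typing import Any, Dict, List, Tuple

Properties = Dict[str, Dict[str, Any]]


def find_conflicts(properties_by_type: Dict[str, Properties]) -> List[Dict[str, Any]]:
    """Report fields whose definition differs from the first type that declared them."""
    conflicts: List[Dict[str, Any]] = []
    first_seen: Dict[str, Tuple[str, Dict[str, Any]]] = {}
    for type_name, properties in properties_by_type.items():
        for field, definition in properties.items():
            if field not in first_seen:
                first_seen[field] = (type_name, definition)
                continue
            first_type, first_def = first_seen[field]
            # Collect every parameter whose value differs between the two definitions.
            differing = sorted(
                key
                for key in set(first_def) | set(definition)
                if first_def.get(key) != definition.get(key)
            )
            if differing:
                conflicts.append(
                    {
                        "property": field,
                        "first_type": first_type,
                        "other_type": type_name,
                        "parameter": differing,
                    }
                )
    return conflicts


# Hypothetical sample: "application" uses a different analyzer in each type.
properties_by_type = {
    "hadoop-hdfs": {
        "application": {"type": "string", "analyzer": "application_analyzer"},
        "class": {"type": "string", "index": "not_analyzed"},
    },
    "jboss-boot": {
        "application": {"type": "string", "analyzer": "standard"},
        "class": {"type": "string", "index": "not_analyzed"},
    },
}

conflicts = find_conflicts(properties_by_type)
print(conflicts)
```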

I think this will greatly improve the performance of administration and data visualization tools such as Kibana or the Head plugin when the cluster has many indices (> 100).

What do you think?

jpountz · 2015-12-31T14:16:01Z

Note that we are also considering removing types or at least divorcing mappings from types. See #15613.

ggrossetie · 2016-01-08T12:31:31Z

@jpountz I think this is a good idea, but in the meantime are you in favor of this proposal? Will you support a pull request with this feature?

clintongormley · 2016-03-01T09:16:49Z

Hi @Mogztter

What exactly is your use case for this? @rashidkpc is asking for something similar in Kibana, but slightly different. They would like an API which accepts zero or more indices and zero or more types and returns an array of distinct fields (an array, because some fields with the same name will be different in different indices) listing:

the field type
whether the field is searchable
whether the field is aggregatable

Would an API like this cover your requirements too?
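
As a rough illustration of what "an array of distinct fields" could mean, the Python sketch below deduplicates per-index field descriptions. Everything in it is an assumption made for illustration: the FieldInfo record, the distinct_fields helper, and the sample indices are hypothetical, not an existing Elasticsearch API.

```python
from typing import Dict, List, NamedTuple, Set


class FieldInfo(NamedTuple):
    name: str
    field_type: str
    searchable: bool
    aggregatable: bool


def distinct_fields(fields_by_index: Dict[str, List[FieldInfo]]) -> List[FieldInfo]:
    """Return each distinct (name, type, searchable, aggregatable) combination once."""
    seen: Set[FieldInfo] = set()
    result: List[FieldInfo] = []
    for index in sorted(fields_by_index):
        for info in fields_by_index[index]:
            if info not in seen:
                seen.add(info)
                result.append(info)
    return result


# Hypothetical sample: "status" has a different type in each index,
# so it appears twice in the result, while "message" appears once.
fields_by_index = {
    "logs-1": [
        FieldInfo("message", "text", True, False),
        FieldInfo("status", "keyword", True, True),
    ],
    "logs-2": [
        FieldInfo("message", "text", True, False),
        FieldInfo("status", "long", True, True),
    ],
}

fields = distinct_fields(fields_by_index)
for info in fields:
    print(info)
```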

ggrossetie · 2016-03-01T12:43:50Z

What exactly is your use case for this?

I don't really have a use case; my goal is to improve the performance of Kibana and Head by returning just the right level of information.

Would an API like this cover your requirements too?

I think so, as long as the result is an array of distinct fields.

Bargs · 2016-04-07T14:53:16Z

This feature is even more important for Kibana with the change from string -> text in ES 5.0. Since text fields have fielddata disabled by default, half of a Kibana user's string fields are going to throw a nasty error when the user attempts to aggregate on them (see more about this in elastic/kibana#6769). Having an API that gives us a canonical list of fields with an isAggregatable boolean for each one would be a boon for Kibana users.

rashidkpc · 2016-04-07T16:52:29Z

@clintongormley This is really important for us; we really need this sooner rather than later.

spalger · 2016-04-07T16:53:01Z

Couldn't agree more with @Bargs. With the change to string fields, and the fact that text is not aggregatable by default, this has become extremely important for Kibana.

jpountz · 2016-04-07T16:58:14Z

I thought it used to be the same before, since the Logstash template would disable fielddata on string fields (to encourage use of the .raw field)?

rashidkpc · 2016-04-07T17:05:29Z

Only a subset of Kibana users use Logstash, and Logstash didn't make that change until Nov 2015.

jpountz · 2016-04-07T17:24:38Z

Is it something that could be done with the field mappings API? For instance, the Sense recreation below:

PUT index
{
  "mappings": {
    "t": {
      "properties": {
        "f": {
          "type": "text"
        },
        "g": {
          "type": "text",
          "fielddata": true
        },
        "h": {
          "type": "keyword"
        },
        "i": {
          "type": "float"
        }
      }
    }
  }
}

GET index/_mapping/*/field/*?include_defaults=true&filter_path=**.doc_values,**.index,**.fielddata

returns

{
  "index": {
    "mappings": {
      "t": {
        "f": {
          "mapping": {
            "f": {
              "index": true,
              "doc_values": false,
              "fielddata": false
            }
          }
        },
        "g": {
          "mapping": {
            "g": {
              "index": true,
              "doc_values": false,
              "fielddata": true
            }
          }
        },
        "h": {
          "mapping": {
            "h": {
              "index": true,
              "doc_values": true
            }
          }
        },
        "i": {
          "mapping": {
            "i": {
              "index": true,
              "doc_values": true
            }
          }
        }
      }
    }
  }
}

Fields that are searchable are those that have index: true and fields that are aggregatable are those that have either doc_values: true or fielddata: true.
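
A client could apply that rule to the response above along these lines. This is a minimal Python sketch, not a definitive implementation: the helper name field_capabilities is hypothetical, and it encodes only the rule stated here, which may not hold for meta-fields such as _type or _id.

```python
from typing import Any, Dict, Tuple

FieldKey = Tuple[str, str, str]  # (index, type, field)


def field_capabilities(response: Dict[str, Any]) -> Dict[FieldKey, Dict[str, bool]]:
    """Derive searchable/aggregatable flags from a field mappings response."""
    caps: Dict[FieldKey, Dict[str, bool]] = {}
    for index_name, index_body in response.items():
        for type_name, fields in index_body.get("mappings", {}).items():
            for field_name, field_body in fields.items():
                for settings in field_body.get("mapping", {}).values():
                    caps[(index_name, type_name, field_name)] = {
                        # Searchable: index is true.
                        "searchable": settings.get("index") is True,
                        # Aggregatable: doc_values or fielddata is true.
                        "aggregatable": settings.get("doc_values") is True
                        or settings.get("fielddata") is True,
                    }
    return caps


# The response shown above, reproduced as a Python dict.
response = {
    "index": {
        "mappings": {
            "t": {
                "f": {"mapping": {"f": {"index": True, "doc_values": False, "fielddata": False}}},
                "g": {"mapping": {"g": {"index": True, "doc_values": False, "fielddata": True}}},
                "h": {"mapping": {"h": {"index": True, "doc_values": True}}},
                "i": {"mapping": {"i": {"index": True, "doc_values": True}}},
            }
        }
    }
}

caps = field_capabilities(response)
for key in sorted(caps):
    print(key, caps[key])
```

With this input, only f (text without fielddata) comes out as searchable but not aggregatable.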

rashidkpc · 2016-04-07T17:30:21Z

It could be, but we'd much rather have a property that tells us whether the field is aggregatable. It used to be that any field with index: true could be aggregated, but that changed. We tried coming up with rules in the past, but they were unreliable. There are also fields such as _type, _id, etc for which the rules are unclear. We really need to know for sure whether Elasticsearch is going to allow aggregations to run on the field in a straightforward manner.

rashidkpc · 2016-04-07T17:34:07Z

@clintongormley in response to your point on #12817:

Another thought that came up in FixItFriday: HTTP compression should greatly reduce the amount of data being sent over the wire (given that there is so much repetition in the mappings). Unfortunately, HTTP compression is disabled by default (see #1482). @kimchy can you remember the details?

I tested this out with 10 indices containing the same mapping of twenty fields, and it reduced a GET _mapping from 5589 bytes to 209 bytes... Sounds like this could be worth doing.

HTTP compression would likely be a wash at best. The issue is the size of the object being deserialized, not the actual bytes going over the wire. The browser's JSON deserializer tends to have issues deserializing single objects over a couple of megabytes, which isn't uncommon for users with many fields in many indices.
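
The distinction can be seen in a small Python sketch. The index names, field names, and counts are made up to mirror the ten-index, twenty-field test quoted above, and Python's json module stands in for the browser's deserializer: gzip shrinks the bytes on the wire, but the client still has to parse one object per field per index.

```python
import gzip
import json

# Hypothetical data: 10 indices sharing the same mapping of 20 fields.
field_names = [f"field_{n}" for n in range(20)]
mapping = {"properties": {name: {"type": "string"} for name in field_names}}
response = {f"index-{n}": {"mappings": {"logs": mapping}} for n in range(10)}

raw = json.dumps(response).encode("utf-8")
compressed = gzip.compress(raw)

# The client must still decompress and deserialize the full document.
parsed = json.loads(gzip.decompress(compressed))
field_objects = sum(
    len(type_mapping["properties"])
    for index_body in parsed.values()
    for type_mapping in index_body["mappings"].values()
)

print("bytes on the wire, uncompressed:", len(raw))
print("bytes on the wire, gzip:", len(compressed))
print("field objects deserialized either way:", field_objects)
```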

jpountz · 2016-04-07T18:14:03Z

There are also fields such as _type, _id, etc for which the rules are unclear.

This is something that we could fix by using the same index/doc_values/fielddata convention as other fields. For instance I think it would be fine if the _index field reported that it is indexed and has doc values when include_defaults=true.

rashidkpc · 2016-04-07T18:30:18Z

The big problem here is simply tracking the rules and knowing when they change. Since Elasticsearch already knows the rules for when things are aggregatable, it would be best if there were an API to convey a field's abstract capabilities, instead of the client needing to discern them from its enabled features.

clintongormley · 2016-04-14T09:49:59Z

Closing in favour of #17750

ggrossetie mentioned this issue Jan 5, 2016

Allow configuration of the default index pattern creation string elastic/kibana#5818

Closed

clintongormley added the discuss and :Search Foundations/Mapping labels Feb 29, 2016

This was referenced Mar 1, 2016

Normalized/Shared Mapping #12817

Closed

Missing field mappings - Increase the default lookBack setting? elastic/kibana#6362

Closed

clintongormley mentioned this issue Mar 2, 2016

Field data loading is forbidden on [FIELDNAME] #15267

Closed

rashidkpc mentioned this issue Mar 10, 2016

Remove index pattern mapping cache elastic/kibana#6498

Closed

rashidkpc mentioned this issue Mar 30, 2016

Nested field support elastic/kibana#1084

Open

Bargs mentioned this issue Apr 7, 2016

Can't build visualizations on text fields elastic/kibana#6769

Closed

clintongormley mentioned this issue Apr 14, 2016

Extend field stats to include type, searchable, aggregatable #17750

Closed

clintongormley closed this as completed Apr 14, 2016

javanna added the Team:Search Foundations label Jul 16, 2024
