Return an aggregated view of all mappings/properties of all types #15728

Closed
ggrossetie opened this issue Dec 31, 2015 · 15 comments
Labels
discuss :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch

Comments

@ggrossetie
Contributor

Currently we can retrieve all mappings and properties of all types with _cluster/state but the response size is proportional to the number of indices.

Most of the time (and I think this is mandatory in Elasticsearch 2.x) field mappings are consistent between types, so we should be able to create an aggregated view. That way, the response size would be proportional to the number of properties and types, not to the number of indices.

.API proposal to retrieve an aggregated view of all mappings

http://localhost:9200/_cluster/state/metadata?level=mappings
{
  "cluster_name": "superheroes",
  "metadata": {
    "mappings": {
      "hadoop-hdfs": {
        "_source": {},
        "dynamic_templates": [],
        "_size": {},
        "properties": {},
        "_all": {}
      },
      "jboss-boot": {
        "_source": {},
        "dynamic_templates": [],
        "_size": {},
        "properties": {},
        "_all": {}
      },
      "haproxy-error": {
        "_source": {},
        "dynamic_templates": [],
        "_size": {},
        "properties": {},
        "_all": {}
      },
      "haproxy-info": {
        "_source": {},
        "dynamic_templates": [],
        "_size": {},
        "properties": {},
        "_all": {}
      }
    }
  }
}

Following the same principle, here is the query to create an aggregated view of all properties:

.API proposal to retrieve an aggregated view of all properties

http://localhost:9200/_cluster/state/metadata?level=properties
{
  "cluster_name": "superheroes",
  "metadata": {
    "properties": {
      "tags": {
        "type": "string",
        "copy_to": [
          "tag"
        ]
      },
      "application": {
        "analyzer": "application_analyzer",
        "type": "string"
      },
      "class": {
        "index": "not_analyzed",
        "type": "string"
      },
      "@version": {
        "index": "no",
        "type": "string"
      },
      "message": {
        "analyzer": "log_analyzer",
        "type": "string"
      },
      "@timestamp": {
        "format": "date_time",
        "doc_values": true,
        "type": "date"
      }
    }
  }
}

If necessary, we could also report field/property conflicts between types (or even indices):

.Conflict on application property between two types

"conflicts": [
  {
    "property": "application",
    "first_type": "hadoop-hdfs",
    "other_type": "jboss-boot",
    "parameter": ["analyzer"]
  }
]
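
The aggregation and conflict report proposed above could be computed roughly as follows. This is a minimal Python sketch, not an Elasticsearch API: the `aggregate_properties` helper and the `"standard"` analyzer value for `jboss-boot` are assumptions made for illustration, and the input is assumed to be a dictionary of type names to mappings shaped like the proposed responses.

```python
def aggregate_properties(mappings):
    """Merge per-type properties into one view and list conflicting parameters.

    `mappings` maps a type name to a mapping with a "properties" object.
    Hypothetical helper for illustration, not part of Elasticsearch.
    """
    merged = {}      # property name -> first definition seen
    first_seen = {}  # property name -> type that defined it first
    conflicts = []
    for type_name, mapping in mappings.items():
        for name, definition in mapping.get("properties", {}).items():
            if name not in merged:
                merged[name] = definition
                first_seen[name] = type_name
                continue
            known = merged[name]
            # Parameters whose values differ (or exist on one side only).
            differing = sorted(
                param
                for param in known.keys() | definition.keys()
                if known.get(param) != definition.get(param)
            )
            if differing:
                conflicts.append({
                    "property": name,
                    "first_type": first_seen[name],
                    "other_type": type_name,
                    "parameter": differing,
                })
    return merged, conflicts


# Example input; the analyzer on "jboss-boot" is an assumed value.
example_mappings = {
    "hadoop-hdfs": {"properties": {
        "application": {"type": "string", "analyzer": "application_analyzer"},
        "message": {"type": "string", "analyzer": "log_analyzer"},
    }},
    "jboss-boot": {"properties": {
        "application": {"type": "string", "analyzer": "standard"},
    }},
}
merged, conflicts = aggregate_properties(example_mappings)
```

With this input, `conflicts` contains a single entry for `application` between `hadoop-hdfs` and `jboss-boot` on the `analyzer` parameter, which matches the shape of the example above. The first definition seen wins in `merged`; a real implementation would need to decide how to present conflicting definitions.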

I think this will greatly improve the performance of administration and data visualization tools like Kibana or the Head plugin when the cluster has many indices (> 100).

What do you think?

@jpountz
Contributor

jpountz commented Dec 31, 2015

Note that we are also considering removing types or at least divorcing mappings from types. See #15613.

@ggrossetie
Contributor Author

@jpountz I think this is a good idea, but in the meantime, are you in favor of this proposal? Will you support a pull request with this feature?

@clintongormley clintongormley added discuss :Search Foundations/Mapping Index mappings, including merging and defining field types labels Feb 29, 2016
@clintongormley
Contributor

Hi @Mogztter

What exactly is your use case for this? @rashidkpc is asking for something similar in Kibana, but slightly different. They would like an API which accepts zero or more indices and zero or more types and returns an array of distinct fields (an array, because some fields with the same name will be different in different indices) listing:

  • the field type
  • whether the field is searchable
  • whether the field is aggregatable

Would an API like this cover your requirements too?
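
One way to picture the result described above is as per-index field records collapsed into distinct entries. The following is a minimal Python sketch under stated assumptions: the `distinct_fields` helper, the record keys, and the extra `indices` list are hypothetical and only illustrate why the result is an array rather than an object keyed by field name.

```python
def distinct_fields(per_index_fields):
    """Collapse per-index field records into distinct field descriptions.

    Each record is assumed to look like:
    {"index": ..., "name": ..., "type": ..., "searchable": ..., "aggregatable": ...}
    Hypothetical helper for illustration, not an Elasticsearch API.
    """
    grouped = {}
    for record in per_index_fields:
        key = (
            record["name"],
            record["type"],
            record["searchable"],
            record["aggregatable"],
        )
        grouped.setdefault(key, []).append(record["index"])
    # The same field name can appear more than once when indices disagree.
    return [
        {
            "name": name,
            "type": field_type,
            "searchable": searchable,
            "aggregatable": aggregatable,
            "indices": sorted(indices),
        }
        for (name, field_type, searchable, aggregatable), indices in grouped.items()
    ]


# Made-up records: "message" is text in two indices and keyword in a third.
records = [
    {"index": "logs-1", "name": "message", "type": "text", "searchable": True, "aggregatable": False},
    {"index": "logs-2", "name": "message", "type": "text", "searchable": True, "aggregatable": False},
    {"index": "logs-3", "name": "message", "type": "keyword", "searchable": True, "aggregatable": True},
]
fields = distinct_fields(records)
```

Here `fields` holds two entries for `message`, one per distinct definition, which is the case an array of distinct fields is meant to handle.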

@ggrossetie
Contributor Author

What exactly is your use case for this?

I don't really have a use case; my goal is to improve the performance of Kibana and Head by returning just the right level of information.

Would an API like this cover your requirements too?

I think so, as long as the result is an array of distinct fields.

@Bargs

Bargs commented Apr 7, 2016

This feature is even more important for Kibana with the change from string -> text in ES 5.0. Since text fields have fielddata disabled by default, half of a Kibana user's string fields are going to throw a nasty error when the user attempts to aggregate on them (see more about this in elastic/kibana#6769). Having an API that gives us a canonical list of fields with an isAggregatable boolean for each one would be a boon for Kibana users.

@rashidkpc

@clintongormley This is really important for us; we really need this sooner rather than later.

@spalger
Contributor

spalger commented Apr 7, 2016

Couldn't agree more with @Bargs. With the change to string fields, and the fact that text is not aggregatable by default, this has become extremely important for Kibana.

@jpountz
Contributor

jpountz commented Apr 7, 2016

I thought it used to be the same before, since the logstash template would disable fielddata on string fields? (to encourage use of the .raw field)

@rashidkpc

Only a subset of Kibana users use logstash, and logstash didn't make that change until Nov 2015.

@jpountz
Contributor

jpountz commented Apr 7, 2016

Is it something that could be done with the field mappings API? For instance, the Sense recreation below:

PUT index
{
  "mappings": {
    "t": {
      "properties": {
        "f": {
          "type": "text"
        },
        "g": {
          "type": "text",
          "fielddata": true
        },
        "h": {
          "type": "keyword"
        },
        "i": {
          "type": "float"
        }
      }
    }
  }
}

GET index/_mapping/*/field/*?include_defaults=true&filter_path=**.doc_values,**.index,**.fielddata

returns

{
  "index": {
    "mappings": {
      "t": {
        "f": {
          "mapping": {
            "f": {
              "index": true,
              "doc_values": false,
              "fielddata": false
            }
          }
        },
        "g": {
          "mapping": {
            "g": {
              "index": true,
              "doc_values": false,
              "fielddata": true
            }
          }
        },
        "h": {
          "mapping": {
            "h": {
              "index": true,
              "doc_values": true
            }
          }
        },
        "i": {
          "mapping": {
            "i": {
              "index": true,
              "doc_values": true
            }
          }
        }
      }
    }
  }
}

Fields that are searchable are those that have index: true and fields that are aggregatable are those that have either doc_values: true or fielddata: true.
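
A client could apply that rule to the response roughly as follows. This is a minimal Python sketch, assuming the response shape and boolean values shown in the example output above; the `field_capabilities` helper and its output keys are hypothetical, and the rule is only the one stated in this comment, not an authoritative definition.

```python
def field_capabilities(response):
    """Flatten a field mappings response into per-field capability flags.

    Applies the rule from the thread: searchable when index is true,
    aggregatable when doc_values or fielddata is true.
    Hypothetical helper for illustration.
    """
    fields = []
    for index_name, index_body in response.items():
        for type_name, type_fields in index_body.get("mappings", {}).items():
            for full_name, entry in type_fields.items():
                # Each entry looks like {"mapping": {<leaf name>: {...settings...}}}.
                for settings in entry.get("mapping", {}).values():
                    fields.append({
                        "index": index_name,
                        "type": type_name,
                        "field": full_name,
                        "searchable": settings.get("index") is True,
                        "aggregatable": (
                            settings.get("doc_values") is True
                            or settings.get("fielddata") is True
                        ),
                    })
    return fields


# Abbreviated copy of the response shown above (fields f, g and h).
sample_response = {
    "index": {"mappings": {"t": {
        "f": {"mapping": {"f": {"index": True, "doc_values": False, "fielddata": False}}},
        "g": {"mapping": {"g": {"index": True, "doc_values": False, "fielddata": True}}},
        "h": {"mapping": {"h": {"index": True, "doc_values": True}}},
    }}}
}
capabilities = {c["field"]: c for c in field_capabilities(sample_response)}
```

For this input, `f` comes out searchable but not aggregatable, while `g` and `h` are both searchable and aggregatable. Settings that are missing from the response are treated as false here, which is itself an assumption.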

@rashidkpc

It could be, but we'd much rather have a property that tells us whether the field is aggregatable. It used to be that any field with index: true could be aggregated, but that changed. We tried coming up with rules in the past, but they were unreliable. There are also fields such as _type, _id, etc. for which the rules are unclear. We really need to know for sure whether Elasticsearch is going to allow aggregations to run on the field in a straightforward manner.

@rashidkpc

@clintongormley in response to your point on #12817:

Another thought that came up in FixItFriday: HTTP compression should greatly reduce the amount of data being sent over the wire (given that there is so much repetition in the mappings). Unfortunately, HTTP compression is disabled by default (see #1482) . @kimchy can you remember the details?

I tested this out with 10 indices containing the same mapping of twenty fields, and it reduced a GET _mapping from 5589 bytes to 209 bytes... Sounds like this could be worth doing.

HTTP compression would likely be a wash at best. The issue is the size of the object being deserialized, not the actual bytes going over the wire. The browser's JSON deserializer tends to have issues deserializing single objects over a couple of megabytes, which isn't uncommon for users with many fields in many indices.
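
The distinction between bytes on the wire and the size of the parsed object can be illustrated with a minimal Python sketch. It uses a synthetic mapping: the `mapping_sizes` helper, the index names, the `logs` type, and the field names are all made up, and it does not reproduce the measurement quoted above.

```python
import gzip
import json


def mapping_sizes(index_count, field_count):
    """Return (raw bytes, gzipped bytes, parsed field entries) for a
    synthetic response in which every index repeats the same mapping."""
    properties = {
        f"field_{n}": {"type": "string"} for n in range(field_count)
    }
    body = {
        f"index_{i}": {"mappings": {"logs": {"properties": properties}}}
        for i in range(index_count)
    }
    raw = json.dumps(body).encode("utf-8")
    compressed = gzip.compress(raw)

    # The client still has to decompress and parse the full document,
    # so every repeated field entry ends up in the resulting object.
    parsed = json.loads(gzip.decompress(compressed))
    entry_count = sum(
        len(index_body["mappings"]["logs"]["properties"])
        for index_body in parsed.values()
    )
    return len(raw), len(compressed), entry_count


raw_size, gzip_size, entry_count = mapping_sizes(index_count=10, field_count=20)
```

Because the mapping repeats across indices, `gzip_size` is much smaller than `raw_size`, yet `entry_count` still grows with the number of indices. That growth is the cost compression does not remove.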

@jpountz
Contributor

jpountz commented Apr 7, 2016

There are also fields such as _type, _id, etc for which the rules are unclear.

This is something that we could fix by using the same index/doc_values/fielddata convention as other fields. For instance I think it would be fine if the _index field reported that it is indexed and has doc values when include_defaults=true.

@rashidkpc

The big problem here is simply tracking the rules and knowing when they change. Since Elasticsearch already knows the rules for when things are aggregatable, it would be best if there were an API to convey a field's abstract capabilities instead of the client needing to discern them from its enabled features.

@clintongormley
Contributor

Closing in favour of #17750

@javanna javanna added the Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch label Jul 16, 2024