Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend field stats to include type, searchable, aggregatable #17750

Closed
clintongormley opened this issue Apr 14, 2016 · 10 comments · Fixed by #17980
Closed

Extend field stats to include type, searchable, aggregatable #17750

clintongormley opened this issue Apr 14, 2016 · 10 comments · Fixed by #17980
Assignees
Labels
blocker >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v5.0.0-alpha3

Comments

@clintongormley
Copy link
Contributor

After much discussion in #15728 we have agreed to make the following changes to the field stats API:

  • ?fields will accept wildcards, so that it can match all fields
  • field stats will return the field type, and whether it is searchable (index ==true) and aggregatable (doc_values==true || (index==true && fielddata==true))

When ?level=cluster (the default), fields with the same name are collapsed into a single entry unless the fields have a different type, in which case the whole request returns an exception. This collapsing process should change as follows:

  • An entry should be returned for fields with conflicting type, but with an error key which describes the error.
  • searchable and aggregatable are collapsed to true if any of the instances of that field are searchable or aggregatable.

/cc @rashidkpc @spalger @jpountz @martijnvg

@jpountz
Copy link
Contributor

jpountz commented Apr 14, 2016

field stats will return the field type, and whether it is searchable (index ==true) and aggregatable (doc_values==true || (index==true && fielddata==true))

This will be an issue for some virtual fields like _id, which is searchable even though it is not indexed, and _index, which is aggregatable even though it does not have doc values.

In order not to make it too complicated, I think we can check whether a field is aggregatable by checking whether MappedFieldType.fielddataBuilder() throws an exception since it is the method that throws an exception eg. when a keyword field does not have doc values or when a text field has fielddata=false.

I don't think we have anything similar for searchable fields, so we would probably have to start by checking index=true, plus a hardcoded list of fields that are known to be searchable even though not indexed like _id and _index.

@jpountz
Copy link
Contributor

jpountz commented Apr 14, 2016

index==true && fielddata==true

This made me realize that it is a bug that you can set fielddata=true on a field which is not indexed, so I opened #17747.

@rashidkpc
Copy link

When ?level=cluster (the default), fields with the same name are collapsed into a single entry unless the fields have a different type, in which case the whole request returns an exception.

This won't work for us, assuming it means that the entire request to _field_stats will fail. Ideally we would just exclude the field, perhaps with a way of listing the fields that were excluded due to any exception

@spalger
Copy link
Contributor

spalger commented Apr 14, 2016

unless the fields have a different type, in which case the whole request returns an exception

I think this is a bit drastic. Ideally Kibana would be able to use this and not have to through some fatal error about the field types not matching. Could the conflicting field have "conflict" set for its type when types conflict? Perhaps we could do the same thing for searchable and aggregatable.

@clintongormley
Copy link
Contributor Author

in which case the whole request returns an exception.

this is what it does today, but the bullet points beneath that line explain what we need to change it to do:

  • An entry should be returned for fields with conflicting type, but with an error key which describes the error.

@jimczi
Copy link
Contributor

jimczi commented Apr 18, 2016

field stats will return the field type, and whether it is searchable (index ==true) and aggregatable (doc_values==true || (index==true && fielddata==true))

Currently the field stats API will not return any information if the field is not searchable or if there is no document indexed with a value in that field. It makes sense as none of the stats would be available in such cases.
@clintongormley @jpountz should we add an entry for the "non-searchable/no value yet" fields ?
This could look like this:

      "fields": {
         "field_index_false": {
               "max_doc": 1000,
               "doc_count": -1, // not searchable
               "density": -1,
               "sum_doc_freq": -1,
               "sum_total_term_freq": -1,
               "searchable": false,
               "aggregatable": true,
               "min_value": null,
               "min_value_as_string": "",
               "max_value": null,
               "max_value_as_string": ""
            }
         },
          "field_index_true_no_value": {
               "max_doc": 1000,
               "doc_count": 0, // the field is searchable but no value has been indexed yet.
               "density": 0,
               "sum_doc_freq": 0,
               "sum_total_term_freq": 0,
               "searchable": true,
               "aggregatable": true,
               "min_value": null,
               "min_value_as_string": "",
               "max_value": null,
               "max_value_as_string": ""
            }
         }

@clintongormley
Copy link
Contributor Author

Hmm good question... In fact, even if we can aggregate on a field because it has doc values, we can't return the min/max values unless it is also indexed (correct, @jpountz ?). Today, the field stats API will ignore non-indexed fields.

@jpountz
Copy link
Contributor

jpountz commented Apr 19, 2016

even if we can aggregate on a field because it has doc values, we can't return the min/max values unless it is also indexed

This is correct: we need doc values to aggregate, but these stats either come from points or from the inverted index.

If a field is mapped, I agree there should be an entry regardless of whether it exists in the index or not.

@Bargs
Copy link

Bargs commented Jul 2, 2016

@clintongormley I started working on the Kibana code to consume this new info and I've realized there was a miss with the PR that closed this issue. Field stats still doesn't include the type of each field, and that info is critical for Kibana.

@jimczi
Copy link
Contributor

jimczi commented Jul 4, 2016

@Bargs, sorry I missed this part in the initial PR.
I submitted a new PR to fix this discrepancy:
#19241
Could you please verify that it contains the information that you need ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker >enhancement :Search Foundations/Mapping Index mappings, including merging and defining field types Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v5.0.0-alpha3
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants