Can't build visualizations on text fields #6769

Closed
Bargs opened this issue Apr 4, 2016 · 30 comments
Labels: blocker, bug, v5.0.0

Comments

@Bargs
Contributor

Bargs commented Apr 4, 2016

Selecting a text field as the target for an aggregation returns the following error:

Fielddata is disabled on text fields by default. Set fielddata=true on [agent] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.

[Image: screen shot 2016-04-04 at 3 55 07 pm]

Another thing that's odd is that the http response from ES containing the error has a 200 status code. I'm not sure if that's an intentional change in ES, but it doesn't seem right to me.

epixa added the bug, P1, and blocker labels and removed the v5.0.0-alpha1 label on Apr 4, 2016
@clintongormley
Contributor

> Another thing that's odd is that the http response from ES containing the error has a 200 status code. I'm not sure if that's an intentional change in ES, but it doesn't seem right to me.

I presume you're talking about the msearch response? It is correct that msearch will return 200: the msearch request itself completed correctly, and you have to look at the individual items to see whether they executed correctly. This is just like the bulk API.
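To illustrate the per-item check, here is a minimal Python sketch of how a client might inspect an msearch response body. It assumes the body has already been parsed into a dict with a top-level `responses` list in which failed items carry an `error` key; the function name `failed_items` and the sample data are hypothetical.

```python
def failed_items(msearch_body):
    """Return (position, error) pairs for msearch items that failed.

    Assumes each failed item carries an "error" key. The overall HTTP
    status is not consulted because it can be 200 even when items fail.
    """
    return [
        (position, item["error"])
        for position, item in enumerate(msearch_body.get("responses", []))
        if "error" in item
    ]


# Hypothetical parsed response: the first search succeeded, the second failed.
sample = {
    "responses": [
        {"hits": {"total": 3, "hits": []}},
        {"error": {"type": "illegal_argument_exception",
                   "reason": "Fielddata is disabled on text fields by default."}},
    ]
}

for position, error in failed_items(sample):
    print(f"search #{position} failed: {error['reason']}")
```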

@spalger
Contributor

spalger commented Apr 5, 2016

Not sure why this would be a P1 blocker. Until we have a way to ask elasticsearch what fields are "aggregatable" we simply have to give the user errors. Ideally the error would say something about using the not-analyzed variant of the chosen field (should it exist), but perhaps this is an enhancement we can bring to this UI issue.

@Bargs
Contributor Author

Bargs commented Apr 6, 2016

@spalger Text fields replaced analyzed string fields. Aggregating on analyzed string fields didn't use to throw an error. It wasn't recommended, but it didn't throw an error.

@spalger
Contributor

spalger commented Apr 6, 2016

Sure, but text fields not being aggregatable by default is a new behavior in elasticsearch that bubbles up to Kibana. How is that a bug in Kibana?

If the user wants to continue to aggregate on these value types they should do as the error message suggests and "set fielddata=true".
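For readers unsure what "set fielddata=true" looks like in practice, the sketch below builds the JSON body of a mapping update that enables fielddata on a `text` field. The field name `agent` comes from the error message; the helper name `fielddata_mapping` is hypothetical, and the body would be sent to the index's put-mapping endpoint, whose exact path depends on the index and Elasticsearch version.

```python
import json


def fielddata_mapping(field_name):
    """Build a put-mapping body that turns on fielddata for a text field."""
    return {
        "properties": {
            field_name: {
                "type": "text",
                # Loads the field's terms into memory so it can be aggregated.
                "fielddata": True,
            }
        }
    }


print(json.dumps(fielddata_mapping("agent"), indent=2))
```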

@spalger
Contributor

spalger commented Apr 6, 2016

What type of solution do you imagine here?

@Bargs
Contributor Author

Bargs commented Apr 6, 2016

TBH when I read the error message, I didn't know enough about fielddata and how it relates to aggregations to understand that it essentially meant "this field is not aggregatable". I would guess most users would have the same reaction. So if text fields can't be aggregated on by default, we should hide them from the field list in the vis editor by default.

@spalger
Contributor

spalger commented Apr 7, 2016

yeah, this is why I said:

> Until we have a way to ask elasticsearch what fields are "aggregatable" we simply have to give the user errors.

@spalger
Contributor

spalger commented Apr 7, 2016

We used to try to guess which fields were aggregatable, but the details about what qualifies or disqualifies a field are quite complex and have changed in the past without actually causing any breaks in Kibana. Users then started filing issues (#3335, #5914) about how elasticsearch had added the ability to aggregate on fields in some new scenario and we had no workaround for them: Kibana was simply going to prevent them from aggregating on that field until the next version was released.

This is why we did #5806, and why we fall back to the error message that elasticsearch chooses to explain the issue.

@Bargs
Contributor Author

Bargs commented Apr 7, 2016

The historical context helps... I understand what you're saying. But this change in ES defaults, combined with our policy to simply throw an ES error if we get one, is going to lead to a really terrible user experience. By default, half of a user's string fields (all the non-raw fields) are going to throw a really cryptic error in their face. If this happened to me as a brand new Kibana user, I might just assume the app is broken. This is worse than previous versions where the defaults worked, and the user would only get an error if they intentionally messed with advanced mapping options.

I don't know what an acceptable solution would be since I don't know all the details of the previous discussions about removing the bucketable property, but I feel like at the very least we need to give the user some sort of warning or more friendly error message. Longer term, getting this into ES becomes much more important.

@Bargs
Contributor Author

Bargs commented May 2, 2016

Now that elastic/elasticsearch#17980 is merged, we should be able to fix this.

@streamnsight

Just tried out Kibana 5 alpha and running into this issue as well.

I have to agree with @Bargs that the error is cryptic and I don't know what to do from here.
Since the error message suggests an option to fix the problem, it would be nice to have a way to do so in the UI, but I don't see any obvious one (no option to set fielddata=true in options for the mappings.)

@streamnsight

Just realized there is a new .keyword extension after the text field that can be used to build visualizations...

Seems to work, but it raises a question: is this a 'representation' for the UI or an actual new field?
What if I have a nested field ending with .keyword? Is it going to be interpreted as the field that can be aggregated, or am I going to see two fields with the same name?

@Bargs
Contributor Author

Bargs commented Jun 27, 2016

@streamnsight in 5.0 strings are mapped as multi fields with text and keyword versions by default: https://www.elastic.co/guide/en/elasticsearch/reference/master/breaking_50_mapping_changes.html#_default_string_mappings. So .keyword isn't a UI only construct, it's coming from elasticsearch.
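As a rough illustration, the default dynamic mapping for a string in 5.0 has the shape below: a `text` field with a `keyword` sub-field. The helper `aggregatable_name` is hypothetical and only shows how a client could pick the sub-field to aggregate on; it is not how Kibana does it.

```python
# Shape of the default dynamic mapping for a string value in 5.0,
# per the breaking-changes page linked above.
DEFAULT_STRING_MAPPING = {
    "type": "text",
    "fields": {
        "keyword": {"type": "keyword", "ignore_above": 256},
    },
}


def aggregatable_name(field_name, mapping):
    """Return the name to aggregate on, or None if no keyword variant exists."""
    if mapping.get("type") == "keyword":
        return field_name
    for sub_name, sub_mapping in mapping.get("fields", {}).items():
        if sub_mapping.get("type") == "keyword":
            return f"{field_name}.{sub_name}"
    return None


print(aggregatable_name("agent", DEFAULT_STRING_MAPPING))  # agent.keyword
```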

@streamnsight

streamnsight commented Jun 27, 2016

@Bargs thanks for the link...
Can you confirm: does that mean keyword is now a reserved field name, and I can't have a nested key called mytextfield.keyword?

@Bargs
Contributor Author

Bargs commented Jun 27, 2016

@streamnsight It's not reserved, it's just a default. You can override that default by creating your own mappings for the field in your index, or index template.

Or if you want to disable the automatic multi-field entirely, you can edit the default mappings for all indices.

@clintongormley
Contributor

Once Kibana starts using the feature added in elastic/elasticsearch#17980, this problem should go away as the text field won't be shown as aggregatable

@LeeDr

LeeDr commented Jun 29, 2016

This has an even uglier result in Graph UI. If you use the text field there you get a server 500 error. Elasticsearch and Kibana are showing the same error.
cc @markharwood

Caused by: java.lang.IllegalArgumentException: Fielddata is disabled on text fields by default. Set fielddata=true on [agent] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.

but Graph is showing a 500 error;

[Image: graphui_error]

@irab

irab commented Aug 16, 2016

Still getting this error on fresh install of Elasticsearch 5 alpha 5 and Kibana 5 alpha 5. elastic/elasticsearch#17980 has not fixed this.

@clintongormley
Contributor

@irab Kibana needs to start using the feature added in elastic/elasticsearch#17980 before you'll see any difference

@irab

irab commented Aug 17, 2016

Hi @clintongormley. I took a look at that issue - it's tagged "v5.0.0-alpha3" and was committed to master back in April. I'm assuming it's in the version I'm using, v5.0.0-alpha5, released Aug 9th.

@clintongormley
Contributor

@irab I repeat: Kibana needs to start using the feature, which will mean not showing fields that shouldn't be used in aggregations.

@irab

irab commented Aug 17, 2016

Thanks for the clarification. Hard to tell what is enabled...

Bargs added a commit to Bargs/kibana that referenced this issue Sep 23, 2016
This adds a simple API for getting the searchable/aggregatable status of
a list of fields in a given index, list of indices, or index pattern. In
the future this will probably evolve into a full blown fields info API
that we can use when removing the index pattern mapping cache. For now
though it's built to provide the minimum info needed to fix
elastic#6769

Usage:

The API exposes a single GET endpoint.

```
GET /api/kibana/{indices}/field_capabilities
```

`indices` can be a single index, a comma delimited list, or a wildcard
pattern

Example response:

```
{
  "fields": {
    "imsearchable": {
      "searchable": true,
      "aggregatable": false
    },
    "imaggregatable": {
      "searchable": true,
      "aggregatable": true
    }
  }
}
```
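
A client consuming this response might filter fields as follows. This is only a sketch against the example response shape above; `aggregatable_fields` is a hypothetical helper, not part of the commit.

```python
def aggregatable_fields(capabilities):
    """Return sorted names of fields flagged aggregatable in the response."""
    return sorted(
        name
        for name, flags in capabilities.get("fields", {}).items()
        if flags.get("aggregatable")
    )


# The example response from above, as a parsed dict.
example = {
    "fields": {
        "imsearchable": {"searchable": True, "aggregatable": False},
        "imaggregatable": {"searchable": True, "aggregatable": True},
    }
}

print(aggregatable_fields(example))  # ['imaggregatable']
```
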
This was referenced Sep 23, 2016
elastic-jasper added a commit that referenced this issue Sep 23, 2016
---------

**Commit 1:**
Add field_capabilities API

This adds a simple API for getting the searchable/aggregatable status of
a list of fields in a given index, list of indices, or index pattern. In
the future this will probably evolve into a full blown fields info API
that we can use when removing the index pattern mapping cache. For now
though it's built to provide the minimum info needed to fix
#6769

Usage:

The API exposes a single GET endpoint.

```
GET /api/kibana/{indices}/field_capabilities
```

`indices` can be a single index, a comma delimited list, or a wildcard
pattern

Example response:

```
{
  "fields": {
    "imsearchable": {
      "searchable": true,
      "aggregatable": false
    },
    "imaggregatable": {
      "searchable": true,
      "aggregatable": true
    },
  }
}
```

* Original sha: 1af6b76
* Authored by Matthew Bargar on 2016-09-21T18:38:34Z

**Commit 2:**
Filter non-aggregatable fields from vis editor UI

Using the field_capabilities API added in the previous commit, this commit enhances
the client side index pattern object with information about the
searchable and aggregatable status of each field in the index pattern.
We then use this information to filter out non-aggregatable fields from
the vis editor so that users won't accidentally select them and get
nasty errors. An example of a non-aggregatable field would be a `text`
field without fielddata enabled (which is the default).

I also added the searchable and aggregatable flags to the index pattern
page so users can see the status of their fields. I removed the `indexed`
column because it was mostly redundant with `searchable` and I needed
the horizontal space.

The addition of the searchable and aggregatable properties for index
pattern fields would require users to manually refresh their field list
when upgrading to 5.0. This commit also adds a check for those properties and
if they're missing it automatically refreshes the field list for the
user in a seamless manner.

* Original sha: 4a906f3
* Authored by Matthew Bargar on 2016-09-21T19:18:10Z
@zjost

zjost commented Feb 3, 2017

I'm not sure I understand how to proceed. I see published tutorials like this which use the text of tweets to do what I want to do (the "Graphing Tweet Text Contents" section).

What do I need to change to allow this sort of analysis and where do I need to change it? I'm trying to recreate the example using twitter data.

@Bargs
Contributor Author

Bargs commented Feb 3, 2017

@zjost I wasn't able to find the data set this blog post is using, but I suspect entities.hashtags.text is one giant string in the source JSON. It would be better to split that string into an array prior to indexing and then select the keyword version of the field. The other option is to turn on fielddata for the text version of the field to make it aggregatable, which would be fine if you're just playing around with things in a local environment, but it can suck up a lot of memory so you generally don't want to use it in production.
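
A minimal sketch of the first option, assuming the hashtags really do arrive as one whitespace-separated string under `entities.hashtags.text` (a guess, since the data set wasn't available). The function name `split_hashtags` is hypothetical.

```python
import copy


def split_hashtags(doc):
    """Return a copy of doc with entities.hashtags.text split into a list."""
    result = copy.deepcopy(doc)
    hashtags = result.get("entities", {}).get("hashtags", {})
    text = hashtags.get("text")
    if isinstance(text, str):
        # Each hashtag becomes its own array element, so a terms
        # aggregation on the keyword field buckets per hashtag.
        hashtags["text"] = text.split()
    return result


tweet = {"entities": {"hashtags": {"text": "DataScience elasticsearch kibana"}}}
print(split_hashtags(tweet)["entities"]["hashtags"]["text"])
```

The split would run in whatever pipeline prepares documents before indexing; the original document is left unmodified here so the transformation is easy to test.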

@pavankumarb

Using the keyword version of the fields doesn't work in the Kibana Graph workspace.

The Graph UI makes a REST call to http://localhost:5601/api/graph/graphExplore, which returns an empty response: {"ok":true,"resp":{"took":0,"timed_out":false,"failures":[],"vertices":[],"connections":[]}}.

ES and Kibana versions being used: 5.1.2

@zjost

zjost commented Feb 5, 2017

@Bargs thanks! Is there a way to run the text through the standard analyzer before using the keyword method? I like the keyword functionality, but it only makes sense if you can first standardize the text strings or #DataScience != #datascience

@markharwood
Contributor

> Is there a way to run the text through the standard analyzer before using the keyword method?

See normalizers in 5.2
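
For context, a normalizer is declared in the index settings and referenced from a `keyword` field. The sketch below builds such an index body; the names `lowercase_normalizer`, `tweet`, and `hashtag` are made up for illustration, and the local `normalize` function only mimics the effect of a lowercase filter.

```python
import json

# Index body with a custom normalizer applied to a keyword field.
# The normalizer, type, and field names are hypothetical.
index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_normalizer": {
                    "type": "custom",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "tweet": {
            "properties": {
                "hashtag": {
                    "type": "keyword",
                    "normalizer": "lowercase_normalizer",
                }
            }
        }
    },
}


def normalize(value):
    """Local stand-in for what the lowercase filter does to a keyword value."""
    return value.lower()


print(json.dumps(index_body, indent=2))
print(normalize("#DataScience") == normalize("#datascience"))  # True
```

Note that a normalizer still emits the whole value as a single term, so it does not tokenize or stem the text.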

@markharwood
Contributor

@pavankumarb Check out the troubleshooting docs for no results

@zjost

zjost commented Feb 6, 2017

So there's no way to use an analyzer and then index the tokens? The whole point is to do stemming, etc. and find patterns in documents. It seems that's exactly what many of the old tutorials do, but there are new defaults that make this either difficult or impossible. Is there any way to recreate the result where, e.g., given the text field of a tweet, one can use Graph on the field so that tokens that are significantly related are represented by the graph? Not full tweet text, but tokens within. Thanks again for the help.

@clintongormley
Contributor

@zjost the only alternative would be to enable fielddata on the text field - just be aware that it is going to use a lot of memory

airow pushed a commit to airow/kibana that referenced this issue Feb 16, 2017