
Make limit on number of expanded fields configurable #34778

Closed
melissachang opened this issue Oct 23, 2018 · 9 comments
Assignees
Labels
>enhancement · :Search/Search (Search-related issues that do not fall into other categories)

Comments

@melissachang

I'd like to use multi_match to search over documents with 6600 fields. Is there any way the 1024 limit can be increased?
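
For context: when multi_match is given no fields parameter (or a wildcard field pattern such as "*"), it is expanded to every matching field in the mapping, and that expansion is what the 1024 limit counts. A minimal sketch of such an all-fields query, with a hypothetical index name:

GET /my-index/_search
{
  "query": {
    "multi_match": {
      "query": "premenopause",
      "fields": ["*"]
    }
  }
}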

@DaveCTurner DaveCTurner added the :Search/Search label Oct 24, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@cbuescher
Member

@melissachang Are you actually running into this problem in a released version of Elasticsearch? I'm asking because I think the limit is only enforced starting with the as-yet-unreleased version 7.0 (see #26541); the warning might have been backported to the docs of earlier versions by accident. It is a good warning, though, since the limit is going to be enforced from 7.0 on to prevent queries from accidentally expanding to all fields and causing performance problems. So raising it could itself cause performance issues, and at the moment it doesn't seem to be possible anyway.

6600 is a huge number of fields; it would be interesting to learn more about your use case and why you need to query so many of them. Maybe this indicates a problem in your document design. Could you elaborate?

@melissachang
Author

So we finally got the index working: Elasticsearch 6.2.2, 122k documents, 6624 fields. Some of them are text fields with extra keyword sub-fields, so there are actually 13251 mappings.
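
A text field with an extra keyword sub-field counts as two mapped fields, which is roughly how 6624 columns double into 13251 mappings. A minimal sketch of such a multi-field mapping, assuming a hypothetical field name and the 6.x-style typed mapping:

PUT /my-index
{
  "mappings": {
    "_doc": {
      "properties": {
        "menopausal_status_1990": {
          "type": "text",
          "fields": {
            "keyword": { "type": "keyword" }
          }
        }
      }
    }
  }
}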

http://localhost:9200/_cat/indices?v
health status index                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   nurse_s_health_study        eacHM2tARHetYdMlEHMN0A   5   1     121701            0      5.6gb          5.6gb

This multi_match query worked beautifully:

GET /nurse_s_health_study/_search
{
  "query": {
    "multi_match" : {
      "query":    "pre*"
    }
  }
}
{
  "took": 1485,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 106438,

I will explain my use case. But since things work with 13251 mappings, it seems like Elasticsearch shouldn't impose a hard limit? I can understand making the default limit 1024, but users should be able to increase that.

I am using Elasticsearch to do faceted search on health-related datasets. You can see an example here.

Nurse's Health Study is one of our datasets. The Nurse's Health data is a table with 6k columns and 120k rows. Each row is a participant. Each column represents a field -- e.g., weight in a certain year, menopausal status in a certain year, etc. (Of course not every participant will have every field filled out.)

In the index, each document corresponds to a participant. Each of the 6k table columns is a field on the document.

We want to use multi_match to search across the column contents of all 6k columns. For example, pre* finds pre/never, premenopause, etc.
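
A sketch of how such a prefix search is typically spelled with multi_match -- the phrase_prefix type is a standard option, though this exact query is an assumption and not taken from the thread:

GET /nurse_s_health_study/_search
{
  "query": {
    "multi_match": {
      "query": "pre",
      "type": "phrase_prefix"
    }
  }
}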

CC @bfcrampton

@cbuescher
Member

But since things work with 13251 mappings, it seems like Elasticsearch shouldn't impose a hard limit? I can understand making the default limit 1024, but users should be able to increase that.

Great that it works in this particular case, but that still doesn't mean it works for other cases. We often see problems with that number of fields. But I agree that it probably makes sense to make this configurable.

The Nurse's Health data is a table with 6k columns...

This again makes sense if you are coming from thinking of your data as a database table, but it is problematic for an inverted index like Lucene, which started out as a full-text search engine. In your particular case I think the limit comes into effect when searching across all fields. I would suggest copying all text fields relevant to your search into a dedicated "catch_all" field using the copy_to parameter in the mappings.
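
A minimal sketch of that suggestion, with hypothetical field names and a 6.x-style typed mapping -- the value of each annotated field is copied into catch_all at index time, so a single-field match query replaces the many-field expansion:

PUT /nurse_s_health_study
{
  "mappings": {
    "_doc": {
      "properties": {
        "weight_1990":            { "type": "text", "copy_to": "catch_all" },
        "menopausal_status_1990": { "type": "text", "copy_to": "catch_all" },
        "catch_all":              { "type": "text" }
      }
    }
  }
}

GET /nurse_s_health_study/_search
{
  "query": {
    "match": { "catch_all": "premenopause" }
  }
}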

@cbuescher cbuescher changed the title Increase 1024 field limit for multi_match Make limit on number of expanded fields configurable Nov 2, 2018
@cbuescher
Member

@melissachang fyi I changed the issue title to better reflect your ask: making the hard limit introduced with #26541 configurable. I marked this for internal group discussion, but maybe @dakrone, who authored that change, can share his thoughts here as well.

@melissachang
Author

Conceptually, our data is a database. It's not like other use cases, such as log ingestion, where the fields are arbitrary strings. Our columns are well-defined and meaningful.

We are using Elasticsearch to perform full-text search over the database. It's performant, easy, and it works -- I don't see what the problem is.

I experimented with copy_to. I spent maybe five hours and couldn't get copy_to to work with nested fields. (This is on a simple toy index, not the Nurse's Health index.) Also, there's no way to debug copy_to. Elasticsearch doesn't expose the inverted index in any way.
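
For what it's worth, the analysis side can be probed with the _analyze API, which runs text through the analyzer mapped for a given field; whether a value actually got copied can then be checked by searching the target field directly. A sketch using the hypothetical catch_all field from above:

GET /nurse_s_health_study/_analyze
{
  "field": "catch_all",
  "text": "premenopause"
}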

multi_match works with zero effort on our part. Just because some people experience performance problems with more than 1024 fields doesn't mean you should prohibit it for all users.

@colings86 colings86 assigned cbuescher and unassigned cbuescher Nov 5, 2018
@colings86 colings86 added the help wanted and adoptme labels and removed the team-discuss label Nov 5, 2018
@cbuescher
Member

We discussed this issue internally and agreed that this limit should be configurable. We also agreed it would make sense not to introduce a new setting to override this limit, but instead to use the max_clause_count setting that already limits the number of clauses a Lucene BooleanQuery can have to 1024. While changing this on master we should also make sure we update the deprecation warning on the 6.x branches so that it kicks in only when the configurable max_clause_count limit is exceeded.

@melissachang
Author

Great. Is there going to be an upper limit on max_clause_count?

@cbuescher
Member

Its default is 1024 clauses (if we use it for field expansion as well, this will mean 1024 fields). It can be increased to whatever value you want, but then you basically use it at your own risk. Up until recently we didn't properly document this, but I added docs about the setting in #34779.
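
For reference, this is a static node-level setting that lives in elasticsearch.yml and is read at startup; the value below is purely illustrative:

# elasticsearch.yml
indices.query.bool.max_clause_count: 8192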

@cbuescher cbuescher removed the help wanted and adoptme labels Nov 6, 2018
@cbuescher cbuescher self-assigned this Nov 6, 2018
cbuescher pushed a commit to cbuescher/elasticsearch that referenced this issue Nov 6, 2018
In elastic#26541 we introduced a hard limit of 1024 on the number of fields a query can
be expanded to. Instead of using a hard limit, we should make this
configurable. This change removes the hard-limit check and uses the existing
`max_clause_count` setting instead.

Closes elastic#34778
cbuescher pushed a commit that referenced this issue Nov 8, 2018
In #26541 we introduced a hard limit of 1024 on the number of fields a query can
be expanded to. Instead of using a hard limit, we should make this
configurable. This change removes the hard-limit check and uses the existing
`max_clause_count` setting instead.

Closes #34778