Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make split_on_whitespace default to false #25470

Closed
jpountz opened this issue Jun 29, 2017 · 10 comments
Closed

Make split_on_whitespace default to false #25470

jpountz opened this issue Jun 29, 2017 · 10 comments

Comments

@jpountz
Copy link
Contributor

jpountz commented Jun 29, 2017

When we added the split_on_whitespace option, the goal was to ultimately stop splitting on whitespace and let the analyzer do the right thing. Yet we did not change the default immediately because of backward compatibility considerations. Now that we are bumping the major version, we should consider changing the default value of this option to false?

@jimczi Pinging you since you are likely the most aware person of potential side-effects of such a change

@colings86
Copy link
Contributor

What should we do if all_fields is true? There is no analyzer in this case so should we still default split_on_whitespace to false (seems better as default does not change depending on other options) or should we set it to true so multi-word searches match?

@jpountz
Copy link
Contributor Author

jpountz commented Jul 5, 2017

I don't think we should change the default based on the value of all_fields. For instance say that the mapping has a single field. Wouldn't it be weird to have different behaviour with all_fields:true and field:my_single_field?

@colings86
Copy link
Contributor

agreed

@jimczi
Copy link
Contributor

jimczi commented Jul 6, 2017

The ultimate goal is to completely remove the need for it so we could also deprecate it in 6 and never splits on whitespace.
Instead the new behavior starting in 6 would be to only split on operators which is the right thing to do ?

The concerning change if we remove the option entirely is the loss of the implicit cross-fields mode when multiple fields are requested. split_on_whitespace indirectly allows matches on different field for each whitespace separated token so I think it'd be nice to implement https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#multi-match-types for query_string as a follow up.
Currently best_fields and most_fields multi_match type equivalents can be built with use_dismax option but cross_fields would not be possible.

So as a follow up for 6 we could deprecate use_dismax and adds the type option like in
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#multi-match-types.

@jpountz
Copy link
Contributor Author

jpountz commented Jul 6, 2017

So as a follow up for 6 we could deprecate use_dismax and adds the type option

In general I think it's better to have the same options do the same thing across query parsers so +1 to replace use_dismax with type.

@colings86
Copy link
Contributor

What should we do about the auto_generate_phrase_queries option? We have a check that throws an exception if auto_generate_phrase_queries is true and split_on_whitespace is false

@jimczi
Copy link
Contributor

jimczi commented Jul 6, 2017

Right auto_generate_phrase_queries is problematic. I think it can be replaced with the match_phrase type option. With this option a simple query_string query like michael jackson creates a phrase query if the analyzer for the field splits the free text in multiple tokens and to get the old behavior back users can just put explicit operators; michael OR jackson query would create a phrase query for michael only if the analyzer for the field splits michael in multiple tokens, ...

@jimczi
Copy link
Contributor

jimczi commented Jul 6, 2017

We discussed internally and decided to take this major release opportunity to completely remove the ability to split on whitespace in the query_string in 6.x. Doing this in one step is required in order to not have the other options have different meanings depending on whether we split or nor on whitespace.
See #25574 for the full proposal

@odmby
Copy link

odmby commented May 13, 2018

I just want to say that this change is very bad for most of our users who work via Kibana. For a long time they are working by querying lists of values in parenthesis (similar to "in" operator in SQL), and now all the behavior will change and they won't even be aware that they made an error. Why wasn't split_on_whitespace made configurable instead of making it work the exact opposite as it was up to now with no option to revert?

@jpountz
Copy link
Contributor Author

jpountz commented May 14, 2018

We are considering making it configurable on a per-field basis at #30393.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants