Using AND for the default_operator in a query_string doesn't appear to be working unless term is quoted #30131

Closed
ontology-rory opened this issue Apr 25, 2018 · 7 comments
Labels
>bug discuss feedback_needed :Search/Search Search-related issues that do not fall into other categories

@ontology-rory

Elasticsearch version: 6.2.1

Plugins installed: []

JVM version:

openjdk version "1.8.0_162"
OpenJDK Runtime Environment (Zulu 8.27.0.7-linux64) (build 1.8.0_162-b01)
OpenJDK 64-Bit Server VM (Zulu 8.27.0.7-linux64) (build 25.162-b01, mixed mode)

OS version: Linux 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

Using AND for the default_operator in a query_string doesn't appear to be working unless term is quoted.

If I run a query that I would expect to return 1 document, I get zero results. If I quote one of the terms, I get the expected result. For example, searching for 3150185 J3050 versus 3150185 "J3050":

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

versus

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "index-test",
        "_type": "document",
        "_id": "test",
        "_score": 0.5753642,
        "_source": {
          "meNm": [
            "3150185"
          ],
          "label": [
            "J3050",
            "JD050"
          ],
          "name": [
            "JD050_ABCD_13"
          ]
        }
      }
    ]
  }
}

Steps to reproduce:

  • create a document
PUT index-test/document/test
{
  "meNm": ["3150185"],
  "label": ["J3050","JD050"],
  "name": ["JD050_ABCD_13"]
}
  • run query below and get 0 results
GET index-test/_search
{
  "query": {
    "query_string": {
      "query": "3150185 J3050",
      "default_operator": "AND"
    }
  },
  "explain": false, 
  "from": 0,
  "size": 10
}
  • run query below and get 1 result
GET index-test/_search
{
  "query": {
    "query_string": {
      "query": "3150185 \"J3050\"",
      "default_operator": "AND"
    }
  },
  "explain": false, 
  "from": 0,
  "size": 10
}

Note that putting the explicit AND in gives the expected result:

GET index-test/_search
{
  "query": {
    "query_string": {
      "query": "3150185 AND J3050"
    }
  },
  "explain": false, 
  "from": 0,
  "size": 10
}

gives

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "index-test",
        "_type": "document",
        "_id": "test",
        "_score": 0.5753642,
        "_source": {
          "meNm": [
            "3150185"
          ],
          "label": [
            "J3050",
            "JD050"
          ],
          "name": [
            "JD050_ABCD_13"
          ]
        }
      }
    ]
  }
}

@romseygeek added the >bug and :Search/Search labels Apr 25, 2018

@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@jpountz
Contributor

jpountz commented Apr 25, 2018

This is the expected behavior. The reason is that Elasticsearch relies on the analyzer to split everything that is in between query operators into terms. The query operators are ", OR and AND; whitespace is not one of them. The analyzer on keyword fields is the keyword analyzer, which emits its whole input as a single term. So a query_string query on 3150185 J3050 is parsed as a query on the single term 3150185 J3050.

I'm adding the discuss label here in case we think there should be a way to configure this, by somehow configuring a whitespace search analyzer on the field.
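
To make the rule above concrete, here is a minimal Python sketch. It is a toy model, not Elasticsearch's parser: the names `split_on_operators`, `keyword_analyzer`, `whitespace_analyzer`, `terms_for` and `matches_all` are made up for illustration, only the ", AND and OR operators are recognised, and the keyword values of the test document are pooled into one set instead of being searched field by field.

```python
import re

# Toy model of the rule described above. This is NOT Elasticsearch's parser.
# Only the quote, AND and OR operators are recognised; whitespace is left alone.
OPERATORS = re.compile(r'"([^"]*)"|\bAND\b|\bOR\b')


def split_on_operators(query):
    """Return the pieces of text that sit between query operators."""
    chunks, last = [], 0
    for match in OPERATORS.finditer(query):
        before = query[last:match.start()].strip()
        if before:
            chunks.append(before)
        if match.group(1):  # a non-empty quoted phrase is its own chunk
            chunks.append(match.group(1))
        last = match.end()
    tail = query[last:].strip()
    if tail:
        chunks.append(tail)
    return chunks


def keyword_analyzer(text):
    """Keyword analysis: the whole input becomes one term."""
    return [text]


def whitespace_analyzer(text):
    """Whitespace analysis: one term per whitespace-separated token."""
    return text.split()


def terms_for(query, analyzer):
    """Split on operators first, then analyze each piece."""
    return [term for chunk in split_on_operators(query) for term in analyzer(chunk)]


def matches_all(query, analyzer, indexed_terms):
    """AND semantics only: every term must be present (OR is not modelled)."""
    return all(term in indexed_terms for term in terms_for(query, analyzer))


# Assumption: all keyword values of the test document pooled into one set.
INDEXED = {"3150185", "J3050", "JD050", "JD050_ABCD_13"}

for q in ('3150185 J3050', '3150185 "J3050"', '3150185 AND J3050'):
    print(q, terms_for(q, keyword_analyzer), matches_all(q, keyword_analyzer, INDEXED))
```

In this model the unquoted query stays a single term with an embedded space, so it cannot match any indexed keyword value, while the quoted and explicit-AND forms each yield two terms that both match. Swapping in `whitespace_analyzer` makes the unquoted query split as well, which is roughly the effect a text field with a whitespace tokenizer would have.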

@ontology-rory
Author

In the real index where I noticed the issue, there is a custom analyser for the index. I removed it for a simpler test case:

    ....
      "analysis": {
        "filter": {
          "OntoFilter": { ... }
        },
        "analyzer": {
          "default": {
            "filter": [
              "OntoFilter",
              "lowercase"
            ],
            "tokenizer": "whitespace"
          }
        }
    ....

Would that not get picked up for the field, or does the keyword analyser trump the default analyser set on the index?

@jpountz
Contributor

jpountz commented Apr 25, 2018

The default analyzer only applies to text fields. I suspect that label is a keyword in your mappings?

@ontology-rory
Author

It was. I'm looking at changing the indexing structure now to see if that helps.

@ontology-rory
Author

ontology-rory commented Apr 30, 2018

So after tweaking the code a bit I now have the following index structure

{
  "ontoindex": {
    "aliases": {},
    "mappings": {
      "document": {
        "properties": {
          "OntoFields": {
            "type": "nested",
            "properties": {
              "key": {
                "type": "keyword"
              },
              "value": {
                "type": "text",
                "copy_to": [ "OntoAll" ]
              }
            }
          },
          "OntoAll": {
            "type": "text"
          }
        }
      }
    },
    "settings": {
      "index": {
        "analysis": {
          "filter": {
            "OntoFilter": {
              "split_on_numerics": "true",
              "generate_word_parts": "true",
              "preserve_original": "true",
              "catenate_words": "false",
              "generate_number_parts": "true",
              "catenate_all": "false",
              "split_on_case_change": "true",
              "type": "word_delimiter_graph",
              "catenate_numbers": "false"
            }
          },
          "analyzer": {
            "default": {
              "filter": [
                "OntoFilter",
                "lowercase"
              ],
              "tokenizer": "whitespace"
            }
          }
        }
      }
    }
  }
}

Previously, the mappings were very dynamic, and in the worst case we could end up with 7500+ fields in the mapping :-(
I spun it round to a more KV-like pattern and then set the type to text for the value field and the OntoAll field. The queries then work as expected.

GET _search
{
  "from": 0,
  "size": 10,
  "query": {
    "query_string": {
      "default_field": "OntoAll",
      "default_operator": "AND",
      "query": "3150185 J3050"
    }
  },
  "explain": false
}
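
For readers wondering what the KV-like restructuring mentioned above could look like on the indexing side, here is a hedged Python sketch. The thread does not show the actual document shape or indexing code, so this is an assumption: the helper name `to_key_value_document` is hypothetical, and it guesses that each original field value becomes one `OntoFields` entry with a `key` and a `value`, matching the nested mapping shown earlier.

```python
def to_key_value_document(source):
    """Turn a dynamically-keyed document into a fixed key/value layout.

    Assumption: one OntoFields entry per (field name, value) pair, so the
    mapping needs only OntoFields.key and OntoFields.value however many
    distinct field names the data contains.
    """
    fields = []
    for key, values in source.items():
        if not isinstance(values, list):  # accept single values as well as lists
            values = [values]
        for value in values:
            fields.append({"key": key, "value": str(value)})
    return {"OntoFields": fields}


# The test document from the original report, reshaped.
original = {
    "meNm": ["3150185"],
    "label": ["J3050", "JD050"],
    "name": ["JD050_ABCD_13"],
}
reshaped = to_key_value_document(original)
print(reshaped)
```

With a layout like this, the field names live in the data rather than in the mapping, and the `copy_to` on `OntoFields.value` is what feeds the `OntoAll` text field that the query above searches.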

Thanks for helping me fix this - happy for you to close as a non-issue.

@jpountz
Contributor

jpountz commented May 4, 2018

I'll close as a non issue, but we agreed to follow up on the idea to make it possible to split keywords on whitespace at query time: #30393.

@jpountz closed this as completed May 4, 2018