Impossible to use search_after with null values #66242

Closed
ozupey opened this issue Dec 14, 2020 · 8 comments
Labels: >bug, needs:triage (Requires assignment of a team area label), :Search/Search (Search-related issues that do not fall into other categories), Team:Search (Meta label for search team)

Comments

@ozupey

ozupey commented Dec 14, 2020

Elasticsearch version (bin/elasticsearch --version): 6.8.12

Plugins installed: []

JVM version (java -version): openjdk version "1.8.0_275"

OS version (uname -a if on a Unix-like system): Linux production 4.15.0-118-generic #119-Ubuntu SMP Tue Sep 8 12:30:01 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:

search_after cannot be used when one of the sort values is null. Expected behavior: a null sort value is accepted, so paging can continue past documents where the sort field is missing.

Steps to reproduce:

  1. Index documents with null values
  2. Sort by a field that has a few null values
  3. Try to use search_after with these null values

Provide logs (if relevant):

As mentioned here:
https://discuss.elastic.co/t/search-after-and-nil-values/232246

If you happen to sort by a field that has a few null values, passing "null" or "nil" to search_after will result in an exception.

Failed to parse search_after value for field [last_request_at].
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [{
            "term": { "team_id": 44 }
          }, {
            "exists": { "field": "user_id"}
          }]
        }
      }
    }
  },
  "sort": [{
    "last_request_at": { "order": "desc"}
  }, {
    "internal_id": { "order": "asc"}
  }],
  "size": "10",
  "search_after": [nil, "5e79de5f33640a29a0027d41"]
}
@ozupey ozupey added >bug needs:triage Requires assignment of a team area label labels Dec 14, 2020
@pgomulka pgomulka added the :Search/Search Search-related issues that do not fall into other categories label Dec 14, 2020
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Dec 14, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@jimczi
Contributor

jimczi commented Dec 16, 2020

null is not a valid return value for a numeric or a date sort. Internally we replace missing values with a concrete numeric value depending on the missing option. We could do it automatically for search_after too but it shouldn't be needed if you extract the sort values from a search response. The sort values returned in each top hits are directly compatible with search_after so you shouldn't have to worry about missing values.
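
A minimal sketch of that suggestion in client code, assuming the response carries a `sort` array on each hit under `hits.hits`. The helper name `next_page_body` and the sample response values are hypothetical and used only for illustration; no Elasticsearch client is involved.

```python
import copy


def next_page_body(body, response):
    """Return a copy of `body` whose search_after is the last hit's sort array.

    Returns None when the response contains no hits (nothing left to page).
    """
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return None
    nxt = copy.deepcopy(body)  # leave the caller's request untouched
    # Copy the sort values verbatim: no need to know how missing values are encoded.
    nxt["search_after"] = hits[-1]["sort"]
    return nxt


# Hypothetical request and response, for illustration only.
request = {"size": 1, "sort": [{"price": "asc"}, {"_id": "desc"}]}
sample_response = {"hits": {"hits": [{"_id": "300", "sort": ["Infinity", "300"]}]}}
following = next_page_body(request, sample_response)
```

Because the sort array is passed back unchanged, whatever placeholder the server substitutes for a missing value is carried along without the client having to construct it.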

@ozupey
Author

ozupey commented Dec 16, 2020

Hi @jimczi,

Thanks for the response, but I'm not entirely following.

Let's say I have the following 4 documents that I search one by one using search_after:

[
         {
            "_id": 100,
            "_source":{
               "price": 1
            }
         },
         {
            "_id": 200,
            "_source":{
               "price": 2
            }
         },
         {
            "_id": 300,
            "_source":{
               "price": null
            }
         },
         {
            "_id": 400,
            "_source":{
               "price": null
            }
         }
]

I want to sort them by lowest price first, then on the _id as a tie breaker, so I use the following sort:

"sort": [
        {"price": "asc"},  
        {"_id": "desc"}
    ]

The first result is fine:

         {
            "_id": 100,
            "_source":{
               "price": 1
            }
         }

So then I search_after price 1, id 100:

"search_after": [1, 100],

Which results in the next document:

         {
            "_id": 200,
            "_source":{
               "price": 2
            }
         }

So then I search_after price 2, id 200:

"search_after": [2, 200],

Which results in the next document:

         {
            "_id": 300,
            "_source":{
               "price": null
            }
         }

But now what do I do? How would I use search_after to get from here to the document with ID 400? Doing the following:

"search_after": [null, 300],

Results in a null_pointer_exception.

@ozupey
Author

ozupey commented Dec 16, 2020

Alright, I looked at the sort value in the response and figured it out. You can use 'Infinity' instead of null and it will work.

Very unexpected and entirely undocumented, but I'm happy it works. Thanks a lot!
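
For reference, a sketch of what the working request might look like, written as a Python dict and serialized with the standard `json` module. The string `"Infinity"` is taken from the comment above; the exact placeholder appears to depend on the field type and version, so this is an assumption, and copying the sort array from the previous response is the safer route.

```python
import json

# Page past the document with _id 300, whose price is missing.
body = {
    "size": 1,
    "sort": [{"price": "asc"}, {"_id": "desc"}],
    # "Infinity" stands in for the missing price (value observed in the response).
    "search_after": ["Infinity", 300],
}

payload = json.dumps(body)  # JSON text to send as the search request body
```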

@ozupey ozupey closed this as completed Dec 16, 2020
@mayya-sharipova
Contributor

@ozupey As Jim said, in the search_after parameter you should use the sort values of the last document from the previous response.

@benneq

benneq commented Sep 3, 2021

Some more suggestions: Infinity does not work for date fields. For those, you can use Long.MAX_VALUE as a workaround.
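
A rough sketch of this workaround, assuming an ascending sort on a date field with missing values sorted last. The field names are borrowed from the original report, the `internal_id` value is copied from it, and `LONG_MAX` is simply Java's `Long.MAX_VALUE` written out. Later comments in this thread report that this stops working on 7.x, so treat it as version-dependent.

```python
import json

LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE: 9223372036854775807

body = {
    "size": 10,
    "sort": [
        {"last_request_at": {"order": "asc"}},
        {"internal_id": {"order": "asc"}},
    ],
    # LONG_MAX stands in for the missing date of the last document seen.
    "search_after": [LONG_MAX, "5e79de5f33640a29a0027d41"],
}

payload = json.dumps(body)  # Python integers keep full precision when serialized
```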

Also, if you page in both directions (ascending and descending search_after), the sort needs some more fine-tuning. If you sort ascending with "missing" : "_last", you must use "missing" : "_first" when sorting in descending order. Otherwise the results are not in the correct order when null values are involved.
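
The rule can be sketched as a small helper that builds the sort clause for either direction. The function name `sort_clause` and its parameters are hypothetical, and flipping the tiebreaker order together with the primary order is an assumption about how backward paging is set up.

```python
def sort_clause(field, tiebreaker, forward=True):
    """Build a two-key sort for paging forward (asc) or backward (desc).

    Forward paging puts missing values last; backward paging puts them
    first, so the two directions are exact mirrors of each other.
    """
    order = "asc" if forward else "desc"
    missing = "_last" if forward else "_first"
    return [
        {field: {"order": order, "missing": missing}},
        {tiebreaker: {"order": order}},
    ]


forward_sort = sort_clause("last_request_at", "internal_id", forward=True)
backward_sort = sort_clause("last_request_at", "internal_id", forward=False)
```

With this pairing, documents without a value sit at the end of the forward ordering and at the start of the backward one, so reversing direction walks the same sequence in reverse.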

I'd still like to have true null handling in search_after that will do all of this automatically for us.

@j0k3r

j0k3r commented Oct 8, 2021

Looks like the hack of using Long.MAX_VALUE for search_after on an undefined date worked well until version 7.6. It works on 5.6 and 6.8 but not on 7.6 (or even 7.15). (We are using the JS client, see elastic/elasticsearch-js#662.)

What's the alternative now?

@j0k3r

j0k3r commented Nov 5, 2021

We found a solution, see opensearch-project/OpenSearch#1490

7 participants