Allow Ids query to score documents by ids order #3458

Closed
damienalexandre opened this issue Aug 7, 2013 · 15 comments

@damienalexandre
Contributor

This is more of a feature suggestion than an issue.

The ids query is great: it lets you use ES as a searchable datastore.

I have some use cases where I want to restrict a user's search to a limited set of documents, so I combine an ids query with some match queries.

But sometimes I have no query: I just want my N documents, without sorting on a field, and in that case all my documents get a score of 1.

If I run this query multiple times, I get a different order almost every time, because the scores are equal.

{
  "query": {
    "ids": {
      "values": [
        "1221", "5", "6", "7", "8", "9", "10"
      ]
    }
  }
}

This is not great for users (the results feel random), and I may also want to use the order of the id values as the document order. So I wrote a custom_score query with a script for this case:

{
  "query": {
    "custom_score": {
      "query": {
        "ids": {
          "type": "pony",
          "values": [
            "1337",
            "1664",
            "8888",
            "1111"
          ]
        }
      },
      "script": "count = ids.size(); id = org.elasticsearch.index.mapper.Uid.idFromUid(doc['_uid'].value); for (i = 0; i < count; i++) { if (id == ids[i]) { return count - i; } }",
      "params": {
        "ids": [
          "1337",
          "1664",
          "8888",
          "1111"
        ]
      },
      "lang": "mvel"
    }
  }
}

As you can see, I pass the ids to the script as a param and give each document a custom score based on the position of its ID in the list.

This fixes the consistency and ordering issues, but it is slow when dealing with lots of IDs (I started noticing it at around 3k ids).
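The cost difference can be modelled in a few lines of Python. This is a sketch, not Elasticsearch code: `score_by_scan` mirrors the MVEL loop above (a linear scan of `ids` for every matching document), while `build_score_map` and `score_by_map` are hypothetical helpers that precompute the same `count - i` scores once, so each lookup becomes a constant-time dict access.

```python
from typing import Dict, List, Optional


def score_by_scan(doc_id: str, ids: List[str]) -> Optional[int]:
    """Mirror of the MVEL loop: scan the list until the id is found."""
    count = len(ids)
    for i in range(count):
        if doc_id == ids[i]:
            return count - i  # earlier ids get higher scores
    return None  # id not in the list (cannot happen behind an ids query)


def build_score_map(ids: List[str]) -> Dict[str, int]:
    """Precompute the same scores once; duplicates keep their first position."""
    count = len(ids)
    scores: Dict[str, int] = {}
    for i, doc_id in enumerate(ids):
        scores.setdefault(doc_id, count - i)
    return scores


def score_by_map(doc_id: str, scores: Dict[str, int]) -> Optional[int]:
    """Constant-time lookup in the precomputed map."""
    return scores.get(doc_id)


if __name__ == "__main__":
    ids = ["1337", "1664", "8888", "1111"]
    scores = build_score_map(ids)
    for doc_id in ids:
        assert score_by_scan(doc_id, ids) == score_by_map(doc_id, scores)
    print(scores)  # {'1337': 4, '1664': 3, '8888': 2, '1111': 1}
```

With M matching documents and N ids, the scan does up to N comparisons per document, roughly O(M·N) overall, which could explain why the script slows down noticeably once the list reaches a few thousand ids.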

What I have in mind is some kind of option we could add to the IdsQuery to score docs based on the position of their id.

{
  "query": {
    "ids": {
      "values": [
        "1221", "5", "6", "7", "8", "9", "10"
      ],
      "score_by_order": true
    }
  }
}

With this option set to true, the IdsQuery could give each document a score, removing the random effect and the need for a custom script to sort by ids.
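`score_by_order` is only a proposal, so its behaviour is not defined anywhere. As one possible reading, the plain-Python sketch below (the `Hit` tuple is a made-up stand-in for search hits) gives each matching document a score derived from its position in `values` and sorts by it. Because every requested id gets a distinct score, ties and the random-looking order disappear.

```python
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    id: str
    score: float


def score_by_order(matched_ids: List[str], values: List[str]) -> List[Hit]:
    """Hypothetical semantics: score = number of values minus position."""
    n = len(values)
    position: Dict[str, int] = {}
    for i, v in enumerate(values):
        position.setdefault(v, i)  # first occurrence wins
    hits = [Hit(doc_id, float(n - position[doc_id]))
            for doc_id in matched_ids if doc_id in position]
    # Higher score first, like a normal score-sorted search.
    return sorted(hits, key=lambda h: h.score, reverse=True)


values = ["1221", "5", "6", "7", "8", "9", "10"]
# Matches may come back from the index in any order.
print(score_by_order(["9", "1221", "5"], values))
```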

What do you think?!
Thanks!

@javanna
Member

javanna commented Oct 18, 2013

If the problem is having a consistent ordering of your documents and you only have the ids filter (or query), I'd suggest switching to the multi_get API. With it you get back the documents in the same order as the ids in your request. Also, get and multi_get are a better fit when using elasticsearch as a storage, as they are real-time, while search is only near real-time: a refresh needs to happen before newly indexed documents become searchable (although a refresh happens automatically every second by default).

Otherwise, if you do need a query and want to use the search API, can't you just sort your documents by _id? The issue you may encounter there is that the _id field is not indexed by default, but you can change its mapping or use the _uid field instead, which contains type+id and is indexed by default, so it can be used for sorting out of the box.
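One caveat with sorting on `_uid`: it is a string, so an ascending sort is lexicographic, neither numeric nor the request order. A quick Python illustration, assuming the `type#id` layout that `_uid` used in Elasticsearch 1.x (the `pony` type is taken from the custom_score example above; `uid` and `sort_like_uid` are illustrative helpers, not ES functions):

```python
from typing import List


def uid(doc_type: str, doc_id: str) -> str:
    # Assumed _uid layout: "<type>#<id>"
    return f"{doc_type}#{doc_id}"


def sort_like_uid(doc_type: str, ids: List[str]) -> List[str]:
    """Return ids in the order an ascending string sort on _uid would give."""
    return sorted(ids, key=lambda doc_id: uid(doc_type, doc_id))


requested = ["1221", "5", "6", "7", "8", "9", "10"]
print(sort_like_uid("pony", requested))
# ['10', '1221', '5', '6', '7', '8', '9']
```

"10" sorts before "1221" because the character '0' sorts before '2', so the result matches neither the numeric order nor the order of the request.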

Let me know if this helps. Next time (if you haven't already), could you send a question to the mailing list first, just to double-check that you have tried all the available options?

@ghost ghost assigned javanna Oct 18, 2013
@damienalexandre
Contributor Author

Thanks for the reply :)

  • The multi get API can't run facets or other powerful ES query features; having this in the ids query would open up a lot of possibilities.
  • Ordering by _id/_uid is not a solution: ids can be non-linear (like hashes from a URL shortener...), and I want my documents in the order I request them, which can be arbitrary.

PS: Here is the related discussion in the ES ML: https://groups.google.com/d/topic/elasticsearch/QQ8RXyMD4fM/discussion

@javanna
Member

javanna commented Oct 18, 2013

Thanks for your quick feedback, I see what you mean!

I think a custom script is the way to go here, as it's really your own logic and not something very common. I'd suggest having a look at script sorting, though. Right now you need to influence how the score is computed because you are sorting by score, but if you can express your sorting logic as a script, you can just sort on it directly.

@damienalexandre
Contributor Author

That's exactly what I do (see the second example in my issue): I use the list of IDs to compute the score of each document. But scoring this way is painfully slow on large datasets, which is why I opened this issue: to ask the community whether I'm the only one who needs this as a feature (a new option in the ids query) or not 😬

@javanna
Member

javanna commented Oct 18, 2013

Got it. What I suggested is different from custom_score, although it still executes a script per document. Have a look at script sorting.
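For reference, a script sort moves the position logic out of scoring and into the `sort` clause. The sketch below only builds the request body as a Python dict; the `_script` / `type` / `params` / `order` layout follows the Elasticsearch 1.x script-based sorting docs, and the script text (`ids.indexOf(doc['id'].value)`) is the one posted further down in this thread. It assumes each document stores its id in a regular `id` field whose values have the same type as the entries in `ids`. `ids_in_order_request` is a hypothetical helper, and sending the body to a cluster is left out.

```python
import json
from typing import Any, Dict, List


def ids_in_order_request(ids: List[str]) -> Dict[str, Any]:
    """Build a search body: restrict to `ids`, then sort by position in `ids`."""
    return {
        "query": {"ids": {"values": ids}},
        "sort": {
            "_script": {
                # Position of the document's `id` field in the params list
                # (assumes an `id` field that mirrors _id).
                "script": "ids.indexOf(doc['id'].value)",
                "type": "number",
                "params": {"ids": ids},
                "order": "asc",  # position 0 (first requested id) comes first
            }
        },
    }


print(json.dumps(ids_in_order_request(["1337", "1664", "8888", "1111"]), indent=2))
```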

@felixbarny
Member

I'd love to have a score_by_order as well

@darklow

darklow commented Jun 11, 2014

Unfortunately, for some reason the idFromUid function was recently removed, so this solution no longer works with the latest Elasticsearch version.
Maybe @martijnvg could comment on why this method was removed and whether there are alternatives?
0e780b7#diff-376fdeb0c8f420de09933212c022341cL97

Maybe someone else knows how to get this feature to work again?
Thank you.

@darklow

darklow commented Jun 11, 2014

Actually, I just tried using doc['id'].value instead of org.elasticsearch.index.mapper.Uid.idFromUid(doc['_uid'].value); and everything seems to work fine too. I don't even know why idFromUid was used in this solution in the first place.

"script": "return -ids.indexOf(Integer.parseInt(doc['id'].value));"

@felixbarny
Member

My script just looks like this: ids.indexOf(doc['id'].value)
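The two scripts above express the same order from opposite directions, but only in the right context: `-indexOf` works as a score (higher sorts first), while `indexOf` gives the request order only as an ascending sort value, for example with script sorting. Used as a score, `indexOf` would put the last requested id first. A small Python check of that, with plain string lists standing in for the `ids` param and the matched documents (the function names are illustrative only):

```python
from typing import List


def order_by_negative_index_score(matched: List[str], ids: List[str]) -> List[str]:
    # score = -indexOf(id); search sorts by score, descending
    return sorted(matched, key=lambda d: -ids.index(d), reverse=True)


def order_by_index_ascending(matched: List[str], ids: List[str]) -> List[str]:
    # indexOf(id) used as an ascending sort value
    return sorted(matched, key=ids.index)


def order_by_index_as_score(matched: List[str], ids: List[str]) -> List[str]:
    # indexOf(id) used as a score (descending) reverses the request order
    return sorted(matched, key=ids.index, reverse=True)


ids = ["1337", "1664", "8888", "1111"]
matched = ["8888", "1111", "1337", "1664"]  # arbitrary index order
print(order_by_negative_index_score(matched, ids))  # ['1337', '1664', '8888', '1111']
print(order_by_index_ascending(matched, ids))       # ['1337', '1664', '8888', '1111']
print(order_by_index_as_score(matched, ids))        # ['1111', '8888', '1664', '1337']
```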

@javanna javanna removed their assignment Aug 1, 2014
@javanna javanna added the discuss label Aug 1, 2014
@clintongormley
Contributor

Closing as won't fix.

@itsjavi

itsjavi commented Apr 13, 2017

I would still consider supporting an alternative for this, @clintongormley.

It's a pity that after 3 years ES hasn't provided a reliable alternative for situations where we need to keep the order in which documents were requested while also applying search filters.

Scripting is not an option for serious, large-scale applications, and I guess we cannot use expressions for this anyway, even though they are in theory more performant.

It would be enough to be able to sort by position in an array of values provided in the ES request, like:

{
  "sort": [{
    "_position": {
      "field_to_compare_values_with": [1201, 982, 34134]
    }
  }]
}
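Since `_position` is only proposed here and `field_to_compare_values_with` is a placeholder, any semantics are an assumption. One plausible reading, sketched in Python over plain dicts standing in for documents: sort by the index of the field's value in the given list, and push documents whose value is missing or not listed to the end, keeping their relative order (Python's sort is stable). `sort_by_position` is a hypothetical helper.

```python
from typing import Any, Dict, List, Sequence


def sort_by_position(docs: List[Dict[str, Any]], field: str,
                     values: Sequence[Any]) -> List[Dict[str, Any]]:
    """Hypothetical `_position` sort: order docs by where doc[field] appears in values."""
    position: Dict[Any, int] = {}
    for i, v in enumerate(values):
        position.setdefault(v, i)  # first occurrence wins
    missing = len(values)  # unlisted or absent values sort after every listed one
    return sorted(docs, key=lambda d: position.get(d.get(field), missing))


docs = [{"id": 34134}, {"id": 7}, {"id": 1201}, {"id": 982}]
print([d["id"] for d in sort_by_position(docs, "id", [1201, 982, 34134])])
# [1201, 982, 34134, 7]
```

Because the order comes from the sort clause rather than from the score, the score (and anything computed alongside it) stays untouched, which is the point of the first two benefits listed next.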

Similar to the way _geo_distance is used. Benefits compared to the first proposal:

  • It would be compatible with any kind of search that supports sort.
  • As you can see, we don't need to mess with scores.
  • It would work with any field, not only ids.
  • It is useful when you want to keep a fixed sort order and still need ES to, for example, calculate and return the distance in geospatial searches.

If you reconsider it, I could open this in a new feature-request ticket.

@Xophe

Xophe commented May 15, 2017

I suggest using a script like this:
switch(doc['id'].value){case "1337":return 0;case "1664":return 1;case "8888":return 2;case "1111":return 3;}
Avoiding the list lookup will speed up your script execution.
You can also use a hashmap.
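Writing the `switch` by hand gets impractical for thousands of ids, so it would typically be generated. The Python sketch below produces the same script text as above from an ordered id list, in whatever script language that snippet targets. It quotes ids with `json.dumps`, which assumes the script language accepts double-quoted string literals with JSON-style escapes; that holds for simple alphanumeric ids like these. `position_params` builds the id-to-position map for the hashmap variant, to be passed as a script param. Both helpers are hypothetical.

```python
import json
from typing import Dict, List


def switch_script(ids: List[str], field: str = "id") -> str:
    """Generate a switch-based script returning each id's position (first occurrence)."""
    cases = []
    seen = set()
    for i, doc_id in enumerate(ids):
        if doc_id in seen:
            continue  # a duplicate case label would not compile
        seen.add(doc_id)
        # json.dumps adds the double quotes and escapes any special characters.
        cases.append(f"case {json.dumps(doc_id)}:return {i};")
    return f"switch(doc['{field}'].value){{{''.join(cases)}}}"


def position_params(ids: List[str]) -> Dict[str, int]:
    """id -> position map for the hashmap variant (first occurrence wins)."""
    positions: Dict[str, int] = {}
    for i, doc_id in enumerate(ids):
        positions.setdefault(doc_id, i)
    return positions


print(switch_script(["1337", "1664", "8888", "1111"]))
# switch(doc['id'].value){case "1337":return 0;case "1664":return 1;case "8888":return 2;case "1111":return 3;}
```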

@vaclavpfeifer

I see the feature request is closed. Is this supported in 6.4+? We need to sort an index of 20k+ documents (with string ids, GUIDs) and don't think a script with custom ranking is the right way to do this...

@kartlee

kartlee commented Mar 4, 2019

The original request, supporting this through a query option in the JSON, is still useful, as it avoids maintaining the scripting logic. Can you please consider this feature request?

@mifans

mifans commented Jan 9, 2020

Sorting by script will take more time.
I suggest using a script like this:
"sort": {
  "_script": {
    "type": "number",
    "script": {
      "source": "params.get(doc._id.value);",
      "params": {
        "5c357c0eb565e654fcc3507c": 0,
        "5c3539f1b565e632d1c690ee": 1
      },
      "lang": "painless"
    },
    "order": "asc"
  }
}
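The `params` map in that snippet has to be generated from the ordered id list on the client. A Python sketch that builds the same `_script` sort clause (field names and script copied from the snippet above; `position_sort` is a hypothetical helper). Note that `params.get(...)` returns null for an id that is not in the map, so hits should be restricted to the listed ids, for example with an `ids` query as in the original issue. With 20k+ ids the map also becomes a sizeable part of the request.

```python
import json
from typing import Any, Dict, List


def position_sort(ids: List[str]) -> Dict[str, Any]:
    """Build the `_script` sort clause above for an ordered id list."""
    params: Dict[str, int] = {}
    for i, doc_id in enumerate(ids):
        params.setdefault(doc_id, i)  # first occurrence keeps its position
    return {
        "_script": {
            "type": "number",
            "script": {
                "source": "params.get(doc._id.value);",
                "params": params,
                "lang": "painless",
            },
            "order": "asc",  # position 0 first
        }
    }


ids = ["5c357c0eb565e654fcc3507c", "5c3539f1b565e632d1c690ee"]
body = {"query": {"ids": {"values": ids}}, "sort": position_sort(ids)}
print(json.dumps(body, indent=2))
```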

10 participants