Add ERR to ranking evaluation documentation #32314

Merged 3 commits on Jul 24, 2018

50 changes: 50 additions & 0 deletions docs/reference/search/rank-eval.asciidoc
@@ -259,6 +259,56 @@ in the query. Defaults to 10.
|`normalize` | If set to `true`, this metric will calculate the https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG[Normalized DCG].
|=======================================================================

[float]
==== Expected Reciprocal Rank (ERR)

Expected Reciprocal Rank (ERR) is an extension of the classical reciprocal rank for the graded relevance case
(Chapelle, Olivier, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009.
http://olivier.chapelle.cc/pub/err.pdf[Expected reciprocal rank for graded relevance].)

It is based on the assumption of a cascade model of search, in which a user scans through ranked search
results in order and stops at the first document that satisfies the information need. For this reason, it
is a good metric for question answering and navigational queries, but less so for survey-oriented information needs
where the user is interested in finding several relevant documents in the top k results.

The metric models the expectation of the reciprocal of the position at which a user stops.
This means that a relevant document in a top ranking position contributes much to the overall ERR score. The same
document contributes much less to the score at a lower rank, and even less so if there are some
relevant documents preceding it. In this way, ERR discounts documents that are shown below very relevant documents
and introduces a dependency on the ordering of the relevant documents.
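
For illustration only, the following sketch computes ERR for a ranked list of graded judgments using the
formula from the Chapelle et al. paper. It is not part of the Elasticsearch API; the function name, the
example grades, and the parameter names mirroring `maximum_relevance` and `k` are assumptions made for
this example.

[source,python]
--------------------------------
# Hypothetical sketch of the ERR computation described above (Chapelle et al. 2009);
# not part of Elasticsearch, shown only to illustrate how the score behaves.
def expected_reciprocal_rank(grades, maximum_relevance, k=10):
    p_not_stopped = 1.0  # probability the user has not stopped at an earlier rank
    score = 0.0
    for rank, grade in enumerate(grades[:k], start=1):
        # Probability that the document at this rank satisfies the user,
        # derived from its relevance grade and the highest possible grade.
        p_satisfied = (2 ** grade - 1) / 2 ** maximum_relevance
        score += p_not_stopped * p_satisfied / rank
        p_not_stopped *= 1 - p_satisfied
    return score

# A highly relevant document at rank 1 dominates the score; the same document
# ranked lower, behind other relevant documents, contributes far less.
print(expected_reciprocal_rank([3, 2, 0, 1], maximum_relevance=3))  # ~0.90
print(expected_reciprocal_rank([2, 1, 0, 3], maximum_relevance=3))  # ~0.53
--------------------------------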

[source,js]
--------------------------------
GET /twitter/_rank_eval
{
"requests": [
{
"id": "JFK query",
"request": { "query": { "match_all": {}}},
"ratings": []
}],
"metric": {
"expected_reciprocal_rank": {
"maximum_relevance" : 3,
"k" : 20
}
}
}
--------------------------------
// CONSOLE
// TEST[setup:twitter]

The `expected_reciprocal_rank` metric takes the following parameters:

[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description
| `maximum_relevance` | Mandatory parameter. The highest relevance grade used in the user-supplied
relevance judgments.
|`k` | Sets the maximum number of documents retrieved per query. This value acts in place of the usual `size` parameter
in the query. Defaults to 10.
|=======================================================================
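
Assuming the grade-to-probability mapping from the Chapelle et al. paper, `maximum_relevance` controls how
strongly each grade is discounted: with `maximum_relevance` set to 3, a document rated 3 stops the user with
probability (2^3 - 1) / 2^3 = 0.875, while a document rated 1 does so with probability (2^1 - 1) / 2^3 = 0.125.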

[float]
=== Response format
