First version of the LTR guide. (elastic#105956)
afoucret committed Mar 11, 2024
1 parent 822a906 commit de54eaa
Showing 7 changed files with 383 additions and 0 deletions.
@@ -0,0 +1,168 @@
[[learning-to-rank-model-training]]
=== Deploy and manage Learning To Rank models
++++
<titleabbrev>Deploy and manage LTR models</titleabbrev>
++++

preview::["The Learning To Rank feature is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but this feature is not subject to the support SLA of official GA features."]

NOTE: This feature was introduced in version 8.12.0 and is only available to certain subscription levels.
For more information, see {subscriptions}.

[discrete]
[[learning-to-rank-model-training-workflow]]
==== Train and deploy a model using Eland

Typically, the https://xgboost.readthedocs.io/en/stable/[XGBoost^] model training process uses standard Python data science tools like Pandas and scikit-learn.


We have developed an
https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/08-learning-to-rank.ipynb[example
notebook^] available in the `elasticsearch-labs` repo. This interactive Python notebook
details an end-to-end model training and deployment workflow.

We highly recommend using https://eland.readthedocs.io/[eland^] in your workflow, because it provides important functionalities for working with LTR in {es}. Use eland to:

* Configure feature extraction

* Extract features for training

* Deploy the model in {es}

[discrete]
[[learning-to-rank-model-training-feature-definition]]
===== Configure feature extraction in Eland

Feature extractors are defined using templated queries. https://eland.readthedocs.io/[Eland^] provides the `eland.ml.ltr.QueryFeatureExtractor` to define these feature extractors directly in Python:

[source,python]
----
from eland.ml.ltr import QueryFeatureExtractor

feature_extractors = [
    # We want to use the score of the match query for the title field as a feature:
    QueryFeatureExtractor(
        feature_name="title_bm25",
        query={"match": {"title": "{{query}}"}}
    ),
    # We can use a script_score query to get the value
    # of the popularity field directly as a feature:
    QueryFeatureExtractor(
        feature_name="popularity",
        query={
            "script_score": {
                "query": {"exists": {"field": "popularity"}},
                "script": {"source": "return doc['popularity'].value;"},
            }
        },
    ),
    # We can execute a script on the value of the query
    # and use the return value as a feature:
    QueryFeatureExtractor(
        feature_name="query_length",
        query={
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "return params['query'].splitOnToken(' ').length;",
                    "params": {
                        "query": "{{query}}",
                    }
                },
            }
        },
    ),
]
----
// NOTCONSOLE

Once the feature extractors have been defined, they are wrapped in an `eland.ml.ltr.LTRModelConfig` object for use in later training steps:

[source,python]
----
from eland.ml.ltr import LTRModelConfig

ltr_config = LTRModelConfig(feature_extractors)
----
// NOTCONSOLE

[discrete]
[[learning-to-rank-model-training-feature-extraction]]
===== Extracting features for training

Building your dataset is a critical step in the training process. This involves
extracting relevant features and adding them to your judgment list. We
recommend using Eland's `eland.ml.ltr.FeatureLogger` helper class for this
process.

[source,python]
----
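from eland.ml.ltr import FeatureLogger

# `es_client` (an Elasticsearch client) and `MOVIE_INDEX` (the index name)
# are assumed to be defined earlier in your script.
# Create a feature logger that will be used to query Elasticsearch to retrieve the features:
feature_logger = FeatureLogger(es_client, MOVIE_INDEX, ltr_config)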
----
// NOTCONSOLE

The FeatureLogger provides an `extract_features` method which enables you to extract features for a list of specific documents from your judgment list. At the same time, you can pass query parameters to the feature extractors defined earlier:

[source,python]
----
feature_logger.extract_features(
    query_params={"query": "foo"},
    doc_ids=["doc-1", "doc-2"]
)
----
// NOTCONSOLE

Our https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/08-learning-to-rank.ipynb[example notebook^] explains how to use the `FeatureLogger` to build a training dataset, by adding features to a judgment list.
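
As a rough sketch of that step, the snippet below groups a judgment list by query and appends extracted features to each row. The `extract` argument is a hypothetical stand-in for a call such as `feature_logger.extract_features`; we assume here that it returns a mapping from document ID to a list of feature values, which may not match the real return type exactly. The judgment list layout (`query`, `doc_id`, `grade`) is also an assumption made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical judgment list rows: one (query, document, grade) triple each.
JUDGMENTS = [
    {"query": "star wars", "doc_id": "doc-1", "grade": 4},
    {"query": "star wars", "doc_id": "doc-2", "grade": 1},
    {"query": "alien", "doc_id": "doc-3", "grade": 3},
]

FeatureFn = Callable[[Dict[str, str], List[str]], Dict[str, List[float]]]


def add_features(judgments: List[dict], extract: FeatureFn) -> List[dict]:
    """Return a copy of the judgment list with a `features` entry per row."""
    # Group document IDs by query so each query needs a single extraction call.
    docs_by_query: Dict[str, List[str]] = defaultdict(list)
    for row in judgments:
        docs_by_query[row["query"]].append(row["doc_id"])

    features_by_pair = {}
    for query, doc_ids in docs_by_query.items():
        extracted = extract({"query": query}, doc_ids)
        for doc_id, values in extracted.items():
            features_by_pair[(query, doc_id)] = values

    return [
        {**row, "features": features_by_pair[(row["query"], row["doc_id"])]}
        for row in judgments
    ]


def fake_extract(query_params, doc_ids):
    """Stand-in extractor: a single feature, the query length in words."""
    n_words = float(len(query_params["query"].split()))
    return {doc_id: [n_words] for doc_id in doc_ids}


training_rows = add_features(JUDGMENTS, fake_extract)
```

Grouping by query before extracting keeps the number of calls proportional to the number of distinct queries rather than to the number of judgment rows.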

[discrete]
[[learning-to-rank-model-training-feature-extraction-notes]]
====== Notes on feature extraction

* We strongly advise against implementing feature extraction on your own. It's crucial to maintain consistency in feature extraction between the training environment and inference in {es}. By using eland tooling, which is developed and tested in tandem with {es}, you can ensure that they function together consistently.

* Feature extraction is performed by executing queries on the {es} server. This could put a lot of stress on your cluster, especially when your judgment list contains a lot of examples or you have many features. Our feature logger implementation is designed to minimize the number of search requests sent to the server and reduce load. However, it might be best to build your training dataset using an {es} cluster that is isolated from any user-facing, production traffic.

[discrete]
[[learning-to-rank-model-deployment]]
===== Deploy your model into {es}

Once your model is trained, you can deploy it in your {es} cluster using Eland's `MLModel.import_ltr_model` method:

[source,python]
----
from eland.ml import MLModel

LEARNING_TO_RANK_MODEL_ID = "ltr-model-xgboost"

# `ranker` is the trained model (for example, an XGBRanker).
MLModel.import_ltr_model(
    es_client=es_client,
    model=ranker,
    model_id=LEARNING_TO_RANK_MODEL_ID,
    ltr_model_config=ltr_config,
    es_if_exists="replace",
)
----
// NOTCONSOLE

This method will serialize the trained model and the Learning To Rank configuration (including feature extraction) in a format that {es} can understand. The model is then deployed to {es} using the <<put-trained-models, Create Trained Models API>>.

The following types of models are currently supported for LTR with {es}:

* https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html[`DecisionTreeRegressor`^]
* https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html[`RandomForestRegressor`^]
* https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMRegressor.html[`LGBMRegressor`^]
* https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBRanker[`XGBRanker`^]
* https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBRegressor[`XGBRegressor`^]


More model types will be supported in the future.

[discrete]
[[learning-to-rank-model-management]]
==== Learning To Rank model management

Once your model is deployed in {es} you can manage it using the https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-df-trained-models-apis.html[trained model APIs].
You're now ready to work with your LTR model as a rescorer at <<learning-to-rank-search-usage, search time>>.
@@ -0,0 +1,78 @@
[[learning-to-rank-search-usage]]
=== Search using Learning To Rank
++++
<titleabbrev>Search using LTR</titleabbrev>
++++

preview::["The Learning To Rank feature is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but this feature is not subject to the support SLA of official GA features."]

NOTE: This feature was introduced in version 8.12.0 and is only available to certain subscription levels.
For more information, see {subscriptions}.

[discrete]
[[learning-to-rank-rescorer]]
==== Learning To Rank as a rescorer

Once your LTR model is trained and deployed in {es}, it can be used as a <<rescore, rescorer>> in the <<search-your-data, search API>>:

[source,console]
----
GET my-index/_search
{
  "query": { <1>
    "multi_match": {
      "fields": ["title", "content"],
      "query": "the quick brown fox"
    }
  },
  "rescore": {
    "learning_to_rank": {
      "model_id": "ltr-model", <2>
      "params": { <3>
        "query_text": "the quick brown fox"
      }
    },
    "window_size": 100 <4>
  }
}
----
// TEST[skip:TBD]
<1> First pass query providing documents to be rescored.
<2> The unique identifier of the trained model uploaded to {es}.
<3> Named parameters to be passed to the query templates used for feature extraction.
<4> The number of documents that should be examined by the rescorer on each shard.

[discrete]
[[learning-to-rank-rescorer-limitations]]
===== Known limitations

[discrete]
[[learning-to-rank-rescorer-limitations-window-size]]
====== Rescore window size

Scores returned by LTR models are usually not comparable with the scores issued by the first pass query and can be lower than the non-rescored score. This can cause the non-rescored result document to be ranked higher than the rescored document. To prevent this, the `window_size` parameter is mandatory for LTR rescorers and should be greater than or equal to `from + size`.
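
The constraint can be checked before a request is sent. The helper below is a minimal sketch, not part of any client library: `build_ltr_search` and its arguments are names invented for illustration, and it only assembles the request body as a Python dictionary in the shape of the rescorer example shown earlier.

```python
def build_ltr_search(query, model_id, params, size=10, from_=0, window_size=100):
    """Build a search body with an LTR rescorer (hypothetical helper)."""
    # Reject requests whose page would reach past the rescored window.
    if window_size < from_ + size:
        raise ValueError(
            f"window_size ({window_size}) must be >= from + size ({from_ + size})"
        )
    return {
        "from": from_,
        "size": size,
        "query": query,
        "rescore": {
            "learning_to_rank": {"model_id": model_id, "params": params},
            "window_size": window_size,
        },
    }


body = build_ltr_search(
    query={"match": {"title": "the quick brown fox"}},
    model_id="ltr-model",
    params={"query_text": "the quick brown fox"},
    size=20,
    from_=40,
)
```

Failing early on the client side avoids returning a page that mixes rescored and non-rescored documents.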

[discrete]
[[learning-to-rank-rescorer-limitations-pagination]]
====== Pagination

When exposing pagination to users, keep `window_size` constant as the user moves from page to page by passing different `from` values. Changing the `window_size` can alter the top hits, causing results to shift confusingly as the user steps through pages.
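
One way to keep the window fixed is to derive it once from the deepest page you expose, then reuse it for every request. The following sketch assumes a hypothetical page size and `MAX_PAGES` limit; neither value comes from {es} itself, and the first pass query is omitted for brevity.

```python
PAGE_SIZE = 10   # hypothetical number of hits per page
MAX_PAGES = 5    # hypothetical deepest page exposed to users

# Computed once so that every page is cut from the same rescored top hits.
WINDOW_SIZE = PAGE_SIZE * MAX_PAGES


def page_request(page: int) -> dict:
    """Return the paging and rescore settings for a zero-based page number."""
    if not 0 <= page < MAX_PAGES:
        raise ValueError(f"page must be between 0 and {MAX_PAGES - 1}")
    return {
        "from": page * PAGE_SIZE,
        "size": PAGE_SIZE,
        "rescore": {
            "learning_to_rank": {"model_id": "ltr-model"},
            "window_size": WINDOW_SIZE,
        },
    }


page_requests = [page_request(page) for page in range(MAX_PAGES)]
```

Because `WINDOW_SIZE` covers the last page (`from + size` equals 50 there), every page also satisfies the window size constraint.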

[discrete]
[[learning-to-rank-rescorer-limitations-negative-scores]]
====== Negative scores

Depending on how your model is trained, it’s possible that the model will return negative scores for documents. While negative scores are not allowed from first-stage retrieval and ranking, it is possible to use them in the LTR rescorer.

[discrete]
[[learning-to-rank-rescorer-limitations-field-collapsing]]
====== Compatibility with field collapsing

LTR rescorers are not compatible with the <<collapse-search-results, collapse feature>>.

[discrete]
[[learning-to-rank-rescorer-limitations-term-statistics]]
====== Term statistics as features

Term statistics are not currently supported as features; future releases will introduce this capability.

136 changes: 136 additions & 0 deletions docs/reference/search/search-your-data/learning-to-rank.asciidoc
@@ -0,0 +1,136 @@
[[learning-to-rank]]
== Learning To Rank

preview::["The Learning To Rank feature is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but this feature is not subject to the support SLA of official GA features."]

NOTE: This feature was introduced in version 8.12.0 and is only available to certain subscription levels.
For more information, see {subscriptions}.

Learning To Rank (LTR) uses a trained machine learning (ML) model to build a
ranking function for your search engine. Typically, the model is used as a
second stage re-ranker, to improve the relevance of search results returned by a
simpler, first stage retrieval algorithm. The LTR function takes a list of
documents and a search context and outputs ranked documents:

[[learning-to-rank-overview-diagram]]
.Learning To Rank overview
image::images/search/learning-to-rank-overview.png[Learning To Rank overview,align="center"]


[discrete]
[[learning-to-rank-search-context]]
=== Search context

In addition to the list of documents to sort, the LTR function also requires a
search context. Typically, this search context includes at least the search
terms provided by the user (`text_query` in the example above).
The search context can also provide additional information used in the ranking model.
This could be information about the user doing the search (such as demographic data, geolocation, or age); about the query (such as query length); or about the document in the context of the query (such as the score for the title field).

[discrete]
[[learning-to-rank-judgement-list]]
=== Judgment list
The LTR model is usually trained on a judgment list, which is a set of queries and documents with a relevance grade. Judgment lists can be human or machine generated: they're commonly populated from behavioural analytics, often with human moderation. Judgment lists determine the ideal ordering of results for a given search query. The goal of LTR is to fit the model to the judgment list rankings as closely as possible for new queries and documents.

The judgment list is the main input used to train the model. It consists of a dataset that contains pairs of queries and documents, along with their corresponding relevance labels.
The relevance judgment is typically either binary (relevant/irrelevant) or a more
granular label, such as a grade from 0 (completely irrelevant) to 4 (highly
relevant). The example below uses a graded relevance judgment.


[[learning-to-rank-judgment-list-example]]
.Judgment list example
image::images/search/learning-to-rank-judgment-list.png[Judgment list example,align="center"]
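
In code, a graded judgment list can be as simple as a list of records. The sketch below uses made-up queries, document IDs, and grades on the 0 to 4 scale described above; the `Judgment` class is illustrative and not part of eland.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Judgment:
    query: str     # search terms the grade applies to
    doc_id: str    # identifier of the judged document
    grade: int     # 0 (completely irrelevant) to 4 (highly relevant)


judgments = [
    Judgment("star wars", "doc-7", 1),
    Judgment("star wars", "doc-1", 4),
    Judgment("star wars", "doc-3", 3),
    Judgment("alien", "doc-9", 4),
    Judgment("alien", "doc-1", 0),
]


def ideal_ordering(rows):
    """Map each query to its documents sorted by decreasing grade."""
    by_query = sorted(rows, key=lambda j: j.query)
    return {
        query: [j.doc_id for j in sorted(group, key=lambda j: -j.grade)]
        for query, group in groupby(by_query, key=lambda j: j.query)
    }


ideal = ideal_ordering(judgments)
```

The per-query ordering computed here is the ranking the trained model should approximate.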

[discrete]
[[judgment-list-notes]]
==== Notes on judgment lists

While a judgment list can be created manually by humans, there are techniques available to leverage user engagement data, such as clicks or conversions, to construct judgment lists automatically.

The quantity and the quality of your judgment list will greatly influence the overall performance of the LTR model. The following aspects should be considered very carefully when building your judgment list:

* Most search engines can be searched using different query types. For example, in a movie search engine, users search by title but also by actor or director. It's essential to maintain a balanced number of examples for each query type in your judgment list. This prevents overfitting and allows the model to generalize effectively across all query types.

* Users often provide more positive examples than negative ones. By balancing the number of positive and negative examples, you help the model learn to distinguish between relevant and irrelevant content more accurately.
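
A quick count can reveal both kinds of imbalance before training. This sketch assumes each judgment carries a hypothetical `query_type` label and treats grades of 2 or more as positive; both choices are illustrative and should be adapted to your own data.

```python
from collections import Counter

# Hypothetical judgments: (query_type, grade) pairs.
judgments = [
    ("title", 4), ("title", 3), ("title", 0), ("title", 4),
    ("actor", 4), ("actor", 1),
    ("director", 3),
]

POSITIVE_THRESHOLD = 2  # assumed cut-off between negative and positive grades

# Number of examples per query type.
per_type = Counter(query_type for query_type, _ in judgments)

# Number of positive and negative examples overall.
per_label = Counter(
    "positive" if grade >= POSITIVE_THRESHOLD else "negative"
    for _, grade in judgments
)

# Share of the most common query type; values near 1.0 indicate imbalance.
dominant_share = max(per_type.values()) / len(judgments)
```

In this made-up sample, title queries dominate and positives outnumber negatives, so both dimensions would need rebalancing.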

[discrete]
[[learning-to-rank-feature-extraction]]
=== Feature extraction

Query and document pairs alone don't provide enough information to train the ML
models used for LTR. The relevance scores in judgment lists depend on a number
of properties or _features_. These features must be extracted to determine how
the various components combine to determine document relevance. The judgment
list plus the extracted features make up the training dataset for an LTR model.

These features fall into one of three main categories:

* *Document features*:
These features are derived directly from document properties.
Example: product price in an eCommerce store.

* *Query features*:
These features are computed directly from the query submitted by the user.
Example: the number of words in the query.

* *Query-document features*:
Features used to provide information about the document in the context of the query.
Example: the BM25 score for the `title` field.

To prepare the dataset for training, the features are added to the judgment list:

[[learning-to-rank-judgement-feature-extraction]]
.Judgment list with features
image::images/search/learning-to-rank-feature-extraction.png[Judgment list with features,align="center"]

To do this in {es}, use templated queries to extract features when building the
training dataset and during inference at query time. Here is an example of a
templated query:

[source,js]
----
[
  {
    "query_extractor": {
      "feature_name": "title_bm25",
      "query": { "match": { "title": "{{query}}" } }
    }
  }
]
----
// NOTCONSOLE
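
To make the templating concrete, the sketch below substitutes parameters into such a query in plain Python. It is a simplified stand-in for the template rendering that {es} performs on the server; the `render` function is hypothetical and only handles double-brace placeholders inside string values.

```python
import re

# Matches a double-brace placeholder and captures the parameter name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, params):
    """Recursively replace double-brace placeholders in a query template."""
    if isinstance(template, dict):
        return {key: render(value, params) for key, value in template.items()}
    if isinstance(template, list):
        return [render(item, params) for item in template]
    if isinstance(template, str):
        return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), template)
    return template


template = {"match": {"title": "{{query}}"}}
rendered = render(template, {"query": "the quick brown fox"})
```

The same template can be rendered with different parameters for each query in the judgment list, which is what makes it reusable between training and inference.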

[discrete]
[[learning-to-rank-models]]
=== Models

The heart of LTR is of course an ML model. A model is trained using the training data described above in combination with an objective. In the case of LTR, the objective is to rank result documents in an optimal way with respect to a judgment list, given some ranking metric such as https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Discounted_cumulative_gain[nDCG^] or https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision[MAP^]. The model relies solely on the features and relevance labels from the training data.
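
As an illustration of such a metric, here is a small nDCG implementation. It uses the exponential gain `2 ** grade - 1` with a `log2` position discount, which is one common formulation; other definitions use the grade directly as the gain.

```python
import math


def dcg(grades, k=None):
    """Discounted cumulative gain of grades listed in ranked order."""
    top = grades[:k] if k is not None else grades
    return sum(
        (2 ** grade - 1) / math.log2(position + 2)
        for position, grade in enumerate(top)
    )


def ndcg(grades, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0


perfect = ndcg([4, 3, 1, 0])   # documents already in ideal order
swapped = ndcg([0, 3, 1, 4])   # best and worst documents swapped
```

A ranking that matches the judgment list order scores 1.0, and any ranking that pushes highly graded documents down scores lower.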

The LTR space is evolving rapidly and many approaches and model types are being
experimented with. In practice {es} relies specifically on gradient boosted decision tree
(https://en.wikipedia.org/wiki/Gradient_boosting#Gradient_tree_boosting[GBDT^]) models for LTR inference.

Note that {es} supports model inference but the training process itself must
happen outside of {es}, using a GBDT model. Among the most popular LTR models
used today, https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/MSR-TR-2010-82.pdf[LambdaMART^] provides strong ranking performance with low inference
latencies. It relies on GBDT models and is therefore a perfect fit for LTR in
{es}.

https://xgboost.readthedocs.io/en/stable/[XGBoost^] is a well-known library that provides an https://xgboost.readthedocs.io/en/stable/tutorials/learning_to_rank.html[implementation^] of LambdaMART, making it a popular choice for LTR. We offer helpers in https://eland.readthedocs.io/[eland^] to facilitate the integration of a trained https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBRanker[XGBRanker^] model as your LTR model in {es}.

[TIP]
====
Learn more about training in <<learning-to-rank-model-training, Train and deploy an LTR model>>, or check out our https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/08-learning-to-rank.ipynb[interactive LTR notebook] available in the `elasticsearch-labs` repo.
====
[discrete]
[[learning-to-rank-in-the-elastic-stack]]
=== LTR in the Elastic stack

In the next pages of this guide you will learn to:

* <<learning-to-rank-model-training, Train and deploy an LTR model using `eland`>>
* <<learning-to-rank-search-usage, Search using an LTR model as a rescorer>>

include::learning-to-rank-model-training.asciidoc[]
include::learning-to-rank-search-usage.asciidoc[]
@@ -46,6 +46,7 @@ include::search-api.asciidoc[]
include::search-application-overview.asciidoc[]
include::knn-search.asciidoc[]
include::semantic-search.asciidoc[]
include::learning-to-rank.asciidoc[]
include::search-across-clusters.asciidoc[]
include::search-with-synonyms.asciidoc[]
include::behavioral-analytics/behavioral-analytics-overview.asciidoc[]
