From 115a561a23c12c7c2bafb71868aa9e07051b3898 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Aur=C3=A9lien=20FOUCRET?=
Date: Thu, 7 Mar 2024 06:09:13 +0100
Subject: [PATCH] Apply suggestions from code review

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
---
 .../learning-to-rank.asciidoc | 51 ++++++++++++-------
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/docs/reference/search/search-your-data/learning-to-rank.asciidoc b/docs/reference/search/search-your-data/learning-to-rank.asciidoc
index ce65af187f76b..a9622f3c48918 100644
--- a/docs/reference/search/search-your-data/learning-to-rank.asciidoc
+++ b/docs/reference/search/search-your-data/learning-to-rank.asciidoc
@@ -38,11 +38,11 @@ image::images/search/learning-to-rank-judgment-list.png[Judgment list example,al
 [[judgment-list-notes]]
 ==== Notes on judgment lists
-While a judgment list can be created manually by humans, there are techniques available to utilize user engagement data, such as clicks or conversions, to construct such a judgment list automatically.
+While a judgment list can be created manually by humans, there are techniques available to leverage user engagement data, such as clicks or conversions, to construct judgment lists automatically.
 The quantity and the quality of your judgment list will greatly influence the overall performance of the LTR model. The following aspects should be considered very carefully when building your judgment list:
-* Most search engines can be searched using different query types (e.g: for a movie search engine, users are searching by title but also by actor or director). It is essential to maintain a balanced number of examples for each query type in your judgment list to prevent overfitting and allow the model to generalize effectively across all query types.
+* Most search engines can be searched using different query types. For example, in a movie search engine, users search by title but also by actor or director. It's essential to maintain a balanced number of examples for each query type in your judgment list. This prevents overfitting and allows the model to generalize effectively across all query types.
 * Users often provide more positive examples than negative ones. By balancing the number of positive and negative examples, you help the model learn to distinguish between relevant and irrelevant content more accurately.
@@ -50,21 +50,25 @@ The quantity and the quality of your judgment list will greatly influence the ov
 [[learning-to-rank-feature-extraction]]
 === Feature extraction
-The ML models used for LTR are not able to understand the query and document pair directly but require that we transform their properties into an array of numerical features.
+Query and document pairs alone don't provide enough information to train the ML
+models used for LTR. The relevance scores in judgment lists depend on a number
+of properties or _features_. These features must be extracted to determine how
+the various components combine to determine document relevance. The judgment
+list plus the extracted features make up the training dataset for an LTR model.
 These features fall into one of three main categories:
-* Document features:
- These features are derived directly from the document properties.
- Examples: product price in an eCommerce store
+* *Document features*:
+ These features are derived directly from document properties.
+ Example: product price in an eCommerce store.
-* Query features:
+* *Query features*:
  These features are computed directly from the query submitted by the user.
- Examples: number of words in the query
+ Example: the number of words in the query.
-* Query-document features:
+* *Query-document features*:
  Features used to provide information about the document in the context of the query.
- Examples: BM25 score for the title field, …
+ Example: the BM25 score for the `title` field.
 To prepare the dataset for training, the features are added to the judgment list:
@@ -72,7 +76,9 @@ To prepare the dataset for training, the features are added to the judgment list
 .Judgment list with features
 image::images/search/learning-to-rank-feature-extraction.png[Judgment list with features,align="center"]
-To do this, we use templated queries to extract features both when building the training dataset and during inference at query time:
+To do this in {es}, use templated queries to extract features when building the
+training dataset and during inference at query time. Here is an example of a
+templated query:
 [source,js]
 ----
@@ -91,21 +97,32 @@ To do this, we use templated queries to extract features both when building the
 [[learning-to-rank-models]]
 === Models
-The heart of LTR is of course an ML model. A model is trained using the training data described above in combination with an objective. In the case of LTR, the objective is to rank result documents in an optimal way with respect to a judgment list, given some ranking metric such as https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Discounted_cumulative_gain[nDCG^] or https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision[MAP^]. The model has access to only the features in the training data, as well as the associated relevance labels which are used in the ranking metric.
+The heart of LTR is of course an ML model. A model is trained using the training data described above in combination with an objective. In the case of LTR, the objective is to rank result documents in an optimal way with respect to a judgment list, given some ranking metric such as https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Discounted_cumulative_gain[nDCG^] or https://en.wikipedia.org/wiki/Evaluation_measures_(information_retrieval)#Mean_average_precision[MAP^]. The model relies solely on the features and relevance labels from the training data.
-Many approaches and model types exist for LTR and the field is continuing to evolve, however LTR inference in {es} relies specifically on gradient boosted decision tree (GBDT) models. In {es}, we also only support model inference and not the training process itself. As such, training an LTR model needs to happen outside of {es} and using a GBDT model. Among the most popular LTR models used today, LambdaMART provides strong ranking performance with low inference latencies. It relies on GBDT models and is thus a perfect fit for LTR in {es}.
+The LTR space is evolving rapidly and many approaches and model types are being
+experimented with. In practice {es} relies specifically on gradient boosted decision tree
+(GBDT) models for LTR inference.
+
+Note that {es} supports model inference but the training process itself must
+happen outside of {es}, using a GBDT model. Among the most popular LTR models
+used today, LambdaMART provides strong ranking performance with low inference
+latencies. It relies on GBDT models and is therefore a perfect fit for LTR in
+{es}.
 https://xgboost.readthedocs.io/en/stable/[XGBoost^] is a well known library that provides an https://xgboost.readthedocs.io/en/stable/tutorials/learning_to_rank.html[implementation^] of LambdaMART, making it a popular choice for LTR. We offer helpers in https://eland.readthedocs.io/[eland^] to facilitate the integration of a trained https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBRanker[XBGRanker^] model as your LTR model in {es}.
+[TIP]
+====
+Learn more about training in <>, or check out our https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/08-learning-to-rank.ipynb[interactive LTR notebook] available in the `elasticsearch-labs` repo.
+====
 [discrete]
 [[learning-to-rank-in-the-elastic-stack]]
-=== Learning To Rank in the Elastic stack
+=== LTR in the Elastic stack
 In the next pages of this guide you will learn to:
-* train and deploy a LTR model using eland
-
-* search using your LTR model
+* <>
+* <>
 include::learning-to-rank-model-training.asciidoc[]
 include::learning-to-rank-search-usage.asciidoc[]
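
Reviewer note, not part of the patch: the three feature categories described in the patched "Feature extraction" section, and the step of adding them to the judgment list, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the {es} or eland implementation. Every name and value here (`Judgment`, `DOCS`, `build_training_rows`, the toy data) is hypothetical, and simple term overlap stands in for a real BM25 score.

```python
# Hypothetical sketch: none of these names belong to Elasticsearch or eland.
from dataclasses import dataclass


@dataclass
class Judgment:
    query: str
    doc_id: str
    grade: int  # relevance label taken from the judgment list


# Toy document store (made-up data for illustration only).
DOCS = {
    "d1": {"title": "star wars a new hope", "price": 9.99},
    "d2": {"title": "star trek", "price": 14.50},
}


def document_features(doc):
    # Document features: derived directly from document properties.
    return {"price": doc["price"]}


def query_features(query):
    # Query features: computed directly from the submitted query.
    return {"query_word_count": len(query.split())}


def query_document_features(query, doc):
    # Query-document features: describe the document in the context of the query.
    # Term overlap is a crude stand-in for a BM25 score on the title field.
    title_terms = set(doc["title"].split())
    query_terms = set(query.split())
    return {"title_term_overlap": len(title_terms & query_terms)}


def build_training_rows(judgments):
    # Add the extracted features to each judgment to form one training row.
    rows = []
    for j in judgments:
        doc = DOCS[j.doc_id]
        features = {
            **document_features(doc),
            **query_features(j.query),
            **query_document_features(j.query, doc),
        }
        rows.append({"query": j.query, "doc_id": j.doc_id, "grade": j.grade, **features})
    return rows


judgments = [Judgment("star wars", "d1", 3), Judgment("star wars", "d2", 1)]
training_rows = build_training_rows(judgments)
```

Each row pairs a relevance grade with numerical features, which is the shape of training data the patched text describes. In {es} itself the feature values would come from templated queries rather than from local Python functions.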