Return the output of the ranking rules for each search result #594
Replies: 15 comments 3 replies
-
Thanks for the write-up @loiclec! Regarding returning the score in the response: I'm not sure we want the scores by default, given how much data they add to the response (+95% in the example you provided; obviously the relative overhead decreases with document size). Regarding normalization, I'm unsure how the scores are computed right now. Are they independent from, e.g., the number of documents in the index? I'm a bit concerned that for the scores to be comparable, the rules need to be similar, or at least carefully weighted (and the same number of rules). I wonder if we could find a simpler solution to avoid pitfalls such as "accidentally making an entire index less relevant". On the whole, I really like the direction you're setting here! It feels like it opens up a lot of possibilities!
-
Thanks @loiclec 👍 I'm cross-referencing a previous product discussion (#379).
-
Hello, this is pretty interesting. I like the idea of having detailed information on how a hit is found in an index. Like @dureuill, I think having this level of detail by default is not a good idea; there is too much information for normal use cases. What would be great is to have a single score per hit on regular searches. This would allow a user to query Meilisearch and make sure that only hits with a score over 0.7, for example, are returned.
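The single-score-per-hit idea above could be applied client-side as a simple filter. A minimal sketch, assuming each hit carries a hypothetical `_rankingScore` field between 0.0 and 1.0 (the field name and data are illustrative, not a confirmed API):

```python
# Sketch: client-side filtering of hits by a single global relevancy score.
# Assumes each hit carries a hypothetical "_rankingScore" field in [0.0, 1.0].

def filter_by_score(hits, threshold=0.7):
    """Keep only hits whose ranking score meets the threshold."""
    return [hit for hit in hits if hit.get("_rankingScore", 0.0) >= threshold]

hits = [
    {"id": 1, "title": "Exact match", "_rankingScore": 0.98},
    {"id": 2, "title": "Close match", "_rankingScore": 0.74},
    {"id": 3, "title": "Loose match", "_rankingScore": 0.31},
]
print(filter_by_score(hits))  # only ids 1 and 2 survive the 0.7 cutoff
```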
-
Thanks @loiclec, I'm really interested too. It could improve the UX for some use cases by showing whether a result has a high score or not.
-
It may be too early to know, but could it be developed directly into the
-
Amazing idea! We would love to have federated search, as we have a few problems right now that would be easily solved if we could merge the results of a multi-index search! Returning the score in the response by default seems unnecessary. Maybe it would be better with a parameter (like a boolean `score`) or a totally different API endpoint, so everyone could decide which to use.
-
Sharing and copying/pasting the initial message from @AymanHamdoun on #614:

**Feature Description**

There is a very useful feature in Algolia where you can send a parameter to get ranking info back with each hit. Feel free to check their API Reference. Basically, the results array would then look like this:

```json
[
  {
    "id": 111,
    "title": "Some Document Title",
    "_rankingInfo": {
      "nbTypos": 0,
      "proximityDistance": 0,
      "nbExactWords": 0,
      "words": 0
    }
  },
  ...
]
```

**How can it be helpful?**

Assume you have several indices, each containing a certain type of entity, and you want to search each of these indices and provide one list of mixed results to your users. Let's say you have a Movies index and a Series index, and I search for "Help".

It would be very helpful to have the rankingInfo of each result, so I can simply merge the two result sets into one array and sort it by the ranking info attributes (matched words descending, typos ascending, exact words descending, etc.). This is a huge part of my work, actually, and Algolia makes my result-merging logic easy because it provides the rankingInfo for my results. If I were to compute these values manually, it would be a waste of time, because the search engine already computed them at one point and I shouldn't need to compute them again. Also, if I computed them myself, I might do so in a slightly different manner than the search engine, which would cause some weird inconsistencies.
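The merging logic described above can be sketched with an ordinary sort over the per-hit ranking info. This is a minimal illustration using the `_rankingInfo` field names from the Algolia-style example; the documents are made up:

```python
# Sketch: merging hits from two indexes (Movies and Series) by their
# per-hit ranking info, then sorting the combined list.

def ranking_key(hit):
    info = hit["_rankingInfo"]
    # More matched words and exact words are better (negate for descending);
    # fewer typos are better (ascending).
    return (-info["words"], info["nbTypos"], -info["nbExactWords"])

movies = [{"title": "The Help",
           "_rankingInfo": {"words": 1, "nbTypos": 0, "nbExactWords": 1}}]
series = [{"title": "Help Wanted",
           "_rankingInfo": {"words": 1, "nbTypos": 1, "nbExactWords": 0}}]

merged = sorted(movies + series, key=ranking_key)
print([h["title"] for h in merged])  # "The Help" first: same words, fewer typos
```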
-
Adding my support for a feature like this, along with my use case. I display my search results on a Mon-Fri calendar, so a result might be an event occurring on Mondays and Wednesdays at 1pm. I have a limit of 20 results, so when a user makes a very specific search (e.g. many keywords), the calendar gets clogged visually with 15-19 irrelevant results. This is because Meilisearch always tries to return the search limit by pruning search keywords / expanding to find potential typos.

I'm fine with this behavior leading to some visual clogging on broad searches (e.g. one keyword), but I want to make the difference in relevancy more obvious. For example, using this feature I could use the ranking rules' output to create a color mapping on the calendar for each result. The most relevant results could be red, and less relevant ones would go down a gradient of yellow > green > blue. A very specific search would then have a few red results, but the sharp dropoff in relevancy would mean the rest would be green/blue; for a broad search, most of the results would be red/yellow, emphasizing their similar relevancy.

If I tried to build this feature now using only the order of the search results, I wouldn't be able to distinguish the sudden dropoff in relevancy of a very specific search from the similar relevancy of a broad search.

As a potential suggestion to address the issue of significantly increasing the response data, perhaps one option could be to calculate an aggregated "relevancy score" from the ranking rule output. A perfectly relevant result would have a score of 1.0, with corresponding reductions for each missed word, typo, and so on.
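The color-gradient idea above boils down to bucketing an aggregated score. A minimal sketch, assuming a hypothetical aggregated relevancy score in [0.0, 1.0] (the thresholds are arbitrary choices for illustration):

```python
# Sketch: mapping a hypothetical aggregated relevancy score (0.0-1.0)
# to a color bucket for the calendar UI described above.

def score_to_color(score):
    """Bucket a relevancy score into a red > yellow > green > blue gradient."""
    if score >= 0.9:
        return "red"      # highly relevant
    if score >= 0.7:
        return "yellow"
    if score >= 0.4:
        return "green"
    return "blue"         # barely relevant, likely from query pruning

scores = [0.95, 0.92, 0.35, 0.2]  # a specific search: sharp relevancy dropoff
print([score_to_color(s) for s in scores])  # ['red', 'red', 'blue', 'blue']
```

The dropoff is now visible at a glance: two red results followed by blue ones, instead of twenty visually identical calendar entries.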
-
Hello everyone! 👋 Quick update on our side: we've started working on a solution to return ranking details. Stay tuned for updates, and feel free to keep the feedback coming!
-
Hello everyone 👋 We just released a 🧪 prototype that allows displaying ranking details when searching, and we'd love your feedback.

**How to get the prototype?**

Using Docker, use the following command:

Alternatively, you can also build the prototype from source by checking out the corresponding branch.

**How to use the prototype?**

You can find some usage examples below, or look at the original PR for more details.

**Getting the ranking details to customize the result UI**

**Getting the ranking details for the results of multiple indexes, to be able to re-rank documents coming from distinct indexes**

**Questions we have for you**

Feedback and bug reports while using this prototype are encouraged! Thanks in advance for your involvement. It means a lot to us ❤️
-
Hello again 👋 We just released a new version of this prototype with some improvements, thanks to your feedback ❤️

**How to get the prototype?**

Using Docker, use the following command:

Alternatively, you can also build the prototype from source by checking out the corresponding branch.

**What has changed?**
-
Update: scoring details will be available as an experimental feature in Meilisearch. You can find the details in #674.
-
Hello everyone 👋 We have just released the first RC (release candidate) of Meilisearch containing this new feature!

```shell
docker run -it --rm -p 7700:7700 -v $(pwd)/meili_data:/meili_data getmeili/meilisearch:v1.3.0-rc.0
```

You are welcome to leave your feedback in this discussion. If you encounter any bugs, please report them here. 🎉 The official and stable release containing this change will be available on July 31st, 2023.
-
@macraig Is it, or will it be, possible to filter on the scores too? It would be useful to say "only return results > 0.8", etc. (per rule, ideally).
-
Hey folks 👋 v1.3 has been released! 🦁 You can now get global and detailed ranking scores for your documents ✨ Note: document ranking score details are considered experimental; you can find the usage instructions in #674. 📚 https://www.meilisearch.com/docs/learn/core_concepts/relevancy#ranking-score
-
I'd like to make the output of each ranking rule visible for each document returned by a search request.
Let's start with a simple example. Given the following ranking rules:
and the search query:
and the results:
We could return the following information:
With the caveat that the output of some ranking rules could be unavailable, because it was not necessary to execute them:
And an additional caveat: the output of a ranking rule would not necessarily be a number. For example, with the `sort` ranking rule, we could have results such as:

**Why**
First, I would find it very useful for debugging search-relevancy problems. It would also help users understand how Meilisearch works, and help them fine-tune their settings and improve the relevancy of their search results.
Second, this is a building block towards automatically-aggregated multi-index search queries (i.e. federated search).
Federated Search
Given a per-index mapping from the ranking rules' outputs to a vector of numbers, we would be able to merge two sets of search results by sorting them by their score.
For example, with the ranking rules:
and the output:
The mapping could work as follows:
Then, we can compare the scores of two search results from different indexes by comparing their score components one by one (lexicographically):
Note, however, that we may need to normalise each result's scores further so that they are comparable. This could be done by giving a weight to each ranking rule in the two indexes, which could also work when the indexes have different ranking rules, with (very) carefully chosen weights. More options can be considered to make the results comparable, such as adding dummy ranking rules with constant scores (e.g. `["dummy", 0.5]`) to tweak the per-score-component sorting behaviour. There is unfortunately no way to perform the sorting other than lexicographically, though.

Note also that I haven't considered what the score should be if a ranking rule's output is unknown (`= null`). This is an open problem for later, which could always be (desperately) resolved by forcing the execution of all ranking rules.

Finally, we could consider tweaking the search algorithm so that it can stop its search when reaching results that fall below a given score. Let's say we perform a federated search on two indexes.
First, we perform a search on `indexA`, which gives 20 results with the normalized scores:

Then, when we perform the request on `indexB`, we want to stop searching as soon as a document's score falls below `[0.2, 0.85, 0.42524, 0.912]`, because we know it will rank below the 20th position (such an optimisation becomes more difficult to implement for search queries starting from a given offset, for pagination).
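The early-stopping idea above can be sketched as a cutoff check while scanning the second index. The scores and the small top-k are illustrative (the text uses 20 results); the key assumption is that each index yields its results best-first:

```python
# Sketch: stop scanning indexB once a candidate's score vector falls below
# the worst score that currently makes the cut from indexA.

TOP_K = 3  # kept small for the example; the discussion above uses 20

index_a_scores = [[0.9, 0.9], [0.8, 0.7], [0.2, 0.85]]  # sorted, best first
cutoff = index_a_scores[TOP_K - 1]  # worst score still inside the top-k

kept = []
for score in [[0.95, 0.1], [0.5, 0.5], [0.1, 0.99], [0.05, 0.1]]:  # indexB, best first
    if score < cutoff:  # lexicographic comparison; every later (worse)
        break           # result is also below the cutoff, so stop early
    kept.append(score)

print(kept)  # [[0.95, 0.1], [0.5, 0.5]]
```

Because `indexB` returns results best-first, the first score below the cutoff guarantees everything after it is too, which is what makes cutting the search short safe.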