Confusing nbHits #1120

Closed
curquiza opened this issue Dec 2, 2020 · 10 comments
Labels
feature request & feedback Go to https://github.com/meilisearch/product/

Comments

@curquiza
Member

curquiza commented Dec 2, 2020

I'm opening this issue before writing any spec, to make sure I understand the topic correctly.

I have noticed many times that users are confused about the role of nbHits.
If I understand correctly, nbHits is the number of matches. See the docs about the definition of nbHits: https://docs.meilisearch.com/references/search.html#response
Most importantly, nbHits is only an estimate (until we release the feature that provides an exhaustive nbHits) and cannot be relied on to be accurate.

Problems

The problems are:

Solutions

To solve these issues, or at least the second point, I would suggest renaming nbHits to something that does not contain hits, like nbMatches. Indeed, if I understand correctly, nbHits is not the number of hits but the number of matches.
For example, you can ask for a limit of 2, so the number of hits returned by MeiliSearch is 2, while your query has 23 matches. In this example, nbHits would be 23 even though the hits array contains only 2 documents. Am I right?
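To make the distinction concrete, here is a toy model of the behavior described above. It is not MeiliSearch's implementation: `toy_search` is a hypothetical helper, matching is a naive word lookup, and it counts matches exactly, whereas in MeiliSearch nbHits may only be an estimate.

```python
from typing import Any, Dict, List


def toy_search(documents: List[str], query: str, offset: int = 0, limit: int = 20) -> Dict[str, Any]:
    # Hypothetical model: every document containing the query word is a "match".
    matches = [doc for doc in documents if query in doc.split()]
    return {
        # "hits": only the slice of matches selected by offset and limit.
        "hits": matches[offset:offset + limit],
        # "nbHits": the number of matches (exact here), independent of limit.
        "nbHits": len(matches),
    }


docs = [f"doc {i} hello" for i in range(23)] + ["doc bye"]
response = toy_search(docs, "hello", limit=2)
print(len(response["hits"]), response["nbHits"])  # 2 23
```

With the default limit of 20, the same query would return 20 hits while nbHits stays at 23.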

My question is: would it be correct to rename nbHits? Or am I misunderstanding something about the notion of « number of hits »? The naming will of course be discussed in the spec/meeting.

To address the first point, the documentation could warn users and guide them through a good pagination setup. See the issue Charlotte has already created: meilisearch/documentation#561
We could also think about renaming hits, but again, we will discuss in a spec whether the change is possible.


Edit

The kind of change I have to make because of contributors' confusion: meilisearch/meilisearch-laravel-scout#80

@MarinPostma
Contributor

Well, to be honest, I don't really see the nuance between nbHits and nbMatches... I think what makes it confusing is that it is linked to exhaustiveNbHits, which is probably overlooked. I guess approximativeNbHits along with isExactNbHits would make it clear.

@curquiza
Member Author

curquiza commented Dec 3, 2020

What I have understood from using Meili and reading the docs:

  • matches are the documents that match your query during the search. All of them.
  • hits are the best documents returned in the MeiliSearch response for your query, depending on the offset and limit you pass. By "the best documents", I mean the documents you asked for among all the matches. By default, MeiliSearch returns the first 20 documents, so 20 hits.

=> You can have 2 hits in the hits array but 23 matches in nbHits, as in my previous example. That's why I thought the two words did not mean exactly the same thing: to me, hits were a specific subset of the matches.

I guess approximativeNbHits along with isExactNbHits would make it clear

That could be a good start!

@curquiza
Member Author

curquiza commented Dec 9, 2020

Another user who seems to be confused by nbHits: #849 (comment)
I'm wondering what the real purpose/use case of nbHits is if it can be used neither for pagination nor to know the number of hits.

@MarinPostma
Contributor

I will write an article about pagination, because there actually is a workaround now that we have #849.

That does not change the necessity to address the naming issue.

@Kerollmops proposed that we drop the isExhaustiveNbHits and just have an approximativeNbHits.

@curquiza
Member Author

curquiza commented Dec 9, 2020

I like the naming, but most of all, before even renaming it: what would be the use case of approximativeNbHits/nbHits?

@MarinPostma
Contributor

MarinPostma commented Dec 9, 2020

This is the PR that introduced the feature: #541

For me, it is used to report the number of results, and for pagination as long as you don't need to access the last page or a random page. This last limitation was known when implementing the first iteration of the feature.
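As an illustration of the "report the number of results" use case, a front end could word the count according to exhaustiveNbHits. This is a hypothetical sketch: `results_label` and its wording are assumptions; only the nbHits and exhaustiveNbHits fields come from the search response discussed in this thread.

```python
def results_label(nb_hits: int, exhaustive_nb_hits: bool) -> str:
    # Hypothetical helper: only present the count as exact when the
    # response says it is exhaustive; otherwise flag it as an estimate.
    if exhaustive_nb_hits:
        return f"{nb_hits} results"
    return f"about {nb_hits} results"


print(results_label(1229, False))  # about 1229 results
```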

@curquiza
Member Author

curquiza commented Dec 9, 2020

Even for a "non-finished" pagination, meaning one without page numbers, nbHits is not reliable information. The user might browse through the resulting documents using the pagination, and the pagination can break at any moment when trying to access another page. You cannot ship a front-end page that could sometimes fail.
We even advise against using nbHits in this article/issue about building a good pagination with MeiliSearch: meilisearch/documentation#561
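The failure mode described above can be simulated. In this sketch, which is a hypothetical model rather than real MeiliSearch behavior, a front end trusts an over-estimated nbHits to decide how many pages to render, and the last page comes back empty. The numbers (45 real matches, an estimate of 70) are made up for illustration; `pages_from_estimate` and `fetch_page` are hypothetical helpers.

```python
import math
from typing import List


def pages_from_estimate(nb_hits: int, limit: int) -> int:
    # Number of pages a front end would render if it trusted nbHits.
    return math.ceil(nb_hits / limit)


def fetch_page(matches: List[str], page: int, limit: int) -> List[str]:
    # Stand-in for a search request with offset = page * limit.
    offset = page * limit
    return matches[offset:offset + limit]


actual_matches = [f"doc-{i}" for i in range(45)]  # 45 real matches
estimated_nb_hits = 70                              # hypothetical over-estimate
limit = 20

for page in range(pages_from_estimate(estimated_nb_hits, limit)):
    hits = fetch_page(actual_matches, page, limit)
    print(page, len(hits))
# 0 20
# 1 20
# 2 5
# 3 0   <- a page the UI promised, but that has no documents
```

An under-estimate fails the other way: pages that do exist are never offered.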

Am I wrong @bidoubiwa? Can you confirm what I said about pagination and nbHits?

@MarinPostma
Contributor

MarinPostma commented Dec 24, 2020

@curquiza I am not sure I understand why that would not be possible.

What do we decide about the renaming?

@bidoubiwa
Contributor

bidoubiwa commented Jan 5, 2021

I have the feeling there are two main problems raised here:

1. Matches and Hits are confusing terms, used arbitrarily for different purposes

Matches

in the search request parameters:

{
    "matches": Boolean // if true, indexes are provided for the cropped content
}

In my opinion, this is not intuitive naming. More importantly, because this parameter already uses matches, it would be confusing to use that word anywhere else in the search route.
Also, because it is not intuitive, a lot of people (even inside MeiliSearch 😨) often forget that it exists.

In the documentation, matches is used to talk about either the number of matching documents or the matching query words, neither of which is what it means in the search request parameter.

Hits

In the search response:

  hits: array // List of documents returned by MeiliSearch (e.g. `20` of them)
  nbHits: number // Number of "matching" documents inside MeiliSearch (e.g. `1229`)
  exhaustiveNbHits: boolean // Whether the number of "matching" documents inside MeiliSearch is exhaustive

Within the same object returned by the search route, hits has two different meanings. I think this is confusing and should be addressed, so I agree with @curquiza: we should rethink how we use hits and matches and decide on a clear meaning for each.

For example:
Hits: documents returned by MeiliSearch
Matches: Number of documents matching in MeiliSearch before the bucket-sort.

2. Pagination is not recommended

For me it is used to report a number of results, and for pagination, as long as you don't need to access the last page/random page. This last reason was a known limitation when implementing the first iteration of the feature.

Even to build a "non-finished" pagination, I mean with no page number, nbHits is not reliable information. The user might browse through the resulting documents thanks to the pagination and the pagination can be broken at any moment when trying to access another page.

The user would have to request one additional document in the limit parameter and check the length of hits to make sure there is at least one more document to load in the next "page" or "scroll". Since approximativeNbHits is approximate, the user would still have to use this workaround to confirm that additional documents exist. So what is its purpose?
Unless approximativeNbHits is always (100%) smaller than the actual number of results, in which case it could be used.
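A minimal sketch of the limit + 1 workaround, assuming a hypothetical `search(offset, limit)` function that returns the list of hits for one request (here `fake_search`, backed by an in-memory list rather than a MeiliSearch instance):

```python
from typing import Callable, List, Tuple

# Hypothetical signature: search(offset, limit) -> list of hits.
SearchFn = Callable[[int, int], List[str]]


def fetch_page_with_lookahead(search: SearchFn, offset: int, limit: int) -> Tuple[List[str], bool]:
    # Ask for one extra document: if it comes back, a next page exists.
    hits = search(offset, limit + 1)
    has_next_page = len(hits) > limit
    # Only display `limit` documents; the extra one is just a probe.
    return hits[:limit], has_next_page


corpus = [f"doc-{i}" for i in range(45)]


def fake_search(offset: int, limit: int) -> List[str]:
    # In-memory stand-in for a real search request.
    return corpus[offset:offset + limit]


page, has_next = fetch_page_with_lookahead(fake_search, 40, 20)
print(len(page), has_next)  # 5 False
```

The cost is one extra document per request, and the next-page check does not depend on nbHits at all.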

Another problem, though I'm not sure it is linked to this issue, is that the greater the offset becomes, the more MeiliSearch has to bucket-sort to find, for example, the 1450th to 1470th matching documents. This is very heavy processing for MeiliSearch.
(It could slow down the server if multiple users are paginating on the same MeiliSearch instance at the same time, but maybe we don't care, so I'm writing this in italics.)
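To illustrate why deep offsets are expensive, this sketch uses a generic top-k selection (`heapq.nlargest`) as a stand-in for ranking; it is not MeiliSearch's bucket sort, and `page_by_score` is a hypothetical helper. Under this model, serving documents 1450 to 1470 requires ranking 1470 documents even though only 20 are returned.

```python
import heapq
from typing import List, Tuple


def page_by_score(scores: List[float], offset: int, limit: int) -> Tuple[List[float], int]:
    # In this top-k model, returning ranks offset..offset+limit-1 requires
    # ranking the best offset + limit documents first, whatever the page size.
    ranked = heapq.nlargest(offset + limit, scores)
    return ranked[offset:offset + limit], len(ranked)


scores = [float(i % 1000) for i in range(5000)]
_, count_first = page_by_score(scores, 0, 20)
_, count_far = page_by_score(scores, 1450, 20)
print(count_first, count_far)  # 20 1470
```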

In any case, because it does not serve our main objective of being blazing fast, I'm not sure we want users to think that, because exhaustiveNbHits exists, it was made for pagination, and that we therefore recommend using that information to build a pagination system as fast as the search.

I'm sorry if I missed some subtleties, which I'm sure I did.

ExhaustiveNbHits Naming

If we decide to keep it, then following what I previously said about hits and matches, and @Kerollmops' suggestion, I would like it to be called approximativeNbMatches.

Conclusion

I don't understand the purpose of nbHits if it is approximative.
So I think we should consider removing it until it becomes a trustworthy number. Even then, I'm not sure we want to imply to users that they can travel that far through the pages, because it goes against our blazing fast value (sorry @MarinPostma, I know you created it 😣).
And I would like us to agree on the meaning of hits and matches in the future design choices.

bors bot added a commit to meilisearch/meilisearch-laravel-scout that referenced this issue Jan 27, 2021
80: Fix getTotalCount() method r=curquiza a=curquiza

Remove `nbHits` usage because this is not reliable information.

The Meili team is aware of this. Here are the different issues and comments about it to explain why it's confusing, and why we should not use it:
- meilisearch/meilisearch#1120
- meilisearch/documentation#561
- meilisearch/meilisearch-php#119 (comment)

TLDR;
`nbHits` is not reliable for pagination because it may or may not be exhaustive, depending on the value of `exhaustiveNbHits` returned by MeiliSearch, which is always `false` for the moment.

We are sorry for this. We all hope this confusion will be fixed asap in MeiliSearch.

⚠️ The linter error in the CI will be fixed with #82  

Co-authored-by: Clémentine Urquizar <[email protected]>

@curquiza added the feature request & feedback label and removed the type:ux label Aug 5, 2021
@gmourier
Member

gmourier commented Aug 9, 2021

I'm closing this one since the main problem comes from pagination, and we have specified a finite pagination: https://github.com/meilisearch/specifications/pull/42/files

Regarding the naming, we will revisit it during the stabilization phase of the different API endpoints before 1.0.

5 participants