Partial matching doesn't work after a certain number of digits #273

Closed
khaledosman opened this issue Jun 1, 2017 · 3 comments

Comments

@khaledosman

khaledosman commented Jun 1, 2017

I was able to reproduce this on the demo page

  1. Go to https://olivernn.github.io/moonwalkers/
  2. Search for experi* -> results are found.
  3. Search for experim*, experime*, experimen* or experiment* -> no results are found.

Side note: is there any reason why experiment* doesn't work but experiment does? It means I usually have to work around it in code by passing two search terms, one with the wildcard and one without, because I want to hide the wildcard from the user.

@olivernn
Owner

olivernn commented Jun 3, 2017

This is because the terms in the index are stemmed, so there is no token "experiment" in the index; instead it is stemmed to "experi". You can test this with idx.pipeline.runString("experiment").

When you search for a token without a wildcard, the search term is also stemmed, so "experiment" will still match because it is stemmed to the same token. When the search term includes a wildcard, the stemmer is bypassed. This is because it's difficult to correctly stem something that contains a wildcard.
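
To make the mismatch concrete, here is a minimal sketch in plain JavaScript. It does not use lunr at all: toyStem and toySearch are hypothetical stand-ins (a suffix-stripping regex and a prefix check over an array), written only to mimic the behaviour described above, so treat it as an illustration rather than how lunr is implemented.

```javascript
// Hypothetical toy stemmer: strips a few suffixes. NOT lunr's stemmer.
function toyStem(token) {
  return token.replace(/(ments|mental|ment)$/, "");
}

// Index time: every token is stemmed before it is stored.
const indexedTokens = [
  ...new Set(["experiment", "experiments", "moon"].map(toyStem)),
];

// Search time (toy rules, assumed for illustration):
// - a trailing "*" means "prefix match" and skips the stemmer
// - otherwise the term is stemmed and matched exactly
function toySearch(tokens, term) {
  if (term.endsWith("*")) {
    const prefix = term.slice(0, -1);
    return tokens.filter((t) => t.startsWith(prefix));
  }
  const stemmed = toyStem(term);
  return tokens.filter((t) => t === stemmed);
}

console.log(toySearch(indexedTokens, "experi*"));     // prefix fits the stored stem
console.log(toySearch(indexedTokens, "experim*"));    // prefix is longer than the stored stem
console.log(toySearch(indexedTokens, "experiment"));  // stemmed to the stored token
console.log(toySearch(indexedTokens, "experiment*")); // stemmer skipped, no stored token starts with it
```

In this toy model, any wildcard prefix longer than the stored stem can never match, which mirrors the experim* case from the report.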

You can force the token to be stemmed, even if it includes a wildcard, but you must use the lunr.Index#query interface, e.g.

idx.query(function (q) {
  q.term("experiment*", { usePipeline: true })
})

The root of the problem, though, is that typeahead-style search is really a different problem from normal search. In the past I've suggested the following kind of query for typeahead-style search:

idx.query(function (q) {
  q.term(queryTerm, { usePipeline: true, boost: 100 })
  q.term(queryTerm + "*", { usePipeline: false, boost: 10 })
  q.term(queryTerm, { usePipeline: false, editDistance: 1 })
})

So search for the exact query term, then the query term with a wildcard at the end (and possibly the beginning), and finally the query term with an edit distance (1 is probably enough, but feel free to experiment).
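
For anyone who wants to hide the wildcard from the user, as asked in the original report, the three clauses can be wrapped in a small helper. This is only a sketch: typeaheadSearch is a hypothetical name, idx is assumed to be an already-built lunr index exposing the query interface shown above, and the empty-input guard is an added assumption.

```javascript
// Hypothetical helper: the caller passes the raw user input and never
// sees the wildcard. `idx` is assumed to be a built lunr index.
function typeaheadSearch(idx, queryTerm) {
  // Assumption: an empty input should simply yield no results.
  if (!queryTerm) {
    return [];
  }

  return idx.query(function (q) {
    // Exact term, run through the pipeline (so it is stemmed), boosted highest.
    q.term(queryTerm, { usePipeline: true, boost: 100 });
    // Trailing wildcard for prefix matches, pipeline bypassed.
    q.term(queryTerm + "*", { usePipeline: false, boost: 10 });
    // Fuzzy match allowing one edit, pipeline bypassed.
    q.term(queryTerm, { usePipeline: false, editDistance: 1 });
  });
}
```

It would be called as typeaheadSearch(idx, "experim"), with the boosts and edit distance tuned to taste.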

@olivernn
Owner

I'm going to close this issue now as I don't think there is anything more to be done. Feel free to re-open it or add a comment if you disagree or have any other questions.

@tehandyb

tehandyb commented Jul 24, 2017

Hey, I've tried to use the query API as you mentioned, adding a wildcard and setting usePipeline to false, but I'm finding that the search still uses the stemmer. Maybe I'm not understanding correctly.
My example: the search input is 'competitiv*', I can see that the invertedIndex has the stem 'competit', and the document itself contains the word 'competitiveness'. The search only works if I remove the stemmer inside the Lunr convenience function with this.pipeline.remove(Lunr.stemmer).
