more documentation. package still maintained? #7
Comments
Hi @randomgambit, it seems nobody is giving feedback on this amazing package. I'm trying to use it, but there is no documentation. Can you tell me whether you found more information, or something similar to this?
I can write up some more documentation if you care about it :)
that would be great, thanks!
I've been using your package and it is working very well for me. However, I'm afraid I don't completely understand how it works based on your description. Is there a primary published reference for this algorithm? In particular, the passage
Is not clear to me.
Is this a reference to the reference trigram (both the reference and the query are "listed above")?
Does "these matched elements" refer to the query or the reference? I think it only makes sense if you are talking about the cosine similarity between the reference trigram and the query string, but I could be wrong. In either case, if they match, won't the cosine similarity be perfect by definition? Additionally, you seem to be implying that you are comparing a string of 3 characters to a string with more characters. How do you calculate the cosine similarity of two strings of different lengths? Based on the current description, I'm not seeing how you distinguish between different matches. Thanks again for your efforts, and if these questions can be answered by a reference, please point me to it.
I don't have a paper, but it's inspired by full-text search. In some circles you might see this called trigram or shingle indexing. @Glench wrote a wonderful intuitive description of how it works here: https://github.com/glench/fuzzyset
Eh. I meant here: http://glench.github.io/fuzzyset.js/
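For anyone landing here with the same question about strings of different lengths, here is a minimal sketch of the general trigram idea. It is not this package's actual implementation: the function names `ngrams` and `cosine_similarity`, the lowercasing, and the single-space padding are all assumptions made for illustration. The point is that the cosine similarity is taken between n-gram count vectors, not between the raw strings, so the two strings do not need to have the same length.

```python
from collections import Counter
from math import sqrt


def ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lowercased, space-padded string.

    The padding and lowercasing are illustrative assumptions.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 3) -> float:
    """Cosine similarity between the n-gram count vectors of two strings."""
    va, vb = ngrams(a, n), ngrams(b, n)
    # Only n-grams present in both strings contribute to the dot product.
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(cosine_similarity("this is a sentence", "this is a sentance"))
    print(cosine_similarity("this is a sentence", "this is a sentance", n=2))
    print(cosine_similarity("this is a sentence", "something else entirely"))
```

Identical strings score 1, strings with no shared n-grams score 0, and everything else falls in between depending on how many n-grams overlap. Changing `n` here is only a way to see how the n-gram size affects scores on your own data; it says nothing about how the package exposes that setting.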
Thanks!
Hi,
First of all, congratulations on this amazing package, which is wayyyy faster than fuzzymatch when dealing with large datasets of strings.
Do you have more documentation about the matching algorithm that is used here? In particular, I am matching sentences (not only words), such as "this is a sentence", and I wanted to know whether your default settings are appropriate in that case (ngrams=2, for instance). How can I change them?
Many thanks for your help