Returning vectors with similarity above threshold for most_similar() #34

lucas-ubm · 2020-08-24T07:48:47Z

In sentencevectors.py, most_similar() can return the topn most similar sentences. However, it would be useful to be able to specify a similarity threshold above which sentences are returned. For this, topn could accept a fractional value: if topn is strictly smaller than 1, it is treated as a threshold; otherwise it works the same way as it does now.
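To make the proposal concrete, here is a rough sketch of the semantics I have in mind. It is not the library's code: `most_similar_sketch` and `cosine` are hypothetical names, the search is a plain brute-force scan over Python lists, and treating only values strictly between 0 and 1 as a threshold is my own assumption.

```python
import math
from typing import List, Sequence, Tuple, Union


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_sketch(
    query: Sequence[float],
    vectors: Sequence[Sequence[float]],
    topn: Union[int, float] = 10,
) -> List[Tuple[int, float]]:
    """Hypothetical helper illustrating the proposal, not an existing API.

    Returns (index, similarity) pairs sorted by descending similarity.
    """
    # Brute force: score every stored vector against the query.
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)

    # Assumption: a fractional topn in (0, 1) is read as a similarity threshold.
    if 0 < topn < 1:
        return [(i, s) for i, s in scored if s > topn]

    # Otherwise keep the usual "top N results" behaviour.
    return scored[: int(topn)]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
above_half = most_similar_sketch([1.0, 0.0], vectors, topn=0.5)  # threshold mode
best_one = most_similar_sketch([1.0, 0.0], vectors, topn=1)      # count mode
```

One catch with overloading topn this way is that topn=1 always means "one result", so a threshold of exactly 1.0 cannot be expressed. A separate keyword argument might be cleaner, but that is a design choice for the maintainers.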


oborchers · 2021-01-28T08:56:54Z

Yes, this is absolutely correct. However, the current implementation is actually highly inefficient in terms of similarity search (brute force). I had plans to include approximate nearest neighbor search, but haven't found the time to implement it.

oborchers self-assigned this Jan 28, 2021

oborchers added the enhancement label Jan 28, 2021

oborchers mentioned this issue Nov 27, 2021

maintenance #49

Open
