Introduce the similarity as boost functionality to the Word2VecSynonymFilter #12433

Draft
wants to merge 1 commit into
base: main

Conversation

dantuzi
Contributor

@dantuzi dantuzi commented Jul 11, 2023

Description

This is a follow-up to #12169.

In the Word2VecSynonymFilter, when we extract the synonyms of a term, we have the cosine similarity between the vector associated with the original term and the vector associated with the synonym. The higher the cosine similarity is, the closer the meaning of the two terms.

This PR adds the option to use that similarity value as a boost for synonym terms.
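
To make the idea concrete, here is a minimal sketch of how a cosine similarity could be turned into a boost. It is not the code in this PR: the class `SimilarityBoostSketch` and its methods are hypothetical names, and falling back to a neutral boost of 1.0 when the option is off is an assumption made for illustration.

```java
// Hypothetical illustration only; this is not the Lucene implementation.
final class SimilarityBoostSketch {

  /** Cosine similarity between two vectors of the same length. */
  static float cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector length mismatch");
    }
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    // Note: a zero vector makes the denominator 0 and the result NaN.
    return (float) (dot / (Math.sqrt(normA) * Math.sqrt(normB)));
  }

  /**
   * Boost to attach to a synonym term. Assumption: when the option is
   * disabled, the synonym gets a neutral boost of 1.0.
   */
  static float synonymBoost(float similarity, boolean similarityAsBoost) {
    return similarityAsBoost ? similarity : 1.0f;
  }
}
```

With the option on, a synonym whose vector has similarity 0.8 to the original term would contribute with a boost of 0.8, so closer synonyms weigh more than distant ones.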

Member

@mikemccand mikemccand left a comment

Thanks @dantuzi!

This change looks good to me -- I just left a comment about back compat.

The naming conflict is "fun", and I think it's OK to rename the "old" BoostAttribute since it is @lucene.internal anyways.

The PR is marked as WIP still (why?), and also now has conflicts. @dantuzi do you want to refresh it? Thanks!

@@ -62,14 +65,16 @@ public Word2VecSynonymFilter(
       TokenStream input,
       Word2VecSynonymProvider synonymProvider,
       int maxSynonymsPerTerm,
-      float minAcceptedSimilarity) {
+      float minAcceptedSimilarity,
+      boolean similarityAsBoost) {
Member

Should we add a back-compat ctor that doesn't take similarityAsBoost, and defaults it to false I guess?
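
A rough sketch of what such a back-compat constructor could look like, assuming the usual delegation pattern. `FilterCtorSketch` is a hypothetical stand-in, and the `TokenStream` and `Word2VecSynonymProvider` parameters are left out so the example stays self-contained.

```java
// Hypothetical stand-in for the filter; only the constructor shape matters here.
final class FilterCtorSketch {
  final int maxSynonymsPerTerm;
  final float minAcceptedSimilarity;
  final boolean similarityAsBoost;

  /** Old signature, kept for back compat: delegates with similarityAsBoost = false. */
  FilterCtorSketch(int maxSynonymsPerTerm, float minAcceptedSimilarity) {
    this(maxSynonymsPerTerm, minAcceptedSimilarity, false);
  }

  /** New signature that lets callers opt in to using the similarity as a boost. */
  FilterCtorSketch(
      int maxSynonymsPerTerm, float minAcceptedSimilarity, boolean similarityAsBoost) {
    this.maxSynonymsPerTerm = maxSynonymsPerTerm;
    this.minAcceptedSimilarity = minAcceptedSimilarity;
    this.similarityAsBoost = similarityAsBoost;
  }
}
```

Existing callers of the four-argument form would keep compiling and keep the current behavior, while new callers could pass `true` explicitly.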

@dantuzi
Contributor Author

dantuzi commented Nov 8, 2023

Thanks @mikemccand for your feedback.
I had to address some comments from @alessandrobenedetti, that's why this PR is still WIP.
At the moment I have other priorities at work, but I'll resolve all conflicts and address your comments in the next few days.
