-
Notifications
You must be signed in to change notification settings - Fork 765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add nr_repr_docs parameter #2130
base: master
Are you sure you want to change the base?
Conversation
Thank you for this pull request! I believe there were a number of issues open discussing this, which issue does this PR refer to? I remember there was some discussion on how to approach this there. That perspective will help me understand what was needed and what decisions we made before going through this PR. |
I didn't actually look at past issues—it was more from my own usage that I found this added parameter to be helpful. That said, it looks like this issue is the most pertinent |
Ah, then you might have missed the contributing guideline! I think it's first important to establish what exactly is needed. From the issue you shared as well as the connected one, it seems that users also want to re-calculate the number of representative documents after training the model. Also, what made you choose to add this parameter to |
Whoops, missed that part on creating an issue! 😅 WRT what is exactly needed—AFAICT, I think there are three needs/wants based on the specific usage pattern.
This PR would solve for 1 and 3 (given the As for why I included it in
Let me know if you think I'm off in any way |
I think it would indeed be nice to give users the ability to choose the number of representative documents. Wouldn't it be easier to update the number of representative documents after fitting the model? That way, it would solve all 3 problems without the need to actually needing to refit the model constantly. The representative documents should not affect the representations (except for LLM-based representation models which has separate parameters for this) so it does not feel like it should be a parameter related to either init or transform. It might be easiest and more related to the task itself to put it after training the model.
Generally, we would put these kinds of parameter in the |
What does this PR do?
Adds a parameter that you can pass around for getting the
nr_repr_docs
. This helps when attempting to get a higher # of representative docs for a given topic.Before submitting