add intra_distance_score evaluation #103
Conversation
Thank you for this! I think to move this forward, we'll want to do two things.
On making this fast: as far as I understand, we want to compute the cosine distance between the columns i and j of the (sparse) interaction matrix. There are a couple of things that we can do to make this faster than the current implementation:
This way we don't need the cache either.
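The approach described above could be sketched roughly as follows. This is an illustration, not the PR's actual code; the function name `item_cosine_distances` and its signature are assumptions:

```python
import numpy as np
import scipy.sparse as sp

def item_cosine_distances(interactions, item_ids):
    # interactions: (num_users, num_items) sparse matrix.
    # Transpose so rows correspond to items, then L2-normalize each row;
    # the dot product of normalized rows is the cosine similarity.
    items = sp.csr_matrix(interactions).T.tocsr()[item_ids]
    norms = np.sqrt(items.multiply(items).sum(axis=1)).A.ravel()
    norms[norms == 0.0] = 1.0  # guard against items with no interactions
    normalized = sp.diags(1.0 / norms).dot(items)
    similarity = normalized.dot(normalized.T).toarray()
    return 1.0 - similarity
```

Because everything stays sparse until the final (k x k) similarity block, no per-pair caching is needed.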
We can make it even faster by using the fact that the indices are sorted and using numba, but this is probably a good first step.
Thanks for the changes, I made some additional comments.
More generally:
- Can you give me a sense of how slow/fast this is? Could you post how long it takes for the Movielens 100k dataset, for instance?
- We will need a version of this for sequence-based models. Have a look at the MRR routines to see how the two differ.
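For the timing question, a minimal wall-clock harness could look like the following sketch; `time_metric` is a hypothetical helper, not part of spotlight:

```python
import time

def time_metric(metric_fn, *args, **kwargs):
    # Crude wall-clock timing: run a few times and keep the best,
    # which reduces noise from other processes.
    best = float("inf")
    for _ in range(3):
        start = time.perf_counter()
        metric_fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best
```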
spotlight/evaluation.py (outdated)

    distances = []
    test = test.tocsr()
    mat = train.tocoo().T.tocsr()
Is the coo conversion necessary?
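To illustrate the question: in scipy, transposing a CSR matrix already yields a CSC matrix without copying the data, so the intermediate `.tocoo()` call appears redundant:

```python
import numpy as np
import scipy.sparse as sp

train = sp.csr_matrix(np.arange(12, dtype=float).reshape(3, 4))

# Transposing CSR gives CSC directly; the extra COO round-trip
# produces the same result and can be dropped.
with_coo = train.tocoo().T.tocsr()
without_coo = train.T.tocsr()
assert (with_coo != without_coo).nnz == 0
```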
spotlight/evaluation.py
Outdated
""" | ||
|
||
distances = [] | ||
test = test.tocsr() |
What do we need test for? For knowing which users to compute the predictions for?
Maybe a cleaner way of doing the same would be to allow the user to pass in an optional array of user ids for which the metric should be computed.
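The suggested API might look like this sketch; the signature and defaults are hypothetical, and the per-user distance computation is elided:

```python
import numpy as np

def intra_distance_score(model, num_users, user_ids=None, k=10):
    # Accept an optional array of user ids rather than a test
    # matrix whose only role is to supply row indices.
    if user_ids is None:
        user_ids = np.arange(num_users)
    rec_lists = []
    for user_id in user_ids:
        predictions = -model.predict(user_id)
        rec_lists.append(predictions.argsort()[:k])
    return rec_lists
```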
spotlight/evaluation.py (outdated)

    for user_id, row in enumerate(test):
        predictions = -model.predict(user_id)
        rec_list = predictions.argsort()[:k]
        distance = [
I personally find nested list comprehensions very confusing. Could we use nested for loops here?
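The suggested refactor could look like this toy example; `item_distance` is a stand-in for the real pairwise distance:

```python
# Toy distance for illustration only.
def item_distance(i, j):
    return abs(i - j)

rec_list = [3, 1, 7]

# Nested list comprehension (roughly what the PR does now):
flat = [item_distance(i, j) for i in rec_list for j in rec_list if i != j]

# Equivalent nested for loops, as suggested in the review:
distances = []
for i in rec_list:
    for j in rec_list:
        if i != j:
            distances.append(item_distance(i, j))

assert distances == flat
```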
Thank you very much for the comments. I agree. In addition to these comments, I'm still looking for a way to make the distance function an input parameter. The current function will be the default one, since it's really fast (I'll post exact timings separately).
I've fixed the review comments except for the sequence-based models. I should check how we can run this on sequence-based models; we need a matrix like this, where each row represents an item, to calculate the distance. Any advice or guidance would be greatly appreciated.
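One possible way to obtain an item-per-row matrix from sequence data is to count how often each item occurs in each sequence; `sequences_to_item_matrix` is a hypothetical helper, not spotlight API:

```python
import numpy as np
import scipy.sparse as sp

def sequences_to_item_matrix(sequences, num_items):
    # Build a (num_items, num_sequences) sparse matrix where entry
    # (i, s) counts occurrences of item i in sequence s. Duplicate
    # COO entries are summed on conversion to CSR.
    rows, cols = [], []
    for seq_idx, seq in enumerate(sequences):
        for item in seq:
            rows.append(item)
            cols.append(seq_idx)
    data = np.ones(len(rows))
    shape = (num_items, len(sequences))
    return sp.coo_matrix((data, (rows, cols)), shape=shape).tocsr()
```

The cosine-distance machinery discussed earlier could then operate on the rows of this matrix.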
Issue #90 (Diversification metrics for evaluation)
Intra-list distance is probably the most commonly considered diversity metric, so I'd like to add it first.
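For reference, intra-list distance is typically defined as the average pairwise distance over all distinct item pairs in a recommendation list. A minimal sketch with a caller-supplied distance function (names are illustrative):

```python
from itertools import combinations
import numpy as np

def intra_list_distance(rec_list, distance_fn):
    # Average distance over all unordered pairs of distinct items;
    # a list with fewer than two items has no pairs, so return 0.
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0
    return float(np.mean([distance_fn(a, b) for a, b in pairs]))
```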