Cosine full vector implementation #141
Thanks for the PR. I made a quick review (without diving into the code).
Could you please provide a few benchmarks (RMSE, MAE, maybe recall / precision as in the FAQ, computation time...)?
doc/source/prediction_algorithms.rst
@@ -130,6 +130,9 @@ argument is a dictionary with the following (all optional) keys:
``'False'``) for the similarity not to be zero. Simply put, if
:math:`|I_{uv}| < \text{min_support}` then :math:`\text{sim}(u, v) = 0`. The
same goes for items.
- ``'common_ratings_only'``: Determines whether only common user/item ratings are
taken into account or all the full rating vectors are considered
(only relevant for cosine-based similraty). Default is True.
Should be 'Default is ``True``' with two '`'.
surprise/similarities.pyx
Depending on ``common_ratings_only`` field of ``sim_options``
only common users (or items) are taken into account, or full rating
vectors (default: True).
Same here
surprise/similarities.pyx
-sqi[xi, xj] += ri**2
-sqj[xi, xj] += rj**2
+sqi[xi, xj] += ri ** 2
+sqj[xi, xj] += rj ** 2
No need to change this?
xi_iter = iter(sorted_y_ratings)
try:
    xi_non_missing, ri_non_missing = next(xi_iter)
except StopIteration:
Could all the StopIteration be avoided?
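One possible way to avoid the `try` / `except StopIteration` pattern shown above is the two-argument form of the built-in `next()`, which returns a default instead of raising. The sketch below is plain Python, not the PR's Cython code; `first_rating` is a hypothetical helper name, and only `sorted_y_ratings`, `xi_non_missing` and `ri_non_missing` come from the diff. Whether this fits the rest of the loop in `similarities.pyx` is an assumption.

```python
def first_rating(sorted_y_ratings):
    """Return the first (x, r) pair, or (None, None) when there are no ratings.

    Hypothetical helper for illustration only; not part of the PR.
    """
    xi_iter = iter(sorted_y_ratings)
    # next(iterator, default) returns `default` when the iterator is
    # exhausted, so no StopIteration handler is needed.
    xi_non_missing, ri_non_missing = next(xi_iter, (None, None))
    return xi_non_missing, ri_non_missing
```

The caller can then test `xi_non_missing is None` to detect exhaustion, which keeps the control flow linear.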
surprise/similarities.pyx
@@ -149,7 +244,7 @@ def msd(n_x, yr, min_support):
     for y, y_ratings in iteritems(yr):
         for xi, ri in y_ratings:
             for xj, rj in y_ratings:
-                sq_diff[xi, xj] += (ri - rj)**2
+                sq_diff[xi, xj] += (ri - rj) ** 2
Not relevant to the PR
surprise/similarities.pyx
-sqi[xi, xj] += ri**2
-sqj[xi, xj] += rj**2
+sqi[xi, xj] += ri ** 2
+sqj[xi, xj] += rj ** 2
Same
surprise/similarities.pyx
-sq_diff_i[xi, xj] += diff_i**2
-sq_diff_j[xi, xj] += diff_j**2
+sq_diff_i[xi, xj] += diff_i ** 2
+sq_diff_j[xi, xj] += diff_j ** 2
Same
-3: [(1, 1), (2, 4), (3, 2), (4, 3), (5, 3), (6, 3.5), (7, 2)], # noqa
-4: [(1, 5), (2, 1), (5, 2), (6, 2.5), (7, 2.5)], # noqa
+3: [ (1, 1), (2, 4), (3, 2), (4, 3), (5, 3), (6, 3.5), (7, 2)], # noqa
+4: [ (1, 5), (2, 1), (5, 2), (6, 2.5), (7, 2.5)], # noqa
Same
No, sorry, I don't have time for this measurement. I added full cosine only to cover all possible options and to prepare common ground for adjusted cosine (basically, I am just adding the features needed to pass the "recommender systems" specialization on Coursera, as I have seen complaints there about a lack of Python libraries supporting this course). Regarding the reformatting in the test examples: with the previous formatting they were awfully confusing, being improperly aligned (I probably broke it with my previous PR). This certainly should be fixed. The other issues are fixed.
I'm sorry but I can't accept a new algorithm / similarity measure without even having a vague idea of its performance.
You haven't made such measurements for the existing similarity metrics yourself; at least, I have found measurements only for a single default similarity metric. So perhaps you would allow skipping them for a new similarity metric as well? Otherwise, I kindly ask you to provide detailed requirements on what you expect of me.
Oh, trust me, I have.
No, please don't ask me this, really.
I'm not asking for much. A few CV procedures on ml-100k and ml-1m, comparing
Good idea
Absolutely. Benchmarking is about comparing, see my point above.
Could you give a link to an existing similarity metrics comparison, as an example of what you are asking for? Or just let me know when you will update the contributor guidelines on this?
I don't have any link. I'm just asking for what I've already described in my previous post.
Any update / benchmark on this?
Implemented full-vector cosine (as requested; the adjusted cosine PR will be made separately, after this one). Tests and docs added.
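To illustrate the difference between the two modes, here is a minimal pure-Python sketch for a single pair of rating vectors. It is not the PR's Cython implementation: `cosine_pair` is a hypothetical name, ratings are assumed to be dicts mapping an id to a rating, missing ratings are assumed to count as zero in the full-vector case, and `min_support` is ignored.

```python
from math import sqrt


def cosine_pair(ratings_u, ratings_v, common_ratings_only=True):
    """Cosine similarity between two sparse rating vectors (dicts id -> rating).

    Hypothetical illustration; names and signature are assumptions.
    """
    common = ratings_u.keys() & ratings_v.keys()
    # Only co-rated ids contribute to the dot product in both modes,
    # because a missing rating is treated as zero.
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    if common_ratings_only:
        # Norms restricted to the co-rated ids.
        norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
        norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    else:
        # Norms over each full rating vector.
        norm_u = sqrt(sum(r ** 2 for r in ratings_u.values()))
        norm_v = sqrt(sum(r ** 2 for r in ratings_v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


u = {1: 3.0, 2: 4.0}
v = {1: 3.0, 3: 4.0}
print(cosine_pair(u, v, common_ratings_only=True))
print(cosine_pair(u, v, common_ratings_only=False))
```

With a single co-rated id, the common-ratings-only variant yields 1.0 regardless of the other ratings, while the full-vector variant yields 9 / 25 = 0.36, since the ratings that are not shared enlarge both norms.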