Review CPU notebooks and run with Python 3.9 #1950
Conversation
Error appearing in AzureML, but not locally.
See run https://github.com/microsoft/recommenders/actions/runs/5388016531/jobs/9780549827
@anargyri the issue with scipy is fixed; I'm rerunning the tests, hopefully they pass. Please review.
One problem we are seeing is that the old version of SAR with Python 3.6 gave these results:
while the new version with Python 3.9 gives these results:
with Python 3.8:
with Python 3.7:
with Python 3.6:
with Python 3.6 and Pandas<1:
"MAP:\t\t 0.113796\n", | ||
"NDCG:\t\t 0.384809\n", | ||
"Precision@K:\t 0.331707\n", | ||
"Recall@K:\t 0.182571\n" |
@anargyri I ran the notebook with Python 3.6-3.9 and always got the same results, so I believe the numbers are correct.
A hypothesis of what could have happened: we haven't updated this notebook since before 2020 (https://github.com/microsoft/recommenders/commits/main/examples/02_model_collaborative_filtering/sar_deep_dive.ipynb), and in the meantime there have been some changes, like the TOP_K issue Chuyang found in 68b60c0. That could explain the different numbers. Another option is the splitter; again, since we haven't touched the notebook in a while, we might not have detected changes there. Both suspects are sketched below.
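To make the two suspects concrete, here is a hedged sketch of where the split and the cutoff enter the pipeline; the variable names mirror the deep-dive notebooks but are assumptions here:

```python
# Sketch of the two suspects mentioned above. Toy data; the real notebooks
# load MovieLens at this point.
import pandas as pd
from recommenders.datasets.python_splitters import python_stratified_split

data = pd.DataFrame(
    {
        "userID": [1, 1, 1, 2, 2, 2],
        "itemID": [10, 11, 12, 10, 11, 13],
        "rating": [4.0, 5.0, 3.0, 2.0, 4.0, 5.0],
    }
)

# Suspect 1: the splitter. If the ratio, filtering, or seed changed between
# notebook revisions, train/test membership changes and so do all
# downstream metrics, even with an identical model.
train, test = python_stratified_split(
    data, ratio=0.75, col_user="userID", col_item="itemID", seed=42
)

# Suspect 2: the evaluation cutoff. A fix like the TOP_K one referenced
# above (68b60c0) moves every @K metric without touching the model.
TOP_K = 10
```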
Yes, I think you are right. Some time in the past there was a change, but the notebook outputs were not checked in properly. Btw, I don't think we guard against something like this in the unit tests, do we? I.e., we only test that this notebook executes without error; we don't test that the results haven't changed significantly. (I remember that such checks can be a source of many failing tests, but maybe we should revisit?)
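A minimal sketch of the kind of regression check being suggested, assuming the notebook records its metrics with scrapbook's `sb.glue` under names like "map" and "ndcg"; the paths, parameters, scrap names, and tolerance are all illustrative assumptions:

```python
# Sketch: execute the notebook with papermill, read back the glued metrics
# with scrapbook, and compare against recorded baselines with a tolerance.
import papermill as pm
import pytest
import scrapbook as sb


def test_sar_deep_dive_metrics(tmp_path):
    output = str(tmp_path / "sar_deep_dive_output.ipynb")
    pm.execute_notebook(
        "examples/02_model_collaborative_filtering/sar_deep_dive.ipynb",
        output,
        kernel_name="python3",
        parameters=dict(TOP_K=10, MOVIELENS_DATA_SIZE="100k"),
    )
    # Assumes the notebook calls sb.glue("map", ...) etc.; scraps.dataframe
    # has "name" and "data" columns.
    results = sb.read_notebook(output).scraps.dataframe.set_index("name")["data"]
    # Baselines copied from a known-good run; rel=0.05 is an arbitrary choice.
    assert results["map"] == pytest.approx(0.113796, rel=0.05)
    assert results["ndcg"] == pytest.approx(0.384809, rel=0.05)
```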
Here the discrepancy is around 20%, which should raise a flag IMO. The error tolerance can be set high enough, but at least there should be some error checking, whereas now there is none.
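As a sketch, such a guard could be as simple as a `pytest.approx` assertion with a chosen relative tolerance; the 10% used here is illustrative, not a project standard:

```python
import pytest

BASELINE_MAP = 0.113796  # baseline from the checked-in notebook output above


def check_map(observed: float) -> None:
    # A ~20% swing like the one discussed here would fail this check.
    assert observed == pytest.approx(BASELINE_MAP, rel=0.10)
```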
You are right, we are not checking the metrics for this particular notebook; in the unit tests we only make sure it runs.
I have added an issue: #1955
Description
Related Issues
related to #1947
References
Checklist:
- This PR is being merged into the staging branch and not to the main branch.