Working smf #151 (Open)
DanielTakeshi wants to merge 18 commits into master from workingSMF
Conversation
I managed to figure out a way to get an RMSE of 0.845380, with the following settings:
The performance of the algorithm seems to be very sensitive to these settings, so one should set them carefully.
Now the next step is to figure out how to use ADAGrad. A couple of points: (1) this will assume that sigma^2 can be computed in one minibatch, (2) this will assume IID components in the matrix, which is clearly violated with, say, one user's column, (3) the results are odd: I don't know why RMSE is roughly 0.7 on the training set when there's barely any learning signal, (4) RMSE on the test set oddly increases, from 0.91 to 1, as you increase the minibatch size. I'm still lost on this. =(
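For reference, BIDMach's ADAGrad updater has its own class and options; here is only a rough, standalone sketch of the per-coordinate update rule being relied on above (the names are mine, not BIDMach's API):

```scala
// Minimal per-coordinate ADAGrad sketch (illustrative; not BIDMach's ADAGrad class).
// Each coordinate keeps a running sum of squared gradients, and the step size is
// scaled by 1/sqrt(sum), so frequently-updated coordinates take smaller steps.
object AdagradSketch {
  def step(w: Array[Double], g: Array[Double], sumSq: Array[Double],
           lr: Double, eps: Double = 1e-8): Unit = {
    var i = 0
    while (i < w.length) {
      sumSq(i) += g(i) * g(i)                          // accumulate squared gradient
      w(i) -= lr * g(i) / (math.sqrt(sumSq(i)) + eps)  // scaled gradient step
      i += 1
    }
  }
}
```

The per-coordinate scaling is one reason results can be so sensitive to the initial learning rate: early large gradients permanently shrink the effective step size for those coordinates.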
My confusion earlier was about how the model matrices kept updating even though I wasn't accepting anything in the updater. It turns out that the SMF code updates them in the mupdate method. Ugh ...
I resolved my earlier questions. Now, let's TRY to get MHTest to work on this ... gulp.
Two issues: (1) CPU allocation, and (2) we almost always seem to accept; I've only seen one rejection. Does that make sense? Note that I had to apply a cutoff of -15 on the log probability for SMF, which should be OK.
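To make the accept/reject behavior concrete, here is a sketch of a Metropolis-style acceptance step with the log-probability floor mentioned above (illustrative only; the actual MHTest code is more involved):

```scala
import scala.util.Random

// Sketch of a Metropolis accept step with a floor on log probabilities
// (illustrative; not the actual MHTest implementation). Clamping very negative
// log probs (here at -15, as in the SMF runs above) keeps the exponentiated
// ratio numerically sane.
object MHSketch {
  val logProbFloor = -15.0

  def accept(logpCurrent: Double, logpProposed: Double, rng: Random): Boolean = {
    val cur  = math.max(logpCurrent, logProbFloor)
    val prop = math.max(logpProposed, logProbFloor)
    val logRatio = prop - cur
    // Accept outright if the proposal is at least as probable; otherwise
    // accept with probability exp(logRatio).
    logRatio >= 0.0 || math.log(rng.nextDouble()) < logRatio
  }
}
```

Near-universal acceptance typically means the proposals barely change the (clamped) log probability, so the log ratio sits near zero; that could be consistent with the behavior above.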
Update: don't merge this into master yet. I am doing some more work offline.
… can track acceptance rates
Now let's wait to see what John thinks of the proposal function issue. Then I can benchmark with different tests but the same hyperparameters for withMHTest and noMHTest, to check performance at different settings, etc.
Now let me switch focus to MALA ...
@jcanny Here's what I did for this pull request:
The goal here is to be able to test the SMF.scala code on data (such as Netflix, which I use here) using ADAGrad for stochastic gradient updates. This is different from SFA.scala, which internally uses a more complicated conjugate gradient updater that I don't know about.
Stuff added:

- testsmf_v2.ssc, as a proof of concept.

Stuff modified:

- SFA.scala, to remove learner methods which explicitly take an updater as input. The SFA code already calls ADAGrad updaters internally (and uses conjugate gradient as well), so there is no need to add more gradient updates to it.
- SMF.scala, to remove learner methods which do not explicitly take an updater as input. The reason is the reverse of the SFA case: SMF needs an updater to be formed. I added a learner which forms an ADAGrad updater, though the default parameters may be bad. I also added a predictor method which explicitly forms an empty user matrix like the SFA predictor, and a second evalfun in SMF which uses that matrix.
- Finally, I modified the default evalfun for training so that it assigns omats, just like in SFA. This is the only change that might affect existing code (other than code that was calling the outdated learners).
- Minor documentation.
Test results:
On the Netflix data, I can get an RMSE of 0.90 with 1 pass using SMF.scala and the settings in testsmf_v2.ssc. Unfortunately, RMSE rises to 0.95 and 0.97 with 2 and 3 passes, respectively, and soon it becomes no better than running SMF with a second factor matrix of all zeros (this is the matrix of size (opts.dim, 480k)). The results are disconcerting, but they may be because I'm not using good settings for ADAGrad. We should ideally be getting an RMSE of roughly 0.85. Perhaps there is something else we need in this SMF code?