
some confusion #36

Open
TK-blost opened this issue Apr 3, 2019 · 2 comments

@TK-blost

TK-blost commented Apr 3, 2019

Hello, I am new to libFM, and it is a great tool.
I used MCMC to train a CTR model and ran into two problems:

  1. The data has 160 million features. When init_V is small, such as 0.001 or 0.005, training looks normal and the AUC is 0.6-0.7. But when I set init_V to 0.1 or 0.5, the result looks like 0,0,3333,1... Could you give me some advice?
  2. I read that "if MCMC saves the model, it has to save the parameters of every iteration." Why is saving only the last iteration's parameters not OK? And why is the final y_predict the average over every iteration?

I hope to receive a reply.
Thanks!

@srendle
Owner

srendle commented Apr 7, 2019

about 2: SGD and ALS are point estimators that find the "best" model parameters, so it is reasonable to use the parameters of the last iteration. MCMC is a sampling method that finds many probable model parameters. Just as it is not a good idea to use a single decision tree out of a random forest, taking one of the MCMC models won't give a good prediction. Instead, the MCMC models produced by each "iteration" should be used collectively.
about 1: Can you give some more details on what you mean by "the result just like 0,0,3333,1..."? Is this the AUC or several predictions of the CTR model?
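The following is a minimal sketch of the averaging idea in the reply above. It is not libFM's code. Each "iteration" draws a weight from a stand-in posterior and predicts a click probability for one test case through a probit link. The draws are independent Gaussians here, whereas a real MCMC chain is autocorrelated. The names `run_chain`, `post_mean`, and `post_sd` are hypothetical. Repeating the whole run many times shows how much more stable the running average is than the last sample alone.

```python
import math
import random
import statistics


def probit(z):
    """Standard normal CDF, used here as the link for a click probability."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def run_chain(rng, n_iter, x, post_mean=0.8, post_sd=0.5):
    """Simulate one sampler run for a single test case with feature value x.

    Each "iteration" draws a weight theta from a stand-in posterior
    N(post_mean, post_sd) (independent draws; a real MCMC chain is
    autocorrelated) and predicts probit(theta * x).
    Returns (prediction of the last iteration only, average over all iterations).
    """
    pred_sum = 0.0
    pred_this = 0.0
    for _ in range(n_iter):
        theta = rng.gauss(post_mean, post_sd)
        pred_this = probit(theta * x)
        pred_sum += pred_this  # accumulate every sample's prediction
    return pred_this, pred_sum / n_iter


rng = random.Random(42)
last_only, averaged = [], []
for _ in range(50):  # 50 independent runs of the same model on the same case
    last, avg = run_chain(rng, n_iter=100, x=1.5)
    last_only.append(last)
    averaged.append(avg)

print("spread of last-iteration predictions:", statistics.pstdev(last_only))
print("spread of averaged predictions:      ", statistics.pstdev(averaged))
```

The running sum `pred_sum` shows why an MCMC model cannot be reduced to one parameter set. The prediction is an average over all draws, so you must keep either every draw's parameters or the accumulated predictions.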

@TK-blost
Author

TK-blost commented Apr 8, 2019

Oh, thank you so much for your reply. About 1: those numbers come from the output file of the predictions for TASK_CLASSIFICATION. I think I may have found the reason. Many of my features have very large values (several thousand), so when init_stdev is large, the value in cache_e used for pre_y becomes too large.
thank you again for your reply.
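The sketch below roughly illustrates the diagnosis above, under stated assumptions. The pairwise term of a 2-way FM is bilinear in the factors V and in the feature values. Scaling V by `init_stdev` therefore scales that term by `init_stdev**2`, and feature values in the thousands inflate it further. With a probit link, large scores pin predicted probabilities to exactly 0 or 1. The data, the dimensions (50 features instead of 160 million), and the `log1p` rescaling are all hypothetical. The log transform is one common way to tame large raw values; nobody in this thread suggested it.

```python
import math
import random


def fm_score(x, V):
    """Pairwise part of a 2-way FM score for one sparse case.

    Bias and linear terms are omitted (treated as zero in this sketch).
    x: dict {feature_index: value}; V: list of factor vectors, one per feature.
    Uses sum_{i<j} <v_i, v_j> x_i x_j
       = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in x.items())
        s2 = sum((V[i][f] * xi) ** 2 for i, xi in x.items())
        total += 0.5 * (s * s - s2)
    return total


def probit(z):
    """Standard normal CDF, used as the classification link in this sketch."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def summarize(cases, V):
    """Return (mean |score|, fraction of probabilities pinned to 0 or 1)."""
    scores = [fm_score(x, V) for x in cases]
    probs = [probit(z) for z in scores]
    pinned = sum(1 for p in probs if p < 1e-6 or p > 1 - 1e-6)
    return sum(abs(z) for z in scores) / len(scores), pinned / len(probs)


rng = random.Random(0)
n_features, k, n_cases = 50, 8, 200

# Hypothetical data: 5 active features per case with raw values in the thousands.
cases = []
for _ in range(n_cases):
    idx = rng.sample(range(n_features), 5)
    cases.append({i: rng.uniform(1000.0, 5000.0) for i in idx})
log_cases = [{i: math.log1p(v) for i, v in x.items()} for x in cases]

# One N(0, 1) draw, rescaled so every init_stdev sees the same "shape" of V.
V_unit = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_features)]

results = {}
for init_stdev in (0.001, 0.5):
    V = [[init_stdev * v for v in row] for row in V_unit]
    results[("raw", init_stdev)] = summarize(cases, V)
    results[("log1p", init_stdev)] = summarize(log_cases, V)

for key, (mean_abs, pinned) in sorted(results.items()):
    print(key, f"mean|score|={mean_abs:.3g}", f"pinned={pinned:.2f}")
```

Because the same `V_unit` is reused, the ratio of mean scores between the two `init_stdev` values is exactly the squared ratio of the stdevs. The comparison therefore isolates the scaling effect from sampling noise.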
