
[ASK] xDeepFM - Help on saving model checkpoint to Azure ML output directory #970

Closed
ghost opened this issue Nov 4, 2019 · 4 comments
Labels: help wanted (Need help from developers)



ghost commented Nov 4, 2019

Description

I've adapted the xDeepFM deep dive notebook to run in Azure Machine Learning service. I would like Azure ML to capture the model checkpoints and associated files so that I can download the best run, visualize training in TensorBoard, and restore the model later. Currently, none of the model files are captured (AML Service needs these files to be in the outputs directory).

[Screenshot: code that uses MODEL_DIR]

It appears that MODEL_DIR is used in a string concatenation in the code above. Should MODEL_DIR be passed as a plain string such as './outputs', or built with os.path.join?

[Screenshot: the MODEL_DIR setting that was tried]

When I tried the above on my local machine, the summaries were written to the summaries directory under the outputs directory, as I expected. However, the model files were written directly into the outputs directory, with file names prefixed with "model".

[Screenshot: files in the local outputs directory]
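The symptom above (model files landing in outputs with a "model" prefix) is what plain string concatenation without a path separator would produce. The sketch below illustrates this with hypothetical values: MODEL_DIR = ./outputs/model and the "epoch_" file prefix are assumptions for illustration, not values taken from the notebook or base_model.py.

```python
import os

# Hypothetical values for illustration only: the real MODEL_DIR and the
# checkpoint file prefix used by base_model.py may differ.
MODEL_DIR = os.path.join(".", "outputs", "model")
epoch = 1

# Plain concatenation adds no separator, so "model" becomes part of the
# file name and the file lands in ./outputs instead of ./outputs/model.
concatenated = MODEL_DIR + "epoch_" + str(epoch)
print(concatenated)                    # ./outputs/modelepoch_1 (on POSIX)
print(os.path.dirname(concatenated))   # ./outputs (on POSIX)

# os.path.join inserts the separator, so the file goes inside MODEL_DIR.
joined = os.path.join(MODEL_DIR, "epoch_" + str(epoch))
print(joined)                          # ./outputs/model/epoch_1 (on POSIX)
```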

When I run this in Azure ML Service, neither the summary files nor the model files are available for download. My hunch is that the relative directory is off.

Any tips on setting MODEL_DIR correctly, so that the files are placed in the outputs directory, and on configuring this for runs in Azure ML Service would be welcome.
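One way to keep everything under the folder that Azure ML uploads is to build both directories with os.path.join beneath outputs and create them up front. This is a minimal sketch; the helper make_output_dirs and the subdirectory names "model" and "summaries" are hypothetical, not part of the recommenders API.

```python
import os


def make_output_dirs(root="outputs"):
    """Create model and summary directories under `root`; return their paths.

    Hypothetical helper for illustration; the subdirectory names are
    assumptions, not names required by recommenders.
    """
    # A relative root resolves against the working directory of the run
    # script, which is where Azure ML looks for ./outputs.
    model_dir = os.path.join(root, "model")
    summaries_dir = os.path.join(root, "summaries")
    for path in (model_dir, summaries_dir):
        # exist_ok avoids an error when the directory is already there.
        os.makedirs(path, exist_ok=True)
    return model_dir, summaries_dir
```

The returned paths would then be passed into the notebook's hyperparameters, presumably as MODEL_DIR and SUMMARIES_DIR overrides to its prepare_hparams call, assuming that call accepts them as keyword arguments.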

Other Comments

ghost added the "help wanted" (Need help from developers) label Nov 4, 2019
miguelgfierro (Collaborator) commented:

Maybe @eedeleon can help here?

elogicaadith (Contributor) commented:

@miguelgfierro: I have a solution to this problem. It involves adding a "/" to the save_path string as shown below:

[Screenshot: the modified save_path line]

This code is located in base_model.py. I have validated that this works on the Azure Machine Learning service.

[Screenshot: validation run on Azure Machine Learning service]

I'm in the process of validating that the change works on local notebooks as well. Will you accept a pull request for this?
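The fix described in this comment inserts a separator between the directory and the checkpoint file name. The sketch below compares a manually inserted "/" with os.path.join; the helper checkpoint_path and the "epoch_" prefix are hypothetical, chosen only to illustrate the idea.

```python
import os


def checkpoint_path(model_dir, epoch):
    """Return a save_path for an epoch checkpoint inside model_dir.

    Hypothetical helper; the "epoch_" prefix is an assumption, not
    necessarily what base_model.py uses.
    """
    # os.path.join adds one separator and copes with a trailing one.
    return os.path.join(model_dir, "epoch_" + str(epoch))


# Manual "/" insertion versus os.path.join, with and without a trailing slash.
for model_dir in ("outputs/model", "outputs/model/"):
    manual = model_dir + "/epoch_" + str(1)
    print(repr(manual), repr(checkpoint_path(model_dir, 1)))
```

Both approaches write the checkpoint inside the model directory. With a trailing slash on the directory, the manual version yields outputs/model//epoch_1, which POSIX filesystems treat like a single slash, while os.path.join yields outputs/model/epoch_1.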

elogicaadith (Contributor) commented:

@miguelgfierro: I just finished testing this change on the local xDeepFM notebook in 00_quick_start:

[Screenshot: xDeepFM quick-start notebook run]

Contents of the local folder on my C: drive:

[Screenshot: contents of the local folder]

elogicaadith added a commit to elogicaadith/recommenders that referenced this issue Dec 12, 2019
elogicaadith mentioned this issue Dec 12, 2019
miguelgfierro (Collaborator) commented:

Thanks @elogicaadith, I added a comment. Can you please take a look?

elogicaadith added a commit to elogicaadith/recommenders that referenced this issue Dec 16, 2019
elogicaadith mentioned this issue Dec 16, 2019
miguelgfierro added a commit that referenced this issue Dec 16, 2019