Databricks installation script #457
Conversation
Add installation sh
Update SETUP.md accordingly
looks good, just a few small changes
also, in general I'm wondering if we should move towards Python-based scripts, which could generalize better across OSes. I wouldn't worry about it right now, but food for thought
Handle error cases such as no databricks-cli installed, cluster id not found, etc.
SETUP.md bug fix, as databricks-cli works with more than just Azure Databricks
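For illustration, a minimal bash sketch of the kind of checks that commit describes; the variable names and messages are assumptions, not the actual script:

```bash
#!/bin/bash
# Hypothetical sketch of the error handling described above.
CLUSTER_ID="$1"

# Fail early if the Databricks CLI is not installed.
if ! command -v databricks >/dev/null 2>&1; then
    echo "ERROR: databricks-cli not found; install it with 'pip install databricks-cli'." >&2
    exit 1
fi

# Fail if the given cluster id is not known to the workspace.
if ! databricks clusters get --cluster-id "$CLUSTER_ID" >/dev/null 2>&1; then
    echo "ERROR: cluster id '$CLUSTER_ID' not found." >&2
    exit 1
fi
```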
looks great, nice work
…mmenders into jumin/databricks_setup_sh
Agreed with Scott re: python vs. shell. This does a small part of the create-and-config notebook in the pull request here: #438. It assumes a supported version of ADB (4.1) is running on the cluster. I had chosen python rather than sh so I only had to do one REST call to install the libraries instead of doing multiple CLI calls. Re: dependencies, it requires that zip is installed (usually that's available, but I didn't have it by default within a clean Ubuntu under Windows Subsystem for Linux). I don't know if we want to mention this or not.
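For reference, a single Libraries API call like the one described could look roughly as follows; the workspace URL, token, cluster id, and library paths are placeholders, not values from this PR:

```bash
# Hypothetical example: install several libraries on a cluster with one REST
# call to the Databricks Libraries API (API 2.0).
curl -X POST "https://<workspace-url>/api/2.0/libraries/install" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{
        "cluster_id": "<cluster-id>",
        "libraries": [
          {"egg": "dbfs:/FileStore/Recommenders.egg"},
          {"pypi": {"package": "tqdm"}}
        ]
      }'
```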
To follow up on my last comment: the question I would like feedback on is how we want to simplify setup on Azure Databricks, since we have a few options. Do we want one setup script for ADB that installs only the
I prefer the former; it is more modular and more flexible.
Looks great, Scott's idea is interesting
Currently testing an addition to do the second part of installation to prepare for operationalization.
Just had a quick chat w/ @jreynolds01. We can utilize this script within the create_and_configure_(databricks)_cluster notebook, so that this script can serve those who already have Databricks running and want to install our repo on it, while Jeremy's notebook serves more o16n-focused use cases.
Update to mention the databricks installation script doesn't handle dependencies, and set prerequisites apart from the main step.
Last commit addresses the bash script issue when using Windows' git-bash |
…mmenders into jumin/databricks_setup_sh
@loomlike - can you do a check of the script I just added for o16n?
…mmenders into jumin/databricks_setup_sh
* For the [reco_utils](reco_utils) import to work on Databricks, it is important to zip the content correctly. The zip has to be performed inside the Recommenders folder; if you zip directly above the Recommenders folder, it won't work.

## Prepare Azure Databricks for Operationalization
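As a hedged illustration of that note (assuming the repo is checked out as Recommenders/ with reco_utils/ at its root):

```bash
# Correct: zip from inside the Recommenders folder so reco_utils/ sits at the
# root of the archive and 'import reco_utils' resolves on the cluster.
cd Recommenders
zip -r ../Recommenders.zip .

# Wrong: zipping from one level above nests everything under Recommenders/,
# which breaks the import.
# zip -r Recommenders.zip Recommenders
```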
@nikhilrj - please take a look at this SETUP.
LGTM!
Actually do we even need the manual installation steps now? Maybe we should cut them out...?
@nikhilrj hmmm good point. But basically the install-script does the same thing as the manual steps... Maybe we can explain what the script does instead of having those contents as 'manual installation'?
Just read your comment below. I agree the SETUP is a bit long now... what are others' thoughts?
I think it's long, but the benefits of having all the setup in one place are worth it. I think we should provide the manual information in case someone can't run the scripts for some reason.
However, there are a few different ways to do that:
- Say something like "see the scripts, and the comments in the scripts, for manual dependencies", and include links to documentation on how to add libraries, etc. The scripts do implement all the steps, so they are in some ways self-documenting.
- Another option would be to have a separate SETUP_MANUAL.md file and use it as an appendix of sorts, which we could reference from the default SETUP.md.
I think if we want to clean it up, the first bullet is probably a good way to do it. I'd say let's go ahead and merge, and have that as a follow-up action.
I don't understand why we are getting the error in the Spark pipeline. I'll investigate tomorrow, but feel free to ignore it for this PR.
Awesome work!
LGTM!
Looks good! My main comment is that SETUP.md is really long now. Maybe we should cut out the manual installation steps in the operationalization dependency setup?
I'm less worried about a long setup as we have a mini setup in the top-level README.
@jreynolds01 Script looks good to me! Just need to update a paragraph in SETUP about Databricks' env to include all of CPU, GPU, and Spark. See my comment above. FYI, Databricks Runtime ML includes the TensorFlow package. For non-ML CPU and GPU clusters, you can still install TF or PyTorch manually.
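For example, assuming the legacy databricks-cli and a placeholder cluster id, the manual install could attach them as PyPI libraries:

```bash
# Hypothetical example: attach TensorFlow / PyTorch to a non-ML cluster.
databricks libraries install --cluster-id <cluster-id> --pypi-package tensorflow
databricks libraries install --cluster-id <cluster-id> --pypi-package torch
```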
…mmenders into jumin/databricks_setup_sh
Description
Motivation and Context
To make repo installation onto Databricks easier.
Related Issues
#237
Checklist: