
Databricks installation script #457

Merged: 34 commits into staging from jumin/databricks_setup_sh, Jan 31, 2019

Conversation

@loomlike (Collaborator)

Description

  • Add a repo installation shell script for Databricks
  • Update SETUP.md accordingly

Motivation and Context

To make repo installation onto Databricks easy.

Related Issues

#237

Checklist:

  • My code follows the code style of this project, as detailed in our contribution guidelines.
  • I have added tests.
  • I have updated the documentation accordingly.

Add installation sh
Update SETUP.md accordingly
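
For orientation, a minimal sketch of how such a script would be invoked; the argument shape below is an assumption, since the exact interface isn't spelled out in this thread:

```bash
# Hypothetical invocation -- the real flags/arguments may differ.
# Assumes databricks-cli is already installed and configured:
#   pip install databricks-cli && databricks configure --token
cd Recommenders
./scripts/databricks_install.sh <CLUSTER_ID>
```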
@gramhagen (Collaborator) left a comment

Looks good, just a few small changes.

Also, in general I'm wondering if we should move towards Python-based scripts, which could generalize better across OSes. I wouldn't worry about it right now, but food for thought.

loomlike and others added 2 commits January 29, 2019 00:27
Handle error cases such as databricks-cli not installed, cluster id not found, etc.
SETUP.md bug fix, since databricks-cli works for more than just Azure Databricks
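
A minimal sketch of the kind of guards that commit describes, assuming a cluster-id argument; this is illustrative, not the script's actual code:

```bash
#!/usr/bin/env bash
# Fail early if the databricks CLI is missing.
if ! command -v databricks >/dev/null 2>&1; then
    echo "databricks-cli not found; install it with: pip install databricks-cli" >&2
    exit 1
fi

# Fail early if the target cluster id doesn't resolve.
CLUSTER_ID="$1"
if ! databricks clusters get --cluster-id "$CLUSTER_ID" >/dev/null 2>&1; then
    echo "Cluster '$CLUSTER_ID' not found; see 'databricks clusters list'." >&2
    exit 1
fi
```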
@gramhagen (Collaborator) left a comment

looks great, nice work

@jreynolds01 (Collaborator) commented Jan 29, 2019

Agreed with Scott re: python vs. shell.

This does a small part of the create and config notebook in the pull request here: #438.

It assumes a supported version of ADB (4.1) is running on the cluster. Using the .sh and CLI approach to do this is fine, but I would like to see that it is extensible to support the full setup, including cluster creation and potentially preparing for operationalization (which involves installing additional libraries, including the cosmosdb-spark connector).
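
Cluster creation itself is scriptable through the same CLI, e.g. as below; the spec file name is a placeholder and its contents are omitted:

```bash
# Create a cluster from a JSON spec (legacy databricks-cli).
databricks clusters create --json-file cluster_spec.json
```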

I had chosen Python rather than sh so I only had to make one REST call to install the libraries instead of multiple CLI calls, but moving those to sh is fine if we think that's easier.
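
For illustration, that single REST call would look roughly like this against the Libraries API 2.0; the library specs and DBFS path are placeholders, not the PR's actual payload:

```bash
# One POST installs any number of libraries at once.
curl -s -X POST "${DATABRICKS_HOST}/api/2.0/libraries/install" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "cluster_id": "'"${CLUSTER_ID}"'",
        "libraries": [
          {"egg":  "dbfs:/FileStore/Recommenders.egg"},
          {"pypi": {"package": "pydocumentdb"}}
        ]
      }'
```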

Re: dependencies, it requires that zip is installed (usually that's available, but I didn't have it by default in a clean Ubuntu under Windows Subsystem for Linux). I don't know if we want to mention this or not.
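
If we do mention it, the guard is short (Debian/Ubuntu shown; the package manager varies by distro):

```bash
# Install zip if it's missing (e.g. on a clean Ubuntu under WSL).
if ! command -v zip >/dev/null 2>&1; then
    sudo apt-get update && sudo apt-get install -y zip
fi
```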

@jreynolds01 (Collaborator)

To follow up on my last comment: the question I would like feedback on is that we have a few options for how to simplify setup on Azure Databricks.

Do we want one setup script for ADB that installs only the reco_utils module, and then a second setup script that facilitates the preparation for operationalization? Or do we want one setup script that can install all dependencies end-to-end, depending on flags? Which do we think is easier for the user?
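
Concretely, the two shapes might look like this; the script names and flag are hypothetical, just to make the options visible:

```bash
# Option 1: two focused scripts
./scripts/databricks_install.sh <CLUSTER_ID>       # reco_utils only
./scripts/databricks_o16n_setup.sh <CLUSTER_ID>    # o16n dependencies

# Option 2: one script, opt-in flags
./scripts/databricks_install.sh --prepare-o16n <CLUSTER_ID>
```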

@yueguoguo (Collaborator)

> Do we want one setup script for ADB that installs only the reco_utils module, and then a second setup script that facilitates the preparation for operationalization? Or do we want one setup script that can install all dependencies end-to-end, depending on flags? Which do we think is easier for the user?

I prefer the former. It is more modular and more flexible.

@miguelgfierro (Collaborator) left a comment

Looks great. Scott's idea is interesting.

@jreynolds01 (Collaborator)

Currently testing an addition that does the second part of the installation, preparing for operationalization.

@loomlike (Collaborator, Author)

Just had a quick chat w/ @jreynolds01. We can utilize this script within the create_and_configure_(databricks)_cluster notebook, so that this script serves those who already have Databricks running and want to install our repo on it, while Jeremy's notebook serves more o16n-focused use cases.

Update to mention that the databricks installation script doesn't handle dependencies,
and set prerequisites apart from the main steps.
@loomlike (Collaborator, Author)

The last commit addresses the bash script issue when using Windows' git-bash.

@jreynolds01 (Collaborator)

@loomlike - can you do a check of the script I just added for o16n?

@jreynolds01 mentioned this pull request Jan 30, 2019
* For the [reco_utils](reco_utils) import to work on Databricks, it is important to zip the content correctly. The zip has to be performed inside the Recommenders folder; if you zip directly above the Recommenders folder, it won't work.

## Prepare Azure Databricks for Operationalization
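
A sketch of the zipping step that hunk describes; the DBFS path is illustrative:

```bash
# Zip from INSIDE the Recommenders folder; zipping from the parent
# directory gives the archive the wrong layout for `import reco_utils`.
cd Recommenders
zip -r Recommenders.zip .

# Upload so clusters can attach it (path is a placeholder).
databricks fs cp Recommenders.zip dbfs:/FileStore/Recommenders.zip --overwrite
```

On the cluster side, the archive is then typically attached with `sc.addPyFile(...)` before importing reco_utils.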
Collaborator

@nikhilrj - please take a look at this SETUP.

Contributor

LGTM!

Contributor

Actually, do we even need the manual installation steps now? Maybe we should cut them out...?

@loomlike (Collaborator, Author)

@nikhilrj hmmm, good point. But basically the install script does the same thing as the manual steps... Maybe we can explain what the script does instead of presenting those contents as 'manual installation'?

@loomlike (Collaborator, Author)

Just read your comment below. I agree the SETUP is a bit long now... what are others' thoughts?

Collaborator

I think it's long, but the benefit of having all the setup in one place is worth it. I think we should provide manual information in case someone can't run the scripts for some reason.

However, there are a few different ways to do that:

  • Say something like "see the scripts, and the comments in the scripts, for manual dependencies", and include links to documentation on how to add libraries, etc. The scripts do implement all the steps, so they are in some ways self-documenting.
  • Another option would be to have a separate SETUP_MANUAL.md file and use it as an appendix of sorts, referenced from the default SETUP.md.

I think if we want to clean it up, the first bullet is probably a good way to do it. I'd say let's go ahead and merge, and keep that as a follow-up action.

@nikhilrj assigned and then unassigned himself Jan 30, 2019
@nikhilrj self-requested a review January 30, 2019 20:12
@miguelgfierro (Collaborator)

I don't understand why we are getting the error in the spark pipeline. I'll investigate tomorrow, but feel free to ignore it for this PR

@nikhilrj (Contributor) left a comment

Awesome work!


@nikhilrj (Contributor) left a comment

Looks good! My main comment is that SETUP.md is really long now. Maybe we should cut the manual installation steps out of the operationalization dependency setup?

I'm less worried about a long setup, since we have a mini setup in the top-level README.


@loomlike (Collaborator, Author)

@jreynolds01 The script looks good to me! We just need to update a paragraph in SETUP about Databricks environments to cover all of CPU, GPU, and Spark. See my comment above.

FYI, the Databricks Runtime ML includes the TensorFlow package. For non-ML CPU and GPU clusters, you can still install TF or PyTorch manually.
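
With the databricks CLI, that manual install would look roughly like this; the package list is illustrative:

```bash
# Attach TensorFlow / PyTorch as cluster libraries on a non-ML runtime.
databricks libraries install --cluster-id "$CLUSTER_ID" --pypi-package tensorflow
databricks libraries install --cluster-id "$CLUSTER_ID" --pypi-package torch
```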

@jreynolds01 merged commit ced29ce into staging Jan 31, 2019
@miguelgfierro deleted the jumin/databricks_setup_sh branch February 7, 2019 10:47