Skip to content

Latest commit

 

History

History
150 lines (119 loc) · 4.67 KB

R-setup.md

File metadata and controls

150 lines (119 loc) · 4.67 KB
title hide_title sidebar_label description
R setup
true
R setup
R setup and example for SynapseML

R setup and example for SynapseML

Installation

Requirements: Ensure that R and devtools installed on your machine.

Also make sure you have Apache Spark installed. If you are using Sparklyr, you can use spark-install. Be sure to specify the correct version. As of this writing, that should be version="3.2". spark_install is a bit eccentric and may install a slightly different version. Be sure that the version you get is one that you want.

On Windows, download WinUtils.exe and copy it into the bin directory of your Spark installation, e.g. C:\Users\user\AppData\Local\Spark\spark-3.3.2-bin-hadoop3\bin

To install the current SynapseML package for R, first install synapseml-core:

...
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-core-0.11.0.zip")
...

and then install any or all of the following packages, depending on your intended usage:

synapseml-cognitive, synapseml-deep-learning, synapseml-lightgbm, synapseml-opencv, synapseml-vw

In other words:

...
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-cognitive-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-deep-learning-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-lightgbm-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-opencv-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-vw-0.11.0.zip")
...

Importing libraries and setting up spark context

Installing all dependencies may be time-consuming. When complete, run:

...
library(sparklyr)
library(dplyr)
config <- spark_config()
config$sparklyr.defaultPackages <- "com.microsoft.azure:synapseml_2.12:0.11.1"
sc <- spark_connect(master = "local", config = config)
...

This creates a spark context on your local machine.

We then need to import the R wrappers:

...
 library(synapseml.core)
 library(synapseml.cognitive)
 library(synapseml.deep.learning)
 library(synapseml.lightgbm)
 library(synapseml.opencv)
 library(synapseml.vw)
...

Example

We can use the faithful dataset in R:

...
faithful_df <- copy_to(sc, faithful)
cmd_model = ml_clean_missing_data(
              x=faithful_df,
              inputCols = c("eruptions", "waiting"),
              outputCols = c("eruptions_output", "waiting_output"),
              only.model=TRUE)
sdf_transform(cmd_model, faithful_df)
...

You should see the output:

...
# Source:   table<sparklyr_tmp_17d66a9d490c> [?? x 4]
# Database: spark_connection
   eruptions waiting eruptions_output waiting_output
          <dbl>   <dbl>            <dbl>          <dbl>
          1     3.600      79            3.600             79
          2     1.800      54            1.800             54
          3     3.333      74            3.333             74
          4     2.283      62            2.283             62
          5     4.533      85            4.533             85
          6     2.883      55            2.883             55
          7     4.700      88            4.700             88
          8     3.600      85            3.600             85
          9     1.950      51            1.950             51
          10     4.350      85            4.350             85
          # ... with more rows
...

Azure Databricks

In Azure Databricks, you can install devtools and the spark package from URL and then use spark_connect with method = "databricks":

install.packages("devtools")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-0.11.1.zip")
library(sparklyr)
library(dplyr)
sc <- spark_connect(method = "databricks")
faithful_df <- copy_to(sc, faithful)
unfit_model = ml_light_gbmregressor(sc, maxDepth=20, featuresCol="waiting", labelCol="eruptions", numIterations=10, unfit.model=TRUE)
ml_train_regressor(faithful_df, labelCol="eruptions", unfit_model)

Building from Source

Our R bindings are built as part of the normal build process. To get a quick build, start at the root of the synapseml directory, and find the generated files. For instance, to find the R files for deep-learning, run

sbt packageR
ls ./deep-learning/target/scala-2.12/generated/src/R/synapseml/R

You can then run R in a terminal and install the above files directly:

...
devtools::install_local("./deep-learning/target/scala-2.12/generated/src/R/synapseml/R")
...