---
title: R setup
hide_title: true
sidebar_label: R setup
description: R setup and example for SynapseML
---
Requirements: Ensure that R and devtools are installed on your machine.
Also make sure you have Apache Spark installed. If you are using sparklyr, you can install Spark with `spark_install()`; be sure to specify the correct version, which as of this writing should be `version = "3.2"`. `spark_install()` is a bit eccentric and may install a slightly different patch release, so check that the version you get is the one you want.
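For example, a minimal sketch using sparklyr's standard installer helpers:

```R
library(sparklyr)

# Ask for Spark 3.2; spark_install may resolve this to the
# nearest available 3.2.x patch release.
spark_install(version = "3.2")

# Check which Spark/Hadoop versions actually landed on disk.
spark_installed_versions()
```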
On Windows, download WinUtils.exe and copy it into the `bin` directory of your Spark installation, for example `C:\Users\user\AppData\Local\Spark\spark-3.3.2-bin-hadoop3\bin`.
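If sparklyr doesn't locate your Spark installation automatically, you can point it there through the `SPARK_HOME` environment variable before connecting. The path below is just the example directory from above, so substitute your own:

```R
# Hypothetical path; use your actual Spark installation directory.
Sys.setenv(SPARK_HOME = "C:\\Users\\user\\AppData\\Local\\Spark\\spark-3.3.2-bin-hadoop3")
```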
To install the current SynapseML package for R, first install `synapseml-core`:
```R
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-core-0.11.0.zip")
```
and then install any or all of the following packages, depending on your intended usage: `synapseml-cognitive`, `synapseml-deep-learning`, `synapseml-lightgbm`, `synapseml-opencv`, `synapseml-vw`. In other words:
```R
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-cognitive-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-deep-learning-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-lightgbm-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-opencv-0.11.0.zip")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-vw-0.11.0.zip")
```
Installing all dependencies may be time-consuming. When complete, run:
```R
library(sparklyr)
library(dplyr)
config <- spark_config()
config$sparklyr.defaultPackages <- "com.microsoft.azure:synapseml_2.12:0.11.1"
sc <- spark_connect(master = "local", config = config)
```
This creates a Spark context on your local machine.
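As a quick sanity check, you can confirm that the connection is open and which Spark version it's running, using standard sparklyr helpers:

```R
spark_version(sc)       # should report the Spark version you installed
connection_is_open(sc)  # should return TRUE
```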
We then need to import the R wrappers:
```R
library(synapseml.core)
library(synapseml.cognitive)
library(synapseml.deep.learning)
library(synapseml.lightgbm)
library(synapseml.opencv)
library(synapseml.vw)
```
We can use the `faithful` dataset in R:
```R
faithful_df <- copy_to(sc, faithful)
cmd_model <- ml_clean_missing_data(
  x = faithful_df,
  inputCols = c("eruptions", "waiting"),
  outputCols = c("eruptions_output", "waiting_output"),
  only.model = TRUE)
sdf_transform(cmd_model, faithful_df)
```
You should see the output:
```
# Source: table<sparklyr_tmp_17d66a9d490c> [?? x 4]
# Database: spark_connection
   eruptions waiting eruptions_output waiting_output
       <dbl>   <dbl>            <dbl>          <dbl>
 1     3.600      79            3.600             79
 2     1.800      54            1.800             54
 3     3.333      74            3.333             74
 4     2.283      62            2.283             62
 5     4.533      85            4.533             85
 6     2.883      55            2.883             55
 7     4.700      88            4.700             88
 8     3.600      85            3.600             85
 9     1.950      51            1.950             51
10     4.350      85            4.350             85
# ... with more rows
```
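When you're done with the local session, close the connection:

```R
spark_disconnect(sc)
```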
On Azure Databricks, you can install devtools and the SynapseML package from URL, and then use `spark_connect` with `method = "databricks"`:
```R
install.packages("devtools")
devtools::install_url("https://mmlspark.azureedge.net/rrr/synapseml-0.11.1.zip")

library(sparklyr)
library(dplyr)

sc <- spark_connect(method = "databricks")
faithful_df <- copy_to(sc, faithful)
unfit_model <- ml_light_gbmregressor(sc, maxDepth = 20, featuresCol = "waiting",
                                     labelCol = "eruptions", numIterations = 10,
                                     unfit.model = TRUE)
ml_train_regressor(faithful_df, labelCol = "eruptions", unfit_model)
```
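As a sketch of scoring, assuming the fitted model returned by `ml_train_regressor` supports the same `sdf_transform` interface used with `cmd_model` above (an assumption, not confirmed by this page), you could apply it back to the training data:

```R
# Assumption: the fitted regressor can be applied with sdf_transform,
# like cmd_model in the earlier example.
fit_model <- ml_train_regressor(faithful_df, labelCol = "eruptions", unfit_model)
sdf_transform(fit_model, faithful_df)
```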
Our R bindings are built as part of the normal build process. For a quick build, start at the root of the synapseml repository; the generated R files land under each module's target directory. For instance, to build and list the R files for deep-learning, run:
```bash
sbt packageR
ls ./deep-learning/target/scala-2.12/generated/src/R/synapseml/R
```
You can then run R in a terminal and install the above files directly:
```R
devtools::install_local("./deep-learning/target/scala-2.12/generated/src/R/synapseml/R")
```
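After the local install completes, the wrapper should load just like the released package (assuming the package name matches the earlier import):

```R
library(synapseml.deep.learning)
```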