Monitoring Azure Databricks in an Azure Log Analytics Workspace

This repository extends the core monitoring functionality of Azure Databricks to send streaming query event information to Azure Log Analytics. It has the following directory structure:

```
/src
  /spark-listeners-loganalytics
  /spark-listeners
  /pom.xml
```

The spark-jobs directory contains a sample Spark application that demonstrates how to implement a Spark application metric counter.

The spark-listeners-loganalytics and spark-listeners directories contain the code for building the two JAR files that are deployed to the Databricks cluster. The spark-listeners directory includes a scripts directory that contains a cluster node initialization script to copy the JAR files from a staging directory in the Azure Databricks file system to execution nodes.

The pom.xml file is the main Maven project object model build file for the entire project.

Prerequisites

Before you begin, ensure you have the following prerequisites in place:

- Clone or download this GitHub repository.
- An active Azure Databricks workspace. For instructions on how to deploy an Azure Databricks workspace, see get started with Azure Databricks.
- Install the Azure Databricks CLI.
  - An Azure Databricks personal access token is required to use the CLI. For instructions, see token management.
  - You can also use the Azure Databricks CLI from the Azure Cloud Shell.
- A Java IDE, with the following resources:

Build the Azure Databricks monitoring library

You can build the library using either Docker or Maven.

Option 1: Docker

Linux:

```
chmod +x spark-monitoring/build.sh
docker run -it --rm -v `pwd`/spark-monitoring:/spark-monitoring -v "$HOME/.m2":/root/.m2 maven:3.6.1-jdk-8 /spark-monitoring/build.sh
```

Windows:

```
docker run -it --rm -v %cd%/spark-monitoring:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 maven:3.6.1-jdk-8 /spark-monitoring/build.sh
```

Option 2: Maven

Import the Maven project object model file, pom.xml, located in the /src folder into your project. This will import two projects:
- spark-listeners
- spark-listeners-loganalytics
Activate the single Maven profile that corresponds to the Scala/Spark version combination you are using. By default, the Scala 2.11 and Spark 2.4.3 profile is active.
Execute the Maven package phase in your Java IDE to build the JAR files for each of these projects:

| Project | JAR file |
| --- | --- |
| spark-listeners | spark-listeners__-.jar |
| spark-listeners-loganalytics | spark-listeners-loganalytics__-.jar |

Configure the Databricks workspace

Copy the JAR files and init scripts to Databricks.

Use the Azure Databricks CLI to create a directory named dbfs:/databricks/spark-monitoring:
```
dbfs mkdirs dbfs:/databricks/spark-monitoring
```
Open the /src/spark-listeners/scripts/spark-monitoring.sh script file and add your Log Analytics Workspace ID and Key to the lines below:
```
export LOG_ANALYTICS_WORKSPACE_ID=
export LOG_ANALYTICS_WORKSPACE_KEY=
```
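If you would rather not edit the script by hand, the two lines above could be filled in from the shell. The following is a minimal sketch and not part of this repository: the function name `set_workspace_credentials` is hypothetical, and it assumes the two `export` lines appear in the script exactly as shown above, at the start of a line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of this repository).
# Usage: set_workspace_credentials <script path> <workspace id> <workspace key>
set_workspace_credentials() {
  local script="$1" id="$2" key="$3" tmp
  tmp="$(mktemp)"
  # Values are passed through the environment rather than spliced into a sed
  # expression, because workspace keys may contain "/", "+" and "=".
  WS_ID="$id" WS_KEY="$key" awk '
    /^export LOG_ANALYTICS_WORKSPACE_ID=/  { print "export LOG_ANALYTICS_WORKSPACE_ID=" ENVIRON["WS_ID"]; next }
    /^export LOG_ANALYTICS_WORKSPACE_KEY=/ { print "export LOG_ANALYTICS_WORKSPACE_KEY=" ENVIRON["WS_KEY"]; next }
    { print }
  ' "$script" > "$tmp"
  # Write back in place so the original file mode is preserved.
  cat "$tmp" > "$script"
  rm -f "$tmp"
}
```

It would be called as `set_workspace_credentials src/spark-listeners/scripts/spark-monitoring.sh "<workspace id>" "<workspace key>"`. All other lines of the script are copied through unchanged.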
Use the Azure Databricks CLI to copy /src/spark-listeners/scripts/spark-monitoring.sh to the dbfs:/databricks/spark-monitoring directory created earlier:
```
dbfs cp <local path to spark-monitoring.sh> dbfs:/databricks/spark-monitoring/spark-monitoring.sh
```
Use the Azure Databricks CLI to copy all of the JAR files from the spark-monitoring/src/target folder to the same dbfs:/databricks/spark-monitoring directory:
```
dbfs cp --overwrite --recursive <local path to target folder> dbfs:/databricks/spark-monitoring/
```
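The copy steps above can be collected into one script. This is a sketch under stated assumptions rather than a script shipped with this repository: the function names and the `DRY_RUN` variable are hypothetical, and the final `dbfs ls` is an added check that lists the staged files. Setting `DRY_RUN=1` prints the commands instead of running them, which is useful for reviewing paths before touching the workspace.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Staging directory used throughout this README.
DBFS_DIR="dbfs:/databricks/spark-monitoring"

# Run a command, or only print it when DRY_RUN=1 (hypothetical convenience).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: stage_monitoring_files <local path to spark-monitoring.sh> <local path to target folder>
stage_monitoring_files() {
  local script_path="$1" target_dir="$2"
  run dbfs mkdirs "$DBFS_DIR"
  run dbfs cp "$script_path" "$DBFS_DIR/spark-monitoring.sh"
  run dbfs cp --overwrite --recursive "$target_dir" "$DBFS_DIR/"
  # Added check: list the directory so you can confirm the script and JARs arrived.
  run dbfs ls "$DBFS_DIR"
}
```

For example, `DRY_RUN=1 stage_monitoring_files src/spark-listeners/scripts/spark-monitoring.sh src/target` prints the four `dbfs` commands without executing them.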

Create and configure the Azure Databricks cluster

Navigate to your Azure Databricks workspace in the Azure Portal.
On the home page, click "new cluster".
Choose a name for your cluster and enter it in "cluster name" text box.
In the "Databricks Runtime Version" dropdown, select 5.0 or later (includes Apache Spark 2.4.0, Scala 2.11).
Under "Advanced Options", click on the "Init Scripts" tab. Go to the last line under the "Init Scripts" section. Under the "destination" dropdown, select "DBFS". Enter "dbfs:/databricks/spark-monitoring/spark-monitoring.sh" in the text box. Click the "add" button.
Click the "create cluster" button to create the cluster. Next, click on the "start" button to start the cluster.
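
As an alternative to the portal steps above, the init script could be registered through a cluster specification file. The sketch below is an illustration and not part of this repository: the function name `write_cluster_spec` is hypothetical, the JSON follows the shape used by the Databricks Clusters API 2.0, and the runtime version, node type, and worker count are placeholders that you must replace with values valid for your workspace.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a cluster spec that registers the init script.
# Usage: write_cluster_spec <output file> <cluster name>
write_cluster_spec() {
  local out="$1" name="$2"
  # <runtime-version> and <node-type> are placeholders, not real values;
  # num_workers is an arbitrary example value.
  cat > "$out" <<EOF
{
  "cluster_name": "${name}",
  "spark_version": "<runtime-version>",
  "node_type_id": "<node-type>",
  "num_workers": 1,
  "init_scripts": [
    { "dbfs": { "destination": "dbfs:/databricks/spark-monitoring/spark-monitoring.sh" } }
  ]
}
EOF
}
```

After replacing the placeholders, a file written by `write_cluster_spec cluster.json my-cluster` could be passed to the legacy Databricks CLI with `databricks clusters create --json-file cluster.json`.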

Run the sample job (optional)

The monitoring library includes a sample application that shows how to send application metrics and application logs to Azure Monitor.

Use Maven to build the POM located at sample/spark-sample-job/pom.xml or run the following Docker command:

Linux:

```
docker run -it --rm -v `pwd`/spark-monitoring/sample/spark-sample-job:/spark-monitoring -v "$HOME/.m2":/root/.m2 -w /spark-monitoring maven:3.6.1-jdk-8 mvn clean package
```

Windows:

```
docker run -it --rm -v %cd%/spark-monitoring/sample/spark-sample-job:/spark-monitoring -v "%USERPROFILE%/.m2":/root/.m2 -w /spark-monitoring maven:3.6.1-jdk-8 mvn clean package
```

Navigate to your Databricks workspace and create a new job, as described here.
In the job detail page, select Set JAR.
Upload the JAR file from /src/spark-jobs/target/spark-jobs-1.0-SNAPSHOT.jar.
For Main class, enter com.microsoft.pnp.samplejob.StreamingQueryListenerSampleJob.

When the job runs, you can view the application logs and metrics in your Log Analytics workspace. After you verify the metrics appear, stop the sample application job.

More information

For more information about using this library to monitor Azure Databricks, see Monitoring Azure Databricks.
