An introduction to analyzing data using Spark in Azure Databricks

daisukei777/databricks-intro

 
 

Introduction to Databricks

Use the labs in this repo to get started with Spark in Azure Databricks.

Start by following the Setup Guide to prepare your Azure environment and download the lab files used in the lab exercises. Then complete the labs in the following order:

  1. Lab 1 - Getting Started with Spark. In this lab, you'll learn how to provision a Spark cluster in an Azure Databricks workspace and use it to analyze data interactively using Python or Scala.
  2. Lab 2 - Running a Spark Job. In this lab, you'll learn how to configure a Spark job for unattended execution so that you can schedule batch processing workloads.
  3. Lab 3 - Using Structured Streaming. In this lab, you'll learn how to use Spark to process an unbounded stream of real-time data, a common requirement in Internet-of-Things (IoT) scenarios.
  4. Lab 4 - Introduction to Machine Learning. In this lab, you'll get started with machine learning by using Spark to train and evaluate a classification model.
