Update links for Aug 2024 delivery
timothywarner committed Aug 14, 2024
1 parent 4bbc3cf commit 4c2ccba
Showing 6 changed files with 92 additions and 0 deletions.
1 change: 1 addition & 0 deletions NetworkAdaptersReport.txt
@@ -0,0 +1 @@

8 changes: 8 additions & 0 deletions README.md
@@ -10,6 +10,14 @@ A hand-curated learning resources list from me to you! Respectfully, Tim Warner
- [Bluesky](https://bsky.app/profile/techtrainertim.bsky.social)
- [Mastodon](https://mastodon.social/@techtrainertim)

## Tim's Essential Cert Prep Resources

- [Azure Free Account](https://azure.microsoft.com/en-us/free/)
- [Create an Azure Databricks workspace](https://learn.microsoft.com/en-us/azure/databricks/getting-started/)
- [NYC Taxi & Limousine Commission - yellow taxi trip records](https://learn.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow?tabs=pyspark)
- [Practice Exams: Databricks Certified Data Engineer Associate](https://www.udemy.com/course/practice-exams-databricks-certified-data-engineer-associate/?couponCode=KEEPLEARNING)
- [Git client](https://git-scm.com/)

## Databricks Certified Data Engineer Associate certification

- [Official certification page](https://www.databricks.com/learn/certification/data-engineer-associate)
72 changes: 72 additions & 0 deletions sample-etl-notebook.dbc
@@ -0,0 +1,72 @@
# Databricks notebook source
# MAGIC %md
# MAGIC ## Sample ETL Process with PySpark and Spark SQL

# COMMAND ----------

# Step 1: Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when

# COMMAND ----------

# Step 2: Get a SparkSession (Databricks already provides one as `spark`; getOrCreate() returns that existing session)
spark = SparkSession.builder.appName("ETL Example").getOrCreate()

# COMMAND ----------

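# Step 3: Load Data
# Read a sample CSV file into a DataFrame: header=True takes column names from the first row,
# and inferSchema=True scans the data to pick column types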
df = spark.read.csv("/databricks-datasets/airlines/part-00000", header=True, inferSchema=True)

# COMMAND ----------

# Step 4: Data Exploration
# Display the first 5 rows of the DataFrame
df.show(5)

# COMMAND ----------

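# Step 5: Data Transformation with PySpark
# Clean up ArrDelay: convert it to an integer (try_cast turns non-numeric placeholders such as "NA" into null),
# then replace null values with 0
from pyspark.sql.functions import expr

arr_delay_int = expr("try_cast(ArrDelay AS INT)")
df_cleaned = df.withColumn("ArrDelay", when(arr_delay_int.isNull(), 0).otherwise(arr_delay_int))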

# COMMAND ----------

# Step 6: Data Transformation with Spark SQL
# Register the DataFrame as a SQL temporary view
df_cleaned.createOrReplaceTempView("flights")

# COMMAND ----------

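# Use Spark SQL to keep only flights that arrived more than 15 minutes late.
# The airlines data names the carrier column UniqueCarrier; it is aliased to Carrier here.
transformed_df = spark.sql("""
    SELECT
        Year,
        Month,
        DayOfMonth,
        UniqueCarrier AS Carrier,
        Origin,
        Dest,
        ArrDelay
    FROM flights
    WHERE ArrDelay > 15
""")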

# COMMAND ----------

# Step 7: Show the results of the transformation
transformed_df.show(5)

# COMMAND ----------

# Step 8: Save the transformed data
# Write it as Parquet to /tmp/transformed_flights, replacing any earlier output at that path
transformed_df.write.mode("overwrite").parquet("/tmp/transformed_flights")

# COMMAND ----------

# MAGIC %md
# MAGIC ## Conclusion
# MAGIC This notebook demonstrated how to load, transform, and save data using both PySpark and Spark SQL in Databricks. You can adapt this example to fit your specific ETL requirements.
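Step 3 relies on `inferSchema=True`, which picks each column's type by scanning the values, so a single placeholder string can turn a numeric column into a string column. The stdlib-only sketch below is a simplified analogue of that behavior, not Spark's inference code: `infer_type` and `infer_schema` are hypothetical helpers, the sample rows are invented, and treating `NA` as the missing-value marker is an assumption to confirm against the real file with `df.printSchema()`.

```python
import csv
import io


def infer_type(values):
    """Return "int", "double", or "string" for one column of raw CSV text.

    Empty strings count as nulls; any value that parses as neither an int nor a
    float (such as "NA") widens the whole column to "string".
    """
    inferred = "int"
    for raw in values:
        if raw == "":
            continue  # null: does not change the inferred type
        try:
            int(raw)
            continue
        except ValueError:
            pass
        try:
            float(raw)
            inferred = "double"
        except ValueError:
            return "string"
    return inferred


def infer_schema(csv_text):
    """Use the first row as the header and infer one type per column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = zip(*reader)  # transpose the remaining data rows into columns
    return dict(zip(header, (infer_type(column) for column in columns)))


# Invented sample: one "NA" is enough to make ArrDelay a string column
sample = "Year,ArrDelay,Origin\n2008,12,SFO\n2008,NA,LAX\n2008,-3,JFK\n"
print(infer_schema(sample))
```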
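Step 5's `when(...).otherwise(...)` expression is a per-row conditional. The sketch below models it in plain Python, with a lenient integer conversion first (the hypothetical helper `to_int_or_none`) to show why a string placeholder such as `"NA"` would slip past an `isNull()` check unless the column is converted to a number first. The rows are made up for illustration.

```python
from typing import Optional


def to_int_or_none(raw) -> Optional[int]:
    """Lenient integer conversion: non-numeric values (e.g. "NA") become None."""
    if raw is None:
        return None
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None


def clean_arr_delay(raw) -> int:
    """Per-row version of when(ArrDelay IS NULL, 0).otherwise(ArrDelay), applied after the conversion."""
    value = to_int_or_none(raw)
    return 0 if value is None else value


# Invented rows mixing numbers, a real null, and string values
rows = [{"ArrDelay": 12}, {"ArrDelay": None}, {"ArrDelay": "NA"}, {"ArrDelay": "-3"}]
cleaned = [{**row, "ArrDelay": clean_arr_delay(row["ArrDelay"])} for row in rows]
print([row["ArrDelay"] for row in cleaned])
```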
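Steps 6 and 7 register `df_cleaned` as the `flights` view and query it with SQL. As a rough local analogue, the sketch below uses an in-memory SQLite table in place of the temporary view; the rows are invented, the table holds only the selected columns, and SQLite's dialect differs from Spark SQL in many details. It mainly shows that `ArrDelay > 15` is a strict comparison, so a flight exactly 15 minutes late is excluded.

```python
import sqlite3

# Invented sample rows: (Year, Month, DayOfMonth, Carrier, Origin, Dest, ArrDelay)
rows = [
    (2008, 1, 3, "WN", "IAD", "TPA", 2),
    (2008, 1, 3, "WN", "IND", "BWI", 34),
    (2008, 1, 3, "WN", "IND", "JAX", 15),
    (2008, 1, 4, "AA", "SFO", "ORD", 16),
]

conn = sqlite3.connect(":memory:")  # in-memory table plays the role of the temp view
conn.execute(
    "CREATE TABLE flights (Year INTEGER, Month INTEGER, DayOfMonth INTEGER, "
    "Carrier TEXT, Origin TEXT, Dest TEXT, ArrDelay INTEGER)"
)
conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Same projection and filter as the notebook's query
late_flights = conn.execute(
    "SELECT Year, Month, DayOfMonth, Carrier, Origin, Dest, ArrDelay "
    "FROM flights WHERE ArrDelay > 15"
).fetchall()
conn.close()

for flight in late_flights:
    print(flight)
```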
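Step 8 uses `mode("overwrite")`. Spark's `DataFrameWriter` also accepts `"append"`, `"ignore"`, and `"error"`/`"errorifexists"` (the default, which fails if the path already exists). The sketch below imitates those four behaviors on a local directory of text files; the `save` function and its `part-NNNNN.txt` naming are hypothetical, and real Spark output (Parquet part files, commit metadata, partition handling) is more involved.

```python
import shutil
import tempfile
from pathlib import Path

VALID_MODES = {"error", "errorifexists", "ignore", "overwrite", "append"}


def save(lines, path, mode="errorifexists"):
    """Write lines as a new part file under path, following Spark-style save modes."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown save mode: {mode}")
    target = Path(path)
    if target.exists():
        if mode in ("error", "errorifexists"):
            raise FileExistsError(f"{target} already exists")
        if mode == "ignore":
            return  # existing output is left untouched and nothing is written
        if mode == "overwrite":
            shutil.rmtree(target)  # discard all previous part files
    # "append" (or a fresh path) falls through and adds one more part file
    target.mkdir(parents=True, exist_ok=True)
    part_number = len(list(target.iterdir()))
    (target / f"part-{part_number:05d}.txt").write_text("\n".join(lines))


# Usage: a throwaway directory stands in for /tmp/transformed_flights
out_dir = Path(tempfile.mkdtemp()) / "transformed_flights"
save(["row 1"], out_dir)                    # first write succeeds
save(["row 2"], out_dir, mode="append")     # adds part-00001.txt
save(["row 3"], out_dir, mode="overwrite")  # removes both files, writes part-00000.txt
save(["row 4"], out_dir, mode="ignore")     # path exists, so nothing happens
```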

11 changes: 11 additions & 0 deletions utility.ps1
@@ -0,0 +1,11 @@
# Get physical network adapters only (-Physical excludes virtual adapters)
$adapters = Get-NetAdapter -Physical

# Keep only adapters whose status is 'Up' (drops disabled and disconnected ones)
$activeAdapters = $adapters | Where-Object { $_.Status -eq 'Up' }

# Select relevant properties and format the output
$report = $activeAdapters | Select-Object Name, InterfaceDescription, Status, MacAddress, LinkSpeed | Format-Table -AutoSize | Out-String

# Write the output to a file
$report | Out-File -FilePath "NetworkAdaptersReport.txt"
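`utility.ps1` is a filter, project, format pipeline: keep physical adapters that are `Up`, select five properties, and render an auto-sized table. The sketch below mirrors those stages in Python over hypothetical adapter records; the `physical` flag stands in for whichever physical-adapter test the script uses, and the table layout only approximates `Format-Table -AutoSize`.

```python
from dataclasses import dataclass


@dataclass
class Adapter:
    Name: str
    InterfaceDescription: str
    Status: str
    MacAddress: str
    LinkSpeed: str
    physical: bool  # hypothetical stand-in for the script's physical-adapter test


COLUMNS = ["Name", "InterfaceDescription", "Status", "MacAddress", "LinkSpeed"]


def build_report(adapters):
    """Keep physical adapters that are Up, select five columns, render a sized table."""
    active = [a for a in adapters if a.physical and a.Status == "Up"]  # Where-Object
    rows = [[getattr(a, c) for c in COLUMNS] for a in active]  # Select-Object
    # Like Format-Table -AutoSize: each column is as wide as its longest header or value
    widths = [max([len(c)] + [len(r[i]) for r in rows]) for i, c in enumerate(COLUMNS)]

    def fmt(cells):
        return " ".join(cell.ljust(w) for cell, w in zip(cells, widths)).rstrip()

    lines = [fmt(COLUMNS), fmt(["-" * len(c) for c in COLUMNS])]
    lines += [fmt(r) for r in rows]
    return "\n".join(lines)


# Hypothetical adapters: only the first is both physical and Up
adapters = [
    Adapter("Ethernet", "Intel(R) Ethernet Connection", "Up", "00-11-22-33-44-55", "1 Gbps", True),
    Adapter("Wi-Fi", "Intel(R) Wi-Fi 6 AX201", "Disconnected", "66-77-88-99-AA-BB", "0 bps", True),
    Adapter("vEthernet (Default Switch)", "Hyper-V Virtual Ethernet Adapter", "Up",
            "00-15-5D-00-00-01", "10 Gbps", False),
]
report = build_report(adapters)
print(report)
```

The resulting string could then be written with `Path("NetworkAdaptersReport.txt").write_text(report)`, the counterpart of `Out-File`.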
Binary file modified warner-databricks-slides.pptx
Binary file not shown.
Binary file added yellow_tripdata_2023-05.parquet
Binary file not shown.
