This is an example of how to ingest and process OpenStreetMap (OSM) data using Databricks Delta Live Tables and Mosaic.
The aim is to ingest and process OSM data into the medallion layers of our Delta Lake, following best practices. We then use these tables to identify the density of residential buildings, hospitals and train stations across Italy, as an example of the new insights this capability unlocks.
The pipeline will:
- Download the OSM dataset for Italy
- Extract the buildings from it (with their metadata)
- Index the buildings on an H3 grid
- Separate the buildings by type
- Display the building density by counting the records per grid index
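The extract, index, separate and count steps above can be sketched with the Python standard library alone. This is not the pipeline's code. It assumes a tiny, made-up OSM XML snippet, and the function names (`extract_buildings`, `grid_cell`, `density_by_type`) and the `CELL_DEG` cell size are hypothetical. Because H3 is not available without an extra library, a plain square latitude/longitude grid stands in for the H3 grid, and the mean of a building's vertices stands in for a proper centroid.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Made-up OSM XML extract for illustration: four ways, three of them tagged as buildings.
OSM_XML = """<osm version="0.6">
  <node id="1" lat="45.4640" lon="9.1900"/>
  <node id="2" lat="45.4640" lon="9.1910"/>
  <node id="3" lat="45.4650" lon="9.1910"/>
  <node id="7" lat="45.4650" lon="9.1900"/>
  <node id="4" lat="41.9010" lon="12.5000"/>
  <node id="5" lat="41.9010" lon="12.5010"/>
  <node id="6" lat="41.9020" lon="12.5010"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><nd ref="3"/><nd ref="1"/>
    <tag k="building" v="residential"/></way>
  <way id="13"><nd ref="2"/><nd ref="3"/><nd ref="7"/><nd ref="2"/>
    <tag k="building" v="residential"/></way>
  <way id="11"><nd ref="4"/><nd ref="5"/><nd ref="6"/><nd ref="4"/>
    <tag k="building" v="hospital"/></way>
  <way id="12"><nd ref="1"/><nd ref="2"/>
    <tag k="highway" v="residential"/></way>
</osm>"""

CELL_DEG = 0.1  # hypothetical cell size in degrees; a stand-in for an H3 resolution


def extract_buildings(xml_text):
    """Keep only ways tagged `building`, with their vertex coordinates and tags."""
    root = ET.fromstring(xml_text)
    nodes = {n.get("id"): (float(n.get("lat")), float(n.get("lon")))
             for n in root.iter("node")}
    buildings = []
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.iter("tag")}
        if "building" not in tags:
            continue  # skip roads and other non-building ways
        coords = [nodes[nd.get("ref")] for nd in way.iter("nd")]
        buildings.append({"id": way.get("id"), "type": tags["building"],
                          "coords": coords, "tags": tags})
    return buildings


def grid_cell(coords):
    """Map a building to a square lat/lon cell using the mean of its vertices."""
    ring = coords[:-1] if coords[0] == coords[-1] else coords  # drop repeated closing vertex
    lat = sum(c[0] for c in ring) / len(ring)
    lon = sum(c[1] for c in ring) / len(ring)
    return (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))


def density_by_type(buildings):
    """Separate buildings by type, then count them per grid cell."""
    density = defaultdict(Counter)
    for b in buildings:
        density[b["type"]][grid_cell(b["coords"])] += 1
    return density


buildings = extract_buildings(OSM_XML)
density = density_by_type(buildings)
for building_type, cells in density.items():
    print(building_type, dict(cells))
```

In the real pipeline the same grouping would run over Delta tables, presumably with Mosaic providing the H3 indexing. The square grid is used here only so the sketch runs without extra dependencies.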
The pipeline is divided into three notebooks:
- 0_Download: downloads the OSM dataset and ingests it into three Delta tables
- 1_Process: creates the Delta Live Tables transformations that process the OSM data, extract the building shapes and metadata, and index and categorise the buildings
- 2_Explore: displays the density of residential and train station buildings
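The ingest step of 0_Download can be pictured as splitting the raw OSM data into one row set per table. This sketch assumes, without confirmation from the notebook, that the three Delta tables correspond to OSM's three element types (nodes, ways and relations), that the input is OSM XML, and that `split_osm` is a hypothetical helper name. It uses only the standard library and a made-up snippet.

```python
import xml.etree.ElementTree as ET

# Made-up OSM XML snippet containing one of each element type (plus a second node).
OSM_XML = """<osm version="0.6">
  <node id="1" lat="45.4640" lon="9.1900"/>
  <node id="2" lat="45.4640" lon="9.1910">
    <tag k="railway" v="station"/>
  </node>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="building" v="residential"/></way>
  <relation id="100">
    <member type="way" ref="10" role="outer"/>
    <tag k="type" v="multipolygon"/>
  </relation>
</osm>"""


def split_osm(xml_text):
    """Split OSM elements into three row lists: nodes, ways and relations."""
    root = ET.fromstring(xml_text)

    def tags(el):
        # Collect the key/value tags attached directly to an element.
        return {t.get("k"): t.get("v") for t in el.findall("tag")}

    nodes = [{"id": int(n.get("id")), "lat": float(n.get("lat")),
              "lon": float(n.get("lon")), "tags": tags(n)}
             for n in root.findall("node")]
    ways = [{"id": int(w.get("id")),
             "nodes": [int(nd.get("ref")) for nd in w.findall("nd")],
             "tags": tags(w)}
            for w in root.findall("way")]
    relations = [{"id": int(r.get("id")),
                  "members": [(m.get("type"), int(m.get("ref")), m.get("role"))
                              for m in r.findall("member")],
                  "tags": tags(r)}
                 for r in root.findall("relation")]
    return nodes, ways, relations


nodes, ways, relations = split_osm(OSM_XML)
print(len(nodes), len(ways), len(relations))
```

In a Databricks notebook each list could then be turned into a Spark DataFrame and saved as a Delta table. Keeping ways as lists of node references (rather than coordinates) mirrors the OSM data model and leaves geometry reconstruction to the processing step.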
You can visualise the final result in the 2_Explore notebook.