diff --git a/examples/advanced_examples/osm_tags_filter.ipynb b/examples/advanced_examples/osm_tags_filter.ipynb new file mode 100644 index 0000000..3956951 --- /dev/null +++ b/examples/advanced_examples/osm_tags_filter.ipynb @@ -0,0 +1,344 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# OSM tags filter\n", + "\n", + "**QuackOSM** allows users to filter the data from the `*.osm.pbf` file. Filtering reduces the number of features parsed from the original file.\n", + "\n", + "This notebook explains how to use the OSM tags filtering mechanism." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Filter format\n", + "\n", + "The library expects a filter as a `dict` (or as `JSON` if provided via the CLI).\n", + "\n", + "**QuackOSM** supports two filter formats: `OsmTagsFilter` and `GroupedOsmTagsFilter`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from quackosm._osm_tags_filters import GroupedOsmTagsFilter, OsmTagsFilter" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The first one, `OsmTagsFilter`, is a basic `dict` object that defines how to filter OSM features based on their tags.\n", + "\n", + "It is based on the filter object used in the [OSMnx](https://osmnx.readthedocs.io/en/stable/index.html) library, but offers more functionality." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "OsmTagsFilter" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each key of the `dict` is expected to be an OSM tag key, and each value can be one of: a `bool`, a single OSM tag value, or a list of OSM tag values."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# amenity=bench\n", + "filter_1 = {\"amenity\": \"bench\"}\n", + "\n", + "# amenity=ice_cream or amenity=cafe\n", + "filter_2 = {\"amenity\": [\"ice_cream\", \"cafe\"]}\n", + "\n", + "# all amenities\n", + "filter_3 = {\"amenity\": True}\n", + "\n", + "# amenity=bar or building=office\n", + "filter_4 = {\"amenity\": \"bar\", \"building\": \"office\"}\n", + "\n", + "# all amenities and all highways\n", + "filter_5 = {\"amenity\": True, \"highway\": True}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The second object, `GroupedOsmTagsFilter`, allows assigning filters to groups. It is a `dict` object that maps a group name (the key) to an `OsmTagsFilter` (the value).\n", + "\n", + "This is useful for grouping features into semantic categories for machine learning applications." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "GroupedOsmTagsFilter" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# benches\n", + "grouped_filter_1 = {\"benches\": {\"amenity\": \"bench\"}}\n", + "\n", + "# swimming sport facilities\n", + "grouped_filter_2 = {\"swimming_sport\": {\"leisure\": \"swimming_pool\", \"sport\": \"swimming\"}}\n", + "\n", + "# shops, tourism and traffic related objects\n", + "grouped_filter_3 = {\n", + " \"shopping\": {\"shop\": True, \"landuse\": \"retail\"},\n", + " \"tourism\": {\"tourism\": True, \"historic\": True},\n", + " \"traffic\": {\"amenity\": \"parking\", \"highway\": True},\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic usage\n", + "\n", + "The examples below show how to use the basic OSM tags filters."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import urllib.request\n", + "\n", + "from quackosm import get_features_gdf" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "monaco_pbf_url = \"https://download.geofabrik.de/europe/monaco-latest.osm.pbf\"\n", + "monaco_pbf_file = \"monaco.osm.pbf\"\n", + "urllib.request.urlretrieve(monaco_pbf_url, monaco_pbf_file)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Benches only" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tags_filter = {\"amenity\": \"bench\"}\n", + "get_features_gdf(monaco_pbf_file, tags_filter=tags_filter, silent_mode=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Cafes, bars and restaurants" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tags_filter = {\"amenity\": [\"cafe\", \"restaurant\", \"bar\"]}\n", + "get_features_gdf(monaco_pbf_file, tags_filter=tags_filter, silent_mode=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### All amenities and leisures" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "tags_filter = {\"amenity\": True, \"leisure\": True}\n", + "get_features_gdf(monaco_pbf_file, tags_filter=tags_filter, silent_mode=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Shopping and tourism related objects (grouped filters)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "grouped_tags_filter = {\n", + " \"shopping\": {\"shop\": True, \"landuse\": \"retail\"},\n", + " \"tourism\": {\"tourism\": True, \"historic\": True},\n", + "}\n", + 
"get_features_gdf(monaco_pbf_file, tags_filter=grouped_tags_filter, silent_mode=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Compact and exploded tags" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Paramaters logic table.\n", + "\n", + "The table shows how the columns for the result are generated based on value of the `explode_tags`, `keep_all_tags` parameters with and without OSM tags filter being present.\n", + "\n", + "Legend:\n", + "- ✔️ - `True`\n", + "- ❌ - `False`\n", + "- 📦 - Compact tags (single `all_tags` column)\n", + "- 💥 - Exploded tags (separate columns per each tag key, or group name)\n", + "\n", + "\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "\n", + "
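| `explode_tags` | `None` | `None` | ✔️ | ✔️ | ❌ | ❌ |\n",
+ "| --- | --- | --- | --- | --- | --- | --- |\n",
+ "| `keep_all_tags` | ❌ | ✔️ | ❌ | ✔️ | ❌ | ✔️ |\n",
+ "| Without OSM tags filter | 📦 | 📦 | 💥 | 💥 | 📦 | 📦 |\n",
+ "| With OSM tags filter | 💥 | 📦 | 💥 | 💥 | 📦 | 📦 |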
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Positive and negative filters" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Wildcard filters" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Invalid filters" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}