chore: prepare draft for OSM tags filter example notebook

kraina-ai · Apr 14, 2024 · c7904a0
1 parent 5f0f623
Showing 1 changed file with 344 additions and 0 deletions.
diff --git a/examples/advanced_examples/osm_tags_filter.ipynb b/examples/advanced_examples/osm_tags_filter.ipynb
@@ -0,0 +1,344 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# OSM tags filter\n",
+    "\n",
+    "**QuackOSM** allows users to filter the data read from a `*.osm.pbf` file. Filtering reduces the number of features parsed from the original file.\n",
+    "\n",
+    "This notebook will explain how to use the OSM tags filtering mechanism."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Filter format\n",
+    "\n",
+    "The library expects a filter as a `dict` (or as a `JSON` string when provided via the CLI).\n",
+    "\n",
+    "**QuackOSM** supports two filter formats: `OsmTagsFilter` and `GroupedOsmTagsFilter`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from quackosm._osm_tags_filters import GroupedOsmTagsFilter, OsmTagsFilter"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The first one, `OsmTagsFilter`, is a basic `dict` object that defines how to filter OSM features based on their tags.\n",
+    "\n",
+    "It is based on the filter object used in the [OSMnx](https://osmnx.readthedocs.io/en/stable/index.html) library, but it offers more functionality."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "OsmTagsFilter"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Each key of the `dict` is expected to be an OSM tag key, and each value can be one of: a `bool`, a single OSM tag value, or a list of OSM tag values."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# amenity=bench\n",
+    "filter_1 = {\"amenity\": \"bench\"}\n",
+    "\n",
+    "# amenity=ice_cream or amenity=cafe\n",
+    "filter_2 = {\"amenity\": [\"ice_cream\", \"cafe\"]}\n",
+    "\n",
+    "# all amenities\n",
+    "filter_3 = {\"amenity\": True}\n",
+    "\n",
+    "# amenity=bar or building=office\n",
+    "filter_4 = {\"amenity\": \"bar\", \"building\": \"office\"}\n",
+    "\n",
+    "# all amenities and all highways\n",
+    "filter_5 = {\"amenity\": True, \"highway\": True}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The second format, `GroupedOsmTagsFilter`, allows assigning filters to named groups. It is a `dict` object with group names as keys and `OsmTagsFilter` objects as values.\n",
+    "\n",
+    "This is useful for grouping features into semantic categories, for example in machine learning applications."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "GroupedOsmTagsFilter"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# benches\n",
+    "grouped_filter_1 = {\"benches\": {\"amenity\": \"bench\"}}\n",
+    "\n",
+    "# swimming sport facilities\n",
+    "grouped_filter_2 = {\"swimming_sport\": {\"leisure\": \"swimming_pool\", \"sport\": \"swimming\"}}\n",
+    "\n",
+    "# shopping, tourism and traffic-related objects\n",
+    "grouped_filter_3 = {\n",
+    "    \"shopping\": {\"shop\": True, \"landuse\": \"retail\"},\n",
+    "    \"tourism\": {\"tourism\": True, \"historic\": True},\n",
+    "    \"traffic\": {\"amenity\": \"parking\", \"highway\": True},\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Basic usage\n",
+    "\n",
+    "The examples below show how to use OSM tags filters on a small OpenStreetMap extract of Monaco."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import urllib.request\n",
+    "\n",
+    "from quackosm import get_features_gdf"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "monaco_pbf_url = \"https://download.geofabrik.de/europe/monaco-latest.osm.pbf\"\n",
+    "monaco_pbf_file = \"monaco.osm.pbf\"\n",
+    "urllib.request.urlretrieve(monaco_pbf_url, monaco_pbf_file)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Benches only"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tags_filter = {\"amenity\": \"bench\"}\n",
+    "get_features_gdf(monaco_pbf_file, tags_filter=tags_filter, silent_mode=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Cafes, bars and restaurants"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tags_filter = {\"amenity\": [\"cafe\", \"restaurant\", \"bar\"]}\n",
+    "get_features_gdf(monaco_pbf_file, tags_filter=tags_filter, silent_mode=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### All amenity and leisure features"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "tags_filter = {\"amenity\": True, \"leisure\": True}\n",
+    "get_features_gdf(monaco_pbf_file, tags_filter=tags_filter, silent_mode=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Shopping- and tourism-related objects (grouped filter)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "grouped_tags_filter = {\n",
+    "    \"shopping\": {\"shop\": True, \"landuse\": \"retail\"},\n",
+    "    \"tourism\": {\"tourism\": True, \"historic\": True},\n",
+    "}\n",
+    "get_features_gdf(monaco_pbf_file, tags_filter=grouped_tags_filter, silent_mode=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Compact and exploded tags"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Parameters logic table.\n",
+    "\n",
+    "The table shows how the result columns are generated depending on the values of the `explode_tags` and `keep_all_tags` parameters, with and without an OSM tags filter.\n",
+    "\n",
+    "Legend:\n",
+    "- ✔️ - `True`\n",
+    "- ❌ - `False`\n",
+    "- 📦 - Compact tags (single `all_tags` column)\n",
+    "- 💥 - Exploded tags (a separate column for each tag key or group name)\n",
+    "\n",
+    "<style type=\"text/css\">\n",
+    ".tg  {border-collapse:collapse;border-spacing:0;}\n",
+    ".tg td{border-color:inherit;border-style:solid;border-width:1px;font-size:1em;\n",
+    "  overflow:hidden;padding:10px 5px;word-break:normal;}\n",
+    ".tg th{border-color:inherit;border-style:solid;border-width:1px;font-size:1em;\n",
+    "  font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}\n",
+    ".tg .tg-1wig{font-weight:bold;text-align:left;vertical-align:top}\n",
+    ".tg .tg-baqh{text-align:center;vertical-align:top}\n",
+    ".tg .tg-lqy6{text-align:right;vertical-align:top}\n",
+    ".tg .tg-8d8j{text-align:center;vertical-align:bottom}\n",
+    "</style>\n",
+    "<table class=\"tg\">\n",
+    "<thead>\n",
+    "  <tr>\n",
+    "    <th class=\"tg-lqy6\"><code>explode_tags</code></th>\n",
+    "    <th class=\"tg-baqh\" colspan=\"2\"><code>None</code></th>\n",
+    "    <th class=\"tg-baqh\" colspan=\"2\">✔️</th>\n",
+    "    <th class=\"tg-baqh\" colspan=\"2\">❌</th>\n",
+    "  </tr>\n",
+    "</thead>\n",
+    "<tbody>\n",
+    "  <tr>\n",
+    "    <td class=\"tg-lqy6\"><code>keep_all_tags</code></td>\n",
+    "    <td class=\"tg-baqh\">✔️</td>\n",
+    "    <td class=\"tg-baqh\">❌</td>\n",
+    "    <td class=\"tg-baqh\">✔️</td>\n",
+    "    <td class=\"tg-baqh\">❌</td>\n",
+    "    <td class=\"tg-baqh\">✔️</td>\n",
+    "    <td class=\"tg-baqh\">❌</td>\n",
+    "  </tr>\n",
+    "  <tr>\n",
+    "    <td class=\"tg-1wig\">Without OSM tags filter</td>\n",
+    "    <td class=\"tg-8d8j\">📦</td>\n",
+    "    <td class=\"tg-8d8j\">📦</td>\n",
+    "    <td class=\"tg-8d8j\">💥</td>\n",
+    "    <td class=\"tg-8d8j\">💥</td>\n",
+    "    <td class=\"tg-8d8j\">📦</td>\n",
+    "    <td class=\"tg-8d8j\">📦</td>\n",
+    "  </tr>\n",
+    "  <tr>\n",
+    "    <td class=\"tg-1wig\">With OSM tags filter</td>\n",
+    "    <td class=\"tg-8d8j\">📦</td>\n",
+    "    <td class=\"tg-8d8j\">💥</td>\n",
+    "    <td class=\"tg-8d8j\">💥</td>\n",
+    "    <td class=\"tg-8d8j\">💥</td>\n",
+    "    <td class=\"tg-8d8j\">📦</td>\n",
+    "    <td class=\"tg-8d8j\">📦</td>\n",
+    "  </tr>\n",
+    "</tbody>\n",
+    "</table>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Positive and negative filters"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Wildcard filters"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Invalid filters"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
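To illustrate the filter semantics described in the notebook, the following plain-Python sketch checks a single feature's tags against an `OsmTagsFilter`-shaped `dict` and a `GroupedOsmTagsFilter`-shaped `dict`. It is not QuackOSM's implementation. The helpers `matches_tags_filter` and `matching_groups` are hypothetical. The sketch assumes that the entries of a filter are combined as a union, so a feature is kept when at least one key/value entry matches. It also does not model `False` values or the negative and wildcard filters that the draft leaves unfinished.

```python
from typing import Union

# Hypothetical aliases mirroring the shapes described in the notebook;
# QuackOSM's own OsmTagsFilter / GroupedOsmTagsFilter types may differ in detail.
TagValue = Union[bool, str, list[str]]
TagsFilter = dict[str, TagValue]
GroupedTagsFilter = dict[str, TagsFilter]


def matches_tags_filter(tags: dict[str, str], tags_filter: TagsFilter) -> bool:
    """Return True if the feature's tags match at least one filter entry.

    Assumption: entries are combined as a union. A value of True accepts any
    value of the key, a string requires an exact match and a list accepts any
    of its values. False values are simply treated as non-matching here.
    """
    for key, expected in tags_filter.items():
        if key not in tags:
            continue
        value = tags[key]
        if expected is True:
            return True
        if isinstance(expected, str) and value == expected:
            return True
        if isinstance(expected, list) and value in expected:
            return True
    return False


def matching_groups(tags: dict[str, str], grouped_filter: GroupedTagsFilter) -> list[str]:
    """Return the names of all groups whose filter matches the feature's tags."""
    return [
        group_name
        for group_name, tags_filter in grouped_filter.items()
        if matches_tags_filter(tags, tags_filter)
    ]


# Made-up example feature, not taken from the Monaco extract.
cafe_tags = {"amenity": "cafe", "name": "Example cafe"}

print(matches_tags_filter(cafe_tags, {"amenity": ["ice_cream", "cafe"]}))  # True
print(matches_tags_filter(cafe_tags, {"amenity": "bench"}))  # False
print(matches_tags_filter(cafe_tags, {"amenity": "bar", "building": "office"}))  # False

grouped_tags_filter = {
    "shopping": {"shop": True, "landuse": "retail"},
    "tourism": {"tourism": True, "historic": True},
}
print(matching_groups({"shop": "gift", "historic": "castle"}, grouped_tags_filter))
# ['shopping', 'tourism']
```

The sketch compares with `expected is True` rather than testing truthiness. That way, a plain string value such as `"bench"` is never mistaken for a "match any value" wildcard.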