docs: Update IOC notebook
pmav99 committed Jun 19, 2024
1 parent 6b71fbe commit 18b6a92
Showing 4 changed files with 908 additions and 101 deletions.
183 changes: 86 additions & 97 deletions examples/IOC_data.ipynb
@@ -11,24 +11,14 @@
"source": [
"import logging\n",
"\n",
"import shapely\n",
"import hvplot.pandas\n",
"import geopandas as gpd\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import shapely\n",
"import xarray as xr\n",
"\n",
"from searvey import ioc\n",
"\n",
"logging.basicConfig(\n",
" level=20,\n",
" style=\"{\",\n",
" format=\"{asctime:s}; {levelname:8s}; {threadName:23s}; {name:<25s} {lineno:5d}; {message:s}\",\n",
")\n",
"\n",
"logging.getLogger(\"urllib3\").setLevel(30)\n",
"logging.getLogger(\"parso\").setLevel(30)\n",
"\n",
"logger = logging.getLogger(__name__)"
"import searvey"
]
},
{
@@ -38,7 +28,9 @@
"tags": []
},
"source": [
"## Retrieve Station Metadata"
"## Retrieve Station Metadata\n",
"\n",
"To retrieve station metadata, use the `get_ioc_stations()` function:"
]
},
{
@@ -50,8 +42,8 @@
},
"outputs": [],
"source": [
"ioc_stations = ioc.get_ioc_stations()\n",
"ioc_stations"
"ioc_stations = searvey.get_ioc_stations()\n",
"len(ioc_stations)"
]
},
{
@@ -63,13 +55,7 @@
},
"outputs": [],
"source": [
"figure, axis = plt.subplots(1, 1)\n",
"figure.set_size_inches(12, 12 / 1.61803398875)\n",
"\n",
"countries = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))\n",
"_ = countries.plot(color='lightgrey', ax=axis, zorder=-1)\n",
"_ = ioc_stations.plot(ax=axis)\n",
"_ = axis.set_title(f'all IOC stations')"
"ioc_stations.sample(3).sort_index()"
]
},
{
@@ -85,29 +71,28 @@
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"## Retrieve station metadata from arbitrary polygon"
"world_plot = ioc_stations.hvplot(geo=True, tiles=True, hover_cols=[\"ioc_code\", \"location\"])\n",
"world_plot.opts(width=800, height=500)"
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"id": "6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"east_coast = shapely.geometry.box(-85, 25, -65, 45)\n",
"east_coast\n",
"## Retrieve station metadata from arbitrary polygon\n",
"\n",
"east_stations = ioc.get_ioc_stations(region=east_coast)\n",
"east_stations"
"We can filter the IOC stations using any shapely geometry. For example, to select only the stations on the East Coast of the US:"
]
},
{
@@ -119,143 +104,147 @@
},
"outputs": [],
"source": [
"east_stations[~east_stations.contacts.str.contains(\"NOAA\", na=False)]"
]
},
{
"cell_type": "markdown",
"id": "8",
"metadata": {},
"source": [
"## Retrieve IOC station data"
"east_coast = shapely.geometry.box(-85, 25, -65, 45)\n",
"east_stations = searvey.get_ioc_stations(region=east_coast)\n",
"len(east_stations)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9",
"id": "8",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"east_data = ioc.get_ioc_data(\n",
" ioc_metadata=east_stations,\n",
" endtime=\"2020-05-30\",\n",
" period=3,\n",
")\n",
"east_data"
"east_stations.hvplot.points(geo=True, tiles=True)"
]
},
{
"cell_type": "markdown",
"id": "9",
"metadata": {},
"source": [
"## Retrieve IOC station data\n",
"\n",
"The function for retrieving data is called `fetch_ioc_station()`. \n",
"\n",
"In its simplest form it requires only the `station_id` (i.e. the IOC code) and retrieves the last week of data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10",
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"def drop_all_nan_vars(ds: xr.Dataset) -> xr.Dataset:\n",
" for var in ds.data_vars:\n",
" if ds[var].notnull().sum() == 0:\n",
" ds = ds.drop_vars(var)\n",
" return ds\n",
"\n",
"ds = drop_all_nan_vars(east_data.sel(ioc_code=\"setp1\"))\n",
"ds"
"df = searvey.fetch_ioc_station(\"acap2\")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "11",
"metadata": {
"tags": []
},
"metadata": {},
"source": [
"As you can see not all the data are suitable for use...\n",
"\n",
"More specifically, the `rad` seems to have been re-calibrated in the afternoon of 2020-05-28:"
"We can also explicitly specify the start and end dates. For example, to retrieve the first 10 days of May 2024:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12",
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"fix, axes = plt.subplots(1, 1)\n",
"\n",
"_ = ds.prs.plot(ax=axes, label=\"prs\")\n",
"_ = ds.rad.plot(ax=axes, label=\"rad\")\n",
"_ = ds.ra2.plot(ax=axes, label=\"ra2\")\n",
"axes.legend()"
"df = searvey.fetch_ioc_station(\n",
" station_id=\"alva\",\n",
" start_date=pd.Timestamp(\"2024-05-01\"),\n",
" end_date=pd.Timestamp(\"2024-05-10\"),\n",
" progress_bar=False,\n",
")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "13",
"metadata": {
"tags": []
},
"metadata": {},
"source": [
"Similarly some stations might have missing data"
"If we request more than 30 days, multiple HTTP requests are sent to the IOC servers via multithreading and the responses are merged into a single dataframe. \n",
"\n",
"In this case, setting `progress_bar=True` can help monitor the progress of the HTTP requests. \n",
"For example, to retrieve data for the first 5 months of 2020:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14",
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"bahamas = ds.where(ds.country == \"Bahamas\")\n",
"bahamas"
"df = searvey.fetch_ioc_station(\n",
" station_id=\"alva\",\n",
" start_date=pd.Timestamp(\"2020-01-01\"),\n",
" end_date=pd.Timestamp(\"2020-06-01\"),\n",
" progress_bar=True,\n",
")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "15",
"metadata": {},
"source": [
"Keep in mind that each IOC station may return dataframes with different sensors/columns. For example, the station in the Bahamas returns several of them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15",
"id": "16",
"metadata": {},
"outputs": [],
"source": [
"bahamas.ra2.plot()"
"bahamas = searvey.fetch_ioc_station(\n",
" station_id=\"setp1\",\n",
" start_date=pd.Timestamp(\"2020-05-25\"),\n",
" end_date=pd.Timestamp(\"2020-05-30\"),\n",
" progress_bar=False,\n",
")\n",
"bahamas"
]
},
{
"cell_type": "markdown",
"id": "16",
"metadata": {
"tags": []
},
"id": "17",
"metadata": {},
"source": [
"Trying to fill the missing values is not that difficult, but you probably need to review the results"
"Furthermore, not all of these timeseries are ready to be used. \n",
"\n",
"For example, in the last days of May the `rad` sensor was offline for some time:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17",
"metadata": {
"tags": []
},
"id": "18",
"metadata": {},
"outputs": [],
"source": [
"bahamas.ra2.interpolate_na(dim=\"time\", method=\"linear\").plot()"
"bahamas.rad.hvplot()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "searvey",
"language": "python",
"name": "python3"
"name": "searvey"
},
"language_info": {
"codemirror_mode": {
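The notebook text in this diff states that a request spanning more than 30 days is split into multiple HTTP requests whose responses are merged into a single dataframe. The sketch below shows one way such a date range could be cut into chunks of at most 30 days. It is only an illustration: `split_date_range` and its `max_days` parameter are hypothetical names, the half-open `[start, end)` convention is an assumption, and none of this is searvey's actual implementation or API.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def split_date_range(
    start: datetime,
    end: datetime,
    max_days: int = 30,
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (chunk_start, chunk_end) pairs covering [start, end).

    Hypothetical helper for illustration only; not part of searvey.
    Each chunk spans at most `max_days` days.
    """
    if max_days <= 0:
        raise ValueError("max_days must be positive")
    step = timedelta(days=max_days)
    chunk_start = start
    while chunk_start < end:
        # The last chunk is truncated so that it never runs past `end`.
        chunk_end = min(chunk_start + step, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end


# Same date range as the `progress_bar=True` example in the notebook.
chunks = list(split_date_range(datetime(2020, 1, 1), datetime(2020, 6, 1)))
for chunk_start, chunk_end in chunks:
    print(chunk_start.date(), "->", chunk_end.date())
```

Each pair could then be fetched independently (for example from a thread pool) and the per-chunk results concatenated in chronological order.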
