diff --git a/book/03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md b/book/03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md index 84d5a7f..efdaf4a 100644 --- a/book/03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md +++ b/book/03_Using_NASA_EarthData/02_Using_OPERA_DSWx_Products.md @@ -5,7 +5,7 @@ jupyter: extension: .md format_name: markdown format_version: '1.3' - jupytext_version: 1.16.2 + jupytext_version: 1.16.4 kernelspec: display_name: Python 3 (ipykernel) language: python @@ -39,7 +39,7 @@ That is, OPERA is a NASA initiative that takes, e.g., optical or radar remote-se ## The OPERA Dynamic Surface Water Extent (DSWx) product -We've already looked at the DIST (i.e., land surface disturbance) family of OPERA data products. In this notebook, we'll examine another OPERA data product: the *Dynamic Surface Water Extent (DSWx)* product (more fully described in the [OPERA DSWx HLS product specification](https://d2pn8kiwq2w21t.cloudfront.net/documents/ProductSpec_DSWX_URS309746.pdf)). This data summarizes the extent of inland water (i.e., water on land masses as opposed to part of an ocean) that can be used to track flooding events. +We've already looked at the DIST (i.e., land surface disturbance) family of OPERA data products. In this notebook, we'll examine another OPERA data product: the *Dynamic Surface Water Extent (DSWx)* product. This data product summarizes the extent of inland water (i.e., water on land masses as opposed to part of an ocean) that can be used to track flooding and drought events. The DSWx product is fully described in the [OPERA DSWx HLS product specification](https://d2pn8kiwq2w21t.cloudfront.net/documents/ProductSpec_DSWX_URS309746.pdf). The DSWx data products are generated from HLS surface reflectance (SR) measurements; specifically, these are made by the Operational Land Imager (OLI) aboard the Landsat 8 satellite, the Operational Land Imager 2 (OLI-2) aboard the Landsat 9 satellite, and the MultiSpectral Instrument (MSI) aboard the Sentinel-2A/B satellites. As with the DIST products, the DSWx products consist of raster data stored in GeoTIFF format using the [Military Grid Reference System (MGRS)](https://en.wikipedia.org/wiki/Military_Grid_Reference_System) (the details are fully described in the [DSWx product specification](https://d2pn8kiwq2w21t.cloudfront.net/documents/ProductSpec_DSWX_URS309746.pdf)). Again, the OPERA DSWx products are distributed as [Cloud Optimized GeoTIFFs](https://www.cogeo.org/) storing different bands/layers in distinct TIFF files. @@ -47,132 +47,287 @@ The DSWx data products are generated from HLS surface reflectance (SR) measureme --- -### Band 1: Water classification (WTR) +## Band 1: Water classification (WTR) -There are ten bands or layers associated with the DSWx data product. For instance, band 3 is the *Confidence (CONF)* layer that provides, for each pixel, quantitative values describing the degree of confidence in the categories given in band 1 (the Water classification layer). Band 4 is a *Diagnostic (DIAG)* layer that encodes, for each pixel, which of five tests were positive in deriving the CONF layer. We will only use the first band — the *water classification (WTR)* layer — in this tutorial, but details of all the bands are given in the [DSWx product specification](https://d2pn8kiwq2w21t.cloudfront.net/documents/ProductSpec_DSWX_URS309746.pdf). +There are ten bands or layers associated with the DSWx data product. 
In this tutorial, we will focus strictly on the first band—the *water classification (WTR)* layer—but details of all the bands are given in the [DSWx product specification](https://d2pn8kiwq2w21t.cloudfront.net/documents/ProductSpec_DSWX_URS309746.pdf). For instance, band 3 is the *Confidence (CONF)* layer that provides, for each pixel, quantitative values describing the degree of confidence in the categories given in band 1 (the Water classification layer). Band 4 is a *Diagnostic (DIAG)* layer that encodes, for each pixel, which of five tests were positive in deriving the corresponding pixel value in the CONF layer.

The water classification layer consists of unsigned 8-bit integer raster data (UInt8) meant to represent whether a pixel contains inland water (e.g., part of a reservoir, a lake, a river, etc., but not water associated with the open ocean). The values in this raster layer are computed from raw images acquired by the satellite, with each pixel being assigned one of seven integer values; we'll examine these below.

-Let's begin by importing the required libraries and loading a suitable file into an Xarray `DataArray`.
+---
+
+
+### Examining an example WTR layer
+
+
+Let's begin by importing the required libraries and loading a suitable file into an Xarray `DataArray`. The file in question contains raster data pertinent to the [Lake Powell reservoir](https://en.wikipedia.org/wiki/Lake_Powell) on the Colorado River in the United States.

```python jupyter={"source_hidden": true}
-# Notebook dependencies
import warnings
warnings.filterwarnings('ignore')
from pathlib import Path
+import numpy as np, pandas as pd, xarray as xr
import rioxarray as rio
-import pandas as pd
-import hvplot.xarray
-from bokeh.models import FixedTicker
+```
+
+```python jupyter={"source_hidden": true}
+import hvplot.pandas, hvplot.xarray
import geoviews as gv
gv.extension('bokeh')
+from bokeh.models import FixedTicker
```

```python jupyter={"source_hidden": true}
-LOCAL_PATH = Path('..') / 'assets' / 'OPERA_L3_DSWx-HLS_T36RVU_20240601T082559Z_20240607T183514Z_S2B_30_v1.0_B01_WTR.tif'
-data = rio.open_rasterio(LOCAL_PATH)
-data = data.rename({'x':'easting', 'y':'northing', 'band':'band'}).squeeze()
+LOCAL_PATH = Path().cwd() / '..' / 'assets' / 'OPERA_L3_DSWx-HLS_T12SVG_20230411T180222Z_20230414T030945Z_L8_30_v1.0_B01_WTR.tif'
+b01_wtr = rio.open_rasterio(LOCAL_PATH).rename({'x':'longitude', 'y':'latitude'}).squeeze()
```

Remember, examining the repr for `b01_wtr` in this Jupyter notebook is quite convenient.
+ By expanding the `Attributes` tab, we can see all the metadata associated with the acquired data.
+ By expanding the `Coordinates` tab, we can examine all the associated arrays of coordinate values.

```python jupyter={"source_hidden": true}
# Examine data
-data
+b01_wtr
```

-As before, we define a basemap, this time using tiles from [ESRI](https://www.esri.com). We also set up dictionaries `image_opts` and `layout_opts` for common options we'll use when invoking `.hvplot.image`.
+Let's examine the distribution of pixel values in `b01_wtr` using the Pandas `Series.value_counts` method.

-```python editable=true jupyter={"source_hidden": true} slideshow={"slide_type": ""}
-# Creates basemap
-base = gv.tile_sources.EsriImagery.opts(width=1000, height=1000, padding=0.1)
-# Initialize image options dictionary
+```python jupyter={"source_hidden": true}
+counts = pd.Series(b01_wtr.values.flatten()).value_counts().sort_index()
+display(counts)
+```

These pixel values are *categorical data*.
Specifically, the valid pixel values and their meanings—according to the [DSWx product specification](https://d2pn8kiwq2w21t.cloudfront.net/documents/ProductSpec_DSWX_URS309746.pdf)—are as follows: + ++ **0**: Not Water—any area with valid reflectance data that is not from one of the other allowable categories (open water, partial surface water, snow/ice, cloud/cloud shadow, or ocean masked). ++ **1**: Open Water—any pixel that is entirely water unobstructed to the sensor, including obstructions by vegetation, terrain, and buildings. ++ **2**: Partial Surface Water—an area that is at least 50% and less than 100% open water (e.g., inundated sinkholes, floating vegetation, and pixels bisected by coastlines). ++ **252**: Snow/Ice. ++ **253**: Cloud or Cloud Shadow—an area obscured by or adjacent to cloud or cloud shadow. ++ **254**: Ocean Masked—an area identified as ocean using a shoreline database with an added margin. ++ **255**: Fill value (missing data). + +--- + + +### Producing an initial plot + + +Let's make a first rough plot of the raster data using `hvplot.image`. As usual, we instantiate a `view` object that slices a smaller subset of pixels to make the image render quickly. + + +```python jupyter={"source_hidden": true} image_opts = dict( - x='easting', - y='northing', + x='longitude', + y='latitude', rasterize=True, dynamic=True, - frame_width=500, - frame_height=500, - aspect='equal', - alpha=0.8 + project=True ) -# Initialize layout options dictionary layout_opts = dict( xlabel='Longitude', - ylabel='Latitude' + ylabel='Latitude', + aspect='equal', ) + +steps = 100 +subset = slice(0, None, steps) +view = b01_wtr.isel(longitude=subset, latitude=subset) +view.hvplot.image(**image_opts).opts(**layout_opts) ``` -A continuous colormap is not all that helpful because this data is *categorical*. Specifically, the valid pixel values and their meanings are as follows: +The default colormap does not reveal the raster features very well. Also, notice that the colorbar axis covers the numerical range from 0 to 255 (approximately) even though most of those pixel values (i.e., from `3` to `251`) do not actually occur in the data. Annotating a raster image of categorical data with a legend may make more sense than using a colorbar. However, at present, `hvplot.image` does not support using a legend. So, for this tutorial, we'll stick to using a colorbar. Before assigning a colormap and appropriate labels for the colorbar, it makes sense to clean up the pixel values. -+ **0**: Not Water – an area with valid reflectance data that is not open water (class 1), partial surface water (class 2), snow/ice (class 252), cloud/cloud shadow (class 253), or ocean masked (class 254). Masking can result in “not water” (class 0) where land cover masking is applied. -+ **1**: Open Water – an area that is entirely water and unobstructed to the sensor, including obstructions by vegetation, terrain, and buildings. -+ **2**: Partial Surface Water – an area that is at least 50% and less than 100% open water (e.g., inundated sinkholes, floating vegetation, and pixels bisected by coastlines). This may be referred to as "subpixel inundation" when referring to a pixel's area. -+ **252**: Snow/Ice. -+ **253**: Cloud or Cloud Shadow – an area obscured by or adjacent to cloud or cloud shadow. -+ **254**: Ocean Masked - an area identified as ocean using a shoreline database with an added margin -+ **255**: Fill value (missing data). 
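
Before doing that, we can double-check our reading of these categories by attaching the class names from the product specification to the pixel-value counts computed earlier. This is a small illustrative aside; the dictionary `wtr_classes` below is our own shorthand for the list above, not something defined by the DSWx product itself.

```python jupyter={"source_hidden": true}
# Illustrative aside: label the value counts computed earlier with the
# category names from the DSWx product specification.
wtr_classes = {0: 'Not Water', 1: 'Open Water', 2: 'Partial Surface Water',
               252: 'Snow/Ice', 253: 'Cloud/Cloud Shadow',
               254: 'Ocean Masked', 255: 'Fill (missing data)'}
display(counts.rename(index=wtr_classes))
```
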
+---
+

-Let's count the number of pixels in each category using the Pandas `Series.value_counts` method.

+### Reassigning pixel values
+
+
+We want to reassign the raster pixel values to a tighter range (i.e., from `0` to `5` instead of from `0` to `255`) to make a sensible colorbar. To do this, we'll start by copying the values from the `DataArray` `b01_wtr` into another `DataArray` `new_data` and by creating an array `values` to hold the full range of permissible pixel values.

```python jupyter={"source_hidden": true}
new_data = b01_wtr.copy(deep=True)
values = np.array([0, 1, 2, 252, 253, 254, 255], dtype=np.uint8)
print(values)
```

We first need to decide how to treat missing data, i.e., pixels with the value `255` in this raster. Let's choose to treat the missing data pixels the same as the `"Not Water"` pixels. We can use the `DataArray.where` method to reassign pixels with value `null_val` (i.e., `255` in the code cell below) to the replacement value `transparent_val` (i.e., `0` in this case). Anticipating that we'll wrap this code in a function later, we embed the computation in an `if`-block conditioned on a boolean value `replace_null`.

```python jupyter={"source_hidden": true}
null_val = 255
transparent_val = 0
replace_null = True
if replace_null:
    new_data = new_data.where(new_data!=null_val, other=transparent_val)
    values = values[np.where(values!=null_val)]

print(np.unique(new_data.values))
```

Notice that `values` no longer includes `null_val`. Next, instantiate an array `new_values` to store the replacement pixel values.

```python jupyter={"source_hidden": true}
n_values = len(values)
start_val = 0
new_values = np.arange(start=start_val, stop=start_val+n_values, dtype=values.dtype)
assert transparent_val in new_values, f"{transparent_val=} not in {new_values}"
print(values)
print(new_values)
```

Now we combine `values` and `new_values` into a dictionary `relabel` and use the dictionary to modify the entries of `new_data`.

```python jupyter={"source_hidden": true}
relabel = dict(zip(values, new_values))
for old, new in relabel.items():
    if new==old: continue
    new_data = new_data.where(new_data!=old, other=new)
```

We can encapsulate the logic of the preceding cells into a utility function `relabel_pixels` that condenses a broad range of categorical pixel values into a tighter one that will display better with a colormap.

```python jupyter={"source_hidden": true}
# utility to remap pixel values to a sequence of contiguous integers
def relabel_pixels(data, values, null_val=255, transparent_val=0, replace_null=True, start=0):
    """
    This function accepts a DataArray with a finite number of categorical values as entries.
    It reassigns the pixel labels to a sequence of consecutive integers starting from start.
    data: Xarray DataArray with finitely many categories in its array of values.
    values: list of pixel values expected in data (if empty, they are computed from data).
    null_val: (default 255) Pixel value used to flag missing data and/or exceptions.
    transparent_val: (default 0) Pixel value that will be fully transparent when rendered.
    replace_null: (default True) Maps null_val->transparent_val everywhere in data.
    start: (default 0) starting range of consecutive integer values for new labels.
    The values returned are:
    new_data: Xarray DataArray containing pixels with new values
    relabel: dictionary associating old pixel values with new pixel values
    """
    new_data = data.copy(deep=True)
    if values:
        values = np.sort(np.array(values, dtype=np.uint8))
    else:
        values = np.sort(np.unique(data.values.flatten()))
    if replace_null:
        new_data = new_data.where(new_data!=null_val, other=transparent_val)
        values = values[np.where(values!=null_val)]
    n_values = len(values)
    new_values = np.arange(start=start, stop=start+n_values, dtype=values.dtype)
    assert transparent_val in new_values, f"{transparent_val=} not in {new_values}"
    relabel = dict(zip(values, new_values))
    for old, new in relabel.items():
        if new==old: continue
        new_data = new_data.where(new_data!=old, other=new)
    return new_data, relabel
```

Let's apply the function just defined to `b01_wtr` and verify that the pixel values have been changed as desired.

```python jupyter={"source_hidden": true}
values = [0, 1, 2, 252, 253, 254, 255]
print(f"Before applying relabel_pixels: {np.unique(b01_wtr.values)}")
print(f"Original pixel values expected: {values}")
b01_wtr, relabel = relabel_pixels(b01_wtr, values=values)
print(f"After applying relabel_pixels: {np.unique(b01_wtr.values)}")
```

Notice that the pixel value `5` does not occur in the relabelled array because the pixel value `254` (for "Ocean Masked" pixels) does not occur in the original GeoTIFF file. This is fine; the code written below will still produce the full range of possible pixel values (& colors) in its colorbar.

---


### Defining a colormap & plotting with a colorbar


We are now ready to define a colormap. We define the dictionary `COLORS` so that the pixel labels from `new_values` are the dictionary keys and some RGBA color tuples used frequently for this kind of data are the dictionary values. We'll use variants of the code in the cell below to update `layout_opts` so that plots generated for various layers/bands from the DSWx data products have suitable legends.

```python jupyter={"source_hidden": true}
COLORS = {
0: (255, 255, 255, 0.0),  # Not Water
1: (0, 0, 255, 1.0),      # Open Water
2: (180, 213, 244, 1.0),  # Partial Surface Water
3: ( 0, 255, 255, 1.0),   # Snow/Ice
4: (175, 175, 175, 1.0),  # Cloud/Cloud Shadow
5: ( 0, 0, 127, 0.5),     # Ocean Masked
}

c_labels = ["Not Water", "Open Water", "Partial Surface Water", "Snow/Ice", "Cloud/Cloud Shadow", "Ocean Masked"]
c_ticks = list(COLORS.keys())
limits = (c_ticks[0]-0.5, c_ticks[-1]+0.5)

print(c_ticks)
print(c_labels)
```

To use this colormap, these ticks, and these labels in a colorbar, we create a dictionary `c_bar_opts` that holds the objects to pass to the Bokeh rendering engine.

```python editable=true jupyter={"source_hidden": true} slideshow={"slide_type": ""}
c_bar_opts = dict( ticker=FixedTicker(ticks=c_ticks),
                   major_label_overrides=dict(zip(c_ticks, c_labels)),
                   major_tick_line_width=0, )
```

We need to update the dictionaries `image_opts` and `layout_opts` to include the data relevant to the colormap.

```python editable=true jupyter={"source_hidden": true} slideshow={"slide_type": ""}
image_opts.update({ 'cmap': list(COLORS.values()),
                    'clim': limits,
                    'colorbar': True
                  })

layout_opts.update(dict(title='B01_WTR', colorbar_opts=c_bar_opts))
```

Next, we can render a quick plot to make sure that the colorbar is produced with suitable labels.

```python jupyter={"source_hidden": true}
steps = 100
subset = slice(0, None, steps)
view = b01_wtr.isel(longitude=subset, latitude=subset)
view.hvplot.image(**image_opts).opts(frame_width=500, frame_height=500, **layout_opts)
```

Finally, we can define a basemap, this time using tiles from [ESRI](https://www.esri.com), and plot the raster at full resolution (i.e., we won't bother using `isel` to select a lower-resolution slice from the raster first).

```python editable=true jupyter={"source_hidden": true} slideshow={"slide_type": ""}
# Creates basemap
base = gv.tile_sources.EsriTerrain.opts(padding=0.1, alpha=0.25)
b01_wtr.hvplot(**image_opts).opts(**layout_opts) * base
```

This notebook provides an overview of how to visualize data extracted from OPERA DSWx data products that are stored locally. We're now ready to automate the search for such products in the cloud using the PySTAC API.

-
-
---

diff --git a/book/04_Case_Studies/01_Greece_Wildfires.md b/book/04_Case_Studies/01_Greece_Wildfires.md
index 4af52da..1e75fd9 100644
--- a/book/04_Case_Studies/01_Greece_Wildfires.md
+++ b/book/04_Case_Studies/01_Greece_Wildfires.md
@@ -5,7 +5,7 @@ jupyter:
     extension: .md
     format_name: markdown
     format_version: '1.3'
-    jupytext_version: 1.16.2
+    jupytext_version: 1.16.4
  kernelspec:
    display_name: Python 3 (ipykernel)
    language: python
@@ -14,13 +14,12 @@ jupyter:

# Greece Wildfires

-
-## 2023 Greece wildfires
-
-In this example, we will retrieve data associated with the [2023 Greece wildfires](https://en.wikipedia.org/wiki/2023_Greece_wildfires) to understand its evolution and extent. We will generate a time series visualization of the event.
+In this example, we will retrieve data associated with the [2023 Greece wildfires](https://en.wikipedia.org/wiki/2023_Greece_wildfires) to understand their evolution and extent. We will generate a time series associated with this data and two visualizations of the event.
+ +In particular, we will examine the area around the city of [Alexandroupolis](https://en.wikipedia.org/wiki/Alexandroupolis) which was severely impacted by the wildfires, resulting in loss of lives, property, and forested areas. -In particular, we will be examining the area around the city of [Alexandroupolis](https://en.wikipedia.org/wiki/Alexandroupolis) which was severely impacted by the wildfires, resulting in loss of lives, property, and forested areas. +--- ## Outline of steps for analysis @@ -41,34 +40,28 @@ In particular, we will be examining the area around the city of [Alexandroupolis + Download relevant granules into Xarray DataArray, stacked appropriately + Do intermediate computations as necessary + Assemble relevant data slices into visualization - --- - + ### Preliminary imports -```python jupyter={"outputs_hidden": true, "source_hidden": true} +```python jupyter={"source_hidden": true} from warnings import filterwarnings filterwarnings('ignore') -# data wrangling imports -import numpy as np -import pandas as pd -import xarray as xr +import numpy as np, pandas as pd, xarray as xr import rioxarray as rio -import rasterio ``` -```python jupyter={"outputs_hidden": true, "source_hidden": true} +```python jupyter={"source_hidden": true} # Imports for plotting -import hvplot.pandas -import hvplot.xarray +import hvplot.pandas, hvplot.xarray import geoviews as gv from geoviews import opts gv.extension('bokeh') ``` -```python jupyter={"outputs_hidden": true, "source_hidden": true} +```python jupyter={"source_hidden": true} # STAC imports to retrieve cloud data from pystac_client import Client from osgeo import gdal @@ -81,11 +74,7 @@ gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS','TIF, TIFF') ### Convenient utilities - -These functions could be placed in module files for more developed research projects. For learning purposes, they are embedded within this notebook. - - -```python jupyter={"outputs_hidden": true, "source_hidden": true} +```python jupyter={"source_hidden": true} # simple utility to make a rectangle with given center of width dx & height dy def make_bbox(pt,dx,dy): '''Returns bounding-box represented as tuple (x_lo, y_lo, x_hi, y_hi) @@ -95,7 +84,7 @@ def make_bbox(pt,dx,dy): return tuple(coord+sgn*delta for sgn in (-1,+1) for coord,delta in zip(pt, (dx/2,dy/2))) ``` -```python jupyter={"outputs_hidden": true, "source_hidden": true} +```python jupyter={"source_hidden": true} # simple utility to plot an AOI or bounding-box def plot_bbox(bbox): '''Given bounding-box, returns GeoViews plot of Rectangle & Point at center @@ -104,12 +93,12 @@ def plot_bbox(bbox): ''' # These plot options are fixed but can be over-ridden point_opts = opts.Points(size=12, alpha=0.25, color='blue') - rect_opts = opts.Rectangles(line_width=0, alpha=0.1, color='red') + rect_opts = opts.Rectangles(line_width=2, alpha=0.1, color='red') lon_lat = (0.5*sum(bbox[::2]), 0.5*sum(bbox[1::2])) return (gv.Points([lon_lat]) * gv.Rectangles([bbox])).opts(point_opts, rect_opts) ``` -```python jupyter={"outputs_hidden": true, "source_hidden": true} +```python jupyter={"source_hidden": true} # utility to extract search results into a Pandas DataFrame def search_to_dataframe(search): '''Constructs Pandas DataFrame from PySTAC Earthdata search results. @@ -128,24 +117,27 @@ def search_to_dataframe(search): return df ``` ---- + +These functions could be placed in module files for more developed research projects. For learning purposes, they are embedded within this notebook. 
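
As a quick sanity check of `make_bbox` (a hypothetical example with made-up coordinates, not drawn from the analysis below), a box of width and height 0.2° centered at the origin should extend 0.1° in each direction:

```python jupyter={"source_hidden": true}
# Hypothetical check of make_bbox: a 0.2°-wide box centered at (0, 0)
print(make_bbox((0, 0), 0.2, 0.2))  # expect (-0.1, -0.1, 0.1, 0.1)
```
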
+---
+

## Identifying search parameters

-```python jupyter={"outputs_hidden": true, "source_hidden": true}
+```python jupyter={"source_hidden": true}
dadia_forest = (26.18, 41.08)
AOI = make_bbox(dadia_forest, 0.1, 0.1)
-DATE_RANGE = '2023-08-01/2023-09-30'
+DATE_RANGE = '2023-08-01/2023-09-30'.split('/')
```

-```python jupyter={"outputs_hidden": true, "source_hidden": true}
+```python jupyter={"source_hidden": true}
# Optionally plot the AOI
-basemap = gv.tile_sources.OSM(width=500, height=500, padding=0.1)
+basemap = gv.tile_sources.ESRI(width=500, height=500, padding=0.1, alpha=0.25)
plot_bbox(AOI) * basemap
```

-```python jupyter={"outputs_hidden": true, "source_hidden": true}
+```python jupyter={"source_hidden": true}
search_params = dict(bbox=AOI, datetime=DATE_RANGE)
print(search_params)
```

@@ -155,7 +147,7 @@ print(search_params)

## Obtaining search results

-```python jupyter={"outputs_hidden": true, "source_hidden": true}
+```python jupyter={"source_hidden": true}
ENDPOINT = 'https://cmr.earthdata.nasa.gov/stac'
PROVIDER = 'LPCLOUD'
COLLECTIONS = ["OPERA_L3_DIST-ALERT-HLS_V1_1"]
@@ -164,32 +156,40 @@ search_params.update(collections=COLLECTIONS)
print(search_params)
```

-```python jupyter={"outputs_hidden": true, "source_hidden": true}
+```python jupyter={"source_hidden": true}
catalog = Client.open(f'{ENDPOINT}/{PROVIDER}/')
search_results = catalog.search(**search_params)
```

+
+As usual, let's encode the search results into a Pandas `DataFrame`, examine them, and make a few transformations to clean up the data.
+

```python jupyter={"source_hidden": true}
%%time
df = search_to_dataframe(search_results)
df.head()
```

-Clean DataFrame `df` in ways that make sense (e.g., dropping unneeded columns/rows, casting columns as fixed datatypes, setting the index, etc.).
+We clean the `DataFrame` `df` in typical ways that make sense:
+
++ casting the `datetime` column as `DatetimeIndex`;
++ dropping extraneous `datetime` columns;
++ renaming the `eo:cloud_cover` column as `cloud_cover`;
++ casting the `cloud_cover` column as floating-point values;
++ casting the remaining columns as strings; and
++ setting the `datetime` column as the `Index`.

-```python jupyter={"outputs_hidden": true, "source_hidden": true}
+```python jupyter={"source_hidden": true}
df = df.drop(['end_datetime', 'start_datetime'], axis=1)
df.datetime = pd.DatetimeIndex(df.datetime)
-df['eo:cloud_cover'] = df['eo:cloud_cover'].astype(np.float16)
-df = df.set_index('datetime').sort_index()
-df.info()
-```
-
-```python jupyter={"outputs_hidden": true, "source_hidden": true}
+df = df.rename(columns={'eo:cloud_cover':'cloud_cover'})
+df['cloud_cover'] = df['cloud_cover'].astype(np.float16)
for col in ['asset', 'href', 'tile_id']:
    df[col] = df[col].astype(pd.StringDtype())
+df = df.set_index('datetime').sort_index()
+df.info()
```

---

@@ -201,185 +201,233 @@ for col in ['asset', 'href', 'tile_id']:

## Exploring & refining search results


Let's examine the `DataFrame` `df` to better understand the search results. First, let's see how many different geographic tiles occur in the search results.

-```python jupyter={"outputs_hidden": true, "source_hidden": true}
+```python jupyter={"source_hidden": true}
df.tile_id.value_counts()
```

-Let's examine the `asset` column.
+So the AOI lies strictly within a single MGRS geographic tile, namely `T35TMF`. Let's examine the `asset` column.
-```python jupyter={"outputs_hidden": true, "source_hidden": true} +```python jupyter={"source_hidden": true} df.asset.value_counts().sort_values(ascending=False) ``` -These `asset` names are not as simple and tidy as the ones we had with the DIST-ALERT data products. We can identify eseful substrings. +Some of these `asset` names are not as simple and tidy as the ones we encountered with the DIST-ALERT data products. We can, however, easily identify useful substrings. Here, we choose only those rows in which the `asset` column includes `'VEG-DIST-STATUS'` as a sub-string. -```python jupyter={"outputs_hidden": true, "source_hidden": true} -df.asset.str.contains('VEG-DIST-STATUS') +```python jupyter={"source_hidden": true} +idx_veg_dist_status = df.asset.str.contains('VEG-DIST-STATUS') +idx_veg_dist_status ``` We can use this boolean `Series` with the Pandas `.loc` accessor to filter out only the rows we want (i.e., that connect to raster data files storing the `VEG-DIST-STATUS` band). We can subsequently drop the `asset` column (it is no longer required). -```python jupyter={"outputs_hidden": true, "source_hidden": true} -veg_dist_status = df.loc[df.asset.str.contains('VEG-DIST-STATUS')] +```python jupyter={"source_hidden": true} +veg_dist_status = df.loc[idx_veg_dist_status] veg_dist_status = veg_dist_status.drop('asset', axis=1) veg_dist_status ``` -```python jupyter={"outputs_hidden": true, "source_hidden": true} +```python jupyter={"source_hidden": true} print(len(veg_dist_status)) ``` -We can aggregate the URLs into lists by *resampling* the time series and visualizing the result. +Notice some of the rows have the same date but different times (corresponding to multiple observations in the same UTC calendar day). We can aggregate the URLs into lists by *resampling* the time series by day; we can subsequently visualize the result. -```python jupyter={"outputs_hidden": true, "source_hidden": true} +```python jupyter={"source_hidden": true} by_day = veg_dist_status.resample('1d').href.apply(list) -# by_day = by_day.loc[by_day.href] # logical filtering out empty lists; CHECK display(by_day) -by_day.map(len).hvplot.line() +by_day.map(len).hvplot.scatter(grid=True).opts(title='# of observations') ``` -```python jupyter={"outputs_hidden": true, "source_hidden": true} -by_day = by_day.loc[by_day.map(bool)] # Filter out empty lists -by_day.map(len).hvplot.line() + +Let's clean up the `Series` `by_day` by filtering out the rows that have empty lists (i.e., dates on which no data was acquired). + + +```python jupyter={"source_hidden": true} +by_day = by_day.loc[by_day.map(bool)] +by_day.map(len).hvplot.scatter(ylim=(0,2.1), grid=True).opts(title="# of observations") ``` ---- + +We can now use the resampled series `by_day` to extract raster data for analysis. +--- + ## Data-wrangling to produce relevant output -The wildfire near Alexandroupolis started on August 21st and rapidly spread, particularly affecting the nearby Dadia Forest. Let's first calculate the area affected over time. - -As the lists in the `Series` `by_day` can contain one or more files, we can use the `merge` function to combine multiple raster data files acquired on the same day into a single raster image. +The wildfire near Alexandroupolis started around August 21st and rapidly spread, particularly affecting the nearby Dadia Forest. Let's first assemble a "data cube" (i.e., a stacked array of rasters) from the remote files indexed in the Pandas series `by_day`. 
We'll start by selecting and loading one of the remote GeoTIFF files to extract metadata that applies to all the rasters associated with this event and this MGRS tile.

```python jupyter={"source_hidden": true}
href = by_day.iloc[0][0]
data = rio.open_rasterio(href).rename(dict(x='longitude', y='latitude'))
crs = data.rio.crs
shape = data.shape
```

Before we build a stacked `DataArray` within a loop, we'll define a Python dictionary called `template` that will be used to instantiate the slices of the array. The dictionary `template` will store metadata extracted from the GeoTIFF file, notably the coordinates.

```python jupyter={"source_hidden": true}
template = dict()
template['coords'] = data.coords.copy()
del template['coords']['band']
template['coords'].update({'time': by_day.index.values})
template['dims'] = ['time', 'latitude', 'longitude']
template['attrs'] = dict(description="OPERA DIST-ALERT: VEG-DIST-STATUS", units=None)

print(template)
```

We'll use a loop to build a stacked array of rasters from the Pandas series `by_day` (whose entries are lists of strings, i.e., URIs). If the list has a singleton element, the URI can be read directly using `rasterio.open`; otherwise, the function [`rasterio.merge.merge`](https://rasterio.readthedocs.io/en/latest/api/rasterio.merge.html) combines multiple raster data files acquired on the same day into a single raster image.
```python jupyter={"source_hidden": true}
import rasterio
from rasterio.merge import merge
```

```python jupyter={"source_hidden": true}
%%time
rasters = []
for date, hrefs in by_day.items():
    n_files = len(hrefs)
    if n_files > 1:
        print(f"Merging {n_files} files for {date.strftime('%Y-%m-%d')}...")
        raster, _ = merge(hrefs)
    else:
        print(f"Opening {n_files} file for {date.strftime('%Y-%m-%d')}...")
        with rasterio.open(hrefs[0]) as ds:
            raster = ds.read()
    rasters.append(np.reshape(raster, newshape=shape))
```

The data accumulated within the list `rasters` are all stored as NumPy arrays. Thus, rather than calling `xarray.concat`, we wrap a call to `numpy.concatenate` within a call to the `xarray.DataArray` constructor. We bind the object created to the identifier `stack`, making sure to also include the CRS information.

```python jupyter={"source_hidden": true}
stack = xr.DataArray(data=np.concatenate(rasters, axis=0), **template)
stack.rio.write_crs(crs, inplace=True)
stack
```

The `DataArray` `stack` has `time`, `latitude`, and `longitude` as its main coordinate dimensions. We can use this to perform some computations and produce relevant visualizations.

---


### Plotting the area damaged


To begin, let's use the data in `stack` to compute the total surface area damaged. The data in `stack` comes from the `VEG-DIST-STATUS` band of the DIST-ALERT product.
We interpret the pixel values in this band as follows:

* **0:** No disturbance
* **1:** First detection of disturbance with vegetation cover change $<50\%$
* **2:** Provisional detection of disturbance with vegetation cover change $<50\%$
* **3:** Confirmed detection of disturbance with vegetation cover change $<50\%$
* **4:** First detection of disturbance with vegetation cover change $\ge50\%$
* **5:** Provisional detection of disturbance with vegetation cover change $\ge50\%$
* **6:** Confirmed detection of disturbance with vegetation cover change $\ge50\%$
* **7:** Finished detection of disturbance with vegetation cover change $<50\%$
* **8:** Finished detection of disturbance with vegetation cover change $\ge50\%$

The particular pixel value we want to flag, then, is 6, i.e., a pixel in which at least 50% of the vegetation cover has been confirmed to be damaged and in which the disturbance is actively ongoing. We can use the `.sum` method to add up all the pixels with value `6` and the `.to_series` method to represent the sum as a time-indexed Pandas series. We also define `conversion_factor` that accounts for the area of each pixel in $\mathrm{km}^2$ (recall each pixel has area $30\mathrm{m}\times30\mathrm{m}$). Notice that `conversion_factor` also divides by `pixel_val`: the sum adds the value `6` once for every flagged pixel, so this division converts the sum back into a count of pixels before scaling by the pixel area.

```python jupyter={"source_hidden": true}
pixel_val = 6
conversion_factor = (30/1_000)**2 / pixel_val
damage_area = stack.where(stack==pixel_val, other=0).sum(axis=(1,2))
damage_area = damage_area.to_series() * conversion_factor
damage_area
```

```python jupyter={"source_hidden": true}
plot_title = 'Damaged Area (km²)'
line_plot_opts = dict(title=plot_title, grid=True, color='r')
damage_area.hvplot.line(**line_plot_opts)
```

Looking at the preceding plot, it seems the wildfires started around August 21 and spread rapidly.

---


### Viewing selected time slices


The nearby Dadia Forest was particularly affected by the wildfires. To see this, let's plot the raster data showing the spatial distribution of damaged pixels on three particular dates: August 1st, August 26th, and September 18th. Again, we'll highlight only those pixels with value 6 from the raster data. We can extract those specific dates easily from the time-indexed `DataArray` `stack` using a list of dates (i.e., `dates_of_interest` in the cell below).

```python jupyter={"source_hidden": true}
dates_of_interest = ['2023-08-01', '2023-08-26', '2023-09-18']
print(dates_of_interest)
snapshots = stack.sel(time=dates_of_interest)
snapshots
```

Let's make a static sequence of plots. We start by defining some standard options stored in dictionaries.
+ -for k, (date, hrefs) in enumerate(snapshots.items()): - # Read the crs to be used to generate basemaps - if crs is None: - with rasterio.open(hrefs[0]) as ds: - crs = ds.crs; +```python jupyter={"source_hidden": true} +image_opts = dict( + x='longitude', + y='latitude', + rasterize=True, + dynamic=True, + crs=crs, + shared_axes=False, + colorbar=False, + aspect='equal', + ) +layout_opts = dict(xlabel='Longitude', ylabel="Latitude") +``` - if len(hrefs) == 1: - with rasterio.open(hrefs[0]) as ds: - raster = ds.read() - transform = ds.transform - else: - raster, transform = merge(hrefs) + +We'll construct a colormap using a dictionary of RGBA values (i.e., tuples with three integer entries between 0 and 255 and a fourth floating-point entry between 0.0 and 1.0 for transparency). + - show(raster, ax=ax[k], transform=transform, interpolation='none') - cx.add_basemap(ax[k], crs=crs, zoom=9, source=cx.providers.OpenStreetMap.Mapnik) - show(raster, ax=ax[k], transform=transform, interpolation='none', cmap=cmap) +```python jupyter={"source_hidden": true} +COLORS = { k:(0,0,0,0.0) for k in range(256) } +COLORS.update({6: (255,0,0,1.0)}) +image_opts.update(cmap=list(COLORS.values())) +``` - scalebar = AnchoredSizeBar(ax[k].transData, - 10000 , '10 km', 'lower right', - color='black', - frameon=False, - pad = 0.25, - sep=5, - fontproperties = {'weight':'semibold', 'size':12}, - size_vertical=300) + +As usual, we'll start by slicing smaller images to make sure the call to `hvplot.image` works as intended. We can reduce the value of the parameter `steps` to `1` or `None` to get the images rendered at full resolution. + - ax[k].add_artist(scalebar) - ax[k].ticklabel_format(axis='both', style='scientific',scilimits=(0,0),useOffset=False,useMathText=True) - ax[k].set_xlabel('UTM easting (meters)') - ax[k].set_ylabel('UTM northing (meters)') - ax[k].set_title(f"Disturbance extent on: {date.strftime('%Y-%m-%d')}") +```python jupyter={"source_hidden": true} +steps = 100 +subset = slice(0,None, steps) +view = snapshots.isel(longitude=subset, latitude=subset) +(view.hvplot.image(**image_opts).opts(**layout_opts) * basemap).layout() ``` -The above code is inefficient; we're making repeated calls to `merge` that download the same cloud data more tha once. With some refactoring, we can be more efficient. But notice how the use of DataFrames, Series, and related data structures interactively shapes our analysis. +If we remove the call to `.layout`, we can produce an interactive slider that shows the progress of the wildfire using all the rasters in `stack`. +```python jupyter={"source_hidden": true} +steps = 100 +subset = slice(0,None, steps) +view = stack.isel(longitude=subset, latitude=subset,) +(view.hvplot.image(**image_opts).opts(**layout_opts) * basemap) +``` + --- diff --git a/book/04_Case_Studies/02_Lake_Mead_Mosaicking.md b/book/04_Case_Studies/02_Lake_Mead_Mosaicking.md new file mode 100644 index 0000000..3a43cda --- /dev/null +++ b/book/04_Case_Studies/02_Lake_Mead_Mosaicking.md @@ -0,0 +1,456 @@ +--- +jupyter: + jupytext: + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.4 + kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +# Generating a Mosaicked Image of Lake Mead + + +[Lake Mead](https://en.wikipedia.org/wiki/Lake_Mead) is a water reservoir in southwestern United States and is significant for irrigation in neighboring states. 
The lake has experienced significant drought over the past decade and particularly between 2020 & 2023. In this notebook, we'll find GeoTIFF data related to this lake and synthesize several raster files to produce a plot.

---


## Outline of steps for analysis


+ Identifying search parameters
  + AOI, time-window
  + Endpoint, Provider, catalog identifier ("short name")
+ Obtaining search results
  + Introspect, examine to identify features, bands of interest
  + Wrap results into a DataFrame for easier exploration
+ Exploring & refining search results
  + Identify granules of highest value
  + Filter extraneous granules with minimal contribution
  + Assemble relevant filtered granules into DataFrame
  + Identify kind of output to generate
+ Data-wrangling to produce relevant output
  + Download relevant granules into Xarray DataArray, stacked appropriately
  + Do intermediate computations as necessary
  + Assemble relevant data slices into visualization


---


### Preliminary imports

```python jupyter={"source_hidden": true}
from warnings import filterwarnings
filterwarnings('ignore')
import numpy as np, pandas as pd, xarray as xr
import rioxarray as rio
```

```python jupyter={"source_hidden": true}
# Imports for plotting
import hvplot.pandas, hvplot.xarray
import geoviews as gv
from geoviews import opts
gv.extension('bokeh')
from bokeh.models import FixedTicker
```

```python jupyter={"source_hidden": true}
# STAC imports to retrieve cloud data
from pystac_client import Client
from osgeo import gdal
# GDAL setup for accessing cloud data
gdal.SetConfigOption('GDAL_HTTP_COOKIEFILE','~/.cookies.txt')
gdal.SetConfigOption('GDAL_HTTP_COOKIEJAR', '~/.cookies.txt')
gdal.SetConfigOption('GDAL_DISABLE_READDIR_ON_OPEN','EMPTY_DIR')
gdal.SetConfigOption('CPL_VSIL_CURL_ALLOWED_EXTENSIONS','TIF, TIFF')
```

### Convenient utilities


These functions could be placed in module files for more developed research projects. For learning purposes, they are embedded within this notebook.


```python jupyter={"source_hidden": true}
# simple utility to make a rectangle with given center of width dx & height dy
def make_bbox(pt,dx,dy):
    '''Returns bounding-box represented as tuple (x_lo, y_lo, x_hi, y_hi)
    given inputs pt=(x, y), width & height dx & dy respectively,
    where x_lo = x-dx/2, x_hi=x+dx/2, y_lo = y-dy/2, y_hi = y+dy/2.
    '''
    return tuple(coord+sgn*delta for sgn in (-1,+1) for coord,delta in zip(pt, (dx/2,dy/2)))
```

```python jupyter={"source_hidden": true}
# simple utility to plot an AOI or bounding-box
def plot_bbox(bbox):
    '''Given bounding-box, returns GeoViews plot of Rectangle & Point at center
    + bbox: bounding-box specified as (lon_min, lat_min, lon_max, lat_max)
    Assume longitude-latitude coordinates.
    '''
    # These plot options are fixed but can be over-ridden
    point_opts = opts.Points(size=12, alpha=0.25, color='blue')
    rect_opts = opts.Rectangles(line_width=0, alpha=0.1, color='red')
    lon_lat = (0.5*sum(bbox[::2]), 0.5*sum(bbox[1::2]))
    return (gv.Points([lon_lat]) * gv.Rectangles([bbox])).opts(point_opts, rect_opts)
```

```python jupyter={"source_hidden": true}
# utility to extract search results into a Pandas DataFrame
def search_to_dataframe(search):
    '''Constructs Pandas DataFrame from PySTAC Earthdata search results.
    DataFrame columns are determined from search item properties and assets.
    + 'asset': string identifying an Asset type associated with a granule
    + 'href': data URL for file associated with the Asset in a given row.'''
    granules = list(search.items())
    assert granules, "Error: empty list of search results"
    props = list({prop for g in granules for prop in g.properties.keys()})
    tile_ids = map(lambda granule: granule.id.split('_')[3], granules)
    rows = (([g.properties.get(k, None) for k in props] + [a, g.assets[a].href, t])
            for g, t in zip(granules,tile_ids) for a in g.assets )
    df = pd.concat(map(lambda x: pd.DataFrame(x, index=props+['asset','href', 'tile_id']).T, rows),
                   axis=0, ignore_index=True)
    assert len(df), "Empty DataFrame"
    return df
```

```python jupyter={"source_hidden": true}
# utility to remap pixel values to a sequence of contiguous integers
def relabel_pixels(data, values, null_val=255, transparent_val=0, replace_null=True, start=0):
    """
    This function accepts a DataArray with a finite number of categorical values as entries.
    It reassigns the pixel labels to a sequence of consecutive integers starting from start.
    data: Xarray DataArray with finitely many categories in its array of values.
    values: list of pixel values expected in data (if empty, they are computed from data).
    null_val: (default 255) Pixel value used to flag missing data and/or exceptions.
    transparent_val: (default 0) Pixel value that will be fully transparent when rendered.
    replace_null: (default True) Maps null_val->transparent_val everywhere in data.
    start: (default 0) starting range of consecutive integer values for new labels.
    The values returned are:
    new_data: Xarray DataArray containing pixels with new values
    relabel: dictionary associating old pixel values with new pixel values
    """
    new_data = data.copy(deep=True)
    if values:
        values = np.sort(np.array(values, dtype=np.uint8))
    else:
        values = np.sort(np.unique(data.values.flatten()))
    if replace_null:
        new_data = new_data.where(new_data!=null_val, other=transparent_val)
        values = values[np.where(values!=null_val)]
    n_values = len(values)
    new_values = np.arange(start=start, stop=start+n_values, dtype=values.dtype)
    assert transparent_val in new_values, f"{transparent_val=} not in {new_values}"
    relabel = dict(zip(values, new_values))
    for old, new in relabel.items():
        if new==old: continue
        new_data = new_data.where(new_data!=old, other=new)
    return new_data, relabel
```

---


## Identifying search parameters


We'll identify a geographic point near the north shore of [Lake Mead](https://en.wikipedia.org/wiki/Lake_Mead), make a bounding box, and choose a date range that covers March and part of April 2023.

```python jupyter={"source_hidden": true}
lake_mead = (-114.754, 36.131)
AOI = make_bbox(lake_mead, 0.1, 0.1)
DATE_RANGE = "2023-03-01/2023-04-15"
```

```python jupyter={"source_hidden": true}
# Optionally plot the AOI
basemap = gv.tile_sources.OSM(width=500, height=500, padding=0.1)
plot_bbox(AOI) * basemap
```

```python jupyter={"source_hidden": true}
search_params = dict(bbox=AOI, datetime=DATE_RANGE)
print(search_params)
```

---


## Obtaining search results


As usual, we'll specify the search endpoint, provider, & catalog. For the DSWx family of data products, these are as follows.
+ + +```python jupyter={"source_hidden": true} +ENDPOINT = 'https://cmr.earthdata.nasa.gov/stac' +PROVIDER = 'POCLOUD' +COLLECTIONS = ["OPERA_L3_DSWX-HLS_V1_1.0"] +# Update the dictionary opts with list of collections to search +search_params.update(collections=COLLECTIONS) +print(search_params) +``` + +```python jupyter={"source_hidden": true} +catalog = Client.open(f'{ENDPOINT}/{PROVIDER}/') +search_results = catalog.search(**search_params) +``` + + +We convert the search results to a `DataFrame` for easier perusal. + + +```python jupyter={"source_hidden": true} +df = search_to_dataframe(search_results) +display(df.head()) +df.info() +``` + + +We'll clean the DataFrame `df` in standard ways by: ++ dropping the `start_datetime` & `end_datetime` columns; ++ renaming the `eo:cloud_cover` column; ++ converting the columns to suitable datatypes; and ++ assigning the `datetime` column as the index. + + +```python jupyter={"source_hidden": true} +df.datetime = pd.DatetimeIndex(df.datetime) +df = df.drop(['start_datetime', 'end_datetime'], axis=1) +df = df.rename({'eo:cloud_cover':'cloud_cover'}, axis=1) +df['cloud_cover'] = df['cloud_cover'].astype(np.float16) +for col in ['asset', 'href', 'tile_id']: + df[col] = df[col].astype(pd.StringDtype()) +df = df.set_index('datetime').sort_index() +``` + +```python jupyter={"source_hidden": true} +display(df.head()) +df.info() +``` + +--- + + +## Exploring & refining search results + + +We can look at the `assets` column to see which different bands are available in the results returned. + + +```python jupyter={"source_hidden": true} +df.asset.value_counts() +``` + + +The `0_B01_WTR` band is the one that we want to work with later. + +We can also see how much cloud cover there is in our search results. + + +```python jupyter={"source_hidden": true} +df.cloud_cover.agg(['min','mean','median','max']) +``` + + +We can extract selected rows from the `DataFrame` using boolean `Series`. Specifically, we'll select the rows that have less than 10% cloud cover and in which the `asset` is the `0_B01_WTR` band. + + +```python jupyter={"source_hidden": true} +c1 = (df.cloud_cover <= 10) +c2 = (df.asset.str.contains('B01_WTR')) +b01_wtr = df.loc[c1 & c2].drop(['asset', 'cloud_cover'], axis=1) +b01_wtr +``` + + +Finally, we can see how many different MGRS tiles intersect our AOI. + + +```python jupyter={"source_hidden": true} +b01_wtr.tile_id.value_counts() +``` + + +There are four distinct geographic tiles that intersect this particular AOI. + +--- + + +## Data-wrangling to produce relevant output + + +This time, we'll use a technique called *mosaicking* to combine raster data from adjacent tiles into a single raster data set. This requires the `rasterio.merge.merge` function as before. We'll also need the function `rasterio.transform.array_bounds` to get the coordinates aligned properly. + + +```python jupyter={"source_hidden": true} +import rasterio +from rasterio.merge import merge +from rasterio.transform import array_bounds +``` + + +We've used the function `merge` before to combine distinct raster data sets associated with a single MGRS tile. This time, the raster data merged will come from adjacent MGRS tiles. In calling the `merge` function in the next code cell, the column `b01_wtr.href` will be treated as a list of URLs ([Uniform Resource Locators](https://en.wikipedia.org/wiki/Uniform_Resource_Locator)). For each URL in that list, a GeoTIFF file will be downloaded and processed. 
The net result is a mosaicked image, i.e., a merged raster that contains a combination of all the rasters. The specifics of the merging algorithm are described in the [`rasterio.merge` module documentation](https://rasterio.readthedocs.io/en/latest/api/rasterio.merge.html). + + +```python jupyter={"source_hidden": true} +%%time +mosaicked_img, mosaic_transform = merge(b01_wtr.href) +``` + + +The output again consists of a NumPy array and coordinate transformation. + + +```python jupyter={"source_hidden": true} +print(f"{type(mosaicked_img)=}\n") +print(f"{mosaicked_img.shape=}\n") +print(f"{type(mosaic_transform)=}\n") +print(f"{mosaic_transform=}\n") +``` + + +The entries of `mosaic_transform` describe an [*affine transformation*](https://en.wikipedia.org/wiki/Affine_transformation) from pixel coordinates to continuous UTM coordinates. In particular: + ++ the entry `mosaic_transform[0]` is the horizontal width of each pixel in metres; and ++ the entry `mosaic_transform[4]` is the vertical height of each pixel in metres. + +Notice also that, in this instance, `mosaic_transform[4]` is a negative value (i.e., `mosaic_transform[4]==-30.0`). This tells us that the orientation of the continuous coordinate vertical axis opposes the orientation of the vertical pixel coordinate axis, i.e., the vertical continuous coordinate decreases in a downwards direction unlike the vertical pixel coordinate. + +When we pass the object `mosaic_transform` into the `rasterio.transform.array_bounds` function, the value returned is a bounding-box, i.e., a tuple of the form `(x_min, y_min, x_max, y_max)` describing the lower-left and upper-right corners of the resulting mosaicked image in continuous UTM coordinates. + + +```python jupyter={"source_hidden": true} +bounds = array_bounds(*mosaicked_img.shape[1:], mosaic_transform) + +bounds +``` + + +Combining all the preceding information allows us to reconstruct the continuous UTM coordinates associated with each pixel. We'll compute arrays for these continuous coordinates and label them `longitude` and `latitude`. These coordinates would more accurately be called `easting` and `northing`, but we'll use the labels `longitude` and `latitude` respectively when we attach the coordinate arrays to an Xarray `DataArray`. We choose these labels because, when the raster data is plotted with `hvplot.image`, the output will use longitude-latitude coordinates. + + +```python jupyter={"source_hidden": true} +longitude = np.arange(bounds[0], bounds[2], mosaic_transform[0]) +latitude = np.arange(bounds[3], bounds[1], mosaic_transform[4]) +``` + + +We'll wrap the mosaicked image and the relevant metadata into an Xarray `DataArray`. + + +```python jupyter={"source_hidden": true} +raster = xr.DataArray( + data=mosaicked_img, + dims=["band", "latitude", "longitude"], + coords=dict( + longitude=(["longitude"], longitude), + latitude=(["latitude"], latitude), + ), + attrs=dict( + description="OPERA DSWx B01", + units=None, + ), + ) +raster +``` + + +We need to attach a CRS object to the `raster` object. To do so, we'll use `rasterio.open` to load the relevant metadata from one of the granules associated with `b01_wtr` (these should be the same for all these particular files). + + +```python jupyter={"source_hidden": true} +with rasterio.open(b01_wtr.href[0]) as ds: + crs = ds.crs + +raster.rio.write_crs(crs, inplace=True) +print(raster.rio.crs) +``` + + +In research code, we could bundle the preceding commands within a function and save that to a module. 
We won't do that here because, for the purposes of this tutorial, it's preferable to make sure that we can examine the output of various lines of code interactively.

With all the preceding steps completed, we're ready to produce a plot of the mosaicked raster. We'll relabel the pixel values so that the colorbar in the final result will be tidier (we include `252` in `values` even if snow/ice pixels happen to be absent, so that the relabelled values line up with the colorbar categories defined below).

```python jupyter={"source_hidden": true}
# include 252 (Snow/Ice) so categories align with the colorbar labels below
raster, relabel = relabel_pixels(raster, values=[0,1,2,252,253,254,255])
```

We'll define image options, layout options, & a colormap in dictionaries as we've done previously to produce a single plot.

```python jupyter={"source_hidden": true}
image_opts = dict(
                    x='longitude',
                    y='latitude',
                    rasterize=True,
                    dynamic=True,
                    crs=raster.rio.crs
                 )
layout_opts = dict(
                    xlabel='Longitude',
                    ylabel='Latitude',
                  )
```

```python jupyter={"source_hidden": true}
# Define a colormap using RGBA values; these need to be written manually here...
COLORS = {
0: (255, 255, 255, 0.0),  # Not Water
1: (0, 0, 255, 1.0),      # Open Water
2: (180, 213, 244, 1.0),  # Partial Surface Water
3: ( 0, 255, 255, 1.0),   # Snow/Ice
4: (175, 175, 175, 1.0),  # Cloud/Cloud Shadow
5: ( 0, 0, 127, 0.5),     # Ocean Masked
}

c_labels = ["Not Water", "Open Water", "Partial Surface Water", "Snow/Ice",
            "Cloud/Cloud Shadow", "Ocean Masked"]
c_ticks = list(COLORS.keys())
limits = (c_ticks[0]-0.5, c_ticks[-1]+0.5)

c_bar_opts = dict( ticker=FixedTicker(ticks=c_ticks),
                   major_label_overrides=dict(zip(c_ticks, c_labels)),
                   major_tick_line_width=0, )

image_opts.update({ 'cmap': list(COLORS.values()),
                    'clim': limits,
                  })

layout_opts.update(dict(colorbar_opts=c_bar_opts))
```

We'll define the basemap as a separate object to overlay using the `*` operator.

```python jupyter={"source_hidden": true}
basemap = gv.tile_sources.ESRI(frame_width=500, frame_height=500, padding=0.05, alpha=0.25)
```

Finally, we can use the builtin Python `slice` function to extract downsampled images quickly before trying to view the entire image. Remember, reducing the value `steps` to `1` (or `None`) plots the raster at full resolution.

```python jupyter={"source_hidden": true}
%%time
steps = 1
view = raster.isel(longitude=slice(0,None,steps), latitude=slice(0,None,steps)).squeeze()

view.hvplot.image(**image_opts).opts(**layout_opts) * basemap
```

This raster is much larger than the ones we've previously examined (requiring roughly 4 times the storage). This process could be iterated to make a slider showing the merged results from neighboring tiles at different times. This, of course, requires that there is enough memory available.

---
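
### Aside: iterating the mosaicking process

As a closing sketch (not executed here), the iteration described above might be organized as follows. This assumes sufficient memory and a time-indexed Pandas series `by_day` whose entries are lists of granule URLs (like the one constructed in the wildfires notebook); it also passes the `bounds` computed earlier to `merge` so that every slice shares the same extent and can be stacked.

```python jupyter={"source_hidden": true}
# Hypothetical sketch: one mosaicked slice per date, stacked along 'time'.
# Assumes by_day (dates -> lists of GeoTIFF URLs) plus the bounds, latitude,
# longitude, and raster objects defined above.
slices = []
for date, hrefs in by_day.items():
    img, _ = merge(hrefs, bounds=bounds)  # fix the extent so shapes agree
    slices.append(img)
time_stack = xr.DataArray(
    data=np.concatenate(slices, axis=0),
    dims=["time", "latitude", "longitude"],
    coords=dict(time=by_day.index.values, latitude=latitude, longitude=longitude),
)
time_stack.rio.write_crs(raster.rio.crs, inplace=True)
# Plotting time_stack with hvplot.image (as above) would then yield a date slider.
```

---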