Visualizing cloud-optimized geotiffs #370

rsignell-usgs · 2019-10-25T20:14:23Z

Is your feature request related to a problem? Please describe.
Cloud-Optimized Geotiffs are quickly becoming a community standard for storing geospatial raster data on the Cloud. These COGs store the data in small tiles, and have built-in overviews, which makes extracting data fast (especially when using parallel processing). And the cogeo tools make it easy to create and validate COGs.

There is a AWS-lambda service that can create a WMTS tile service on-the-fly from any COG on object storage, and this works great in geoviews: https://nbviewer.jupyter.org/gist/jsignell/9980e640536180b44fb59e7812208e27/cog_tiling_small_panel.ipynb

But it would be way cooler for geoviews to be able to extract the chunks a render them in parallel using dask, so that on-the-fly service wouldn't even be needed.

Describe the solution you'd like
I'd like some tools to extract and visualize the correct chunks from the most appropriate overviews in a COG, using Dask to parallelize the effort. This would allow us to use geoviews to effectively and interactively browse any COG to full resolution in the browser without additional services.

Describe alternatives you've considered
We can use the "on-the-fly" WMTS service, but someone needs to run this and it also costs $$.

It would be way cooler to have this functionality native to geoviews!

rsignell-usgs · 2020-10-22T19:28:57Z

@jbednar, what would it take to get this on the docket?

jbednar · 2020-10-22T19:55:26Z

I'm not sure. From pangeo-data/pangeo-example-notebooks#21 it sounds like there has already been some effort at improving access to COGs for xarray usage? Is any of that simply usable with geoviews already, i.e. to use rasterio/gdal/etc. to make a lazy xarray that GeoViews can use without further work? I haven't looked closely at what's been done and what remains...

scottyhq · 2020-10-26T19:10:51Z

@jbednar and @rsignell-usgs the linked issue above is one of many where we've speculated on the ability to make xarray+dask+holoviews more efficient working with COGs . there hasn't been a lot of concentrated effort. I think a on-the-fly WMTS layer is a special case, and there are some lower-hanging fruit improvements that could be made that i'll try to clarify below.

Consider a single COG (maybe we open a separate issue in holoviews/hvplot):

import os, rioxarray, hvplot.xarray
os.environ['GDAL_DISABLE_READDIR_ON_OPEN']='EMPTY_DIR'
os.environ['AWS_NO_SIGN_REQUEST']='YES'
url = 'https://sentinel-s1-rtc-indigo.s3.us-west-2.amazonaws.com/tiles/RTC/1/IW/10/T/ET/2020/S1B_20200106_10TET_ASC/Gamma0_VV.tif'
da = rioxarray.open_rasterio(url)
da.hvplot.image(rasterize=True, cmap='gray', clim=(0,0.4))

This is amazing functionality because as we zoom into the plot we get updating resolution on-the-fly. I could be wrong, but I believe the entire .tif is read into memory and then rasterized according to zoom level? So effectively running something like da.coarsen(x=16,y=16, boundary='pad').mean(). However, a COG has precomputed 'overviews' so you don't actually have to do this computation yourself. For the initial plot for example the code below is extremely fast because instead of pulling 5490x5490 pixels and resampling, just pull the lowest resolution overview (344x344 pixels):

da = rioxarray.open_rasterio(url, overview_level=3)
da.hvplot.image(rasterize=True, cmap='gray', clim=(0,0.4))

It's a complicated stack of pieces coming together here. Ultimately the TIF is read with rasterio.open() and GDAL/libtiff which is clever about only reading from overviews depending on the read request. This code block is equivalent to the above, returning a numpy array.

import rasterio
with rasterio.open(url) as src:
    numpy_array = src.read(out_shape=(src.height//16, src.width//16))

Information on this particular test file:

with rasterio.open(url) as src:
    print(src.profile)  
    overview_factors = [src.overviews(i) for i in src.indexes][0]
    overview_levels = list(range(len(overview_factors)))
    print('Overview levels: ', overview_levels)
    print('Overview factors: ',  overview_factors)

{'driver': 'GTiff', 'dtype': 'float32', 'nodata': 0.0, 'width': 5490, 'height': 5490, 'count': 1, 'crs': CRS.from_epsg(32610), 'transform': Affine(20.0, 0.0, 499980.0,
       0.0, -20.0, 5300040.0), 'blockxsize': 512, 'blockysize': 512, 'tiled': True, 'compress': 'deflate', 'interleave': 'band'}
Overview levels:  [0, 1, 2, 3]
Overview factors:  [2, 4, 8, 16]

In summary, it seems like there is room to improve the rasterize=True efficiency when overviews are present in the underlaying data. I can think of a few approaches:

implement hvplot.cog(rasterize=True) that reads from a given overview depending on zoom level. Or maybe have has_overviews attribute in the DataArray that holoviews can work with.
figure out how to pass out_shape= to the xarray rasterio backend to offload efficient reading of the COG overviews to the underlaying GDAL driver.
Accessing COG overviews with read_rasterio pydata/xarray#3269

rsignell-usgs added the TRIAGE label Oct 25, 2019

philippjfr added type: enhancement and removed TRIAGE labels Oct 28, 2019

ppwadhwa added this to the Wishlist milestone Jul 20, 2020

rosepearson mentioned this issue Oct 3, 2024

Update streamlit niwa/kelp-detection#4

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Visualizing cloud-optimized geotiffs #370

Visualizing cloud-optimized geotiffs #370

rsignell-usgs commented Oct 25, 2019

rsignell-usgs commented Oct 22, 2020

jbednar commented Oct 22, 2020

scottyhq commented Oct 26, 2020

Visualizing cloud-optimized geotiffs #370

Visualizing cloud-optimized geotiffs #370

Comments

rsignell-usgs commented Oct 25, 2019

rsignell-usgs commented Oct 22, 2020

jbednar commented Oct 22, 2020

scottyhq commented Oct 26, 2020