Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Visualizing cloud-optimized geotiffs #370

Open
rsignell-usgs opened this issue Oct 25, 2019 · 3 comments
Open

Visualizing cloud-optimized geotiffs #370

rsignell-usgs opened this issue Oct 25, 2019 · 3 comments

Comments

@rsignell-usgs
Copy link
Contributor

Is your feature request related to a problem? Please describe.
Cloud-Optimized Geotiffs are quickly becoming a community standard for storing geospatial raster data on the Cloud. These COGs store the data in small tiles, and have built-in overviews, which makes extracting data fast (especially when using parallel processing). And the cogeo tools make it easy to create and validate COGs.

There is a AWS-lambda service that can create a WMTS tile service on-the-fly from any COG on object storage, and this works great in geoviews: https://nbviewer.jupyter.org/gist/jsignell/9980e640536180b44fb59e7812208e27/cog_tiling_small_panel.ipynb

But it would be way cooler for geoviews to be able to extract the chunks a render them in parallel using dask, so that on-the-fly service wouldn't even be needed.

Describe the solution you'd like
I'd like some tools to extract and visualize the correct chunks from the most appropriate overviews in a COG, using Dask to parallelize the effort. This would allow us to use geoviews to effectively and interactively browse any COG to full resolution in the browser without additional services.

Describe alternatives you've considered
We can use the "on-the-fly" WMTS service, but someone needs to run this and it also costs $$.

It would be way cooler to have this functionality native to geoviews!

@ppwadhwa ppwadhwa added this to the Wishlist milestone Jul 20, 2020
@rsignell-usgs
Copy link
Contributor Author

@jbednar, what would it take to get this on the docket?

@jbednar
Copy link
Member

jbednar commented Oct 22, 2020

I'm not sure. From pangeo-data/pangeo-example-notebooks#21 it sounds like there has already been some effort at improving access to COGs for xarray usage? Is any of that simply usable with geoviews already, i.e. to use rasterio/gdal/etc. to make a lazy xarray that GeoViews can use without further work? I haven't looked closely at what's been done and what remains...

@scottyhq
Copy link

@jbednar and @rsignell-usgs the linked issue above is one of many where we've speculated on the ability to make xarray+dask+holoviews more efficient working with COGs . there hasn't been a lot of concentrated effort. I think a on-the-fly WMTS layer is a special case, and there are some lower-hanging fruit improvements that could be made that i'll try to clarify below.

Consider a single COG (maybe we open a separate issue in holoviews/hvplot):

import os, rioxarray, hvplot.xarray
os.environ['GDAL_DISABLE_READDIR_ON_OPEN']='EMPTY_DIR'
os.environ['AWS_NO_SIGN_REQUEST']='YES'
url = 'https://sentinel-s1-rtc-indigo.s3.us-west-2.amazonaws.com/tiles/RTC/1/IW/10/T/ET/2020/S1B_20200106_10TET_ASC/Gamma0_VV.tif'
da = rioxarray.open_rasterio(url)
da.hvplot.image(rasterize=True, cmap='gray', clim=(0,0.4))

This is amazing functionality because as we zoom into the plot we get updating resolution on-the-fly. I could be wrong, but I believe the entire .tif is read into memory and then rasterized according to zoom level? So effectively running something like da.coarsen(x=16,y=16, boundary='pad').mean(). However, a COG has precomputed 'overviews' so you don't actually have to do this computation yourself. For the initial plot for example the code below is extremely fast because instead of pulling 5490x5490 pixels and resampling, just pull the lowest resolution overview (344x344 pixels):

da = rioxarray.open_rasterio(url, overview_level=3)
da.hvplot.image(rasterize=True, cmap='gray', clim=(0,0.4))

It's a complicated stack of pieces coming together here. Ultimately the TIF is read with rasterio.open() and GDAL/libtiff which is clever about only reading from overviews depending on the read request. This code block is equivalent to the above, returning a numpy array.

import rasterio
with rasterio.open(url) as src:
    numpy_array = src.read(out_shape=(src.height//16, src.width//16))

Information on this particular test file:

with rasterio.open(url) as src:
    print(src.profile)  
    overview_factors = [src.overviews(i) for i in src.indexes][0]
    overview_levels = list(range(len(overview_factors)))
    print('Overview levels: ', overview_levels)
    print('Overview factors: ',  overview_factors)
{'driver': 'GTiff', 'dtype': 'float32', 'nodata': 0.0, 'width': 5490, 'height': 5490, 'count': 1, 'crs': CRS.from_epsg(32610), 'transform': Affine(20.0, 0.0, 499980.0,
       0.0, -20.0, 5300040.0), 'blockxsize': 512, 'blockysize': 512, 'tiled': True, 'compress': 'deflate', 'interleave': 'band'}
Overview levels:  [0, 1, 2, 3]
Overview factors:  [2, 4, 8, 16]

In summary, it seems like there is room to improve the rasterize=True efficiency when overviews are present in the underlaying data. I can think of a few approaches:

  1. implement hvplot.cog(rasterize=True) that reads from a given overview depending on zoom level. Or maybe have has_overviews attribute in the DataArray that holoviews can work with.

  2. figure out how to pass out_shape= to the xarray rasterio backend to offload efficient reading of the COG overviews to the underlaying GDAL driver.
    Accessing COG overviews with read_rasterio pydata/xarray#3269

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants