WarpedVRT.read(masked=True) and WarpedVRT.read_masks don't use overviews #1373

mojodna · 2018-06-12T20:44:50Z

In marblecutter, I've been using external masks with overviews and treating them as normal sources in order to take advantage of the overviews:

https://github.com/mojodna/marblecutter/blob/a4f66cbf7124f25cf15bec8bc0e37ef195119de6/marblecutter/__init__.py#L263-L327

I was thinking that #1361 would eliminate the need for this, but it turns out that the main reason that I'd done that in the first place was that overviews aren't used by read_masks (and when masked=True). (This can be observed both by watching a clock (masked reads are noticeably slower) and by enabling CPL_CURL_VERBOSE when reading remote sources.)

The text was updated successfully, but these errors were encountered:

sgillies · 2018-06-12T22:58:24Z

Thanks for pointing me at marblecutter's mask reader. That's what I'd like to accomplish in Rasterio, but for nodata more generally (nodata values, alpha bands, sidecars, and internal bitmasks).

As far as I can tell, reading a mask from Rasterio's WarpedVRT when the source has a sidecar or internal bitmask is going to be super slow (requiring materialization of much of the source) until we can get a proper VRT mask band added to the result of GDALCreateWarpedVRT in Rasterio's WarpedVRTReaderBase.

mojodna · 2018-06-12T23:40:48Z

Yup. I would love to use internal masks some day, but this is a reasonable workaround in the meantime.

sgillies · 2018-06-16T21:45:49Z

@mojodna #1378 forbids boundless reads from WarpedVRTs, because the boundless read implementation now uses an ordinary VRT (as an XML string) and there's no way to reference a WarpedVRT from the XML. I think that you should be able to make the call at https://github.com/mojodna/marblecutter/blob/a4f66cbf7124f25cf15bec8bc0e37ef195119de6/marblecutter/__init__.py#L277-L279 obsolete by setting the parent WarpedVRT's extents to this window.

Before I removed WarpedVRT from the boundless read implementation I found that I needed to buffer its extents by a few pixels in order to trigger GDAL's overview use. That's why the width and height of the WarpedVRT in the old boundless read impl were incremented by 1.

mojodna · 2018-06-18T18:06:55Z

by setting the parent WarpedVRT's extents to this window

Sorry, I don't follow. Are you suggesting that I set the extent for: https://github.com/mojodna/marblecutter/blob/a4f66cbf7124f25cf15bec8bc0e37ef195119de6/marblecutter/__init__.py#L264-L272

If so, how would I do that? Through the combination of dst_width, dst_height, and dst_transform? I'm unclear on how I'd calculate the transform.

Before I removed WarpedVRT from the boundless read implementation I found that I needed to buffer its extents by a few pixels in order to trigger GDAL's overview use. That's why the width and height of the WarpedVRT in the old boundless read impl were incremented by 1.

Does rasterio include a simple way to buffer? (It's been a while since I've touched this bit of code, so I'm less confident in my ability to buffer it myself.)

sgillies · 2018-06-19T03:16:54Z

@mojodna I haven't tested the code below, but given bounds and dst_transform it should be equivalent to your current function and work with Rasterio 1.0b2 unless I've slipped up.

From the bounds and dst_transform we can compute the transform of a VRT that will cover the bounds at the same resolution (have its origin at the upper left of the bounding box). Then we compute the number of rows and columns of pixels it would take to cover the bounds at that resolution.

Use these to construct a WarpedVRT and then read the full extent of it into the target_shape.

    def _read_data():

        w, s, e, n = bounds.bounds
        vrt_transform = Affine.translation(w, n) * Affine.scale(dst_transform.a, dst_transform.e) * Affine.identity()
        vrt_width = math.ceil((e - w) / dst_transform.a)
        vrt_height = math.ceil((s - n) / dst_transform.e)

        with WarpedVRT(
            src,
            src_nodata=nodata,
            dst_crs=bounds.crs,
            dst_width=vrt_width,
            dst_height=vrt_height,
            dst_transform=vrt_transform,
            resampling=resampling,
        ) as vrt:

            data = vrt.read(out_shape=(vrt.count,) + target_shape)

            # mask with NODATA values
            if nodata is not None and vrt.nodata is not None:
                data = _mask(data, vrt.nodata)
            else:
                data = np.ma.masked_array(data, mask=np.ma.nomask)

            return data

Please disregard what I wrote about the pixel buffer, it's not relevant. But if you did want to pad the VRT for other reasons, you could translate its transform matrix west and north by a pixel's distance, add 2 pixels to the width and height, and then call vrt.read(window=vrt.window(*bounds.bounds)).

The key to triggering use of the overviews is this: the resolution of your output, defined by out_shape in your case, must be smaller than the VRT resolution. For example, if your VRT is 1024 x 1024 pixels, you could expect to use any 2x overviews when you call read(out_shape=(3, 512, 512)).

I have an example at https://github.com/mapbox/rasterio/blob/master/tests/test_warpedvrt.py#L329 of external overviews being used when decreasing the resolution of read data by a factor of two.

vincentsarago · 2018-06-19T17:50:18Z

FYI, here is how I manage to fetch the overview and doing a boundless like tile read (using ☝️ Sean guidance) https://gist.github.com/vincentsarago/bf2949dc0e3488def3caab71f3ced416

mojodna · 2018-07-06T21:33:32Z

I'm still having problems with this. When overviews are triggered on a source with an internal mask, the materialized mask appears to be incorrect:

# rio insp https://mojodna-temp.s3.amazonaws.com/internal-mask.tif

from affine import Affine
from rasterio.vrt import WarpedVRT


# 21/1251419/852238
transform1 = Affine(0.04, 0, 3876179.03, 0, -0.04, 3751873.31)
bounds1 = (3876179.0321125016, 3751854.20560666, 3876198.141369573, 3751873.314863731)

# 20/625709/426119
transform2 = Affine(0.04, 0, 3876159.92, 0, -0.04, 3751873.31)
bounds2 = (3876159.9228554303, 3751835.0963495895, 3876198.141369573, 3751873.314863731)

with WarpedVRT(
    src,
    src_nodata=None,
    crs="EPSG:3857",
    width=480,
    height=480,
    transform=transform1,
    add_alpha=True,
) as vrt:
    alpha = vrt.read(4, out_shape=(512, 512), window=vrt.window(*bounds1))
    print(alpha)
    assert(alpha[0][-1] == 0)
    
with WarpedVRT(
    src,
    src_nodata=None,
    crs="EPSG:3857",
    width=960,
    height=960,
    transform=transform2,
    add_alpha=True,
) as vrt:
    alpha = vrt.read(4, out_shape=(512, 512), window=vrt.window(*bounds2))
    print(alpha)
    assert(alpha[0][-1] == 0)

21/1251419/852238:

20/625709/426119:

vincentsarago · 2018-07-06T23:04:25Z

@mojodna I'm pretty sure it's a conflict because your file has internal nodata value

gdalinfo internal-mask.tif | grep "NoData"
  NoData Value=-10000
  NoData Value=-10000
  NoData Value=-10000

I ran your file through rio-cogeo and the result looks 👌

mojodna · 2018-07-06T23:09:23Z

-10000 isn't a valid NODATA value for a byte source though...

I'll see if I can track down the original to figure out where that came from.

mojodna · 2018-07-06T23:10:51Z

Oh, and I'm explicitly stating that src_nodata=None

mojodna · 2018-07-06T23:11:56Z

>>> with WarpedVRT(
...     src,
...     src_nodata=None,
...     crs="EPSG:3857",
...     width=480,
...     height=480,
...     transform=transform1,
...     add_alpha=True,
... ) as vrt:
...     print(vrt.nodata)
...     print(src.nodata)
None
None

mojodna · 2018-07-06T23:13:27Z

@vincentsarago can you share the generated COG? I'd like to see how it's handling masking.

vincentsarago · 2018-07-06T23:19:02Z

@mojodna https://s3-us-west-2.amazonaws.com/remotepixel-pub/cog/internal-mask_cogeo.tif

created it just by doing rio cogeo internal-mask.tif internal-mask_cogeo.tif

mojodna · 2018-07-06T23:32:33Z

Hmm. According to gdalinfo, NoData is the only substantive difference.

According to rio info, they're identical. rio insp says src.nodata == None.

rasterio says: https://github.com/mapbox/rasterio/blob/6e1b93d7089c82301dba73e0337a52ba8a44ba6c/rasterio/_base.pyx#L458-L463

But GDAL is apparently doing something else under the hood under these circumstances.

sgillies · 2018-07-09T15:30:40Z

@mojodna is it correct to say that at zoom level 21, a proper alpha channel is sourced from the internal bitmask, but at zoom 20, the alpha channel is not sourced from the internal bitmask?

What is the native resolution of your source image? Does it have overviews?

@rouault does this sound like a known (or unknown) GDAL bug?

rouault · 2018-07-09T16:00:14Z

I've tried

gdalwarp /vsicurl/https://mojodna-temp.s3.amazonaws.com/internal-mask.tif out.vrt -overwrite -t_srs EPSG:3857 -ts 960 960 -te 3876159.9228554303 3751835.0963495895 3876198.141369573 3751873.314863731 -dstalpha  -of vrt -srcnodata none
gdal_translate out.vrt out.tif

and the result looks good.
That said I see that rasterio can take GDALAutoCreateWarpedVRT() or GDALCreateWarpedVRT(). If it takes GDALAutoCreateWarpedVRT(), then the source nodata value is automatically applied in the warping options (gdalwarp -of VRT takes another code path), which might explain the conflict with the internal mask. Having both nodata and mask is a recipee for trouble.

mojodna · 2018-07-09T16:00:25Z

Correct. I'm treating zoom 21 (w/ 512x512px output) as "native resolution".

Native resolution is ~3.3cm (UTM). Image overviews and mask overviews, both.

Size is 13266, 12631
Coordinate System is:
PROJCS["WGS 84 / UTM zone 36N",
    GEOGCS["WGS 84",
        DATUM["WGS_1984",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]],
            AUTHORITY["EPSG","6326"]],
        PRIMEM["Greenwich",0,
            AUTHORITY["EPSG","8901"]],
        UNIT["degree",0.0174532925199433,
            AUTHORITY["EPSG","9122"]],
        AUTHORITY["EPSG","4326"]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",33],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]],
    AXIS["Easting",EAST],
    AXIS["Northing",NORTH],
    AUTHORITY["EPSG","32636"]]
Pixel Size = (0.033760000000000,-0.033760000000000)
Band 1 Block=512x512 Type=Byte, ColorInterp=Red
  NoData Value=-10000
  Overviews: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395
  Mask Flags: PER_DATASET
  Overviews of mask band: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395
Band 2 Block=512x512 Type=Byte, ColorInterp=Green
  NoData Value=-10000
  Overviews: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395
  Mask Flags: PER_DATASET
  Overviews of mask band: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395
Band 3 Block=512x512 Type=Byte, ColorInterp=Blue
  NoData Value=-10000
  Overviews: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395
  Mask Flags: PER_DATASET
  Overviews of mask band: 6633x6316, 3317x3158, 1659x1579, 830x790, 415x395

@vincentsarago pix4dmapper seems to be the culprit which introduced the bogus NODATA value.

mojodna · 2018-07-09T16:05:46Z

@rouault try gdal_translate -outsize 512 512 out.vrt out.png to trigger use of overviews--that exhibits the same problem as above.

rouault · 2018-07-09T16:29:40Z

OK, I could indeed reproduce the issue with gdal_translate -outsize 512 512 out.vrt out.png. Stepping through the execution shows that the mask is still being read. So I've just looked at the content of the TIFF, and apart from the full resolution image whose mask has values at 0 and 255, all masks of overviews have only values at 255. So the issue is in the data (or the software / software procedure that generated it)

sgillies · 2018-07-09T17:00:25Z

@rouault thanks for the diagnosis! 🙏

mojodna · 2018-07-09T17:44:53Z

@rouault awesome, thanks.

I'm still puzzling through how that might happen; the procedure is here: https://github.com/mojodna/marblecutter-tools/blob/46fed51b65f0f4cb2876cdb01cb214c0575ca64b/bin/transcode.sh#L146-L199

The step that loses the correct mask values is the 2nd gdal_translate where the COG is produced; breaking it down, it seems to be a conflict between -co COPY_SRC_OVERVIEWS=YES and --config GDAL_TIFF_INTERNAL_MASK YES

mojodna · 2018-07-09T17:54:42Z

COPY_SRC_OVERVIEWS=YES doesn't appear to copy the overviews from the mask band of the source (esp. if it's an external mask containing overviews).

mojodna mentioned this issue Jun 12, 2018

Puzzling mask behavior with WarpedVRT dataset masks #1374

Closed

sgillies closed this as completed Jun 20, 2018

mojodna mentioned this issue Jul 13, 2018

External mask overviews aren't copied correctly when creating internally-masked GeoTIFFs OSGeo/gdal#754

Closed

vincentsarago mentioned this issue Aug 13, 2018

Use ColorInterp or MaskFlags to check alpha band cogeotiff/rio-tiler#51

Merged

vincentsarago mentioned this issue Nov 22, 2019

force overview dataset for WarpedVRT cogeotiff/rio-tiler#132

Closed

vincentsarago mentioned this issue Feb 11, 2021

read low resolution version of the data cogeotiff/rio-tiler#348

Open

vincentsarago mentioned this issue Jan 9, 2023

stackstac.stack doesn't read overviews gjoseph92/stackstac#196

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WarpedVRT.read(masked=True) and WarpedVRT.read_masks don't use overviews #1373

WarpedVRT.read(masked=True) and WarpedVRT.read_masks don't use overviews #1373

mojodna commented Jun 12, 2018

sgillies commented Jun 12, 2018

mojodna commented Jun 12, 2018

sgillies commented Jun 16, 2018

mojodna commented Jun 18, 2018

sgillies commented Jun 19, 2018 •

edited

Loading

vincentsarago commented Jun 19, 2018

mojodna commented Jul 6, 2018

vincentsarago commented Jul 6, 2018

mojodna commented Jul 6, 2018

mojodna commented Jul 6, 2018

mojodna commented Jul 6, 2018

mojodna commented Jul 6, 2018

vincentsarago commented Jul 6, 2018

mojodna commented Jul 6, 2018

sgillies commented Jul 9, 2018 •

edited

Loading

rouault commented Jul 9, 2018

mojodna commented Jul 9, 2018

mojodna commented Jul 9, 2018

rouault commented Jul 9, 2018

sgillies commented Jul 9, 2018

mojodna commented Jul 9, 2018 •

edited

Loading

mojodna commented Jul 9, 2018

WarpedVRT.read(masked=True) and WarpedVRT.read_masks don't use overviews #1373

WarpedVRT.read(masked=True) and WarpedVRT.read_masks don't use overviews #1373

Comments

mojodna commented Jun 12, 2018

sgillies commented Jun 12, 2018

mojodna commented Jun 12, 2018

sgillies commented Jun 16, 2018

mojodna commented Jun 18, 2018

sgillies commented Jun 19, 2018 • edited Loading

vincentsarago commented Jun 19, 2018

mojodna commented Jul 6, 2018

vincentsarago commented Jul 6, 2018

mojodna commented Jul 6, 2018

mojodna commented Jul 6, 2018

mojodna commented Jul 6, 2018

mojodna commented Jul 6, 2018

vincentsarago commented Jul 6, 2018

mojodna commented Jul 6, 2018

sgillies commented Jul 9, 2018 • edited Loading

rouault commented Jul 9, 2018

mojodna commented Jul 9, 2018

mojodna commented Jul 9, 2018

rouault commented Jul 9, 2018

sgillies commented Jul 9, 2018

mojodna commented Jul 9, 2018 • edited Loading

mojodna commented Jul 9, 2018

sgillies commented Jun 19, 2018 •

edited

Loading

sgillies commented Jul 9, 2018 •

edited

Loading

mojodna commented Jul 9, 2018 •

edited

Loading