
Update CRS for HLS Events Collections #147

Closed · 4 of 6 tasks
anayeaye opened this issue Jun 21, 2022 · 3 comments


anayeaye commented Jun 21, 2022

Epic

#89

Description

Update the CRS info in the UAH-hosted COG assets of the HLS EJ subset collections to improve loading time.

Collections

  • hls-l30-002-ej
  • hls-s30-002-ej

Migrate to MCP datastore

Migrating to MCP is not a requirement for this task, but publishing the newly produced assets to the MCP datastore would allow us to preserve the 'original' COG assets in UAH during the rollout of this CRS update.

Resources

One-off ingestion script for HLS events collections in issue #146
hls_hdf_to_cog + datum=WGS84
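
For reference, the CRS fix amounts to appending +datum=WGS84 to each asset's proj4 string. A minimal sketch of the change on a local copy (the filename is illustrative, borrowed from the object listing in a later comment):

import rasterio

# Illustrative local copy of an HLS band asset
path = "HLS.L30.T15RYP.2021294T163232.v2.0.B01.tif"

with rasterio.open(path, "r+") as src:
    print(src.crs.to_proj4())  # e.g. +proj=utm +zone=15 +ellps=WGS84 +units=m +no_defs
    # The leading space keeps the proj4 string well formed
    src.crs = src.crs.to_proj4() + " +datum=WGS84"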

Acceptance Criteria:

  • NEW HLS COGs have been created with the appropriate CRS headers and have been tested with a delta-backend tiler such as: https://staging-raster.delta-backend.com/cog/viewer (see the verification sketch after this list)
  • After verification, pgstac records have been updated to point at the new assets OR the original subset of HLS data has been overwritten
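
A minimal verification sketch, assuming the delta-backend tiler is titiler-based and exposes the standard /cog/info endpoint alongside /cog/viewer (the asset URL below is illustrative):

import requests

# Illustrative reprocessed asset; substitute a real object from the collection
cog_url = (
    "s3://covid-eo-data/hlsl30-ej-reprocessed/2021/15RYP/"
    "HLS.L30.T15RYP.2021294T163232.v2.0/HLS.L30.T15RYP.2021294T163232.v2.0.B01.tif"
)

resp = requests.get(
    "https://staging-raster.delta-backend.com/cog/info",
    params={"url": cog_url},
)
resp.raise_for_status()
print(resp.json())  # inspect the reported metadata before flipping pgstac records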

Checklist:

  • Epic Link
  • Detailed description
  • Concept diagrams
  • Assignee
anayeaye self-assigned this Jun 22, 2022
anayeaye commented

The HLS events collection assets were reprocessed and ingested into new -reprocessed collections. Delta-config has been updated to use the reprocessed data, and I will delete the original events collections after confirming there are no users.

Here are the reprocessing config and method:

import os

import boto3
import rasterio
from rasterio.io import MemoryFile
from rio_cogeo.cogeo import cog_translate, cog_validate
from rio_cogeo.profiles import cog_profiles

s3_client = boto3.client("s3")

# Config and COG profile settings
blocksize = 256
config = dict(GDAL_NUM_THREADS="ALL_CPUS", GDAL_TIFF_OVR_BLOCKSIZE="128")
output_profile = cog_profiles.get("deflate")
output_profile.update(
    dict(blockxsize=str(blocksize), blockysize=str(blocksize), predictor="2")
)

def reprocess_and_upload_cog(s3_bucket: str, s3_src_key: str, s3_out_key: str) -> None:
    """Download geotiff from s3, add datum, and upload new tif to s3"""

    src_filename = os.path.basename(s3_src_key)
    temp_filename = f"/tmp/{src_filename}"

    # Download the raw asset
    s3_client.download_file(
        Bucket=s3_bucket,
        Key=s3_src_key,
        Filename=temp_filename,
    )

    try:
        assert os.path.exists(temp_filename)

        with rasterio.open(temp_filename, "r+") as src:
            # Append the missing datum; the leading space keeps the proj4 string valid
            src.crs = src.crs.to_proj4() + " +datum=WGS84"

            # Open a destination memory file for the COG generated by cog_translate
            with MemoryFile() as dst:
                cog_translate(
                    src,
                    dst.name,
                    output_profile,
                    in_memory=True,
                    config=config,
                    forward_band_tags=True,
                    quiet=False,
                )
                assert cog_validate(dst.name)[0]
                s3_client.upload_fileobj(
                    dst,
                    s3_bucket,
                    s3_out_key,
                )
                print(f"Uploaded {dst.name} to {s3_out_key}")
    except Exception as e:
        print(e)
    finally:
        # Remove the local download whether or not reprocessing succeeded
        if os.path.exists(temp_filename):
            os.remove(temp_filename)
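
A usage sketch for the method above (the source key is hypothetical; the output key mirrors the reprocessed prefix shown in the next comment):

reprocess_and_upload_cog(
    s3_bucket="covid-eo-data",
    # Hypothetical source layout for the original EJ subset
    s3_src_key="hls-l30-002-ej/HLS.L30.T15RYP.2021294T163232.v2.0.B01.tif",
    s3_out_key=(
        "hlsl30-ej-reprocessed/2021/15RYP/HLS.L30.T15RYP.2021294T163232.v2.0/"
        "HLS.L30.T15RYP.2021294T163232.v2.0.B01.tif"
    ),
)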


anayeaye commented Jul 1, 2022

Update: It looks like I reprocessed without the last optimization we tested (blocksize); see Slack.

This is what should have been run. I may carefully reprocess (and re-verify) these after hours and will update this issue with any changes.

# Cfg
blocksize = 512
config = dict(GDAL_NUM_THREADS="ALL_CPUS", GDAL_TIFF_OVR_BLOCKSIZE="512")
output_profile = cog_profiles.get("deflate")
output_profile.update(
    dict(blockxsize=str(blocksize), blockysize=str(blocksize), predictor="2")
)

# Uploaded object profile <edited s3 uri>
rio cogeo info /vsis3/covid-eo-data/hlsl30-ej-reprocessed/2021/15RYP/HLS.L30.T15RYP.2021294T163232.v2.0/HLS.L30.T15RYP.2021294T163232.v2.0.B01.tif
OUT>
IFD
    Id      Size           BlockSize     Decimation           
    0       3660x3660      512x512       0
    1       1830x1830      512x512       2
    2       915x915        512x512       4
    3       458x458        512x512       8


anayeaye commented Jul 1, 2022

tl;dr: I investigated a blocksize change but ultimately did not change the format (a second time) for the assets in the HLS reprocessed collections.

I re-ran the reprocessing with blocksize and overview blocksize 512 for two items, then compared the network response timing against items processed with blocksize 256 and overview blocksize 128, and saw no significant improvement. So I reprocessed the two test items to match the blocksize and overview settings of all of the other items in the reprocessed collections.
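
For reference, the comparison was along these lines; a rough sketch, assuming a standard titiler-style /cog/tiles/{z}/{x}/{y} route on the delta-backend tiler (asset URLs and tile coordinates are illustrative):

import time

import requests

TILER = "https://staging-raster.delta-backend.com"

def time_tile(cog_url: str, z: int, x: int, y: int) -> float:
    """Fetch one tile and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    resp = requests.get(f"{TILER}/cog/tiles/{z}/{x}/{y}", params={"url": cog_url})
    resp.raise_for_status()
    return time.perf_counter() - start

# Hypothetical pair: the same scene reprocessed with the two block layouts
for label, url in [
    ("256/128", "s3://covid-eo-data/hlsl30-ej-reprocessed/.../B01.tif"),
    ("512/512", "s3://covid-eo-data/hlsl30-ej-512/.../B01.tif"),
]:
    print(label, f"{time_tile(url, 12, 978, 1650):.2f}s")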

All items in both of the HLS reprocessed collections have this configuration and output profile (processing matches the snippet provided in the earlier comment).

# Config and COG profile settings    
blocksize = 256
config = dict(GDAL_NUM_THREADS="ALL_CPUS", GDAL_TIFF_OVR_BLOCKSIZE="128")
output_profile = cog_profiles.get("deflate")
output_profile.update(
    dict(blockxsize=str(blocksize), blockysize=str(blocksize), predictor="2")
)

OUT> end of rio cogeo info /vsis3/obj-key
    Id      Size           BlockSize     Decimation           
    0       3660x3660      256x256       0
    1       1830x1830      128x128       2
    2       915x915        128x128       4
    3       458x458        128x128       8
    4       229x229        128x128       16
