feat!: set update-dates on STAC item TDE-1298 (#1180)
### Motivation

As a consumer of the public Imagery or Elevation buckets, I want to know
which datasets have been added, which datasets have changed, and what
metadata has changed, so that I can keep my data and processes in sync
with the data referenced from the public buckets.
When an ODR path is provided to the imagery-standardising workflow, look
up previously published STAC Item documents (by filename) and reuse them
if the files are unchanged (same file size and checksum).
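
The reuse rule above can be sketched as follows — a minimal illustration under assumptions, not the actual implementation (the names `multihash_sha256_hex` and `should_reuse_item` are hypothetical):

```python
import hashlib


def multihash_sha256_hex(content: bytes) -> str:
    # Hex-encoded multihash: 0x12 (sha2-256 code) + 0x20 (digest length)
    # + digest, mirroring the encoding commonly used for `file:checksum`.
    return "1220" + hashlib.sha256(content).hexdigest()


def should_reuse_item(published_checksum: str, new_content: bytes) -> bool:
    """Reuse the previously published STAC Item only if the file is unchanged."""
    return multihash_sha256_hex(new_content) == published_checksum


content = b"example tiff bytes"
published = multihash_sha256_hex(content)
assert should_reuse_item(published, content)
assert not should_reuse_item(published, b"resupplied tiff bytes")
```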

### Modifications

- **Entry point `standardise_validate.py`**:
  - Added `--current-datetime` argument (defaults to "today" to avoid
    making a breaking change).
  - Added optional `--odr-url` argument.
- **ImageryItem**:
  - Added a `classmethod` to load an existing _Item_ from a local file or S3.
  - Added `update_checksum_related_metadata()` method, which sets
    `asset.visual.file:checksum` and its associated attributes if the given
    checksum differs from the current one.
  - Removed `now_string` from the class init; created/updated dates are now
    set through the respective `asset` properties.
  - Updated all tests to reflect the new signatures.
  - ....
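
The `updated`-timestamp handling in `create_item` amounts to comparing the Item before and after enrichment — a simplified sketch (the dict shape and `finalise_item` name are illustrative only):

```python
import copy


def finalise_item(stac: dict, current_datetime: str) -> dict:
    """Bump properties.updated only when enrichment actually changed the Item."""
    base = copy.deepcopy(stac)
    # ... enrichment such as update_spatial()/add_collection() happens here ...
    stac["collection"] = "123"  # illustrative change
    if stac != base and stac["properties"]["updated"] != current_datetime:
        stac["properties"]["updated"] = current_datetime
    return stac


item = {"properties": {"updated": "2010-09-18T12:34:56Z"}}
print(finalise_item(item, "2024-11-25T00:00:00Z")["properties"]["updated"])
# → 2024-11-25T00:00:00Z
```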

### Verification

Added and modified tests for `item`, `collection`, `create_item`,
`create_stac` and `collection_from_items`.
Performed a manual end-to-end test using a custom Argo workflow.
Updated the GitHub Actions workflows accordingly.
schmidtnz authored Nov 25, 2024
1 parent ffe7f90 commit 9d95b02
Showing 10 changed files with 319 additions and 33 deletions.
14 changes: 7 additions & 7 deletions .github/workflows/format-tests.yml
@@ -31,28 +31,28 @@ jobs:
- name: End to end test - Aerial Imagery
run: |
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/aerial.json --preset webp --target-epsg 2193 --source-epsg 2193 --target /tmp/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --gsd 10 --create-footprints=true
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/aerial.json --preset webp --target-epsg 2193 --source-epsg 2193 --target /tmp/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --gsd 10 --create-footprints=true --current-datetime=2010-09-18T12:34:56Z
cmp --silent "${{ runner.temp }}/BG35_1000_4829.tiff" ./scripts/tests/data/output/BG35_1000_4829.tiff
- name: End to end test - Elevation
run: |
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/dem.json --preset dem_lerc --target-epsg 2193 --source-epsg 2193 --target /tmp/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --gsd 30 --create-footprints=true
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/dem.json --preset dem_lerc --target-epsg 2193 --source-epsg 2193 --target /tmp/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --gsd 30 --create-footprints=true --current-datetime=2010-09-18T12:34:56Z
cmp --silent "${{ runner.temp }}/BK39_10000_0102.tiff" ./scripts/tests/data/output/BK39_10000_0102.tiff
cmp --silent "${{ runner.temp }}/BK39_10000_0101.tiff" ./scripts/tests/data/output/BK39_10000_0101.tiff
- name: End to end test - Historical Aerial Imagery
run: |
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/hi.json --preset webp --target-epsg 2193 --source-epsg 2193 --target /tmp/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --gsd 60 --create-footprints=true
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/hi.json --preset webp --target-epsg 2193 --source-epsg 2193 --target /tmp/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --gsd 60 --create-footprints=true --current-datetime=2010-09-18T12:34:56Z
cmp --silent "${{ runner.temp }}/BQ31_5000_0608.tiff" ./scripts/tests/data/output/BQ31_5000_0608.tiff
- name: End to end test - Cutline (Aerial Imagery)
run: |
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/aerial.json --preset webp --target-epsg 2193 --source-epsg 2193 --target /tmp/cutline/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --cutline ./tests/data/cutline_aerial.fgb --gsd 10 --create-footprints=true
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/aerial.json --preset webp --target-epsg 2193 --source-epsg 2193 --target /tmp/cutline/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --cutline ./tests/data/cutline_aerial.fgb --gsd 10 --create-footprints=true --current-datetime=2010-09-18T12:34:56Z
cmp --silent "${{ runner.temp }}/cutline/BG35_1000_4829.tiff" ./scripts/tests/data/output/BG35_1000_4829_cut.tiff
- name: End to end test - Footprint
run: |
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/aerial.json --preset webp --target-epsg 2193 --source-epsg 2193 --target /tmp/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --gsd 10m --create-footprints=true
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/aerial.json --preset webp --target-epsg 2193 --source-epsg 2193 --target /tmp/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --gsd 10m --create-footprints=true --current-datetime=2010-09-18T12:34:56Z
jq 'select(.xy_coordinate_resolution == 1E-8) // error("Wrong or missing X/Y coordinate resolution")' "${{ runner.temp }}/BG35_1000_4829_footprint.geojson"
cmp --silent <(jq "del(.features[0].properties.location, .xy_coordinate_resolution)" "${{ runner.temp }}/BG35_1000_4829_footprint.geojson") <(jq "del(.features[0].properties.location, .xy_coordinate_resolution)" ./scripts/tests/data/output/BG35_1000_4829_footprint.geojson)
@@ -64,7 +64,7 @@ jobs:
- name: End to end test - Restandardise Aerial Imagery
run: |
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/restandardise.json --preset webp --target-epsg 2193 --source-epsg 2193 --target /tmp/restandardise/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --gsd 10 --create-footprints=true
docker run -v "${{ runner.temp }}:/tmp/" topo-imagery python3 standardise_validate.py --from-file ./tests/data/restandardise.json --preset webp --target-epsg 2193 --source-epsg 2193 --target /tmp/restandardise/ --collection-id 123 --start-datetime 2023-01-01 --end-datetime 2023-01-01 --gsd 10 --create-footprints=true --current-datetime=2010-09-18T12:34:56Z
cmp --silent "${{ runner.temp }}/restandardise/BG35_1000_4829.tiff" ./scripts/tests/data/output/BG35_1000_4829.tiff
- name: End to end test - Translate Ascii Files (Elevation)
@@ -74,7 +74,7 @@ jobs:
- name: End to end test - Remove empty files
run: |
docker run -v "${{ runner.temp }}/tmp-empty/:/tmp/" topo-imagery python3 standardise_validate.py --from-file=./tests/data/empty.json --preset=webp --target-epsg=2193 --source-epsg=2193 --target=/tmp --collection-id=123 --start-datetime=2023-01-01 --end-datetime=2023-01-01 --gsd 60 --create-footprints=true
docker run -v "${{ runner.temp }}/tmp-empty/:/tmp/" topo-imagery python3 standardise_validate.py --from-file=./tests/data/empty.json --preset=webp --target-epsg=2193 --source-epsg=2193 --target=/tmp --collection-id=123 --start-datetime=2023-01-01 --end-datetime=2023-01-01 --gsd 60 --create-footprints=true --current-datetime=2010-09-18T12:34:56Z
empty_target_directory="$(find "${{ runner.temp }}/tmp-empty" -maxdepth 0 -type d -empty)"
[[ -n "$empty_target_directory" ]]
43 changes: 32 additions & 11 deletions scripts/stac/imagery/create_stac.py
@@ -5,10 +5,10 @@
from linz_logger import get_log
from shapely.geometry.base import BaseGeometry

from scripts.datetimes import format_rfc_3339_datetime_string, utc_now
from scripts.datetimes import utc_now
from scripts.files import fs
from scripts.files.files_helper import get_file_name_from_path
from scripts.files.fs import modified, read
from scripts.files.fs import NoSuchFileError, read
from scripts.files.geotiff import get_extents
from scripts.gdal.gdal_helper import gdal_info
from scripts.gdal.gdalinfo import GdalInfo
@@ -88,8 +88,10 @@ def create_item(
end_datetime: str,
collection_id: str,
gdal_version: str,
current_datetime: str,
gdalinfo_result: GdalInfo | None = None,
derived_from: list[str] | None = None,
odr_url: str | None = None,
) -> ImageryItem:
"""Create an ImageryItem (STAC) to be linked to a Collection.
@@ -99,13 +101,16 @@
end_datetime: end date of the survey
collection_id: collection id to link to the Item
gdal_version: GDAL version
current_datetime: date and time for setting consistent update and/or creation timestamp
gdalinfo_result: result of the gdalinfo command. Defaults to None.
derived_from: list of STAC Items from where this Item is derived. Defaults to None.
odr_url: S3 URL of the already published files in ODR (if this is a resupply). Defaults to None.
Returns:
a STAC Item wrapped in ImageryItem
"""
item = create_base_item(asset_path, gdal_version)
item = create_or_load_base_item(asset_path, gdal_version, current_datetime, odr_url)
base_stac = item.stac.copy()

if not gdalinfo_result:
gdalinfo_result = gdal_info(asset_path)
@@ -131,24 +136,31 @@
item.update_spatial(*get_extents(gdalinfo_result))
item.add_collection(collection_id)

if item.stac != base_stac and item.stac["properties"]["updated"] != current_datetime:
item.stac["properties"][
"updated"
] = current_datetime # some of the metadata has changed, so we need to make sure the `updated` time is set correctly

get_log().info("ImageryItem created", path=asset_path)
return item


def create_base_item(asset_path: str, gdal_version: str) -> ImageryItem:
def create_or_load_base_item(
asset_path: str, gdal_version: str, current_datetime: str, odr_url: str | None = None
) -> ImageryItem:
"""
Args:
asset_path: path of the visual asset (TIFF)
asset_path: path with filename of the visual asset (TIFF)
gdal_version: GDAL version string
current_datetime: date and time used for setting consistent update and/or creation timestamp
odr_url: S3 URL of the already published files in ODR (if this is a resupply). Defaults to None.
Returns:
An ImageryItem with basic information.
"""
id_ = get_file_name_from_path(asset_path)
file_content = fs.read(asset_path)
file_content_checksum = checksum.multihash_as_hex(file_content)
file_modified_datetime = format_rfc_3339_datetime_string(modified(asset_path))
now_string = format_rfc_3339_datetime_string(utc_now())

if (topo_imagery_hash := os.environ.get("GIT_HASH")) is not None:
commit_url = f"https://github.com/linz/topo-imagery/commit/{topo_imagery_hash}"
Expand All @@ -157,19 +169,28 @@ def create_base_item(asset_path: str, gdal_version: str) -> ImageryItem:

stac_processing = STACProcessing(
**{
"processing:datetime": now_string,
"processing:datetime": current_datetime,
"processing:software": STACProcessingSoftware(**{"gdal": gdal_version, "linz/topo-imagery": commit_url}),
"processing:version": os.environ.get("GIT_VERSION", "GIT_VERSION not specified"),
}
)

if odr_url:
try:
imagery_item = ImageryItem.from_file(os.path.join(odr_url, f"{id_}.json"))
imagery_item.update_checksum_related_metadata(file_content_checksum, stac_processing_data=stac_processing)
return imagery_item

except NoSuchFileError:
get_log().info(f"No Item is published for ID: {id_}")

stac_asset = STACAsset(
**{
"href": os.path.join(".", os.path.basename(asset_path)),
"file:checksum": file_content_checksum,
"created": file_modified_datetime,
"updated": file_modified_datetime,
"created": current_datetime,
"updated": current_datetime,
}
)

return ImageryItem(id_, now_string, stac_asset, stac_processing)
return ImageryItem(id_, stac_asset, stac_processing)
48 changes: 46 additions & 2 deletions scripts/stac/imagery/item.py
@@ -1,5 +1,7 @@
import json
from typing import Any, TypedDict

from scripts.files.fs import read
from scripts.stac.link import Link, Relation
from scripts.stac.util.STAC_VERSION import STAC_VERSION
from scripts.stac.util.media_type import StacMediaType
@@ -26,17 +28,59 @@
class ImageryItem:
stac: dict[str, Any]

def __init__(self, id_: str, now_string: str, stac_asset: STACAsset, stac_processing: STACProcessing) -> None:
def __init__(self, id_: str, stac_asset: STACAsset, stac_processing: STACProcessing) -> None:
self.stac = {
"type": "Feature",
"stac_version": STAC_VERSION,
"id": id_,
"links": [Link(path=f"./{id_}.json", rel=Relation.SELF, media_type=StacMediaType.GEOJSON).stac],
"assets": {"visual": {**stac_asset, "type": "image/tiff; application=geotiff; profile=cloud-optimized"}},
"stac_extensions": [StacExtensions.file.value, StacExtensions.processing.value],
"properties": {"created": now_string, "updated": now_string, **stac_processing},
"properties": {"created": stac_asset["created"], "updated": stac_asset["updated"], **stac_processing},
}

@classmethod
def from_file(cls, file_name: str) -> "ImageryItem":
"""Create an ImageryItem from a file.
Args:
file_name: The s3 URL or local path of the file to load.
Returns:
ImageryItem: The new ImageryItem.
"""
file_content = read(file_name)
stac_dict_from_s3 = json.loads(file_content.decode("UTF-8"))
if (bbox := stac_dict_from_s3.get("bbox")) is not None:
stac_dict_from_s3["bbox"] = tuple(bbox)
new_item = cls(
id_=stac_dict_from_s3["id"],
stac_asset=stac_dict_from_s3["assets"]["visual"],
stac_processing=stac_dict_from_s3["properties"],
)
new_item.stac = stac_dict_from_s3

return new_item

def update_checksum_related_metadata(self, file_content_checksum: str, stac_processing_data: STACProcessing) -> None:
"""Set the assets.visual.file:checksum attribute if it has changed.
If the checksum has changed, this also updates the following attributes:
assets.visual.updated
properties.updated
properties.processing:datetime
properties.processing:software
properties.processing:version
Args:
file_content_checksum (str): the new checksum
stac_processing_data (STACProcessing): new data for the STAC processing extension attributes for this asset/item
"""
if file_content_checksum != self.stac["assets"]["visual"]["file:checksum"]:
self.stac["assets"]["visual"]["file:checksum"] = file_content_checksum
self.stac["assets"]["visual"]["updated"] = stac_processing_data["processing:datetime"]
self.stac["properties"].update(stac_processing_data)
self.stac["properties"]["updated"] = stac_processing_data["processing:datetime"]

def update_datetime(self, start_datetime: str, end_datetime: str) -> None:
"""Update the Item `start_datetime` and `end_datetime` property.
3 changes: 1 addition & 2 deletions scripts/stac/imagery/tests/collection_test.py
@@ -117,7 +117,6 @@ def test_add_item(fake_collection_metadata: CollectionMetadata, fake_linz_slug:
}
item = ImageryItem(
"BR34_5000_0304",
now_string,
STACAsset(
**{
"href": "any href",
@@ -163,7 +162,7 @@ def test_add_item(fake_collection_metadata: CollectionMetadata, fake_linz_slug:
assert collection.stac[property_name] == now_string

with subtests.test(msg=f"item properties.{property_name}"):
assert item.stac["properties"][property_name] == now_string
assert item.stac["properties"][property_name] == asset_datetimes[property_name]

with subtests.test(msg=f"item assets.visual.{property_name}"):
assert item.stac["assets"]["visual"][property_name] == asset_datetimes[property_name]
40 changes: 39 additions & 1 deletion scripts/stac/imagery/tests/conftest.py
@@ -1,6 +1,6 @@
from datetime import datetime
from decimal import Decimal
from typing import Iterator
from typing import Any, Iterator

import pytest

@@ -21,3 +21,41 @@ def fake_collection_metadata() -> Iterator[CollectionMetadata]:
"geographic_description": None,
}
yield collection_metadata


@pytest.fixture
def fake_imagery_item_stac() -> dict[str, Any]:
return {
"type": "Feature",
"stac_version": "1.0.0",
"id": "empty",
"links": [{"href": "./empty.json", "rel": "self", "type": "application/geo+json"}],
"assets": {
"visual": {
"href": "any href",
"file:checksum": "my_checksum",
"created": "any created datetime",
"updated": "any processing datetime",
"type": "image/tiff; application=geotiff; profile=cloud-optimized",
}
},
"stac_extensions": [
"https://stac-extensions.github.io/file/v2.0.0/schema.json",
"https://stac-extensions.github.io/processing/v1.2.0/schema.json",
],
"properties": {
"created": "any created datetime",
"updated": "any processing datetime",
"processing:datetime": "any processing datetime",
"processing:software": {"gdal": "any GDAL version", "linz/topo-imagery": "any topo imagery version"},
"processing:version": "any processing version",
"start_datetime": "2021-01-27T00:00:00Z",
"end_datetime": "2021-01-29T00:00:00Z",
"datetime": None,
},
"geometry": {
"type": "Polygon",
"coordinates": [[[1799667.5, 5815977.0], [1800422.5, 5815977.0], [1800422.5, 5814986.0], [1799667.5, 5814986.0]]],
},
"bbox": (1799667.5, 5815977.0, 1800422.5, 5814986.0),
}
