Add a trigger for updated_on in the catalog database #4520

Open
AetherUnbound opened this issue Jun 19, 2024 · 0 comments
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs

Problem

Unless explicitly set, the updated_on column for the primary media tables in the catalog is modified when a record is updated via ingestion, but is not modified when any other update (e.g. a `batched_update`) is performed. This can create situations where a record is inaccurately assumed to have not been updated even though it has been by some other process.

Description

We should add a trigger for the following columns which automatically sets the value to the current timestamp:

updated_on timestamp with time zone NOT NULL,

updated_on timestamp with time zone NOT NULL,

This can be done using the moddatetime extension (which I've verified is available on the catalog database). Here's an example from StackExchange for how it might look:

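-- Enable the extension once per database (no-op if it is already installed).
CREATE EXTENSION IF NOT EXISTS moddatetime;

-- Set updated_on to the current timestamp whenever a row in image is updated.
CREATE TRIGGER mdt_image
  BEFORE UPDATE ON image
  FOR EACH ROW
  EXECUTE PROCEDURE moddatetime (updated_on);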
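
Since this covers more than one media table, the same statement would need to be repeated per table. Here's a rough sketch of how that DDL could be generated in Python. Note that the table list (`image` and `audio`), the `mdt_<table>` naming pattern (extrapolated from the example above), and the helper names are all assumptions for illustration, not existing catalog code:

```python
# Hypothetical helper: builds the trigger DDL for each media table.
# ASSUMPTION: the table names below; the example above only names `image`.
MEDIA_TABLES = ("image", "audio")

# Mirrors the StackExchange example, parameterised on the table name.
TRIGGER_TEMPLATE = (
    "CREATE TRIGGER mdt_{table}\n"
    "  BEFORE UPDATE ON {table}\n"
    "  FOR EACH ROW\n"
    "  EXECUTE PROCEDURE moddatetime (updated_on);"
)


def build_trigger_sql(table: str) -> str:
    """Return the CREATE TRIGGER statement for one media table."""
    # The name is interpolated into SQL, so only accept known tables.
    if table not in MEDIA_TABLES:
        raise ValueError(f"Unknown media table: {table}")
    return TRIGGER_TEMPLATE.format(table=table)


def build_all_trigger_sql() -> list[str]:
    """Return the extension statement followed by one trigger per table."""
    statements = ["CREATE EXTENSION IF NOT EXISTS moddatetime;"]
    statements.extend(build_trigger_sql(table) for table in MEDIA_TABLES)
    return statements


if __name__ == "__main__":
    print("\n\n".join(build_all_trigger_sql()))
```

This only produces the SQL strings; how they would actually be applied to the catalog database (migration, init script, or DAG) is left open here.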

The documentation for the updated_on column will also need to be updated with this change:

# updated_on
## Description
The timestamp of the last time any change was made to the media item. Unlike
`last_synced_with_source`, this can also be a change from a data cleaning step,
e.g. updating license URL in the `meta_data`, or fixing the URL using the
`batched_update` DAG.

Note

Once this is added, we'll need to remove the code that explicitly sets the updated_on column.
This may involve changes to the following areas:

Alternatives

We could rely on adding a step to update updated_on for every source of updates (see #4460 as an example). This would require more maintenance and touch points than modifying the tables themselves to perform this update for us.
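
To illustrate the maintenance cost of that alternative, here's a minimal sketch of what each update path would have to do. The helper names and the example query are hypothetical and are not taken from #4460 or from the existing catalog code:

```python
import re


def with_updated_on(set_clause: str) -> str:
    """Append `updated_on = NOW()` to a SET clause unless it is already set."""
    # Simple textual check for an existing assignment to updated_on.
    if re.search(r"\bupdated_on\s*=", set_clause):
        return set_clause
    return f"{set_clause}, updated_on = NOW()"


def build_update(table: str, set_clause: str, where_clause: str) -> str:
    """Build an UPDATE statement that always touches updated_on."""
    return f"UPDATE {table} SET {with_updated_on(set_clause)} WHERE {where_clause};"


if __name__ == "__main__":
    # Hypothetical query, for illustration only.
    print(build_update("image", "license = 'by'", "provider = 'example'"))
```

Every process that issues updates would need to go through something like this (or remember to add the column by hand), which is the extra touch point the trigger avoids.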

Additional context

This has been discussed a few times previously; see #4460, #4366 (comment), and #4429 (comment).

Projects
Status: 📋 Backlog