Fix trim and deduplicate tags deduplication #4473
Merged
Fixes
Related to #4429
Description
I realised when working on #4452 that the deduplication does not actually work on the trimmed name 🤦. This fixes that by using the trimmed name in the `DISTINCT ON` clause, so tags that differ only by surrounding whitespace are treated as the same tag.
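To illustrate the difference, here is a minimal Python sketch of the two behaviours. It is not the DAG's actual SQL: the function names, the sample data, and the assumption that uniqueness is keyed on name plus provider are all hypothetical, chosen only to show why keying on the untrimmed name leaves duplicates behind.

```python
# Hypothetical sample data: the same tag with and without whitespace,
# plus the same name from a different provider.
sample_tags = [
    {"name": "cat", "provider": "flickr"},
    {"name": " cat ", "provider": "flickr"},
    {"name": "cat", "provider": "magical_computer_vision"},
]


def dedupe_on_raw_name(tags):
    """Behaviour before the fix (as described): names are trimmed in the
    output, but uniqueness is decided on the untrimmed name."""
    seen = set()
    result = []
    for tag in tags:
        key = (tag["name"], tag.get("provider"))  # untrimmed name in the key
        if key in seen:
            continue
        seen.add(key)
        result.append({**tag, "name": tag["name"].strip()})
    return result


def dedupe_on_trimmed_name(tags):
    """Behaviour after the fix (as described): uniqueness is decided on
    the trimmed name. Assumes the key is (trimmed name, provider)."""
    seen = set()
    result = []
    for tag in tags:
        name = tag["name"].strip()
        key = (name, tag.get("provider"))  # trimmed name in the key
        if key in seen:
            continue
        seen.add(key)
        result.append({**tag, "name": name})
    return result
```

With `sample_tags`, the first function returns two identical `cat`/`flickr` entries, while the second returns one `cat` per provider.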
Testing Instructions
To test, start with a fresh catalog DB (`just down -v && just catalog/init`), then run this SQL in `just catalog/pgcli`:

Then run the following SQL in `just catalog/pgcli` to find one of the providers that got augmented with the "magical_computer_vision" provided tag:

Confirm the tags have duplicates both in terms of ones with whitespace, and ones with different providers (the computer vision ones).
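If eyeballing the tag arrays is tedious, a small helper along these lines could do the check. It is only a sketch: `find_duplicate_tags` is a hypothetical name, and it assumes each tag is a dict with a `name` and an optional `provider`, for example parsed from the JSON that the query returns.

```python
from collections import defaultdict


def find_duplicate_tags(tags):
    """Group tags by trimmed name and return only the names that occur
    more than once, with the raw (name, provider) pairs for each."""
    groups = defaultdict(list)
    for tag in tags:
        groups[tag["name"].strip()].append((tag["name"], tag.get("provider")))
    return {name: entries for name, entries in groups.items() if len(entries) > 1}
```

A name whose entries show differing raw names is a whitespace duplicate; a name whose entries show differing providers is a cross-provider duplicate.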
Then, run the `trim_and_deduplicate_tags` DAG (you need to turn on the batched update DAG as well). Confirm that the select task logs 10 records for the image table to update.
Run the select again from above to query the same 10 rows that got updated for testing, and confirm they no longer have any duplicates, including within the `flickr` provider. Without this fix, that last part is not true: you'll end up with duplicate trimmed tags (on `main`).
Checklist
- … (`Update index.md`).
- … (`main`) or a parent feature branch.
- … (`just catalog/generate-docs` for catalog PRs) or the media properties generator (`just catalog/generate-docs media-props` for the catalog or `just api/generate-docs` for the API) where applicable.
Developer Certificate of Origin