Create `popularity_refresh` DAG factory #2089
Labels
💻 aspect: code
Concerns the software code in the repository
🌟 goal: addition
Addition of new feature
🟨 priority: medium
Not blocking but should be addressed soon
🧱 stack: catalog
Related to the catalog and Airflow DAGs
Description
Create a `popularity_refresh_dag_factory` similar to the DAG factories for provider and data refresh DAGs. For each `media_type`, it should generate a `<media_type>_popularity_refresh` DAG which does the following:

- Update the `<media>_popularity_metrics` table to include any newly added metrics.
- Refresh the `<media>_popularity_constants` view to recalculate the popularity constants. The refresh should run `CONCURRENTLY` so that provider DAGs can continue reading from the view while it updates.
- For each `provider` in the `<media>_popularity_constants` view, generate a `refresh_<provider>_scores` task. The task will run an `UPDATE` of the `standardized_popularity` on all records matching that provider which were last updated before the task began.
- Run the `refresh_<provider>_scores` tasks in parallel to speed up the update.
- Consider a `SKIPLIST` of providers that are present in the `<media>_popularity_constants` view, but for which we do not want to create a refresh task. We currently have some providers (Nappy, Rawpixel, Stocksnap) that support popularity data but are not dated, meaning scores for all of their records will be updated the next time the provider DAG runs. Note, however, that some of these DAGs are on a `@monthly` schedule, which means skipping them in this DAG could delay recalculation of their scores.
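The steps above can be sketched in plain Python, without Airflow, to show how the per-provider tasks and the skiplist might fit together. This is only an illustration under stated assumptions: `RefreshTask`, `dag_id`, `refresh_constants_sql`, and `build_refresh_tasks` are hypothetical names, the media table is assumed to be named after the media type, `updated_on` and the `standardized_<media>_popularity` SQL function are assumed, and the provider identifiers in the skiplist are guessed spellings of the providers named above. It also assumes the constants view is a materialized view with a unique index, which PostgreSQL requires for `REFRESH ... CONCURRENTLY`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical skiplist keyed by media type; the provider identifiers are
# assumed spellings of the providers named in the issue.
SKIPLIST = {"image": {"nappy", "rawpixel", "stocksnap"}}


@dataclass(frozen=True)
class RefreshTask:
    """One planned refresh_<provider>_scores task: its id, SQL, and parameters."""

    task_id: str
    sql: str
    params: tuple


def dag_id(media_type: str) -> str:
    """Return the DAG id for a media type, following the naming in the issue."""
    return f"{media_type}_popularity_refresh"


def refresh_constants_sql(media_type: str) -> str:
    """Build the statement that refreshes the constants view without blocking readers."""
    # Assumes a materialized view with a unique index (required for CONCURRENTLY).
    return (
        f"REFRESH MATERIALIZED VIEW CONCURRENTLY {media_type}_popularity_constants;"
    )


def build_refresh_tasks(media_type, providers, started_at, skiplist=SKIPLIST):
    """Plan one refresh task per provider that is not on the skiplist."""
    skipped = skiplist.get(media_type, set())
    tasks = []
    for provider in sorted(providers):
        if provider in skipped:
            continue
        # Table, column, and function names below are assumptions.
        sql = (
            f"UPDATE {media_type} "
            f"SET standardized_popularity = "
            f"standardized_{media_type}_popularity(provider, meta_data) "
            "WHERE provider = %s AND updated_on < %s;"
        )
        tasks.append(
            RefreshTask(f"refresh_{provider}_scores", sql, (provider, started_at))
        )
    return tasks


if __name__ == "__main__":
    started = datetime.now(timezone.utc)
    print(dag_id("image"))
    print(refresh_constants_sql("image"))
    for task in build_refresh_tasks("image", ["flickr", "nappy"], started):
        print(task.task_id, task.params)
```

In an actual DAG factory, each planned `RefreshTask` would map to one Airflow task. Because the tasks have no dependencies on each other, the scheduler would be free to run them in parallel once the constants view has been refreshed.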
Refer to this section in the IP to ensure that the refresh tasks avoid issues with deadlocking and timeouts.
Additional context