This repository has been archived by the owner on Jan 13, 2022. It is now read-only.
Clean preexisting data using ImageStore #517
Merged
mathemancer requested review from zackkrida, aldenstpage, a team, and ChariniNana, and removed the request for a team (October 20, 2020 14:00)
aldenstpage approved these changes (Oct 20, 2020)
Fixes
Fixes #356 by @mathemancer
Description
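This PR adds two new DAGs:
- postgres_image_cleaner (defined at src/cc_catalog_airflow/dags/cleaner_workflow.py), which cleans previously-ingested data using the ImageStore and stores the resulting TSVs in a directory specific to cleaned, previously-ingested data, and
- tsv_to_postgres_loader_overwrite, which loads those TSVs back into the database, overwriting the existing records.
Technical details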
The main file hooking things together is src/cc_catalog_airflow/dags/util/pg_cleaner.py.
Tests
There are numerous new tests covering this functionality. You can also set up the dockerized dev environment using the README, load some data into the local DB with the appropriate workflows, and then turn on the two new workflows to see their effects:
- postgres_image_cleaner
- tsv_to_postgres_loader_overwrite
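The clean-then-reload flow described in this PR can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in pg_cleaner.py: the functions clean_record, write_cleaned_tsv, load_tsv_overwrite, and run_demo, the five-column image table, and its (foreign_identifier, provider) key are all hypothetical; the cleaning rules merely stand in for whatever ImageStore applies; and SQLite's INSERT ... ON CONFLICT (available since SQLite 3.24) stands in for an equivalent PostgreSQL upsert.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical subset of image columns; (foreign_identifier, provider) is the assumed key.
COLUMNS = ("foreign_identifier", "provider", "url", "license", "title")


def clean_record(row: dict) -> dict:
    """Illustrative stand-in for passing a row through ImageStore.

    Assumed rules: trim whitespace from every field, lowercase the license.
    """
    cleaned = {key: (value or "").strip() for key, value in row.items()}
    if "license" in cleaned:
        cleaned["license"] = cleaned["license"].lower()
    return cleaned


def write_cleaned_tsv(rows, out_dir: Path, name: str = "cleaned.tsv") -> Path:
    """Clean each row and write it to a TSV under a directory reserved for cleaned data."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / name
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t")
        writer.writeheader()
        for row in rows:
            writer.writerow(clean_record(row))
    return path


def load_tsv_overwrite(conn: sqlite3.Connection, path: Path) -> int:
    """Upsert TSV rows, overwriting existing rows that share the same key."""
    columns = ", ".join(COLUMNS)
    placeholders = ", ".join("?" for _ in COLUMNS)
    # Every non-key column is replaced by the incoming (cleaned) value.
    updates = ", ".join(f"{c} = excluded.{c}" for c in COLUMNS[2:])
    sql = (
        f"INSERT INTO image ({columns}) VALUES ({placeholders}) "
        f"ON CONFLICT (foreign_identifier, provider) DO UPDATE SET {updates}"
    )
    with path.open(newline="", encoding="utf-8") as f:
        rows = [tuple(r[c] for c in COLUMNS) for r in csv.DictReader(f, delimiter="\t")]
    with conn:  # commit on success, roll back on error
        conn.executemany(sql, rows)
    return len(rows)


def run_demo() -> list:
    """Seed one messy row, clean it out to a TSV, then reload it with overwrite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image (foreign_identifier TEXT, provider TEXT, url TEXT, "
        "license TEXT, title TEXT, UNIQUE (foreign_identifier, provider))"
    )
    conn.execute(
        "INSERT INTO image VALUES ('1', 'flickr', 'https://example.com/1.jpg', ' BY ', ' Cat ')"
    )
    query = f"SELECT {', '.join(COLUMNS)} FROM image"
    existing = [dict(zip(COLUMNS, r)) for r in conn.execute(query)]
    with tempfile.TemporaryDirectory() as tmp:
        tsv = write_cleaned_tsv(existing, Path(tmp) / "cleaned")
        load_tsv_overwrite(conn, tsv)
    result = conn.execute("SELECT license, title FROM image").fetchall()
    conn.close()
    return result


print(run_demo())  # prints [('by', 'Cat')]
```

The upsert keys on (foreign_identifier, provider) only because that is the assumed unique key in this sketch; the real image table's constraints may differ. Because the existing row is updated in place rather than duplicated, the table still holds a single record after the reload.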
Checklist
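My pull request has a descriptive title (not a vague title like Update index.md).
My pull request targets the default branch of the repository (main or master).
I added or updated documentation (if applicable).
I tried running the project locally and verified that there are no visible errors.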
Developer Certificate of Origin