Drop GitHub tables before each run #53
Merged
+151
−1
Now that we're backfilling all data on each run, we can simplify some of that by dropping the tables each time we run the backfill. We still maintain the upsert functionality, so this is not strictly necessary, but it also helps in local development.
The backfill time is down to under 2 minutes locally, and the expectation is that we'll run backfills in the middle of the night, so I'm not expecting this to cause problems for users in production.
While we only have one GitHub table currently, this looks for all `github_*` tables, on the basis that we're not ingesting all the data all at once, so future tables (e.g. issues) won't have to worry about doing this too.
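For reviewers who want a picture of the idea, here is a minimal sketch of dropping every table that matches the `github_` prefix before a backfill. It is not the code in this PR: it assumes a SQLite database purely for illustration, and the function name `drop_prefixed_tables` and its parameters are hypothetical.

```python
import sqlite3


def drop_prefixed_tables(conn, prefix="github_"):
    """Drop every table whose name starts with `prefix`; return the dropped names.

    Hypothetical helper for illustration only. It assumes SQLite, where the
    list of tables can be read from the sqlite_master catalog.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()

    dropped = []
    for (name,) in rows:
        # Filter in Python rather than with LIKE, because "_" is a
        # single-character wildcard in SQL LIKE patterns.
        if name.startswith(prefix):
            # Table names cannot be bound as parameters, so quote the identifier.
            quoted = '"' + name.replace('"', '""') + '"'
            conn.execute(f"DROP TABLE IF EXISTS {quoted}")
            dropped.append(name)

    conn.commit()
    return sorted(dropped)
```

Called once at the start of a backfill, a helper like this would pick up any future `github_*` table without further changes, while leaving unrelated tables alone.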