This assumes the datastore will have been configured with the timescale extension already. Locally this is handled by the timescaledb container, and in production we have already configured it.
The --database-url option for the CLI ensures we have that env var set. We could always pass the value through instead (it's in the context), and I wasn't sure we actually needed it set via the CLI. I did want click to give us a consistent error when it's not set, and to do so immediately rather than when the writer is called.
The writer class means we can ensure the appropriate table exists and has had a hypertable created for it when we enter the context block.
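As a rough illustration of that pattern (not the code in this PR), a writer could prepare its table in `__enter__`. The class name `TimescaleWriter`, the column layout, and the DB-API style `connection` argument are all assumptions made for this sketch; `create_hypertable(..., if_not_exists => TRUE)` is the TimescaleDB call for turning a table into a hypertable.

```python
from typing import Any


class TimescaleWriter:
    """Hypothetical writer that prepares its table on entering the with-block."""

    def __init__(self, connection: Any, table: str) -> None:
        # `connection` is assumed to be a DB-API style connection (e.g. psycopg).
        # `table` must be a trusted identifier: it is interpolated into SQL.
        self.connection = connection
        self.table = table

    def __enter__(self) -> "TimescaleWriter":
        cursor = self.connection.cursor()
        # Illustrative columns only; the real table definitions live in the PR.
        cursor.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} ("
            "time TIMESTAMPTZ NOT NULL, name TEXT NOT NULL, value INTEGER NOT NULL)"
        )
        # Convert to a hypertable partitioned on `time`; a no-op if already done.
        cursor.execute(
            f"SELECT create_hypertable('{self.table}', 'time', if_not_exists => TRUE)"
        )
        self.connection.commit()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Returning None lets any exception from the block propagate.
        return None
```

Doing the setup in `__enter__` means a caller can only reach `.write()` once the table and hypertable are known to exist.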
All tables have a unique constraint with the common `_must_be_different` suffix so the INSERTs can act as an UPSERT (using ON CONFLICT) when it's just the value that needs updating.

I tried out the schema-based method we discussed in Slack.
This worked, as we expected, but didn't fit how we've been planning to run these scripts, where each invocation adds a single period of data (e.g. a day or a week).
Looking forward, I'm expecting the table definitions to move to something slightly more complex. The GitHub and Slack tables are very similar, having been copied over from influx, but I can easily imagine a world where we need entirely different tables, and even tables that are not timeseries (a primary reason we're doing this switch!). At that point I expect us to look into ORM[-adjacent] solutions. The INSERT call in `.write()` is already a little gnarly and I'm sure someone else (almost certainly SQLAlchemy…) has better code for this than we will want to maintain. I suspect this will make table management a little easier too.

However, it didn't seem worth it for this PR and we can address that when it comes up.
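For reference, the UPSERT shape mentioned earlier (an INSERT that uses ON CONFLICT to update only the value) could be built roughly as below. This is a sketch, not the `.write()` implementation: the helper name `build_upsert`, the `value` column, and the assumption that the unique constraint covers exactly `conflict_columns` are all hypothetical.

```python
from typing import Sequence


def build_upsert(
    table: str,
    columns: Sequence[str],
    conflict_columns: Sequence[str],
    value_column: str = "value",
) -> str:
    """Build a PostgreSQL INSERT that updates `value_column` on conflict.

    Identifiers are interpolated directly, so they must come from trusted
    code, never from user input. Row values are left as %s placeholders.
    """
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    conflict_target = ", ".join(conflict_columns)
    return (
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_target}) "
        f"DO UPDATE SET {value_column} = EXCLUDED.{value_column}"
    )


sql = build_upsert(
    "github_pull_requests",          # hypothetical table name
    ["time", "name", "value"],       # hypothetical columns
    conflict_columns=["time", "name"],
)
```

Here `EXCLUDED` refers to the row that failed to insert, so a repeated run for the same period overwrites the stored value instead of raising a uniqueness error.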