Switch storage over to timescaledb #9

ghickman · 2023-11-03T16:12:05Z

This assumes the datastore will have been configured with the timescale extension already. Locally this is handled by the timescaledb container, and in production we have already configured it.

The --database-url option for the CLI ensures we have that env var set. We could always pass it through (since it's in the context), and I wasn't sure we actually needed it set via the CLI, but I wanted click to give us a consistent error when it's not set, and to do so immediately rather than when the writer is called.
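As a rough illustration of the fail-fast behaviour described above, here is a minimal sketch of a click group with a required option backed by an env var. The env var name `DATABASE_URL`, the `cli` group, and the `show` subcommand are assumptions made for the example, not names taken from this PR.

```python
import click


@click.group()
@click.option(
    "--database-url",
    envvar="DATABASE_URL",  # assumed env var name, not confirmed by the PR
    required=True,
    help="Connection URL for the TimescaleDB instance.",
)
@click.pass_context
def cli(ctx, database_url):
    # Stash the URL on the context so subcommands can read it later.
    ctx.ensure_object(dict)
    ctx.obj["DATABASE_URL"] = database_url


@cli.command()
@click.pass_context
def show(ctx):
    # Hypothetical subcommand that only echoes the configured URL.
    click.echo(ctx.obj["DATABASE_URL"])
```

Because the option is marked `required=True`, click reports a usage error before any subcommand body runs if neither the flag nor the env var is present.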

The writer class means we can ensure the appropriate table exists and has had a hypertable created for it when we enter the context block.
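A minimal sketch of that idea, assuming a connection-like object with an `execute(sql)` method. The class name, the column layout, and the constructor arguments are hypothetical and may differ from the real writer in metrics/timescaledb/writer.py; `create_hypertable` is the TimescaleDB function for converting a table.

```python
class TimescaleDBWriter:
    """Hypothetical writer: prepares its table on entering the context."""

    def __init__(self, connection, table):
        self.connection = connection  # any object exposing execute(sql)
        self.table = table  # trusted, code-defined name (interpolated into SQL)

    def __enter__(self):
        # Create the plain table first; IF NOT EXISTS keeps this idempotent.
        self.connection.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} ("
            "time TIMESTAMP WITH TIME ZONE NOT NULL, "
            "name TEXT NOT NULL, "
            "value INTEGER NOT NULL, "
            f"CONSTRAINT {self.table}_must_be_different UNIQUE (time, name))"
        )
        # Convert it to a hypertable partitioned on the time column.
        self.connection.execute(
            f"SELECT create_hypertable('{self.table}', 'time', if_not_exists => TRUE)"
        )
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions raised inside the block
```

Doing the setup in `__enter__` means callers inside the `with` block can assume the table and hypertable are already in place.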

All tables have a unique constraint with the common _must_be_different suffix so the INSERTs can act as an UPSERT (using ON CONFLICT) when it's just the value that needs updating.
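To show the upsert pattern in a runnable form, the sketch below uses the standard-library sqlite3 module as a stand-in for PostgreSQL/TimescaleDB. The table and column names are made up for the example. SQLite names the conflicting columns directly, whereas PostgreSQL can also target a named constraint with `ON CONFLICT ON CONSTRAINT <name>`.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for TimescaleDB.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE example (
        time TEXT NOT NULL,
        name TEXT NOT NULL,
        value INTEGER NOT NULL,
        CONSTRAINT example_must_be_different UNIQUE (time, name)
    )
    """
)

# On a (time, name) clash, overwrite the stored value with the new one.
UPSERT = """
    INSERT INTO example (time, name, value) VALUES (?, ?, ?)
    ON CONFLICT (time, name) DO UPDATE SET value = excluded.value
"""

conn.execute(UPSERT, ("2023-11-03", "open_prs", 4))
conn.execute(UPSERT, ("2023-11-03", "open_prs", 7))  # same key, new value
rows = conn.execute("SELECT time, name, value FROM example").fetchall()
```

After both statements run, the table holds a single row for that key carrying the second value, which is what makes re-running a period safe.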

I tried out the schema-based method we discussed in Slack:

1. create table in another schema
2. add data
3. move to primary schema

This worked, as we expected, but didn't fit how we've been planning to run these scripts, where each invocation adds a single period of data (e.g. a day or a week).
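For reference, the three steps of the schema-based method could be expressed roughly as the statements below. This is only a sketch: the helper name, the schema names, and the column layout are assumptions, and the real experiment may have looked different. `ALTER TABLE ... SET SCHEMA` is the PostgreSQL statement for moving a table between schemas.

```python
def staged_load_statements(table, staging_schema="staging", primary_schema="public"):
    """Return SQL for the three steps, in order (all names hypothetical)."""
    staged = f"{staging_schema}.{table}"
    return [
        # 1. Create the table in another schema.
        f"CREATE TABLE {staged} ("
        "time TIMESTAMP WITH TIME ZONE NOT NULL, "
        "name TEXT NOT NULL, "
        "value INTEGER NOT NULL)",
        # 2. Add data (placeholders are filled in by the database driver).
        f"INSERT INTO {staged} (time, name, value) VALUES (%s, %s, %s)",
        # 3. Move the populated table to the primary schema.
        f"ALTER TABLE {staged} SET SCHEMA {primary_schema}",
    ]
```

This shape suits a full rebuild of a table, which is why it sits awkwardly with scripts that append one period at a time.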

Looking forward, I'm expecting the table definitions to move to something slightly more complex. The GitHub and Slack tables are very similar, having been copied over from influx, but I can easily imagine a world where we need entirely different tables, and even tables that are not timeseries (a primary reason we're doing this switch!). At that point I expect us to look into ORM[-adjacent] solutions. The INSERT call in .write() is already a little gnarly and I'm sure someone else (almost certainly SQLAlchemy…) has better code for this than we will want to maintain. I suspect this will also make table management a little easier.

However, it didn't seem worth it for this PR; we can address that when it comes up.

madwort reviewed Nov 3, 2023

metrics/timescaledb/writer.py (outdated, resolved)

ghickman force-pushed the timescale branch from 712652f to 0b8d188 Compare November 3, 2023 16:26

madwort approved these changes Nov 3, 2023

ghickman force-pushed the slack branch from 9a08f13 to 514e048 Compare November 3, 2023 16:55

ghickman force-pushed the timescale branch from 0b8d188 to 5a1cac9 Compare November 3, 2023 16:55

ghickman force-pushed the timescale branch from 5a1cac9 to f249614 Compare November 3, 2023 17:03

Base automatically changed from slack to main November 6, 2023 11:01

ghickman merged commit a10da2d into main Nov 6, 2023
4 checks passed

ghickman deleted the timescale branch November 6, 2023 11:02
