You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is something I've had in mind for the last few days, but it's still incomplete.
Right now the history table is populated with question snapshots. This seems suboptimal:
We'll probably want to normalize the questions data eventually, e.g. extract options in a separate table (for performance and to unlock the possibility of more complex SQL queries), and it's unclear how to adapt the current history schema for that
history table is denormalized and includes a lot of duplicate data; also, it grows proportionally to the frequency with which we fetch the sources, and so doesn't play well with plans from Independent update schedules for different platforms #35
Alternative: implement an event-based storage which tracks only the changes in fields.
E.g., list of fields for the new table:
pk (serial id)
question_id
field (can be title, description, stars)
value (new value)
timestamp
Unique index by question_id + field.
This table would be populated only if the field value has changed. If the field hasn't changed from the previous fetch then there's no need to save it again.
This proposal is incomplete:
it doesn't explain how to track "deep" properties, e.g. if question had a change in one of the option titles, it's unclear what to put in field
I'm still confused on "we just fetched the new question data with its entire forecasts history from the platform" (because the platform provides the historical data) vs "we fetched the new question data and store its snapshot in our history table" — these are two different scenarios, ideally we need to handle both and abstract it away from the end users
I'll think about this some more before doing any code changes, and I'll wait until I become more familiar with the specifics of different platforms that we support. Just throwing this idea out there to gestate for now.
The text was updated successfully, but these errors were encountered:
Also, this is a better approach if we ever get more platforms with realtime capabilities (e.g., with webhooks for every event that happens on the platform), or if we implement pseudo-realtime capabilities ourselves (e.g., "fetch metaculus frontpage ordered by activity, and refetch only the questions which we haven't seen yet").
This is something I've had in mind for the last few days, but it's still incomplete.
Right now the
history
table is populated with question snapshots. This seems suboptimal:options
in a separate table (for performance and to unlock the possibility of more complex SQL queries), and it's unclear how to adapt the currenthistory
schema for thathistory
table is denormalized and includes a lot of duplicate data; also, it grows proportionally to the frequency with which we fetch the sources, and so doesn't play well with plans from Independent update schedules for different platforms #35Alternative: implement an event-based storage which tracks only the changes in fields.
E.g., list of fields for the new table:
pk
(serial id)question_id
field
(can betitle
,description
,stars
)value
(new value)timestamp
Unique index by
question_id
+field
.This table would be populated only if the field value has changed. If the field hasn't changed from the previous fetch then there's no need to save it again.
This proposal is incomplete:
field
I'll think about this some more before doing any code changes, and I'll wait until I become more familiar with the specifics of different platforms that we support. Just throwing this idea out there to gestate for now.
The text was updated successfully, but these errors were encountered: