We'll be truncating SDF Horizon's history retention to 1 year later this year. To our knowledge, most partners that enable history retention use 1-3 months of history, so it's possible that there could be issues that only present with this data profile (not retaining full history, but large retention window) that we simply haven't seen or heard of yet.
We should dry-run the truncation and mirror traffic to it for some amount of time, observing the performance impact and resolving any issues that arise from this process. Given the timing and the need to continue using staging to test/issue releases of Horizon prior to the truncation, we should not be doing this on the staging cluster and will need to spin up a new/independent one.
At a minimum:

- Upgrade PostgreSQL 12 ➡️ 16 (services/horizon: upgrade psql support to most recent versions #4831) (see the pg_upgrade sketch after this list)
- Enable reaping on that instance and set the retention to 1 year
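For the first item, a minimal sketch of what an in-place 12 ➡️ 16 upgrade could look like with stock pg_upgrade; the bin/data directories are illustrative, and a managed cluster would use its own upgrade mechanism instead:

```bash
# Hypothetical paths; adjust to however the new/independent cluster is provisioned.
pg_upgrade \
  --old-bindir=/usr/lib/postgresql/12/bin \
  --new-bindir=/usr/lib/postgresql/16/bin \
  --old-datadir=/var/lib/postgresql/12/main \
  --new-datadir=/var/lib/postgresql/16/main \
  --check   # run the compatibility check first, then repeat without --check
```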
There are different ways the reaping step can be accomplished, and we need time to evaluate them. For example, we could turn on reaping on the whole DB and see what happens (which may result in a lockup due to the massive amount of data that needs to be reaped, plus a possible full vacuum), or we could start from scratch, ingest a year+ of data, and then enable reaping, or there may be other options.
After discussion, it appears we must approach this by reaping the whole DB, because reingestion may take on the order of months. The hope 🤞 is that because the database will be much smaller, the full vacuum will be feasible without any extra operational concerns.
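For reference, a minimal sketch of what enabling reaping with a one-year window might look like on the test instance, assuming Horizon's `HISTORY_RETENTION_COUNT` setting, the `db reap` subcommand, and roughly 5-second ledger close times; the exact ledger count and the post-reap vacuum strategy would need to be confirmed before the real truncation:

```bash
# Assuming ~5s/ledger: one year is on the order of 365 * 24 * 3600 / 5 ≈ 6.3M ledgers.
export HISTORY_RETENTION_COUNT=6307200

# With a non-zero retention count the reaper trims history as ingestion advances;
# it can also be run on demand against the test DB:
stellar-horizon db reap

# Reclaiming disk space after the bulk delete likely needs a full vacuum
# (or an online alternative such as pg_repack); VACUUM FULL takes exclusive
# locks, which is the operational concern noted above.
psql "$DATABASE_URL" -c 'VACUUM FULL VERBOSE;'
```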
@tamirms, what is the latest status on spinning up the test db cluster for this reaping test effort? I think you mentioned it was in progress but blocked on PG16 issues?
I ask because @aditya1702 and I are triaging reports of the reaper SQL becoming non-performant in pubnet db ingestion deployments of Horizon, such as the reaper timeouts a community member reported in #5299 and #5320.
Triaging the reported problem is very similar to doing the dry-run validation effort; could we converge on this and join efforts to obtain reaper results in a staging environment, since it helps both cases?