add tasks back to MoH KE #157

witash · 2024-09-24T15:21:13Z

Describe the issue
MoH KE provisioned 1TB storage for the postgres database, in a postgres managed by kubernetes
The persistent volume for this cannot be easily resized.
1TB was not enough to store the entire postgres database, and the initial dbt run to populate the database was failing.
To get the dashboards working as quickly as possible, all documents with type 'task' were deleted.
Since tasks accounted for 2/3 of the storage usage (before material views), this freed up enough space to allow the initial dbt run to complete
However, tasks are still useful for analytics and debugging, and deleting them is not a permanent solution.

Describe the improvement you'd like

When all the materialized views are finished, calculate exact storage requirements by adding a buffer of 1/2 (for temp tables and later additions) to the storage usage when everything is finished
Create a new postgres instance with at least this much storage; either it could be a new node in the cluster, or, if the external postgres has enough free storage, switch to using that
Create a new cht-sync instance using that postgres. It will take some time (about one week) to do the initial run, so leave the old one running.
when it is finished, switch dashboards to the new postgres and stop and delete the old cht-sync

Describe alternatives you've considered
There's other ways to get the old tasks, I think this will be the simplest overall.
We could try and save time by pgdump/restore the current database, but it then gets complicated with how merge the old tasks

To save time for the inital dbt run, pgdump and restore to the new postgres, excluding the source table. This may take a long time but cht-sync can keep updating while that is running.
in the new database, truncate couch2pg_progress
Run a new instance of cht-sync, without dbt, poitning to the new database. This will sync
when its finished (1-3 days), stop it and restore couch2pg_progress from the saved table
Switch the normal cht-sync to the new postgres. This will cause dashboards to be out of date again; it's starting over from where it was at step 4. That should be only a few days (at most week) of data so it should catch up quickly. couch2pg will try to sync some tasks that are already in the db; it will update saved_timestamp and treat them like updates, but it will not create duplicates.

dianabarsan · 2024-09-24T18:00:01Z

I think going with the simple solution is the better way, and involves less human intervention, less steps and less potential ways where this can go wrong.
Thanks for writing this up

witash added the Type: Technical issue Improve something that users won't notice label Sep 24, 2024

njuguna-n mentioned this issue Sep 27, 2024

DBT Pod running out of disk space #145

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add tasks back to MoH KE #157

add tasks back to MoH KE #157

witash commented Sep 24, 2024

dianabarsan commented Sep 24, 2024

add tasks back to MoH KE #157

add tasks back to MoH KE #157

Comments

witash commented Sep 24, 2024

dianabarsan commented Sep 24, 2024