Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add tasks back to MoH KE #157

Open
witash opened this issue Sep 24, 2024 · 1 comment
Open

add tasks back to MoH KE #157

witash opened this issue Sep 24, 2024 · 1 comment
Labels
Type: Technical issue Improve something that users won't notice

Comments

@witash
Copy link
Contributor

witash commented Sep 24, 2024

Describe the issue
MoH KE provisioned 1TB storage for the postgres database, in a postgres managed by kubernetes
The persistent volume for this cannot be easily resized.
1TB was not enough to store the entire postgres database, and the initial dbt run to populate the database was failing.
To get the dashboards working as quickly as possible, all documents with type 'task' were deleted.
Since tasks accounted for 2/3 of the storage usage (before material views), this freed up enough space to allow the initial dbt run to complete
However, tasks are still useful for analytics and debugging, and deleting them is not a permanent solution.

Describe the improvement you'd like

  1. When all the materialized views are finished, calculate exact storage requirements by adding a buffer of 1/2 (for temp tables and later additions) to the storage usage when everything is finished
  2. Create a new postgres instance with at least this much storage; either it could be a new node in the cluster, or, if the external postgres has enough free storage, switch to using that
  3. Create a new cht-sync instance using that postgres. It will take some time (about one week) to do the initial run, so leave the old one running.
  4. when it is finished, switch dashboards to the new postgres and stop and delete the old cht-sync

Describe alternatives you've considered
There's other ways to get the old tasks, I think this will be the simplest overall.
We could try and save time by pgdump/restore the current database, but it then gets complicated with how merge the old tasks

  1. To save time for the inital dbt run, pgdump and restore to the new postgres, excluding the source table. This may take a long time but cht-sync can keep updating while that is running.
  2. in the new database, truncate couch2pg_progress
  3. Run a new instance of cht-sync, without dbt, poitning to the new database. This will sync
  4. when its finished (1-3 days), stop it and restore couch2pg_progress from the saved table
  5. Switch the normal cht-sync to the new postgres. This will cause dashboards to be out of date again; it's starting over from where it was at step 4. That should be only a few days (at most week) of data so it should catch up quickly. couch2pg will try to sync some tasks that are already in the db; it will update saved_timestamp and treat them like updates, but it will not create duplicates.
@witash witash added the Type: Technical issue Improve something that users won't notice label Sep 24, 2024
@dianabarsan
Copy link
Member

I think going with the simple solution is the better way, and involves less human intervention, less steps and less potential ways where this can go wrong.
Thanks for writing this up

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Technical issue Improve something that users won't notice
Projects
None yet
Development

No branches or pull requests

2 participants