Hi @pellepelster,

Thanks for putting so much effort into making the RDS system work on Hetzner. It's a great set of tools, and Hetzner should consider adopting it. We want to start using it in a production environment, and we are testing a lot of scenarios to make sure our project can stay up and running at all times.
Today we ran into an issue when rotating the db host. I admit it's an edge case, but wanted to report it anyway. Here's what we did:
Current setup
We have a running RDS host with a database, and our app is able to connect to it
We have a mounted data volume that has the DB data
We have an S3 bucket in place containing full and incremental backups
What is the issue
We discovered that the incremental backups caused a lot of HEAD requests to the S3 API, which exhausted our free quota on Backblaze. I changed the incremental backups to run every 4 hours instead of every hour by setting this value in Terraform:
backup_incr_calendar = "*-*-* 00/4:30"
This caused the cloud-init script to change, which in turn forced the RDS host to be rotated.
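The schedule value appears to use systemd's OnCalendar syntax, where 00/4 in the hour field means "starting at hour 0, then every 4 hours", and :30 fixes the minute. The Python sketch below expands such an expression into the daily trigger times. It is a hypothetical helper (expand_daily_times is not part of the project) and only handles the "*-*-* H/step:MM" and "*-*-* *:MM" shapes, not the full systemd grammar. Assuming the previous schedule ran hourly, the change reduces incremental backup runs from 24 to 6 per day.

```python
# Hypothetical helper: expand a simple systemd-style calendar expression into
# the HH:MM times it fires each day. Only "*-*-* H/step:MM", "*-*-* H:MM" and
# "*-*-* *:MM" are supported; this is NOT a full OnCalendar parser.

def expand_daily_times(expr: str) -> list[str]:
    date_part, time_part = expr.split(" ")
    if date_part != "*-*-*":
        raise ValueError("only the every-day date pattern '*-*-*' is supported")
    hour_field, minute_field = time_part.split(":")
    minute = int(minute_field)
    if hour_field == "*":
        hours = range(24)                      # every hour
    elif "/" in hour_field:
        start, step = (int(x) for x in hour_field.split("/"))
        hours = range(start, 24, step)         # start, start+step, ... < 24
    else:
        hours = [int(hour_field)]              # a single fixed hour
    return [f"{h:02d}:{minute:02d}" for h in hours]


if __name__ == "__main__":
    new_schedule = expand_daily_times("*-*-* 00/4:30")
    print(new_schedule)  # ['00:30', '04:30', '08:30', '12:30', '16:30', '20:30']
    print(len(expand_daily_times("*-*-* *:30")), "->", len(new_schedule))  # 24 -> 6
```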
When the new host started, it found the existing PGDATA directory on the mounted data volume. It started the Postgres server, and I could connect to it. However, the S3 API returned 403 Forbidden when pgBackRest tried to check the database backups on S3. This caused the container to stop, leaving the database server not running.
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | 2024-02-19 14:03:08.328 P00 INFO: stanza-create for stanza 'database01' on repo1
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | ERROR: [039]: HTTP request failed with 403:
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | *** Path/Query ***:
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | HEAD /pgbackrest/archive/database01/archive.info
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | *** Request Headers ***:
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | authorization: <redacted>
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | content-length: 0
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | host: <redacted>.backblazeb2.com
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | x-amz-content-sha256: <redacted>
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | x-amz-date: <redacted>
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | *** Response Headers ***:
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | cache-control: max-age=0, no-cache, no-store
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | content-length: 259
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | content-type: application/xml
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | date: Mon, 19 Feb 2024 14:03:08 GMT
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | x-amz-id-2: <redacted>
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | x-amz-request-id: <redacted>
What do I expect?
If I have a working Postgres database on the data volume and the Postgres server comes up successfully, I would appreciate a warning that the S3 backup is unreachable. But I would expect the Postgres server to keep running instead of the container stopping.
Appreciate your thoughts on this.
I am a little torn on this issue. I understand the expectation that the database should start no matter what, but my fear is that this could lead to backups silently failing without anyone noticing.
Even worse, ignoring errors here could lead to a situation where the backup never works but the database behaves as if it did.
Maybe a flag like ignore_backup_errors would make this explicit and force a conscious decision by the user.
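To make the proposed opt-in concrete, here is a minimal Python sketch of the decision such a flag could drive. Everything in it is hypothetical: ignore_backup_errors is only the name suggested above, and should_keep_running, BackupCheckError and failing_check are illustrative names, not part of the project. The default keeps today's behavior (a failed backup check stops the container); only an explicit opt-in downgrades the failure to a warning while Postgres keeps running.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rds-startup")  # hypothetical logger name


class BackupCheckError(Exception):
    """Raised by a (hypothetical) check of the S3 backup repository."""


def should_keep_running(check_backup_repo: Callable[[], None],
                        ignore_backup_errors: bool = False) -> bool:
    """Decide whether the database container stays up after the backup check.

    Default: a failed check stops the container (the behavior in this issue).
    With ignore_backup_errors=True: log a warning and keep Postgres running.
    """
    try:
        check_backup_repo()
    except BackupCheckError as err:
        if ignore_backup_errors:
            log.warning("backup repository unreachable, keeping database up: %s", err)
            return True
        log.error("backup repository check failed, stopping container: %s", err)
        return False
    return True


def failing_check() -> None:
    """Stub that simulates the 403 response seen in the issue."""
    raise BackupCheckError("HTTP request failed with 403")


if __name__ == "__main__":
    print(should_keep_running(failing_check))                             # False
    print(should_keep_running(failing_check, ignore_backup_errors=True))  # True
```

Defaulting to False means nobody ends up with silently failing backups by accident; the warning path only exists for users who chose it.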
I know, it's not easy, and you have a point! Is there some way to send a notification when the backup is not working, or when the backup host is unreachable? I know this all happens in the context of the cloud-init script, which by design is not interactive.
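One possible direction for the notification question, sketched under assumptions: the failure surfaces as a pgBackRest ERROR line in the container log, so a small watcher outside cloud-init could extract the error code, HTTP status and failing request and forward them to an alerting channel. The Python below only does the extraction step. The names (BackupAlert, find_backup_alert, format_alert) are hypothetical, the regular expressions are based solely on the log lines shown in this issue, and actual delivery (mail, webhook, etc.) is deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# Patterns derived only from the pgBackRest lines shown in this issue's log.
ERROR_RE = re.compile(r"ERROR: \[(\d+)\]: (.*?):?\s*$")
STATUS_RE = re.compile(r"HTTP request failed with (\d{3})")
REQUEST_RE = re.compile(r"\b(HEAD|GET|PUT|POST|DELETE) (/\S*)")


@dataclass
class BackupAlert:
    error_code: str
    message: str
    http_status: Optional[int] = None
    request: Optional[str] = None


def find_backup_alert(lines: Iterable[str]) -> Optional[BackupAlert]:
    """Return the first pgBackRest ERROR in the log, plus the failing request."""
    alert: Optional[BackupAlert] = None
    for line in lines:
        if alert is None:
            match = ERROR_RE.search(line)
            if match:
                alert = BackupAlert(error_code=match.group(1), message=match.group(2))
                status = STATUS_RE.search(line)
                if status:
                    alert.http_status = int(status.group(1))
        else:
            # The request line follows the "*** Path/Query ***" marker.
            request = REQUEST_RE.search(line)
            if request:
                alert.request = f"{request.group(1)} {request.group(2)}"
                break
    return alert


def format_alert(alert: BackupAlert) -> str:
    """Build a one-line message suitable for forwarding to an alerting channel."""
    text = f"pgBackRest error [{alert.error_code}]: {alert.message}"
    if alert.request:
        text += f" ({alert.request})"
    return text


SAMPLE_LOG = [
    "database01_postgresql | 2024-02-19 14:03:08.328 P00 INFO: stanza-create for stanza 'database01' on repo1",
    "database01_postgresql | ERROR: [039]: HTTP request failed with 403:",
    "database01_postgresql | *** Path/Query ***:",
    "database01_postgresql | HEAD /pgbackrest/archive/database01/archive.info",
]

if __name__ == "__main__":
    found = find_backup_alert(SAMPLE_LOG)
    if found is not None:
        # Hand this string to whatever alerting channel is in use (not shown).
        print(format_alert(found))
```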