Could not start database server #31

Open
christoph-buente opened this issue Feb 19, 2024 · 2 comments

@christoph-buente
Contributor

Hi @pellepelster,

thanks for putting so much effort into making the RDS system work on Hetzner. It's a great set of tools and Hetzner should consider adopting it. We want to start using it in a production environment and we are testing out a lot of scenarios to make sure we can be up and running with our project at all times.

Today we ran into an issue when rotating the db host. I admit it's an edge case, but wanted to report it anyway. Here's what we did:

Current setup

  • We have a running RDS Host with a database, our app is able to connect
  • We have a mounted data volume that has the DB data
  • We have an S3 bucket in place containing full and incremental backups

What is the issue

We discovered that the incremental backups caused a lot of HEAD requests to the S3 API, which exhausted our free quota in Backblaze. I configured the incremental backups to run every 4 hours instead of every hour by setting this value in Terraform:

    backup_incr_calendar = "*-*-* 00/4:30"

This caused the cloud-init script to change, which in turn forced the RDS host to be rotated.
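
For readers unfamiliar with the schedule syntax, here is a minimal sketch of what the value above expands to. It assumes backup_incr_calendar is interpreted as a systemd calendar expression, where an hour field of the form start/step repeats from the start hour in step-sized increments. The helper trigger_times is hypothetical and only handles the time part of the expression.

```python
def trigger_times(time_spec: str) -> list[str]:
    """Expand a 'HH/STEP:MM' or 'HH:MM' time spec into the daily trigger times."""
    hour_field, minute_field = time_spec.split(":")
    minute = int(minute_field)
    if "/" in hour_field:
        # 'start/step' repeats from the start hour until the end of the day.
        start, step = (int(part) for part in hour_field.split("/"))
        hours = range(start, 24, step)
    else:
        hours = [int(hour_field)]
    return [f"{hour:02d}:{minute:02d}" for hour in hours]


# Time part of the expression "*-*-* 00/4:30" from the issue.
print(trigger_times("00/4:30"))
# → ['00:30', '04:30', '08:30', '12:30', '16:30', '20:30']
```

That is six incremental backups per day instead of 24. On a host with systemd, running systemd-analyze calendar with the full expression should show the next trigger time.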

When the new host started, it found the PGDATA directory on the mounted data volume. It started the Postgres server and I could connect to it. However, the S3 API returned a 403 Forbidden when the host tried to check the database backups on S3. This caused the container to stop, leaving the database server not running.

Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | 2024-02-19 14:03:08.328 P00   INFO: stanza-create for stanza 'database01' on repo1
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql | ERROR: [039]: HTTP request failed with 403:
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        *** Path/Query ***:
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        HEAD /pgbackrest/archive/database01/archive.info
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        *** Request Headers ***:
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        authorization: <redacted>
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        content-length: 0
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        host: <redacted>.backblazeb2.com
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        x-amz-content-sha256: <redacted>
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        x-amz-date: <redacted>
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        *** Response Headers ***:
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        cache-control: max-age=0, no-cache, no-store
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        content-length: 259
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        content-type: application/xml
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        date: Mon, 19 Feb 2024 14:03:08 GMT
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        x-amz-id-2: <redacted>
Feb 19 14:03:08 database01 docker-compose[15574]: database01_postgresql |        x-amz-request-id: <redacted>

What do I expect?

If I have a working Postgres database with a data volume and the Postgres server came up successfully, I would appreciate a warning that the S3 backup is not reachable. But I would expect the Postgres server to stay running rather than the container stopping.

Appreciate your thoughts on this.

@pellepelster
Owner

I am a little torn on this issue. I understand the expectation that the database should start no matter what, but my fear is that this could lead to backups silently failing without anyone noticing.
Even worse, ignoring errors here could lead to a situation where the backup never works but the database behaves as if it were working.
Maybe a flag like ignore_backup_errors would make this explicit and force a conscious decision by the user.
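
To make the trade-off concrete, here is a minimal sketch of how such an opt-in flag could behave. It is not the project's implementation: run_backup_check, BackupCheckError, and the logger name are hypothetical, and only the flag name ignore_backup_errors comes from the suggestion above. By default the failure still propagates, so nothing changes unless the user decides otherwise.

```python
import logging
from typing import Callable

log = logging.getLogger("rds-startup")  # hypothetical logger name


class BackupCheckError(RuntimeError):
    """Hypothetical error raised when the backup repository check fails."""


def run_backup_check(
    check_backup_repo: Callable[[], None],
    ignore_backup_errors: bool = False,
) -> bool:
    """Return True if the check passed, False if a failure was ignored.

    Without the flag the error propagates, which keeps the fail-fast
    behavior described in this issue as the default.
    """
    try:
        check_backup_repo()
    except BackupCheckError as error:
        if not ignore_backup_errors:
            raise
        # The user opted in: keep the database running, but say so loudly.
        log.warning("backup check failed, continuing anyway: %s", error)
        return False
    return True
```

Returning False instead of swallowing the error silently leaves room for the caller to record that backups are currently broken.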

@christoph-buente
Contributor Author

I know, it's not easy. And you have a point! Is there a way to send a notification when the backup is not working or the backup host is not reachable? I know it all happens in the context of the cloud-init script, which is by design not interactive.
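
One possible direction, sketched under assumptions rather than taken from the project: since the startup script cannot prompt anyone, an external monitor could compare the time of the last successful backup against a maximum allowed age and alert when it is exceeded. The function backup_is_stale and the 8-hour threshold (two missed 4-hour intervals) are hypothetical; how the timestamp of the last successful backup is obtained is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: with backups every 4 hours, two missed runs count as a failure.
MAX_BACKUP_AGE = timedelta(hours=8)


def backup_is_stale(
    last_success: Optional[datetime],
    now: datetime,
    max_age: timedelta = MAX_BACKUP_AGE,
) -> bool:
    """Return True when no backup has succeeded within max_age."""
    if last_success is None:
        # No successful backup recorded at all: treat as stale.
        return True
    return now - last_success > max_age


now = datetime(2024, 2, 19, 14, 3, tzinfo=timezone.utc)
print(backup_is_stale(datetime(2024, 2, 19, 12, 30, tzinfo=timezone.utc), now))  # → False
print(backup_is_stale(datetime(2024, 2, 19, 0, 30, tzinfo=timezone.utc), now))   # → True
```

A check like this catches both cases raised in the thread: a backup host that is unreachable at startup and backups that quietly stop succeeding later.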
