Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tapdb: backup sqlite db before running database migration #968

Merged
merged 3 commits into from
Jun 28, 2024

Conversation

ffranr
Copy link
Contributor

@ffranr ffranr commented Jun 24, 2024

This commit adds functionality to back up any existing SQLite database file before attempting a database migration.

The process involves creating a backup file, migrating the database, and removing the backup if no changes occurred to the
database file. This approach avoids unnecessary backups but results in a backup being created and deleted each time tapd starts.

@ffranr ffranr force-pushed the backup-sqlite-db-before-migration branch 2 times, most recently from feecc5c to 53a1a97 Compare June 24, 2024 21:22
Copy link
Member

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice idea!

tapdb/sqlite.go Outdated Show resolved Hide resolved
tapdb/sqlite.go Outdated Show resolved Hide resolved
tapdb/sqlite.go Outdated
@@ -140,7 +144,41 @@ func NewSqliteStore(cfg *SqliteConfig) (*SqliteStore, error) {
// Now that the database is open, populate the database with our set of
// schemas based on our embedded in-memory file system.
if !cfg.SkipMigrations {
if err := s.ExecuteMigrations(TargetLatest); err != nil {
backupAndMigrate := func(mig *migrate.Migrate) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the mechanism with the MigrationTarget? So we can do a backup while also being able to target a specific version? Would be handy for unit tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard for a skip backup flag to reach that point.

@ffranr ffranr force-pushed the backup-sqlite-db-before-migration branch 2 times, most recently from d4f8354 to ad71ab8 Compare June 25, 2024 14:33
@ffranr
Copy link
Contributor Author

ffranr commented Jun 25, 2024

I think the unit test is failing because the backup copies the file on disk but the backup procedure doesn't force sqlite to dump the db state to that file.

It might be wise to run the command sqlite3 my_database.db .dump > backup.sql rather than try to copy. And the command should run before we open a handle to the db.

@ffranr ffranr force-pushed the backup-sqlite-db-before-migration branch 2 times, most recently from 8d63551 to c2938e8 Compare June 27, 2024 13:05
@ffranr ffranr marked this pull request as ready for review June 27, 2024 13:05
@ffranr ffranr force-pushed the backup-sqlite-db-before-migration branch 2 times, most recently from fdcaf2a to 14c2a63 Compare June 27, 2024 13:18
@ffranr
Copy link
Contributor Author

ffranr commented Jun 27, 2024

I think this PR should be ready for reviews and merge now. I got rid of the big downside risk I was worried about (naively copying the database file). We now execute a VACUUM INTO query from the source db. See https://sqlite.org/lang_vacuum.html . This is a documented sqlite database backup strategy.

I've also added a unit test which checks the backup.

@ffranr ffranr requested review from guggero and jharveyb June 27, 2024 13:23
Copy link
Contributor

@jharveyb jharveyb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good & great feature!

IIUC this won't delete any existing backups right? I think that's the correct behavior for now but just checking.

I imagine a similar feature is available for Postgres but can be deferred to another PR.

timestamp := time.Now().UnixNano()

// Add the timestamp to the backup name.
backupFullFilePath := fmt.Sprintf(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://sqlite.org/lang_vacuum.html

'The file named by the INTO clause must not previously exist, or else it must be an empty file, or the VACUUM INTO command will fail with an error.'

So we should check that the path isn't non-empty before we try the backup, and make an empty file there before the actual VACUUM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read that as: if a file does exist, it must be empty, otherwise it will fail with an error.

If a file does not exist at the backup path at all, then VACUUM INTO will just create one. I don't think we need to add an empty file as a sort of placeholder. And I think we can skip checking for an existing file because of the nanosecond timestamp included in the file name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read it the same way, should have tagged it with a nit. As in, the timestamp should indeed protect us from a file conflict.

Was suggesting the placeholder as a way to catch some errors before we call VACUUM vs. have SQLite err and possibly return less useful information on err.

Non-blocking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was suggesting the placeholder as a way to catch some errors before we call VACUUM vs. have SQLite err and possibly return less useful information on err.

I'm not sure what error info that we might catch with a placeholder that we wouldn't have otherwise seen. Do you know of any? I appreciate that this is a nit now. Just curious.

Copy link
Member

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea!

I think with the latest update that disallows downgrades, we can simplify some things here. But really like the approach and test!

tapdb/migrations.go Outdated Show resolved Hide resolved
tapdb/migrations.go Outdated Show resolved Hide resolved
tapdb/sqlite.go Outdated Show resolved Hide resolved
tapdb/sqlite.go Show resolved Hide resolved
tapdb/sqlite.go Outdated Show resolved Hide resolved
tapdb/sqlite.go Outdated Show resolved Hide resolved
tapdb/sqlite.go Outdated Show resolved Hide resolved
tapdb/sqlite.go Outdated Show resolved Hide resolved
tapdb/migrations_test.go Show resolved Hide resolved
tapdb/migrations_test.go Outdated Show resolved Hide resolved
ffranr added 3 commits June 27, 2024 19:17
This commit modifies the database migration procedure. Before performing
any actual migration, we first compute the current database migration
version and the maximum known migration version (the highest version we
could migrate to). This information is logged. We then pass this
information into the `MigrationTarget` closure, allowing it to decide
whether and how to migrate. This enables us to conditionally create a
database backup if a migration upgrade is possible.
This commit adds functionality to back up any existing SQLite
database file before attempting a database migration.

A backup file is only created if the database driver reports a different
current and max database versions.

This commit also adds a command line flag called `SkipMigrationDbBackup`
which can be set to skip database backup creation before migration.
@ffranr ffranr force-pushed the backup-sqlite-db-before-migration branch from 14c2a63 to 538aa43 Compare June 27, 2024 18:17
@ffranr ffranr requested a review from guggero June 27, 2024 18:18
@dstadulis dstadulis added this to the v0.4 milestone Jun 28, 2024
Copy link
Member

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, LGTM 🎉

@guggero guggero merged commit 311231d into main Jun 28, 2024
16 checks passed
@guggero guggero deleted the backup-sqlite-db-before-migration branch June 28, 2024 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

4 participants