Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: squashing migrations #345

Open
ericmeter opened this issue May 6, 2022 · 7 comments
Open

feature: squashing migrations #345

ericmeter opened this issue May 6, 2022 · 7 comments
Assignees

Comments

@ericmeter
Copy link

Hi,

I have a long-running project and the number of sequential migration files is getting pretty high; it's mainly cosmetic, but I was wondering what the best way to squash all existing migrations into a single migration would be?

My first thought is to write a script that loops through the *.sql files and concats the sequentially into a single new migration file, adding IF EXISTS/IF NOT EXISTS to the various CREATE/ALTER statements to make the actions idempotent. Then I can simply run the migration (which should have no effect on the DB), and delete the old files.

Is this the easiest way? Is there a way I can achieve this without adding the IF NOT EXISTS logic into my script (updating the pointer of the latest applied migration to my migration without actually having to run it)? Or is there an even easier way to achieve this?

Thank you!

@VojtechVitek
Copy link
Collaborator

Hi there,

We have this command in our CI pipeline, which we run on each commit:

pg_dump db --schema-only | grep -v -e '^--' -e '^COMMENT ON' -e '^REVOKE' -e '^GRANT' -e '^SET' -e 'ALTER DEFAULT PRIVILEGES' -e 'OWNER TO' | cat -s > ./data/schema/schema.sql

It exports the latest schema from our Postgres DB into a a schema.sql file. We check in this file into our git repository -- we find it useful to have a full schema in a single file, as it's easy to review incremental changes in Pull Requests.

We "squash" all schema migrations by removing all existing migrations and renaming this schema.sql to 000001.sql (a version that has ran already on all environments).

Hope this helps.

@wizhi
Copy link

wizhi commented Mar 6, 2023

@VojtechVitek sorry to necro this thread, but I'm trying to reproduce the squashing flow you mention here.

After squashing, do you also delete all recorded versions from the goose_db_version table?
At least, when using goose fix, the goose status and goose up commands will think that any migrations created after a "squash" are already applied.

I could see this working, just trying to figure out if I'm missing anything. :)

> goose -dir db/migrations create a sql
2023/03/06 12:12:45 Created new file: db/migrations/20230306121245_a.sql
> goose -dir db/migrations create b sql
2023/03/06 12:12:47 Created new file: db/migrations/20230306121247_b.sql
> goose -dir db/migrations create c sql
2023/03/06 12:12:49 Created new file: db/migrations/20230306121249_c.sql
> goose -dir db/migrations fix
2023/03/06 12:12:55 RENAMED 20230306121245_a.sql => 00001_a.sql
2023/03/06 12:12:55 RENAMED 20230306121247_b.sql => 00002_b.sql
2023/03/06 12:12:55 RENAMED 20230306121249_c.sql => 00003_c.sql
> goose -dir db/migrations up
2023/03/06 12:13:21 OK   00001_a.sql (4.57ms)
2023/03/06 12:13:21 OK   00002_b.sql (2.5ms)
2023/03/06 12:13:21 OK   00003_c.sql (4.97ms)
2023/03/06 12:13:21 goose: no migrations to run. current version: 3
> pg_dump --schema-only > db/schema.sql
> goose -dir db/migrations status
2023/03/06 12:14:02     Applied At                  Migration
2023/03/06 12:14:02     =======================================
2023/03/06 12:14:02     Mon Mar  6 11:13:21 2023 -- 00001_a.sql
2023/03/06 12:14:02     Mon Mar  6 11:13:21 2023 -- 00002_b.sql
2023/03/06 12:14:03     Mon Mar  6 11:13:21 2023 -- 00003_c.sql
> rm db/migrations/*
> cp db/schema.sql db/migrations/00001_schema.sql
> goose -dir db/migrations status
2023/03/06 12:15:11     Applied At                  Migration
2023/03/06 12:15:11     =======================================
2023/03/06 12:15:11     Mon Mar  6 11:13:21 2023 -- 00001_schema.sql
> goose -dir db/migrations create d sql
2023/03/06 12:15:45 Created new file: db/migrations/20230306121545_d.sql
> goose -dir db/migrations status
2023/03/06 12:15:48     Applied At                  Migration
2023/03/06 12:15:48     =======================================
2023/03/06 12:15:48     Mon Mar  6 11:13:21 2023 -- 00001_schema.sql
2023/03/06 12:15:48     Pending                  -- 20230306121545_d.sql
> goose -dir db/migrations fix
2023/03/06 12:15:51 RENAMED 20230306121545_d.sql => 00002_d.sql
> goose -dir db/migrations status
2023/03/06 12:15:53     Applied At                  Migration
2023/03/06 12:15:53     =======================================
2023/03/06 12:15:53     Mon Mar  6 11:13:21 2023 -- 00001_schema.sql
2023/03/06 12:15:53     Mon Mar  6 11:13:21 2023 -- 00002_d.sql
> goose -dir db/migrations up
2023/03/06 12:16:08 goose: no migrations to run. current version: 3
> psql -c 'DELETE FROM goose_db_version WHERE version_id > 1;'
DELETE 2
> goose -dir db/migrations status
2023/03/06 12:23:27     Applied At                  Migration
2023/03/06 12:23:27     =======================================
2023/03/06 12:23:27     Mon Mar  6 11:13:21 2023 -- 00001_schema.sql
2023/03/06 12:23:27     Pending                  -- 00002_d.sql
> goose -dir db/migrations up
2023/03/06 12:25:32 OK   00002_d.sql (4.86ms)
2023/03/06 12:25:32 goose: no migrations to run. current version: 2

@VojtechVitek
Copy link
Collaborator

We keep the goose table in tact.

Before:

0001-add-table-1.sql
0002-update-column.sql
0003-add-table-2.sql
0004-alter-table.sql
0005-add-index.sql

pg_dump db --schema-only | grep -v -e '^--' -e '^COMMENT ON' -e '^REVOKE' -e '^GRANT' -e '^SET' -e 'ALTER DEFAULT PRIVILEGES' -e 'OWNER TO' | cat -s > ./0001-squashed.sql

After

0001-squashed.sql
0006-add-table-3.sql
0007-etc.sql

@wizhi
Copy link

wizhi commented Mar 6, 2023

How does that work with goose fix?
Specifically, it seems goose fix will base the sequence number on the existing migration files, not the goose table.

Here's an example where migrations a, b, c, d, e are created, fixed, and upped.
Next, they're removed and replaced with a single 00001_squashed.sql migration.
Finally, a migration f is created and fixed, which renames it to 00002_f.sql.
Any attempt to up will now think no migrations exist, since the current version is 5 in the goose table.

Before

00001_a.sql
00002_b.sql
00003_c.sql
00004_d.sql
00005_e.sql

After

00001_squashed.sql
00002_f.sql
> goose -dir db/migrations status
2023/03/06 14:33:50     Applied At                  Migration
2023/03/06 14:33:50     =======================================
> goose -dir db/migrations create a sql
2023/03/06 14:33:54 Created new file: db/migrations/20230306143354_a.sql
> goose -dir db/migrations create b sql
2023/03/06 14:33:55 Created new file: db/migrations/20230306143355_b.sql
> goose -dir db/migrations create c sql
2023/03/06 14:33:57 Created new file: db/migrations/20230306143357_c.sql
> goose -dir db/migrations create d sql
2023/03/06 14:34:00 Created new file: db/migrations/20230306143400_d.sql
> goose -dir db/migrations create e sql
2023/03/06 14:34:02 Created new file: db/migrations/20230306143402_e.sql
> goose -dir db/migrations fix
2023/03/06 14:34:07 RENAMED 20230306143354_a.sql => 00001_a.sql
2023/03/06 14:34:07 RENAMED 20230306143355_b.sql => 00002_b.sql
2023/03/06 14:34:07 RENAMED 20230306143357_c.sql => 00003_c.sql
2023/03/06 14:34:07 RENAMED 20230306143400_d.sql => 00004_d.sql
2023/03/06 14:34:07 RENAMED 20230306143402_e.sql => 00005_e.sql
> goose -dir db/migrations up
2023/03/06 14:34:25 OK   00001_a.sql (5.1ms)
2023/03/06 14:34:25 OK   00002_b.sql (3.96ms)
2023/03/06 14:34:25 OK   00003_c.sql (3.6ms)
2023/03/06 14:34:25 OK   00004_d.sql (2.79ms)
2023/03/06 14:34:25 OK   00005_e.sql (2.5ms)
2023/03/06 14:34:25 goose: no migrations to run. current version: 5
> rm db/migrations/*
> pg_dump db --schema-only | grep -v -e '^--' -e '^COMMENT ON' -e '^REVOKE' -e '^GRANT' -e '^SET' -e 'ALTER DEFAULT PRIVILEGES' -e 'OWNER TO' | cat -s > db/migrations/00001_squashed.sql
> goose -dir db/migrations status
2023/03/06 14:35:16     Applied At                  Migration
2023/03/06 14:35:16     =======================================
2023/03/06 14:35:16     Mon Mar  6 13:34:25 2023 -- 00001_squashed.sql
> goose -dir db/migrations create f sql
2023/03/06 14:35:23 Created new file: db/migrations/20230306143523_f.sql
> goose -dir db/migrations status
2023/03/06 14:35:26     Applied At                  Migration
2023/03/06 14:35:26     =======================================
2023/03/06 14:35:26     Mon Mar  6 13:34:25 2023 -- 00001_squashed.sql
2023/03/06 14:35:26     Pending                  -- 20230306143523_f.sql
> goose -dir db/migrations fix
2023/03/06 14:35:30 RENAMED 20230306143523_f.sql => 00002_f.sql
> goose -dir db/migrations status
2023/03/06 14:35:32     Applied At                  Migration
2023/03/06 14:35:32     =======================================
2023/03/06 14:35:32     Mon Mar  6 13:34:25 2023 -- 00001_squashed.sql
2023/03/06 14:35:32     Mon Mar  6 13:34:25 2023 -- 00002_f.sql
> goose -dir db/migrations up
2023/03/06 14:35:38 goose: no migrations to run. current version: 5

Also, thanks a bunch for answering questions like this, it helps a lot. :)

@VojtechVitek
Copy link
Collaborator

I see, so go fix updates the number for you.

Maybe try this instead:

- 00001_squashed.sql
- 00002_f.sql
+ 00006_squashed.sql
+ 00007_f.sql

and keep incrementing the new migrations

@wizhi
Copy link

wizhi commented Mar 7, 2023

Yeah that seems like the obvious solution, thanks! :)

@justenwalker
Copy link

justenwalker commented Feb 1, 2024

Here is my current hack for creating a "squashed" schema SQL for Postgres.

You'll need:

  • goose (of course)
  • docker
  • pg_dump
  • sed
#!/bin/bash
set -euo pipefail

# --- Settings (update these) --- #
# - Where should the schema be written
SCHEMA_OUT=$PWD/schema/schema.sql
# - Where is the migration directory
MIGRATIONS_IN=$PWD/migration
# - Which postgres docker image to use
POSTGRES_IMAGE=postgres:14
# --- End of Settings --- #

# Start Postgres Server
docker run -d --name goose-postgres -p 15432:5432 --rm -e POSTGRES_PASSWORD=secret ${POSTGRES_IMAGE}
sleep 5 # TODO: better way to wait for db ready

export PGDATABASE=postgres
export PGHOST=127.0.0.1
export PGPORT=15432
export PGUSER=postgres
export PGPASSWORD=secret

export GOOSE_DRIVER=postgres
export GOOSE_DBSTRING="host=${PGHOST} port=${PGPORT} user=${PGUSER} dbname=${PGDATABASE} password=${PGPASSWORD} sslmode=disable"
goose status
goose -dir ${MIGRATIONS_IN} up
goose status
pg_dump --schema-only \
  --no-comments \
  --quote-all-identifiers \
  -T public.goose_db_version \
  -T public.goose_db_version_id_seq | sed \
    -e '/^--.*/d' \
    -e '/^SET /d' \
    -e '/^[[:space:]]*$/d' \
    -e '/^SELECT pg_catalog./d' \
    -e '/^ALTER TABLE .* OWNER TO "postgres";/d' \
    -e 's/"public"\.//' \
      > ${SCHEMA_OUT}
docker kill goose-postgres

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants