sequential numbering with postgres or redshift schemas #63

rkulla · 2017-07-21T07:12:53Z

I'm just started using this fork because it has support for Redshift. However, I use schemas to organize tables with databases like redshift and postgres but the new sequential numbering of files, instead of timestamps, means that I have to put all my migration files in the same directory, otherwise the counter will start again at 00001 for each directory.

Say I have a single database in redshift or postgres, and it has multiple schemas CREATE SCHEMA foo; and CREATE SCHEMA bar;. If I wanted to organize my goose migrations by folders, I might have:

foo/
bar/

If i then ran cd foo && goose create add_cars sql then cd ../bar && goose_create add_trains sql, goose will create foo/00001_add_cars.sql and bar/00001_add_trains.sql. So running:

$ cd foo
$ goose postgres "user=rkulla dbname=postgres sslmode=disable" up
goose: no migrations to run. current version: 1
$ cd ../bar
$ goose postgres "user=rkulla dbname=postgres sslmode=disable" up
goose: no migrations to run. current version: 1

would only apply the migration under foo/ but not the one under bar/, because both migrations start with 00001_ and postgres or redshift only get ONE goose_db_version table to share amongst the different schema names. This makes them harder to organize unless I do goose create add_cars_foo sql' and goose create add_trains_bar sql, then run commands like ls *_foo.sqlandls *_bar.sql`.

It's not that big of a deal right now I guess, but I'm wondering if there's a better way and if it's really worth not using timestamps. IIRC, Ruby on Rails's migration feature used to use sequential numbers but developers complained because it caused a lot of conflicts when 2 different developers working on the same project made separate changes but both generated a migration with the same number. So Rails switched to UTC timestamps.

The text was updated successfully, but these errors were encountered:

duffn · 2017-07-21T11:21:24Z

I also prefer the timestamp naming convention. I'd suggest that both are offered in goose. If you prefer sequential, then default to that, but offer UTC timestamps with create NAME sql ts perhaps.

VojtechVitek · 2017-07-21T15:19:15Z

@rkulla sounds like a bug. Goose binary should see the subdirectories and give you latest_version_number++ properly. Anyone, who's using sub-dirs too and wants to solve this?

I'm strongly against Timestamp-based versioning, since it causes much worse unpredictable schema conflicts than sequential versioning. See discussion in #27.

In short:

Given two developers work on different git branches and they both create migrations that change the schema in "account" table. Let's say they both rename a column, change a default value and change an index constraint, for example.

Sequential versions

First dev merges his PR to master. Travis passes.
Second dev's Pull Request's Travis job will start blowing up on a goose migration name conflict (given the Travis builds off the PR branch merged into master using the "Build branch updates" Travis feature). The job will fail on conflicting version 0005:

0005_first_developer_account_changes.sql
0006_first_developer_some_more_changes.sql

0005_second_developer_different_account_changes.sql

Thus, the second dev will need to resolve the conflict (rename the migration) and will probably rebase against master to update his PR.

Timestamp-based versions

2017...08_first_developer_account_changes.sql
2017...15_first_developer_some_more_changes.sql
^ gets merged and deployed first

2017...03_second_developer_different_account_changes.sql
^ gets merged second ............. and will be ignored (because its version name is not higher than the latest applied version)

** Note that this is how goose work by design from its very beginnings in https://bitbucket.org/liamstask/goose/ repo.

As you can see, the sequential versions provide more safety during the development cycle. (Even though they might seem annoying and unnecessary from the beginning.)

But more importantly, they provide predictable order of schema migrations.

The predictable order has more benefits, though:

You can start storing the initial (pre-migration) and final (after-migration) schemas as golden files in your git repository. Very handy for PR reviews! With timestamp-based versions, you'd have to resolve git conflicts in these files anyway.
Also, you can configure your CI (Travis) to test goose up && goose reset && goose up as part of your test suite. From that point, you'll always make sure that your database schema is in a good shape, even when you need to roll-back for some reason. With timestamp-based versions, some migrations could be silently ignored .. and this test would not blow up.

Hope this makes sense.

I'm not saying sequential versioning is perfect - but it has some real benefits over timestamp-based versions.

I'd prefer if we thought about some tooling around conflicts (ie. bump version of all files that were not applied yet to resolve conflicts or something), rather than falling back to timestamps.

rkulla · 2017-07-21T16:27:21Z

@VojtechVitek I see. Thanks for the quick response.

I'm not particularly adamant about timestamps if the sequential versioning would work with sub-dirs. I do have some older timestamp based migrations that were made with liamstask/goose that would now be a mix of timestamps and sequential numbers. Do you think that would pose a problem if i don't rework them as sequential numbers, for people "migrating" from the old goose to this fork?

VojtechVitek · 2017-07-21T19:21:41Z

@rkulla it will still work. But goose create from this repo won't give you timestamp by default, though.

If you want to switch to sequential versions, you can always bump the latest migration to something like 10000000001 (something higher than latest timestamp) and go from there.

pawelma · 2017-08-22T14:36:13Z

@VojtechVitek IMO timestamp based migrations are a lot better than sequential ones. Major issue with sequential ones are collisions after merge. Why can't you check all timestamp versions against goose_db_version? In #27 someone provided argument with initial migration as example but in real world this case is minor and conflicts after merge are real problem. Logic and storage schemat should be discussed between developers, not migration numbers.

VojtechVitek · 2017-08-22T15:08:43Z

@pawelma please read my comment at #63 (comment).

Timestamp based approach doesn't guarantee ordering, in which the migrations get run. I think of DB schema migrations as of git tree with well defined history. Git history is based on commit hashes rather than on timestamps for the very same reason.

Problem is not when you merge two branches together, but in which order you run the DB migrations from these branches. Version-based naming prevents any conflicts, as it errors out on version collisions. I totally understand how annoying is to rename files manually. I don't like it either. But we don't have anything better right now. I wish we could just pick up the order from git history one day. Or automate the renaming somehow.

I'll give you one more example with timestamp-based approach:
Imagine two developers working on their branches with 5 migrations each and merging their work to master branch at the same time. After the merge, the master branch will have 10 new timestamp-based migrations that don't have order defined. If these developers changed schema of some related DB tables, you'll get schema errors (ie. first developer renamed column or added new index but the other one just wanted to enforce NOT NULL or remove a foreign key). This is something that version-based approach avoids.

pawelma · 2017-08-22T17:50:25Z

@VojtechVitek I read it earlier and it didn't convince me :)

We can use any prefix which provides uniqueness and some sort of order. The order should matter but should not take over implementation. The order should matter in a longer period but in short time it shouldn't be forced. As you mentioned git model is a tree, history there can be linear but by default, it isn't and developers take care to resolve conflicts or make history linear.

I agree with the problem described in your example but sequential versioning doesn't solve it. Imagin same issue with sequential versioning. Developer tries to merge his changes into master, CI tests are failing because there is another migration with the same number - so developer fixes the number and I can bet that in 90% (or even more) cases he doesn't check the code. If there is no restriction for 'merge only when tests pass' he may merge broken code. Let's assume that there is such restriction or he waits for another test execution - the result is red again because the original error was hidden by migration framework (instead of 1 debugging step he had to go through 2). What I want to say: You are not able to protect developers from mistakes but you can make their lives easier. Good practices like well-written tests, working CI, etc... should solve such problems.

VojtechVitek · 2017-08-22T19:40:34Z

but by default, it isn't and developers take care to resolve conflicts or make history linear.

Yea, it would be nice if we could get the version number automatically somehow, like in git. Technically, goose binary could look into git repository and recognize number of commits since the initial commit for each migration merged into master. git rev-list --count $(git rev-list --max-parents=0 HEAD | tail -n 1).. or something like that. But I don't really have time to explore this further at this moment, it's not that easy.

Sequential versioning enforces consistency of the DB schema on each environment. It's a stability decision. You probably want to have the same DB state on your localhost, in your CI tests, and then on staging/production.

You are not able to protect developers from mistakes but you can make their lives easier.

I can rely on CI tests that enforce consistent and predictable DB schema, thus protecting developers and production devops from issues on production. That's imho more important than forcing them to rename bunch of files.

Yes, it is annoying. But how can we do it better?

Unfortunately, timestamps are not the answer.

However, if you still prefer timestamp-based approach, you can simply edit this line: https://github.com/pressly/goose/blob/master/create.go#L22 and build your custom goose binary.
Custom binaries are easy to set up and allow you to write Go migrations: See https://github.com/pressly/goose/tree/master/examples/go-migrations

toudi · 2018-02-18T12:42:13Z

could a hinting solution be concidered? Django uses sequential number migrations, however it also allows one to specify dependencies. Therefore by adding one line:

-- +goose depends-on 0004_foo

-- +goose Up
....
-- +goose Down
....

goose could know how to arange the history.

croaker8 · 2018-07-24T19:21:27Z

I am running into serious problems when we have development going on on multiple branches. Scripts get created with the same sequence number on multiple branches. Then there is a conflict after a merge and deployment fails. And even worse if a different branch gets deployed with different numbering the state is hard to decipher. Not sure we will continue using goose.

mvrhov · 2018-07-25T04:20:14Z

@croaker8 : May I suggest that you add a timestamp into the sql/go file automatically so you can resolve the sequence problem. Even though you would have the time based sequences there is no guarantee that the migrations would be in order. And you would still run into the same problems e.g manually solving the order of them except there would be no merge conflicts.

Now on the other hand what @toudi suggested would solve this problem partially as the migrations would still be numbered out of the order and there would still be conflicts with merge however they might not be run out of order.

eaglemoor · 2018-09-27T15:53:49Z

We can use full name of file as version. It's give more information witch revision you apply. And the risk to create the same filename very low. And we don't need use timestamp in filename.

About prefix for sort. It's not a problem, we will have key '0001_create_user', '0001_create_fiends', some people can't create depending migration without parent migration.
If we have filename /migration/mike/0001_create_user and /migration/alex/0001_create_news all migrations will be apply, because they have different key

eaglemoor · 2018-09-27T16:10:11Z

And don't forget. All migration and branch, before release, merge in release branch and apply on staging step by step. If developers catch error on this step, they will roll back and try resolve problem localy.

And goose have well api goose status who show sorted migration list.

I think that goose need give to people simple tools for create migration and show clear logic how it work. And don't need resolved 0.0001% cases, which people need resolve by hand, because they wrong use goose and git

eaglemoor · 2018-09-27T16:21:16Z

Or, if you don't want destroy logic this number prefix, you need panic (exit 1), if you see 2 version 0003_.... with different filename. Then people will see this and will change filename.

In this moment, you just ignore different migration this different people with different filename, but with 0003_. prefix

toudi · 2018-09-29T22:24:15Z

Hi guys. I created a proof of concept with migration dependencies support and revisions (where the revision can be anything) and two types of inputs - SQL and fizz. Could this be a possible solution to this problem? I'm quite new to golang so I'd love your insight. There's one hardcoded path in there, but I just wanted to know what you'd think about the API

https://github.com/toudi/gorpheus

VojtechVitek · 2018-10-10T17:43:28Z

We came up with an interesting approach internally in Pressly with @1vn & @diogogmt :

Use timestamps in development branch, to avoid annoying file renaming & conflicts on rebase
Rename those timestamps-based filenames to serial numbers on git merge via simple merge bot to enforce order of the migrations and thus protect the state of DB schema on all environments

We're working on a proof on concept right now, and we might merge it back here, if there's enough interest. It's a tool that can be either run manually on master branch, or automatically in the CI/CD pipeline or via a "merge bot".

eaglemoor · 2018-10-10T18:09:35Z

And what about CVS, Subversion, Bazaar and Mercurial ?)

eaglemoor · 2018-10-10T18:15:21Z

I think timestamp (or version number) + filename will easy good solution problem. And u don't need (and don't) use some dependencies of other tools. No ?

VojtechVitek · 2018-10-10T18:23:57Z

@eaglemoor The solution we're thinking of doesn't depend on git exclusively. You can run the same thing against mercurial/CVS repo in your central "production" branch.

VojtechVitek · 2018-10-10T18:30:36Z

The timestamp migration order problem explained:

Let's assume we have two developers working on fairly big development branches:

joe branch:

migrations/2018-09-15-12:00:00_joe_1.sql
migrations/2018-10-10-12:00:00_joe_2.sql

alice branch:

migrations/2018-10-01-12:00:00_alice_1.sql
migrations/2018-10-18-12:00:00_alice_2.sql

Joe's branch get merged into master branch first:

master branch:

+migrations/2018-09-15-12:00:00_joe_1.sql
+migrations/2018-10-10-12:00:00_joe_2.sql

The master branch gets deployed to STAGING environment:

$ goose up
$ goose: migrating db environment 'staging', current version: 0, target: 2
$ OK    2018-09-15-12:00:00_joe_1.sql
$ OK    2018-10-10-12:00:00_joe_2.sql

Alice's branch gets merged into master branch right after:

master branch:

 migrations/2018-09-15-12:00:00_joe_1.sql
+migrations/2018-10-01-12:00:00_alice_1.sql
 migrations/2018-10-10-12:00:00_joe_2.sql
+migrations/2018-10-18-12:00:00_alice_2.sql

^ Note the order of these migrations. Goose orders migration files alphabetically.

The QA reports about a regression on STAGING, a hotfix must be provided before going to PRODUCTION
A hotfix is provided and merged into master
The master branch gets deployed to STAGING environment:

$ goose up
$ goose: migrating db environment 'staging', current version: 0, target: 0
$ OK    2018-10-02-12:00:00_alice_2.sql

QA team reports that STAGING looks good, we can go to production!
Hah, no one noticed that 2018-10-01-12:00:00_alice_1.sql migration was not applied on STAGING environment!!
The master branch gets deployed to PRODUCTION environment:

$ goose up
$ goose: migrating db environment 'staging', current version: 0, target: 2
$ OK    2018-09-15-12:00:00_joe_1.sql
$ OK    2018-10-01-12:00:00_alice_1.sql
$ OK    2018-10-02-12:00:00_joe_1.sql
$ OK    2018-10-18-12:00:00_alice_2.sql

The release engineering reports a NOW bug. The application is down after the PRODUCTION deployment.
After hours of debugging, someone notices a serious bug in 2018-10-01-12:00:00_alice_1.sql that was never executed and tested on STAGING. Oooops.

Now, the above story can be even worse, if you have multiple production environments, as we do in Pressly. We have a separate databases in Germany, Canada, US etc. because of data residency regulations. It's impossible to keep all the environments in sync, if we don't use sequential ordering.

The migration order mismatch can cause both

schema mismatches (ie. unexpected missing column, index, or a table .. you name it)
data inconsistencies (ie. UPDATE users SET deleted_at=NOW() WHERE id IN (23, 93, 442) that was never run)

VojtechVitek · 2018-11-06T23:39:17Z

Fixed in #120.

VojtechVitek · 2018-11-06T23:49:29Z

Released as v2.4.0.

mvrhov · 2018-11-07T08:23:44Z

IMO this would be better solved with migrations depending on another migration.

VojtechVitek · 2018-11-07T16:12:34Z

@mvrhov I disagree. Dependencies would be much more complex to implement & and also harder to maintain from the user perspective.

Dependencies between migrations would also lock you in, since you wouldn't be able to remove any older migration files that are referenced from the newer migrations. You'd have to think about all of that when maintaining the code.

In Pressly, we remove old migrations from our codebase every couple months and leave only the latest ~5 migrations. Regular clean-up keeps our codebase small and clean.

mvrhov · 2018-11-07T18:26:49Z

Well removing older migrations might not be an option. e.g. It depend on how many copies of your product do you have. Also you need older migrations for someone to set up the dev environment. Unless you are scrubbing sensitive data from database an your devs are then using almost a production base.. But If you run large saas, then this is not an option.

VojtechVitek · 2018-11-07T19:11:55Z

Well removing older migrations might not be an option. e.g. It depend on how many copies of your product do you have.
But If you run large saas, then this is not an option.

We run a large SAAS. We have five productions in five different AWS regions and millions of users in each copy of the product. And yes, we still do remove old migrations. Out of ~600 migrations, we have about 50 in our codebase right now.

Also you need older migrations for someone to set up the dev environment.

Let me explain what we do in order to remove the old migrations in Pressly:

We maintain two DB schema files in our codebase that are the "source" of truth for the initial state of the DB:

schema.sql
The latest schema of the application, which is used in all application tests & in our CI/CD pipeline. This file gets updated regularly in Pull Requests that have migrations changing the schema.
schema-pre-migration.sql
An older schema, which represents the state of the DB before running any migrations currently present in the codebase. If you import schema-pre-migration.sql into a DB and run goose up, you should end up with the identical state to schema.sql file. We actually run this check as a "DB migration test" in our CI/CD pipeline and run a diff between the current schema and the schema.sql file to ensure consistency and to catch any discrepancies.

When we remove old migration files, we have to update the schema-pre-migration.sql, of course. We pretty much "freeze" the schema.sql to the certain point of time.

Let me know if you have any questions. This could be probably worth a blog post :)

eeshugerman · 2019-04-02T02:51:59Z

@VojtechVitek

The QA reports about a regression on STAGING, a hotfix must be provided before going to PRODUCTION

A hotfix is provided and merged into master

What is the significance of points 5 and 6? Wouldn't the "versioning problem" still be present without this bit? Just want to make sure I understand this correctly as we're finally getting around to updating our CI/CD to use hybrid versioning.

eliaperantoni · 2019-05-10T12:44:03Z

@VojtechVitek Is the fix subcommand intended to be used in a development environment? Say me and a cowerker are sharing a develop branch in which we merge our changes .Would running goose fix after each merge in this branch be bad? So far I can only imagine it beeing beneficial for keeping a consistent and predictable order right from develop, even before mergin in master.

VojtechVitek mentioned this issue Oct 4, 2017

Missing migrations handling. BMP-596 #72

Closed

VojtechVitek mentioned this issue Nov 22, 2017

Timestamp migration ID #82

Closed

VojtechVitek mentioned this issue Sep 27, 2018

More information about migration version #117

Closed

eaglemoor mentioned this issue Oct 10, 2018

Add go.mod #103

Closed

VojtechVitek closed this as completed Nov 6, 2018

VojtechVitek mentioned this issue Dec 11, 2018

help me understand hybrid versioning? #134

Closed

eeshugerman mentioned this issue Apr 3, 2019

[Support][Documentation] Confused about how to use fix #168

Closed

bugzpodder mentioned this issue Jun 24, 2019

include missing migrations #183

Closed

dhui mentioned this issue Aug 16, 2019

History of applied migrations golang-migrate/migrate#179

Open

linux2647 mentioned this issue Feb 18, 2021

Use UTC for timestamps when creating a migration #242

Merged

chiiph mentioned this issue Sep 13, 2021

Make app config json migration not be skipped by patch release fleetdm/fleet#2034

Merged

1 task

mcosta74 mentioned this issue Dec 28, 2022

Another approach to conflicting migrations jackc/tern#55

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sequential numbering with postgres or redshift schemas #63

sequential numbering with postgres or redshift schemas #63

rkulla commented Jul 21, 2017

duffn commented Jul 21, 2017

VojtechVitek commented Jul 21, 2017 •

edited

Loading

rkulla commented Jul 21, 2017 •

edited

Loading

VojtechVitek commented Jul 21, 2017

pawelma commented Aug 22, 2017

VojtechVitek commented Aug 22, 2017

pawelma commented Aug 22, 2017

VojtechVitek commented Aug 22, 2017

toudi commented Feb 18, 2018

croaker8 commented Jul 24, 2018

mvrhov commented Jul 25, 2018

eaglemoor commented Sep 27, 2018 •

edited

Loading

eaglemoor commented Sep 27, 2018

eaglemoor commented Sep 27, 2018 •

edited

Loading

toudi commented Sep 29, 2018

VojtechVitek commented Oct 10, 2018

eaglemoor commented Oct 10, 2018

eaglemoor commented Oct 10, 2018

VojtechVitek commented Oct 10, 2018

VojtechVitek commented Oct 10, 2018

VojtechVitek commented Nov 6, 2018

VojtechVitek commented Nov 6, 2018 •

edited

Loading

mvrhov commented Nov 7, 2018

VojtechVitek commented Nov 7, 2018 •

edited

Loading

mvrhov commented Nov 7, 2018

VojtechVitek commented Nov 7, 2018

eeshugerman commented Apr 2, 2019 •

edited

Loading

eliaperantoni commented May 10, 2019

sequential numbering with postgres or redshift schemas #63

sequential numbering with postgres or redshift schemas #63

Comments

rkulla commented Jul 21, 2017

duffn commented Jul 21, 2017

VojtechVitek commented Jul 21, 2017 • edited Loading

Sequential versions

Timestamp-based versions

rkulla commented Jul 21, 2017 • edited Loading

VojtechVitek commented Jul 21, 2017

pawelma commented Aug 22, 2017

VojtechVitek commented Aug 22, 2017

pawelma commented Aug 22, 2017

VojtechVitek commented Aug 22, 2017

toudi commented Feb 18, 2018

croaker8 commented Jul 24, 2018

mvrhov commented Jul 25, 2018

eaglemoor commented Sep 27, 2018 • edited Loading

eaglemoor commented Sep 27, 2018

eaglemoor commented Sep 27, 2018 • edited Loading

toudi commented Sep 29, 2018

VojtechVitek commented Oct 10, 2018

eaglemoor commented Oct 10, 2018

eaglemoor commented Oct 10, 2018

VojtechVitek commented Oct 10, 2018

VojtechVitek commented Oct 10, 2018

VojtechVitek commented Nov 6, 2018

VojtechVitek commented Nov 6, 2018 • edited Loading

mvrhov commented Nov 7, 2018

VojtechVitek commented Nov 7, 2018 • edited Loading

mvrhov commented Nov 7, 2018

VojtechVitek commented Nov 7, 2018

eeshugerman commented Apr 2, 2019 • edited Loading

eliaperantoni commented May 10, 2019

VojtechVitek commented Jul 21, 2017 •

edited

Loading

rkulla commented Jul 21, 2017 •

edited

Loading

eaglemoor commented Sep 27, 2018 •

edited

Loading

eaglemoor commented Sep 27, 2018 •

edited

Loading

VojtechVitek commented Nov 6, 2018 •

edited

Loading

VojtechVitek commented Nov 7, 2018 •

edited

Loading

eeshugerman commented Apr 2, 2019 •

edited

Loading