History of applied migrations #179

jabbors · 2019-02-26T05:43:40Z

Problem
A use case that have come up with a customer would require us to keep a history of when migrations have been applied.

Solution
Most migration system I've seen include history of all applied migrations in the schema_migrations table. Instead of just containing the last applied version and dirty state I would propose that each migration applied would end up as a new row in the table, which includes the number, name and date applied. The table structure could be something like.

create table schema_migrations(
    id serial primary key not null,
    number int not null,
    name text not null,
    dirty bool not null default false,
    applied_at timestamp not null default now()
);

Which would yield something like this when migrations are applied

select * from schema_migrations;
 id | number |     name     | dirty |         applied_at
----+--------+--------------+-------+----------------------------
  1 |      1 | table bar    | f     | 2019-02-26 07:39:27.302288
  3 |      2 | table bar    | f     | 2019-02-26 07:40:21.939487
  4 |      3 | table foobar | t     | 2019-02-26 07:40:54.243851

The text was updated successfully, but these errors were encountered:

dhui · 2019-02-26T23:16:58Z

A use case that have come up with a customer would require us to keep a history of when migrations have been applied.

Why do they want this?

Also see: #65

I'm hesitant to add this functionality to migrate unless there's a good reason since the required work is non-trivial. e.g. need to make a schema change to the schema version table for every DB driver and what dates should be used to backfill existing migrations?

It may be easier to add lifecycle hooks to migrate and have the consumer track the state for each migration...

jabbors · 2019-02-27T06:51:51Z

It's not the customer that requires it explicitly, it's more of an internal compliance towards customers' requirements where a history would be useful.

It can be accomplished with SQL functions, e.g. a trigger to copy every add/update to a new table with a timestamp column. The important part is the number and date.

I wanted to bring up this question if other users would find it useful and since it hasn't been asked before. #65 is similar but not what I'm looking for.

The implementation details can be decided upon later if this issue is accepted and put on the roadmap. I understand that it might not be trivial to implement and some decision must be made for regarding existing migrations etc.

dhui · 2019-02-27T19:42:15Z

I wanted to bring up this question if other users would find it useful and since it hasn't been asked before.

I'll leave this issue open to gather feedback

It can be accomplished with SQL functions, e.g. a trigger to copy every add/update to a new table with a timestamp column.

IDK if every DB supported by migrate supports triggers. That being said, each schema version table is independently managed by the driver. e.g. the schema version table doesn't need to support the same structure across different DB drivers unless we want to change the driver interface to support history

In the meanwhile, so you're not blocked, it sounds like you could start tracking history yourself by adding a trigger.

jabbors · 2019-02-28T05:28:28Z

Yeah, in the mean time we can track history with a trigger. I plan on creating the table and trigger with a migration. I can share the final result when it's done should anyone else finds it useful.

deuill · 2019-03-18T15:39:44Z

As an additional data-point, we're running a service where database migrations are applied on startup with Migration.Up(), and never noticed having missed some migrates until a few days later where a database migration that referenced a table created in an earlier, missed migration, was failing. We run automated deploys on merges, and this was on staging instances of the service, thankfully.

This, of course, comes down to the fact that certain pull-requests introducing migrates were merged out-of-order, and we should've been more diligent with at least merging these in-order, if not re-numbering for the right order.

However, the fact still remains that the migration system did not error out with these missing migrates, as it does when a version stored in the database does not have a corresponding migrate file (this sometimes happens when testing changes locally, and we're jumping between branches).

It would be wonderful if we could apply these missing migrates in their versioned order, but throwing an error is imperative in catching these issues early, and tracking the history is, I believe, mandatory in implementing this.

I can try and work up some reasonable implementation for at least the subset of database engines I can test for, if you'd be willing to entertain the idea.

dhui · 2019-03-23T00:13:06Z

@deuill I'm glad you caught the issue before going to production!

Have you seen my comment here? #65 (comment)

I'm interested in solving the problem, but I'm not sure what the best solution for this problem (detecting changed/missing migrations) is.
The earlier the missing migration is detected in the dev process, the better. The problem is that everyone/team has a different dev and deploy process.

We should discuss solutions for this problem before implementing anything.

Goals for any solution (ordered by importance):

Correctly catch missing migrations e.g. no false positives or false negatives
Seamlessly integrate with existing users of migrate
- migrate doesn't use migrate to manage it's schema verison table, but could start packing migrations in it's code.
Support everyone's workflow
- Would need to explicitly list workflows we support
- May need overrides similar to migrate force
Catch the missing migration as early as possible

Nice to have for solutions:

Detect modified migrations
- Also support overriding/ignoring flags if someone intentionally modifies a migration

Current solutions:

Track history of applied migrations in DB (schema version table)
Manage manifest file of migration sources

OscarHanzely · 2019-04-02T21:58:16Z

I very much agree with @deuill
This is first time I see migration system that is so much unsafe.
Ina team of multiple people with dependencies we expect that migrations will be applied in order merged in the code, not necesarily based on the developer's file creation time since some features could take longer to develop. Even if we would agree to the time sensitive order, we still have to track the files not just the head of the state. This way it is too dangerous to ignore/skip migration file. Of course it also breaks the option to run migration manually from psql,sql consoles command, but that's the point of migrations. The package should be only authority managing it.

I don;t see much benefit from maintaining/detecting modifications on migration files, once merged with code or release it should not be touched at all. History is essential through to catch possible production issues especially with dependencies.

Apparently this issue has been pointed two years ago mattes/migrate#237 with no resolution agreed on.

dahu33 · 2019-04-04T02:04:09Z

100% agree as well. Having an history of the migrations and being able to apply missing migrations (out of order) are a must. Maybe a feature for v5?

jabbors · 2019-04-04T03:55:47Z

A migration (for PostgresSQL) to track history of applied migrations.

BEGIN;

SET ROLE 'postgres';

CREATE TABLE schema_migrations_history (
    id SERIAL PRIMARY KEY NOT NULL,
    version BIGINT NOT NULL,
    applied_at timestamptz NOT NULL DEFAULT NOW()
);

CREATE OR REPLACE FUNCTION track_applied_migration()
RETURNS TRIGGER AS $$
DECLARE _current_version integer;
BEGIN
    SELECT COALESCE(MAX(version),0) FROM schema_migrations_history INTO _current_version;
    IF new.dirty = 'f' AND new.version > _current_version THEN
        INSERT INTO schema_migrations_history(version) VALUES (new.version);
    END IF;
    RETURN NEW;
END;
$$ language 'plpgsql';

-- TRIGGER
CREATE TRIGGER track_applied_migrations AFTER INSERT ON schema_migrations FOR EACH ROW EXECUTE PROCEDURE track_applied_migration();

COMMIT;

This adds a new row to table schema_migrations_history when a row, with a greater version number, is inserted to table schema_migrations. It solves our use case where we use sequence numbers as versions instead of epoch time. Our migrations are also tested as part of our CI/CD pipeline where we can spot and prevent adding migrations with conflicting (or smaller) version numbers that would result in unapplied migrations.

OscarHanzely · 2019-04-04T14:05:50Z

@jabbors good implementation of history for a use case like mentioned above that the contract with client required to know the timestamp. It does not prevent the issues with skipping the source by migration tool, nor it updates history on migrate down, but it is great database implementation of tracking. We were thinking about something like that just for sake fo checking if any error occurs.

I am most afraid of the skipped migration file from source, we will be adding a policy that upon code merge the migration file timestamp version must be the current time to avoid issues. Otherwise, the newer timestamp merged/released before would cause any migration file with an old timestamp to never being applied.

dhui · 2019-08-16T06:57:33Z

Turns out pressly/goose has a different solution to this problem, which is to use timestamped migrations in dev and sequential migrations in prod and to explicitly serialize the migrations by converting the timestamped migrations to be sequential before deploy. I'm not a fan of this approach because of the following shortcomings:

Doesn't address the issue that the migrations created in one order but merged in a different order may have incompatible schema changes.
The developer needs to be aware of migration file names differing between the different environments which increases the amount of knowledge/context necessary during deploy and can complicate debugging any issues.

dahu33 · 2019-08-30T09:47:28Z

Probably one of the best migration tool out there is FlywayDB. Why not getting a bit of inspiration from them? https://flywaydb.org/documentation/migrations#overview

Something I really like with FlywayDB is they let you manage your migration the "way you want". For example, down migrations are optional and migrations can be run "out of order" (optional):
https://flywaydb.org/documentation/commandline/migrate#outOfOrder

I would love to have similar features and flexibility with golang-migrate. That would make the tool less opinionated on "how migration should be or should not be run" and it would most likely adapt better to different type of workflow/environments.

Just my 2cents :)

tsauvajon · 2019-10-22T00:03:05Z

This would be a great addition to go-migrate indeed, it's hard to keep track of what has been applied and when.

nazar-pc · 2019-11-29T14:12:10Z

We have the same issue with multiple migrations being merged not necessarily in the same order as they were created and expectation is that non-conflicting migrations will still be applied. So there should be a history of applied migrations and missing ones should be running. Non-conflicting merges are, obviously, task for developers to ensure they merge what should be merged and when.

NickOttrando · 2020-06-19T18:41:50Z

If this isn't going to be implemented with the suggested approach, i'd recommend changing the table name from schema_migrations to current_schema_version. The current name is very misleading when I apply multiple migrations and only see the most recent one in the table. Documentation around how this table works would also be very helpful.

dhui · 2020-11-08T04:42:54Z

Will this proposed design (#470) meet most of your needs? e.g. everything except for auditability

jabbors · 2020-11-08T06:11:11Z

We have several production environments and it's common that some environments fall behind both in regards of deploying application upgrades and new migrations. For this reason, auditability is preferred in case we need to do some troubleshooting and track when a migration has been applied. But we have also solved that in Postgres by applying the above trigger function as an initial migration.

As for the proposed designed, it will be a great addition and it will help us with some issues relating to CI.

amanangira · 2021-09-29T07:56:21Z

Problem A use case that have come up with a customer would require us to keep a history of when migrations have been applied.

Solution Most migration system I've seen include history of all applied migrations in the schema_migrations table. Instead of just containing the last applied version and dirty state I would propose that each migration applied would end up as a new row in the table, which includes the number, name and date applied. The table structure could be something like.
create table schema_migrations(
    id serial primary key not null,
    number int not null,
    name text not null,
    dirty bool not null default false,
    applied_at timestamp not null default now()
);
Which would yield something like this when migrations are applied
select * from schema_migrations;
 id | number |     name     | dirty |         applied_at
----+--------+--------------+-------+----------------------------
  1 |      1 | table bar    | f     | 2019-02-26 07:39:27.302288
  3 |      2 | table bar    | f     | 2019-02-26 07:40:21.939487
  4 |      3 | table foobar | t     | 2019-02-26 07:40:54.243851

@jabbors we are planning to implement a similar approach. Is there any references that you can share for the implementation that you have made? I am still new to go-lang-migrate and it seems most of the migration logic is part of the package.

jabbors · 2021-09-30T05:58:40Z

@amanangira this is the implementation that we are using #179 (comment).

It meets our requirements and have been working so far. Below is a sample from the table in a live system. Note: we are using a sequential version number in our migrations files {version}_{title}.up.{extension}, where a file would look something like 0165_create_table_foorbar.up.sql

SELECT * FROM schema_migrations_history ORDER BY applied_at DESC LIMIT 10;
 id  | version |          applied_at
-----+---------+-------------------------------
 168 |     182 | 2021-09-28 15:06:02.894574+00
 167 |     181 | 2021-09-28 12:59:10.771168+00
 166 |     180 | 2021-09-28 10:10:01.494816+00
 165 |     179 | 2021-06-11 09:38:54.363129+00
 164 |     178 | 2021-06-10 07:34:28.292296+00
 163 |     177 | 2021-06-08 08:31:25.304139+00
 162 |     176 | 2021-06-03 10:53:32.802223+00
 161 |     175 | 2021-06-01 07:35:30.812291+00
 160 |     174 | 2021-05-26 09:13:36.25021+00
 159 |     173 | 2021-05-25 06:13:45.053859+00
(10 rows)

jabbors · 2021-09-30T06:04:02Z

A good practice that we have adapted is to create the trigger as the very first migration for new databases, ie. 0000_initial.up.sql

amanangira · 2021-09-30T06:05:37Z

@amanangira this is the implementation that we are using #179 (comment).

It meets our requirements and have been working so far. Below is a sample from the table in a live system. Note: we are using a sequential version number in our migrations files {version}_{title}.up.{extension}, where a file would look something like 0165_create_table_foorbar.up.sql
SELECT * FROM schema_migrations_history ORDER BY applied_at DESC LIMIT 10;
 id  | version |          applied_at
-----+---------+-------------------------------
 168 |     182 | 2021-09-28 15:06:02.894574+00
 167 |     181 | 2021-09-28 12:59:10.771168+00
 166 |     180 | 2021-09-28 10:10:01.494816+00
 165 |     179 | 2021-06-11 09:38:54.363129+00
 164 |     178 | 2021-06-10 07:34:28.292296+00
 163 |     177 | 2021-06-08 08:31:25.304139+00
 162 |     176 | 2021-06-03 10:53:32.802223+00
 161 |     175 | 2021-06-01 07:35:30.812291+00
 160 |     174 | 2021-05-26 09:13:36.25021+00
 159 |     173 | 2021-05-25 06:13:45.053859+00
(10 rows)

I probably missed this in the thread. This looks interesting. I will spend more time putting the approach against our use case. Thank you @jabbors

dillonstreator · 2021-10-19T20:50:34Z

As a workaround, my team has opted to use a custom required GitHub action which runs on new PRs. This also requires enforcing the strict status check branch protection rule so that the head branch is up to date with the base branch (ensuring the script compares against the latest migrations).

# .github/workflows/migration-validation.yml

name: migration-validation
on:
  pull_request:
    types: [opened, synchronize, edited, reopened]
    branches:
      - main
jobs:
  migration-validation:
    name: migration-validation
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: update checkout
        run: git fetch --prune
      - name: migration-validation
        run: scripts/migration-validation.sh ${{ github.event.pull_request.base.ref }}
        shell: bash

# scripts/migration-validation.sh

BASE_BRANCH=$1

OLDEST_NEW_MIGRATION_FILE=$(git diff --name-only origin/$BASE_BRANCH --diff-filter=d | grep -m1 db/migrations/)

if [[ -z $OLDEST_NEW_MIGRATION_FILE ]]; then
    echo "no new migrations"
    exit 0
fi

NEWEST_EXISTING_MIGRATION_FILE=$(git ls-tree -r origin/$BASE_BRANCH --name-only | grep db/migrations/ | tail -1)

if [[ -z $NEWEST_EXISTING_MIGRATION_FILE ]]; then
    echo "no existing migrations"
    exit 0
fi

echo "oldest new migration $OLDEST_NEW_MIGRATION_FILE"
echo "newest existing migration $NEWEST_EXISTING_MIGRATION_FILE"

EXISTING_TIMESTAMP="$(basename $NEWEST_EXISTING_MIGRATION_FILE | cut -d '_' -f 1)"

NEW_TIMESTAMP="$(basename $OLDEST_NEW_MIGRATION_FILE | cut -d '_' -f 1)"

if [[ $EXISTING_TIMESTAMP -ge $NEW_TIMESTAMP ]]; then
    echo "existing migration timestamp is greater than or equal to incoming migration timestamp. please update your migrations timestamp."
    exit 1
fi

echo "new migration(s) are safe to merge"
exit 0

Running this as a Github action provides a good feedback loop and seems like a decent hack until a solution is settled on and implemented.

lzap · 2022-05-06T05:29:45Z

Just an idea - how about checking the filenames before loading them for processing?

package db

import (
	"embed"
	"log"
	"strconv"

	migrate "github.com/golang-migrate/migrate/v4"
	_ "github.com/golang-migrate/migrate/v4/database/postgres"
	"github.com/golang-migrate/migrate/v4/source/iofs"
)

//go:embed migrations
var fs embed.FS

func Migrate() {
	d, err := iofs.New(fs, "migrations")
        // ...
}

// Checks that migration files are in proper format and index has no gaps or reused numbers.
func init() {
	dir, err := fs.ReadDir("migrations")
	if err != nil {
		log.Fatal("Unable to open migrations embedded directory")
	}
	if len(dir)%2 != 0 {
		log.Fatal("Migration files must be even")
	}
	// count migration prefixes
	checks := make([]int, len(dir)/2)
	for _, de := range dir {
		ix, err := strconv.Atoi(de.Name()[:5])
		if err != nil {
			log.Fatalf("Migration %s does not start with an integer?", de.Name())
		}
		if ix-1 > len(checks)-1 {
			log.Fatalf("Is there a gap in migration numbers? Number %d is way too high", ix)
		}
		checks[ix-1]++
	}
	// check expected result
	for i, x := range checks {
		if x != 2 {
			log.Fatalf("There are not exactly two migration files with index %05d, found: %d", i+1, x)
		}
	}
}

amenowanna · 2022-09-28T20:43:14Z

A migration (for PostgresSQL) to track history of applied migrations.
BEGIN;

SET ROLE 'postgres';

CREATE TABLE schema_migrations_history (
    id SERIAL PRIMARY KEY NOT NULL,
    version BIGINT NOT NULL,
    applied_at timestamptz NOT NULL DEFAULT NOW()
);

CREATE OR REPLACE FUNCTION track_applied_migration()
RETURNS TRIGGER AS $$
DECLARE _current_version integer;
BEGIN
    SELECT COALESCE(MAX(version),0) FROM schema_migrations_history INTO _current_version;
    IF new.dirty = 'f' AND new.version > _current_version THEN
        INSERT INTO schema_migrations_history(version) VALUES (new.version);
    END IF;
    RETURN NEW;
END;
$$ language 'plpgsql';

-- TRIGGER
CREATE TRIGGER track_applied_migrations AFTER INSERT ON schema_migrations FOR EACH ROW EXECUTE PROCEDURE track_applied_migration();

COMMIT;
This adds a new row to table schema_migrations_history when a row, with a greater version number, is inserted to table schema_migrations. It solves our use case where we use sequence numbers as versions instead of epoch time. Our migrations are also tested as part of our CI/CD pipeline where we can spot and prevent adding migrations with conflicting (or smaller) version numbers that would result in unapplied migrations.

Thank you for sharing this solution this is what we chose to do to address this gap in the tool. Elegant and reverse compatible with the way the tool currently works. Would encourage that a similar solution is added to the tool to address all of the concerns on this issue. Similarly as others have noted here this seems like a big gap when compared to other migration tools I have used.

Adding a Table, Function, and Trigger to track the history of our schema migrations. Currently the `schema_migrations` table only stores the latest migration so it is difficult to track what migrations were applied and when and this is an issue when troubleshooting failed or missed migrations. The solution was found in an [issue](golang-migrate/migrate#179 (comment)) for our migration tool Resolves #330 Example output of the table on a clean database where all migrations ran ![image](https://user-images.githubusercontent.com/10135546/192883450-a90ec30b-0db3-45b5-9ac9-42f2259e8dfb.png)

This was referenced Aug 8, 2019

协同开发执行up 的时候发现 no change #259

Closed

Add getting started docs and Postgres tutorial #258

Merged

dhui mentioned this issue Aug 30, 2019

Support non-linear migrations #278

Open

dhui mentioned this issue Jan 3, 2020

checksum migrations (repeatable migrations) #310

Open

dhui mentioned this issue Jun 19, 2020

schema_migrations table only ever contains a single record for MySQL driver, docker image #405

Closed

dhui mentioned this issue Sep 9, 2020

Flatten Schemas #438

Open

dhui mentioned this issue Nov 8, 2020

[PROPOSAL] Use manifest file to track migration sources #470

Open

dhui mentioned this issue Feb 25, 2021

Tracking which migrations have been applied in the schema_migrations table #510

Open

AnatolyRugalev mentioned this issue Jun 21, 2021

Migrations sequential consistency problem #587

Closed

dillonstreator mentioned this issue Jun 7, 2022

Support for adding applied migration in schema_migrations table #679

Open

amenowanna mentioned this issue Sep 28, 2022

feat: adding schema migrations history table mergestat/mergestat#333

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

History of applied migrations #179

History of applied migrations #179

jabbors commented Feb 26, 2019

dhui commented Feb 26, 2019

jabbors commented Feb 27, 2019

dhui commented Feb 27, 2019

jabbors commented Feb 28, 2019

deuill commented Mar 18, 2019 •

edited

Loading

dhui commented Mar 23, 2019

OscarHanzely commented Apr 2, 2019 •

edited

Loading

dahu33 commented Apr 4, 2019 •

edited

Loading

jabbors commented Apr 4, 2019

OscarHanzely commented Apr 4, 2019

dhui commented Aug 16, 2019

dahu33 commented Aug 30, 2019

tsauvajon commented Oct 22, 2019

nazar-pc commented Nov 29, 2019

NickOttrando commented Jun 19, 2020

dhui commented Nov 8, 2020

jabbors commented Nov 8, 2020

amanangira commented Sep 29, 2021

jabbors commented Sep 30, 2021

jabbors commented Sep 30, 2021

amanangira commented Sep 30, 2021

dillonstreator commented Oct 19, 2021

lzap commented May 6, 2022

amenowanna commented Sep 28, 2022

History of applied migrations #179

History of applied migrations #179

Comments

jabbors commented Feb 26, 2019

dhui commented Feb 26, 2019

jabbors commented Feb 27, 2019

dhui commented Feb 27, 2019

jabbors commented Feb 28, 2019

deuill commented Mar 18, 2019 • edited Loading

dhui commented Mar 23, 2019

Goals for any solution (ordered by importance):

Nice to have for solutions:

Current solutions:

OscarHanzely commented Apr 2, 2019 • edited Loading

dahu33 commented Apr 4, 2019 • edited Loading

jabbors commented Apr 4, 2019

OscarHanzely commented Apr 4, 2019

dhui commented Aug 16, 2019

dahu33 commented Aug 30, 2019

tsauvajon commented Oct 22, 2019

nazar-pc commented Nov 29, 2019

NickOttrando commented Jun 19, 2020

dhui commented Nov 8, 2020

jabbors commented Nov 8, 2020

amanangira commented Sep 29, 2021

jabbors commented Sep 30, 2021

jabbors commented Sep 30, 2021

amanangira commented Sep 30, 2021

dillonstreator commented Oct 19, 2021

lzap commented May 6, 2022

amenowanna commented Sep 28, 2022

deuill commented Mar 18, 2019 •

edited

Loading

OscarHanzely commented Apr 2, 2019 •

edited

Loading

dahu33 commented Apr 4, 2019 •

edited

Loading