Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Squash migrations #2174

Open
Tracked by #22945 ...
rowanmiller opened this issue May 12, 2015 · 111 comments
Open
Tracked by #22945 ...

Squash migrations #2174

rowanmiller opened this issue May 12, 2015 · 111 comments

Comments

@rowanmiller
Copy link
Contributor

It would be good to have the ability to squash several migrations into a single file to help reduce the number of files in a project.

We probably want to keep track of the original list of migration names so that we can reason about this when targeting an existing database that the original migrations were applied to in their un-squashed form.

@popcatalin81
Copy link

@rowanmiller how will squash work? Merge Migrations into one or simply place them in a single file.

I think merging will be problematic when a target database has partial migrations but not all from a squash.

@bricelam
Copy link
Contributor

bricelam commented May 14, 2015

@popcatalin81 I suspect at first, it will simpy concatenate all the operations together into one migration. In the future, it may try and simplify the operations (e.g. renaming A -> B -> C will become just A -> C)

Correct, "rewriting history" is always a bad idea. Before squashing, you'll have to revert all the migrations you want to squash, squash them, then re-apply the new one. You shouldn't do it if the migrations have been applied on any database other than your local one.

This operation would be useful while developing a new feature. You could add all the migrations locally you want, but before merging your feature, you could squash them all down into a single migration.

@TheMadKow
Copy link

👍 For this idea

@markusvt
Copy link

markusvt commented Feb 3, 2016

I just wanted to suggest that idea also.. the migrations folder gets quite large quite fast if the projects develops over time

@bricelam bricelam added the tools label Jan 13, 2017
@bricelam bricelam removed the tools label Apr 26, 2017
@dario-hd
Copy link

dario-hd commented May 3, 2017

I was wondering if removing them all and creating an "initial migration" would be a better approach. In the end this is what I did recently. Of course the "initial migration" should be executed only on database creation. This will not only reduce the number of files in the project but it will also speed up the initial database creation if you recreate it multiple times e.g. for development and testing purposes. What do you guys think? @rowanmiller @bricelam

@pgrm
Copy link

pgrm commented Nov 15, 2017

I'm puzzled that this issue is so inactive. How are others solving the issue of the ever growing Migrations folder?

Could there be at least some best practice described in the documentation @AndriySvyryd ?

@bricelam
Copy link
Contributor

bricelam commented Nov 15, 2017

@pgrm I briefly mentioned a strategy in the Migrations docs I'm adding...

@replaysMike
Copy link

This is a common problem we run into every once in a while. It's quite simple to accomplish. If your database is already up to date just delete all the migration files and truncate the dbo.__EFMigrationsHistory table. Generate a new initial create migration and you have now squashed all your migrations. You lose any comments but that's minor if you're needing to do it.

@PMExtra
Copy link

PMExtra commented Apr 12, 2019

@replaysMike It will lost my custom migration operations. (For example, I set a custom default value for a new field.)

@replaysMike
Copy link

@PMExtra that’s surprising since you’re basically creating a migration based on the current state of the database. Is the default value being applied at the db level, or code level when the entity is created?

@TsengSR
Copy link

TsengSR commented Apr 9, 2024

The disadvantage is that you lock on the db engine and compatibility level, but it would be optional.

Migrations already lock you into a db and db engine, as of right now you already have to create multiple migrations for each database engine you intend to support, ideally in different assemblies

@ronnyek
Copy link

ronnyek commented Apr 9, 2024

@ajcvickers Is there any opportunity where someone could potentially provide ideas/ submit PR's for this functionality? I wouldn't mind beginning to look into this... It's a massive problem we've run into time and time again, and wouldn't mind contributing some time and doing some research, if I thought it was something that might be evaluated for inclusion

@ajcvickers
Copy link
Contributor

@ronnyek Yes, we're open to ideas. In particular, we are open radical ideas that don't involve this feature being implemented, since it is difficult to do right. For example, the idea @titobf posted is interesting.

Things to consider:

  • This could be something that deals with existing migrations, but this may be difficult because custom migrations are difficult to handle in an automated way.
  • Instead, we could implement a new flavor of migrations which doesn't leave a lot of artifacts in its wake.
    • This system might have limitations. For example, it might be up-only.
    • This system should not have a backing model for each migration. This means that a migration cannot depend on the current model, which is relatively common today.

We were talking about this on the team last week, so it was interesting to see @titobf's idea.

The team is unlikely to have time to spend a lot of time in the near future, but we will pay attention to ideas posted here and give feedback. You may need to be patient; we are more busy than usual this year.

@ronnyek
Copy link

ronnyek commented Apr 10, 2024

Forgive me for perhaps missing something... but validating my understanding here. (not talking custom migration steps here)
Make changes to context/entities
Create Migration (creates migration cs files for up/down)
Apply Migration (runs up(), and updates __EFMigrationHistory with the name of the latest applied migration)
image

My particular needs are basically collapsing many migrations into a release migration for a specific version of our software. While we'd potentially want to be able to roll back to the last version, we may not need granular rollback for everything that happened before last version was released.

Seems like you could effectively have a way of marking a particular migration as a snapshot, and then say when you want to publish next version of your app... some way of basically removing all the migrations back to last snapshot (code, and efmigrationhistory... not current schema/data), and regenerating the migration from current model until now also marking the current migration as a snapshot.

Am I over simplifying?

I'm not sure how crazy custom migrations would need to work. So far our needs have been that we need to seed stuff from time to time, but never had the need (YET) to do things like an actual serious data migration where data gets moved to other tables, or updated with other values or anything.

@TomGathercole
Copy link

@ronnyek Yes, we're open to ideas. In particular, we are open radical ideas that don't involve this feature being implemented, since it is difficult to do right. For example, the idea @titobf posted is interesting.

Things to consider:

* This could be something that deals with existing migrations, but this may be difficult because custom migrations are difficult to handle in an automated way.

* Instead, we could implement a new flavor of migrations which doesn't leave a lot of artifacts in its wake.
  
  * This system might have limitations. For example, it might be up-only.
  * This system should not have a backing model for each migration. This means that a migration cannot depend on the current model, which is relatively common today.

We were talking about this on the team last week, so it was interesting to see @titobf's idea.

The team is unlikely to have time to spend a lot of time in the near future, but we will pay attention to ideas posted here and give feedback. You may need to be patient; we are more busy than usual this year.

So we already have a process scripted to do this squash, but it feels super hacky. I've included the full script below, but the nastiest points are:

  1. We have to time travel with git to generate a new snapshot from a particular point in time.
  2. We have to do string manipulation to make a new migration look like the old InitialMigration so that the tooling doesn't try to re-apply it.
  3. We have to do more string manipulation to add a trigger to the initial migration.

Now that last part is probably solved by This Issue, or more generally by not customizing anything in the migration that isn't strictly related to migrating data from A to B. Also, I can't really imagine any general solution that could possibly solve that without just jamming all the migrations into one file.

For the other two points, I think these could potentially be solved by having some special handling around initial migrations. Honestly, I'm not entirely sure how this would work but potentially something like:

  • Have some project level flag to opt into auto-generating an initial migration.
  • This would always be generated with the same name (e.g. 00000000000000_Init).
  • If this migration is found in the history (or maybe if the __EFMigrationsHistory table exists full stop), then do nothing.
  • Otherwise:
    • Find the first migration sequentially
    • Generate a 'default' migration based on the model snapshot of this migration

I don't actually think that last part would work, because the snapshot is the state after that migration is applied, but perhaps someone smarter than me can figure out some way to make it work? I've only just realised it doesn't work, otherwise I maybe wouldn't have bothered typing those thoughts up...

Anyhow, that feels like it might be nice because it lets us randomly delete some series of early migrations, and things will 'just work', but you guys don't have to offer a guarantee that customized scripts will be handled in any way.

Messy migration squash script, to illustrate the hoops we're jumping through!

Note, this is less opinionated version of our actual script, so this might not work properly. Hopefully it illustrates the broad strokes anyway.

Function First-Migration {
    Get-ChildItem $migrationFolder | Sort Basename | Select -First 1 -ExpandProperty Basename
}

Function Get-Class {    
    Param ($migration)
    $migration -replace '^[^_]*', ''
}

Function Rewrite-Migration {
    Param ($suffix)
    
	$source = "$migrationFolder\$newMigration$suffix"
	 
    $content = (Get-Content -Raw $source) `
        -replace $newMigration,$initialMigration `
        -replace (Get-Class $newMigration),(Get-Class $initialMigration) `
        -replace "(\s+}\s+protected override void Down)","$footer`$1" `
		
	Remove-Item $source
	
	return $content
}

Function Save-Migration {
    Param ($content, $suffix)

	$target = "$migrationFolder\$initialMigration$suffix"
    $content | Set-Content $target
}

Function Remove-IfExists {
	Param ($filesToRemove)

	ForEach ($f in $filesToRemove) {
		If (Test-Path $f) {
			Remove-Item $f
		}
	}
}

# 4 week sprints means that this covers two 28 day sprints.
# Technically we should only need one sprint, but this gives some contingency.
$days = 60

# Triggers cannot be managed in EF core yet, so we need to add these to the migration 'manually'
$footer = @" 
	migrationBuilder.Sql("CREATE OR ALTER TRIGGER MyTrigger ...");
"@

$migrationFolder = ".\MyProject\Migrations"

# Create a new branch for the job, updated to the latest main
git checkout main
git pull
git checkout -B "$jobNumber-squash-migrations"

# Get the name of the first migration file
$initialMigration = First-Migration

# Reset branch to the first migration created more than $days ago
$hash = git log --before="$days days ago" -1 --pretty=format:"%h" $migrationFolder
git reset --hard $hash

# Make a note of all the migrations which existed at this point, then delete them
# We want to delete the snapshot now, but ensure that we don't delete it a second time later on
$migrationsPattern = "$migrationFolder\*.cs"
$migrationsToDelete = Get-ChildItem $migrationsPattern -Exclude "*ContextModelSnapshot.cs"
Remove-Item $migrationsPattern

# Create a new initial migration
dotnet ef migrations add "$($jobNumber)_Squash"

# Get the name of the newly added migration
$newMigration = First-Migration

# Read out the new migration, make any replacements, then store in a varable
$rewrittenMigration = Rewrite-Migration ".cs"
$rewrittenDesigner = Rewrite-Migration ".Designer.cs"

# Reset back to main, and reapply the changes
git reset --hard main

Remove-IfExists $migrationsToDelete
Save-Migration $rewrittenMigration ".cs"
Save-Migration $rewrittenDesigner ".Designer.cs"

# Commit changes
git add .
git commit -m "Ref #$($jobNumber): Squash $stack migrations"

@ronnyek
Copy link

ronnyek commented Apr 11, 2024

@TomGathercole I'm somewhat curious as to what role the git stuff here is providing. Is this simply as a mechanism to rollback migrations to an earlier state? Is this just some automation you are doing to auto collapse migrations every <time interval> ?

It seems to me like if you have a sort of order all the migrations were executed (whether that was by .cs file name) you could provide the last legitimate migration step, and collapse everything between that one.

Also seems like application of the migration may be weird since the db would have record of migrations that have been applied, but code/efcore cli tools wouldn't have anyway to tell what those migrations were.

@TomGathercole
Copy link

@ronnyek The purpose of the git stuff is to get to a point in history where the model generated from code matches some given 'initial migration' state. The main criteria is that all migration which make up this must have been applied to all environments which will ever be migrated going forward.

There's definitely a fair bit of weirdness here. So far we haven't had any problems with it, since the__EFMigrationsHistory doesn't contain muchh aside from the names of the migrations. It would be nice to have some official-ish solution so that what we're doing (or something that achieves the same benefits) could be truly safe to rely on.

I think you might be on to something in regards to collapsing migrations. We could take the model in the .Designer.cs file for some last-known-applied-everywhere migration and re-generate the code based migration using this model. I don't think there's anything in the CLI tooling to do this, but there's probably something in the underlying library to generate part of a ScaffoldedMigration from an IRelationalModel

@deanvr
Copy link

deanvr commented Apr 25, 2024

Y'all are over thinking this... the problem statement is that on a real world project that spans years the migrations folder can get rather large and us poor dev folks end up manually cleaning up the folder by moving files to an archive folder. Any new dev often needs to get a DB backup and start from there... We need to be able to generate an initial migration from which a dev can use this to build up a full schema based off the current model.

We need a command e.g. Add-Initial-Migration which runs the migrations and intentionally runs the model compare against an 'empty' schema, it then generates a migration class which could be run to generate the full schema upto that point. If the __EFMigrationsHistory table already contains migrations it will never run on update-database command, if for example you are a new dev and you run update database command and there are no existing __EFMigrationsHistory records or table it can run and generate the schema in the database from that migration class.

Sounds easy peasy... ;)

Other option is some poor soul goes through all the migrations and literally combine all the ups, and all the downs into one migration :) Ohh couldn't we just get a code generator to do this mmmmm

@TomGathercole
Copy link

Y'all are over thinking this... the problem statement is that on a real world project that spans years the migrations folder can get rather large and us poor dev folks end up manually cleaning up the folder by moving files to an archive folder. Any new dev often needs to get a DB backup and start from there... We need to be able to generate an initial migration from which a dev can use this to build up a full schema based off the current model.

We need a command e.g. Add-Initial-Migration which runs the migrations and intentionally runs the model compare against an 'empty' schema, it then generates a migration class which could be run to generate the full schema upto that point. If the __EFMigrationsHistory table already contains migrations it will never run on update-database command, if for example you are a new dev and you run update database command and there are no existing __EFMigrationsHistory records or table it can run and generate the schema in the database from that migration class.

Sounds easy peasy... ;)

Other option is some poor soul goes through all the migrations and literally combine all the ups, and all the downs into one migration :) Ohh couldn't we just get a code generator to do this mmmmm

This is essentially what we're doing, but the trouble is that there's no safe time to just delete all migrations and start again. At any given time, we'll usually have ~10 migrations pending deployment in a 2-weekly release which we cannot squash until they're deployed to every production environment.

I would absolutely be in favour of any solution that totally drops any manual migration tweaks though - I think that's going to be far too hard to deal with in a way that provides any real benefits.

@douglasg14b
Copy link

douglasg14b commented Jun 9, 2024

This would be helpful when doing development work.

I may be developing against a clone of staging, and making changes that generate migrations. I may make these changes incrementally, and running the application necessitates that I apply a migration for the changes made. The result is that while in development I may generate multiple migrations, and need to squash them all down to 1 by the end to clear the dev clutter.

This is a tedious, error prone, process when done manually. An automated way to squash multiple migrations into 1 would be a nice QoL tool.


Idea:

Understandably, there are challenges here. Such as "How do you preserve the ordering of custom commands & hand-crafted queries that are part of a migration"?.

How about we sidestep those problems with something like an "aggregate migration"? A migration that is the sum of multiple migrations, maintaining the separate, and ordered, SQL commands that multiple migrations would create. With one final up state and one final down state? Preserving the nuances of the individual SQL commands within each "child", but with only one snapshot that needs to be stored.

This doesn't aim to solve the "We have too many migrations" problem, but rather the "We only need 1 migration for this release, and we have 20" problem. Which, incidentally, may help alleviate the former.

@diegopalat

This comment has been minimized.

@roji

This comment has been minimized.

@pdevito3
Copy link

pdevito3 commented Nov 11, 2024

For anyone that's interested, I made a .NET tool that will squash everything (read, aggregate all your existing migrations) into your initial migration file. May still require some manual resolutions, but should still save a bunch of manual work.

@ErikEJ
Copy link
Contributor

ErikEJ commented Nov 12, 2024

@pdevito3 you should get that added to the EF docs...

@jasekiw
Copy link

jasekiw commented Nov 13, 2024

The tool does not work for me due to not handling ef core having a dependency on the target state of each migration. See the issue I created here: pdevito3/StewardEF#1

I'm not sure if this is a database specific issues or not. I would assume not. I recommended another approach in the issue.

@pdevito3
Copy link

The tool does not work for me due to not handling ef core having a dependency on the target state of each migration. See the issue I created here: pdevito3/StewardEF#1

I'm not sure if this is a database specific issues or not. I would assume not. I recommended another approach in the issue.

Fwiw, this approach has already worked on multiple projects and I call out that manual resolutions may still be needed based on your migration history (like in this case).

It's no risk to try it out for like 5 min and see how it does for you (slow build times really add up!) so I wouldn't want people to be discouraged by this.

@alrz
Copy link
Member

alrz commented Nov 13, 2024

For those who don't run migrations as part of app startup, you could just exclude /Migrations on CI (or even local) builds. For our particular setup this worked out quite well.

@diegopalat
Copy link

For those who don't run migrations as part of app startup, you could just exclude /Migrations on CI (or even local) builds. For our particular setup this worked out quite well.

using which approach?

@alrz
Copy link
Member

alrz commented Nov 13, 2024

Sorry I haven't read the entire thread so this might be mentioned previously, what I mean by that is simply removing it from the compilation using <Compile Remove="..."/>, that would address the build duration issue at the very least.

@jasekiw
Copy link

jasekiw commented Nov 14, 2024

For those who don't run migrations as part of app startup, you could just exclude /Migrations on CI (or even local) builds. For our particular setup this worked out quite well.

This is true but when interacting with migrations the build still slows down significantly. This also slows down packaging the migrations to be deployed. That's why we are still looking into ways to squash migrations.

@alrz
Copy link
Member

alrz commented Nov 14, 2024

This is true but when interacting with migrations the build still slows down significantly. This also slows down packaging the migrations to be deployed. That's why we are still looking into ways to squash migrations.

Rather than focusing on squashing existing migrations, IMO using a different migrations CLI workflow instead of the current add command can also accomplish the same end results.

I like to think of migrations and snapshots in terms of a "published snapshot" which reflects the main database schema, and a "unpublished migration" which is generated relative to this snapshot.

For example, each time you change the data model, you could either,

  • create a migration without updating the snapshot,
  • amend the last migration, again without updating the snapshot (using an idempotent script to reapply it), or
  • publish the migration, which updates the snapshot and can optionally clean up migrations created so far.

Maybe you could choose between create and amend depending on how many "checkpoints" you need in the process but still have the option to get rid of already-applied migrations in the source that you don't plan to rollback to.

(this will probably work only if you do not run migrations on app startup which is not that uncommon)

@achikhv
Copy link

achikhv commented Nov 19, 2024

Maybe it should be more attention to #18620? The core issue is that model snapshots take a lot of time and resources to compile. So I suppose that most of participants would come to terms with the number of migrations provided that migrations consume reasonable CPU and RAM to compile.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests