Squash migrations #2174
Comments
@rowanmiller how will squash work? Will it merge the migrations into one, or simply place them in a single file? I think merging will be problematic when a target database has some, but not all, of the migrations from a squash applied. |
@popcatalin81 I suspect at first, it will simply concatenate all the operations together into one migration. In the future, it may try to simplify the operations (e.g. renaming A -> B -> C will become just A -> C). Correct, "rewriting history" is always a bad idea. Before squashing, you'll have to revert all the migrations you want to squash, squash them, then re-apply the new one. You shouldn't do it if the migrations have been applied on any database other than your local one. This operation would be useful while developing a new feature. You could add all the migrations you want locally, but before merging your feature, you could squash them all down into a single migration. |
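As a rough illustration of the "A -> B -> C becomes A -> C" simplification mentioned above, here is a minimal Python sketch. It is not EF Core code; the function name `collapse_renames` and the `(old, new)` tuple representation are assumptions made purely for illustration, and it only handles simple rename chains.

```python
def collapse_renames(renames):
    """Collapse an ordered list of (old, new) renames into net renames.

    Hypothetical helper: chains such as A -> B -> C become A -> C, and a
    rename that ends up back at its original name is dropped entirely.
    """
    origin = {}  # maps current name -> name it had before any rename
    for old, new in renames:
        # If 'old' is itself the result of an earlier rename, follow it back.
        start = origin.pop(old, old)
        if start != new:
            origin[new] = start
    return [(start, current) for current, start in origin.items()]


print(collapse_renames([("A", "B"), ("B", "C")]))
```

A real implementation would also have to account for operations interleaved between the renames (for example, a new table reusing an old name), which this sketch ignores.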
👍 For this idea |
I just wanted to suggest that idea as well. The Migrations folder gets quite large quite fast as a project develops over time. |
I was wondering if removing them all and creating an "initial migration" would be a better approach. In the end this is what I did recently. Of course the "initial migration" should be executed only on database creation. This will not only reduce the number of files in the project but it will also speed up the initial database creation if you recreate it multiple times e.g. for development and testing purposes. What do you guys think? @rowanmiller @bricelam |
I'm puzzled that this issue is so inactive. How are others solving the issue of the ever-growing Migrations folder? Could there at least be some best practice described in the documentation, @AndriySvyryd? |
@pgrm I briefly mentioned a strategy in the Migrations docs I'm adding... |
This is a common problem we run into every once in a while. It's quite simple to accomplish. If your database is already up to date just delete all the migration files and truncate the dbo.__EFMigrationsHistory table. Generate a new initial create migration and you have now squashed all your migrations. You lose any comments but that's minor if you're needing to do it. |
@replaysMike It will lose my custom migration operations. (For example, I set a custom default value for a new field.) |
@PMExtra that’s surprising since you’re basically creating a migration based on the current state of the database. Is the default value being applied at the db level, or code level when the entity is created? |
Migrations already lock you into a db and db engine, as of right now you already have to create multiple migrations for each database engine you intend to support, ideally in different assemblies |
@ajcvickers Is there any opportunity for someone to provide ideas or submit PRs for this functionality? I wouldn't mind beginning to look into this... It's a massive problem we've run into time and time again, and I wouldn't mind contributing some time and doing some research, if I thought it was something that might be evaluated for inclusion. |
@ronnyek Yes, we're open to ideas. In particular, we are open to radical ideas that don't involve this feature being implemented, since it is difficult to do right. For example, the idea @titobf posted is interesting. Things to consider:
We were talking about this on the team last week, so it was interesting to see @titobf's idea. The team is unlikely to be able to spend much time on this in the near future, but we will pay attention to ideas posted here and give feedback. You may need to be patient; we are busier than usual this year. |
So we already have a process scripted to do this squash, but it feels super hacky. I've included the full script below, but the nastiest points are:
Now that last part is probably solved by This Issue, or more generally by not customizing anything in the migration that isn't strictly related to migrating data from A to B. Also, I can't really imagine any general solution that could possibly solve that without just jamming all the migrations into one file. For the other two points, I think these could potentially be solved by having some special handling around initial migrations. Honestly, I'm not entirely sure how this would work but potentially something like:
I don't actually think that last part would work, because the snapshot is the state after that migration is applied, but perhaps someone smarter than me can figure out some way to make it work? I've only just realised it doesn't work, otherwise I maybe wouldn't have bothered typing those thoughts up...
Anyhow, that feels like it might be nice because it lets us randomly delete some series of early migrations, and things will 'just work', but you guys don't have to offer a guarantee that customized scripts will be handled in any way.
Messy migration squash script, to illustrate the hoops we're jumping through!
Note, this is a less opinionated version of our actual script, so this might not work properly. Hopefully it illustrates the broad strokes anyway.
Function First-Migration {
    # Base name of the alphabetically first file in the migrations folder
    Get-ChildItem $migrationFolder | Sort Basename | Select -First 1 -ExpandProperty Basename
}
Function Get-Class {
    Param ($migration)
    # Strip the timestamp prefix, leaving the part from the first underscore onwards
    $migration -replace '^[^_]*', ''
}
Function Rewrite-Migration {
    Param ($suffix)
    $source = "$migrationFolder\$newMigration$suffix"
    # Rename the new migration to the initial one and inject the footer before Down()
    $content = (Get-Content -Raw $source) `
        -replace $newMigration,$initialMigration `
        -replace (Get-Class $newMigration),(Get-Class $initialMigration) `
        -replace "(\s+}\s+protected override void Down)","$footer`$1"
    Remove-Item $source
    return $content
}
Function Save-Migration {
    Param ($content, $suffix)
    $target = "$migrationFolder\$initialMigration$suffix"
    $content | Set-Content $target
}
Function Remove-IfExists {
    Param ($filesToRemove)
    ForEach ($f in $filesToRemove) {
        If (Test-Path $f) {
            Remove-Item $f
        }
    }
}
# 4 week sprints means that this covers two 28 day sprints.
# Technically we should only need one sprint, but this gives some contingency.
$days = 60
# Triggers cannot be managed in EF core yet, so we need to add these to the migration 'manually'
$footer = @"
migrationBuilder.Sql("CREATE OR ALTER TRIGGER MyTrigger ...");
"@
$migrationFolder = ".\MyProject\Migrations"
# Create a new branch for the job, updated to the latest main
git checkout main
git pull
git checkout -B "$jobNumber-squash-migrations"
# Get the name of the first migration file
$initialMigration = First-Migration
# Reset branch to the first migration created more than $days ago
$hash = git log --before="$days days ago" -1 --pretty=format:"%h" $migrationFolder
git reset --hard $hash
# Make a note of all the migrations which existed at this point, then delete them
# We want to delete the snapshot now, but ensure that we don't delete it a second time later on
$migrationsPattern = "$migrationFolder\*.cs"
$migrationsToDelete = Get-ChildItem $migrationsPattern -Exclude "*ContextModelSnapshot.cs"
Remove-Item $migrationsPattern
# Create a new initial migration
dotnet ef migrations add "$($jobNumber)_Squash"
# Get the name of the newly added migration
$newMigration = First-Migration
# Read out the new migration, make any replacements, then store in a variable
$rewrittenMigration = Rewrite-Migration ".cs"
$rewrittenDesigner = Rewrite-Migration ".Designer.cs"
# Reset back to main, and reapply the changes
git reset --hard main
Remove-IfExists $migrationsToDelete
Save-Migration $rewrittenMigration ".cs"
Save-Migration $rewrittenDesigner ".Designer.cs"
# Commit changes
git add .
git commit -m "Ref #$($jobNumber): Squash $stack migrations" |
@TomGathercole I'm somewhat curious as to what role the git stuff here is playing. Is this simply a mechanism to roll back migrations to an earlier state? Is this just some automation you are doing to auto collapse migrations every It seems to me that if you have some sort of order in which all the migrations were executed (whether that is by .cs file name or otherwise), you could provide the last legitimate migration step and collapse everything up to that one. It also seems like applying the migration may be weird, since the db would have a record of migrations that have been applied, but the code/efcore cli tools wouldn't have any way to tell what those migrations were. |
@ronnyek The purpose of the git stuff is to get to a point in history where the model generated from code matches some given 'initial migration' state. The main criterion is that all the migrations which make up this state must have been applied to all environments which will ever be migrated going forward. There's definitely a fair bit of weirdness here. So far we haven't had any problems with it, since the __EFMigrationsHistory table doesn't contain much aside from the names of the migrations. It would be nice to have some official-ish solution so that what we're doing (or something that achieves the same benefits) could be truly safe to rely on. I think you might be on to something in regards to collapsing migrations. We could take the model in the .Designer.cs file for some last-known-applied-everywhere migration and re-generate the code based migration using this model. I don't think there's anything in the CLI tooling to do this, but there's probably something in the underlying library to generate part of a |
Y'all are overthinking this... The problem statement is that on a real-world project that spans years, the migrations folder can get rather large, and us poor dev folks end up manually cleaning up the folder by moving files to an archive folder. Any new dev often needs to get a DB backup and start from there... We need to be able to generate an initial migration that a dev can use to build up a full schema based on the current model. We need a command, e.g. Add-Initial-Migration, which runs the migrations and intentionally runs the model compare against an 'empty' schema; it then generates a migration class which could be run to generate the full schema up to that point. If the __EFMigrationsHistory table already contains migrations, it will never run on the update-database command. If, for example, you are a new dev and you run the update-database command and there are no existing __EFMigrationsHistory records or table, it can run and generate the schema in the database from that migration class. Sounds easy peasy... ;) The other option is that some poor soul goes through all the migrations and literally combines all the ups and all the downs into one migration :) Ohh couldn't we just get a code generator to do this mmmmm |
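The rule proposed above (run the generated initial migration only when the history table is empty or missing, otherwise skip it) can be sketched as follows. This is hypothetical behaviour, not what EF Core does today; `plan_update` and its parameters are invented names, and migrations are reduced to plain ID strings.

```python
def plan_update(initial_id, later_ids, applied_ids):
    """Return the migration IDs a hypothetical update-database would run.

    initial_id  -- the generated 'initial' migration (full schema)
    later_ids   -- ordinary migrations added after it, in order
    applied_ids -- IDs found in the history table (empty if the table
                   is missing or has no rows)
    """
    applied = set(applied_ids)
    plan = []
    # A fresh database gets the whole schema from the initial migration.
    # A database with any history is assumed to have that schema already.
    if not applied:
        plan.append(initial_id)
    plan.extend(m for m in later_ids if m not in applied)
    return plan


# New developer, empty database: initial migration plus everything after it.
print(plan_update("0_Initial", ["3_AddX"], []))
# Existing database that ran the old, since-deleted migrations: only the new one.
print(plan_update("0_Initial", ["3_AddX"], ["1_A", "2_B"]))
```

The sketch assumes that every database with history has applied all of the migrations the initial one replaces, which is the same precondition discussed elsewhere in this thread.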
This is essentially what we're doing, but the trouble is that there's no safe time to just delete all migrations and start again. At any given time, we'll usually have ~10 migrations pending deployment in a 2-weekly release which we cannot squash until they're deployed to every production environment. I would absolutely be in favour of any solution that totally drops any manual migration tweaks though - I think that's going to be far too hard to deal with in a way that provides any real benefits. |
This would be helpful when doing development work. I may be developing against a clone of staging, and making changes that generate migrations. I may make these changes incrementally, and running the application necessitates that I apply a migration for the changes made. The result is that while in development I may generate multiple migrations, and need to squash them all down to 1 by the end to clear the dev clutter. This is a tedious, error-prone process when done manually. An automated way to squash multiple migrations into 1 would be a nice QoL tool. Idea: Understandably, there are challenges here, such as "How do you preserve the ordering of custom commands & hand-crafted queries that are part of a migration?" How about we sidestep those problems with something like an "aggregate migration"? A migration that is the sum of multiple migrations, maintaining the separate, and ordered, SQL commands that multiple migrations would create. With one final This doesn't aim to solve the "We have too many migrations" problem, but rather the "We only need 1 migration for this release, and we have 20" problem. Which, incidentally, may help alleviate the former. |
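One way to picture the "aggregate migration" idea above is as a plain concatenation that preserves each migration's commands in their original order. The Python sketch below is only an illustration under that assumption; the `Migration` class and `aggregate` function are hypothetical and do not correspond to any EF Core type.

```python
from dataclasses import dataclass


@dataclass
class Migration:
    id: str
    up: list    # ordered SQL commands that apply the migration
    down: list  # ordered SQL commands that revert it


def aggregate(agg_id, migrations):
    """Combine migrations into one without merging or reordering commands."""
    # Up: run every migration's commands in the original order.
    up = [sql for m in migrations for sql in m.up]
    # Down: revert the most recent migration first.
    down = [sql for m in reversed(migrations) for sql in m.down]
    return Migration(agg_id, up, down)


m1 = Migration("1_AddTable", ["CREATE TABLE T (Id int)"], ["DROP TABLE T"])
m2 = Migration("2_AddColumn", ["ALTER TABLE T ADD Name text"],
               ["ALTER TABLE T DROP COLUMN Name"])
release = aggregate("Release_1", [m1, m2])
print(release.up)
print(release.down)
```

Because nothing is simplified, hand-written SQL keeps its position relative to the generated commands, which is the ordering concern raised in the comment.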
For anyone that's interested, I made a .NET tool that will squash everything (read, aggregate all your existing migrations) into your initial migration file. May still require some manual resolutions, but should still save a bunch of manual work. |
@pdevito3 you should get that added to the EF docs... |
The tool does not work for me because it does not handle EF Core's dependency on the target state of each migration. See the issue I created here: pdevito3/StewardEF#1 I'm not sure whether this is a database-specific issue or not. I would assume not. I recommended another approach in the issue. |
Fwiw, this approach has already worked on multiple projects and I call out that manual resolutions may still be needed based on your migration history (like in this case). It's no risk to try it out for like 5 min and see how it does for you (slow build times really add up!) so I wouldn't want people to be discouraged by this. |
For those who don't run migrations as part of app startup, you could just exclude /Migrations on CI (or even local) builds. For our particular setup this worked out quite well. |
using which approach? |
Sorry I haven't read the entire thread so this might be mentioned previously, what I mean by that is simply removing it from the compilation using |
This is true but when interacting with migrations the build still slows down significantly. This also slows down packaging the migrations to be deployed. That's why we are still looking into ways to squash migrations. |
Rather than focusing on squashing existing migrations, IMO using a different migrations CLI workflow instead of the current I like to think of migrations and snapshots in terms of a "published snapshot" which reflects the main database schema, and an "unpublished migration" which is generated relative to this snapshot. For example, each time you change the data model, you could either,
Maybe you could choose between (this will probably work only if you do not run migrations on app startup which is not that uncommon) |
Maybe more attention should be paid to #18620? The core issue is that model snapshots take a lot of time and resources to compile. So I suppose that most participants would come to terms with the number of migrations, provided that migrations consume a reasonable amount of CPU and RAM to compile. |
It would be good to have the ability to squash several migrations into a single file to help reduce the number of files in a project.
We probably want to keep track of the original list of migration names so that we can reason about this when targeting an existing database that the original migrations were applied to in their un-squashed form.
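To make the point about tracking original migration names concrete, here is a minimal Python sketch of how a squashed migration could be reconciled with a database's history. It is not an EF Core API; `squash_status`, the `replaces` list, and the three status strings are all assumptions for illustration.

```python
def squash_status(replaces, applied_ids):
    """Classify a squashed migration against a database's history.

    replaces    -- original migration names the squashed migration stands for
    applied_ids -- names recorded in the target database's history table
    """
    applied = set(applied_ids)
    hits = [name for name in replaces if name in applied]
    if not hits:
        return "pending"   # none of the originals ran: apply the squash
    if len(hits) == len(replaces):
        return "applied"   # all originals ran: treat the squash as applied
    return "partial"       # only some ran: the squash cannot be applied safely


originals = ["1_AddBlog", "2_AddPost", "3_RenameTitle"]
print(squash_status(originals, []))
print(squash_status(originals, ["1_AddBlog", "2_AddPost", "3_RenameTitle"]))
print(squash_status(originals, ["1_AddBlog"]))
```

The "partial" case is the problematic one raised in the first comment: a database that has applied only some of the squashed migrations would need the un-squashed originals to catch up.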