Tracking which migrations have been applied in the schema_migrations table #510
To expand on this, we've checked the proposal that was made on #470, but this will not fix our use case: a file is local to a given system, and this will not support clustered scenarios with multiple servers using the same database. The idea we have is basically to reuse the existing migrations table without changing it. Instead of tracking only the last version applied, we would have one row per applied migration. This should make it easy to maintain backwards compatibility for several drivers: in the case of relational databases like mysql or postgres, migrating from the old system is just a matter of reading the last migration applied and marking all migrations up to that point as applied, creating a row for each.
Thanks for reaching out! Let's talk about the manifest file in #470
🎉 Any help is greatly appreciated!! Let's settle on a design before starting any implementation.
@dhui thanks for the quick response on this. I've brushed up on the implementation in the manifest proposal, but basically, the design we were thinking of is what @mgdelacroix described:
In a nutshell, instead of using 1 row to determine which migrations have been applied, we will be writing many migration rows in the schema_migrations table. Consider this sequence:

User A creates a migration t0 in their dev cycle. User B gets their PR approved and merged to master first; their migration t1 is applied to prod, and the schema_migrations table has:

version | dirty
--------+------
t1      | f

User A updates their branch from master without changing anything on their side, merges their PR, and applies the migrations to prod. Now the table looks like:

version | dirty
--------+------
t1      | f
t0      | f

Every time we want to apply migrations, we will compute the difference between the migration files from the source and what is in the schema_migrations table.

What needs to change

The Driver interface needs to change to allow retrieving the already-applied migrations; we were thinking of something along the lines of functions for listing them. The interface changes will not break any compatibility, since the proposed design is a superset of the existing implementation.

Data Backwards Compatibility

That should be the easiest step. We can add a piece of logic that runs when someone upgrades:

- Existing project with the single-row mechanism
- Updated project with the multi-row mechanism
We would ❤️ to hear your thoughts on this.
Your example allows for dangerous behaviors, e.g. applying migrations earlier than the last successfully applied migration.
It works beautifully in ActiveRecord and Liquibase. The situation you stated comes up rarely, but when it does, the migration fails in a verbose way so that you know what happened, can roll back, correct the migrations, and re-run them. This type of functionality is absolutely crucial for team development.
Rarity isn't the concern, safety is. You can't roll back dropping a column, since data is lost.
What functionality are you referring to? e.g. detecting and preventing branching/reordered migrations?
Also duplicates #65
@dhui I do not see this as a migrations issue TBH. Let me elaborate; the possible permutations that I can see are the following:
Scenario 1) A merges before B
Scenario 2) B merges before A
Flipping X < Y, the above scenarios still stand valid. In all scenarios, I think the problem is around "logical" schema development, which relies almost entirely on the team that implements the logic for their features/services. NOTE: From the above analysis we can infer that migration timestamps do not affect migration validity, since the crucial component for a migrations tool is the time when a migration is applied. Am I missing any scenarios here?
The same issue can happen when the previous migration deletes a column and the next one wants to rename it. If it's a merge from two different branches, everyone could have tested it, and it was working as intended. As I see it, it's not a migration tool's responsibility to fix this problem. The team I'm working on is having issues merging files from different branches. When the first merge brings a file with the newest timestamp, the migrations are applied perfectly, but the second merge, with a file with an earlier timestamp, is not applied. I believe the proposed solution is a common pattern applied by other migration tools/libraries in other languages. Thank you very much for the amazing tool/library, but I would love to see this feature implemented.
@nronas I follow the scenarios you've laid out. Any fix will need to be done by developers. But why wait until integration testing to catch migration ordering issues when these issues can be caught earlier? Schema changes should always be made against the latest "version" of the schema to prevent issues, and detecting and enforcing this can be done early in the software development lifecycle. You referenced Django, which also linearizes migrations.
Correct, but
Yes, but that doesn't mean it's the best approach for
IIUC, there are two distinct problems that we are trying to solve here.
The proposed solution solves #1 by tracking all applied migrations, instead of only the last one, and applying the delta, but it leaves #2 as a responsibility for the user to fix. Now if I am reading the manifest proposal correctly, is it an attempt to follow Django's model of modelling the migration history as a digraph and then throwing an error if the graph has conflicts? I am wondering if we actually need a manifest file for this. Would it be possible to store the digraph along with all the node hashes in the DB itself? I think something like that was proposed here. Then we build up the graph from the filesystem, compare it with what's in the DB, and the migrator can choose to do whatever it wants. It can throw an error if the graph is not linear, or apply a topological sort and run migrations for a power user, or anything else for that matter. Just to clarify, I have no opinions on either solution. Only trying to understand the problems and the proposed solutions.
Sort of. Django is quite different in:
But at a high level, the manifest file proposal aims to detect and prevent migration ordering issues as early in the development lifecycle as possible.
What's the purpose of storing the tree structure in the DB? The migration source is the source of truth for what changes need to be applied to the DB, so any tree structure would need to be stored in the migration source if we were to go down that path. The current manifest file proposal would prevent any branching from occurring, so it stores a "linked list" instead of a "tree" and provides tooling to "rebase" (e.g.
Understood. So essentially, you are enforcing a linked list structure in the migration source, and whenever you detect a divergence you throw an error. This would be before even applying the migrations, so the DB can still maintain the last applied sequence number as its only state? |
@agnivade correct |
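The linked-list enforcement confirmed above could be sketched like this: before applying anything, the tool checks that the migration files on disk present the same linear sequence the manifest records, possibly extended with new migrations at the tail. The function name and error messages are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// checkLinear verifies that the filesystem's migration versions start with
// exactly the sequence recorded in the manifest. Any divergence is an
// error, raised before any migration is applied — so the DB can keep the
// last applied version as its only state.
func checkLinear(manifest, files []uint) error {
	if len(files) < len(manifest) {
		return errors.New("missing migrations present in manifest")
	}
	for i, v := range manifest {
		if files[i] != v {
			return fmt.Errorf("divergence at position %d: manifest has %d, files have %d", i, v, files[i])
		}
	}
	return nil // files == manifest plus zero or more new tail migrations
}

func main() {
	fmt.Println(checkLinear([]uint{1, 2}, []uint{1, 2, 3})) // <nil>
	fmt.Println(checkLinear([]uint{1, 2}, []uint{1, 3, 2})) // prints a divergence error
}
```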
@dhui 💯 I get the point/concern raised, but let me elaborate more on the rationale and background context of the proposal, because I think we weren't super clear on that, and we apologize for this.

"Customer" analysis / "Market" research

We believe that a tool for managing (developing, applying, etc.) migrations on a project across many team members needs to be accurate, safe, and as frictionless as possible. We identified two different types of migrations:
Based on experience and a bit of research on issues raised on this project and others, category 1) accounts for 99% of migration types out there. You can definitely call this semi-anecdotal, but that's all we have at the moment. The proposed solution here solves category 1) and puts the burden of fixing category 2) on the user. We definitely believe in the manifest solution; based on our understanding of the manifest proposal, we are thinking of it more as a feature than as the norm, since if it's established as the norm, then migrations of category 1) will become super noisy for no added value. Consider a scenario where two people are adding 2 migrations that are completely independent of each other, say the creation of two different tables. The first migration with its digest gets merged to master and all is well. Now the other person pulls the new manifest and gets a conflict; they have to re-compute and commit, but those migrations are perfectly fine to be applied as is. So for the "canonical" use case, a manifest as an enforcing function sounds, to me, a bit too friction-full. The idea from our point of view for this proposal/effort is that we start by solving category 1) and then, in a future iteration, introduce "dependent migrations" through a conflict-detection mechanism as an optional feature: you could annotate your migration files (with a comment or something) and the tool would build a dep-graph and perform conflict detection. The proposal, we believe, fixes 99% of the cases that we know for a fact, based on issues raised by many people, without breaking compatibility and without "vendor-locking" us into an approach that we cannot easily change in the future. Also, the proposal for the remaining 1% does not reduce accuracy; it just puts the burden on the devs. In a sense it is a low-hanging fruit with a huge impact.
We would love to find common ground here, since we believe a hybrid approach of multi-row tracking + a dependent-migrations feature will serve us well for a long time, without introducing unnecessary friction.
Totally, but I think that wasn't in the initial implementation of Django's migrate tool, see here. The annotation on the migration file that allows the user to declare dependencies was introduced later, if my memory serves me well, and is also optional. Excuse me if my context is not up to date, but it has been a long time since I last worked with Django.
Falling in that 1% (which I do believe to be too low an estimate) is sad. I used to love working with Play Framework migrations, which have always stored the SQL of the applied migration (up+down) in the database itself. This allows you to have a "tree" (as in the data structure) of many serialised change paths in progress by different developers, while having the database correctly roll back/forward with each push to the development environment. Note that this is for development-style workflows, not production, where you often do not want to apply downs. Downs are mostly a thing for disaster-recovery rollback and development only. The "manifest file" solution has the problem of being state attributed to a single version. The database can be in an entirely different state (or branch of the migration tree). In order to backtrack/forward, you need each applied migration's down available at all times.
As a data point, not tracking all database migrations is a dealbreaker for our team and we will have to use another tool instead of
the same feelings |
@abezzub what tool did your team switch to? |
@sourcec0de we are using goose and that works well enough for our use case |
Recently we started integrating our platform with go-migrate, and we discovered that the only state the tool tracks is the last version applied. This can be problematic for big teams (like ours) in which two migrations will not necessarily be merged in sequence (chronological order).

To illustrate this, say two feature teams work in parallel with one migration each, and the team that merges first has the migration with the later timestamp. When this migration gets applied and the other team then merges theirs, since the only check go-migrate does is against the last version stored in the database, the migration with the earlier timestamp will never be applied.

To fix this, we've checked other systems like Rails' Active Record, and what they do is track every migration they apply. We are in a position now where we could spend some time working on a patch to add this to go-migrate in a backwards-compatible way, so would this be something the project would like to include and support? If so, we're happy to come up with a pull request illustrating this and discuss the approach further.

We considered as well creating a CI check to avoid adding migrations that are older than the last version, but we believe that adding this to the tool can be beneficial not only for our use case but for other teams out there.
//cc @dhui and @nronas for visibility