feat: Allow specifying multiple migration sources in CLI #3177

Closed
Tortoaster wants to merge 1 commit

@Tortoaster commented Apr 5, 2024

This PR extends the CLI to allow specifying multiple migration sources:

sqlx migrate run --source migrations --source fixtures

It collects the migrations from all specified folders and treats them as a single set, sorted by version in ascending order. This should be fully backwards compatible.
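
The ordering described above can be sketched roughly as follows. This is only an illustration in Python, not the PR's actual implementation: the helper name `collect_migrations`, the assumed `<version>_<description>.sql` naming scheme, the decision to skip `.down.sql` files, and the choice to reject duplicate versions across sources are all assumptions made for the example.

```python
from pathlib import Path


def collect_migrations(sources):
    """Gather "up" migration files from several directories, ordered by version.

    Hypothetical helper: assumes filenames look like
    `<version>_<description>.sql`, where <version> is an integer timestamp.
    """
    found = {}
    for source in sources:
        for path in sorted(Path(source).glob("*.sql")):
            if path.name.endswith(".down.sql"):
                continue  # revert scripts are not part of a forward run
            prefix, _, _ = path.name.partition("_")
            if not prefix.isdigit():
                continue  # not a migration under the assumed naming scheme
            version = int(prefix)
            if version in found:
                # Assumption: two sources must not define the same version.
                raise ValueError(
                    f"duplicate version {version}: {found[version]} and {path}"
                )
            found[version] = path
    # One combined list, ascending by version, regardless of source folder.
    return [(version, found[version]) for version in sorted(found)]
```

With the folder layout shown in the motivation section, `collect_migrations(["migrations", "fixtures"])` would place the fixture with version 20240101000001 directly after the init migration with version 20240101000000, and before any table migration with a later version.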

This change would allow separating migrations that set up table structure from those that create test data. This PR does not have an issue associated with it; please let me know if I should open one first.


Motivation (rather long)

Use case

When setting up a web project for local development, I like to include some test fixtures in the database, so that I can see how everything looks when populated with actual data. I create that test data from database dumps, so the fixtures are plain SQL files, just like migrations. So far, I have solved this problem by having two separate directories in my project:

/project
    /migrations
        20240101000000_init.up.sql
        20240101000000_init.down.sql
    /fixtures
        20240101000001_add_a_few_blog_posts.sql
    ...

This way, I can run sqlx migrate run and sqlx migrate run --source fixtures locally to create both the tables and the test data within them. For the production database, I only run sqlx migrate run, so that the test data isn't included there. This works well enough in most cases.

The problem

A problem arises when I need to create a migration for a newer version that changes an existing table. The migration itself should already make sure existing data will conform to the new data model. However, with the above setup, all fixture data is created after the last table migration, so it has to comply with the new data model from the start.

Example: suppose I add a column with non-null and foreign key constraints to an existing table. That same migration will have to make sure that all existing rows get updated to have a valid value in that column. Existing production data doesn't have that column yet, but that will be fixed by running the migration. Existing fixtures don't have that column yet either, and attempting to run them after the migration results in an error.
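
To make the ordering problem concrete, here is a small self-contained sketch. It uses SQLite through Python's `sqlite3` module purely so that it can run anywhere; the `posts` and `authors` tables, the `author_id` column, and the helper `apply_in_order` are hypothetical names invented for the illustration. To keep the example simple, the table-changing migration rebuilds the table instead of altering it in place.

```python
import sqlite3

# Version 1 (table migration): initial schema.
INIT = "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);"

# Version 2 (fixture): test data written against the version 1 schema.
FIXTURE = "INSERT INTO posts (id, title) VALUES (1, 'Hello');"

# Version 3 (table migration): adds a non-null foreign key column and
# backfills existing rows with a placeholder author.
ADD_AUTHOR = """
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO authors (id, name) VALUES (1, 'unknown');
CREATE TABLE posts_new (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER NOT NULL REFERENCES authors (id)
);
INSERT INTO posts_new (id, title, author_id) SELECT id, title, 1 FROM posts;
DROP TABLE posts;
ALTER TABLE posts_new RENAME TO posts;
"""


def apply_in_order(scripts):
    """Run the scripts against a fresh in-memory database, return all posts."""
    conn = sqlite3.connect(":memory:")
    try:
        for script in scripts:
            conn.executescript(script)
        return conn.execute(
            "SELECT id, title, author_id FROM posts ORDER BY id"
        ).fetchall()
    finally:
        conn.close()
```

Applying the scripts in version order, `apply_in_order([INIT, FIXTURE, ADD_AUTHOR])`, lets the migration backfill the fixture row. Applying all table migrations first, `apply_in_order([INIT, ADD_AUTHOR, FIXTURE])`, raises `sqlite3.IntegrityError`, because the old fixture does not supply a value for `author_id`.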

Alternative solutions

To work around this issue, I can choose to either:

  • move all fixture migrations into the /migrations folder (so that they are performed in order of their timestamps), then run sqlx migrate run, and then move them back so I don't accidentally do the same in the production database
  • manually adapt all fixture migrations to conform to the new data model from the get-go

Both of these are undesirable. The former requires manual work every time the database needs to be initialized (or some sort of external tool that synchronizes a single folder with the contents of both other folders). The latter can take quite a long time depending on the amount of test data, and also requires manual work to undo in case of a sqlx migrate revert.

Allowing multiple sources to be specified in the CLI would solve the problem elegantly. I hope you'll consider including it in sqlx!

@Tortoaster changed the title from "Allow specifying multiple migration sources in CLI" to "feat: Allow specifying multiple migration sources in CLI" on Apr 8, 2024
@praseodym

I was also looking for an option to apply fixtures from the CLI to make it easier to set up a development environment.

I think the way this PR achieves that would track any fixtures from --source fixtures as if they were migrations and also add them to the _sqlx_migrations table. This is an unwanted side effect.

A better approach would be a dedicated fixtures option that runs one-off scripts that can add data but won't otherwise be tracked like migrations. Of course this is also a bit more work to implement.

@Tortoaster
Author

Are there downsides to tracking fixture migrations in the _sqlx_migrations table? They're distinct in that they don't alter table structure, and that you wouldn't run them in production environments, but they're still migrations.

If you pull new fixture migrations into your local repository, a simple re-run of sqlx migrate run --source migrations --source fixtures would set everything up for you with this approach, which can also be automated if desired. Without keeping track of which fixture migrations have been run, you would have to run any new fixture migrations manually.

@praseodym

The downside is that sqlx will throw an error after fixtures are edited. The easiest way to resolve this is to recreate the database from scratch, which can be cumbersome during development.
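
The kind of check behind that error can be sketched as follows. This is an illustration only, under the assumption that a checksum of each applied file's contents is recorded when it is run and compared against the file on disk later; the use of SHA-384 and the helper names `checksum` and `find_modified` are assumptions for the example, not a description of sqlx's internals.

```python
import hashlib


def checksum(sql: str) -> bytes:
    """Digest of a script's contents (SHA-384 chosen here as an assumption)."""
    return hashlib.sha384(sql.encode("utf-8")).digest()


def find_modified(applied, current):
    """Return versions whose on-disk content no longer matches the record.

    applied: dict of version -> checksum stored when the script was run
    current: dict of version -> SQL text currently on disk
    """
    return sorted(
        version
        for version, recorded in applied.items()
        if version in current and checksum(current[version]) != recorded
    )
```

Any edit to a tracked fixture, even a single changed value, produces a different digest, so the fixture would be reported as modified until the database is recreated.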

If fixtures are not tracked in the migrations table but just run as one-off scripts, sqlx would not throw errors when fixtures are edited and the developer only has to recreate the database when really required.

@Tortoaster
Author

Tortoaster commented Jun 11, 2024

It's true that changing the fixtures would result in an error. If that's a regular occurrence in your workflow, this change is indeed not helpful. I'm not sure about other databases, but Postgres' psql tool allows running one-off scripts, which may better suit your needs.

This simple change mainly streamlines fixtures for workflows that treat them as migrations. Like migrations, they shouldn't be changed directly; any necessary changes should be made by a new migration, be it another fixture migration that updates/deletes old rows and creates new ones, or a "regular" migration that changes table structure. Note that, with this PR, regular migrations will also adapt older fixture data to conform to the new data model, which saves the work of recreating the fixtures and effectively tests whether the migration works as intended. The fact that sqlx keeps track of which fixture migrations have already been run is very useful when working in a team: I simply run sqlx migrate run --source migrations --source fixtures (rather, Docker Compose does it for me) after pulling someone else's fixtures, and their fixtures are applied without reapplying older fixtures.

@academiaresf

Any news here? This is a mandatory change for local/testing/fixture environments.

@abonander
Collaborator

Treating fixtures and migrations interchangeably is not the correct approach. Fixtures should not be hashed, nor information about them stored in the database. They should just be a set of scripts that are optionally run during database setup or after migrations are run. Instead of tracking which fixtures have been run, they should be idempotent, e.g. insert data with fixed primary keys and do nothing on conflict.
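
A minimal sketch of such an idempotent fixture, assuming a hypothetical `posts` table and using SQLite through Python's `sqlite3` module so the example is self-contained (the `ON CONFLICT ... DO NOTHING` clause needs SQLite 3.24 or newer; Postgres accepts the same clause):

```python
import sqlite3

# Fixture with fixed primary keys; rows that already exist are left alone.
FIXTURE = """
INSERT INTO posts (id, title) VALUES (1, 'Hello'), (2, 'World')
ON CONFLICT (id) DO NOTHING;
"""


def seed(conn):
    """Apply the fixture; safe to call any number of times."""
    conn.executescript(FIXTURE)


def demo():
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
        )
        seed(conn)
        seed(conn)  # second run hits the conflict clause and inserts nothing
        return conn.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
    finally:
        conn.close()
```

Because the script can be re-run safely, nothing about it needs to be recorded in the database. One consequence of `DO NOTHING` is that editing a row in the fixture does not update a row that was already inserted.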

@abonander
Collaborator

I won't be merging this as-is but I'm happy to discuss design either here or in a new issue.

@Tortoaster
Author

That is a more traditional way to create fixtures, but I disagree that it's the only way it should be. It often requires changing several fixtures after each table structure change, even though you've already done the work of migrating existing data in the migration itself. This approach has no such drawback, and as an added bonus, it allows you to easily check whether your migration migrates existing data as expected.

As to storing fixture information in the database: it's already possible to do that with sqlx now; this PR doesn't change that. It's up to the user to decide if doing so locally might cause problems (if they intend to edit them directly). This PR just allows separating migrations into multiple folders.

@academiaresf

You are right, it is not mandatory (my wording was off), but as @Tortoaster says, it could be a good opportunity to add versatility by allowing users to create migrations/fixtures/seeds in multiple folders/tables and track the changes (at least if this is not a big change on the source side).

@abonander
Collaborator

@Tortoaster I don't think this is the right solution for what you want. Allowing the mixing and matching of migrations is a huge footgun that I don't want to hand the user. I have to assume that someone would try using this in production, not just for testing, especially if it's designed as generically as it is. Someone could really make a mess of things that way.

For supporting multiple applications touching the same database, I'm working on that as part of #3383 (separate migrations tables). The idea is that each application would be given its own schema that it controls.

My simplest recommendation is to just keep your fixtures up-to-date with the latest schema. It's a little more work, but it'll be so much easier to understand what the data should look like today, instead of the day it was created. This is especially important if you have multiple developers on the project.

Wanting to test that a migration will preserve existing data correctly is a reasonable motivation, however. I'd just prefer a more opinionated solution.

I've opened #3391 for discussion.
