feat: Allow specifying multiple migration sources in CLI #3177

Tortoaster · 2024-04-05T14:51:23Z

This PR extends the CLI to allow specifying multiple migration sources:

sqlx migrate run --source migrations --source fixtures

It collects the migrations from all specified folders and treats them as a single set, sorted by version in ascending order. This should be fully backwards compatible.
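As a rough illustration of that ordering (not the code in this PR, and not sqlx's internals), the following sketch merges file listings from several sources and sorts them by the numeric prefix of each file name. The names `MigrationFile`, `parse_version`, and `merge_sources` are hypothetical, and the sketch assumes versions are the digits before the first underscore.

```rust
/// A migration file discovered in one of the source directories.
/// (Hypothetical type for illustration; not part of sqlx.)
#[derive(Debug, Clone, PartialEq, Eq)]
struct MigrationFile {
    version: i64,
    source: String,
    file_name: String,
}

/// Parse the leading numeric version from a name such as
/// `20240101000000_init.up.sql`. Returns `None` for names without
/// an underscore or without a numeric prefix.
fn parse_version(file_name: &str) -> Option<i64> {
    let (prefix, _) = file_name.split_once('_')?;
    prefix.parse().ok()
}

/// Merge the file listings of several sources into one list,
/// ordered by version in ascending order. Files whose names do not
/// start with a version are skipped. The sort is stable, so files
/// with equal versions keep the order in which their sources were given.
fn merge_sources(sources: &[(&str, &[&str])]) -> Vec<MigrationFile> {
    let mut merged: Vec<MigrationFile> = sources
        .iter()
        .flat_map(|&(source, files)| {
            files.iter().filter_map(move |name| {
                parse_version(name).map(|version| MigrationFile {
                    version,
                    source: source.to_string(),
                    file_name: name.to_string(),
                })
            })
        })
        .collect();
    merged.sort_by_key(|m| m.version);
    merged
}
```

Given a `migrations` listing containing `20240101000000_init.up.sql` plus a later, hypothetical `20240201000000_add_author.up.sql`, and a `fixtures` listing containing `20240101000001_add_a_few_blog_posts.sql`, the merged list places the fixture between the two schema migrations. The sketch does not handle two sources using the same version; a real implementation would need to decide whether that is an error.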

This change would allow separating migrations that set up table structure from those that create test data. This PR does not have an issue associated with it; please let me know if I should open one first.

Motivation (rather long)

Use case

When setting up a web project for local development, I like to include some test fixtures in the database, so that I can see how everything looks when populated with actual data. I create that test data from database dumps, so they are simple SQL files, just like migrations. So far, I solved this problem by having two separate directories in my project:

/project
    /migrations
        20240101000000_init.up.sql
        20240101000000_init.down.sql
    /fixtures
        20240101000001_add_a_few_blog_posts.sql
    ...

This way, I can run sqlx migrate run and sqlx migrate run --source fixtures locally to create both the tables and the test data within them. For the production database, I only run sqlx migrate run, so that the test data isn't included there. This works well enough in most cases.

The problem

A problem arises when I need to create a migration for a newer version that changes an existing table. The migration itself should already make sure existing data will conform to the new data model. However, with the above setup, all fixture data is created after the last table migration, so it has to comply with the new data model from the start.

Example: suppose I add a column with a non-null and foreign key constraint to an existing table. That same migration will have to make sure that all existing rows get updated to have a valid value in that column. Existing production data doesn't have that column yet, but that will be fixed by running the migration. Existing fixtures don't have that column yet either, and attempting to run them will result in an error.

Alternative solutions

To work around this issue, I can choose to either:

move all fixture migrations into the /migrations folder (so that they are performed in order of their timestamps), then run sqlx migrate run, and then move them back so I don't accidentally do the same in the production database
manually adapt all fixture migrations to conform to the new data model from the get-go

Both of these are undesirable. The former requires manual work every time the database needs to be initialized (or some sort of external tool that synchronizes a single folder with the contents of both other folders). The latter can take quite a while depending on the amount of test data, and also requires manual work to undo in case of a sqlx migrate revert.

Allowing multiple sources to be specified with the CLI would solve the problem elegantly. I hope you'll consider including it in sqlx!

praseodym · 2024-06-07T15:09:37Z

I was also looking for an option to apply fixtures from the CLI to make it easier to set up a development environment.

I think the way this PR achieves that would track any fixtures from --source fixtures as if they were migrations and also add them to the _sqlx_migrations table. This is an unwanted side effect.

A better approach would be a dedicated fixtures option that runs one-off scripts that can add data but won't otherwise be tracked like migrations. Of course this is also a bit more work to implement.

Tortoaster · 2024-06-07T16:19:20Z

Are there downsides to tracking fixture migrations in the _sqlx_migrations table? They're distinct in that they don't alter table structure and you wouldn't run them in production environments, but they're still migrations.

If you pull new fixture migrations into your local repository, a simple re-run of sqlx migrate run --source migrations --source fixtures would set everything up for you with this approach, which can also be automated if desired. Without keeping track of which fixture migrations have been run, you would have to run any new fixture migrations manually.

praseodym · 2024-06-11T11:02:56Z

The downside is that sqlx will throw an error after fixtures are edited. The easiest way to resolve this is to recreate the database from scratch, which can be cumbersome during development.

If fixtures are not tracked in the migrations table but just run as one-off scripts, sqlx would not throw errors when fixtures are edited and the developer only has to recreate the database when really required.

Tortoaster · 2024-06-11T18:41:34Z

It's true that changing the fixtures would result in an error. If that's a regular occurrence in your workflow, this change is indeed not helpful. I'm not sure about other databases, but Postgres' psql tool allows running one-off scripts, which may better suit your needs.

This simple change mainly streamlines fixtures for workflows that treat them as migrations. Like migrations, they shouldn't be changed directly; any necessary changes should be made by a new migration, be it another fixture migration that updates/deletes old rows and creates new ones, or a "regular" migration that changes table structure (note that, with this PR, the latter will also adapt older fixture data to conform to the new data model, saving the work of recreating it and effectively testing whether the migration works as intended). The fact that sqlx keeps track of which fixture migrations have already been run is very useful when working in a team: I simply run sqlx migrate run --source migrations --source fixtures (or rather, Docker Compose does it for me) after pulling someone else's fixtures, and their fixtures are applied without reapplying older fixtures.

academiaresf · 2024-06-28T19:18:24Z

Any news here? It's a mandatory change for local/testing/fixture envs.

abonander · 2024-06-28T21:42:43Z

Treating fixtures and migrations interchangeably is not the correct approach. Fixtures should not be hashed, nor information about them stored in the database. They should just be a set of scripts that are optionally run during database setup or after migrations are run. Instead of tracking which fixtures have been run, they should be idempotent, e.g. insert data with fixed primary keys and do nothing on conflict.
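To illustrate what "idempotent" means here, the following is a minimal in-memory sketch, not sqlx code and not SQL: a `BTreeMap` stands in for a table, and the hypothetical `apply_fixture` function inserts each row only if its fixed primary key is not already present. The `Post` type and the table contents are assumptions made up for the example; in a real database the same effect would come from the insert statement itself.

```rust
use std::collections::BTreeMap;

/// A row of a hypothetical `posts` table (illustration only).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Post {
    title: String,
}

/// Apply a fixture to an in-memory "table" keyed by primary key.
/// Each row is inserted only if its key is absent, which mirrors
/// "insert with fixed primary keys and do nothing on conflict".
/// Returns the number of rows that were actually inserted.
fn apply_fixture(table: &mut BTreeMap<i64, Post>, rows: &[(i64, &str)]) -> usize {
    let mut inserted = 0;
    for &(id, title) in rows {
        if !table.contains_key(&id) {
            table.insert(id, Post { title: title.to_string() });
            inserted += 1;
        }
    }
    inserted
}
```

Running the same fixture a second time inserts nothing and leaves the table unchanged, so no record of which fixtures have already been applied is needed.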

abonander · 2024-06-28T21:43:29Z

I won't be merging this as-is but I'm happy to discuss design either here or in a new issue.

Tortoaster · 2024-06-29T12:48:22Z

That is a more traditional way to create fixtures, but I disagree that it's the only way it should be. It often requires changing several fixtures after each table structure change, even though you've already done the work of migrating existing data in the migration itself. This approach has no such drawback, and as an added bonus, it allows you to easily check whether your migration migrates existing data as expected.

As to storing fixture information in the database: it's already possible to do that with sqlx now; this PR doesn't change that. It's up to the user to decide if doing so locally might cause problems (if they intend to edit them directly). This PR just allows separating migrations into multiple folders.

academiaresf · 2024-07-03T10:43:12Z

You are right, it is not mandatory (language fail), but as @Tortoaster says, it could be a good opportunity to add versatility by allowing users to create migrations/fixtures/seeds in multiple folders/tables and track the changes (at least if this is not a big change on the source side).

abonander · 2024-07-30T01:40:29Z

@Tortoaster I don't think this is the right solution for what you want. Allowing the mixing and matching of migrations is a huge footgun that I don't want to hand the user. I have to assume that someone would try using this in production, not just for testing, especially if it's designed as generically as it is. Someone could really make a mess of things that way.

For supporting multiple applications touching the same database, I'm working on that as part of #3383 (separate migrations tables). The idea is that each application would be given its own schema that it controls.

My simplest recommendation is to just keep your fixtures up-to-date with the latest schema. It's a little more work, but it'll be so much easier to understand what the data should look like today, instead of the day it was created. This is especially important if you have multiple developers on the project.

Wanting to test that a migration will preserve existing data correctly is a reasonable motivation, however. I'd just prefer a more opinionated solution.

I've opened #3391 for discussion.

feat: Allow specifying multiple migration sources in CLI

6354fa4

Tortoaster force-pushed the multiple-sources branch from 6c70021 to 6354fa4 Compare April 8, 2024 09:40

Tortoaster changed the title ~~Allow specifying multiple migration sources in CLI~~ feat: Allow specifying multiple migration sources in CLI Apr 8, 2024

abonander mentioned this pull request Jul 30, 2024

Support versioned fixtures for ensuring that migrations preserve data correctly #3391

Open

abonander closed this Jul 30, 2024

CommanderStorm mentioned this pull request Jul 30, 2024

feat(cli): print documentation if parameter is ignored/infered #3370

Closed
