psql/migrations: when no-transaction is set, split multi-statement sql into individual statements and execute them separately #3574

yannleretaille · 2024-10-23T13:27:47Z

This is going to be a controversial one.

Recently, support for adding -- no-transaction at the beginning of migration files was introduced, which will execute the migration without creating a transaction. This is useful for certain commands, such as CREATE INDEX CONCURRENTLY, which are not allowed to run inside transactions.

However, postgres will implicitly create a transaction block when a query contains multiple statements. This means that the -- no-transaction flag has inconsistent behavior depending on whether the migration contains just one statement or multiple ones, and that commands that are not allowed to run inside transactions will fail if part of a larger migration.

I propose to split the sql into individual statements when no_tx is set and to execute them separately, resulting in consistent behavior in line with how many sql tools handle this case. This was implemented using a custom pest grammar, which is very low overhead and performant (but does add a dependency).

One could argue that the current behavior is good (albeit lacking a clear error message) and that users should be encouraged to create small migrations with only one statement at a time when opting out of the safety of transactions. I would counter that a user that opts out of transactions usually has a reason for that and is aware that extra care has to be taken to ensure that migration is sound and tested, that an error during a no_tx migration will always require manual cleanup anyway, and that there are production cases where forcing the user to split up a logical migration into potentially many individual files can add complexity and make it harder to manage and audit (we recently had such case).

abonander · 2024-10-24T02:57:47Z

I don't really see the value in carrying all this extra complexity for what is a pretty niche use-case.

As far as I can tell, Diesel's migrations don't work this way, nor does Flyway--the two migrators I'm familiar with--and both allow you to configure migrations to run outside of a transaction.

yannleretaille · 2024-10-24T13:36:30Z

Flyway actually does split migrations into individual statements, and even goes a step further and automatically detects if certain statements need to be run outside a transaction.

Another example is goose (Go, 7k stars) that also splits the sql into statements and even allows providing additional annotations per statement.

On the ruby/rails side, ActiveRecord supports setting disable_ddl_transaction! and providing multiple execute statements to achieve the same effect.

Most SQL tools (DBeaver, psql, etc.) will also split multi-statment sqls and run them sequentially.

I do agree about the added complexity, but I'm not sure if I would necessarily agree that it is a niche case when dealing with larger projects that use postgres.

abonander · 2024-10-24T19:33:44Z

I dunno, I think it's a huge footgun because you can easily be left stuck with a partially applied migration. It won't even catch syntax errors until it goes to execute that statement. And if there's any bugs in the parser it could lead to some really confusing errors.

I think keeping our migrations simple is an advantage.

As a compromise, I could see adding a simple comment syntax like:

-- no-transaction

ALTER TABLE foo ADD COLUMN bar_id uuid;

-- Execute the following chunk in its own `Query` message:
-- statement-break
CREATE INDEX CONCURRENTLY ON foo(bar_id, foo_id);
-- statement-break

ALTER TABLE foo ADD FOREIGN KEY bar_id REFERENCES bar(bar_id);

Then instead of having to pseudo-parse SQL, we just check if a line begins with a certain string, similar to how we detect -- no-transaction currently.

It would also give you direct control over the splitting, and group things together which can execute in an autocommit block for a bit more safety.

yannleretaille · 2024-10-24T21:17:30Z

Re: footguns
Opting out of transactions will always come with risks. I think making that clear in the documentation (I don't think it's fully documented yet anyway since it's a new feature) and improving the error messaging would be helpful additional steps. It might even be a good idea to rename it to -- unsafe-no-transaction (while still accepting the old version).

Re: parser
I did test the parser with a bunch of exotic cases (testcases-sql.txt) and couldn't find any issues with it. I will add extensive testing in case you greenlight this approach. The pest parser is very lightweight and fast, but it does end up being a "pseudo" parser as you correctly point out. I think that's OK if all that is needed is to split statements and given a lot of other libraries and tools parse sql for the purpose of splitting it seems to be an accepted approach. I did briefly consider using pg_query, which should "guarantee" correct splitting, but decided against it due to the very heavy dependencies and long compile times.

Re: comment annotation idea
Your idea of using comments to annotate the beginning of a new statement block in --no-transaction migrations sounds like a potentially good approach as well. It's definitely safer/easier to parse, and the explicit annotation would make it very clear what the migration is doing. However, I'm not totally convinced about the added safety benefit, as any failure in a multi-statement no_tx migration will always require manual intervention. That also means that the additional annotation is ultimately slightly redundant.

Please let me know if you think any of the two approaches would be acceptable to you and if yes, which one you'd prefer and I'll update the PR!

yannleretaille added 2 commits October 23, 2024 13:23

feat(psql): add split_sql function using a simple pest parser

994fc43

feat(psql): use split_sql to run statements seperately in no_tx mode

21ccc4e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

psql/migrations: when no-transaction is set, split multi-statement sql into individual statements and execute them separately #3574

psql/migrations: when no-transaction is set, split multi-statement sql into individual statements and execute them separately #3574

yannleretaille commented Oct 23, 2024

abonander commented Oct 24, 2024

yannleretaille commented Oct 24, 2024

abonander commented Oct 24, 2024 •

edited

Loading

yannleretaille commented Oct 24, 2024 •

edited

Loading

psql/migrations: when no-transaction is set, split multi-statement sql into individual statements and execute them separately #3574

Are you sure you want to change the base?

psql/migrations: when no-transaction is set, split multi-statement sql into individual statements and execute them separately #3574

Conversation

yannleretaille commented Oct 23, 2024

abonander commented Oct 24, 2024

yannleretaille commented Oct 24, 2024

abonander commented Oct 24, 2024 • edited Loading

yannleretaille commented Oct 24, 2024 • edited Loading

abonander commented Oct 24, 2024 •

edited

Loading

yannleretaille commented Oct 24, 2024 •

edited

Loading