Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: pg mandates multi-stmt batches to execute as single txn #44803

Closed
knz opened this issue Feb 6, 2020 · 3 comments · Fixed by #76834
Closed

sql: pg mandates multi-stmt batches to execute as single txn #44803

knz opened this issue Feb 6, 2020 · 3 comments · Fixed by #76834
Assignees
Labels
A-sql-executor SQL txn logic A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-pgwire pgwire protocol issues. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@knz
Copy link
Contributor

knz commented Feb 6, 2020

As per pg's docs: https://www.postgresql.org/docs/12/protocol-flow.html#PROTOCOL-FLOW-MULTI-STATEMENT

When a simple Query message contains more than one SQL statement (separated by semicolons), those statements are executed as a single transaction, unless explicit transaction control commands are included to force a different behavior. For example, if the message contains

INSERT INTO mytable VALUES(1);
SELECT 1/0;
INSERT INTO mytable VALUES(2);

then the divide-by-zero failure in the SELECT will force rollback of the first INSERT. Furthermore, because execution of the message is abandoned at the first error, the second INSERT is never attempted at all.

If a pg client that expects this behavior is ran against crdb, this will yield inconsistent data without any clear signal to the client.

(The current crdb behavior is a plain ACID violation when viewed through the lens of pg semantics.)

Epic CRDB-8948

Jira issue: CRDB-5201

@knz knz added A-sql-executor SQL txn logic A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-pgwire pgwire protocol issues. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Feb 6, 2020
@andreimatei
Copy link
Contributor

I think we simply didn't know about this, since it appears to only be documented starting in pg 11. But I've verified down to pg 9.4, and their behavior seems to have stayed consistent.

@knz
Copy link
Contributor Author

knz commented Apr 4, 2020

I agree with "we didn't know" but it seems egregious given the amount of time we routinely spend comparing behaviors. I am very surprised we didn't notice earlier.

The reason, I think, is that when an error happens the messages sent back and forth are the same in both cases. So there's no visual hint. We probably never checked the side effects in the database.

My argument at the top remains though. If client apps rely on transactionality, we're failing them big time now, and in no way for them to notice until their data is in a bad state.

@rafiss
Copy link
Collaborator

rafiss commented May 21, 2020

Leaving some notes in here about how Postgres implements this. In the exec_simple_query function of src/backend/tcop/postgres.c:

  • If there are multiple statements in the query, set use_implicit_block to true.
  • For each statement in the query:
    • Unconditionally call start_xact_command(), which starts a transaction if one is not in progress already.
      • Enter the TBLOCK_STARTED state, or else remain in the TBLOCK_INPROGRESS or TBLOCK_IMPLICIT_INPROGRESS states.
      • Other cases are omitted for brevity.
    • If use_implicit_block is true, call BeginImplicitTransactionBlock(). This moves from the TBLOCK_STARTED state to the TBLOCK_IMPLICIT_INPROGRESS state. (No effect if the current state is anything else.)
    • Do other stuff.
    • If this is the last statement in the query:
      • If use_implicit_block is true, call EndImplicitTransactionBlock(). This moves from the TBLOCK_IMPLICIT_INPROGRESS state to the TBLOCK_STARTED state. (No effect if the current state is anything else.)
      • Unconditionally call finish_xact_command(). (This calls CommitTransaction() if the current state is TBLOCK_STARTED. If the current state is TBLOCK_INPROGRESS or TBLOCK_IMPLICIT_INPROGRESS it just increments the command counter.)
    • Else if the current statement is a TransactionStmt, call finish_xact_command(). (This calls CommitTransaction() if the current state is TBLOCK_END.)

We should be able to do something similar in our own state machine.

NB: I skipped over some edge cases above (which are also described in the Postgres docs). For example, if a BEGIN is issued from inside an implicit transaction block, then the state switches to TBLOCK_INPROGRESS. As the docs say, "If the BEGIN follows some statements that were executed as an implicit transaction block, those statements are not immediately committed; in effect, they are retroactively included into the new regular transaction block."

@rafiss rafiss added the T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) label May 12, 2021
@jlinder jlinder added the T-sql-queries SQL Queries Team label Jun 16, 2021
@rafiss rafiss self-assigned this Feb 22, 2022
@exalate-issue-sync exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label Feb 22, 2022
craig bot pushed a commit that referenced this issue Mar 15, 2022
76834: sql: execute batch statements in an implicit transaction r=otan a=rafiss

fixes #44803
relates to #76490

Release justification: high value bug fix to existing functionality 

### *: prepare tests for batch stmt txn change

This commit will make the next one easier to review.

### sql: execute batch statements in an implicit transaction

Release note (bug fix): Previously statements that arrived in a batch
during the simple query protocol would all execute in their own implicit
transactions. Now, we match the Postgres wire protocol, so all these
statements share the same implicit transaction. If a BEGIN is included
in a statement batch, then the existing implicit transaction is
upgraded to an explicit transaction.

### sql: add session var for old implicit txn behavior

Release note (sql change): The enable_implicit_transaction_for_batch_statements
session variable was added. It defaults to true. When it is true,
multiple statements in a single query (a.k.a. a "batch statement") will
all be run in the same implicit transaction, which matches the Postgres
wire protocol. This setting is provided for users who want to preserve
the behavior of CockroachDB versions v21.2 and earlier.

77780: roachtest: update 22.1 version map to v21.2.7 r=celiala a=Azhng

Release note: None
Release justification: non-production code change.

77781: cloud: bump orchestrator to v21.2.7 r=celiala a=Azhng

Release note: None
Release justification: non-production code change.

77803: kvserverbase: move MaxCommandSizeDefault out of kvserver r=ajwerner a=ajwerner

```
$ bazel query "somepath(//pkg/sql, //pkg/kv/kvserver)"
INFO: Empty results
Loading: 0 packages loaded
```

Release justification: non-production code changes

Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Azhng <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
@craig craig bot closed this as completed in cfe38f2 Mar 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-executor SQL txn logic A-sql-pgcompat Semantic compatibility with PostgreSQL A-sql-pgwire pgwire protocol issues. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants