kv: `TxnCoordMeta` refactor broke sqlalchemy retry/savepoint logic #45477

rafiss · 2020-02-26T22:17:23Z

Background

PR #43032 removed TxnCoordMeta.

Previously execSavepointInOpenState would perform this check:

meta := ex.state.mu.txn.GetTxnCoordMeta(ctx)
if meta.CommandCount > 0 {
  err := pgerror.Newf(pgcode.Syntax,
    "SAVEPOINT %s needs to be the first statement in a transaction", RestartSavepointName)
  // ... err is then returned to the client (elided in this excerpt)
}

After this PR, the check is done this way:

if ex.state.mu.txn.Active() {
  err := pgerror.Newf(pgcode.Syntax,
    "SAVEPOINT %s needs to be the first statement in a transaction", RestartSavepointName)
  // ... err is then returned to the client (elided in this excerpt)
}
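To make the difference between the two checks concrete, here is a toy Python model that replays the statement sequence from the minimal repro further down (SAVEPOINT, SELECT, ROLLBACK TO SAVEPOINT, SAVEPOINT) against each check. It rests on an assumption that this thread does not confirm: that `CommandCount` was reset when ROLLBACK TO SAVEPOINT cockroach_restart restarted the transaction, while `Active()` stays true once the transaction has sent any KV request. `ToyTxn` and the helper functions are hypothetical and are not CockroachDB code.

```python
class ToyTxn:
    """Hypothetical stand-in for the KV txn handle; not CockroachDB code."""

    def __init__(self):
        self.command_count = 0  # assumed: reset when the txn restarts
        self.active = False     # assumed: stays set after the first KV request

    def send_kv_request(self):
        self.command_count += 1
        self.active = True

    def restart(self):
        # Models ROLLBACK TO SAVEPOINT cockroach_restart restarting the txn.
        self.command_count = 0


def old_check_rejects(txn):
    # Mirrors `meta.CommandCount > 0`.
    return txn.command_count > 0


def new_check_rejects(txn):
    # Mirrors `ex.state.mu.txn.Active()`.
    return txn.active


def replay(check):
    """Return the index of the rejected statement, or None if all succeed."""
    stmts = ["SAVEPOINT", "SELECT", "ROLLBACK TO SAVEPOINT", "SAVEPOINT"]
    txn = ToyTxn()  # state right after BEGIN
    for i, stmt in enumerate(stmts):
        if stmt == "SAVEPOINT" and check(txn):
            return i
        if stmt == "SELECT":
            txn.send_kv_request()
        elif stmt == "ROLLBACK TO SAVEPOINT":
            txn.restart()
    return None


print(replay(old_check_rejects))  # None: whole sequence accepted
print(replay(new_check_rejects))  # 3: the second SAVEPOINT is rejected
```

Under these assumptions the old check only looks at work done since the last restart, while the new one remembers that the transaction was ever used, which is enough to reproduce the reported regression.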

Issue

I am working on the retry behavior in our sqlalchemy adapter and found that this PR causes this error to occur in cases where it previously did not.

Here is a basic repro:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from cockroachdb.sqlalchemy import run_transaction

import _thread
import time
import logging

logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

#engine = create_engine("cockroachdb://root@localhost:26257/defaultdb?sslmode=disable", pool_size=2, max_overflow=0)
engine = create_engine("cockroachdb://root@localhost:26257/defaultdb?sslmode=disable")
session_maker = sessionmaker(bind=engine)

def db_func(conn, i):
    mod = i % 10
    if i < 10:
        conn.execute(f"insert into a values({i}, {i}) on conflict do nothing")
    rs = conn.execute(f"select a, b from a where a = {mod}")
    conn.execute("select crdb_internal.force_retry('5s')")
    for row in rs:
        print(f"thread {i}: results {row}")

def f(thread_id):
    print(f"thread id {thread_id}")
    run_transaction(session_maker, lambda conn: db_func(conn, thread_id))

run_transaction(session_maker, lambda c: c.execute("create table if not exists a (a int primary key, b int)"))
try:
    for x in range(1):
        _thread.start_new_thread(f, (x,))
except:
    print("Error: unable to start thread")
    raise

while 1:
    pass

It produces this output:

INFO:sqlalchemy.engine.base.Engine:select current_schema()
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:select version()
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SAVEPOINT cockroach_restart
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:create table if not exists a (a int primary key, b int)
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:RELEASE SAVEPOINT cockroach_restart
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:COMMIT
thread id 0
INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SAVEPOINT cockroach_restart
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:insert into a values(0, 0) on conflict do nothing
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:select a, b from a where a = 0
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:select crdb_internal.force_retry('5s')
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:ROLLBACK TO SAVEPOINT cockroach_restart
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:SAVEPOINT cockroach_restart
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:ROLLBACK
Unhandled exception in thread started by <function f at 0x10481b8c0>
Traceback (most recent call last):
  File "/Users/rafiss/.virtualenvs/sqlalchemy-full/lib/python3.7/site-packages/SQLAlchemy-1.3.13.dev0-py3.7-macosx-10.14-x86_64.egg/sqlalchemy/engine/base.py", line 1246, in _execute_context
    cursor, statement, parameters, context
  File "/Users/rafiss/.virtualenvs/sqlalchemy-full/lib/python3.7/site-packages/SQLAlchemy-1.3.13.dev0-py3.7-macosx-10.14-x86_64.egg/sqlalchemy/engine/default.py", line 588, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.SyntaxError: SAVEPOINT cockroach_restart needs to be the first statement in a transaction

When running against a build before #43032, the error does not occur.
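The log above shows the client-side retry protocol the adapter follows: SAVEPOINT cockroach_restart, the transaction body, ROLLBACK TO SAVEPOINT cockroach_restart on a retriable error, then SAVEPOINT cockroach_restart again for the next attempt. The sketch below reproduces that statement order for a DB-API cursor. It is not the `run_transaction` from `cockroachdb.sqlalchemy`; the name `run_with_restart`, the retry limit, and the assumption that the connection is in autocommit mode (so BEGIN and COMMIT are sent explicitly) are illustrative choices. Retriable errors are recognized through the `pgcode` attribute that psycopg2 exceptions carry (SQLSTATE 40001).

```python
RESTART = "cockroach_restart"
RETRY_SQLSTATE = "40001"  # serialization_failure: safe to retry


def run_with_restart(cur, fn, max_retries=10):
    """Hypothetical helper: run fn(cur) using the cockroach_restart protocol.

    Assumes `cur` is a DB-API cursor whose connection is in autocommit mode,
    so the explicit BEGIN/COMMIT below are the only transaction statements.
    """
    cur.execute("BEGIN")
    try:
        for _ in range(max_retries):
            # Issued at the start of every attempt, matching the log above.
            # On a build with #43032, this second SAVEPOINT is rejected.
            cur.execute(f"SAVEPOINT {RESTART}")
            try:
                result = fn(cur)
                # Releasing cockroach_restart attempts the commit, so it can
                # itself fail with a retriable error.
                cur.execute(f"RELEASE SAVEPOINT {RESTART}")
            except Exception as exc:
                if getattr(exc, "pgcode", None) != RETRY_SQLSTATE:
                    raise
                cur.execute(f"ROLLBACK TO SAVEPOINT {RESTART}")
                continue
            cur.execute("COMMIT")
            return result
        raise RuntimeError("transaction retry limit exceeded")
    except BaseException:
        cur.execute("ROLLBACK")
        raise
```

With psycopg2, `cur` would come from a connection on which `conn.autocommit = True` has been set; any non-retriable error (including the SyntaxError in the traceback above) ends in a ROLLBACK and is re-raised, which matches the tail of the log.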


andreimatei · 2020-02-26T22:23:12Z

Can you please spell out a minimal set of statements that makes you unhappy?
Is it the ROLLBACK TO SAVEPOINT cockroach_restart; SAVEPOINT cockroach_restart combo?

I'm mucking with savepoints now, so I'll take a look.

rafiss · 2020-02-26T23:03:30Z

Here is a minimal example. Thanks!

> CREATE TABLE IF NOT EXISTS a (a INT PRIMARY KEY, b INT);
> BEGIN;
OPEN> SAVEPOINT cockroach_restart;
OPEN> SELECT a, b FROM a WHERE a = 0;
OPEN> ROLLBACK TO SAVEPOINT cockroach_restart;
OPEN> SAVEPOINT cockroach_restart;

ERROR: SAVEPOINT cockroach_restart needs to be the first statement in a transaction
SQLSTATE: 42601

knz · 2020-02-26T23:07:29Z

This example is not exactly minimal. The problem you have here is that `cockroach sql` inserts invisible SQL in between prompts.

I think if you had entered `begin; savepoint cockroach_restart` or `rollback to savepoint cockroach_restart; savepoint cockroach_restart`, it would work.

knz · 2020-02-26T23:07:51Z

(See `--echo-sql`. You can disable this with `\unset check_syntax` and `\unset smart_prompt`.)

rafiss · 2020-02-26T23:25:14Z

The extra invisible SQL doesn't seem to matter for this bug. I re-ran my commands after running `\unset check_syntax`, `\unset smart_prompt`, and `\set prompt1 >`.

I get the error on a local build including that PR, but using a build from the commit right before the PR was merged, I do not get the error.

knz · 2020-02-26T23:26:04Z

thanks for clarifying

rafiss · 2020-03-03T19:23:26Z

Just a heads up -- I am marking this as a possible release blocker because it will cause apps that use sqlalchemy to break. If this seems to be an overly severe categorization, please let me know!

This patch adds support for SAVEPOINT <foo>, RELEASE SAVEPOINT <foo>, and ROLLBACK TO SAVEPOINT <foo>. Before this patch, we only supported the special savepoint cockroach_restart, which had to be placed at the beginning of the transaction and was specifically intended for dealing with transaction retries. This patch implements general support for savepoints, which provide an error recovery mechanism.

The connExecutor now maintains a stack of savepoints. Rolling back to a savepoint uses the recent KV API for ignoring a range of write sequence numbers.

At the SQL level, savepoints differ in two characteristics:

1) Savepoints placed at the beginning of a transaction (i.e. before any KV operations) are marked as "initial". Rolling back to an initial savepoint is always possible. Rolling back to a non-initial savepoint is not possible after the transaction restarts (see below).

2) The savepoint named "cockroach_restart" retains special RELEASE semantics: releasing it attempts to commit the underlying KV txn. This continues to allow discovering deferred serializability errors (i.e. write timestamp pushes by the ts cache). As before, this RELEASE can fail with a retriable error, at which point the client can do ROLLBACK TO SAVEPOINT cockroach_restart (which is guaranteed to work because cockroach_restart needs to be an "initial" savepoint). The transaction continues to maintain all its locks after such an error. This is in contrast to releasing any other savepoint, which cannot commit the txn and never fails. See below for more discussion. The cockroach_restart savepoint is only special in its release behavior, not in its rollback behavior.

With the implementation of savepoints, the state machine driving a SQL connection's transactions becomes a lot simpler. There's no longer a distinction between an "Aborted" transaction and one that's in "RestartWait". Rolling back to a savepoint now works the same way across the two states, so RestartWait is gone.
This patch also improves the KV savepoints. They now capture and restore the state of the read spans and the in-flight writes.

Some things don't work (yet):

a) Rolling back to a savepoint created after a schema change will error out. This is because we don't yet snapshot the transaction's schema change state.

b) After a retriable error, you can only roll back to an initial savepoint. Attempting to roll back to a non-initial savepoint generates a retriable error again. If the transaction has been aborted, I think this is the right behavior; no recovery is possible since the transaction has lost its write intents. In the usual case where the transaction has not been aborted, I think we want something better, but it will take more work to get it. I think the behavior we want is the following:

- After a serializability failure, retrying just part of the transaction should be doable by attempting a ROLLBACK TO SAVEPOINT. This rollback should succeed if all the non-rolled-back reads can be refreshed to the desired timestamp. If they can be refreshed, then the client can simply retry the rolled-back part of the transaction. If they can't, then the ROLLBACK should return a retriable error again, allowing the client to attempt a deeper rollback, and so on until the client rolls back to an initial savepoint (which succeeds by definition).

Implementing this would allow for the following nifty pattern:

  func fn_n() {
    for {
      SAVEPOINT savepoint_n
      try {
        fn_n+1()
      } catch retriable error {
        err := ROLLBACK TO SAVEPOINT savepoint_n
        if err != nil {
          throw err
        }
        continue
      }
      RELEASE SAVEPOINT savepoint_n
      break
    }
  }

The idea here is that the client tries to redo as little work as possible by successively rolling back to earlier and earlier savepoints. This pattern will technically work with the current patch already, except that it will not actually help the client, since all the rollbacks will fail until we get to the very first savepoint.
There's an argument to be made for making RELEASE SAVEPOINT check for deferred serializability violations (and perhaps other deferred checks, like deferred constraint validation), although Postgres doesn't do any of these.

Anyway, I've left implementing this for a future patch because I want to do some KV work to support it nicely. Currently, the automatic restart behavior that KV transactions have is a pain in the ass since it works against what we're trying to do.

For the time being, non-initial savepoints remember their txn ID and epoch, and attempting to roll back to them after either of these changes automatically produces a retriable error.

Fixes cockroachdb#45477
Touches cockroachdb#10735

Release note (sql change): SQL savepoints are now supported. SAVEPOINT <foo>, RELEASE SAVEPOINT <foo>, and ROLLBACK TO SAVEPOINT <foo> now work. `SHOW SAVEPOINT STATUS` can be used to inspect the current stack of active savepoints.

Co-authored-by: Raphael 'kena' Poss
Co-authored-by: Andrei Matei
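The savepoint rules described in this commit message can be summarized with a small in-memory model: a stack of savepoints, an "initial" flag set when no KV operation has happened yet, a (txn ID, epoch) snapshot that makes non-initial savepoints unusable after a restart, and a RELEASE that only commits for cockroach_restart. This is a hypothetical Python sketch of the described behavior, not the connExecutor implementation; all class and method names are made up for illustration, and schema-change handling is ignored.

```python
from dataclasses import dataclass, field


class RetriableError(Exception):
    """Stand-in for a retriable (SQLSTATE 40001) error."""


@dataclass
class Savepoint:
    name: str
    initial: bool  # created before the txn performed any KV operation
    txn_id: int    # txn identity captured at creation time
    epoch: int


@dataclass
class ToySQLTxn:
    """Hypothetical model of a connection's savepoint stack."""
    txn_id: int = 1
    epoch: int = 0
    kv_ops: int = 0
    stack: list = field(default_factory=list)

    def savepoint(self, name):
        initial = self.kv_ops == 0
        self.stack.append(Savepoint(name, initial, self.txn_id, self.epoch))

    def kv_op(self):
        self.kv_ops += 1

    def retriable_error(self):
        # The KV txn restarts at a new epoch (same txn ID).
        self.epoch += 1

    def rollback_to(self, name):
        i = self._find(name)
        sp = self.stack[i]
        restarted = (sp.txn_id, sp.epoch) != (self.txn_id, self.epoch)
        if restarted and not sp.initial:
            raise RetriableError(f"cannot roll back to {name} after a restart")
        del self.stack[i + 1:]  # later savepoints are discarded; target is kept

    def release(self, name):
        """Pop `name` and everything above it; return True if this commits."""
        i = self._find(name)
        del self.stack[i:]
        return name == "cockroach_restart"

    def _find(self, name):
        # The most recent savepoint with this name wins.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].name == name:
                return i
        raise KeyError(f"savepoint {name} does not exist")


txn = ToySQLTxn()
txn.savepoint("cockroach_restart")    # initial: no KV ops yet
txn.kv_op()
txn.savepoint("sp1")                  # non-initial
txn.retriable_error()
try:
    txn.rollback_to("sp1")            # rejected: created before the restart
except RetriableError as err:
    print(err)
txn.rollback_to("cockroach_restart")  # initial: always succeeds
```

The model deliberately covers only what the message states; the refresh-based partial rollback it describes as future work is not represented.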

rafiss assigned knz Feb 26, 2020

jordanlewis mentioned this issue Mar 3, 2020: 20.1 release blockers list #45599 (Closed, 83 tasks)

rafiss mentioned this issue Mar 3, 2020: Add a test for retry/savepoint logic cockroachdb/sqlalchemy-cockroachdb#102 (Merged)

craig bot closed this as completed in f1e2a00 Mar 6, 2020
