Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: TxnCoordMeta refactor broke sqlalchemy retry/savepoint logic #45477

Closed
rafiss opened this issue Feb 26, 2020 · 7 comments
Closed

kv: TxnCoordMeta refactor broke sqlalchemy retry/savepoint logic #45477

rafiss opened this issue Feb 26, 2020 · 7 comments
Assignees

Comments

@rafiss
Copy link
Collaborator

rafiss commented Feb 26, 2020

Background

PR #43032 removed TxnCoordMeta.

Previously execSavepointInOpenState would perform this check:

meta := ex.state.mu.txn.GetTxnCoordMeta(ctx)
if meta.CommandCount > 0 {
  err := pgerror.Newf(pgcode.Syntax,
    "SAVEPOINT %s needs to be the first statement in a transaction", RestartSavepointName)
}

After this PR, the check is done this way:

if ex.state.mu.txn.Active() {
  err := pgerror.Newf(pgcode.Syntax,
    "SAVEPOINT %s needs to be the first statement in a transaction", RestartSavepointName)
}

Issue

I am working on the retry behavior in our sqlalchemy adapter, and found that this PR causes this error to occur now in cases where it previously did not.

Here is a basic repro:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from cockroachdb.sqlalchemy import run_transaction

import _thread
import time
import logging

logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

#engine = create_engine("cockroachdb://root@localhost:26257/defaultdb?sslmode=disable", pool_size=2, max_overflow=0)
engine = create_engine("cockroachdb://root@localhost:26257/defaultdb?sslmode=disable")
session_maker = sessionmaker(bind=engine)

def db_func(conn, i):
    mod = i % 10
    if i < 10:
        conn.execute(f"insert into a values({i}, {i}) on conflict do nothing")
    rs = conn.execute(f"select a, b from a where a = {mod}")
    conn.execute("select crdb_internal.force_retry('5s')")
    for row in rs:
        print(f"thread {i}: results {row}")

def f(thread_id):
    print(f"thread id {thread_id}")
    run_transaction(session_maker, lambda conn: db_func(conn, thread_id))

run_transaction(session_maker, lambda c: c.execute("create table if not exists a (a int primary key, b int)"))
try:
    for x in range(1):
        _thread.start_new_thread(f, (x,))
except:
    print("Error: unable to start thread")
    raise

while 1:
    pass

It produces this output

INFO:sqlalchemy.engine.base.Engine:select current_schema()
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:select version()
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SAVEPOINT cockroach_restart
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:create table if not exists a (a int primary key, b int)
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:RELEASE SAVEPOINT cockroach_restart
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:COMMIT
thread id 0
INFO:sqlalchemy.engine.base.Engine:BEGIN (implicit)
INFO:sqlalchemy.engine.base.Engine:SAVEPOINT cockroach_restart
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:insert into a values(0, 0) on conflict do nothing
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:select a, b from a where a = 0
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:select crdb_internal.force_retry('5s')
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:ROLLBACK TO SAVEPOINT cockroach_restart
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:SAVEPOINT cockroach_restart
INFO:sqlalchemy.engine.base.Engine:{}
INFO:sqlalchemy.engine.base.Engine:ROLLBACK
Unhandled exception in thread started by <function f at 0x10481b8c0>
Traceback (most recent call last):
  File "/Users/rafiss/.virtualenvs/sqlalchemy-full/lib/python3.7/site-packages/SQLAlchemy-1.3.13.dev0-py3.7-macosx-10.14-x86_64.egg/sqlalchemy/engine/base.py", line 1246, in _execute_context
    cursor, statement, parameters, context
  File "/Users/rafiss/.virtualenvs/sqlalchemy-full/lib/python3.7/site-packages/SQLAlchemy-1.3.13.dev0-py3.7-macosx-10.14-x86_64.egg/sqlalchemy/engine/default.py", line 588, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.SyntaxError: SAVEPOINT cockroach_restart needs to be the first statement in a transaction

When running against a build before #43032, the error does not occur.

@andreimatei
Copy link
Contributor

Can you please spell out a minimal set of statements that makes you unhappy?
Is it the ROLLBACK TO SAVEPOINT cockroach_restart; SAVEPOINT cockroach_restart combo?

I'm mucking with savepoints now, so I'll take a look.

@rafiss
Copy link
Collaborator Author

rafiss commented Feb 26, 2020

Here is a minimal example. Thanks!

> CREATE TABLE IF NOT EXISTS a (a INT PRIMARY KEY, b INT);
> BEGIN;
OPEN> SAVEPOINT cockroach_restart;
OPEN> SELECT a, b FROM a WHERE a = 0;
OPEN> ROLLBACK TO SAVEPOINT cockroach_restart;
OPEN> SAVEPOINT cockroach_restart;

ERROR: SAVEPOINT cockroach_restart needs to be the first statement in a transaction
SQLSTATE: 42601

@knz
Copy link
Contributor

knz commented Feb 26, 2020

this example is not exactly minimal. The problem you have here is that cockroach sql inserts invisible SQL in-between prompts.

I think if you had entered begin; savepoint cockroach_restart or rollback to savepoint cockroach_restart; savepoint cockroach_restart, it would work.

@knz
Copy link
Contributor

knz commented Feb 26, 2020

(see --echo-sql. You can disable with \unset check_syntax and \unset smart_prompt)

@rafiss
Copy link
Collaborator Author

rafiss commented Feb 26, 2020

The extra invisible SQL doesn't seem to matter for this bug. I re-ran my commands after doing \unset check_syntax, \unset smart_prompt, and \set prompt1 >.

I get the error on a local build including that PR, but using a build from the commit right before the PR was merged, I do not get the error.

@knz
Copy link
Contributor

knz commented Feb 26, 2020

thanks for clarifying

@rafiss
Copy link
Collaborator Author

rafiss commented Mar 3, 2020

Just a heads up -- I am marking this as a possible release blocker because it will cause apps that use sqlalchemy to break. If this seems to be an overly severe categorization, please let me know!

andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 5, 2020
This patch adds support for SAVEPOINT <foo>, RELEASE SAVEPOINT <foo>,
ROLLBACK TO SAVEPOINT <foo>.
Before this patch, we only had support for the special savepoint
cockroach_restart, which had to be placed at the beginning of the
transaction and was specifically intended for dealing with transaction
retries. This patch implements general support for savepoints, which
provide an error recovery mechanism.

The connExecutor now maintains a stack of savepoints. Rolling back to a
savepoint uses the recent KV api for ignoring a range of write sequence
numbers.

At the SQL level, savepoints differ in two characteristics:
1) savepoints placed at the beginning of a transaction (i.e. before any
KV operations) are marked as "initial". Rolling back to an initial
savepoint is always possible. Rolling back to a non-initial savepoint is
not possible after the transaction restarts (see below).
2) the savepoint named "cockroach_restart" retains special RELEASE
semantics: releasing it (attempts to) commit the underlying KV txn.
This continues to allow for discovering of deferred serilizability
errors (i.e. write timestamp pushes by the ts cache). As before, this
RELEASE can fail with a retriable error, at which point the client can
do ROLLBACK TO SAVEPOINT cockroach_restart (which is guaranteed to work
because cockroach_restart needs to be an "initial" savepoint). The
transaction continues to maintain all its locks after such an error.
This is all in contrast to releasing any other savepoints, which cannot
commit the txn and also never fails. See below for more discussion.
The cockroach_restart savepoint is only special in its release behavior,
not in its rollback behavior.

With the implementation of savepoints, the state machine driving a SQL
connection's transactions becomes a lot simpler. There's no longer a
distinction between an "Aborted" transaction and one that's in
"RestartWait". Rolling back to a savepoint now works the same way across
the two states, so RestartWait is gone.

This patch also improves the KV savepoints. They now capture and restore
the state of the read spans and the in-flight writes.

Some things don't work (yet):
a) Rolling back to a savepoint created after a schema change will error
out. This is because we don't yet snapshot the transaction's schema
change state.
b) After a retriable error, you can only rollback to an initial
savepoint. Attempting to rollback to a non-initial savepoint generates a
retriable error again. If the trasaction has been aborted, I think this
is the right behavior; no recovery is possible since the transaction has
lost its write intents. In the usual case where the transaction has not
been aborted, I think we want something better but it will take more
work to get it. I think the behavior we want is the following:
- after a serializability failure, retrying just part of the transaction
should be doable by attempting a ROLLBACK TO SAVEPOINT. This rollback
should succeed if all the non-rolled-back reads can be refreshed to the
desired timestamp. If they can be refreshed, then the client can simply
retry the rolled back part of the transaction. If they can't, then the
ROLLBACK should return a retriable error again, allowing the client to
attempt a deeper rollback - and so on until the client rolls back to an
initial savepoint (which succeeds by definition).
Implementing this would allow for the following nifty pattern:

func fn_n() {
  for {
    SAVEPOINT savepoint_n
    try {
      fn_n+1()
    } catch retriable error {
      err := ROLLBACK TO SAVEPOINT outer
      if err != nil {
        throw err
      }
      continue
    }
    RELEASE SAVEPOINT savepoint_n
    break
  }
}

The idea here is that the client is trying to re-do as little work as
possible by successively rolling back to earlier and earlier savepoints.
This pattern will technically work with the current patch already,
except it will not actually help the client in any way since all the
rollbacks will fail until we get to the very first savepoint.
There's an argument to be made for making RELEASE SAVEPOINT check for
deferred serializability violations (and perhaps other deferred checks -
like deferred constraint validation), although Postgres doesn't do any
of these.
Anyway, I've left implementing this for a future patch because I want to
do some KV work for supporting it nicely. Currently, the automatic
restart behavior that KV transactions have is a pain in the ass since it
works against what we're trying to do.

For the time-being, non-initial savepoints remember their txn ID and
epoch and attempting to rollback to them after these changes produces a
retriable error automatically.

Fixes cockroachdb#45477
Touches cockroachdb#10735

Release note (sql change): SQL savepoints are now supported. SAVEPOINT
<foo>, RELEASE SAVEPOINT <foo>, ROLLBACK TO SAVEPOINT <foo> now works.
`SHOW SAVEPOINT STATUS` can be used to inspect the current stack of active
savepoints.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 5, 2020
This patch adds support for SAVEPOINT <foo>, RELEASE SAVEPOINT <foo>,
ROLLBACK TO SAVEPOINT <foo>.
Before this patch, we only had support for the special savepoint
cockroach_restart, which had to be placed at the beginning of the
transaction and was specifically intended for dealing with transaction
retries. This patch implements general support for savepoints, which
provide an error recovery mechanism.

The connExecutor now maintains a stack of savepoints. Rolling back to a
savepoint uses the recent KV api for ignoring a range of write sequence
numbers.

At the SQL level, savepoints differ in two characteristics:
1) savepoints placed at the beginning of a transaction (i.e. before any
KV operations) are marked as "initial". Rolling back to an initial
savepoint is always possible. Rolling back to a non-initial savepoint is
not possible after the transaction restarts (see below).
2) the savepoint named "cockroach_restart" retains special RELEASE
semantics: releasing it (attempts to) commit the underlying KV txn.
This continues to allow for discovering of deferred serilizability
errors (i.e. write timestamp pushes by the ts cache). As before, this
RELEASE can fail with a retriable error, at which point the client can
do ROLLBACK TO SAVEPOINT cockroach_restart (which is guaranteed to work
because cockroach_restart needs to be an "initial" savepoint). The
transaction continues to maintain all its locks after such an error.
This is all in contrast to releasing any other savepoints, which cannot
commit the txn and also never fails. See below for more discussion.
The cockroach_restart savepoint is only special in its release behavior,
not in its rollback behavior.

With the implementation of savepoints, the state machine driving a SQL
connection's transactions becomes a lot simpler. There's no longer a
distinction between an "Aborted" transaction and one that's in
"RestartWait". Rolling back to a savepoint now works the same way across
the two states, so RestartWait is gone.

This patch also improves the KV savepoints. They now capture and restore
the state of the read spans and the in-flight writes.

Some things don't work (yet):
a) Rolling back to a savepoint created after a schema change will error
out. This is because we don't yet snapshot the transaction's schema
change state.
b) After a retriable error, you can only rollback to an initial
savepoint. Attempting to rollback to a non-initial savepoint generates a
retriable error again. If the trasaction has been aborted, I think this
is the right behavior; no recovery is possible since the transaction has
lost its write intents. In the usual case where the transaction has not
been aborted, I think we want something better but it will take more
work to get it. I think the behavior we want is the following:
- after a serializability failure, retrying just part of the transaction
should be doable by attempting a ROLLBACK TO SAVEPOINT. This rollback
should succeed if all the non-rolled-back reads can be refreshed to the
desired timestamp. If they can be refreshed, then the client can simply
retry the rolled back part of the transaction. If they can't, then the
ROLLBACK should return a retriable error again, allowing the client to
attempt a deeper rollback - and so on until the client rolls back to an
initial savepoint (which succeeds by definition).
Implementing this would allow for the following nifty pattern:

func fn_n() {
  for {
    SAVEPOINT savepoint_n
    try {
      fn_n+1()
    } catch retriable error {
      err := ROLLBACK TO SAVEPOINT outer
      if err != nil {
        throw err
      }
      continue
    }
    RELEASE SAVEPOINT savepoint_n
    break
  }
}

The idea here is that the client is trying to re-do as little work as
possible by successively rolling back to earlier and earlier savepoints.
This pattern will technically work with the current patch already,
except it will not actually help the client in any way since all the
rollbacks will fail until we get to the very first savepoint.
There's an argument to be made for making RELEASE SAVEPOINT check for
deferred serializability violations (and perhaps other deferred checks -
like deferred constraint validation), although Postgres doesn't do any
of these.
Anyway, I've left implementing this for a future patch because I want to
do some KV work for supporting it nicely. Currently, the automatic
restart behavior that KV transactions have is a pain in the ass since it
works against what we're trying to do.

For the time-being, non-initial savepoints remember their txn ID and
epoch and attempting to rollback to them after these changes produces a
retriable error automatically.

Fixes cockroachdb#45477
Touches cockroachdb#10735

Release note (sql change): SQL savepoints are now supported. SAVEPOINT
<foo>, RELEASE SAVEPOINT <foo>, ROLLBACK TO SAVEPOINT <foo> now works.
`SHOW SAVEPOINT STATUS` can be used to inspect the current stack of active
savepoints.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
andreimatei added a commit to andreimatei/cockroach that referenced this issue Mar 6, 2020
This patch adds support for SAVEPOINT <foo>, RELEASE SAVEPOINT <foo>,
ROLLBACK TO SAVEPOINT <foo>.
Before this patch, we only had support for the special savepoint
cockroach_restart, which had to be placed at the beginning of the
transaction and was specifically intended for dealing with transaction
retries. This patch implements general support for savepoints, which
provide an error recovery mechanism.

The connExecutor now maintains a stack of savepoints. Rolling back to a
savepoint uses the recent KV api for ignoring a range of write sequence
numbers.

At the SQL level, savepoints differ in two characteristics:
1) savepoints placed at the beginning of a transaction (i.e. before any
KV operations) are marked as "initial". Rolling back to an initial
savepoint is always possible. Rolling back to a non-initial savepoint is
not possible after the transaction restarts (see below).
2) the savepoint named "cockroach_restart" retains special RELEASE
semantics: releasing it (attempts to) commit the underlying KV txn.
This continues to allow for discovering of deferred serilizability
errors (i.e. write timestamp pushes by the ts cache). As before, this
RELEASE can fail with a retriable error, at which point the client can
do ROLLBACK TO SAVEPOINT cockroach_restart (which is guaranteed to work
because cockroach_restart needs to be an "initial" savepoint). The
transaction continues to maintain all its locks after such an error.
This is all in contrast to releasing any other savepoints, which cannot
commit the txn and also never fails. See below for more discussion.
The cockroach_restart savepoint is only special in its release behavior,
not in its rollback behavior.

With the implementation of savepoints, the state machine driving a SQL
connection's transactions becomes a lot simpler. There's no longer a
distinction between an "Aborted" transaction and one that's in
"RestartWait". Rolling back to a savepoint now works the same way across
the two states, so RestartWait is gone.

This patch also improves the KV savepoints. They now capture and restore
the state of the read spans and the in-flight writes.

Some things don't work (yet):
a) Rolling back to a savepoint created after a schema change will error
out. This is because we don't yet snapshot the transaction's schema
change state.
b) After a retriable error, you can only rollback to an initial
savepoint. Attempting to rollback to a non-initial savepoint generates a
retriable error again. If the trasaction has been aborted, I think this
is the right behavior; no recovery is possible since the transaction has
lost its write intents. In the usual case where the transaction has not
been aborted, I think we want something better but it will take more
work to get it. I think the behavior we want is the following:
- after a serializability failure, retrying just part of the transaction
should be doable by attempting a ROLLBACK TO SAVEPOINT. This rollback
should succeed if all the non-rolled-back reads can be refreshed to the
desired timestamp. If they can be refreshed, then the client can simply
retry the rolled back part of the transaction. If they can't, then the
ROLLBACK should return a retriable error again, allowing the client to
attempt a deeper rollback - and so on until the client rolls back to an
initial savepoint (which succeeds by definition).
Implementing this would allow for the following nifty pattern:

func fn_n() {
  for {
    SAVEPOINT savepoint_n
    try {
      fn_n+1()
    } catch retriable error {
      err := ROLLBACK TO SAVEPOINT outer
      if err != nil {
        throw err
      }
      continue
    }
    RELEASE SAVEPOINT savepoint_n
    break
  }
}

The idea here is that the client is trying to re-do as little work as
possible by successively rolling back to earlier and earlier savepoints.
This pattern will technically work with the current patch already,
except it will not actually help the client in any way since all the
rollbacks will fail until we get to the very first savepoint.
There's an argument to be made for making RELEASE SAVEPOINT check for
deferred serializability violations (and perhaps other deferred checks -
like deferred constraint validation), although Postgres doesn't do any
of these.
Anyway, I've left implementing this for a future patch because I want to
do some KV work for supporting it nicely. Currently, the automatic
restart behavior that KV transactions have is a pain in the ass since it
works against what we're trying to do.

For the time-being, non-initial savepoints remember their txn ID and
epoch and attempting to rollback to them after these changes produces a
retriable error automatically.

Fixes cockroachdb#45477
Touches cockroachdb#10735

Release note (sql change): SQL savepoints are now supported. SAVEPOINT
<foo>, RELEASE SAVEPOINT <foo>, ROLLBACK TO SAVEPOINT <foo> now works.
`SHOW SAVEPOINT STATUS` can be used to inspect the current stack of active
savepoints.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
@craig craig bot closed this as completed in f1e2a00 Mar 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants