pgx TestTxCommitSerializationFailure seems to get stuck waiting on a lock #60754
Comments
The second insert will block on the first insert and can only proceed once tx1 commits. Think of it this way:

```go
var mu sync.Mutex
mu.Lock()   // tx1
mu.Lock()   // tx2
mu.Unlock() // tx1
mu.Unlock() // tx2
```

Obviously this ordering is not going to complete as we deadlock on tx2's
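The mutex analogy can be made runnable without actually hanging. This is a minimal sketch, assuming Go 1.18+ for `sync.Mutex.TryLock`: `TryLock` replaces the second `Lock`, so the program reports that tx2 would have blocked instead of deadlocking. The function name `sequentialLockOrder` is illustrative only.

```go
package main

import (
	"fmt"
	"sync"
)

// sequentialLockOrder replays the analogy on one goroutine: tx1 takes the
// lock, then tx2 tries to take it before tx1 has released it. TryLock stands
// in for Lock so the demo reports the block instead of hanging forever.
func sequentialLockOrder() (tx1Acquired, tx2Acquired bool) {
	var mu sync.Mutex
	tx1Acquired = mu.TryLock() // tx1: first write takes the "lock"
	tx2Acquired = mu.TryLock() // tx2: with mu.Lock() this call would never return
	if tx1Acquired {
		mu.Unlock() // tx1 commits; with a blocking Lock above we'd never get here
	}
	return tx1Acquired, tx2Acquired
}

func main() {
	tx1, tx2 := sequentialLockOrder()
	fmt.Printf("tx1 acquired: %v, tx2 acquired: %v\n", tx1, tx2)
}
```

Running it prints `tx1 acquired: true, tx2 acquired: false`: on a single goroutine, the only thing that could release the lock is the statement that comes after the blocked one.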
@tbg that makes sense to me, but then why does it not block when I run the same commands from the CLI as described above? Am I missing something obvious? Also, running the same test against Postgres in SERIALIZABLE mode does not block. Do we implement SERIALIZABLE differently?
OK, so if I run the below commands in two CLI shells repeatedly, I do eventually repro the behavior where the second connection is stuck waiting.
So questions:
Cockroach should never deadlock. Any deadlock is a serious bug. Does connection 1 actually send that commit? Stack traces and logs from a repro will help.
I don't think it's a deadlock. As I said above, when running that test against CockroachDB, connection 1 does not send that commit.
The test does not continue past that point. To reiterate, my questions are:
Is this all expected? Should we modify that pgx test so that it runs the commands in separate goroutines?
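A rough sketch of the goroutine-based restructuring asked about above, not actual pgx test code: a `sync.Mutex` stands in for the intent left by tx1's INSERT, and tx2's statements run on their own goroutine, so the main goroutine is still free to send tx1's COMMIT. `runInterleaved` and the event strings are hypothetical. In a real test each goroutine would drive its own connection; whether one transaction then receives `40001` is not modeled here.

```go
package main

import (
	"fmt"
	"sync"
)

// runInterleaved models running each connection's statements on its own
// goroutine. The intent mutex is a stand-in for tx1's uncommitted write;
// it is not a pgx or CockroachDB API.
func runInterleaved() []string {
	var (
		intent sync.Mutex
		logMu  sync.Mutex
		events []string
	)
	record := func(s string) {
		logMu.Lock()
		defer logMu.Unlock()
		events = append(events, s)
	}

	intent.Lock() // tx1 INSERT: lays down an intent
	record("tx1 INSERT")

	tx2Done := make(chan struct{})
	go func() {
		defer close(tx2Done)
		intent.Lock() // tx2 INSERT: waits until tx1 releases the intent
		record("tx2 INSERT")
		intent.Unlock() // tx2 COMMIT
		record("tx2 COMMIT")
	}()

	// The main goroutine is not stuck behind tx2, so tx1 can still commit.
	record("tx1 COMMIT")
	intent.Unlock() // releasing the intent unblocks tx2
	<-tx2Done
	return events
}

func main() {
	for _, e := range runInterleaved() {
		fmt.Println(e)
	}
}
```

Because tx1's COMMIT is recorded before the intent is released, the order is deterministic: `tx1 INSERT`, `tx1 COMMIT`, `tx2 INSERT`, `tx2 COMMIT`. The blocked statement still waits; it just no longer prevents the statement that would unblock it from being sent.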
Here's a debug zip, though: debug.zip
Postgres's locking behavior is different from CockroachDB's. Postgres figures out serialization violations later, at commit time. We're more pessimistic, IIUC. We can't assume that they'll behave exactly the same. For example, I don't think Postgres ever blocks on reads.
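To illustrate "figuring out serialization violations at commit time", here is a toy optimistic model, assuming one version counter for the whole table (the test's INSERT ... SELECT scans every row). This is plain backward validation, not PostgreSQL's actual SSI algorithm, which tracks read/write dependencies with predicate locks; the point at which PostgreSQL really reports the error may differ. All names are hypothetical.

```go
package main

import "fmt"

// table is a toy: one version counter stands for the whole table, because the
// test's INSERT ... SELECT scans every row.
type table struct{ version int }

// txn remembers the table version its snapshot saw and whether it wrote.
type txn struct {
	name        string
	readVersion int
	wrote       bool
}

func (t *table) begin(name string) *txn {
	return &txn{name: name, readVersion: t.version}
}

// scanAndInsert never blocks in this model: the scan reads the snapshot taken
// at begin, and the write stays private until commit.
func (x *txn) scanAndInsert() { x.wrote = true }

// commit does the validation: if a transaction that committed after x's
// snapshot changed the table x read, x fails with SQLSTATE 40001.
func (t *table) commit(x *txn) error {
	if t.version != x.readVersion {
		return fmt.Errorf("%s: SQLSTATE 40001: serialization failure", x.name)
	}
	if x.wrote {
		t.version++
	}
	return nil
}

func main() {
	tbl := &table{}
	tx1, tx2 := tbl.begin("tx1"), tbl.begin("tx2")
	tx1.scanAndInsert()
	tx2.scanAndInsert()          // does not wait for tx1
	fmt.Println(tbl.commit(tx1)) // <nil>
	fmt.Println(tbl.commit(tx2)) // tx2: SQLSTATE 40001: serialization failure
}
```

In this model neither INSERT waits, so a single-threaded driver reaches both COMMITs and the second one gets `40001`. A pessimistic engine that makes the second scan wait on the first write never gets that far when both connections are driven from one goroutine.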
OK, that part is clearer to me now, thanks! I'd still like to understand why running the test against CRDB blocks, while running the same commands against CRDB manually from the CLI does not block.
I can't reproduce the behavior you're seeing @rafiss. I used this logic test:
I can only imagine that when you're reproing manually, the order isn't the same (though I also think the txns are simple enough for it to not be screwed up?). Also, isn't that SQL in the original test invalid? What is
Oops, yes, I put the cast on the wrong expression in my code sample. The commands in the table in the issue description have the cast in the right place. (Editing the post now to fix my code sample too.) Though, @tbg, one difference between your logic test repro and my manual CLI repro is that the logic test sends the INSERT immediately after the BEGIN. In my CLI repro, I do (1) BEGIN; (2) BEGIN; (1) INSERT ...; (2) INSERT ...; (1) COMMIT; (2) COMMIT. Does interleaving BEGINs like this make any difference? Maybe not, since it looks like the pgx test interleaves them as well.
It should not matter. What matters is that (2) INSERT happens after (1) INSERT but before (1) COMMIT. That way the second write will always block on the first one, and since the first one will never disappear on its own, it's a deadlock.
(More details: (2) INSERT contains a full table scan, and it's really that read that blocks on (1) INSERT.) A read can't ignore an intent at a lower timestamp, and the intent would definitely be at a lower timestamp since it was laid down earlier.
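A toy model of the rule described above, assuming (as the comment does) that each statement's timestamp comes from one logical clock, so an intent laid down earlier has a lower timestamp. `intent`, `mustWait`, and `firstBlockedStep` are hypothetical names, not CockroachDB internals. Replaying the statements one at a time on a single driver, as the pgx test does, shows why (1) COMMIT is never sent.

```go
package main

import "fmt"

// intent is a stand-in for an uncommitted write and its timestamp.
type intent struct {
	txn string
	ts  int
}

// mustWait reports whether a read at readTS has to wait on in. In this model
// an intent above the read timestamp can be ignored; one at or below it
// cannot, because the read cannot tell yet whether that write will commit.
func mustWait(readTS int, in *intent) bool {
	return in != nil && in.ts <= readTS
}

// firstBlockedStep runs the statements in order on a single driver and
// returns the index of the first one that would wait, or -1 if none does.
func firstBlockedStep(steps []string) int {
	var pending *intent // tx1's uncommitted write, if any
	for i, s := range steps {
		clock := i + 1 // one logical tick per statement
		switch s {
		case "(1) INSERT":
			pending = &intent{txn: "tx1", ts: clock}
		case "(2) INSERT":
			// The INSERT's full table scan is the read that waits.
			if mustWait(clock, pending) {
				return i // the driver is stuck, so (1) COMMIT is never sent
			}
		case "(1) COMMIT":
			pending = nil
		}
	}
	return -1
}

func main() {
	issueOrder := []string{"(1) BEGIN", "(2) BEGIN", "(1) INSERT", "(2) INSERT", "(1) COMMIT", "(2) COMMIT"}
	fmt.Println(firstBlockedStep(issueOrder)) // 3
	commitFirst := []string{"(1) BEGIN", "(1) INSERT", "(1) COMMIT", "(2) BEGIN", "(2) INSERT", "(2) COMMIT"}
	fmt.Println(firstBlockedStep(commitFirst)) // -1
}
```

Consistent with the point that BEGIN interleaving should not matter, the model's outcome depends only on whether (2) INSERT falls between (1) INSERT and (1) COMMIT.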
We have marked this issue as stale because it has been inactive for |
The pgx driver has a test that verifies that a `40001` error is returned on serialization failure. See https://github.com/jackc/pgx/blob/927a15124e6a579327ea56b478e4ba7dac6e45a6/tx_test.go#L146

When running against CockroachDB, however, the database gets stuck executing a query instead of returning the `40001` error code.

Confusingly, when I try to reproduce this by running the commands from two different CLIs, the `40001` error is returned as expected.

For this issue, we should determine why the pgx test gets stuck.
Here are the statements executed by the test:
And here is the test code, which gets stuck at the second INSERT. You can run `SHOW CLUSTER QUERIES` after invoking the test and see that the query is still running.

Jira issue: CRDB-3144