release-20.1: kv/concurrency: avoid redundant txn pushes and batch intent resolution #49835

nvanbenschoten · 2020-06-03T16:44:17Z

Backport 2/2 commits from #49218.

/cc @cockroachdb/release

Fixes #48790.
Informs #36876.
Closes #31664.

This commit adds a per-Range LRU cache of transactions that are known to be aborted or committed. We use this cache in the lockTableWaiter for two purposes:

1. When we see a lock held by a known-finalized txn, we neither wait out the kv.lock_table.coordinator_liveness_push_delay (10 ms) nor push the transaction's record (RPC to leaseholder of pushee's txn record range).
2. We use the existence of a transaction in the cache as an indication that it may have abandoned multiple intents, perhaps due to a failure of the transaction coordinator node, so we begin deferring intent resolution to enable batching.
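
As a rough illustration of the two uses above, here is a minimal sketch of an LRU set of finalized transactions and the decision a waiter might make against it. Every name in it (`txnID`, `finalizedTxnCache`, `actionForLockHolder`, the `waitAction` values) is hypothetical and chosen for the example; it is not the code in this PR, and it omits locking and the real transaction type.

```go
package main

import (
	"container/list"
	"fmt"
)

// txnID is a hypothetical stand-in for a transaction's unique ID.
type txnID string

// finalizedTxnCache is a fixed-capacity LRU set of transactions that are
// known to be committed or aborted. Illustrative only.
type finalizedTxnCache struct {
	capacity int
	order    *list.List              // front = most recently used
	entries  map[txnID]*list.Element // ID -> node in order
}

func newFinalizedTxnCache(capacity int) *finalizedTxnCache {
	return &finalizedTxnCache{
		capacity: capacity,
		order:    list.New(),
		entries:  make(map[txnID]*list.Element),
	}
}

// add records id as finalized, evicting the least recently used entry
// when the cache grows past its capacity.
func (c *finalizedTxnCache) add(id txnID) {
	if el, ok := c.entries[id]; ok {
		c.order.MoveToFront(el)
		return
	}
	c.entries[id] = c.order.PushFront(id)
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(txnID))
	}
}

// contains reports whether id is known to be finalized and, if so, marks
// it as recently used.
func (c *finalizedTxnCache) contains(id txnID) bool {
	el, ok := c.entries[id]
	if ok {
		c.order.MoveToFront(el)
	}
	return ok
}

// waitAction is what a waiter does when it encounters a held lock.
type waitAction int

const (
	// delayThenPush: wait out the liveness delay, then push the holder.
	delayThenPush waitAction = iota
	// deferResolution: skip the delay and the push, and queue the intent
	// so it can be resolved together with others from the same txn.
	deferResolution
)

// actionForLockHolder picks the waiter's action based on the cache.
func actionForLockHolder(c *finalizedTxnCache, holder txnID) waitAction {
	if c.contains(holder) {
		return deferResolution
	}
	return delayThenPush
}

func main() {
	cache := newFinalizedTxnCache(2)
	cache.add("txn-a") // learned from an earlier push that txn-a aborted
	fmt.Println(actionForLockHolder(cache, "txn-a") == deferResolution)
	fmt.Println(actionForLockHolder(cache, "txn-b") == delayThenPush)
}
```

The cache is bounded so that memory per Range stays fixed; an evicted transaction only costs one extra push the next time one of its locks is encountered.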

Together, these two changes make us much more effective at cleaning up after failed transactions that have abandoned a large number of intents. The following example demonstrates this:

--- BEFORE

CREATE TABLE keys (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys SELECT generate_series(1, 10000); ROLLBACK;
SELECT * FROM keys;

  k
-----
(0 rows)

Time: 2m50.801304266s


CREATE TABLE keys2 (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys2 SELECT generate_series(1, 10000); ROLLBACK;
INSERT INTO keys2 SELECT generate_series(1, 10000);

INSERT 10000

Time: 3m26.874571045s



--- AFTER

CREATE TABLE keys (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys SELECT generate_series(1, 10000); ROLLBACK;
SELECT * FROM keys;

  k
-----
(0 rows)

Time: 5.138220753s


CREATE TABLE keys2 (k BIGINT NOT NULL PRIMARY KEY);
BEGIN; INSERT INTO keys2 SELECT generate_series(1, 10000); ROLLBACK;
INSERT INTO keys2 SELECT generate_series(1, 10000);

INSERT 10000

Time: 48.763541138s

Notice that we are still not as fast at cleaning up intents on the insertion path as we are at doing so on the retrieval path. This is because we only batch the resolution of intents observed by a single request at a time. For the scanning case, a single ScanRequest notices all 10,000 intents and cleans them all up together. For the insertion case, each of the 10,000 PutRequests notices a single intent, and each intent is cleaned up individually. So this case benefits only from the first part of this change (no liveness delay or txn record push) and not from the second part (intent resolution batching).
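
The difference between the two paths can be sketched with a small counting model. This is an illustration only: `request` and `resolutionBatches` are hypothetical names, and the model assumes each request that observes at least one intent issues exactly one resolution batch covering just the intents it saw.

```go
package main

import "fmt"

// request is a hypothetical stand-in for a KV request. intents is the
// number of abandoned intents it observes while evaluating.
type request struct {
	name    string
	intents int
}

// resolutionBatches counts how many resolve operations are issued when
// batching is scoped to a single request: one batch per request that
// observed any intents, regardless of how many it observed.
func resolutionBatches(reqs []request) int {
	batches := 0
	for _, r := range reqs {
		if r.intents > 0 {
			batches++
		}
	}
	return batches
}

func main() {
	// Retrieval path: one scan sees every abandoned intent.
	scan := []request{{name: "Scan", intents: 10000}}

	// Insertion path: each put sees only the intent on its own key.
	puts := make([]request, 10000)
	for i := range puts {
		puts[i] = request{name: "Put", intents: 1}
	}

	fmt.Println("scan path batches:", resolutionBatches(scan))
	fmt.Println("put path batches:", resolutionBatches(puts))
}
```

Under this model the scan issues a single batch while the puts issue 10,000, which matches the explanation for why the insertion path sees a smaller speedup.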

For this reason, we still haven't solved all of #36876. To completely address that, we'll need to defer propagation of WriteIntentError during batch evaluation, as we do for WriteTooOldErrors. Or we can wait out the future LockTable changes - once we remove all cases where an intent is not "discovered", the changes here will effectively address #36876.

This was a partial regression in v20.1, so we'll want to backport this to that release branch. This change is on the larger side, but I feel ok about it because the mechanics aren't too tricky. I'll wait a week before backporting just to see if anything falls out.

Release note (bug fix): Abandoned intents due to failed transaction coordinators are now cleaned up much faster. This resolves a regression in v20.1.0 compared to prior releases.

@irfansharif I'm adding you as a reviewer because there's not really anyone else on KV that knows this code, so we should change that.


sumeerbhola

Reviewed 10 of 10 files at r1, 8 of 8 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained

nvanbenschoten requested a review from sumeerbhola June 3, 2020 16:44

sumeerbhola approved these changes Jun 3, 2020

kv: stop copying roachpb.Transaction by value when pushing

a1568b4

These structs are 288 bytes each, a little too large to copy around unnecessarily when we already have pointers to their original, immutable instances on the heap.

nvanbenschoten force-pushed the backport20.1-49218 branch from 88aeaa5 to a1568b4 Compare June 3, 2020 20:06

nvanbenschoten merged commit 7e05700 into cockroachdb:release-20.1 Jun 3, 2020

nvanbenschoten deleted the backport20.1-49218 branch June 3, 2020 21:21
