Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Fix NPE caused by txn commit race condition #5940

Merged
merged 5 commits into from
Mar 8, 2022
Merged

Fix NPE caused by txn commit race condition #5940

merged 5 commits into from
Mar 8, 2022

Conversation

gsheasby
Copy link
Contributor

@gsheasby gsheasby commented Mar 7, 2022

Goals (and why):

==COMMIT_MSG==
Fixed a race condition where if a transaction was just committed by another thread, we may return null values from TransactionService.get.
==COMMIT_MSG==

Implementation Description (bullets):

Testing (What was existing testing like? What have you done to improve it?):

  • Added a unit test that failed without the first commit

Concerns (what feedback would you like?):

Where should we start reviewing?:

Priority (whenever / two weeks / yesterday): today

gsheasby added 2 commits March 7, 2022 16:49
Fixes NPEs caused by race condition between multiple attempts to commit or abort a transaction.
@changelog-app
Copy link

changelog-app bot commented Mar 7, 2022

Generate changelog in changelog/@unreleased

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Fixed a race condition where if a transaction was just committed by another thread, users would receive null values from TransactionService.get. This caused users to observe that a completed transaction was still running, but did not lead to any correctness issue.

Check the box to generate changelog(s)

  • Generate changelog entry

type: fix
fix:
description: Fixed a race condition where if a transaction was just committed by
another thread, we may return null values from TransactionService.get.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's probably say what this means to a user: users may observe that a transaction that had definitively committed or aborted was still running. Retrying the transaction should work.

Copy link
Contributor

@jeremyk-91 jeremyk-91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good, 1 comment

@@ -100,10 +101,9 @@ public void putUnlessExistsMultiple(Map<Long, Long> keyValues) throws KeyAlready
+ "was found in the KVS",
SafeArg.of("kvsValue", kvsValue),
SafeArg.of("stagingValue", currentValue));
continue;
} finally {
resultBuilder.put(startTs, commitTs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bit odd. If an exception is thrown, won't this appear to a caller that a value was committed, but in reality it has not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a CheckAndSetException is thrown here, it means the value was committed by another thread, and hence we must return it.

If other exceptions are thrown, then we will propagate the exception out of this method, and thus would not return the result anyway.

This is quite subtle, though, so I'll add a clarifying comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! because of the way the protocol works (ADR if you're interested), once you're able to perform a quorum read of some value V, the only permitted write operations are CAS((V, Staging), (V, Staging)) and PUT((V, Committed)).

Copy link
Contributor

@jeremyk-91 jeremyk-91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -100,10 +101,9 @@ public void putUnlessExistsMultiple(Map<Long, Long> keyValues) throws KeyAlready
+ "was found in the KVS",
SafeArg.of("kvsValue", kvsValue),
SafeArg.of("stagingValue", currentValue));
continue;
} finally {
resultBuilder.put(startTs, commitTs);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! because of the way the protocol works (ADR if you're interested), once you're able to perform a quorum read of some value V, the only permitted write operations are CAS((V, Staging), (V, Staging)) and PUT((V, Committed)).

@bulldozer-bot bulldozer-bot bot merged commit 50349f6 into develop Mar 8, 2022
@bulldozer-bot bulldozer-bot bot deleted the gs/npe branch March 8, 2022 16:20
@svc-autorelease
Copy link
Collaborator

Released 0.567.0

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants