Skip to content
This repository has been archived by the owner on Nov 14, 2024. It is now read-only.

Handle KAE (or other exceptions of similar type) in the future when witnessing transactions #6533

Merged
merged 11 commits into from
May 3, 2023

Conversation

Sam-Kramer
Copy link
Contributor

General

Before this PR:

After this PR:

==COMMIT_MSG==
Handles transactions that may or may have not been witnessed due to errors (such as KeyAlreadyExistsException) in the workload server.
==COMMIT_MSG==

Priority:

Concerns / possible downsides (what feedback would you like?):

Is documentation needed?:

Compatibility

Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?:

Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?:

The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.):

Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?:

Does this PR need a schema migration?

Testing and Correctness

What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?:

What was existing testing like? What have you done to improve it?:

If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.:

If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?:

Execution

How would I tell this PR works in production? (Metrics, logs, etc.):

Has the safety of all log arguments been decided correctly?:

Will this change significantly affect our spending on metrics or logs?:

How would I tell that this PR does not work in production? (monitors, etc.):

If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?:

If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC):

Scale

Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.:

Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?:

Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?:

Development Process

Where should we start reviewing?:

If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?:

Please tag any other people who should be aware of this PR:
@jeremyk-91
@sverma30
@raiju

@@ -60,7 +61,7 @@ public static <T extends TransactionStore> Workflow create(
@Override
public WorkflowHistory run() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may wish to update the WorkflowHistory class that it only accepts FullyWitnessedTransaction rather than both classes. I think this is ok for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Could we leave a todo here? WorkflowHistory should only accept FullyWitnessedTransaction long term, yeah.

assertThat(onlyKeyAlreadyExistsThrowingStore.readWrite(List.of(writeTransactionAction)))
.hasValueSatisfying(witnessedTransaction -> {
assertThat(witnessedTransaction).isInstanceOf(MaybeWitnessedTransaction.class);
assertThat(witnessedTransaction.actions()).containsExactly(writeTransactionAction.witness());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-reading this, probably should have checked that commit timestamp is present, but this is a precondition when we construct MaybeWitnessedTransaction, thus is not needed

Preconditions.checkArgument(
commitTimestamp().isPresent(),
"Commit timestamp must be present in a potentially witnessed transaction, as otherwise it is not"
+ " possible to validate whether or not a transaction has been committed.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be stronger: something like given how the transaction protocol works, a transaction for which we haven't retrieved the commit timestamp has not written anything of significance to the database or something like that?

@@ -28,4 +28,9 @@ public interface ReadableTransactionStore {
* @return The value of the cell for a given table.
*/
Optional<Integer> get(String table, WorkloadCell cell);

/**
* Checks the startTimestamp for a given transaction whether it has committed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: checks whether the transaction with the provided startTimestamp has committed?


private static final MaybeWitnessedTransaction MAYBE_WITNESSED_TRANSACTION = MaybeWitnessedTransaction.builder()
.startTimestamp(100L)
.commitTimestamp(100L)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's try not to include situations that won't (generally!) happen - should we use 101L instead?

lenient().when(transactionStore.isCommitted(anyLong())).thenThrow(new RuntimeException());
assertThat(READ_ONLY_WITNESSED_TRANSACTION.accept(
new OnlyCommittedWitnessedTransactionVisitor(transactionStore)))
.contains(READ_ONLY_WITNESSED_TRANSACTION);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, this kind of check is normally done by asserting verifyNoInteractions(transactionStore); (also, while it doesn't apply here, an implementation that catches exceptions and does stuff with them could pass this test as written!)

@@ -60,7 +61,7 @@ public static <T extends TransactionStore> Workflow create(
@Override
public WorkflowHistory run() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Could we leave a todo here? WorkflowHistory should only accept FullyWitnessedTransaction long term, yeah.

.startTimestamp(transaction.getTimestamp())
.commitTimestamp(commitTimestampProvider
.get()
.getCommitTimestampOrThrowIfMaybeNotCommitted(transaction.getTimestamp()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not actionable: This might work strangely if the transaction task deliberately throws a KeyAlreadyExistsException, though I don't think we need to worry about that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Later on, I made it the more general case of TransactionCommitFailedException, rather than just KAE, but still, a transaction could also throw this for funs. I think in this case we're ok, as we validate later on if it exists or not

log.info(
"Failed to record transaction due to an exception for startTimestamp {}",
SafeArg.of("startTimestamp", startTimestamp),
e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to set the transaction reference at the beginning of the task, if we're concerned about something going wrong before we enter the commit-writes phase.

WriteTransactionAction writeTransactionAction =
WriteTransactionAction.of(TABLE_1, WORKLOAD_CELL_ONE, VALUE_ONE);
TransactionManager onlyThrowsKeyAlreadyExistsExceptionManager = spy(manager);
doAnswer(answer -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Answer is a functional interface with a method that takes an InvocationOnMock, so this param should be named invocation or invocationOnMock

.runTaskWithConditionWithRetry(any(Supplier.class), any());
AtlasDbTransactionStore onlyKeyAlreadyExistsThrowingStore =
AtlasDbTransactionStore.create(onlyThrowsKeyAlreadyExistsExceptionManager, TABLES_TO_ATLAS_METADATA);
assertThat(onlyKeyAlreadyExistsThrowingStore.readWrite(List.of(writeTransactionAction)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think the naming here is a bit hard to follow - not sure about the significance of the only. Maybe keyAlreadyExistsExceptionThrowingManager and keyAlreadyExistsExceptionThrowingStore?

.startTimestamp(startTimestamp())
.commitTimestamp(commitTimestamp())
.actions(actions())
.build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non actionable: If we add an optional field to WitnessedTransaction after this, we won't copy over its value from the Maybe to the Fully version. I think this risk is ok. Maybe there's some clever refactoring we can do, but not needed here.

Copy link
Contributor

@jeremyk-91 jeremyk-91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sam Kramer added 2 commits May 3, 2023 13:37
@bulldozer-bot bulldozer-bot bot merged commit 2857ac2 into develop May 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants