-
Notifications
You must be signed in to change notification settings - Fork 15
Handle KAE (or other exceptions of similar type) in the future when witnessing transactions #6533
Handle KAE (or other exceptions of similar type) in the future when witnessing transactions #6533
Conversation
@@ -60,7 +61,7 @@ public static <T extends TransactionStore> Workflow create( | |||
@Override | |||
public WorkflowHistory run() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may wish to update the WorkflowHistory
class that it only accepts FullyWitnessedTransaction
rather than both classes. I think this is ok for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Could we leave a todo here? WorkflowHistory
should only accept FullyWitnessedTransaction
long term, yeah.
assertThat(onlyKeyAlreadyExistsThrowingStore.readWrite(List.of(writeTransactionAction))) | ||
.hasValueSatisfying(witnessedTransaction -> { | ||
assertThat(witnessedTransaction).isInstanceOf(MaybeWitnessedTransaction.class); | ||
assertThat(witnessedTransaction.actions()).containsExactly(writeTransactionAction.witness()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-reading this, probably should have checked that commit timestamp is present, but this is a precondition when we construct MaybeWitnessedTransaction
, thus is not needed
Preconditions.checkArgument( | ||
commitTimestamp().isPresent(), | ||
"Commit timestamp must be present in a potentially witnessed transaction, as otherwise it is not" | ||
+ " possible to validate whether or not a transaction has been committed."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be stronger: something like given how the transaction protocol works, a transaction for which we haven't retrieved the commit timestamp has not written anything of significance to the database
or something like that?
@@ -28,4 +28,9 @@ public interface ReadableTransactionStore { | |||
* @return The value of the cell for a given table. | |||
*/ | |||
Optional<Integer> get(String table, WorkloadCell cell); | |||
|
|||
/** | |||
* Checks the startTimestamp for a given transaction whether it has committed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: checks whether the transaction with the provided startTimestamp has committed?
|
||
private static final MaybeWitnessedTransaction MAYBE_WITNESSED_TRANSACTION = MaybeWitnessedTransaction.builder() | ||
.startTimestamp(100L) | ||
.commitTimestamp(100L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's try not to include situations that won't (generally!) happen - should we use 101L
instead?
lenient().when(transactionStore.isCommitted(anyLong())).thenThrow(new RuntimeException()); | ||
assertThat(READ_ONLY_WITNESSED_TRANSACTION.accept( | ||
new OnlyCommittedWitnessedTransactionVisitor(transactionStore))) | ||
.contains(READ_ONLY_WITNESSED_TRANSACTION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For readability, this kind of check is normally done by asserting verifyNoInteractions(transactionStore);
(also, while it doesn't apply here, an implementation that catches exceptions and does stuff with them could pass this test as written!)
@@ -60,7 +61,7 @@ public static <T extends TransactionStore> Workflow create( | |||
@Override | |||
public WorkflowHistory run() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Could we leave a todo here? WorkflowHistory
should only accept FullyWitnessedTransaction
long term, yeah.
.startTimestamp(transaction.getTimestamp()) | ||
.commitTimestamp(commitTimestampProvider | ||
.get() | ||
.getCommitTimestampOrThrowIfMaybeNotCommitted(transaction.getTimestamp())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not actionable: This might work strangely if the transaction task deliberately throws a KeyAlreadyExistsException
, though I don't think we need to worry about that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Later on, I made it the more general case of TransactionCommitFailedException
, rather than just KAE, but still, a transaction could also throw this for funs. I think in this case we're ok, as we validate later on if it exists or not
log.info( | ||
"Failed to record transaction due to an exception for startTimestamp {}", | ||
SafeArg.of("startTimestamp", startTimestamp), | ||
e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to set the transaction reference at the beginning of the task, if we're concerned about something going wrong before we enter the commit-writes phase.
WriteTransactionAction writeTransactionAction = | ||
WriteTransactionAction.of(TABLE_1, WORKLOAD_CELL_ONE, VALUE_ONE); | ||
TransactionManager onlyThrowsKeyAlreadyExistsExceptionManager = spy(manager); | ||
doAnswer(answer -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Answer
is a functional interface with a method that takes an InvocationOnMock
, so this param should be named invocation
or invocationOnMock
.runTaskWithConditionWithRetry(any(Supplier.class), any()); | ||
AtlasDbTransactionStore onlyKeyAlreadyExistsThrowingStore = | ||
AtlasDbTransactionStore.create(onlyThrowsKeyAlreadyExistsExceptionManager, TABLES_TO_ATLAS_METADATA); | ||
assertThat(onlyKeyAlreadyExistsThrowingStore.readWrite(List.of(writeTransactionAction))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I think the naming here is a bit hard to follow - not sure about the significance of the only
. Maybe keyAlreadyExistsExceptionThrowingManager
and keyAlreadyExistsExceptionThrowingStore
?
.startTimestamp(startTimestamp()) | ||
.commitTimestamp(commitTimestamp()) | ||
.actions(actions()) | ||
.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non actionable: If we add an optional field to WitnessedTransaction
after this, we won't copy over its value from the Maybe
to the Fully
version. I think this risk is ok. Maybe there's some clever refactoring we can do, but not needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
General
Before this PR:
After this PR:
==COMMIT_MSG==
Handles transactions that may or may have not been witnessed due to errors (such as KeyAlreadyExistsException) in the workload server.
==COMMIT_MSG==
Priority:
Concerns / possible downsides (what feedback would you like?):
Is documentation needed?:
Compatibility
Does this PR create any API breaks (e.g. at the Java or HTTP layers) - if so, do we have compatibility?:
Does this PR change the persisted format of any data - if so, do we have forward and backward compatibility?:
The code in this PR may be part of a blue-green deploy. Can upgrades from previous versions safely coexist? (Consider restarts of blue or green nodes.):
Does this PR rely on statements being true about other products at a deployment - if so, do we have correct product dependencies on these products (or other ways of verifying that these statements are true)?:
Does this PR need a schema migration?
Testing and Correctness
What, if any, assumptions are made about the current state of the world? If they change over time, how will we find out?:
What was existing testing like? What have you done to improve it?:
If this PR contains complex concurrent or asynchronous code, is it correct? The onus is on the PR writer to demonstrate this.:
If this PR involves acquiring locks or other shared resources, how do we ensure that these are always released?:
Execution
How would I tell this PR works in production? (Metrics, logs, etc.):
Has the safety of all log arguments been decided correctly?:
Will this change significantly affect our spending on metrics or logs?:
How would I tell that this PR does not work in production? (monitors, etc.):
If this PR does not work as expected, how do I fix that state? Would rollback be straightforward?:
If the above plan is more complex than “recall and rollback”, please tag the support PoC here (if it is the end of the week, tag both the current and next PoC):
Scale
Would this PR be expected to pose a risk at scale? Think of the shopping product at our largest stack.:
Would this PR be expected to perform a large number of database calls, and/or expensive database calls (e.g., row range scans, concurrent CAS)?:
Would this PR ever, with time and scale, become the wrong thing to do - and if so, how would we know that we need to do something differently?:
Development Process
Where should we start reviewing?:
If this PR is in excess of 500 lines excluding versions lock-files, why does it not make sense to split it?:
Please tag any other people who should be aware of this PR:
@jeremyk-91
@sverma30
@raiju