Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: manually retry ABORTED reads in transactions #883

Merged
merged 5 commits into from
Jan 15, 2020

Conversation

schmidt-sebastian
Copy link
Contributor

This adds manual retry to our transaction logic similar to what we have in the Web SDK: https://github.com/firebase/firebase-js-sdk/blob/c822e78b00dd3420dcc749beb2f09a947aa4a344/packages/firestore/src/core/transaction_runner.ts#L112

We only retry GRPC exceptions, and only failures with code ABORTED. This is because:

  • we don't want to retry if the user throws an error or returns a rejected Promise
  • GAPIC retries reads that fail with UNAVAILABLE, INTERNAL and DEADLINE_EXCEEDED under the hood

Fixes #827 (fingers crossed)

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jan 14, 2020
@codecov
Copy link

codecov bot commented Jan 14, 2020

Codecov Report

Merging #883 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #883      +/-   ##
=========================================
- Coverage    96.6%   96.6%   -0.01%     
=========================================
  Files          25      25              
  Lines       15564   15559       -5     
  Branches     1155    1155              
=========================================
- Hits        15036   15031       -5     
  Misses        519     519              
  Partials        9       9
Impacted Files Coverage Δ
dev/src/types.ts 100% <ø> (ø) ⬆️
dev/src/index.ts 98.44% <100%> (-0.01%) ⬇️
dev/src/reference.ts 98.91% <100%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c66aba...06c482e. Read the comment docs.

Copy link

@thebrianchen thebrianchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -552,3 +561,14 @@ function validateReadOptions(
}
}
}

function isRetryableTransactionError(error: Error): boolean {
if (error instanceof GoogleError || 'code' in error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't seen this elsewhere: Is 'code' in error the standard way to check if the 'code' field exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is JavaScript functionality (see https://masteringjs.io/tutorials/fundamentals/hasownproperty for a comparison against Object.hasOwnProperty). I am doing it here because I am afraid that some of our dependencies might throw a GoogleError that is not actually a GoogleError by providing just a code and a message field.

@@ -106,7 +106,7 @@ function commit(

function rollback(
transaction?: Uint8Array | string,
err?: Error
error?: Error

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: change naming for commit() as well so they're all in sync (link)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Good catch.

begin('foo2', 'foo1'),
getDocument('foo2'),
commit('foo2')
).then(red => {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/red/res

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed (here and the test I copied it from)

return expect(
runTransaction(
() => {
return Promise.reject('request exception');
},
begin(),
rollback('foo')
rollback()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make it consistent throughout elsewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

await transaction.get(docRef);
return 'failure';
},
begin('foo'),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 'foo' is default, do we need it here as well, or can it also be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it as is here since getDocument uses the second argument. I prefer this over only specifying the transaction ID once or using undefined in its place.

Copy link
Contributor Author

@schmidt-sebastian schmidt-sebastian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback addressed. Thanks for the speedy review.

@@ -552,3 +561,14 @@ function validateReadOptions(
}
}
}

function isRetryableTransactionError(error: Error): boolean {
if (error instanceof GoogleError || 'code' in error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is JavaScript functionality (see https://masteringjs.io/tutorials/fundamentals/hasownproperty for a comparison against Object.hasOwnProperty). I am doing it here because I am afraid that some of our dependencies might throw a GoogleError that is not actually a GoogleError by providing just a code and a message field.

@@ -106,7 +106,7 @@ function commit(

function rollback(
transaction?: Uint8Array | string,
err?: Error
error?: Error
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Good catch.

begin('foo2', 'foo1'),
getDocument('foo2'),
commit('foo2')
).then(red => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed (here and the test I copied it from)

await transaction.get(docRef);
return 'failure';
},
begin('foo'),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it as is here since getDocument uses the second argument. I prefer this over only specifying the transaction ID once or using undefined in its place.

return expect(
runTransaction(
() => {
return Promise.reject('request exception');
},
begin(),
rollback('foo')
rollback()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@schmidt-sebastian schmidt-sebastian merged commit 7562033 into master Jan 15, 2020
@schmidt-sebastian schmidt-sebastian deleted the mrschmidt/abort branch January 15, 2020 04:44
if (error instanceof GoogleError || 'code' in error) {
// In transactions, the backend returns code ABORTED for reads that fail
// with contention. These errors should be retried for both GoogleError
// and GoogleError-alike errors (in case the prototype hierarchy gets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this check for error.name === 'GoogleError' instead?

When implementing gRPC servers, it's not unthinkable that an ABORTED error is thrown from inside the transaction, with the expectation that it's returned to the client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should accept any object that has a "code" property. In the environment you are describing, will this property be stripped?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me rephrase.

There are many ways of wrongly implementing a transaction function, to do with side-effects, or forgetting to use the transaction itself to perform queries.

This new ABORTED handling adds more ways to get it wrong.

There is an assumption that all GoogleError instances came from transaction methods. That may not be the case if I use another GCP SDK within my function. I know, I shouldn't be, but somebody still might.

It also won't be the case if I call a Firestore method on a DocumentReference which then throws an ABORTED error.

These assumptions are then generalized by testing if the thrown error has a .code = 10 property. I have a ServiceError object that is thrown with a gRPC response code. If for some reason I throw that error inside a transaction function, my transaction may be retried even if it succeeded. And again, sure, I suppose I shouldn't be throwing that error, but somebody still might.

A safer approach may be to register errors, thrown as part of the actual transaction, in a WeakMap. Then as they propagate through user code, when you catch them again you can check whether they're genuine transaction errors, and then check their code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple things to note:

  • A read outside of a transaction should never result in Code ABORTED. This error should only happen during a transactional read.
  • If a user throws an error with code: 10, we will retry the transaction. In general, transaction functions should be written so they can be re-run, and re-running them more often than needed should not cause any harm.

I understand that we control the errors that we are catching here. We could tighten this check to make sure that the error is indeed a GoogleError, but for now I decided to err on the side of caution and also retry GoogleError-alike errors. This allows us to catch errors with code properties that were created by our dependencies before we converted everything to use the proper GoogleError type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Firestore client does not retry on contention errors
4 participants