Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retryOptions.retryableErrorFn called with 200 ok for resumable upload #1851

Closed
benogle opened this issue Apr 8, 2022 · 3 comments · Fixed by #1857
Closed

retryOptions.retryableErrorFn called with 200 ok for resumable upload #1851

benogle opened this issue Apr 8, 2022 · 3 comments · Fixed by #1857
Assignees
Labels
api: storage Issues related to the googleapis/nodejs-storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue.

Comments

@benogle
Copy link

benogle commented Apr 8, 2022

Environment details

  • OS: node14.18.2-alpine3.12 (docker)
  • Node.js version: 14.18.2
  • npm version: 6.14.15
  • @google-cloud/storage version: 5.18.3

Steps to reproduce

retryOptions.retryableErrorFn is often called with a plain object containing code: 200.

const storage = new Storage({
  retryOptions: {
    autoRetry: true,
    retryDelayMultiplier: 2,
    totalTimeout: 600,
    maxRetryDelay: 60,
    maxRetries: 10,
    idempotencyStrategy: IdempotencyStrategy.RetryAlways,
    retryableErrorFn: (err) => {
      console.log(err)

      // !!
      // err is often a plain object:
      // { code: 200, message: 'OK', name: 'OK' }

      return true
    },
  },
})
const cloudFile = storage.bucket('our-bucket').file('somename.zip')
const result = await cloudFile.save(buffer, {
  // `resumable: true` seems to be the key to triggering this.
  // We only set `resumable: true` with files > 10mb.
  resumable: true,

  gzip: true,
  metadata: {
    cacheControl: 'no-cache',
  },
  predefinedAcl: 'projectPrivate',
})

If we ignore the 200s in retryableErrorFn, cloudFile.save ultimately succeeds, but it took a while for us to discover we could safely ignore them as we could only reproduce this in production. retryableErrorFn is called in the upload stream's error event, obviously extremely confusing that it's called with a code 200.

If we retry the 200s, it retries until it eventually fails with:

The offset is lower than the number of bytes written. The server has 0 bytes and while XXXX bytes has been uploaded - thus XXXX bytes are missing. Stopping as this could result in data loss. Initiate a new upload to continue.

@benogle benogle added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Apr 8, 2022
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/nodejs-storage API. label Apr 8, 2022
@shaffeeullah shaffeeullah self-assigned this Apr 8, 2022
@shaffeeullah shaffeeullah added type: question Request for information or clarification. Not an issue. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Apr 8, 2022
@shaffeeullah
Copy link
Contributor

shaffeeullah commented Apr 8, 2022

Hi @benogle ,

This is happening here

/**
* @return {bool} is the request good?
*/
private onResponse(resp: GaxiosResponse) {
if (
this.retryOptions.retryableErrorFn!({
code: resp.status,
message: resp.statusText,
name: resp.statusText,
})
) {
this.attemptDelayedRetry(resp);
return false;
}
this.emit('response', resp);
return true;
}

We check all responses and if the retry function returns true, we retry the request. I've confirmed that for the default retry function, the 200 response is not retried.

/**
* Returns true if the API request should be retried, given the error that was
* given the first time the request was attempted.
* @const
* @param {error} err - The API error to check if it is appropriate to retry.
* @return {boolean} True if the API request should be retried, false otherwise.
*/
export const RETRYABLE_ERR_FN_DEFAULT = function (err?: ApiError) {
if (err) {
if ([408, 429, 500, 502, 503, 504].indexOf(err.code!) !== -1) {
return true;
}
if (err.errors) {
for (const e of err.errors) {
const reason = e?.reason?.toString().toLowerCase();
if (
(reason && reason.includes('eai_again')) || //DNS lookup error
reason === 'econnreset' ||
reason === 'unexpected connection closure'
) {
return true;
}
}
}
}
return false;
};

The custom function you've passed in here returns true for every type of response, so it will keep attempting to retry the request. Since 200 is a success response, it should never be retried. Is there a reason you're passing in a custom function instead of using the default?

@benogle
Copy link
Author

benogle commented Apr 8, 2022

Since 200 is a success response, it should never be retried.

Right, this is the issue. It feels like the retry function shouldn't be called at all for success responses, and it isn't when resumable: false, which added to the confusion when we started seeing these in the logs. Everything around this indicates it will be called on errors only, the function name, function arg type, etc.

Is there a reason you're passing in a custom function instead of using the default?

Basically to track when it is retrying, what errors it's retrying, and to ensure it did actually retry if the upload results in an error.

For more context, Google storage has been pretty unreliable on upload for us lately. We've been getting a number of 502s and 408s that come out as promise rejections / errors. We had the default retry configuration, but it was unclear if the lib was actually retrying the uploads. After digging into the code, it looks like you possibly need to specify a generation-related param or two to get the lib to actually retry with the default IdempotencyStrategy.RetryConditional. We weren't totally clear how that worked, and it's tough to repeatably test this against a real failure, so we set the strategy to RetryAlways and added the function to log retries for traceability.

@shaffeeullah
Copy link
Contributor

I understand that this can be confusing. Thank you for calling it out and explaining a little bit. I've opened a PR that will prevent an accidental retry of a 200 response.

As for IdempotencyStrategy, the default behavior prevents data corruption by only retrying operations that are idempotent. An upload without a generation number is not idempotent. Retrying this operation could result in data corruption. For more information on preconditions, see the request preconditions documentation page. For more information on which operations are idempotent, see the retry strategy documentation page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/nodejs-storage API. priority: p2 Moderately-important priority. Fix may not be included in next release. type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants