allow customization of an operation's polling interval #13

stephenplusplus · 2017-11-17T18:21:39Z

From @c0b on October 31, 2017 17:48

from the bigquery job api I was only aware the complete event to listen to a job with a callback when job completed, till recently I found from some gist shared code I got that a job.promise() is available, since our application uses node v6 and recently upgraded to v8; the promise api fits the code better, and works with async await model; wonder should you at least document it?
https://googlecloudplatform.github.io/google-cloud-node/#/docs/bigquery/0.9.6/bigquery/job

On the other hand, I spent some time figured out how was this default job.promise() working, I found the call trace down to the Operation's setTimeout self.startPolling of every 500ms, so it's polling at a hard coded interval of 500ms? while in many gcloud products best practices a backing off strategry of retrying is preferred,
https://github.com/GoogleCloudPlatform/google-cloud-node/blob/master/packages/common/src/operation.js#L184

this behavior of polling 500ms may be acceptable (or wanted) for some cases, for our ETL scripts which runs hundreds of query jobs concurrently in BATCH mode is just not so efficient, for this ETL purpose I have a piece of code already in use in production for a long while, implemented the backing off strategry, it supports an optional options obj parameters of waitbegin (default to 500ms) and waitmax (default to 10s)

// looping on a bigquery job till it's 'DONE' or error
//   using a backing off strategry, waiting starts with 500ms,
//     then increase by half till the max is 10s
function waitJobFinish(job, {waitbegin=500, waitmax=10000, initialState='UNKNOWN'} = {}) {
  return new Promise((fulfilled, rejected) =>
    function loop(retries=0, wait=waitbegin, state=initialState) {
      job.getMetadata()
        .catch(rejected)
        .then(([ metadata, apiResponse ]) => {
          if (metadata.status.state !== state) {
            console.log(`Job ${metadata.id} state transit from ${state} to ${metadata.status.state}, at ${(new Date).toJSON()} after ${retries} retries check job status.`);
            state = metadata.status.state;
          }

          if (metadata.status.errorResult)
            return rejected(metadata.status.errorResult);

          if (metadata.status.state === 'DONE')
            return fulfilled([ metadata, apiResponse, retries ]);

          setTimeout(loop, wait, retries+1, Math.min(waitmax, (wait+=wait/2)), state);
        });
    }() // (0, waitbegin, 'UNKNOWN')
  );
}

so with this API, it's similar to job.promise() we can write code like this, but internally it's doing a backing off strategy of retrying retrieve metadata;

  bigquery.startQuery({
    query: '...',
    // more options
  })
  .then(([ job ]) => waitJobFinish(job))
  .then(([ metadata, apiResponse, retries ]) => { ... })

or with async await

  // in an async function
  const [ job ] = await bigquery.startQuery(...);
  const [ metadata, apiResponse, retries ] = await waitJobFinish(job);
  // ...

the console.log lines give us transparency of how healthy each job runs, state transition from 'PENDING' to 'RUNNING' to 'DONE'

I'm not sure this strategy can be in the Operation for all the @google-cloud/... packages, but at least works for bigquery job; let me know if you like the code.

Copied from original issue: googleapis/google-cloud-node#2710

The text was updated successfully, but these errors were encountered:

stephenplusplus · 2017-11-17T18:21:40Z

From @callmehiphop on November 1, 2017 17:54

@stephenplusplus are there any reasons why we hardcode the polling interval and/or don't use a back off in operations?

stephenplusplus · 2017-11-17T18:21:40Z

I think that makes sense, although I can't find any official recommendations from the Google API docs. Do you know of any @c0b, or best practices in general for what should be our default?

Related: This was brought up in #1886, and discarded because GAPIC APIs already do backoff and allow configuration of retry parameters. Maybe we should look at what they do and copy it here.

stephenplusplus · 2017-11-17T18:21:40Z

From @c0b on November 3, 2017 17:19

I did a search "best practices backing off strategy site:cloud.google.com" got this doc on storage, and indeed saw more places (like errors trouble shooting guide) mentioned the exponential backoff strategy

https://cloud.google.com/storage/docs/exponential-backoff

If you get an error, use exponential backoff to avoid problems due to large traffic bursts.

another reason we need this is the bigquery API requests limit is 100/s isn't that big, my client side ETL script is constantly doing some hundreds of BATCH mode query, that could fill up the 100/s very soon get rateLimitError if use the job.promise()

https://cloud.google.com/bigquery/quotas

API requests per second, per user: If you make more than 100 requests per second, throttling might occur.

stephenplusplus · 2017-11-17T18:21:41Z

From @c0b on November 3, 2017 17:31

the (wait+=wait/2) isn't exactly exponential backoff described in the storage/docs/exponential-backoff link above, it is (wait*=1.5)
but easy to change to Math.min(maxwait, wait*=2)

stephenplusplus · 2017-11-17T18:21:41Z

The only thing I'm stuck on is we are talking about polling as opposed to error handling. We aren't retrying in the traditional sense, where the first request failed, so we must try again. We are only making a follow-up request because the job didn't complete yet. That's why I'm drawn to the constant interval, although I'm completely fine going with another value than 500ms, and/or allowing it to be configurable.

I'm also fine with introducing the backoff strategy that GAPIC uses, since it's probably safe to assume if they're doing it, it's the right thing for us to do. These are the options (source):

Name	Type	Description
initialRetryDelayMillis	number	the initial delay time, in milliseconds, between the completion of the first failed request and the initiation of the first retrying request.
retryDelayMultiplier	number	the multiplier by which to increase the delay time between the completion of failed requests, and the initiation of the subsequent retrying request.
maxRetryDelayMillis	number	the maximum delay time, in milliseconds, between requests. When this value is reached, retryDelayMultiplier will no longer be used to increase delay time.
initialRpcTimeoutMillis	number	the initial timeout parameter to the request.
rpcTimeoutMultiplier	number	the multiplier by which to increase the timeout parameter between failed requests.
maxRpcTimeoutMillis	number	the maximum timeout parameter, in milliseconds, for a request. When this value is reached, rpcTimeoutMultiplier will no longer be used to increase the timeout.
totalTimeoutMillis	number	the total time, in milliseconds, starting from when the initial request is sent, after which an error will be returned, regardless of the retrying attempts made meanwhile.

These are the default settings for longrunning operations (source):

{
  initialRetryDelayMillis: 100,
  retryDelayMultiplier: 1.3,
  maxRetryDelayMillis: 60000,
  initialRpcTimeoutMillis: null,
  rpcTimeoutMultiplier: null,
  maxRpcTimeoutMillis: null,
  totalTimeoutMillis: null
}

@callmehiphop WDYT?

stephenplusplus · 2017-11-17T18:21:41Z

From @callmehiphop on November 3, 2017 18:37

I think that's fine, should we make a setter of some kind to allow users to configure these options?

operation
  .setPollingOptions({ retryDelayMultiplier: 2 })
  .on('error', console.errror)
  .on('complete', () => console.log('yay'));

?

stephenplusplus · 2017-11-17T18:21:42Z

That sounds nice, how would we handle the methods that handle job polling internally, e.g. https://github.com/GoogleCloudPlatform/google-cloud-node/blob/7160d17a3b3c0f3af4bff900cc9ac64831564498/packages/bigquery/src/table.js#L377? Accept a new option, like metadata.backoffSettings?

callmehiphop · 2018-04-03T22:40:20Z

This got lost in my notifications, but I guess we could either make it an option that is sent in upfront or we could direct users that need a finer level of control to use the lower level methods.

meredithslota · 2022-04-11T21:02:20Z

@steffnay I haven't seen any users requesting this recently — do you think this is still necessary or should we just close it out?

aminebeh · 2022-10-05T09:51:08Z

Hey I'm looking for a feature like this !

In addition, it would be great if we can add an event listener on the 'poll' event (to track and report progression)

stephenplusplus added api: bigquery type: question Request for information or clarification. Not an issue. labels Nov 17, 2017

stephenplusplus mentioned this issue Nov 17, 2017

bigquery job.promise() is not documented? and always polling at a fixed interval of 500ms? googleapis/google-cloud-node#2710

Closed

stephenplusplus removed the api: bigquery label Dec 12, 2017

stephenplusplus added type: enhancement and removed type: question Request for information or clarification. Not an issue. labels Oct 1, 2018

stephenplusplus changed the title ~~bigquery job.promise() is not documented? and always polling at a fixed interval of 500ms?~~ allow customization of an operation's polling interval Oct 1, 2018

JustinBeckwith added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed type: enhancement labels Dec 22, 2018

google-cloud-label-sync bot added the api: bigquery Issues related to the googleapis/nodejs-bigquery API. label Jan 30, 2020

meredithslota assigned steffnay Apr 11, 2022

steffnay removed their assignment Jul 8, 2022

loferris self-assigned this Aug 23, 2023

shollyman unassigned loferris Jan 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allow customization of an operation's polling interval #13

allow customization of an operation's polling interval #13

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

callmehiphop commented Apr 3, 2018

meredithslota commented Apr 11, 2022

aminebeh commented Oct 5, 2022

allow customization of an operation's polling interval #13

allow customization of an operation's polling interval #13

Comments

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

stephenplusplus commented Nov 17, 2017

callmehiphop commented Apr 3, 2018

meredithslota commented Apr 11, 2022

aminebeh commented Oct 5, 2022