-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add default timeout for Client.get_job() #1935
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add some tests where *Job.reload(timeout=None)
, *Job.done(timeout=None)
, and *Job.result(timeout=None)
disable the timeouts for jobs.get
.
google/cloud/bigquery/job/base.py
Outdated
extra_params["location"] = self.location | ||
span_attributes = {"path": self.path} | ||
kwargs: Dict[str, Any] = {} | ||
if type(timeout) is object: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add some comments explaining why we need to omit the timeout if we get _DEFAULT_VALUE
. (Because we want to use the default API-level timeouts but wait indefinitely for the job to finish)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added explanation why kwargs
is used here, but I feel like Queryjob.result()
is a better place to explain the default timeout behavior, so I added it in docstring there instead.
google/cloud/bigquery/job/base.py
Outdated
@@ -913,7 +922,7 @@ def _set_future_result(self): | |||
def done( | |||
self, | |||
retry: "retries.Retry" = DEFAULT_RETRY, | |||
timeout: Optional[float] = None, | |||
timeout: Optional[Union[float, object]] = PollingFuture._DEFAULT_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done isn't waiting for the job to complete, so I wonder if we actually need PollingFuture._DEFAULT_VALUE
here?
It seems more analogous to Client.get_job
so maybe we can use the same DEFAULT_GET_JOB_TIMEOUT,
here? 🤔 We need to double-check that we aren't calling any other API requests like getQueryResults, if so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done()
calls reload()
, which in turn calls Client.get_job()
. So I added the sentinel values along the path result()
-> done()
-> reload()
. Indeed I realize we don't have to add sentinel for done()
or reload()
, because it will be passed from result()
anyway. As to getQueryResults, I don't think it's called from done()
onward (it's used in result()
). I think either way (using PollingFuture._DEFAULT_VALUE
or DEFAULT_GET_JOB_TIMEOUT
) it's effectively the same, so I think either way it's fine. Are there any other factors I'm not aware of?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep our type annotations as narrow as we can. My worry with PollingFuture._DEFAULT_VALUE
is it opens it up to object
, which technically is anything (string, bytes, anything) in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I ended up only using sentinel for result()
, and used DEFAULT_GET_JOB_TIMEOUT
for subsequent calls.
google/cloud/bigquery/job/query.py
Outdated
if type(timeout) is object: | ||
timeout = None | ||
get_job_timeout = PollingFuture._DEFAULT_VALUE | ||
else: | ||
get_job_timeout = timeout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not do the kwargs trick here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it's necessary - the purpose here is slightly different from when calling client.get_job()
. I'm only preserving the sentinel value for the job.done()
call, and use None as default for any other calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, following later comments
google/cloud/bigquery/job/base.py
Outdated
@@ -801,7 +801,7 @@ def reload( | |||
self, | |||
client=None, | |||
retry: "retries.Retry" = DEFAULT_RETRY, | |||
timeout: Optional[float] = None, | |||
timeout: Optional[Union[float, object]] = POLLING_DEFAULT_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd much rather this defaulted to the same timeout as get_job()
and anywhere that calls reload()
, we make sure we do the check for POLLING_DEFAULT_VALUE
before calling reload()
to make sure we only pass in float
or None
or omit the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
google/cloud/bigquery/job/base.py
Outdated
@@ -913,7 +924,7 @@ def _set_future_result(self): | |||
def done( | |||
self, | |||
retry: "retries.Retry" = DEFAULT_RETRY, | |||
timeout: Optional[float] = None, | |||
timeout: Optional[Union[float, object]] = POLLING_DEFAULT_VALUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. I don't think this needs to handle POLLING_DEFAULT_VALUE
since it's not waiting for the job to finish, just making at most 1 API call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
google/cloud/bigquery/job/query.py
Outdated
# timeout to QueryJob.done(), and use None for the other timeouts. | ||
if type(timeout) is object: | ||
timeout = None | ||
get_job_timeout = POLLING_DEFAULT_VALUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not set to DEFAULT_GET_JOB_TIMEOUT
or do the kwargs trick to omit it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Co-authored-by: Tim Sweña (Swast) <[email protected]>
Co-authored-by: Tim Sweña (Swast) <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it! Just a couple of typos to fix please before merging
When calling
QueryJob.result()
,QueryJob.reload()
is called in the background with an empty timeout value. This leadsQueryJob
to waiting indefinitely ifQueryJob.reload()
loses connection.This PR:
QueryJob.reload()
to callClient.get_job()
instead of calling API directlyClient.get_job()
Fixes #1922 🦕