QueryJob.result() not identifying and returning after submitted query completes #1922

Closed
dlstadther opened this issue May 16, 2024 · 7 comments · Fixed by #1935
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@dlstadther

(this issue was first taken to a Google employee; they recommended this bug report be submitted here too)

Expected Behavior

QueryJob.result() always returns when a submitted query completes

Issue

At seemingly random intervals, QueryJob.result() blocks indefinitely, even after the submitted job_id is shown as completed in BigQuery's Job History.

Anecdotal observation which motivated this outreach:

  • On 2024-04-29 17:16:20 UTC, job_id "A" was submitted to BigQuery.
  • On 2024-05-06, we identified the hung Python process and observed through its logs that it was still waiting for job.result() to return.
  • In the BigQuery console, the Job History for "A" shows that the query was submitted on 2024-04-29 at 17:16:20 UTC and completed ~2 minutes later on 2024-04-29 at 17:18:26 UTC.
  • The python process executing job.result() required manual termination over 6 days later on 2024-05-06.

We are implementing process-level timeouts to prevent this specific issue from blocking indefinitely, but this is a band-aid workaround for a bug in the google-cloud-bigquery Python package.
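
A process-level timeout of the kind described above could look roughly like the following. This is a minimal sketch, not the reporter's actual implementation: `run_with_deadline` is a hypothetical helper, and it assumes the code that calls job.result() lives in a separate worker command that can be killed if it never returns.

```python
import subprocess
import sys


def run_with_deadline(argv, deadline_seconds):
    """Run a command in a child process; return True if it exited in time.

    subprocess.run kills the child and raises TimeoutExpired once the
    deadline passes, so a hung worker cannot block the caller forever.
    A non-zero exit status raises CalledProcessError (check=True).
    """
    try:
        subprocess.run(argv, timeout=deadline_seconds, check=True)
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical worker: here just a child interpreter that sleeps briefly.
    worker = [sys.executable, "-c", "import time; time.sleep(0.1)"]
    print(run_with_deadline(worker, deadline_seconds=30))
```

This only bounds how long the caller waits; it does not explain or fix the underlying hang.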

Environment details

  • OS type and version: Debian Bookworm
  • Python version: 3.11.4
  • pip version: 24.0
  • google-cloud-bigquery version: 3.21.0

Steps to reproduce

We are unable to deterministically reproduce this issue.

Code example

Simplified code example depicting the objects and methods used:

import uuid
from google.cloud import bigquery as bq

project_id = "dummy-project"
client = bq.Client(project=project_id)

job_id = str(uuid.uuid4())
sql = "select 1;"  # fake query

job_config = bq.QueryJobConfig(priority=bq.enums.QueryPriority.BATCH)
job = client.query(
    sql,
    job_config=job_config,
    api_method=bq.enums.QueryApiMethod.INSERT,  # this is necessary to specify the job_id
    job_id=job_id
)
print(f"Submitting query: {job_id}")
result = job.result()  # run job and wait

# do other things with the result once completed
# sometimes this code is never reached, nor any error raised
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label May 16, 2024
@dlstadther
Author

Perhaps googleapis/google-cloud-python#7831 is related?

@Linchin Linchin added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p3 Desirable enhancement or fix. May not be included in next release. labels May 21, 2024
@Linchin
Contributor

Linchin commented May 21, 2024

Thank you @dlstadther for raising the issue! I will try to reproduce it by running the program on repeat, but it seems really hard to reproduce deterministically. Do you have any local log info from when the program got stuck? That would make it easier to pinpoint what went wrong.

@tswast
Contributor

tswast commented May 21, 2024

Definitely sounds like googleapis/google-cloud-python#7831 (comment)

At one time we had some default client-side timeouts to make the client more resilient to this sort of thing, but it's really difficult to pick a default that works for all APIs in BigQuery. Maybe a default timeout just for jobs.get could solve this one?
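
The idea of a per-request timeout can be illustrated without the BigQuery client. The following is a minimal sketch under stated assumptions: `poll_until_done` and `fetch_state` are hypothetical names, not part of google-cloud-bigquery. `fetch_state(timeout=...)` stands in for a single status request such as jobs.get and is assumed to raise `TimeoutError` when no response arrives in time.

```python
import time


def poll_until_done(fetch_state, request_timeout, poll_interval=1.0):
    """Poll until fetch_state reports "DONE"; return the number of requests made.

    Each request is bounded by request_timeout, so a single stalled
    request is retried instead of blocking forever. There is no overall
    deadline: a long-running job is still waited on for as long as it takes.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            state = fetch_state(timeout=request_timeout)
        except TimeoutError:
            # The request stalled; issue a fresh one.
            continue
        if state == "DONE":
            return attempts
        time.sleep(poll_interval)
```

The point of the design is that the timeout applies to each request rather than to the job, so it can have a default without limiting how long a query may run.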

@Linchin Linchin assigned Linchin and unassigned farhan0102 May 21, 2024
@dlstadther
Author

@Linchin, we have application info-level logging, but nothing specific to the BigQuery client. A log statement is emitted just before job.result() (telling us the custom job_id that will be submitted), and we can see in the BigQuery console that the job_id exists and the query completed, but the application log statement immediately following job.result() is never emitted.

I don't have hard numbers easily accessible regarding frequency or percentage of occurrence, but it is seemingly rare.


@tswast, is there anything we can do to see your suggestion implemented in an upcoming release version?

Would your proposal of a default client-side timeout behave similarly to an HTTP request timeout (how long to wait for a response) regardless of query completion state, or like the user specifying job.result(timeout=...), where the query result is expected to be completed within a duration? The former would be preferred for a general default.

@tswast
Contributor

tswast commented May 23, 2024

> Would your proposal of a default client-side timeout behave similarly to an HTTP request timeout (how long to wait for a response) regardless of query completion state, or like the user specifying job.result(timeout=...), where the query result is expected to be completed within a duration? The former would be preferred for a general default.

My proposal is for an HTTP request timeout.

Query jobs could last several days if they are multi-job scripts or BQML jobs, so it wouldn't make sense to me to set a default there.

That said, if you do know your query will complete in a certain amount of time, we do turn the overall timeout into an HTTP request timeout, so it would prevent things from getting stuck if you have an idea of how long the query should take.
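
For callers who do know an upper bound on query duration, passing `timeout` to `result()` might look like the sketch below. `result_or_none` is a hypothetical helper; `job` is assumed to be a `QueryJob`, or any object whose `result(timeout=...)` raises `concurrent.futures.TimeoutError` when the deadline passes, as `QueryJob.result` is documented to do.

```python
import concurrent.futures


def result_or_none(job, timeout_seconds):
    """Wait up to timeout_seconds for job to finish; return None on timeout."""
    try:
        return job.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # The client stopped waiting; the query may still be running server-side.
        return None
```

A timeout here only bounds the wait on the client side. If the query should also stop running, it would need to be cancelled separately, for example with `job.cancel()`.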

@tswast
Contributor

tswast commented May 23, 2024

> is there anything we can do to see your suggestion implemented in an upcoming release version?

I'm actively working with my teammates to get this implemented.

@dlstadther
Author

Thank you @tswast and @Linchin for promptly working to address this issue and already releasing a new public version which includes the fix!

We will be upgrading our environments and monitoring to ensure this issue is gone. Thanks!
