feat: make QueryJob.done() method more performant #544

Merged 1 commit, Mar 10, 2021


@plamut plamut commented Mar 9, 2021

Fixes #534.

This PR removes the query results refresh from the QueryJob.done() method. The method is now simply the done() method inherited from the _AsyncJob base class, which at most reloads the job itself and checks whether its state is DONE.

Since the blocking poll from the PollingFuture base class repeatedly invokes done(), this change alone would cause too many job reload requests while waiting for the query results. The QueryJob class thus overrides the _done_or_raise() method, which the blocking poll calls repeatedly, so that the polling is actually performed by fetching the query results. That call can block for up to 10 seconds, meaning that fewer polling requests are made than if reloading the job were used.
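The trade-off described above can be illustrated with a small simulation. This is only a sketch: FakeBackend, poll_by_reload and poll_by_results are hypothetical names and are not part of google-cloud-bigquery, time is simulated rather than real, and the 1-second pause between job reloads is an assumed value (the real polling interval may differ). Only the 10-second blocking limit comes from the description above.

```python
class FakeBackend:
    """Simulated query job that finishes after `duration` seconds (hypothetical)."""

    def __init__(self, duration):
        self.duration = duration
        self.clock = 0.0   # simulated time, in seconds
        self.requests = 0  # number of simulated API requests made so far

    def reload_job(self):
        """Simulate a job reload: returns at once with the current DONE state."""
        self.requests += 1
        return self.clock >= self.duration

    def get_query_results(self, timeout=10.0):
        """Simulate a results request that blocks until done or until `timeout`."""
        self.requests += 1
        remaining = max(self.duration - self.clock, 0.0)
        self.clock += min(remaining, timeout)
        return self.clock >= self.duration


def poll_by_reload(backend, interval=1.0):
    """Poll via job reloads, waiting `interval` simulated seconds between them.

    The interval is an assumption made for this sketch.
    """
    while not backend.reload_job():
        backend.clock += interval
    return backend.requests


def poll_by_results(backend):
    """Poll via blocking results requests, each waiting up to 10 seconds."""
    while not backend.get_query_results():
        pass
    return backend.requests


# Compare both strategies for a simulated 35-second query.
reload_requests = poll_by_reload(FakeBackend(35))
results_requests = poll_by_results(FakeBackend(35))
print(f"reload polling: {reload_requests} requests")
print(f"results polling: {results_requests} requests")
```

Under these assumed numbers, a 35-second query costs 36 reload requests but only 4 results requests, which is the reason the blocking poll keeps fetching query results even though done() no longer does.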

How to test

Set logging level to DEBUG to see what HTTP requests are made. Then run a query job that normally takes more than 10 seconds to complete, for example:

SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id AS STRING)) AS url,
  view_count, 1 AS foo
FROM `bigquery-public-data.stackoverflow.posts_questions`
ORDER BY view_count DESC

Tip: If running the query in multiple test runs in a row, change 1 AS foo to a different value so that the query is re-run and cached query results are not used.
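The setup above might be scripted roughly as follows. This is a sketch under stated assumptions: build_test_query is a hypothetical helper, not part of the library, and raising the "urllib3" logger to DEBUG assumes the client's HTTP traffic goes through requests/urllib3.

```python
import logging

# Root logger at DEBUG, so libraries that log their HTTP traffic will show it.
logging.basicConfig(level=logging.DEBUG)
# Assumption: the HTTP transport is requests/urllib3, whose connection pool
# logs each request at DEBUG level.
logging.getLogger("urllib3").setLevel(logging.DEBUG)


def build_test_query(foo):
    """Return the test query with `foo` as the constant column value.

    Changing `foo` between runs changes the query text, so cached results
    from a previous run are not reused. (Hypothetical helper.)
    """
    return f"""
SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id AS STRING)) AS url,
  view_count, {int(foo)} AS foo
FROM `bigquery-public-data.stackoverflow.posts_questions`
ORDER BY view_count DESC
"""


# Use a different value on each test run, e.g. 1, 2, 3, ...
sql = build_test_query(2)
print(sql)
```

The resulting string can then be passed to the query() method of a google.cloud.bigquery.Client instance to start the job.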

While the query is running, call query_job.result(). The query results should be fetched, but with a reasonable number of requests.

Besides testing the .result() method, the .done() method should be checked, too. If it is called repeatedly while the query is running, each call should finish "fast", i.e. it should block for significantly less than 10 seconds, because all it needs to do is reload the job data itself.
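One way to check the "fast" claim is to time a few consecutive calls. The helpers below are a hypothetical sketch, not part of the library, and the 1-second limit is an arbitrary threshold chosen here to stand for "significantly less than 10 seconds". A stand-in callable is used so the snippet runs without a BigQuery project.

```python
import time


def time_calls(fn, count=5, pause=0.0):
    """Call `fn` `count` times and return a (result, seconds) pair per call."""
    timings = []
    for _ in range(count):
        start = time.monotonic()
        result = fn()
        timings.append((result, time.monotonic() - start))
        time.sleep(pause)  # optional wait between calls
    return timings


def all_fast(timings, limit=1.0):
    """Return True if every call took less than `limit` seconds.

    The default limit is an assumption made for this sketch.
    """
    return all(elapsed < limit for _, elapsed in timings)


# Stand-in for query_job.done: returns False immediately ("still running").
timings = time_calls(lambda: False, count=3)
print(all_fast(timings))
```

With a real job, the stand-in would be replaced by the bound method, for example time_calls(query_job.done, pause=1.0), while the query is still running.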

PR checklist:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

@plamut added the "do not merge" label (Indicates a pull request not ready for merge, due to either quality or timing.) Mar 9, 2021
@google-cla bot added the "cla: yes" label (This human has signed the Contributor License Agreement.) Mar 9, 2021
@product-auto-label bot added the "api: bigquery" label (Issues related to the googleapis/python-bigquery API.) Mar 9, 2021
@plamut removed the "do not merge" label Mar 10, 2021
@plamut marked this pull request as ready for review March 10, 2021 14:09
@plamut requested review from a team, stephaniewang526 and tswast and removed request for a team March 10, 2021 14:09

@tswast tswast left a comment

Magnificent!

@tswast tswast merged commit a3ab9ef into googleapis:master Mar 10, 2021
@plamut plamut deleted the iss-534 branch March 10, 2021 20:52
This was referenced Mar 17, 2021
Successfully merging this pull request may close these issues.

Improve performance of QueryJob.done() method