-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: use the first page a results when query(api_method="QUERY")
#1723
Conversation
…_is_completely_cached
I'm not sure I understand. Could you explain why doing this will improve query performance, and only applies when |
Great question @Linchin. There are two APIs for issuing a query in BigQuery: jobs.insert and jobs.query. There are some key differences between these APIs.
This PR does a few things: (1) if the job has finished, don't make the unnecessary call to |
…_is_completely_cached
# This also requires updates to `to_dataframe` and the DB API connector | ||
# so that they don't try to read from a destination table if all the | ||
# results are present. | ||
query_job._query_results = google.cloud.bigquery.query._QueryResults( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change mean we also load the query results when job is complete, when query(api_method="INSERT")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I just double-checked that this method is only called from query_jobs_query
, so it won't affect when api_method="INSERT"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for helping me understand this PR :)
…oogleapis#1723) * perf: use the first page a results when `query(api_method="QUERY")` * add tests * respect max_results with cached page * respect page_size, also avoid bqstorage if almost fully downloaded * skip true test if bqstorage not installed * coverage
…nto pandas extra (#1726) * feat: Introduce compatibility with native namespace packages * Update copyright year * removed pkg_resources from all test files and moved importlib into pandas extra * feat: removed pkg_resources from all test files and moved importlib into pandas extra * Adding no cover tag to test code * reformatted with black * undo revert * perf: use the first page a results when `query(api_method="QUERY")` (#1723) * perf: use the first page a results when `query(api_method="QUERY")` * add tests * respect max_results with cached page * respect page_size, also avoid bqstorage if almost fully downloaded * skip true test if bqstorage not installed * coverage * fix: ensure query job retry has longer deadline than API request deadline (#1734) In cases where we can't disambiguate API failure from job failure, this ensures we can still retry the job at least once. * fix: `load_table_from_dataframe` now assumes there may be local null values (#1735) Even if the remote schema is REQUIRED Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly: - [ ] Make sure to open an issue as a [bug/issue](https://togithub.com/googleapis/python-bigquery/issues/new/choose) before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea - [ ] Ensure the tests and linter pass - [ ] Code coverage does not decrease (if any source code was changed) - [ ] Appropriate docs were updated (if necessary) Fixes #1692 🦕 * chore: standardize samples directory - delete unneeded dependencies (#1732) * chore: standardize samples directory = delete unneeded dependencies * Removed unused import for linter * fix: move grpc, proto-plus and protobuf packages to extras (#1721) * chore: move grpc, proto-plus and protobuff packages to extras * formatted with black * feat: add `job_timeout_ms` to job configuration classes (#1675) * fix: adds new property and tests * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * updates docs to correct a sphinx failure * Updates formatting * Update tests/system/test_query.py * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * Update google/cloud/bigquery/job/base.py * updates one test and uses int_or_none * Update tests/system/test_query.py testing something. * Update tests/system/test_query.py * testing coverage feature * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * minor edits * tweaks to noxfile for testing purposes * add new test to base as experiment * adds a test, updates import statements * add another test * edit to tests * formatting fixes * update noxfile to correct debug code * removes unneeded comments. --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> --------- Co-authored-by: Chalmer Lowe <[email protected]> Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com> Co-authored-by: Tim Swast <[email protected]> * remove unnecessary version checks * undo bad commit, remove unneeded version checks * Revert "undo bad commit, remove unneeded version checks" This reverts commit 5c82dcf. * Revert "remove unnecessary version checks" This reverts commit 9331a7e. * revert bad changes, remove pkg_resources from file * after clarification, reimplement changes and ignore 3.12 tests * reformatted with black * removed minimum check * updated pandas installed version check --------- Co-authored-by: Anthonios Partheniou <[email protected]> Co-authored-by: Tim Swast <[email protected]> Co-authored-by: Chalmer Lowe <[email protected]> Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
This is a restoration of part of #374
TODO:
jobs.getQueryResults
orjobs.get
calls happen when iterating over rows that are fully returned fromjobs.query
.to_pandas
andto_arrow
work with this code path.Closes #589 🦕