-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BigQuery: add --max_results
option to magic
#9169
BigQuery: add --max_results
option to magic
#9169
Conversation
I just merged #9167. We might have to rebase. |
All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter. We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the ℹ️ Googlers: Go here for more info. |
23fb140
to
37b4d39
Compare
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd actually love to see a system / integration test for this feature, but I know we don't have any examples for you to work off of for that. (I'll see what I can do about that, since I know this will come up again in the future.)
Since we don't have a system for notebook integration tests in this repo, I'd like to see the results of manually testing this feature in a notebook.
@@ -300,7 +300,7 @@ def _run_query(client, query, job_config=None): | |||
while True: | |||
print("\rQuery executing: {:0.2f}s".format(time.time() - start_time), end="") | |||
try: | |||
query_job.result(timeout=0.5) | |||
query_job.result(timeout=0.5, max_results=max_results) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we aren't actually returning the results in this line (just waiting for the query to finish), we don't need to pass max_results
here. I believe that means we can remove the max_results
parameter from _run_query
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in f88ffe8
"Defaults to returning all rows." | ||
), | ||
) | ||
@magic_arguments.argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like these arguments got doubled up. (Maybe the two commits thing, we noticed when rebasing?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably what happened (moral of the story, always run tests before pushing). Removed the duplicate in 147e43d
@@ -414,7 +414,7 @@ def test_bigquery_magic_with_legacy_sql(): | |||
with run_query_patch as run_query_mock: | |||
ip.run_cell_magic("bigquery", "--use_legacy_sql", "SELECT 17 AS num") | |||
|
|||
job_config_used = run_query_mock.call_args_list[0][0][-1] | |||
job_config_used = run_query_mock.call_args_list[0][1]["job_config"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this change intentional? Maybe a bad rebase?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was intentional. It was to fix some test failures I was getting because the argument that was being retrieved was the wrong type (a result of changing the method signature for _run_query
). This way the arg being accessed will always be job_config
, even if other named parameters are added. Since I reverted the changes to _run_query
, I can probably change this back too. I think the tests will pass either way
@@ -433,7 +455,9 @@ def _cell_magic(line, query): | |||
|
|||
error = None | |||
try: | |||
query_job = _run_query(client, query, job_config) | |||
query_job = _run_query( | |||
client, query, job_config=job_config, max_results=max_results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_results
is not actually needed here, it's not until the call to to_dataframe
that we need max_results
.
Note: to_dataframe
probably doesn't have a max_results
argument, and I'm actually not certain that we'd want to add one. Instead, we can call query_job.results
with a max_results
argument and then call to_dataframe
on the resulting RowIterator
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated to call to_dataframe
on query_job.result
, passing max_results
as an argument if max_results
is present in f88ffe8
* added max_results magic option and fixed broken tests * added tests for --max_results magic option * added max_results magic option and fixed broken tests * added tests for --max_results magic option * Removed duplicate `--max_results` magic argument * removed max_results param from run_query, updated tests
* added max_results magic option and fixed broken tests * added tests for --max_results magic option * added max_results magic option and fixed broken tests * added tests for --max_results magic option * Removed duplicate `--max_results` magic argument * removed max_results param from run_query, updated tests
Second of 3 PRs towards resolving #9105 as described in review for #9147