
fix: don't download 100gb onto local python machine in load test #537

Merged 2 commits into main from b329261457-100gb-load-test on Mar 28, 2024

Conversation

milkshakeiii (Contributor) commented:

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@milkshakeiii requested review from a team as code owners on March 27, 2024 at 22:57
@milkshakeiii requested a review from GarrettWu on March 27, 2024 at 22:57
@product-auto-label bot added the labels size: xs (Pull request size is extra small) and api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API) on Mar 27, 2024
test_large_tables.py

 def test_to_pandas_batches_large_table():
-    df = bpd.read_gbq("load_testing.scalars_100gb")
+    df = bpd.read_gbq("load_testing.scalars_1gb")
Contributor:

Can we take a middle value, say 20 GB, to still represent a large table? I suspect even pandas may be able to handle 1 GB.

Contributor Author:

I think a middle value would probably work, but I don't think we're really testing anything here other than the download speed of the local machine. Even if there were an error in read_gbq, we wouldn't catch it; the middle layers are the BigQuery Python library and then Bigframes. I don't quite see what load-related errors we'd be catching. 1 GB is enough to test the round trip; more seems like just testing downloading. What do you think?

Contributor:

My understanding is that it's really the load we're testing: the more, the better. I agree it depends on the size of the VM running the test, but we should test as large as we can.

Contributor Author:

Hm, alright. The available middle ground is 10 GB, so I'll switch to that. We also have other tests in this file that cover 1 TB and the like, so we should be all set with this change, I think.
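If the 10 GB middle ground was adopted as discussed, the read would presumably change to something like the line below; the exact table ID is an assumption based on the comment above, not taken from the merged code.

# Hypothetical table ID for the agreed 10 GB middle ground; the merged
# change may name the table differently.
df = bpd.read_gbq("load_testing.scalars_10gb")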

@tswast merged commit 082c58b into main on Mar 28, 2024
16 checks passed
@tswast deleted the b329261457-100gb-load-test branch on March 28, 2024 at 14:08
ashleyxuu pushed a commit that referenced this pull request Mar 28, 2024
* fix: don't download 100gb onto local python machine in load test

* Update test_large_tables.py
Genesis929 pushed a commit that referenced this pull request Apr 9, 2024
* fix: don't download 100gb onto local python machine in load test

* Update test_large_tables.py