
fix: don't download 100gb onto local python machine in load test #537

Merged 2 commits into main from b329261457-100gb-load-test on Mar 28, 2024

Conversation

milkshakeiii (Contributor) commented:

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@milkshakeiii requested review from a team as code owners on March 27, 2024 at 22:57
@milkshakeiii requested a review from GarrettWu on March 27, 2024 at 22:57
@product-auto-label bot added the labels size: xs (Pull request size is extra small) and api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API) on Mar 27, 2024
test_large_tables.py

 def test_to_pandas_batches_large_table():
-    df = bpd.read_gbq("load_testing.scalars_100gb")
+    df = bpd.read_gbq("load_testing.scalars_1gb")
Contributor:

Can we take a middle value, say 20 GB, to still represent a large table? I suspect even pandas may be able to handle 1 GB.

Contributor Author:

I think a middle value would probably work, but I don't think we're really testing anything here other than the download speed of the local machine. Even if there were an error in read_gbq, we wouldn't catch it; the middle layers are the BigQuery Python library and then Bigframes. I don't quite see what load-related errors we'd be catching. 1 GB is enough to test the round trip; more seems like just testing downloading. What do you think?

Contributor:

My understanding is that it's really the load we're testing: the more, the better. I agree it depends on the size of the VM running the test, but we should test as large as we can.

Contributor Author:

Hm, alright. The available middle ground is 10 GB, so I'll switch to that. We also have other tests in this file that cover 1 TB and the like, so we should be all set with this change, I think.
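If the 10 GB middle ground was adopted as discussed, the read would presumably change to something like the line below; the exact table ID is an assumption based on the comment above, not taken from the merged code.

# Hypothetical table ID for the agreed 10 GB middle ground; the merged
# change may name the table differently.
df = bpd.read_gbq("load_testing.scalars_10gb")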

@tswast merged commit 082c58b into main on Mar 28, 2024
16 checks passed
@tswast deleted the b329261457-100gb-load-test branch on March 28, 2024 at 14:08
ashleyxuu pushed a commit that referenced this pull request Mar 28, 2024
* fix: don't download 100gb onto local python machine in load test

* Update test_large_tables.py
Genesis929 pushed a commit that referenced this pull request Apr 9, 2024
* fix: don't download 100gb onto local python machine in load test

* Update test_large_tables.py