fix: don't download 100gb onto local python machine in load test #537
Conversation
def test_to_pandas_batches_large_table():
-    df = bpd.read_gbq("load_testing.scalars_100gb")
+    df = bpd.read_gbq("load_testing.scalars_1gb")
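For context, a minimal sketch of what the surrounding test might look like after this change (assumptions: bigframes.pandas is imported as bpd, and the batch iteration and row-count assertion are illustrative, not copied from the repository):

```python
import bigframes.pandas as bpd

def test_to_pandas_batches_large_table():
    # Reading the table is lazy; no data is downloaded at this point.
    df = bpd.read_gbq("load_testing.scalars_1gb")

    # to_pandas_batches() streams results in chunks, so the local machine
    # never has to materialize the whole table in memory at once.
    row_count = 0
    for batch in df.to_pandas_batches():
        row_count += len(batch)

    assert row_count > 0
```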
Can we take a middle value, say 20 GB, to still represent a large table? I suspect even pandas may be able to handle 1 GB.
I think a middle value would probably work, but I don't think we're really testing much of anything here other than the download speed of the local machine. Even if there were an error in read_gbq, we wouldn't catch it; the middle layer is the BigQuery Python client library, and above that is BigFrames. I don't quite see what load-related errors we'd be catching. 1 GB is enough to test the round trip; anything more seems like just testing downloading. What do you think?
My understanding is that it's really about the load: the more, the better. I agree it depends on the size of the VM running the test, but we should test as large as we can.
Hm, alright. The available middle ground is 10 GB, so I'll switch to that. We also have other tests in this file that go up to 1 TB, so I think we're all set with this change.
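Per that resolution, the merged test would presumably read the 10 GB table instead; the exact table name below is inferred from the "10GB" comment above, not taken from the final diff:

```python
# Inferred final state; the table name load_testing.scalars_10gb is an assumption.
df = bpd.read_gbq("load_testing.scalars_10gb")
```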
Commits:
* fix: don't download 100gb onto local python machine in load test
* Update test_large_tables.py
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕