docs: add samples from bigquery_storage/to_dataframe #50

Merged
plamut merged 43 commits into googleapis:master from add-to_dataframe-samples-2
Sep 10, 2020

Conversation

plamut
Contributor

@plamut plamut commented Sep 2, 2020

Samples from https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/bigquery_storage/to_dataframe


IMPORTANT: When merging, use the rebase and merge option to preserve the samples commit history!


Closes #51.

tswast and others added 30 commits February 7, 2019 10:53
…ogleCloudPlatform/python-docs-samples#1994)

* BigQuery Storage API sample for reading pandas dataframe

How to get a pandas DataFrame, fast!

The first two examples use the existing BigQuery client. These examples
create a thread pool and read in parallel. The final example shows using
just the new BigQuery Storage client, but only shows how to read with a
single thread.
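
The contrast between reading in parallel and reading with a single thread can be sketched without any cloud dependency. This is a minimal illustration, not the sample's actual code: `read_stream`, `read_all_parallel`, and `read_all_single_thread` are hypothetical names, and plain Python lists stand in for the streams a real client would return.

```python
from concurrent.futures import ThreadPoolExecutor


def read_stream(stream):
    """Hypothetical stand-in for downloading all rows of one stream."""
    return list(stream)


def read_all_parallel(streams, max_workers=4):
    """Read every stream in a thread pool and return rows in stream order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so chunks line up with `streams`.
        chunks = list(pool.map(read_stream, streams))
    return [row for chunk in chunks for row in chunk]


def read_all_single_thread(streams):
    """Read the same streams one after another on the calling thread."""
    return [row for stream in streams for row in read_stream(stream)]
```

Both helpers return the same rows; only the download strategy differs, which is the trade-off the commit message describes.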
…gleCloudPlatform/python-docs-samples#2088)

* Remove temporary dataset from bqstorage pandas tutorial

As of google-cloud-bigquery version 1.11.1, the `to_dataframe` method
will fall back to the tabledata.list API when the BigQuery Storage API
fails to read the query results.

* Remove unused imports
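
The fallback behavior described in this commit message follows a common pattern: try the fast path and use the slower one if it fails. The sketch below only illustrates that pattern. `download_rows` and its two callables are hypothetical, and the library's real implementation may differ in which errors it catches and how it reports them.

```python
def download_rows(read_fast, read_fallback):
    """Try the fast reader first; use the slower reader if it raises.

    Both arguments are hypothetical zero-argument callables that return rows.
    Returns a (rows, source_name) tuple so the caller can see which path ran.
    """
    try:
        return read_fast(), "storage_api"
    except Exception:  # Assumption: a real client would catch a narrower error.
        return read_fallback(), "tabledata.list"
```

Because the caller receives results either way, a temporary dataset that existed only to keep the fast path working is no longer needed.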
…oogleCloudPlatform/python-docs-samples#2087)

* Add magics tutorial with BigQuery Storage API integration.

This is a notebooks tutorial, modeled after the Jupyter notebook example
code for BigQuery. Use some caution when running these tests, as they
run some large-ish (5 GB processed) queries and download about 500 MB
worth of data. This is intentional, as the BigQuery Storage API is most
useful for downloading large results.

* Update deps.

* Don't run big queries on Travis.
…ocs-samples#2436)

* Adds updates including compute

* Python 2 compat pytest

* Fixing weird \r\n issue from GH merge

* Put asset tests back in

* Re-add pod operator test

* Hack parameter for k8s pod operator
…amples#2005)

* Auto-update dependencies.

* Revert update of appengine/flexible/datastore.

* revert update of appengine/flexible/scipy

* revert update of bigquery/bqml

* revert update of bigquery/cloud-client

* revert update of bigquery/datalab-migration

* revert update of bigtable/quickstart

* revert update of compute/api

* revert update of container_registry/container_analysis

* revert update of dataflow/run_template

* revert update of datastore/cloud-ndb

* revert update of dialogflow/cloud-client

* revert update of dlp

* revert update of functions/imagemagick

* revert update of functions/ocr/app

* revert update of healthcare/api-client/fhir

* revert update of iam/api-client

* revert update of iot/api-client/gcs_file_to_device

* revert update of iot/api-client/mqtt_example

* revert update of language/automl

* revert update of run/image-processing

* revert update of vision/automl

* revert update testing/requirements.txt

* revert update of vision/cloud-client/detect

* revert update of vision/cloud-client/product_search

* revert update of jobs/v2/api_client

* revert update of jobs/v3/api_client

* revert update of opencensus

* revert update of translate/cloud-client

* revert update to speech/cloud-client

Co-authored-by: Kurtis Van Gent <[email protected]>
Co-authored-by: Doug Mahugh <[email protected]>
…0 [(#3050)](GoogleCloudPlatform/python-docs-samples#3050)

* chore(deps): update dependency google-cloud-bigquery-storage to v0.8.0

* chore(deps): update pandas-gbq

* chore(deps): update ipython

* chore: update requirements.txt

* chore: it is spelled version.

* chore(deps): split pandas version

* chore(deps): split pandas version

Co-authored-by: Christopher Wilcox <[email protected]>
Co-authored-by: Leah Cole <[email protected]>
…ples#2806)

* chore(deps): update dependency requests to v2.23.0

* Simplify noxfile and add version control.

* Configure appengine/standard to only test Python 2.7.

* Update Kokoro configs to match noxfile.

* Add requirements-test to each folder.

* Remove Py2 versions from everything except appengine/standard.

* Remove conftest.py.

* Remove appengine/standard/conftest.py

* Remove 'no-success-flaky-report' from pytest.ini.

* Add GAE SDK back to appengine/standard tests.

* Fix typo.

* Roll pytest to python 2 version.

* Add a bunch of testing requirements.

* Remove typo.

* Add appengine lib directory back in.

* Add some additional requirements.

* Fix issue with flake8 args.

* Even more requirements.

* Readd appengine conftest.py.

* Add a few more requirements.

* Even more Appengine requirements.

* Add webtest for appengine/standard/mailgun.

* Add some additional requirements.

* Add workaround for issue with mailjet-rest.

* Add responses for appengine/standard/mailjet.

Co-authored-by: Renovate Bot <[email protected]>
…49)](GoogleCloudPlatform/python-docs-samples#3049)

* chore(deps): update dependency google-cloud-bigquery to v1.24.0

* chore(deps): update ipython version

* fix: fix requirements order

* explicitly add grpc to resolve errors

* adjust arguments

* undo mistake

* bump auth version

Co-authored-by: Bu Sun Kim <[email protected]>
Co-authored-by: Leah Cole <[email protected]>
Co-authored-by: Leah E. Cole <[email protected]>
Co-authored-by: Christopher Wilcox <[email protected]>
GoogleCloudPlatform/python-docs-samples#2797)

* chore(deps): update dependency google-auth-oauthlib to v0.4.1

* resolve dependency finding errors

* fix new matplotlib error

Co-authored-by: Leah E. Cole <[email protected]>
Co-authored-by: Leah Cole <[email protected]>
Co-authored-by: Bu Sun Kim <[email protected]>
Co-authored-by: Christopher Wilcox <[email protected]>
…eCloudPlatform/python-docs-samples#3464)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-auth](https://togithub.com/googleapis/google-auth-library-python) | patch | `==1.14.0` -> `==1.14.1` |
| [google-auth](https://togithub.com/googleapis/google-auth-library-python) | minor | `==1.11.2` -> `==1.14.1` |

---

### Release Notes

<details>
<summary>googleapis/google-auth-library-python</summary>

### [`v1.14.1`](https://togithub.com/googleapis/google-auth-library-python/blob/master/CHANGELOG.md#1141-httpswwwgithubcomgoogleapisgoogle-auth-library-pythoncomparev1140v1141-2020-04-21)

[Compare Source](https://togithub.com/googleapis/google-auth-library-python/compare/v1.14.0...v1.14.1)

</details>

---

### Renovate configuration

:date: **Schedule**: At any time (no schedule defined).

:vertical_traffic_light: **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

:recycle: **Rebasing**: Never, or you tick the rebase/retry checkbox.

:no_bell: **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [WhiteSource Renovate](https://renovate.whitesourcesoftware.com). View repository job log [here](https://app.renovatebot.com/dashboard#GoogleCloudPlatform/python-docs-samples).
…eCloudPlatform/python-docs-samples#3724)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-auth](https://togithub.com/googleapis/google-auth-library-python) | patch | `==1.14.1` -> `==1.14.2` |

---

### Release Notes

<details>
<summary>googleapis/google-auth-library-python</summary>

### [`v1.14.2`](https://togithub.com/googleapis/google-auth-library-python/blob/master/CHANGELOG.md#1142-httpswwwgithubcomgoogleapisgoogle-auth-library-pythoncomparev1141v1142-2020-05-07)

[Compare Source](https://togithub.com/googleapis/google-auth-library-python/compare/v1.14.1...v1.14.2)

</details>

…eCloudPlatform/python-docs-samples#3728)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-auth](https://togithub.com/googleapis/google-auth-library-python) | patch | `==1.14.2` -> `==1.14.3` |

---

### Release Notes

<details>
<summary>googleapis/google-auth-library-python</summary>

### [`v1.14.3`](https://togithub.com/googleapis/google-auth-library-python/blob/master/CHANGELOG.md#1143-httpswwwgithubcomgoogleapisgoogle-auth-library-pythoncomparev1142v1143-2020-05-11)

[Compare Source](https://togithub.com/googleapis/google-auth-library-python/compare/v1.14.2...v1.14.3)

</details>

…oudPlatform/python-docs-samples#4024)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [google-cloud-bigquery](https://togithub.com/googleapis/python-bigquery) | minor | `==1.24.0` -> `==1.25.0` |

---

### Release Notes

<details>
<summary>googleapis/python-bigquery</summary>

### [`v1.25.0`](https://togithub.com/googleapis/python-bigquery/blob/master/CHANGELOG.md#1250-httpswwwgithubcomgoogleapispython-bigquerycomparev1240v1250-2020-06-06)

[Compare Source](https://togithub.com/googleapis/python-bigquery/compare/v1.24.0...v1.25.0)

##### Features

-   add BigQuery storage client support to DB API ([#36](https://www.github.com/googleapis/python-bigquery/issues/36)) ([ba9b2f8](https://www.github.com/googleapis/python-bigquery/commit/ba9b2f87e36320d80f6f6460b77e6daddb0fa214))
-   **bigquery:** add create job method ([#32](https://www.github.com/googleapis/python-bigquery/issues/32)) ([2abdef8](https://www.github.com/googleapis/python-bigquery/commit/2abdef82bed31601d1ca1aa92a10fea1e09f5297))
-   **bigquery:** add support of model for extract job ([#71](https://www.github.com/googleapis/python-bigquery/issues/71)) ([4a7a514](https://www.github.com/googleapis/python-bigquery/commit/4a7a514659a9f6f9bbd8af46bab3f8782d6b4b98))
-   add HOUR support for time partitioning interval ([#91](https://www.github.com/googleapis/python-bigquery/issues/91)) ([0dd90b9](https://www.github.com/googleapis/python-bigquery/commit/0dd90b90e3714c1d18f8a404917a9454870e338a))
-   add support for policy tags ([#77](https://www.github.com/googleapis/python-bigquery/issues/77)) ([38a5c01](https://www.github.com/googleapis/python-bigquery/commit/38a5c01ca830daf165592357c45f2fb4016aad23))
-   make AccessEntry objects hashable ([#93](https://www.github.com/googleapis/python-bigquery/issues/93)) ([23a173b](https://www.github.com/googleapis/python-bigquery/commit/23a173bc5a25c0c8200adc5af62eb05624c9099e))
-   **bigquery:** expose start index parameter for query result ([#121](https://www.github.com/googleapis/python-bigquery/issues/121)) ([be86de3](https://www.github.com/googleapis/python-bigquery/commit/be86de330a3c3801653a0ccef90e3d9bdb3eee7a))
-   **bigquery:** unit and system test for dataframe with int column with NaN values ([#39](https://www.github.com/googleapis/python-bigquery/issues/39)) ([5fd840e](https://www.github.com/googleapis/python-bigquery/commit/5fd840e9d4c592c4f736f2fd4792c9670ba6795e))

##### Bug Fixes

-   allow partial streaming_buffer statistics ([#37](https://www.github.com/googleapis/python-bigquery/issues/37)) ([645f0fd](https://www.github.com/googleapis/python-bigquery/commit/645f0fdb35ee0e81ee70f7459e796a42a1f03210))
-   distinguish server timeouts from transport timeouts ([#43](https://www.github.com/googleapis/python-bigquery/issues/43)) ([a17be5f](https://www.github.com/googleapis/python-bigquery/commit/a17be5f01043f32d9fbfb2ddf456031ea9205c8f))
-   improve cell magic error message on missing query ([#58](https://www.github.com/googleapis/python-bigquery/issues/58)) ([6182cf4](https://www.github.com/googleapis/python-bigquery/commit/6182cf48aef8f463bb96891cfc44a96768121dbc))
-   **bigquery:** fix repr of model reference ([#66](https://www.github.com/googleapis/python-bigquery/issues/66)) ([26c6204](https://www.github.com/googleapis/python-bigquery/commit/26c62046f4ec8880cf6561cc90a8b821dcc84ec5))
-   **bigquery:** fix start index with page size for list rows ([#27](https://www.github.com/googleapis/python-bigquery/issues/27)) ([400673b](https://www.github.com/googleapis/python-bigquery/commit/400673b5d0f2a6a3d828fdaad9d222ca967ffeff))

</details>

…Platform/python-docs-samples#4279)

* chore(deps): update dependency pytest to v5.4.3

* specify pytest for python 2 in appengine

Co-authored-by: Leah Cole <[email protected]>
@google-cla

google-cla bot commented Sep 2, 2020

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only `@googlebot I consent.` in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added the cla: no This human has *not* signed the Contributor License Agreement. label Sep 2, 2020
@plamut
Contributor Author

plamut commented Sep 2, 2020

The tests fail because they expect the results to contain Python objects, but the actual results are pyarrow objects, e.g. `123` vs. `<pyarrow.Int64Scalar: 123>`.

@shollyman Is this the expected behavior, meaning pyarrow values need to be converted to Python objects explicitly, or should `ReadRowsStream.rows()` convert the results automatically before handing them to the user?

Update:
It's the former. Taking one of the BigQuery DB API tests as an example, mocked results data actually contains pyarrow values (created with the _to_pyarrow() test helper), implying that bqstorage indeed returns pyarrow values, and user code needs to account for that.
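
One way user code could account for this is to convert each value explicitly. The sketch below is an assumption about how a caller might do that, not code from this PR: `to_python` and `row_to_python` are hypothetical helpers that rely only on pyarrow scalars exposing an `as_py()` method, and they pass through anything that lacks one.

```python
def to_python(value):
    """Return a plain Python object for a pyarrow scalar; pass others through."""
    # pyarrow scalars expose as_py(); ordinary Python values do not.
    as_py = getattr(value, "as_py", None)
    return as_py() if callable(as_py) else value


def row_to_python(row):
    """Convert every value in a mapping-like row to a plain Python object."""
    return {key: to_python(value) for key, value in row.items()}
```

A test that compares a row against plain integers or strings could call `row_to_python(row)` first, so the comparison no longer depends on whether the reader returned pyarrow values.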

@busunkim96
Contributor

It looks like the regular Kokoro job is also running the sample tests https://github.com/googleapis/python-bigquery-storage/blob/master/noxfile.py#L135

There seems to be only one existing sample in this repo. Here is what I suggest:

  • move the quickstart into dataframe or some other directory nested under samples
  • re-run synthtool
  • delete the samples nox session in the library's main noxfile.py
  • merge PR
  • follow with a docs update internally

@product-auto-label product-auto-label bot added the api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. label Sep 3, 2020
@plamut plamut force-pushed the add-to_dataframe-samples-2 branch from 8df5a86 to 5407056 Compare September 3, 2020 13:03
@plamut
Contributor Author

plamut commented Sep 3, 2020

@busunkim96 Done. I had to tweak a few things (e.g. dependencies), but now the samples tests pass locally: both the migrated to_dataframe/ samples and the existing quickstart, which was moved to its own subdirectory.

@plamut plamut force-pushed the add-to_dataframe-samples-2 branch from f891b9a to 8c2fe1a Compare September 3, 2020 14:39
@plamut plamut requested review from busunkim96 and removed request for busunkim96 September 10, 2020 10:38
@plamut
Contributor Author

plamut commented Sep 10, 2020

@busunkim96 Please also appease the CLA bot on my behalf, thanks!

@busunkim96 busunkim96 added cla: yes This human has signed the Contributor License Agreement. and removed cla: no This human has *not* signed the Contributor License Agreement. labels Sep 10, 2020
@plamut plamut merged commit ba84d5b into googleapis:master Sep 10, 2020
@plamut plamut deleted the add-to_dataframe-samples-2 branch September 10, 2020 17:25
Labels
api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. cla: yes This human has signed the Contributor License Agreement.
Development

Successfully merging this pull request may close these issues.

Add bigquery_storage/to_dataframe samples from the samples repo
8 participants