Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError encountered when to_dataframe returns empty resultset with JSON field #1580

Open
jdub55 opened this issue Jun 7, 2023 · 1 comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@jdub55
Copy link

jdub55 commented Jun 7, 2023

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

Please run down the following list and make sure you've tried the usual "quick fixes":

If you are still having issues, please be sure to include as much information as possible:

Environment details

(bigquery_reproduce) jupyter@gke-db-executor:~/bigquery_reproduce$ python --version
Python 3.9.0
(bigquery_reproduce) jupyter@gke-db-executor:~/bigquery_reproduce$ pip --version
pip 23.1.2 from /opt/conda/envs/bigquery_reproduce/lib/python3.9/site-packages/pip (python 3.9)
(bigquery_reproduce) jupyter@gke-db-executor:~/bigquery_reproduce$ pip show google-cloud-bigquery
Name: google-cloud-bigquery
Version: 3.11.0
Summary: Google BigQuery API client library
Home-page: https://github.com/googleapis/python-bigquery
Author-email: [email protected]
Location: /opt/conda/envs/bigquery_reproduce/lib/python3.9/site-packages
Requires: google-api-core, google-cloud-core, google-resumable-media, grpcio, packaging, proto-plus, protobuf, python-dateutil, requests
(bigquery_reproduce) jupyter@gke-db-executor:~/bigquery_reproduce$ pip show google-cloud-bigquery-storage
Name: google-cloud-bigquery-storage
Version: 2.20.0
Summary: Google Cloud Bigquery Storage API client library
Home-page: https://github.com/googleapis/python-bigquery-storage
Author-email: [email protected]
License: Apache 2.0
Location: /opt/conda/envs/bigquery_reproduce/lib/python3.9/site-packages
Requires: google-api-core, proto-plus, protobuf

(bigquery_reproduce) jupyter@gke-db-executor:~/bigquery_reproduce$ pip list
Package                       Version
----------------------------- --------
cachetools                    5.3.1
certifi                       2023.5.7
charset-normalizer            3.1.0
db-dtypes                     1.1.1
google-api-core               2.11.0
google-auth                   2.19.1
google-cloud-bigquery         3.11.0
google-cloud-bigquery-storage 2.20.0
google-cloud-core             2.3.2
google-crc32c                 1.5.0
google-resumable-media        2.5.0
googleapis-common-protos      1.59.0
grpcio                        1.55.0
grpcio-status                 1.55.0
idna                          3.4
numpy                         1.24.3
packaging                     23.1
pandas                        2.0.2
pip                           23.1.2
proto-plus                    1.22.2
protobuf                      4.23.2
pyarrow                       12.0.0
pyasn1                        0.5.0
pyasn1-modules                0.3.0
python-dateutil               2.8.2
pytz                          2023.3
requests                      2.31.0
rsa                           4.9
setuptools                    67.7.2
six                           1.16.0
tzdata                        2023.3
urllib3                       1.26.16
wheel                         0.40.0

Steps to reproduce

When bigquery return empty datase. bigquery python sdk will need to transfer [bq schema to arrow schema

https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/table.py#L1844-L1853),

so it will execute bq_to_arrow_data_type
https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/_pandas_helpers.py#L225-L246

Looks like currently this no JSON type mapping]:
https://github.com/googleapis/python-bigquery/blob/main/google/cloud/bigquery/_pandas_helpers.py#L147-L162)

Code example

Error:

/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/_pandas_helpers.py:267:
UserWarning: Unable to determine type for field 'json'.

warnings.warn("Unable to determine type for field
'{}'.".format(bq_field.name))

Traceback (most recent call last):

File "/home/jupyter/smapi/test_bq.py", line 12, in <module>

df = bqexecutor.execute_query(sql)

File "/home/jupyter/smapi/smapi/core/data_providers/sql/bq_executor.py",
line 119, in execute_query

querydf = rows.to_dataframe()

File
"/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/[table.py](https://table.py/)",
line 2151, in to_dataframe

progress_bar_type=progress_bar_type,

File
"/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/[table.py](https://table.py/)",
line 1840, in to_arrow

return pyarrow.Table.from_batches(record_batches)

File "pyarrow/table.pxi", line 3747, in pyarrow.lib.Table.from_batches

ValueError: Must pass schema, or at least one RecordBatch

Python version: Python 3.9.0

package version:

google-cloud-bigquery 3.10.0

google-cloud-bigquery-storage 2.19.1

pyarrow 8.0.0

pandas 1.5.3

How to reproduce?

test_bq.py

python

from google.cloud import bigquery


client = bigquery.Client()


table_id = "gdw-dev-gdml.abehsu.reproduce_issue"


# Define schema

schema = [

bigquery.SchemaField("field1", "STRING", mode="REQUIRED"),

bigquery.SchemaField("field2", "JSON", mode="REQUIRED"),

]


table = bigquery.Table(table_id, schema=schema)

table = client.create_table(table) # Make an API request.

print(

"Created table {}.{}.{}".format(table.project, table.dataset_id,
table.table_id)

)



# Perform a query.

QUERY = (

f"""

select * from {table_id}

"""

)

query_job = client.query(QUERY) # API request

df = query_job.to_dataframe()


print(df)

python test_bq.py

Created table gdw-dev-gdml.abehsu.reproduce_issue

Stack trace




/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/_pandas_helpers.py:267:
UserWarning: Unable to determine type for field 'field2'.

warnings.warn("Unable to determine type for field
'{}'.".format(bq_field.name))

Traceback (most recent call last):

File "/home/jupyter/smapi/test_bq.py", line 41, in <module>

df = query_job.to_dataframe()

File
"/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/job/[query.py](https://query.py/)",
line 1800, in to_dataframe

return query_result.to_dataframe(

File
"/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/[table.py](https://table.py/)",
line 2150, in to_dataframe

record_batch = self.to_arrow(

File
"/opt/conda/envs/smapi/lib/python3.9/site-packages/google/cloud/bigquery/[table.py](https://table.py/)",
line 1848, in to_arrow

return pyarrow.Table.from_batches(record_batches, schema=arrow_schema)

File "pyarrow/table.pxi", line 3747, in pyarrow.lib.Table.from_batches

ValueError: Must pass schema, or at least one RecordBatch


Making sure to follow these steps will guarantee the quickest resolution possible.

Thanks!

@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Jun 7, 2023
@chalmerlowe chalmerlowe added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Jun 9, 2023
@chalmerlowe
Copy link
Collaborator

OP points out the mappings for BQ to arrow datatypes and it definitely appears that JSON is not included there OR in any of the other translation functions adjacent to that code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p3 Desirable enhancement or fix. May not be included in next release. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

No branches or pull requests

2 participants