BigQuery: Fix bug where `load_table_from_dataframe` could not append to REQUIRED fields.
#8230
Conversation
Fix bug where `load_table_from_dataframe` could not append to REQUIRED fields. If a BigQuery schema is supplied as part of the `job_config`, it can be used to set the `nullable` bit correctly on the serialized parquet file.
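As a rough illustration of the idea behind the fix (a hypothetical sketch, not the library's actual code): BigQuery describes each field with a mode, and only `REQUIRED` fields should map to a non-nullable column in the serialized parquet file.

```python
# Hypothetical helper illustrating the mode-to-nullable mapping; the real fix
# lives inside google-cloud-bigquery's pandas/parquet helpers, not here.
def bq_mode_to_nullable(mode):
    """BigQuery field modes are NULLABLE, REQUIRED, or REPEATED; only
    REQUIRED columns should be written as non-nullable in the parquet schema."""
    return mode.upper() != "REQUIRED"

print(bq_mode_to_nullable("REQUIRED"))  # False: must be non-nullable
print(bq_mode_to_nullable("NULLABLE"))  # True
```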
Update: I figured out that the example in the issue description does not hit the `to_parquet()` line, because `job_config.schema` is `None`. Will try to figure out how to set that.
(Disclaimer: my BQ knowledge is very limited.)
Non-essential remark aside, the code changes look good to me overall. I had some trouble verifying the fix, though.
I was able to reproduce the issue following the steps from the description (I had to switch "foo" and "bar" in the second-to-last line). When testing again on the PR branch, however, the issue persisted and I got the same error.
What could I be missing?
FWIW, I did make sure to re-install the bigquery library after pulling the PR code:

```
(venv-3.6) peter@black-box:~/workspace/google-cloud-python/bigquery (pr_temp)$ pip install -e .
```
```python
arrow_names.append(bq_field.name)
arrow_arrays.append(bq_to_arrow_array(dataframe[bq_field.name], bq_field))

arrow_table = pyarrow.Table.from_arrays(arrow_arrays, names=arrow_names)
if all((field is not None for field in arrow_fields)):
```
(minor)
As a sole argument, the generator expression does not have to be enclosed in an extra pair of parentheses.
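A quick illustration of that point (plain Python, independent of this PR's code): when a generator expression is the sole argument to a call, the call's own parentheses are sufficient, so the extra pair is redundant.

```python
fields = ["name", None, "age"]

# Extra parentheses around the sole-argument generator expression:
with_parens = all((field is not None for field in fields))
# Equivalent, without the redundant pair:
without_parens = all(field is not None for field in fields)

assert with_parens == without_parens  # both are False here, since one field is None
```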
Update 2: I changed the last line of the example from the issue description to the following:

```python
from google.cloud.bigquery import job

job_config = job.LoadJobConfig(schema=schema)
client.load_table_from_dataframe(
    df, table_ref, job_config=job_config
).result()
```
The error I then got was different, but seemed similar to the original one:

```
google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: Provided schema is not compatible with the file 'prod-scotty-8efadb65-d51b-44ba-bfec-cf98d1e93934'. Field 'bar' is specified as REQUIRED in provided schema which does not match NULLABLE as specified in the file.
```
When I ran the modified example with the PR fix, the error disappeared. Seems like the fix works (and the new code path was indeed taken).
Based on my limited BQ knowledge, the fix seems to work and the code looks good, but I will hold off on merging, since @shollyman might have something more to add. (If not, then please feel free to go ahead and merge it.)
Thanks for this.
If a BigQuery schema is supplied as part of the `job_config`, it can be used to set the `nullable` bit correctly on the serialized parquet file.
Closes #8093.