feat: default to DATETIME type when loading timezone-naive datetimes from Pandas #1061

plamut · 2021-11-15T10:03:27Z

Closes #985.

This proved trickier than expected, because manual introspection is needed when augmenting the schema: pyarrow attaches the UTC timezone to naive datetimes, which makes them hard to distinguish from timezone-aware datetimes.

PR checklist:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea.
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

plamut · 2021-11-15T10:07:14Z

google/cloud/bigquery/_pandas_helpers.py

+            if detected_type == "TIMESTAMP":
+                valid_item = _first_array_valid(dataframe[field.name])
+                if isinstance(valid_item, datetime) and valid_item.tzinfo is None:
+                    detected_type = "DATETIME"


I considered doing this check for all detected TIMESTAMP values, but it turned out to be necessary only for datetimes inside arrays, because that is the case where we need pyarrow's help to detect the type.
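A minimal, self-contained sketch of the check in the diff above, assuming a column whose cells are Python lists of datetimes. The names `first_array_valid`, `refine_detected_type`, and `_is_null` are hypothetical, and this simplified helper just scans every array in order rather than reproducing the library's actual `_first_array_valid()`.

```python
import datetime
from typing import Any, Optional

import pandas as pd


def _is_null(value: Any) -> bool:
    # Scalars such as None, NaN and NaT count as null; list or array cells never do.
    return pd.api.types.is_scalar(value) and bool(pd.isna(value))


def first_array_valid(series: pd.Series) -> Optional[Any]:
    """Return the first non-null scalar found inside a column of arrays (hypothetical)."""
    for array in series:
        if _is_null(array):
            continue  # the whole cell is missing
        for item in array:
            if not _is_null(item):
                return item
    return None


def refine_detected_type(detected_type: str, series: pd.Series) -> str:
    """Switch TIMESTAMP to DATETIME when the sampled datetime has no tzinfo."""
    if detected_type == "TIMESTAMP":
        valid_item = first_array_valid(series)
        if isinstance(valid_item, datetime.datetime) and valid_item.tzinfo is None:
            return "DATETIME"
    return detected_type


moment = datetime.datetime(2021, 11, 15, 10, 3)
naive_col = pd.Series([None, [None, moment]])
aware_col = pd.Series([[moment.replace(tzinfo=datetime.timezone.utc)]])
```

With these inputs, `refine_detected_type("TIMESTAMP", naive_col)` yields `"DATETIME"`, while the timezone-aware column keeps `"TIMESTAMP"`. Only the first non-null value is sampled, mirroring the diff's assumption that an array column holds datetimes of one kind.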

For datetime values outside of arrays, we can already tell naive and aware ones apart by their Pandas dtypes, so we never even enter augment_schema() for them.
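To illustrate why scalar columns never need the extra check, here is a small sketch (the helper `classify_datetime_column` is hypothetical, not part of the library) showing that pandas dtypes already separate naive from timezone-aware datetime columns, while a column of arrays falls back to the generic `object` dtype, so its values have to be inspected directly.

```python
import datetime

import pandas as pd


def classify_datetime_column(series: pd.Series) -> str:
    """Classify a column by its dtype alone (hypothetical helper)."""
    if isinstance(series.dtype, pd.DatetimeTZDtype):
        return "aware"  # e.g. datetime64[ns, UTC]
    if pd.api.types.is_datetime64_dtype(series.dtype):
        return "naive"  # e.g. datetime64[ns]
    return "unknown"  # e.g. object dtype holding lists of datetimes


moment = datetime.datetime(2021, 11, 15, 10, 3)
naive = pd.Series([moment])
aware = pd.Series([moment.replace(tzinfo=datetime.timezone.utc)])
arrays = pd.Series([[moment], [moment, moment]])
```

The tz-aware check comes first so the ordering of the two branches does not depend on how `is_datetime64_dtype` treats `DatetimeTZDtype`.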

plamut · 2021-11-15T10:09:00Z

google/cloud/bigquery/_pandas_helpers.py

+
+    # Valid item is None because all items in the "valid" array are invalid. Try
+    # to find a true valid array manually.
+    for array in islice(series, first_valid_index + 1, None):


I was not sure whether slicing the series makes an unnecessary copy (the Pandas docs say it depends on context), so I played it safe and used islice.
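A sketch of the fallback scan described in the diff, assuming a unique index so that `Index.get_loc()` returns a single integer position; `first_array_valid_lazy`, `_first_valid_item`, and `_is_null` are hypothetical names. `itertools.islice` consumes the Series iterator and steps over the leading elements one by one without building a new Series object, which sidesteps the question of whether a positional slice such as `series.iloc[start:]` copies.

```python
import datetime
from itertools import islice
from typing import Any, Optional

import pandas as pd


def _is_null(value: Any) -> bool:
    # None/NaN/NaT scalars are null; list or array cells are not.
    return pd.api.types.is_scalar(value) and bool(pd.isna(value))


def _first_valid_item(array) -> Optional[Any]:
    # First non-null element of one array cell, or None if all are null.
    return next((item for item in array if not _is_null(item)), None)


def first_array_valid_lazy(series: pd.Series) -> Optional[Any]:
    first_valid_index = series.first_valid_index()  # a label, not a position
    if first_valid_index is None:
        return None  # every cell in the column is null
    start = series.index.get_loc(first_valid_index)
    valid_item = _first_valid_item(series.iloc[start])
    if valid_item is not None:
        return valid_item
    # All items in the first "valid" array are null; scan the remaining arrays
    # lazily with islice instead of slicing (and possibly copying) the Series.
    for array in islice(series, start + 1, None):
        if _is_null(array):
            continue
        valid_item = _first_valid_item(array)
        if valid_item is not None:
            return valid_item
    return None


col = pd.Series(
    [None, [None, None], [datetime.datetime(2021, 11, 15, 10, 3)]],
    index=[10, 20, 30],
)
```

Converting the label returned by `first_valid_index()` into a position with `get_loc()` keeps the `start + 1` offset correct even when the index is not a default `RangeIndex`.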

plamut · 2021-11-15T15:25:04Z

Status checks got stuck...

tswast

Thanks! Good catch in identifying the additional logic needed for arrays of DATETIME.

deps!: BigQuery Storage and pyarrow are required dependencies (#776)
fix!: use nullable `Int64` and `boolean` dtypes in `to_dataframe` (#786)
feat!: destination tables are no-longer removed by `create_job` (#891)
feat!: In `to_dataframe`, use `dbdate` and `dbtime` dtypes from db-dtypes package for BigQuery DATE and TIME columns (#972)
fix!: automatically convert out-of-bounds dates in `to_dataframe`, remove `date_as_object` argument (#972)
feat!: mark the package as type-checked (#1058)
feat!: default to DATETIME type when loading timezone-naive datetimes from Pandas (#1061)
feat: add `api_method` parameter to `Client.query` to select `INSERT` or `QUERY` API (#967)
fix: improve type annotations for mypy validation (#1081)
feat: use `StandardSqlField` class for `Model.feature_columns` and `Model.label_columns` (#1117)
docs: Add migration guide from version 2.x to 3.x (#1027)
Release-As: 3.0.0


plamut added 3 commits November 13, 2021 22:51

Make systest expect DATETIME for naive datetimes

3ee99cd

Fix SchemaField repr() when field type not set

5bfe6c1

Adjust DATETIME detection logic in dataframes

41dfa62

plamut requested review from tswast and a team November 15, 2021 10:03

plamut requested a review from a team as a code owner November 15, 2021 10:03

google-cla bot added the cla: yes label Nov 15, 2021

product-auto-label bot added the api: bigquery label Nov 15, 2021

plamut commented Nov 15, 2021


Fix assertions in one of the samples tests

c82cf84

plamut requested a review from a team as a code owner November 15, 2021 15:03

plamut requested a review from parthea November 15, 2021 15:03

plamut added the kokoro:force-run label Nov 15, 2021

yoshi-kokoro removed the kokoro:force-run label Nov 15, 2021

tswast approved these changes Nov 16, 2021


tswast merged commit 3cae066 into googleapis:v3 Nov 16, 2021

plamut deleted the iss-985 branch November 16, 2021 20:46

tswast mentioned this pull request Dec 1, 2021

v3: default to loading timezone-less pandas datetime columns into a DATETIME BigQuery #985

Closed

tswast mentioned this pull request Mar 29, 2022

fix!: remove out-of-date BigQuery ML protocol buffers #1178

Merged


release-please bot mentioned this pull request Mar 29, 2022

chore(main): release 3.0.0 #1179

Merged
