BigQuery: DATETIME columns invalid when uploaded with load_table_from_dataframe (#9996)
The fix is likely to change the pyarrow type used for these columns to microsecond precision.
One thing to consider: the full range of DATETIME is not supported when nanosecond precision is used. We need to check the input precision and make sure it matches the Arrow precision we use.
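For illustration, a minimal sketch of that precision check with pyarrow (the sample value is made up):

```python
import pyarrow as pa

# BigQuery DATETIME spans years 0001-9999, but timestamp("ns") can only
# represent roughly 1677-2262, so the input precision has to be checked
# before choosing the Arrow type.
ns_array = pa.array(
    [1_577_836_800_000_000_000],  # 2020-01-01 00:00:00 in epoch nanoseconds
    type=pa.timestamp("ns"),
)

# Casting with the default safe=True raises if sub-microsecond detail
# would be silently truncated.
us_array = ns_array.cast(pa.timestamp("us"))
print(us_array.type)  # timestamp[us]
```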
@tswast I think the short-term fix would be to change datetime64 to TIMESTAMP instead of DATETIME.
@tseaver
@HemangChothani Can you try changing https://github.com/googleapis/google-cloud-python/blob/2dabc2dcc606e3867e25fd86ede21614cdcaaa04/bigquery/google/cloud/bigquery/_pandas_helpers.py#L55 to be TIMESTAMP?
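A sketch of the proposed one-line change, with the mapping dict abbreviated to the relevant entries (names follow the linked _pandas_helpers.py; the rest of the module is omitted):

```python
# _pandas_helpers.py (abbreviated): pandas dtype -> BigQuery type mapping.
_PANDAS_DTYPE_TO_BQ = {
    "datetime64[ns, UTC]": "TIMESTAMP",
    # Was "DATETIME"; mapping naive datetimes to TIMESTAMP sidesteps the
    # broken Parquet DATETIME upload path on the backend.
    "datetime64[ns]": "TIMESTAMP",
}
```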
@emkornfield I have tried the change and it's working fine.
I'll let others chime in, but in the short term I think this might be a reasonable fix. The main downside is that it doesn't allow round-tripping "datetime64[ns]", but at least data is preserved.
If you manually specify TIMESTAMP in the job_config, does that work? I'd prefer to keep timezone-naive columns mapping to DATETIME and fix the conversion.
@tswast Yes, it works if I manually specify TIMESTAMP in the job_config.
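The workaround looks roughly like this (a sketch; the table ID and column name are placeholders):

```python
import datetime

import pandas as pd
from google.cloud import bigquery

client = bigquery.Client()
df = pd.DataFrame({"created": [datetime.datetime(2019, 12, 20, 12, 0, 0)]})

# Pin the naive datetime column to TIMESTAMP in the job config so the
# automatic dtype-to-DATETIME mapping is bypassed.
job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("created", "TIMESTAMP")]
)
load_job = client.load_table_from_dataframe(
    df, "my-project.my_dataset.my_table", job_config=job_config
)
load_job.result()  # wait for the load to complete
```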
OK, thanks. Since we have a workaround, I'll drop the priority to P2.
For system tests of this fix, I'd like to see supported types added to https://github.com/googleapis/google-cloud-python/blob/master/bigquery/samples/load_table_dataframe.py and the tests for that sample updated to read and verify the table data. I'll work on updating the tests for currently-supported data types.
Created draft PR #10028 to hopefully reproduce this issue in a system test. |
I see in the unit tests for this that uploading DATETIME columns was blocked by https://issues.apache.org/jira/browse/ARROW-5450, so once this issue is fixed, we'll require the latest version of pyarrow.
I investigated this further by capturing the Parquet file that is generated by client.py. When I upload that captured file directly, I can reproduce the invalid DATETIME values.
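To see what the client hands to the backend, the same dataframe can be serialized to Parquet by hand (a sketch; the file path is illustrative):

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"created": pd.to_datetime(["2019-12-20T12:00:00"])})

# Mirror the client's conversion: dataframe -> Arrow table -> Parquet.
table = pa.Table.from_pandas(df)
pq.write_table(table, "/tmp/datetime_repro.parquet")

# Inspect the schema the backend will see for the captured file.
print(pq.read_schema("/tmp/datetime_repro.parquet"))
```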
I'm able to reproduce this error with the command-line.
Since I'm able to reproduce this error with the command-line, I believe it is a backend issue. I've filed internal bug 147108331 for the backend engineers to investigate. |
I was able to create a Parquet file with nanosecond-precision DATETIME columns, but the backend still multiplies the values by 1000. It seems we can't write DATETIME columns from Parquet without a backend fix. |
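A quick back-of-the-envelope check of why that multiplication breaks things (the value here is illustrative):

```python
# A microsecond epoch value for 2020-01-01 00:00:00.
us_value = 1_577_836_800 * 1_000_000

# If the backend multiplies by 1000 (scaling as though the input still
# needed a nanosecond conversion), the result lands tens of thousands of
# years in the future, far outside DATETIME's 0001-9999 range.
corrupted = us_value * 1000
print(corrupted)  # 1577836800000000000
```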
After all this, I think the best option is to write datetime64 columns as TIMESTAMP.
Talked with the backend engineers. DATETIME is not supported for uploading from Parquet (they are treating my request as a feature request), so changing the dtype to TIMESTAMP is the correct solution.
https://github.com/googleapis/google-cloud-python/issues/9920#issuecomment-566642042