We cannot append data to existing Delta Lake tables if the schema of data to write includes timestamp columns with timezone. #1777
Labels: binding/python, binding/rust, bug
Environment
Delta-rs version:
Binding:
Environment:
Bug
We cannot append data to an existing Delta Lake table if the schema of the data being written includes timestamp columns with a timezone.
What happened:
The first write succeeds, but subsequent append writes fail.
What you expected to happen:
Appending data whose schema includes timestamp columns with a timezone should succeed.
How to reproduce it:
pa.timestamp(unit="us", tz=timezone.utc) looks compliant with the timestamp data type in Delta Lake. But the second write_deltalake(..., mode="append") fails with the following error.
More details:
One possible workaround is to remove the timezone from the timestamp column definitions.
However, we are strongly concerned about this workaround because it removes timezone information from the statistics in the transaction logs.
We are currently investigating inconsistent behaviour of Spark Delta Lake with one of our Delta Lake tables. This table is written using this workaround, and the inconsistency happens only when we set the Spark session timezone to something other than UTC. We therefore suspect that statistics without timezone information in the transaction logs are the root cause.
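To illustrate the suspicion, here is a small standard-library sketch of how a timezone-less value can end up denoting a different instant once a reader attaches its own session timezone. It is not Spark's or Delta Lake's actual logic; the interpret function and the +09:00 offset are assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone

# A timezone-aware value as originally produced (UTC).
original = datetime(2023, 10, 26, 12, 0, tzinfo=timezone.utc)

# With the workaround, the same wall-clock value is stored without a timezone.
naive_stat = original.replace(tzinfo=None)


def interpret(naive: datetime, session_tz: timezone) -> datetime:
    """Treat a naive value as local time in session_tz and return the UTC instant.

    Hypothetical model of a reader that applies its session timezone.
    """
    return naive.replace(tzinfo=session_tz).astimezone(timezone.utc)


utc_session = timezone.utc
other_session = timezone(timedelta(hours=9))  # assumed non-UTC session offset

same = interpret(naive_stat, utc_session)
shifted = interpret(naive_stat, other_session)

print(same == original)    # the UTC session recovers the original instant
print(original - shifted)  # the non-UTC session is off by the session offset
```

Under this model a reader with a UTC session sees consistent values, while a reader with any other session timezone sees values shifted by the offset, which matches the pattern we observe.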