
Unable to write new partitions with type timestamp on tables created with delta-rs 0.10.0 #2631

Closed
emanueledomingo opened this issue Jun 28, 2024 · 5 comments

Comments

@emanueledomingo

Environment

Delta-rs version:

Binding: 0.18.0

Environment:

  • OS: Ubuntu 22.04 LTS

Bug

What happened:

I have a table written with delta-rs 0.10.0. The schema is:

{
   "type":"struct",
   "fields":[
      {
         "name":"Date",
         "type":"date",
         "nullable":false,
         "metadata":{}
      },
      {
         "name":"Timestamp",
         "type":"timestamp",
         "nullable":false,
         "metadata":{}
      }
   ]
}

I'm trying to write a new partition to that table with the following schema:

pa.schema(
    [
        ("Date", pa.date32(), False),
        ("Timestamp", pa.timestamp("us"), False),
    ]
)

But I get: DeltaError: Generic DeltaTable error: Writer features must be specified for writerversion >= 7, please specify: TimestampWithoutTimezone.

With deltalake 0.16.2 this worked fine. Now I have upgraded to 0.18.0 and I get this error on tables created with an old delta-rs client.

If the table is created with a newer delta-rs client, this doesn't happen.
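The error comes from Delta table features. Per the Delta Lake protocol specification, a column of type "timestamp_ntz" is only allowed when the table protocol is at reader version 3 / writer version 7 and lists the `timestampNtz` feature, while "timestamp" (UTC-adjusted) needs no feature. Judging by the details in this report, the newer client maps a timezone-less `pa.timestamp("us")` to "timestamp_ntz", so writing it into a legacy table that stores "timestamp" trips that check. The following is a minimal pure-Python sketch of the two checks involved; `Protocol`, `legacy_timestamp_columns` and `supports_timestamp_ntz` are hypothetical helpers, not part of the deltalake API.

```python
import json
from dataclasses import dataclass, field

# Name of the Delta table feature that enables the "timestamp_ntz" column type.
TIMESTAMP_NTZ_FEATURE = "timestampNtz"


@dataclass
class Protocol:
    """Hypothetical mirror of a Delta log "protocol" action (not a deltalake class)."""
    min_reader_version: int
    min_writer_version: int
    reader_features: set[str] = field(default_factory=set)
    writer_features: set[str] = field(default_factory=set)


def legacy_timestamp_columns(schema_json: str) -> list[str]:
    """Return top-level columns whose Delta type is "timestamp" (UTC-adjusted)."""
    schema = json.loads(schema_json)
    return [f["name"] for f in schema["fields"] if f["type"] == "timestamp"]


def supports_timestamp_ntz(protocol: Protocol) -> bool:
    """True if the protocol already enables timestampNtz (reader 3 / writer 7 + feature)."""
    return (
        protocol.min_reader_version >= 3
        and protocol.min_writer_version >= 7
        and TIMESTAMP_NTZ_FEATURE in protocol.reader_features
        and TIMESTAMP_NTZ_FEATURE in protocol.writer_features
    )


# The schema from this report, as a Delta schema JSON string.
legacy_schema = json.dumps({"type": "struct", "fields": [
    {"name": "Date", "type": "date", "nullable": False, "metadata": {}},
    {"name": "Timestamp", "type": "timestamp", "nullable": False, "metadata": {}},
]})
print(legacy_timestamp_columns(legacy_schema))  # ['Timestamp']
print(supports_timestamp_ntz(Protocol(1, 2)))   # False
```

Against a real table, the inputs could presumably come from `DeltaTable(...).schema().to_json()` and `DeltaTable(...).protocol()`; treat that wiring as an assumption to verify against your deltalake version.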

How to reproduce it:

  1. Create a table with deltalake==0.10.0
import deltalake as dl
import pyarrow as pa

dl.__version__   # 0.10.0

ta = pa.Table.from_pydict(
    {
        "Date": ["2023-01-01", "2023-01-02"],
        # Both columns must have the same length (one timestamp per date).
        "Timestamp": ["2023-01-01T14:37:35.386235", "2023-01-01T14:37:35.386235"]
    }
)

ta = ta.cast(
    pa.schema(
        [
            ("Date", pa.date32(), False),
            ("Timestamp", pa.timestamp("us"), False),
        ]
    )
)
dl.write_deltalake(
    table_or_uri="tmp/table",
    mode="overwrite",
    data=ta,
)
  2. Write a new partition with deltalake==0.18.0
import deltalake as dl
import pyarrow as pa

dl.__version__   # 0.18.0

ta = pa.Table.from_pydict(
    {
        "Date": ["2024-06-28"],
        "Timestamp": ["2024-06-28T14:37:35.386235"]
    }
)

ta = ta.cast(
    pa.schema(
        [
            ("Date", pa.date32(), False),
            ("Timestamp", pa.timestamp("us"), False),
        ]
    )
)
dl.write_deltalake(
    table_or_uri="tmp/table",
    mode="overwrite",
    data=ta,
    partition_filters=[("Date", "=", "2024-06-28")],
)

More details:

  1. Debugging the code (at least from Python), I noticed that the table created with delta-rs 0.10 has "timestamp" as the primitive type, while new tables now have "timestamp_ntz".
  2. If I add a timezone (for example UTC), the write succeeds even on the table created with delta-rs 0.10.
@emanueledomingo emanueledomingo added the bug Something isn't working label Jun 28, 2024
@Josh-Hiz

Josh-Hiz commented Jul 1, 2024

I also have the same issue when writing new partitions with type timestamp in 0.18.1, when the timestamps are formatted like '2015-10-30T06:40:15.000Z' (year-month-dayThour:minute:second.000Z).
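The trailing "Z" in such values is the ISO 8601 designator for UTC, so they describe timezone-aware instants. In pyarrow terms they would naturally be a timestamp with `tz="UTC"` (the legacy Delta "timestamp" type) rather than a naive one. A minimal standard-library sketch showing that such a string parses to an aware UTC value; `parse_utc_z` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_utc_z(value: str) -> datetime:
    """Parse e.g. '2015-10-30T06:40:15.000Z' into a timezone-aware datetime.

    Since Python 3.7, strptime's %z directive accepts a literal "Z" as +00:00.
    """
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


parsed = parse_utc_z("2015-10-30T06:40:15.000Z")
print(parsed == datetime(2015, 10, 30, 6, 40, 15, tzinfo=timezone.utc))  # True
```

Keeping the column as `pa.timestamp("us", tz="UTC")` when building the Arrow data would preserve that UTC meaning; whether it avoids the error in this case is an assumption based on detail 2 of the original report, not something confirmed in this thread.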

@ion-elgreco ion-elgreco removed the bug Something isn't working label Jul 1, 2024
@ion-elgreco
Collaborator

ion-elgreco commented Jul 1, 2024

We fixed a longstanding bug where timestamps were incorrect. The fix was a backwards-incompatible change in some areas. Additionally, the pyarrow engine incorrectly writes UTC timestamps as Z; this is something we cannot configure in pyarrow.

@emanueledomingo
Author

Is there a way to migrate the schema from "timestamp" to "timestamp_ntz" without recreating the table (and reloading all the historical data)?

I tried with schema_mode="overwrite" but I get the same error. It seems that delta-rs is unable to write the new "timestamp_ntz" type over the legacy "timestamp" type.

@ion-elgreco
Collaborator

@emanueledomingo the easiest option at the moment is to recreate the table.

@ion-elgreco
Collaborator

Closing this, since this backwards-incompatible change was needed to fix a long-standing issue.

@ion-elgreco ion-elgreco closed this as not planned Jul 25, 2024