Creating dataframe with Recordbatch using pyarrow.Table.to_batches gives "type16 not valid error" when schema includes date32[day] type #949

Closed
preetijoshi-womply opened this issue Aug 26, 2021 · 2 comments
Labels
bug Something isn't working

Comments

preetijoshi-womply commented Aug 26, 2021

Describe the bug
I want to create a DataFusion dataframe from an in-memory pyarrow table that has a date32[day] field. Doing so fails with a schema error saying that type 16 is not supported.

To Reproduce
import datafusion
import pyarrow
import datetime

ctx = datafusion.ExecutionContext()

batch = pyarrow.RecordBatch.from_arrays(
    [pyarrow.array([1, 2, 3]), pyarrow.array([datetime.date(1970, 1, 1), datetime.date(1970, 1, 2), datetime.date(1970, 1, 3)])],
    names=["a", "b"],
)
df = ctx.create_dataframe([[batch]])

Expected behavior
A dataframe should be created whether or not the schema includes a date column.

Additional context
pa_table.schema gives the following:
a: string
b: date32[day]
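
For context on the type involved: Arrow's date32[day] stores each date as a 32-bit count of days since the UNIX epoch (1970-01-01). The sketch below uses only the standard library to show that encoding. The helper names `date_to_days` and `days_to_date` are hypothetical and not part of pyarrow or datafusion. On a release where the error still occurs, passing such integers in place of dates might serve as a stopgap, but that is an untested assumption.

```python
import datetime

# date32[day] counts days from this epoch.
EPOCH = datetime.date(1970, 1, 1)


def date_to_days(d: datetime.date) -> int:
    """Return days since the UNIX epoch, the value a date32[day] column stores."""
    return (d - EPOCH).days


def days_to_date(n: int) -> datetime.date:
    """Inverse of date_to_days: turn a day count back into a date."""
    return EPOCH + datetime.timedelta(days=n)


# Same dates as in the reproduction above.
dates = [
    datetime.date(1970, 1, 1),
    datetime.date(1970, 1, 2),
    datetime.date(1970, 1, 3),
]
days = [date_to_days(d) for d in dates]
print(days)  # [0, 1, 2]
```

The conversion round-trips, so the original dates can be recovered from the integer column after querying.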

preetijoshi-womply added the bug label Aug 26, 2021
preetijoshi-womply changed the title from "Creating dataframe with Recordbatch using pyarrow.Table.to_batches gives schema error" to "Creating dataframe with Recordbatch using pyarrow.Table.to_batches gives "type16 not valid error" when schema includes date32[day] type" Aug 26, 2021
drauschenbach (Contributor) commented

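This issue appears to have been resolved. The reproduction below, updated to use SessionContext in place of ExecutionContext, now runs and returns the date column: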

import datafusion
import pyarrow
import datetime

ctx = datafusion.SessionContext()

batch = pyarrow.RecordBatch.from_arrays(
    [pyarrow.array([1, 2, 3]), pyarrow.array([datetime.date(1970, 1, 1), datetime.date(1970, 1, 2), datetime.date(1970, 1, 3)])],
    names=["a", "b"],
)
df = ctx.create_dataframe([[batch]])


>>> print(df)
DataFrame()
+---+------------+
| a | b          |
+---+------------+
| 1 | 1970-01-01 |
| 2 | 1970-01-02 |
| 3 | 1970-01-03 |
+---+------------+


>>> print(df.schema())
a: int64
b: date32[day]

andygrove (Member) commented

Closing this. Thanks @drauschenbach.
