Right now, ds.dataset() can include the partition key in the result if the partitioning='hive' option is used.
However, this partition key column is returned as a string, even though it was originally a date.
It would be nice if it were returned as a datetime column when the values are in the standard ISO format.
import pyarrow.dataset as ds
mytab = ds.dataset("my_table_name", partitioning='hive').to_table()
print(mytab)
kou changed the title from "Ds.dataset() should be able to return partition column as datetime (if it is in ISO date format)" to "[Python][Parquet] Ds.dataset() should be able to return partition column as datetime (if it is in ISO date format)" on Feb 22, 2024.
Describe the enhancement requested
CALENDAR_DATE is the partitioning column.
Results:
pyarrow.Table
USER_ID: double
TRX_CNT: double
DATE_OF_BIRTH: timestamp[ns]
CALENDAR_DATE: string
USER_ID: [[1000,1001,1002,1003],[1000,1001,1002,1005],[1000,1001,1003,1005,1008]]
TRX_CNT: [[434,11,3,555],[111,32,1,2],[434,21,44,111,222]]
DATE_OF_BIRTH: [[1998-12-01 23:00:00.000000000,2002-03-13 23:00:00.000000000,1975-08-31 23:00:00.000000000,1998-12-31 23:00:00.000000000],[1998-12-01 23:00:00.000000000,2002-03-13 23:00:00.000000000,1975-08-31 23:00:00.000000000,2004-06-06 22:00:00.000000000],[1998-12-01 23:00:00.000000000,2002-03-13 23:00:00.000000000,1998-12-31 23:00:00.000000000,2004-06-06 22:00:00.000000000,1988-02-27 23:00:00.000000000]]
CALENDAR_DATE: [["2023-08-01 00:00:00","2023-08-01 00:00:00","2023-08-01 00:00:00","2023-08-01 00:00:00"],["2023-08-02 00:00:00","2023-08-02 00:00:00","2023-08-02 00:00:00","2023-08-02 00:00:00"],["2023-08-03 00:00:00","2023-08-03 00:00:00","2023-08-03 00:00:00","2023-08-03 00:00:00","2023-08-03 00:00:00"]]
Attachment: my_table_name.zip
Component(s)
Parquet, Python