Can't create empty table with write_deltalake #2086

Closed
Mxater opened this issue Jan 16, 2024 · 2 comments
Labels
bug Something isn't working

Comments


Mxater commented Jan 16, 2024

Environment

  • OS: Windows 11
  • Python 3.11
  • Pandas 2.1.4
  • Delta-rs version: 0.15
  • Environment: Local

Bug

What happened:
I tried to create a new empty table in the data lake with this code:

import pandas as pd
from deltalake import write_deltalake

# rutaBase (the base path) is defined earlier in the script.
columnas = ['fecha', 'fecha_archivo', 'fecha_archivo_string', 'filename', 'registros', 'cliente', 'retail', 'ejecucion', 'cargado', 'estado', 'observacion', 'dias_quiebre', 'inmovilizados', 'dias_inmovilizados']

df = pd.DataFrame(columns=columnas)
write_deltalake(rutaBase + "filestest", df, mode="overwrite")

It throws the following error:

Traceback (most recent call last):
  File "X:\Asyma\Python\FabricProcess\pythonProject\test2.py", line 45, in <module>
    write_deltalake(rutaBase + "filestest", df, mode="overwrite")
  File "C:\Python311\Lib\site-packages\deltalake\writer.py", line 325, in write_deltalake
    raise ValueError(
ValueError: Schema of data does not match table schema
Data schema:
fecha: null
fecha_archivo: null
fecha_archivo_string: null
filename: null
registros: null
cliente: null
retail: null
ejecucion: null
cargado: null
estado: null
observacion: null
dias_quiebre: null
inmovilizados: null
dias_inmovilizados: null
Table Schema:
fecha: timestamp[us]
fecha_archivo: timestamp[us]
fecha_archivo_string: timestamp[us]
filename: timestamp[us]
registros: timestamp[us]
cliente: timestamp[us]
retail: timestamp[us]
ejecucion: timestamp[us]
cargado: timestamp[us]
estado: timestamp[us]
observacion: timestamp[us]
dias_quiebre: timestamp[us]
inmovilizados: timestamp[us]
dias_inmovilizados: timestamp[us]
__index_level_0__: int64

Mxater added the bug label Jan 16, 2024

ion-elgreco (Collaborator) commented

It throws an error because the columns of your pandas DataFrame have no explicit dtype, so with no rows to inspect, their Arrow type is inferred as null. Converting the DataFrame to a pyarrow table gives this schema:

pa.Table.from_pandas(df).schema

fecha: null
fecha_archivo: null
fecha_archivo_string: null
filename: null
registros: null
cliente: null
retail: null
ejecucion: null
cargado: null
estado: null
observacion: null
dias_quiebre: null
inmovilizados: null
dias_inmovilizados: null

I am closing this since it is not a bug. You need to instantiate your pandas DataFrame with the correct dtypes using the dtype argument.
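
One possible way to do that is sketched below. The `dtype` argument of `pd.DataFrame` takes a single dtype for the whole frame, so for per-column types this sketch builds one empty, typed `pd.Series` per column instead. The `column_dtypes` mapping is hypothetical: the thread never says what each column holds, so the types are placeholders to adapt.

```python
import pandas as pd

# Hypothetical dtypes for a few of the issue's columns; adjust to the real data.
column_dtypes = {
    "fecha": "datetime64[ns]",
    "filename": "string",
    "registros": "int64",
    "cargado": "bool",
}

# One empty Series per column, each carrying its own dtype.
df = pd.DataFrame(
    {name: pd.Series(dtype=dtype) for name, dtype in column_dtypes.items()}
)

print(len(df))
print(df.dtypes)
```

The resulting `df` can then be passed to `write_deltalake` as in the original snippet. Note that the traceback above also lists an existing table schema (all `timestamp[us]` plus `__index_level_0__`) at the target path, so the new dtypes would still have to agree with that table, or the table's schema would have to be replaced.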

ion-elgreco closed this as not planned Jan 20, 2024

Mxater commented Jan 22, 2024

I can't find a way to instantiate the pandas DataFrame with a dtype for each column. Can you help me with that?
