Rust writer should raise when decimal types are incompatible (currently writers and puts table in invalid state) #2275

Closed
neo4py opened this issue Mar 11, 2024 · 2 comments · Fixed by #2330
Assignees
Labels
binding/rust Issues for the Rust crate bug Something isn't working

Comments

@neo4py

neo4py commented Mar 11, 2024

Environment: PyPI deltalake 0.16.0

Delta-rs version: deltalake 0.16.0
Binding: Python

Environment:

  • Cloud provider: AWS
  • OS: Linux
  • Other:

Bug

What happened:
A string value from the source is converted to a Python decimal.Decimal using Context.create_decimal(str).
When this value is written to a Delta table using write_deltalake, the column type is created as a decimal with the exact precision of the incoming value, for example decimal(2,1). The next time I try to append a value with a higher precision, it throws the error: Parser error: parse decimal overflow
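To make the precision mismatch concrete, here is a minimal sketch that derives a (precision, scale) pair from a decimal.Decimal using only the standard library. The helper name decimal_precision_scale is hypothetical, and this is not the writer's actual inference code; it only assumes, as described above, that the column type is taken from the exact digits of the first value written.

```python
from decimal import Decimal
from typing import Tuple


def decimal_precision_scale(value: Decimal) -> Tuple[int, int]:
    """Return the smallest (precision, scale) that holds `value` exactly.

    Hypothetical helper for illustration only; not part of deltalake.
    """
    _sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        # NaN and Infinity report a non-integer exponent and have no decimal type.
        raise ValueError(f"{value!r} is not a finite decimal")
    # Digits to the right of the decimal point.
    scale = max(-exponent, 0)
    # A positive exponent contributes trailing integer zeros (e.g. 1E+2 == 100).
    precision = max(len(digits) + max(exponent, 0), scale)
    return precision, scale


first = decimal_precision_scale(Decimal("8.6"))    # value from the first run
second = decimal_precision_scale(Decimal("30.6"))  # value from the second run
print(first, second)
```

Under that assumption, the first run fixes the column at two total digits, and the second value needs three, which is where the overflow comes from.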

What you expected to happen:
I expected the data to load without errors.
Perhaps schema merging (schema_mode='merge') could increase the precision when incoming data has a higher precision?

How to reproduce it:

import argparse
import pandas as pd
from deltalake.writer import write_deltalake
from decimal import (
    Clamped,
    Context,
    Inexact,
    Overflow,
    Rounded,
    Underflow
)

# Create decimal Context object
DDB_CONTEXT = Context(
    Emin=-128,
    Emax=126,
    prec=38,
    traps=[Clamped, Overflow, Inexact, Rounded, Underflow],
)

# Parse script arguments
parser = argparse.ArgumentParser("simple-writer")
parser.add_argument("str_dec", type=str)
args = parser.parse_args()

# Create Decimal object from string argument
var_dec = DDB_CONTEXT.create_decimal(args.str_dec)

# Create DataFrame
df = pd.DataFrame({"id": [1], "desc": ["abc"], "amount": [var_dec]})

# Write DataFrame to Delta table
write_deltalake(
    "/home/deltauser/deltalake_0_16/tables/test-table", df, mode='append', schema_mode='merge', engine='rust'
)

First execution: python ./simple-writer.py "8.6" will succeed
Second execution: python ./simple-writer.py "30.6" will fail with Parser error: parse decimal overflow

More details:

@neo4py neo4py added the bug Something isn't working label Mar 11, 2024
@ion-elgreco
Collaborator

ion-elgreco commented Mar 24, 2024

@neo4py I don't think we can increase the precision while doing mergeSchema. According to this discussion at delta: delta-io/delta#514 (comment), they don't allow it either, since it can cause loss of information.

I suggest you create the table first with the DeltaTable.create() API and explicitly specify the precision your decimal column needs.
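A rough sketch of that suggestion, assuming deltalake 0.16 and pyarrow are installed. The function names, the table path, and the decimal(38, 10) choice are illustrative assumptions, not values from this thread; pick a precision and scale that cover your data.

```python
from decimal import Decimal


def create_table_with_fixed_decimal(table_uri: str, precision: int = 38, scale: int = 10):
    """Create an empty Delta table whose `amount` column has an explicit decimal type.

    Hypothetical helper; precision and scale defaults are assumptions.
    """
    # Third-party imports are local so the module can be imported without them.
    import pyarrow as pa
    from deltalake import DeltaTable

    schema = pa.schema(
        [
            pa.field("id", pa.int64()),
            pa.field("desc", pa.string()),
            pa.field("amount", pa.decimal128(precision, scale)),
        ]
    )
    # mode="error" fails if a table already exists at this location.
    DeltaTable.create(table_uri, schema=schema, mode="error")
    return schema


def append_amount(table_uri: str, schema, amount: Decimal) -> None:
    """Append one row, building the Arrow table with the declared schema."""
    import pyarrow as pa
    from deltalake import write_deltalake

    # Passing the schema avoids inferring decimal(2,1) from a value like 8.6.
    rows = pa.Table.from_pylist(
        [{"id": 1, "desc": "abc", "amount": amount}], schema=schema
    )
    write_deltalake(table_uri, rows, mode="append", engine="rust")


if __name__ == "__main__":
    uri = "./tables/test-table"  # hypothetical path
    table_schema = create_table_with_fixed_decimal(uri)
    append_amount(uri, table_schema, Decimal("8.6"))
    append_amount(uri, table_schema, Decimal("30.6"))
```

Building the Arrow table with the declared schema, rather than passing the pandas DataFrame directly, keeps every write on the same decimal type regardless of the digits in an individual value.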

@ion-elgreco ion-elgreco closed this as not planned Mar 24, 2024
@ion-elgreco
Collaborator

ion-elgreco commented Mar 24, 2024

Actually, it is a bug in the Rust writer. I didn't see that the second write action succeeds and puts the table in an invalid state. That needs to be handled, and the writer should raise before writing.
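Until the writer raises on its own, a caller-side guard along these lines can reject an incompatible value before anything is written. This is a standard-library sketch of the idea only; ensure_fits_decimal is a hypothetical name, and it does not reflect how the Rust writer performs or will perform the check.

```python
from decimal import Decimal


def ensure_fits_decimal(value: Decimal, precision: int, scale: int) -> None:
    """Raise ValueError if `value` cannot be stored exactly in decimal(precision, scale).

    Hypothetical caller-side guard; not part of deltalake.
    """
    _sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        raise ValueError(f"{value!r} is not a finite decimal")
    # Digits needed after and before the decimal point.
    value_scale = max(-exponent, 0)
    integer_digits = max(len(digits) + exponent, 0)
    if value_scale > scale:
        raise ValueError(
            f"{value} needs scale {value_scale}, column allows {scale}"
        )
    if integer_digits > precision - scale:
        raise ValueError(
            f"{value} needs {integer_digits} integer digits, "
            f"decimal({precision},{scale}) allows {precision - scale}"
        )


# The table from the reproduction was created as decimal(2,1).
ensure_fits_decimal(Decimal("8.6"), 2, 1)  # fits, returns None
try:
    ensure_fits_decimal(Decimal("30.6"), 2, 1)
except ValueError as exc:
    print(f"rejected before write: {exc}")
```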

@ion-elgreco ion-elgreco reopened this Mar 24, 2024
@ion-elgreco ion-elgreco changed the title Decimal overflow error with schema_mode=merge in Python deltalake 0.16.0 Rust writer should raise when decimal types are incompatible (currently writers and puts table in invalid state) Mar 24, 2024
@ion-elgreco ion-elgreco added the binding/rust Issues for the Rust crate label Mar 24, 2024
@ion-elgreco ion-elgreco self-assigned this Mar 24, 2024