
fix(python): fixed large_dtype to schema convert #2635

Merged (2 commits) Jun 29, 2024

Conversation

@sherlockbeard (Contributor) commented Jun 29, 2024

Description

Added large_dtype handling to the schema conversion.

closes #2374


@github-actions github-actions bot added the binding/python Issues for the Python package label Jun 29, 2024
@ion-elgreco ion-elgreco enabled auto-merge (squash) June 29, 2024 12:32
@rtyler rtyler disabled auto-merge June 29, 2024 21:01
@rtyler rtyler merged commit d8a244f into delta-io:main Jun 29, 2024
10 checks passed
@chitralverma (Contributor)

@rtyler I'm facing the same issue with merge instead of write.
Can this option be added to merge as well?

@ion-elgreco (Collaborator)

@chitralverma what exactly are you facing?

@hitesh1997 commented Sep 18, 2024

@ion-elgreco
The issue arises when we perform a merge operation like the one below:

(
    df.write_delta(
        delta_table_path,
        mode="merge",
        delta_merge_options={
            "predicate": "s.col1 = t.col1",
            "source_alias": "s",
            "target_alias": "t",
        },
        storage_options=delta_storage_options,
    )
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute()
)

Here, delta_storage_options is:

delta_storage_options = {
    "aws_access_key_id": aws_key_id,
    "aws_secret_access_key": aws_secret_key,
    "aws_endpoint": aws_endpoint,
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}

The above code works when run locally without specifying storage_options.

When we run it on AWS, we get this error:

Error writing to Delta Table: Generic DeltaTable error: **type_coercion**
caused by
Error during planning: Failed to coerce then ([LargeList(Field { name: "item", data_type: LargeUtf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), LargeList(Field { name: "item", data_type: LargeUtf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), List(Field { name: "item", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} })]) and else (None) to common types in CASE WHEN expression

Our Delta table schema is shown below; it contains columns of type List:

schema = pa.schema([
    ('col1', pa.string()),
    ('col2', pa.string()),
    ('col3', pa.string()),
    ('col4', pa.string()),
    ('col5', pa.string()),
    ('col6', pa.string()),
    ('col7', pa.string()),
    ('col8', pa.string()),
    ('col9', pa.string()),
    ('col10', pa.list_(pa.string())),
    ('col11', pa.list_(pa.string())),
    ('col12', pa.string()),
    ('col13', pa.float64()),
])


@ion-elgreco (Collaborator)

@hitesh1997 please have a look at this issue: #2851

@hitesh1997

Hi @ion-elgreco,

The issue was fixed by downgrading from deltalake==0.19.1 to 0.18.2.

@ion-elgreco (Collaborator)

@hitesh1997 0.18.2 just buries the issue by always downcasting to normal types; you could achieve the same thing in 0.20 by manually passing large_dtypes=False.

Successfully merging this pull request may close these issues.

write_deltalake identifies large_string as datatype even though string is set in schema