write_deltalake is not creating checkpoints #1815

Closed
yefetBenTili opened this issue Nov 7, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@yefetBenTili

yefetBenTili commented Nov 7, 2023

Delta-rs version: 0.10.0

Binding:

Environment:
Cloud provider: AWS
OS: macOS
Other:


We have a Delta Lake table on S3 with over 2 TB of data, which we write to daily using write_deltalake (writing new partitions every day with partition filters).

We noticed a significant decline in read performance after a few weeks. On further investigation, I discovered that no checkpoint files were being written. The table currently has over 4000 transaction JSON files and no checkpoint file.

As far as I know, Delta's default behavior includes checkpointing after the 10th version. Is there a way to enforce this or trigger it manually?

    write_deltalake(
        df,
        mode="overwrite",
        schema=config.persrec_history_schema,
        storage_options={"AWS_S3_ALLOW_UNSAFE_RENAME": "True"},
        partition_by=[*partition_dict.keys()],
        partition_filters=partiton_filters,
    )
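
One way to confirm the symptom described above is to list the table's `_delta_log` directory and count commit files against checkpoint files. The following is a minimal sketch, assuming a local copy or mount of the log directory (for S3 you would list object keys instead); the function name `summarize_delta_log` and the returned keys are hypothetical. It relies only on the Delta log naming convention: 20-digit zero-padded `<version>.json` commits and `<version>.checkpoint.parquet` (or multi-part) checkpoints.

```python
import re
from pathlib import Path

# Delta log naming convention: 20-digit zero-padded version numbers.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
# Single-file or multi-part checkpoint files.
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(\.\d{10}\.\d{10})?\.parquet$")


def summarize_delta_log(log_dir):
    """Count commits and checkpoints in a local _delta_log directory.

    Hypothetical helper for illustration; not part of the deltalake package.
    """
    commits = []
    checkpoints = set()
    for entry in Path(log_dir).iterdir():
        match = COMMIT_RE.match(entry.name)
        if match:
            commits.append(int(match.group(1)))
            continue
        match = CHECKPOINT_RE.match(entry.name)
        if match:
            checkpoints.add(int(match.group(1)))

    latest_checkpoint = max(checkpoints) if checkpoints else None
    if latest_checkpoint is None:
        # Without a checkpoint, a reader has to replay every commit file.
        to_replay = len(commits)
    else:
        to_replay = sum(1 for v in commits if v > latest_checkpoint)

    return {
        "commit_count": len(commits),
        "latest_commit": max(commits) if commits else None,
        "latest_checkpoint": latest_checkpoint,
        "commits_since_checkpoint": to_replay,
    }
```

A large `commits_since_checkpoint` value with `latest_checkpoint` set to `None` would match the situation reported here, where readers must replay thousands of JSON commits to reconstruct the table state.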

@djouallah

I use this

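    from deltalake import DeltaTable

    dt = DeltaTable(Path_Delta, storage_options=storage_options)
    # Once the table has accumulated enough files: compact, vacuum, then checkpoint.
    if len(dt.file_uris()) >= 9:
        dt.optimize.compact()
        dt.vacuum(retention_hours=172, dry_run=False, enforce_retention_duration=False)
        dt.create_checkpoint()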

@ion-elgreco
Collaborator

@yefetBenTili we are already tracking this here: #913

@ion-elgreco
Collaborator

Closing it since we track it in #913

@slanton-a

I saw the conversation on the other issue, but I still don't understand. Should I expect write_deltalake to create a checkpoint automatically, or do I need to call create_checkpoint manually?

@djouallah

djouallah commented Nov 4, 2024

@slanton-a With the latest version, checkpointing is automatic by default, every 100 commits.
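
For older versions that do not checkpoint automatically, a manual trigger after each write is one option. The following is a rough sketch, not the library's own mechanism: the helper name `checkpoint_if_due` and the `interval` parameter are hypothetical, and it assumes it is called after every commit so that no multiple of the interval is skipped. It only uses the `version()` and `create_checkpoint()` methods of `DeltaTable`.

```python
def checkpoint_if_due(table, interval=100):
    """Create a checkpoint when the table version is a multiple of `interval`.

    `table` is expected to behave like deltalake.DeltaTable, i.e. expose
    version() and create_checkpoint(). Hypothetical helper for illustration.
    Returns True if a checkpoint was requested, False otherwise.
    """
    if interval <= 0:
        raise ValueError("interval must be a positive integer")
    version = table.version()
    if version > 0 and version % interval == 0:
        table.create_checkpoint()
        return True
    return False


# Usage sketch (table_uri and storage_options are placeholders):
#   from deltalake import DeltaTable
#   dt = DeltaTable(table_uri, storage_options=storage_options)
#   checkpoint_if_due(dt, interval=100)
```

Because the check is an exact modulo, a caller that writes several commits between calls could miss a due checkpoint; comparing against the version of the last checkpoint would be more robust in that case.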
