refactor: perform bulk deletes during metadata cleanup #1763

Merged

Conversation

cmackenzie1
Contributor

@cmackenzie1 cmackenzie1 commented Oct 23, 2023

Description

In addition to doing bulk deletes, I removed what seems like (at least to me) unnecessary code. At its core, files are considered up for deletion when their last_modified time is older than the cutoff time AND their version is less than the specified version (usually the latest version).
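
The deletion rule described above can be sketched as a small predicate. This is only an illustration using the standard library: `LogFile`, `is_expired`, `cutoff`, and `until_version` are hypothetical names chosen for the sketch, not identifiers taken from the crate.

```rust
use std::time::SystemTime;

/// Hypothetical stand-in for the metadata of one log file.
struct LogFile {
    version: i64,
    last_modified: SystemTime,
}

/// A file is eligible for deletion only when BOTH conditions hold:
/// it was last modified before `cutoff`, and its version is below `until_version`.
fn is_expired(file: &LogFile, cutoff: SystemTime, until_version: i64) -> bool {
    file.last_modified < cutoff && file.version < until_version
}
```

Under these assumptions, a file at the latest version is never removed, however old it is, and a recently modified file is kept even if its version is low.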

Related Issue(s)

Documentation

@github-actions github-actions bot added binding/rust Issues for the Rust crate rust labels Oct 23, 2023
@rtyler rtyler added this to the Rust v0.17 milestone Oct 25, 2023
@rtyler
Member

rtyler commented Oct 25, 2023

@cmackenzie1 I looked this over and thought "coo coo" but I didn't like the intermediate iterators, so I put my thinking cap on and wondered if we could just feed streams into streams (sup dawg).

I haven't done any testing other than running the existing test suite, but I think this works. If you don't like it, feel free to revert my commit and we can discuss more 😄

@rtyler rtyler force-pushed the cole/issue-1761-bulk-deletes branch from 7d1140d to 1847b90 Compare October 25, 2023 06:30
Contributor Author

@cmackenzie1 cmackenzie1 left a comment

Looks good! Thanks!

.list(Some(storage.log_path()))
.await?
// Pass along only the Ok results from storage.list
.filter(|res| futures::future::ready(res.is_ok()))
Contributor Author

This was the only reason I didn't take this approach first. If we are ok with not handling the error here that's fine with me!

Member

Given the nature of the function, I don't think we would want to halt execution on a single error, would we?

Assuming that's correct, I will modify the code to log errors but continue processing.
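
As a rough illustration of "log errors but continue processing", here is a synchronous, standard-library-only analogue that uses an iterator in place of the async stream. The function name `ok_or_log` and the use of `String` for both the path and the error are assumptions made for the sketch; the actual change operates on the stream returned by `storage.list`.

```rust
/// Keep the Ok values from a listing and log each error instead of aborting.
/// `String` stands in for both the path type and the error type (assumptions).
fn ok_or_log<I>(listing: I) -> Vec<String>
where
    I: IntoIterator<Item = Result<String, String>>,
{
    listing
        .into_iter()
        .filter_map(|res| match res {
            // Pass successful entries along unchanged.
            Ok(path) => Some(path),
            // Report the failure on stderr, then drop it so processing continues.
            Err(err) => {
                eprintln!("skipping entry that failed to list: {}", err);
                None
            }
        })
        .collect()
}
```

Compared with filtering on `is_ok()` alone, this variant still skips failed entries but leaves a trace of each one, so a partially failed listing is visible afterwards.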

cmackenzie1 and others added 2 commits October 25, 2023 18:53
In addition to doing bulk deletes, I removed what seems like (at least to me)
unnecessary code. At its core, files are considered up for deletion
when their last_modified time is older than the cutoff time AND the version
is less than the specific version (usually the latest version).
…aning up expired logs

This change builds on @cmackenzie1's work and feeds the list stream directly into
the delete_stream with a predicate function to identify paths for deletion
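
The shape of that change, a listing fed through a predicate directly into a bulk delete with no intermediate collection, might look roughly like the following. It is a synchronous sketch with standard-library iterators rather than async streams; `Entry`, `delete_all`, and `cleanup` are hypothetical names, and the predicate is simplified to the version check only.

```rust
/// Hypothetical listing entry: a path plus the version parsed from it.
struct Entry {
    path: String,
    version: i64,
}

/// Hypothetical bulk delete: consumes paths lazily and returns how many it handled.
/// A real implementation would issue delete requests instead of printing.
fn delete_all(paths: impl Iterator<Item = String>) -> usize {
    paths.inspect(|p| println!("deleting {}", p)).count()
}

/// Pipe the listing through the predicate straight into the bulk delete,
/// without collecting an intermediate Vec of candidates.
fn cleanup(listing: impl Iterator<Item = Entry>, until_version: i64) -> usize {
    let expired = listing
        .filter(move |e| e.version < until_version)
        .map(|e| e.path);
    delete_all(expired)
}
```

Because both stages are lazy, each entry is examined and handed to the delete step one at a time, which is the property the stream-into-stream approach is after.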
@rtyler
Member

rtyler commented Oct 26, 2023

Since I contributed here, I would like @wjones127 or @roeap to tap this one into main after a review.

Collaborator

@wjones127 wjones127 left a comment

Looks good 👍

@rtyler rtyler merged commit f9b7080 into delta-io:main Oct 30, 2023
21 checks passed
Labels
binding/rust Issues for the Rust crate rust
Development

Successfully merging this pull request may close these issues.

Use bulk deletes where possible
3 participants