Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: impl rename_noreplace with std::fs::hard_link by default #621

Merged
merged 5 commits into from
Jun 6, 2022

Conversation

wjones127
Copy link
Collaborator

@wjones127 wjones127 commented Jun 4, 2022

Description

I've been working in influxdata/object_store_rs, with the goal of switching to that object store implementation in delta-rs (see #610). It was suggested to simply use hard_link to implement rename_no_replace() easily cross platform.

Since this is causing issues in the Python writer right now (see #620), I'm thinking maybe I should patch this change immediately.

This PR provides a default implementation of rename_noreplace() using std::fs::hard_link, which should be supported on all platforms. It also enables Python writer tests for all platforms.

Related Issue(s)

Documentation

@wjones127
Copy link
Collaborator Author

All tests are passing, including Python on old glibc (💯).

I'm throwing out a fair amount of code in this though, so I'd welcome any critical feedback.

@wjones127 wjones127 requested review from houqp and mosyp June 4, 2022 01:06
Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea @wjones127! My current thinking on this is we could use hardlink as the fallback when native atomic rename is not available. Here is my reasoning for keeping renameat2:

  1. Hardlink followed by remove is not atomic, so in very rare edge cases, we could leave the old path behind
  2. It is less efficient compared to renameat2 because we need to issue 2 system calls instead of just one.

Both of the above shortcomings probably won't matter much in most of the production workloads because deltalake is designed for low frequency commits, but I think the tradeoff is worth it in this case because the added complexity is manageable (about 200+ LoC). Happy to hear your thought on this as well.

Ok(())
})
.await
.unwrap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we propagate the io error back to the caller instead?

@wjones127
Copy link
Collaborator Author

  1. Hardlink followed by remove is not atomic, so in very rare edge cases, we could leave the old path behind

This is already true for the S3 and GCS, right? So only Azure and local filesystems support this operation with that kind of atomicity. I think object_store_rs is aiming to support operations that are common across object stores, with similar guarantees across them. Why enforce that atomicity for some but not others?

And Windows is not atomic in the way we want right now; it does a check for a file and then runs replace, which could overwrite if there is a concurrent writer. So I think hard_link is an improvement in that case, at least.

  1. It is less efficient compared to renameat2 because we need to issue 2 system calls instead of just one.

True, but we only run this operation once per a commit, so it's doubtful there's a performance hit in the big picture.

In object_store_rs, I implemented copy_if_not_exists instead for these reasons. At the time, I thought that no (non-local) object store actually implements replace, but seeing that Azure does makes me rethink that a bit. I don't think it's critical that delta-rs has this; but there are marginal advantages to reliability and performance you've mentioned that could be implemented for local and Azure backends.

So maybe for time being, I'll do this just for Windows and old glibc Linux, and we'll think more about whether we really need this for other platforms.

@houqp
Copy link
Member

houqp commented Jun 5, 2022

This is already true for the S3 and GCS, right? So only Azure and local filesystems support this operation with that kind of atomicity. I think object_store_rs is aiming to support operations that are common across object stores, Why enforce that atomicity for some but not others?

This is what I have in mind as well. Just to clarify, I wasn't suggesting we add this to object_store_rs. I was suggesting we keep it within delta-rs assuming the maintenance burden is low. If that's not the case, I am cool with us switching to hardlink everywhere.

Like I mentioned in my previous comment, I don't think the overhead and correctness issue would matter in production. I am only suggested keeping it because I would like delta-rs to adhere to Rust and C++'s zero cost abstraction philosophy as much as we could.

I am also not the one who is writing the code, so I won't be blocking the merge for minor nitpicks like this :D

And Windows is not atomic in the way we want right now; it does a check for a file and then runs replace, which could overwrite if there is a concurrent writer. So I think hard_link is an improvement in that case, at least.

100% agree, I was trying to suggest that we use renameat2 if it's available, otherwise fallback to hardlink+remove. The current windows implementation is not multi-writer safe as mentioned in the fs module's documentation.

In object_store_rs, I implemented copy_if_not_exists instead for these reasons. At the time, influxdata/object_store_rs#14 (comment) no (non-local) object store actually implements replace, but seeing that Azure does makes me rethink that a bit. I don't think it's critical that delta-rs has this; but there are marginal advantages to reliability and performance you've mentioned that could be implemented for local and Azure backends.

The performance trade offs between PUT if absent and atomic rename is quite nuance. My current understanding of this is that for non-append only and single writer append only commits, PUT if absent is more performant and correct because it only requires a single API call. For highly concurrent append only commits, atomic rename is more performant because it avoids repetitive object writes on subsequent retries. Because delta lake is not designed for highly current write use-cases, I think we should go with PUT if absent whenever possible. This means even though both Azure and HDFS support atomic rename, we probably shouldn't be using it.

Therefore, I think we should probably rewrite the commit loop in delta-rs to use put if absent instead.

@wjones127
Copy link
Collaborator Author

Therefore, I think we should probably rewrite the commit loop in delta-rs to use put if absent instead.

Good points there; I haven't looked to far into that yet, but I might try that when writing the PR to switch to object_store_rs.

@wjones127 wjones127 changed the title feat: simplify rename_no_replace feat: impl rename_noreplace with std::fs::hard_link by default Jun 5, 2022
@wjones127 wjones127 marked this pull request as ready for review June 5, 2022 23:24
@wjones127 wjones127 requested a review from houqp June 5, 2022 23:24
Copy link
Member

@houqp houqp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@houqp houqp merged commit 7633fd6 into delta-io:main Jun 6, 2022
@wjones127 wjones127 deleted the simplify-rename branch June 6, 2022 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants