Implement atomic put_obj. #367
LGTM, although I'd like to resolve the flush() call question.
rust/src/storage/file/mod.rs
Outdated
f.flush().await?;

Ok(())
drop(f);
@houqp Does the f.flush() that you've added make sense now? Also, is it actually required to flush a handle, since it's being closed immediately?
Yes, this is still needed: the content could still be partially cached in the kernel without an explicit flush/sync.
After taking a second look at the docs, I think we should be using sync_all instead of flush.
Filed #368 as a follow-up.
I've also changed flush to sync_all.
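The flush versus sync_all distinction discussed above can be sketched as follows. This is a minimal illustration using the blocking std::fs API to stay dependency-free; the PR itself uses tokio, whose File type also provides sync_all. The function name write_durably is hypothetical and does not appear in the PR.

```rust
use std::fs::File;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: write `data` to `path` and ask the OS to persist it.
fn write_durably(path: &Path, data: &[u8]) -> io::Result<()> {
    let mut f = File::create(path)?;
    f.write_all(data)?;
    // `flush` only hands buffered bytes to the OS; the kernel may still
    // hold them in its page cache afterwards.
    f.flush()?;
    // `sync_all` asks the OS to write file data and metadata to the
    // underlying storage before returning.
    f.sync_all()?;
    // The handle is closed when `f` goes out of scope here.
    Ok(())
}
```

Closing the handle alone does not imply a sync, which is why the explicit call is kept even though the file is dropped right away.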
rust/src/storage/file/mod.rs
Outdated
Ok(())
drop(f);

match fs::rename(tmp_path, path).await {
@zijie0 Could you change this to use rename::atomic_rename(src, dst)?
I just noticed our atomic rename is not async, which could block the tokio runtime and lead to latency issues in production. Filed #369 as a follow-up for that.
Done.
Oh god, I made a very bad suggestion here :( I think the fs::rename you used here is the correct one. It does exactly what we need, i.e. it overwrites the destination if it already exists. Our atomic rename should be named something else, like rename_if_not_exists, so people won't confuse it with rename.
Docs for the built-in rename function: https://doc.rust-lang.org/std/fs/fn.rename.html. The tokio fs module should have an equivalent async version as well.
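To make the difference between the two rename behaviours concrete, here is a small sketch using only the standard library. rename_overwrite simply wraps std::fs::rename, which replaces an existing destination. rename_if_not_exists is a hypothetical no-clobber variant built on hard_link followed by remove_file; it is an assumption for illustration and not necessarily how the project's atomic_rename is implemented.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Replace `dst` with `src`, overwriting `dst` if it already exists.
fn rename_overwrite(src: &Path, dst: &Path) -> io::Result<()> {
    fs::rename(src, dst)
}

/// Hypothetical no-clobber move: fails with `AlreadyExists` if `dst` exists.
fn rename_if_not_exists(src: &Path, dst: &Path) -> io::Result<()> {
    // Creating a hard link fails when the destination is already present,
    // so an existing `dst` is never replaced.
    fs::hard_link(src, dst)?;
    // The link succeeded; drop the old name so only `dst` remains.
    fs::remove_file(src)
}
```

For put_obj the overwriting variant is the one wanted, since writing an object to an existing path should replace it.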
@houqp It seems that rename on Unix does have some atomicity issues:
If newpath already exists, it will be atomically replaced, so
that there is no point at which another process attempting to
access newpath will find it missing. However, there will
probably be a window in which both oldpath and newpath refer to
the file being renamed.
But other projects like tempfile use it anyway. I think that's because the temp file is transparent to the end user.
Yes, the main goal we are trying to achieve here is to avoid reading partially written data from newpath/targetpath. It's OK for both paths to exist and refer to the same file content, as long as the file content write itself is atomically served to the readers.
If @mosyp and you all agree, I may submit a new PR to use tokio::fs::rename in the put_obj method.
@zijie0 hey, I missed the notification. Sure, works for me!
Thank you @zijie0 for the fix
Description
Write to a temporary file and then rename it to implement an atomic put_obj in the fs backend.
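The write-then-rename approach described here could look roughly like the sketch below. It uses blocking std::fs calls rather than the tokio equivalents in the actual change, and the ".tmp" suffix and the cleanup on failure are assumptions for illustration only; the real code may name its temporary file and handle errors differently.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Sketch of an atomic put: readers of `path` see either the old
/// content or the complete new content, never a partial write.
fn put_obj(path: &Path, obj_bytes: &[u8]) -> io::Result<()> {
    // Assumed naming scheme: keep the temp file next to the target so
    // the rename stays within one filesystem.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut f = File::create(&tmp_path)?;
    f.write_all(obj_bytes)?;
    // Persist the bytes before the file becomes visible under `path`.
    f.sync_all()?;
    drop(f);

    match fs::rename(&tmp_path, path) {
        Ok(()) => Ok(()),
        Err(e) => {
            // Best-effort cleanup of the temp file (assumed behaviour).
            let _ = fs::remove_file(&tmp_path);
            Err(e)
        }
    }
}
```

A fixed ".tmp" suffix would collide if two writers target the same path at once; a unique suffix per write would avoid that.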
Related Issue(s)
put_obj implementation is not atomic #361
Documentation