Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement atomic put_obj. #367

Merged
merged 2 commits into from
Aug 10, 2021
Merged

Implement atomic put_obj. #367

merged 2 commits into from
Aug 10, 2021

Conversation

zijie0
Copy link
Contributor

@zijie0 zijie0 commented Aug 9, 2021

Description

Use write and rename file to implement atomic put_obj in fs backend.

Related Issue(s)

Documentation

Copy link
Contributor

@mosyp mosyp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although I'd like to resolve the flush() call question

Comment on lines 114 to 115
f.flush().await?;

Ok(())
drop(f);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@houqp Does the f.flush() that you've added makes sense now? Also, is it actually required to flush a handle since it's being closed immediately

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is still needed, the content could still be partially cached in kernel without an explicit flush/sync.

after I took a second look at the doc, I think we should be using sync_all instead of flush.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #368 as follow up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've also changed flush to sync_all.

@mosyp mosyp requested a review from houqp August 9, 2021 15:27
Ok(())
drop(f);

match fs::rename(tmp_path, path).await {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zijie0 could you change this to use rename::atomic_rename(src, dst)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed our atomic rename is not async, which could block the tokio runtime and lead to latency issues in production. filed #369 as a follow up for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh god, I made a very bad suggestion here :( I think the fs:rename you used here is the correct one. It does exactly what we needed, i.e. overwrite the destination if already exists. Our atomic rename should be named something else like rename_if_not_exists so people won't confuse it with rename.

Copy link
Member

@houqp houqp Aug 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc to the builtin rename function: https://doc.rust-lang.org/std/fs/fn.rename.html, the tokio fs module should have a equivalent async version as well.

Copy link
Contributor Author

@zijie0 zijie0 Aug 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@houqp It seems that rename on Unix does have some atomicity issue:

If newpath already exists, it will be atomically replaced, so
that there is no point at which another process attempting to
access newpath will find it missing. However, there will
probably be a window in which both oldpath and newpath refer to
the file being renamed.

But other projects like tempfile use it anyway. I think it's due to the temp file is transparent to the end user.

Copy link
Member

@houqp houqp Aug 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the main goal we are trying to achieve here is to avoid reading partially written data from newpath/targetpath. It's ok for both paths to exist and refer to the same file content as long as the file content write itself is atomically served to the readers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If @mosyp and you all agree, I may submit a new PR to use tokio::fs::rename in put_obj method.

Copy link
Contributor

@mosyp mosyp Aug 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zijie0 hey, I've missed the notification, sure, works for me!

@houqp houqp requested a review from mosyp August 9, 2021 20:20
@houqp houqp merged commit f381850 into delta-io:main Aug 10, 2021
@houqp
Copy link
Member

houqp commented Aug 10, 2021

Thank you @zijie0 for the fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants