Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Write validator definitions atomically #2338

Closed

Conversation

michaelsproul
Copy link
Member

@michaelsproul michaelsproul commented May 10, 2021

Issue Addressed

Closes #2159

Proposed Changes

Rather than trying to write the validator definitions to disk directly, use a temporary file called .validator_defintions.yml.tmp and then atomically rename it to validator_definitions.yml. This avoids truncating the primary file, which can cause permanent damage when the disk is full.

The same treatment is also applied to the validator key cache, although the situation is less dire if it becomes corrupted because it can just be deleted without the user having to reimport keys or resupply passwords.

Additional Info

Further Work

@michaelsproul michaelsproul added ready-for-review The code is ready for review A0 labels May 10, 2021
@michaelsproul michaelsproul requested a review from paulhauner May 10, 2021 01:45
@michaelsproul michaelsproul force-pushed the empty-validator-definitions branch from 9d60924 to 75da0ed Compare May 10, 2021 02:05
@michaelsproul
Copy link
Member Author

MVP for reproducing the bug:

use std::fs::File;
use std::io::Write;

fn main() {
    let mut f1 = File::create("/lol/good_file.txt").expect("creating good file failed");
    let mut f2 = File::create("/lol/junk_file.txt").expect("creating bad file failed");

    let megabyte_of_crap = vec![0xff; 1024 * 1024];

    f2.write_all(&megabyte_of_crap)
        .expect("writing bad file failed");

    f1.write_all(b"This is going to go missing :(")
        .expect("writing good file failed");
}

For use with a 1MB tempfs mounted on /lol/:

$ sudo mkdir /lol
$ sudo chmod 777 /lol
$ sudo mount -t tmpfs -o size=1M tmpfs /lol

If you create /lol/good_file.txt with some contents and then run the above program, the file will be truncated and left empty when the disk is filled by the write to junk_file.txt.


I used the same tempfs setup to simulate a validator datadir running out of space, and was not able to reproduce the bug using the updated code, implying that it is fixed.

@michaelsproul
Copy link
Member Author

I've pushed commit 9d60924 to the Pyrmont nodes and the VCs restarted twice without issue (once from old->new, once from new->new).

Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally looks good, but I have a comment regarding permissions :)

create_with_600_perms(&temp_path, &bytes).map_err(Error::UnableToWriteTempFile)?;

// With the temporary file created, perform an atomic rename.
fs::rename(&temp_path, &config_path).map_err(Error::UnableToRenameFile)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I haven't tested directly, I think this will technically be a breaking change since we will replace the current perms of the non-temp file with 600.

In effect, Lighthouse will continuously force its own file permissions upon that file. I'm not sure this is necessarily a bad thing, but perhaps it might hurt people with fancy setups?

Copying the permissions of the non-temp file to the temp file seems like the least-breaky approach, but I imagine that might get complex in a cross-platform world. Perhaps you've considered this already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. I'll look into this further tomorrow.

@ethDreamer if you have a sec and would like to chime in on the windows situation that would be great (but no worries if you're busy).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've found a good solution for the permissions that will work cross platform, but doesn't require us to block this PR on Windows support. The function write_file_via_temporary uses fs::copy to copy from primary file to temp (which copies permissions), writes to the temp file, and then moves it into place.

validator_client/src/key_cache.rs Outdated Show resolved Hide resolved
@paulhauner paulhauner added waiting-on-author The reviewer has suggested changes and awaits thier implementation. and removed ready-for-review The code is ready for review labels May 10, 2021
@michaelsproul michaelsproul added ready-for-review The code is ready for review and removed waiting-on-author The reviewer has suggested changes and awaits thier implementation. labels May 11, 2021
Copy link
Member

@paulhauner paulhauner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, nice fix!

There's a potential for some weirdness during concurrent execution, but that's not introduced by this PR nor do I think it poses a material risk.

Merge at will!

@paulhauner paulhauner added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels May 12, 2021
@michaelsproul
Copy link
Member Author

Thank you!

bors r+

bors bot pushed a commit that referenced this pull request May 12, 2021
## Issue Addressed

Closes #2159

## Proposed Changes

Rather than trying to write the validator definitions to disk directly, use a temporary file called `.validator_defintions.yml.tmp` and then atomically rename it to `validator_definitions.yml`. This avoids truncating the primary file, which can cause permanent damage when the disk is full.

The same treatment is also applied to the validator key cache, although the situation is less dire if it becomes corrupted because it can just be deleted without the user having to reimport keys or resupply passwords.

## Additional Info

* `File::create` truncates upon opening: https://doc.rust-lang.org/std/fs/struct.File.html#method.create
* `fs::rename` uses `rename` on UNIX and `MoveFileEx` on Windows: https://doc.rust-lang.org/std/fs/fn.rename.html
* UNIX `rename` call is atomic: https://unix.stackexchange.com/questions/322038/is-mv-atomic-on-my-fs
* Windows `MoveFileEx` is _not_ atomic in general, and Windows lacks any clear API for atomic file renames :(
   https://stackoverflow.com/questions/167414/is-an-atomic-file-rename-with-overwrite-possible-on-windows

## Further Work

* Consider whether we want to try a different Windows syscall as part of #2333. The `rust-atomicwrites` crate seems promising, but actually uses the same syscall under the hood presently: untitaker/rust-atomicwrites#27.
@bors bors bot changed the title Write validator definitions atomically [Merged by Bors] - Write validator definitions atomically May 12, 2021
@bors bors bot closed this May 12, 2021
@michaelsproul michaelsproul deleted the empty-validator-definitions branch May 12, 2021 03:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A0 ready-for-merge This PR is ready to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants