Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug with Lockfile sticking around #1341

Merged
merged 9 commits into from
Jun 5, 2024
Merged

Fix bug with Lockfile sticking around #1341

merged 9 commits into from
Jun 5, 2024

Conversation

bfops
Copy link
Collaborator

@bfops bfops commented Jun 5, 2024

Description of Changes

Fixes #1339.

I updated Config::save to use an atomic write, so we no longer have the race condition that the Lockfile was originally created to address.

Correspondingly, I've removed Config's usage of Lockfile entirely.

API and ABI breaking changes

Nope

Expected complexity level and risk

2

Testing

Describe any testing you've done, and any testing you'd like your reviewers to do,
so that you're confident that all the changes work as expected!

  • Test that spacetime identity add still works as expected
  • CI passes

std::fs::write(&temp_path, data)?;
std::fs::rename(&temp_path, file_path)?;
Ok(())
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to the fs_utils crate

@gefjon
Copy link
Contributor

gefjon commented Jun 5, 2024

I don't agree with removing the use of Lockfile - even with atomic writes, we can still get into situations where concurrent writers attempt to use the config leading to one of the sets of changes overwriting the other. This is admittedly less severe than the original bug, but it's still an incorrect behavior.

@bfops
Copy link
Collaborator Author

bfops commented Jun 5, 2024

I don't agree with removing the use of Lockfile - even with atomic writes, we can still get into situations where concurrent writers attempt to use the config leading to one of the sets of changes overwriting the other. This is admittedly less severe than the original bug, but it's still an incorrect behavior.

While this is true, I don't see any subcommands that spend a "meaningful" amount of time between reading the config and writing it; it seems unlikely that e.g. two parallel calls to spacetime identity will happen on the same machine. Even so, with the atomic solution, one will just clobber the other's changes rather than creating an invalid file. It's pretty minimal fallout that happens in a specific kind of edge case (the user actively trying to run multiple config-mutating commands in parallel).

Meanwhile, the Lockfiles are causing active issues, and even if we hold the lock for less time, they're still prone to erroneous "locking out" of a user who does a completely valid Ctrl-C at the wrong time. (Who among us hasn't run a command only to immediately realize it was the wrong one?) I claim this is, at least from a UX perspective, more incorrect than potentially losing a config update because the user ran multiple config-mutating things in parallel. The surface area for the Lockfile issue is all commands, regardless of whether they mutate the config.

If we were to address this remaining config-clobbering issue, I would actually push to just have save check whether the config file has been changed since the initial read, and fail in that edge case. But, given that this case seems pretty unlikely, I'm not compelled to fix it now.

}

#[doc(hidden)]
pub fn clone_for_test(&self) -> Self {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer needed since Config is back to being "plain ol' data"

@bfops bfops marked this pull request as ready for review June 5, 2024 20:12
@bfops bfops requested review from jdetter and gefjon June 5, 2024 20:13
@gefjon
Copy link
Contributor

gefjon commented Jun 5, 2024

If we were to address this remaining config-clobbering issue, I would actually push to just have save check whether the config file has been changed since the initial read, and fail in that edge case.

This is insufficient because the CLI may have already performed observable side-effects by the time you abort.

But, given that this case seems pretty unlikely, I'm not compelled to fix it now.

🤷

Copy link
Contributor

@gefjon gefjon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I've said, I don't agree with this direction, but I'm willing to defer to you on the design here. Code is clear and does the thing it's supposed to.

@bfops
Copy link
Collaborator Author

bfops commented Jun 5, 2024

If we were to address this remaining config-clobbering issue, I would actually push to just have save check whether the config file has been changed since the initial read, and fail in that edge case.

This is insufficient because the CLI may have already performed observable side-effects by the time you abort.

I agree with this as a general point, but I didn't see any obvious commands where this seemed like an unrecoverable loss. I also didn't look very hard - Are there specific side-effects you're thinking about / know of?

@bfops
Copy link
Collaborator Author

bfops commented Jun 5, 2024

As I've said, I don't agree with this direction, but I'm willing to defer to you on the design here. Code is clear and does the thing it's supposed to.

I appreciate it! But I'm pretty new here - let's get someone else to weigh in before I just override your objections.

@cloutiertyler
Copy link
Contributor

Coincidentally these types of problems were what databases were invented to solve.

I tend to agree with Phoebe on this one, but I would also not block this merge because I think this does fix 80% of the problem.

One potential way to fix the correctness issues would be to use file locks (the OS concept, not lock files) when reading and writing the file.

https://chatgpt.com/share/ce2f67f5-b277-4594-8f21-6c4235f4b288

@bfops bfops enabled auto-merge June 5, 2024 22:22
// We used to use `Lockfile` to prevent this from happening, but we had other issues with
// that approach (see https://github.com/clockworklabs/SpacetimeDB/issues/1339).
// We should eventually reintroduce `Lockfile`, with further fixes (see the TODO in
// `lockfile.rs`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that we should ever reintroduce Lockfile. I think we need a more general solution to the problem. The issue with Lockfile is that it may never be released.

Copy link
Contributor

@cloutiertyler cloutiertyler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bfops bfops disabled auto-merge June 5, 2024 22:28
@bfops bfops enabled auto-merge June 5, 2024 22:28
@bfops bfops added this pull request to the merge queue Jun 5, 2024
Merged via the queue into master with commit b06b2e5 Jun 5, 2024
6 checks passed
@bfops bfops deleted the bfops/fix-config-lock branch September 5, 2024 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(latest master) Issue with config lock file
3 participants