machine config: make write atomic #21857
As indicated in containers#21849, loading the machine config can flake/fail with an EOF JSON error indicating an incomplete file. Address the issue by atomically writing the config. This way, it is not possible to load an incomplete or partially written file. The lock can be acquired later on to sync state.

[NO NEW TESTS NEEDED] as it's a hard-to-hit race.

Fixes: containers#21849

Signed-off-by: Valentin Rothberg <[email protected]>
Cockpit tests failed for commit f8abd7f. @martinpitt, @jelly, @mvollmer please check.
LGTM
LGTM
This patch is correct and useful: it prevents corrupted files from being written if a process gets killed mid-write.
However, the reason we see this flake is that there are fundamental locking issues in the code that still need to be fixed.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: giuseppe, Luap99, vrothberg.
Can you elaborate further? What I noticed is that loading is done without holding the lock. Whether that's an issue or not depends on the consistency model, which I am not entirely sure of.
Reading without the lock held is always broken. How can we know the data is still accurate if we do not hold the lock?
That depends on the consistency model. As long as writes are atomic, it is valid to read without holding a lock, provided the data read is not assumed to still be accurate.
For querying the state, QEMU does a
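The argument above (atomic writes make lock-free reads safe, as long as the result is treated as a possibly stale snapshot) can be checked with a small experiment. In this hypothetical Go sketch, one goroutine repeatedly replaces a JSON state file via temp file plus rename while several goroutines read it without any lock; none of them should ever decode a partial file. The `state` type and all helper names are illustrative assumptions, fsync is omitted for brevity, and the sketch relies on POSIX rename semantics (it is not expected to behave the same on Windows).

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// state is a hypothetical stand-in for the machine state file.
type state struct {
	Version int    `json:"version"`
	Status  string `json:"status"`
}

// atomicWrite replaces path via temp file + rename (fsync omitted for brevity).
func atomicWrite(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".state-*")
	if err != nil {
		return err
	}
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		os.Remove(tmp.Name())
		return err
	}
	if err := tmp.Close(); err != nil {
		os.Remove(tmp.Name())
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// snapshot reads the file WITHOUT taking any lock. The result is always a
// complete document, but it may be stale by the time the caller uses it,
// so it is only suitable for display (e.g. a listing).
func snapshot(path string) (state, error) {
	var s state
	data, err := os.ReadFile(path)
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	return s, err
}

// runDemo races one writer against several lock-free readers and returns
// how many reads failed to decode.
func runDemo(writes, readers int) (int, error) {
	dir, err := os.MkdirTemp("", "snapshot-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "state.json")

	write := func(v int) error {
		data, err := json.Marshal(state{Version: v, Status: "running"})
		if err != nil {
			return err
		}
		return atomicWrite(path, data)
	}
	if err := write(0); err != nil {
		return 0, err
	}

	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		failures int
	)
	done := make(chan struct{})
	for i := 0; i < readers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-done:
					return
				default:
				}
				if _, err := snapshot(path); err != nil {
					mu.Lock()
					failures++
					mu.Unlock()
				}
			}
		}()
	}

	var werr error
	for v := 1; v <= writes && werr == nil; v++ {
		werr = write(v)
	}
	close(done)
	wg.Wait()
	return failures, werr
}

func main() {
	failures, err := runDemo(500, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println("reads that saw a partial file:", failures) // expected: 0
}
```

As the reply below points out, this only makes the read well-formed; it does nothing for callers that act on what they read.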
There is no consistency model here. Yes, you can read the file, sure, but the data you get must be considered incorrect.
Only QEMU does that, and Refresh() is unlocked (unless a caller locked it), which is the reason we see the flake in the first place: ls queries State(), which is also unlocked. For something like podman machine ls that is fine, I agree, and this patch will fix that. However, podman machine start is broken: it can easily start two VMs if you run it twice at the same time. Also, the only provider using Refresh() at all is QEMU, which means the other ones are even worse. Anyhow, there is no point in arguing this here; #21854 should fix most of the issues, I think.
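The podman machine start race described above is a check-then-act problem: two callers both read "not running" and both launch a VM. A common fix is to hold an exclusive lock across the read, the decision, and the state update. The Go sketch below illustrates that idea; it is not Podman's locking code (which this thread does not show) and does not claim to reflect #21854. It uses syscall.Flock from Go's standard library as a stand-in lock, so it only builds on Unix-like systems, and `vmState`, `startMachine` and the launch step are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"sync"
	"sync/atomic"
	"syscall"
)

// vmState is a hypothetical stand-in for the persisted machine state.
type vmState struct {
	Running bool `json:"running"`
}

var errAlreadyRunning = errors.New("machine already running")

// launches counts how many times the (simulated) VM launch step ran.
var launches int32

// withFileLock runs fn while holding an exclusive flock(2) on lockPath, so
// separate processes (or separate opens in one process) are serialized.
func withFileLock(lockPath string, fn func() error) error {
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)
	return fn()
}

// startMachine does check-then-act entirely under the lock: read the state,
// refuse if already running, otherwise "launch" and record the new state.
func startMachine(dir string) error {
	statePath := filepath.Join(dir, "state.json")
	return withFileLock(filepath.Join(dir, "machine.lock"), func() error {
		var s vmState
		data, err := os.ReadFile(statePath)
		switch {
		case err == nil:
			if err := json.Unmarshal(data, &s); err != nil {
				return err
			}
		case !errors.Is(err, os.ErrNotExist):
			return err
		}
		if s.Running {
			return errAlreadyRunning
		}
		atomic.AddInt32(&launches, 1) // hypothetical: launch the VM here
		out, err := json.Marshal(vmState{Running: true})
		if err != nil {
			return err
		}
		return os.WriteFile(statePath, out, 0o644)
	})
}

// raceStart calls startMachine n times concurrently and reports how many
// launches happened and how many calls were refused.
func raceStart(n int) (int32, int, error) {
	dir, err := os.MkdirTemp("", "machine-lock-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	atomic.StoreInt32(&launches, 0)

	errs := make([]error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			errs[i] = startMachine(dir)
		}(i)
	}
	wg.Wait()

	refused := 0
	for _, e := range errs {
		if errors.Is(e, errAlreadyRunning) {
			refused++
		} else if e != nil {
			return 0, 0, e
		}
	}
	return atomic.LoadInt32(&launches), refused, nil
}

func main() {
	launched, refused, err := raceStart(8)
	if err != nil {
		panic(err)
	}
	fmt.Printf("launched=%d refused=%d\n", launched, refused) // launched=1 refused=7
}
```

Because every access to the state file in this sketch happens under the lock, a plain os.WriteFile is enough here; unlocked readers such as a listing would still want the atomic write from the PR.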
Good to merge?
/lgtm
Merged commit 2d4ef6f into containers:main.