save transaction: database is locked on podman exec #20809
@mheon @edsantiago Have you seen this error before? @jpalus Any chance you can share what commands you were running? I agree that it looks like a race when running podman in parallel.
@Luap99 I guess I could, though they probably won't be of much use. These are pretty specific containers. From a high-level perspective:
The command that failed with the mentioned error was package installation -- after installation was done and
Ugh.... no, I haven't seen "database is locked" since May. But we do zero parallel testing.
@Luap99 Yes - this is why we disabled WAL mode, which was enough to calm CI. Evidently that was not sufficient. I think we may need some form of retry wrapper around SQLite operations, because that error should not be fatal - we ought to be able to spin until the DB is unlocked and continue without issue.
(As a bonus, we could investigate re-enabling WAL mode if we had such a wrapper.)
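For illustration only, a minimal sketch of what such a retry wrapper could look like in Go; `retryOnBusy`, the retry counts, and the detection of the busy error by its message are assumptions made up for this example, not podman's actual code:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// retryOnBusy re-runs op while it keeps failing with SQLite's
// "database is locked" error, sleeping briefly between attempts, and
// gives up after maxRetries so a genuine deadlock still surfaces in
// bounded time.
func retryOnBusy(op func() error) error {
	const maxRetries = 50
	const backoff = 100 * time.Millisecond

	var err error
	for i := 0; i < maxRetries; i++ {
		err = op()
		if err == nil || !strings.Contains(err.Error(), "database is locked") {
			return err
		}
		time.Sleep(backoff)
	}
	return fmt.Errorf("database still locked after %d attempts: %w", maxRetries, err)
}

func main() {
	// Simulate an operation that is busy twice before succeeding.
	calls := 0
	err := retryOnBusy(func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("database is locked")
		}
		return nil
	})
	fmt.Println(err, calls) // prints: <nil> 3
}
```

(As the thread concludes below, SQLite's own `_busy_timeout` achieves the same effect with less code.)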
And a new occurrence when I wanted to remove those containers:
Doesn't a retry just hide the real issue? At least reading https://www2.sqlite.org/cvstrac/wiki?p=DatabaseIsLocked, this looks like a logic bug in our code. Looking at all the transaction code paths, I found at least one call that does not call Commit() on the transaction. Line 1659 in 8387d2d
But this is a rare pod rm code path, so I find it unlikely that it is hit here.
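For context, the standard database/sql idiom that prevents exactly this class of bug is a deferred Rollback. A minimal sketch, with a made-up `updateState` function and table/column names rather than the code at the line referenced above:

```go
package state // illustrative package name

import (
	"database/sql"
	"fmt"
)

// updateState is hypothetical; the important part is `defer
// tx.Rollback()`. Rollback is a no-op once Commit has succeeded, but
// on every early-return path it closes the transaction. A Begin()
// with no matching Commit()/Rollback() keeps SQLite's write lock
// held, so other processes fail with "database is locked".
func updateState(db *sql.DB, id, newState string) error {
	tx, err := db.Begin()
	if err != nil {
		return fmt.Errorf("beginning save transaction: %w", err)
	}
	defer tx.Rollback() // no-op after a successful Commit

	if _, err := tx.Exec("UPDATE ContainerState SET State = ? WHERE ID = ?;", newState, id); err != nil {
		return fmt.Errorf("saving container state: %w", err)
	}
	return tx.Commit()
}
```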
I had 10+ podman run processes running in parallel for hours but never got any error; I also tried with exec. I think there might be more to the story here. One difference is that I am on x86_64 and you are on aarch64. Also, looking at performance during my stress testing, podman was IO bound, so I wonder if it has anything to do with the underlying fs/disk speed.
Checked on arm64/kernel 6.6.3 as well as x86/kernel 6.6.0 with the same outcome. I had more luck reproducing the latter error during:

```
for i in `seq 1 100`; do podman create docker.io/fedora; done
strace -o log -s1024 -f podman rm -a
```
Thanks, running with strace does indeed let me reproduce.
Only one process can write to the sqlite db at the same time; if another process tries to use it at that time, it fails and a "database is locked" error is returned. If this happens, sqlite should keep retrying until it can write. To do that we can just set the _busy_timeout option. A 100s timeout should be enough even on slower systems, but not too much in case there is a deadlock, so it still returns in a reasonable time.
[NO NEW TESTS NEEDED] I think we strongly need to consider some form of parallel stress testing to catch bugs like this.
Fixes containers#20809
Signed-off-by: Paul Holzinger <[email protected]>
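Concretely, with the mattn/go-sqlite3 driver used by podman's sqlite backend, the option goes into the connection string. A minimal sketch, where the file name is illustrative and 100000 ms matches the 100s from the commit message:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3" // registers the "sqlite3" driver
)

func main() {
	// _busy_timeout is in milliseconds and maps to sqlite3_busy_timeout():
	// while another writer holds the lock, SQLite itself keeps retrying
	// for up to this long before returning "database is locked".
	db, err := sql.Open("sqlite3", "file:state.db?_busy_timeout=100000")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}
```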
A little late to the party, but so far so good. Didn't notice any "database is locked" errors.
One strange issue occurred just now, this time with stock podman 4.8.0 without the fix and the sqlite db backend. Not sure if it's a hidden "database is locked" issue or wrong ordering somewhere.
That's probably separate; can you open a fresh bug for that? Probably missing a
Just got another instance of it with podman 4.8.1 which is supposed to fix the issue:

```
$ podman rm -f -a
3b804f42923c8b5674a20fc0a820d46d6a0afcbcd80883d3b7953c589a1d4c2e
8a19b804c4f766404697acf8a32e1065063802a13c04770564ba3faa1dd895c8
Error: cleaning up container 1e6a8fb3e45f5fc7139583632bff90cc048601cac11e28bf5e11816b77125e2b: unmounting container 1e6a8fb3e45f5fc7139583632bff90cc048601cac11e28bf5e11816b77125e2b storage: unmounting container 1e6a8fb3e45f5fc7139583632bff90cc048601cac11e28bf5e11816b77125e2b: saving container 1e6a8fb3e45f5fc7139583632bff90cc048601cac11e28bf5e11816b77125e2b state: beginning container 1e6a8fb3e45f5fc7139583632bff90cc048601cac11e28bf5e11816b77125e2b save transaction: database is locked
```

These 3 containers are slow to remove (lots of files, spinning storage), so I'm not really sure those transactions are really "short-lived".
Removal is done in steps; we hold the transaction lock only for the last part of it, when the container's storage has already been removed. The only thing under that lock should be deleting two rows from the database.
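A sketch of that ordering, with a hypothetical `removeStorage` helper and made-up table names standing in for podman's actual schema: the slow filesystem work runs first with no lock held, and only the two row deletions happen inside the transaction.

```go
package state // illustrative package name

import (
	"database/sql"
	"fmt"
)

// removeStorage stands in for the expensive on-disk cleanup, which
// can take minutes on spinning storage; it runs with no DB lock held.
func removeStorage(id string) error {
	// ... delete the container's layer files, mounts, etc. ...
	return nil
}

// removeContainer keeps the transaction - and with it SQLite's write
// lock - open only for the final two row deletions.
func removeContainer(db *sql.DB, id string) error {
	if err := removeStorage(id); err != nil {
		return err
	}

	tx, err := db.Begin()
	if err != nil {
		return fmt.Errorf("beginning removal transaction: %w", err)
	}
	defer tx.Rollback() // no-op after a successful Commit

	if _, err := tx.Exec("DELETE FROM ContainerConfig WHERE ID = ?;", id); err != nil {
		return err
	}
	if _, err := tx.Exec("DELETE FROM ContainerState WHERE ID = ?;", id); err != nil {
		return err
	}
	return tx.Commit()
}
```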
Issue Description
Upgraded to podman 4.8.0, did a `system reset`, and took the `sqlite` db backend for a spin. Started creation of 3 independent containers in parallel, followed by some setup steps done with `podman exec`, and one such invocation ended up with:

```
ERRO[0008] Container 68e95ee6822e1975764367770b5f9def2717de18a0624d71f1c857809303343d exec session cda744d36c8c72ce7f701cad851001f22e32432855dae12f1cda745bf4372aea error: saving container 68e95ee6822e1975764367770b5f9def2717de18a0624d71f1c857809303343d state: beginning container 68e95ee6822e1975764367770b5f9def2717de18a0624d71f1c857809303343d save transaction: database is locked
```
Steps to reproduce the issue
No reproducible steps I'm afraid. I assume it's a timing issue.
Describe the results you received
`podman exec` failed once with:

```
ERRO[0008] Container 68e95ee6822e1975764367770b5f9def2717de18a0624d71f1c857809303343d exec session cda744d36c8c72ce7f701cad851001f22e32432855dae12f1cda745bf4372aea error: saving container 68e95ee6822e1975764367770b5f9def2717de18a0624d71f1c857809303343d state: beginning container 68e95ee6822e1975764367770b5f9def2717de18a0624d71f1c857809303343d save transaction: database is locked
```
Describe the results you expected
No error should be present.
podman info output
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
Yes
Additional environment details
No response
Additional information
No response