overlay: lock staging directories #1916
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: giuseppe. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 6ebcbf6 to 46a3ae8.
I vendored in your PR and ran the podman tests. Tons of failures, all in seccomp, all timeouts. Too many failures to be a coincidence, I think.
Signed-off-by: Giuseppe Scrivano <[email protected]>
Sorry for wasting your time on this. I've updated the PR and ran the Podman tests: containers/podman#22573, which are green now.
Thanks! podman + composefs test now in progress, containers/podman#22425
Same thing: timeouts in seccomp-policy. Common factor seems to be podman-local (not remote) + sqlite (not boltdb) + fedora (not debian). I also vendored c-common, so maybe the bug is there?
Confirmed.

```console
# cat /etc/containers/storage.conf
[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"

[storage.options]
pull_options = {enable_partial_images = "true", use_hard_links = "false", ostree_repos="", convert_images = "true"}

[storage.options.overlay]
use_composefs = "true"

# go mod edit --replace github.com/containers/storage=github.com/giuseppe/storage@a5c63583a5f1ae8d9495716f2fd8f84755c64feb
...
# make
...
# bin/podman run --rm --seccomp-policy '' quay.io/libpod/alpine-with-seccomp:label ls
bin
dev
...
HANG!
```
Ah no, this is another issue in the PR. I've fixed it now. It happens only when the registry rejects the range request, and I am going to debug now why quay does that with the …
Opened a PR in c/image to address it: containers/image#2391
(Note to self: the history is #775 (comment), so this really needs to access other processes’ layers.)
The general intent of this does make sense to me; marking as “request changes” to make sure this is not merged prematurely.
store.go (outdated)

```go
if !rlstore.Exists(to) {
	return struct{}{}, ErrLayerUnknown

func (s *store) ApplyStagedLayer(args ApplyStagedLayerOptions) (*Layer, error) {
	layer, err := writeToLayerStore(s, func(rlstore rwLayerStore) (*Layer, error) {
```
How is this code path, updating an existing store, going to be used?
Thinking of c/image pulls, we need the layer, when it is unlocked, to either not exist, or to exist with the full contents. This seems to imply a WIP layer which does not contain contents; a concurrent pull of an image with the same layer would succeed, and allow using the image while the contents are still missing.
Is this for some non-image container-only layers, or something?
It exists only to be used from the `containers-storage` tool. `applydiff-using-staging-dir` was added to mirror `applydiff`, which applies the diff to an existing layer.
Isn’t the primary purpose of `containers-storage` to run tests? (And possibly to inspect existing broken stores, for which read-only operations are sufficient.)

It’s a bit disappointing to maintain a code path for the rare use case; to have no test coverage for the atomic create+apply code path we actually are going to exercise; and to pay for that with an extra lock/presence check on every `ApplyStagedLayer` call. Most of that is pre-existing, not the fault of this PR…

… but maybe we can leave `ApplyDiffFromStagingDirectory` around, then?

Or, well, change `applydiff-using-staging-dir` and the calling test to create the layer atomically, and hope (?) that there are no external users.
I’d still prefer the tests to exercise the atomic creation code path, and not to add the test-only one here.
I’m not currently sure that’s blocking for me.
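To make the concurrency requirement discussed above concrete, here is a minimal, self-contained sketch of the invariant the atomic create+apply path is meant to preserve; the types and method names are toy stand-ins, not the real `store`/`rwLayerStore` API. A layer becomes visible to other users of the store only once its contents are fully applied, so a concurrent pull never observes an empty work-in-progress layer.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// toyLayerStore is a stand-in for the real layer store, reduced to the one
// property under discussion: which layers are visible to other lock holders.
type toyLayerStore struct {
	mu     sync.Mutex
	layers map[string]bool // layer ID -> contents fully applied
}

// createAndApply holds the store lock across creation and diff application,
// so when the lock is released the layer either exists with full contents or
// does not exist at all.
func (s *toyLayerStore) createAndApply(id string, applyDiff func() error) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if err := applyDiff(); err != nil {
		return err // nothing is published; the layer simply never existed
	}
	s.layers[id] = true // publish only the fully populated layer
	return nil
}

func main() {
	s := &toyLayerStore{layers: map[string]bool{}}
	_ = s.createAndApply("good", func() error { return nil })
	err := s.createAndApply("bad", func() error { return errors.New("apply failed") })
	fmt.Println(s.layers, err) // map[good:true] apply failed
}
```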
drivers/overlay/overlay.go (outdated)

```go
parentStagingDir := filepath.Dir(stagingDirectory)
if filepath.Dir(parentStagingDir) != d.getStagingDir(id) {
```
Intuitively it seems to me that the API of the driver should always return the “top level staging” directory; how it organizes the insides into locks / contents / … is an internal matter of this subpackage.
Or at the very least, the layout needs to be thoroughly documented.
The staging directory returned now is the directory where the content for the layer will be stored.

A staging directory is something like `/var/lib/containers/storage/overlay/staging/12345/dir`, and the lock file is `/var/lib/containers/storage/overlay/staging/12345/staging.lock`.

A caller shouldn't care about anything except the directory where the files are expected to be written.
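For illustration only, a small sketch of the layout described above; the constant and helper names here are made up, not the driver’s actual identifiers. Given the per-layer content directory the driver hands back, the lock file sits one level up, inside the same numbered staging directory.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// stagingLockName is a hypothetical name for the lock file described above.
const stagingLockName = "staging.lock"

// lockPathFor derives the lock file for a staging content directory,
// assuming the .../staging/<id>/dir and .../staging/<id>/staging.lock layout.
func lockPathFor(contentDir string) string {
	stagingDir := filepath.Dir(contentDir) // .../overlay/staging/<id>
	return filepath.Join(stagingDir, stagingLockName)
}

func main() {
	dir := "/var/lib/containers/storage/overlay/staging/12345/dir"
	fmt.Println(lockPathFor(dir))
	// /var/lib/containers/storage/overlay/staging/12345/staging.lock
}
```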
My view is that the caller has no business reading any individual files either way; it essentially gets an ~opaque handle. (Now that the other APIs are being removed, `ApplyStagedLayerOptions` and `CleanupStagedLayer` both ask for a complete, presumably unmodified, `DriverWithDifferOutput`.)
That’s even more the case with ComposeFS, where the structure of the staged contents is no longer the traditional overlayFS filesystem.
Assuming the above way of thinking, I think it’s simpler and clearer for the overlay driver itself to structure it so that the “handle” being passed is the top-level directory; affecting anything in parent directories is unusual and unexpected. “Unusual and unexpected” is not automatically a blocker, but I think overlay is already complex and minimizing complexity is good. Or, at least, documenting the complexity, which is “the very least” thing I wish for.
Still outstanding, somehow.
```go
	return fmt.Errorf("%q is not a staging directory", stagingDirectory)
}

defer func() {
	if lock, ok := d.stagingDirsLocks[parentStagingDir]; ok {
```
With this, `ApplyDiffFromStagingDirectory` unlocks the lock on some error return paths, but not all. A caller can’t deal with that.
… but there are also various error paths in the callers. So maybe the staging area should only be unlocked when this succeeds?
Or maybe this does not matter?
Note to self: FIXME investigate.
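One conventional way to settle this, sketched below under assumptions (a plain `sync.Mutex` stands in for the real file lock, and the function names are made up): release the lock on every error return via a deferred check of the named error result, and hand ownership to the caller only on success, so callers never have to guess whether the lock is still held.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// applyFromStaging acquires the staging lock and guarantees a single rule:
// on any error return the lock has been released; on success the caller owns
// it until cleanup.
func applyFromStaging(lock *sync.Mutex, apply func() error) (retErr error) {
	lock.Lock()
	defer func() {
		if retErr != nil {
			lock.Unlock() // consistent: every error path releases the lock
		}
	}()
	return apply()
}

func main() {
	var staging sync.Mutex
	err := applyFromStaging(&staging, func() error { return errors.New("apply failed") })
	fmt.Println(err) // apply failed
	staging.Lock()   // would deadlock here if the error path had leaked the lock
	defer staging.Unlock()
	fmt.Println("lock was released on the error path")
}
```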
Well, FWIW, with this and the c/image PR the podman tests now pass.
Pushed a new version that addresses all the comments.
@mtrmac waiting on you.
@mohanboddu should the jira label automatically create the issue?
I'd like to get this into the next release; can we move it forward?
Just to unblock progress, not a careful review I’m afraid.
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
the callers in c/image were already replaced, so simplify the store API and drop the functions. Signed-off-by: Giuseppe Scrivano <[email protected]>
this is a preparatory patch to allow storing a lock file for each staging directory. Signed-off-by: Giuseppe Scrivano <[email protected]>
Force-pushed from 631b7b9 to 780f7c1.
extend the public API to allow non-blocking usage. Signed-off-by: Giuseppe Scrivano <[email protected]>
lock any staging directory while it is being used so that another process cannot delete it. Now the Cleanup() function deletes only the staging directories that are not locked by any other user. Closes: containers#1915 Signed-off-by: Giuseppe Scrivano <[email protected]>
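A minimal sketch of the behavior these two commits describe, under assumptions: the `staging.lock` file name follows the layout discussed earlier, and `golang.org/x/sys/unix` flock is only a stand-in for the real lock implementation. The idea is that Cleanup() tries each staging directory's lock non-blockingly and removes only the directories nobody else currently holds.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// cleanupStaging walks the staging root and removes only the staging
// directories whose lock can be acquired without blocking.
func cleanupStaging(stagingRoot string) error {
	entries, err := os.ReadDir(stagingRoot)
	if err != nil {
		return err
	}
	for _, e := range entries {
		dir := filepath.Join(stagingRoot, e.Name())
		lockPath := filepath.Join(dir, "staging.lock") // assumed layout, see above
		fd, err := unix.Open(lockPath, unix.O_RDWR, 0)
		if err != nil {
			continue // no lock file: skip rather than guess
		}
		// EWOULDBLOCK here means another process is still using this staging
		// directory, so leave it alone.
		if err := unix.Flock(fd, unix.LOCK_EX|unix.LOCK_NB); err != nil {
			unix.Close(fd)
			continue
		}
		err = os.RemoveAll(dir)
		unix.Close(fd) // releases the flock as well
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(cleanupStaging("/var/lib/containers/storage/overlay/staging"))
}
```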
Force-pushed from 780f7c1 to f1352df.
Thanks. Fixed the comments and pushed a new version.
Successful CI run with composefs in #22425, with latest (this morning's) main. Also a successful run this weekend but that was before the giant vendor merge. Either way, tentative LGTM from the passing-tests front.
/lgtm
```go
// If we're the first reference on the lock, we need to open the file again.
fd, err := openLock(l.file, l.ro)
if err != nil {
	l.rwMutex.Unlock()
```
In `readLock`, this must be `RUnlock`.
```go
// reader lock or a writer lock.
if err = lockHandle(l.fd, lType, true); err != nil {
	closeHandle(fd)
	l.rwMutex.Unlock()
```
In `readLock`, this must be `RUnlock`.
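To spell out the two review notes above in code, a cut-down sketch; the type and helper here are simplified stand-ins, not the real `lockfile` package. A function that enters with `RLock` must pair its failure exits with `RUnlock`, not `Unlock`.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// lockFile is a reduced stand-in for the real lock type, keeping only what is
// needed to show the RLock/RUnlock pairing.
type lockFile struct {
	rwMutex sync.RWMutex
	file    string
}

// readLock enters with the reader side held; on failure it must release that
// same side, and on success the read lock stays held until the caller unlocks.
func (l *lockFile) readLock() (*os.File, error) {
	l.rwMutex.RLock()
	fd, err := os.Open(l.file) // stand-in for the openLock helper shown above
	if err != nil {
		l.rwMutex.RUnlock() // not Unlock: only the read side is held
		return nil, err
	}
	return fd, nil
}

func main() {
	l := &lockFile{file: "/nonexistent/path"}
	if _, err := l.readLock(); err != nil {
		fmt.Println("readLock failed and released the read lock:", err)
	}
}
```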
```go
defer s.containerStore.stopWriting()

// putLayer requires the rlstore, rlstores, as well as s.containerStore (even if not an argument to this function) to be locked for write.
func (s *store) putLayer(rlstore rwLayerStore, rlstores []roLayerStore, id, parent string, names []string, mountLabel string, writeable bool, lOptions *LayerOptions, diff io.Reader, slo *stagedLayerOptions) (*Layer, int64, error) {
```
The read-only layer stores must not be locked on entry.
Fix locking bugs from #1916, and one more
lock any staging directory while it is being used so that another process cannot delete it.
Now the Cleanup() function deletes only the staging directories that are not locked by any other user.
Closes: #1915
Signed-off-by: Giuseppe Scrivano <[email protected]>