
overlay: lock staging directories #1916

Merged
merged 8 commits from the lock-staging-dir branch into containers:main on May 13, 2024

Conversation

@giuseppe (Member)

Lock any staging directory while it is in use so that another process cannot delete it.

Now the Cleanup() function deletes only the staging directories that are not locked by any other user.

Closes: #1915

Signed-off-by: Giuseppe Scrivano [email protected]
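
For orientation, a minimal sketch of the idea under discussion, using a plain flock(2) lock file per staging directory. The helper names (lockStagingDir, cleanupStagingDirs, lockPathFor) and the use of raw flock are assumptions for illustration only; the PR itself builds on pkg/lockfile rather than this code.

package stagingsketch

import (
    "os"
    "path/filepath"

    "golang.org/x/sys/unix"
)

// lockPathFor is a hypothetical helper: each staging directory gets a lock
// file stored alongside its contents.
func lockPathFor(stagingDir string) string {
    return filepath.Join(stagingDir, "staging.lock")
}

// lockStagingDir takes an exclusive lock for as long as the staging
// directory is in use; closing the returned file releases the lock.
func lockStagingDir(stagingDir string) (*os.File, error) {
    f, err := os.OpenFile(lockPathFor(stagingDir), os.O_CREATE|os.O_RDWR, 0o644)
    if err != nil {
        return nil, err
    }
    if err := unix.Flock(int(f.Fd()), unix.LOCK_EX); err != nil {
        f.Close()
        return nil, err
    }
    return f, nil
}

// cleanupStagingDirs mirrors what the PR describes for Cleanup(): it removes
// only the staging directories whose lock can be taken without blocking, and
// skips anything still locked by another process.
func cleanupStagingDirs(stagingRoot string) error {
    entries, err := os.ReadDir(stagingRoot)
    if err != nil {
        return err
    }
    for _, e := range entries {
        dir := filepath.Join(stagingRoot, e.Name())
        f, err := os.OpenFile(lockPathFor(dir), os.O_CREATE|os.O_RDWR, 0o644)
        if err != nil {
            continue
        }
        if err := unix.Flock(int(f.Fd()), unix.LOCK_EX|unix.LOCK_NB); err != nil {
            f.Close() // lock held by another user: leave the directory alone
            continue
        }
        removeErr := os.RemoveAll(dir)
        f.Close()
        if removeErr != nil {
            return removeErr
        }
    }
    return nil
}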

openshift-ci bot (Contributor) commented Apr 30, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago (Member)

I vendored in your PR and ran the podman tests: tons of failures, all in seccomp, all timeouts. Too many failures to be a coincidence, I think.

@giuseppe giuseppe force-pushed the lock-staging-dir branch from 46a3ae8 to a5c6358 Compare May 2, 2024 08:13
giuseppe added a commit to giuseppe/libpod that referenced this pull request May 2, 2024
@giuseppe (Member, Author) commented May 2, 2024

> I vendored in your PR and ran the podman tests: tons of failures, all in seccomp, all timeouts. Too many failures to be a coincidence, I think.

Sorry for wasting your time on this.

I've updated the PR and ran the Podman tests (containers/podman#22573), which are green now.

@edsantiago (Member)

Thanks! The podman + composefs test is now in progress: containers/podman#22425

@edsantiago (Member)

Same thing: timeouts in seccomp-policy. The common factor seems to be podman-local (not remote) + sqlite (not boltdb) + fedora (not debian).

I also vendored c-common, so maybe the bug is there?

@edsantiago (Member)

Confirmed.

# cat /etc/containers/storage.conf 
[storage]
driver = "overlay"
runroot = "/run/containers/storage"
graphroot = "/var/lib/containers/storage"

[storage.options]
pull_options = {enable_partial_images = "true", use_hard_links = "false", ostree_repos="", convert_images = "true"}

[storage.options.overlay]
use_composefs = "true"

# go mod edit --replace github.com/containers/storage=github.com/giuseppe/storage@a5c63583a5f1ae8d9495716f2fd8f84755c64feb
...
# make
...

# bin/podman run --rm --seccomp-policy '' quay.io/libpod/alpine-with-seccomp:label ls
bin
dev
...
HANG!

@giuseppe giuseppe force-pushed the lock-staging-dir branch from a5c6358 to ff3eccc Compare May 2, 2024 13:46
@giuseppe (Member, Author) commented May 2, 2024

Ah no, this is another issue in the PR; I've fixed it now. It happens only when the registry rejects the range request, and I am going to debug why quay does that with the quay.io/libpod/alpine-with-seccomp:labels image.


I opened a PR in c/image to address it: containers/image#2391

@giuseppe giuseppe added the jira label May 2, 2024
@mtrmac (Collaborator) left a comment

(Note to self: the history is #775 (comment) , so this really needs to access other processes’ layers.)

The general intent of this does make sense to me; marking as “request changes” to make sure this is not merged prematurely.

store.go Outdated
    if !rlstore.Exists(to) {
        return struct{}{}, ErrLayerUnknown
func (s *store) ApplyStagedLayer(args ApplyStagedLayerOptions) (*Layer, error) {
    layer, err := writeToLayerStore(s, func(rlstore rwLayerStore) (*Layer, error) {
@mtrmac (Collaborator)

How is this code path, updating an existing store, going to be used?

Thinking of c/image pulls, we need the layer, when it is unlocked, to either not exist or to exist with its full contents. This seems to imply a WIP layer which does not yet contain contents; a concurrent pull of an image with the same layer would succeed and allow using the image while the contents are still missing.

Is this for some non-image container-only layers, or something?

@giuseppe (Member, Author)

It exists only to be used from the containers-storage tool; applydiff-using-staging-dir was added to mirror applydiff, which applies the diff to an existing layer.

@mtrmac (Collaborator)

Isn’t the primary purpose of containers-storage to run tests? (And possibly to inspect existing broken stores, for which read-only operations are sufficient.)

It’s a bit disappointing to maintain a code path for the rare use case; to have no test coverage for the atomic create+apply code path we actually are going to exercise; and to pay for that with an extra lock/presence check on every ApplyStagedLayer call. Most of that is pre-existing, not the fault of this PR…

… but maybe we can leave ApplyDiffFromStagingDirectory around, then?

Or, well, change the applydiff-using-staging-dir and the calling test to create the layer atomically, and hope (?) that there are no external users.

@mtrmac (Collaborator)

I’d still prefer the tests to exercise the atomic creation code path, and not to add the test-only one here.

I’m not currently sure that’s blocking for me.

Resolved review thread: store.go
Comment on lines 2173 to 2174
    parentStagingDir := filepath.Dir(stagingDirectory)
    if filepath.Dir(parentStagingDir) != d.getStagingDir(id) {
@mtrmac (Collaborator)

Intuitively it seems to me that the API of the driver should always return the “top level staging” directory; how it organizes the insides into locks / contents / … is an internal matter of this subpackage.

Or at the very least, the layout needs to be thoroughly documented.

@giuseppe (Member, Author)

The staging directory returned now is the directory where the content for the layer will be stored.

A staging directory is something like /var/lib/containers/storage/overlay/staging/12345/dir, and the lock file is /var/lib/containers/storage/overlay/staging/12345/staging.lock.

A caller shouldn't care about anything except the directory where the files are expected to be written.
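
To illustrate the layout described above as a sketch, with the helper name lockFileFor and the placement in a package being assumptions rather than the PR's actual code:

package overlaysketch

import "path/filepath"

// The caller is handed the content directory, e.g.
//   /var/lib/containers/storage/overlay/staging/12345/dir
// while the lock file lives one level up, next to it:
//   /var/lib/containers/storage/overlay/staging/12345/staging.lock
const stagingLockFile = "staging.lock"

// lockFileFor derives the lock path from the content directory, mirroring
// the parentStagingDir := filepath.Dir(stagingDirectory) step in the diff
// excerpt quoted above.
func lockFileFor(contentDir string) string {
    parentStagingDir := filepath.Dir(contentDir) // .../overlay/staging/12345
    return filepath.Join(parentStagingDir, stagingLockFile)
}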

@mtrmac (Collaborator)

My view is that the caller has no business reading any individual files either way; it essentially gets an opaque handle. (Now that the other APIs are being removed, ApplyStagedLayerOptions and CleanupStagedLayer both ask for a complete, presumably unmodified, DriverWithDifferOutput.)

That’s even more the case with ComposeFS, where the structure of the staged contents is no longer the traditional overlayFS filesystem.

Assuming the above way of thinking, I think it’s simpler and clearer for the overlay driver itself to structure it so that the “handle” being passed is the top-level directory; affecting anything in parent directories is unusual and unexpected. “Unusual and unexpected” is not automatically a blocker, but I think overlay is already complex and minimizing complexity is good. Or, at least, documenting the complexity, which is “the very least” thing I wish for.

@mtrmac (Collaborator)

Still outstanding, somehow.

Resolved review threads: pkg/lockfile/lockfile_unix.go, pkg/lockfile/lockfile.go, store.go, and drivers/overlay/overlay.go (three threads).
        return fmt.Errorf("%q is not a staging directory", stagingDirectory)
    }

    defer func() {
        if lock, ok := d.stagingDirsLocks[parentStagingDir]; ok {
@mtrmac (Collaborator)

With this, ApplyDiffFromStagingDirectory unlocks the lock on some error return paths, but not all. A caller can’t deal with that.

@mtrmac (Collaborator)

… but there are also various error paths on callers. So maybe the staging area should only be unlocked when this succeeds?

Or maybe this does not matter?

Note to self: FIXME investigate.
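
As a sketch of one of the two options raised here (release the lock uniformly, regardless of outcome), reduced to the single method that matters; the wrapper name is hypothetical and this is not the code in the PR:

package overlaysketch

// unlocker stands in for whatever guards the staging directory; in the PR
// that is a lock from pkg/lockfile, reduced here to the one method needed.
type unlocker interface{ Unlock() }

// applyStagedLayer illustrates the uniform policy: the lock is released in a
// single deferred call, so success and every error path behave identically
// and callers never have to guess whether the lock is still held.
func applyStagedLayer(lock unlocker, apply func() error) error {
    defer lock.Unlock()
    return apply()
}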

@edsantiago (Member)

Well, FWIW, with this and the c/image PR the podman tests now pass.

@giuseppe giuseppe force-pushed the lock-staging-dir branch from ff3eccc to 631b7b9 Compare May 2, 2024 20:36
@giuseppe (Member, Author) commented May 3, 2024

Pushed a new version that addresses all the comments.

@rhatdan (Member) commented May 4, 2024

@mtrmac waiting on you.

@giuseppe (Member, Author) commented May 6, 2024

@mohanboddu should the jira label automatically create the issue?

@giuseppe (Member, Author) commented May 8, 2024

I'd like to get this into the next release; can we move it forward?

@mtrmac (Collaborator) left a comment

Just to unblock progress, not a careful review I’m afraid.

Resolved review threads: pkg/lockfile/lockfile.go, pkg/lockfile/lockfile_test.go, and store.go (two threads).
giuseppe added 5 commits May 10, 2024 15:00
Signed-off-by: Giuseppe Scrivano <[email protected]>
Signed-off-by: Giuseppe Scrivano <[email protected]>
the callers in c/image were already replaced, so simplify the store
API and drop the functions.

Signed-off-by: Giuseppe Scrivano <[email protected]>
this is a preparatory patch to allow storing a lock file for each
staging directory.

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe giuseppe force-pushed the lock-staging-dir branch from 631b7b9 to 780f7c1 Compare May 10, 2024 13:21
giuseppe added 2 commits May 10, 2024 15:42
extend the public API to allow non-blocking usage (see the API sketch after this commit list).

Signed-off-by: Giuseppe Scrivano <[email protected]>
lock any staging directory while it is being used so that another
process cannot delete it.

Now the Cleanup() function deletes only the staging directories that
are not locked by any other user.

Closes: containers#1915

Signed-off-by: Giuseppe Scrivano <[email protected]>
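
The first of these two commits ("extend the public API to allow non-blocking usage") is about the shape of the lock API rather than the overlay driver. As a sketch only, with hypothetical names (the method actually added to pkg/lockfile may be named differently), a non-blocking variant reports failure immediately instead of waiting:

package lockapisketch

import "sync"

// blockingLocker is the pre-existing style of API, reduced to two methods:
// Lock waits until the lock can be taken.
type blockingLocker interface {
    Lock()
    Unlock()
}

// tryLocker is the kind of extension the commit message describes: TryLock
// returns false right away when another user holds the lock, which is what
// lets Cleanup() skip staging directories that are still in use.
type tryLocker interface {
    blockingLocker
    TryLock() bool
}

// sync.Mutex satisfies this shape (TryLock exists since Go 1.18), so the
// sketch compiles; the real lock in the PR is file-based, not in-memory.
var _ tryLocker = (*sync.Mutex)(nil)
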
@giuseppe giuseppe force-pushed the lock-staging-dir branch from 780f7c1 to f1352df Compare May 10, 2024 13:42
@giuseppe (Member, Author)

> Just to unblock progress, not a careful review I'm afraid.

Thanks. Fixed the comments and pushed a new version.

@edsantiago (Member)

Successful CI run with composefs in #22425, with latest (this morning's) main. Also a successful run this weekend, but that was before the giant vendor merge. Either way, tentative LGTM from the passing-tests front.

@rhatdan (Member) commented May 13, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm label May 13, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 0cea595 into containers:main May 13, 2024
18 checks passed
    // If we're the first reference on the lock, we need to open the file again.
    fd, err := openLock(l.file, l.ro)
    if err != nil {
        l.rwMutex.Unlock()
@mtrmac (Collaborator)

⚠️ For readLock, this must be RUnlock.
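
In context, a minimal sketch of the pattern being reviewed (not the actual pkg/lockfile code): when the guarding sync.RWMutex was taken with RLock, the error paths have to undo it with RUnlock; the same point applies to the lockHandle failure path flagged below.

package lockfilesketch

import "sync"

// lockAndOpen sketches the shared lock helper: rw guards in-process state and
// readOnly says whether the caller asked for a shared (read) lock.
func lockAndOpen(rw *sync.RWMutex, readOnly bool, open func() error) error {
    if readOnly {
        rw.RLock()
    } else {
        rw.Lock()
    }
    if err := open(); err != nil {
        // Undo exactly what was taken above: RUnlock for a read lock,
        // Unlock for a write lock. Calling Unlock on a read-locked
        // RWMutex is a runtime error.
        if readOnly {
            rw.RUnlock()
        } else {
            rw.Unlock()
        }
        return err
    }
    return nil
}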

    // reader lock or a writer lock.
    if err = lockHandle(l.fd, lType, true); err != nil {
        closeHandle(fd)
        l.rwMutex.Unlock()
@mtrmac (Collaborator)

⚠️ For readLock, this must be RUnlock.

    defer s.containerStore.stopWriting()

// putLayer requires the rlstore, rlstores, as well as s.containerStore (even if not an argument to this function) to be locked for write.
func (s *store) putLayer(rlstore rwLayerStore, rlstores []roLayerStore, id, parent string, names []string, mountLabel string, writeable bool, lOptions *LayerOptions, diff io.Reader, slo *stagedLayerOptions) (*Layer, int64, error) {
@mtrmac (Collaborator)

The read-only layer stores must not be locked on entry.

@giuseppe (Member, Author)

@mtrmac thanks for the review, I've opened a PR: #1926

openshift-merge-bot bot added a commit that referenced this pull request May 22, 2024
Fix locking bugs from #1916, and one more