-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split RW and RO Layer locks per method #473
Conversation
This is now ready for a first review, I tried to implement with as low overhead as possible. |
Tests are failing? |
Yes, thanks for the hint. My main concern right now is that we give the responsibility over "locking the right way" to the consumer of the API. But since the option is off per default and not serialized into the config, it should be fine... |
LGTM |
Aren't we having a meeting for this topic on Friday? |
The intention of the meeting was more about locking enhancements per layer, see: #421 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not yet convinced about a dedicated option, mostly because I am scared of the increasing complexity of locking and the locking paths. But also reading the commit message
"This allows use cases like listing container images while pushing an image to a remote location in parallel."
I don't completely understand why we need a dedicated option; I'd expect read-only functions (e.g., listing containers) to always acquire reader locks.
Considering #420 (comment) and the comments in this issue, I begin to think that new{RO,}LayerStore
shouldn't lock at all - it doesn't add up to me to have this critical section conditionally as writer or reader since both execute the same code.
Maybe I just need a fresh coffee :^) In any case, @nalind is the expert of this code, so I want him to weigh in.
Ready for a new round of review, @vrothberg @nalind @rhatdan PTAL |
Can you explain what the code is intended to be doing now (and ideally update the commit message)? Why can't we drop the locks (see #473 (comment)) causing parts of the problem? |
We can, but we change the behavior of the cleanup on init. I don't think that this is harmful and created a dedicated PR for that, since it is not directly related to this topic: #493 |
@saschagrunert @vrothberg @nalind Bringing this back up for discussion. Where are we on this now? Should we go forward with this? |
@saschagrunert @vrothberg PING... |
We should reevaluate how far #493 brought us. |
Yes some details have change but the main approach would be the same. I ping you here when this is ready for review. |
containers/podman#4660 is now working, PTAL @rhatdan @vrothberg @nalind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@vrothberg @nalind PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really miss a detailed explanation why we need to go this path but cannot optimize reading code paths. Does setting ForceReadOnlyLayers
prevent us from writing? If I set ForceReadOnlyLayers=true
, I should not be able to write data. Can we give this guarantee?
It somehow feels like a symptomatic fix for a deeper issue in c/storage. Given we have a database, I expect reading accesses to proceed unless the specific data is currently being written to. Following this analogy, we are now marking the database as RO entirely because for some reason it doesn't perform well. So if I want to write to it at a later point, I had to create a new object which allows for writes without data corruption.
My understanding is, right now if to users of the storage want to use it at the same time, the second one in, needs to wait til the first is done. In the new design, if both users want to READ, then they are allowed in, and only writers would block. So a writer can not write while a reader exists. And readers can not read when a writer holds the lock. I guess there is a chance that writers would get frozen out if a continuous stream of readers come in. |
The main issue IMHO is that we decide the READ or WRITE access when creating the store and try to enforce that. So we are not fixing/optimizing the code paths but change the nature of the locks altogether. This only works for short-lived store objects (see https://github.com/containers/libpod/pull/4660/files) and is not generally useful for CRI-O (or the Assume I want to access the store as a reader: I have to create a new object, then do my work and subsequent writing accesses would require to create a new object. I would prefer to make c/storage smart enough to figure out when we need to acquire the locks as a reader or writer (and further localize the bottlenecks). It really feels like a symptomatic patch while the underlying bottlenecks in c/storage remain. |
Ok so Nalin and I talked about this further, and the problem is each Runtime when it starts takes a Read/Write lock, even if they do not need it. So when you are pushing the image, you only have a read/lock, but when |
I preferred this ^ approach, yes. It's transparent to consumers of c/storage and seems to tackle the root cause. |
@kolyshkin I would love to have you review this PR. |
I'm kind of in a trial and error mode now with the Podman PR containers/podman#7603. I need the integration/system tests there to see if it works, because the issue is not reproducible on my machine. I'll keep you posted, this can take some iterations. |
The Podman PR as well as this one is now green, but I have to verify that my changes logically make sense… |
@saschagrunert Just to move the boudary a little further, could you try a PR against CRI-O to see how that goes. |
Signed-off-by: Sascha Grunert <[email protected]>
Yep, opened the PR there: cri-o/cri-o#4339 |
@vrothberg @nalind @TomSweeneyRedHat @mtrmac @mrunalp This PR is now passing tests in Podman and CRI-O, I believe it is time to give it another review. |
@saschagrunert , is your work intended to fix issues like: |
Accessing layers in RO or RW mode is now decided in the dedicated method which increases the flexibility in terms of locking. This allows use cases like listing container images while pushing an image to a remote location in parallel. Signed-off-by: Sascha Grunert <[email protected]> Signed-off-by: Sascha Grunert <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kolyshkin I fixed your mentioned review points. PTAL
@saschagrunert , sorry for double posting but I think it is important to know whether your PR is suppose to fix issues like containers/podman#8667 If not, then could you clarify what use cases should be considered supported? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, as a stranger to the codebase, I’d appreciate fairly detailed comments about the design: What do the various locks protect, what is authoritative data vs. caches, and so on and so on.
- The concurrent maintenance of two stores seems risky to me, see elsewhere. Wouldn’t it make sense to only have an one-directional “upgrade”, where the
store
either has a read-only layer store, or replaces it with a read-write, but never creates a read-write one again? - Having two stores means two caches = two reads of the data. Is that really worth it?
- If there has to be a separate
ShutdownRO
and the caller is expected to call it to benefit from it (and the code breaks if the caller calls RO after updates!), this is not completely transparent anyway. How are the library callers going to use it, then? If the code that creates/shuts down thestorage.Store
must know whether the store is RO or RW, it would seem much cleaner to me to have that code create thestorage.Store
in a RO or RW mode (and then c/storage itself can be in control of its invariants, e.g. refuse to do any write operations on a RO store). (Yes, I have noticed that this was considered about a year ago; as I said at the start, I’m a stranger to the code base and my opinion should not be authoritative — I can’t really understand the CRI-O argument in there, for example.)
@@ -1470,7 +1455,8 @@ func (r *layerStore) Modified() (bool, error) { | |||
} | |||
|
|||
func (r *layerStore) IsReadWrite() bool { | |||
return r.lockfile.IsReadWrite() | |||
// the mountsLockfile is only set for read-write enabled stores. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This is preexisting = non-blocking:) This looks like a important invariant; it should be documented with the structure field.
@@ -524,27 +530,6 @@ func (s *store) newLayerStore(rundir string, layerdir string, driver drivers.Dri | |||
return &rlstore, nil | |||
} | |||
|
|||
func newROLayerStore(rundir string, layerdir string, driver drivers.Driver) (ROLayerStore, error) { | |||
lockfile, err := GetROLockfile(filepath.Join(layerdir, "layers.lock")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newLayerStore
as a replacement for this will always create a RW lockfile. Is that desirable? It changes semantics of r.lockfile.RecursiveLock
== r.RecursiveLock
, which seems worrisome.
OTOH
storage/pkg/lockfile/lockfile.go
Lines 92 to 100 in 9f88983
if locker, ok := lockfiles[cleanPath]; ok { | |
if ro && locker.IsReadWrite() { | |
return nil, errors.Errorf("lock %q is not a read-only lock", cleanPath) | |
} | |
if !ro && !locker.IsReadWrite() { | |
return nil, errors.Errorf("lock %q is not a read-write lock", cleanPath) | |
} | |
return locker, nil | |
} |
@@ -1470,7 +1455,8 @@ func (r *layerStore) Modified() (bool, error) { | |||
} | |||
|
|||
func (r *layerStore) IsReadWrite() bool { | |||
return r.lockfile.IsReadWrite() | |||
// the mountsLockfile is only set for read-write enabled stores. | |||
return r.lockfile.IsReadWrite() && r.mountsLockfile != nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r.lockfile.IsReadWrite
is always true now, AFAICS.
func (s *store) LayerStore() (LayerStore, error) { | ||
s.graphLock.Lock() | ||
defer s.graphLock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously, this lock protected all of LayerStore
, including s.getGraphDriver
; now the lock is scoped to resetIfNeeded
; why is that sufficient? (It might be, but it’s not at all obvious to me — the lock’s scope is not documented.)
@@ -491,7 +491,10 @@ func (r *layerStore) saveMounts() error { | |||
return r.loadMounts() | |||
} | |||
|
|||
func (s *store) newLayerStore(rundir string, layerdir string, driver drivers.Driver) (LayerStore, error) { | |||
// newLayerStore creates a new layer storage based on the provided directories | |||
// and selected driver. The `withMountsLock` bool indicates if we want to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it really withMountsLock
or readWrite
? The latter seems semantically more useful, and might e.g. allow skipping the MkdirAll
calls below.
@@ -582,6 +593,7 @@ type store struct { | |||
autoNsMaxSize uint32 | |||
graphDriver drivers.Driver | |||
layerStore LayerStore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name this rwLayerStore
? Possibly document the fields, even if they are “a private cache for $method”.
@@ -582,6 +593,7 @@ type store struct { | |||
autoNsMaxSize uint32 | |||
graphDriver drivers.Driver | |||
layerStore LayerStore | |||
roLayerStore ROLayerStore | |||
roLayerStores []ROLayerStore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose the user calls:
s.Names(foo)
// PopulatesroLayerStore
, reads from disks.PutLayer(…)
// PopulateslayerStore
, reads + writes to disks.Names(foo)
What happens now? AFAICS:
s.graphLock.TouchedSince
is false (no.Shutdown*
), so the cachedroLayerStore
is used- The RO and RW layer stores share lock objects (
storage/pkg/lockfile/lockfile.go
Lines 92 to 100 in 9f88983
if locker, ok := lockfiles[cleanPath]; ok { if ro && locker.IsReadWrite() { return nil, errors.Errorf("lock %q is not a read-only lock", cleanPath) } if !ro && !locker.IsReadWrite() { return nil, errors.Errorf("lock %q is not a read-write lock", cleanPath) } return locker, nil } .Touch()
calls on locks,.Modified()
will befalse
- The RO layer store will continue reading from its cached data, regardless of modifications by the RW store.
(This is only by reading code I’m unfamiliar with, so I may well be missing something or making a stupid mistake.)
The main use case was to be able to list container images while another push operation in in progress. I think containers/podman#8667 could be solved when locking by layer and not the whole storage. But as folks already mentioned, this would mean a architectural rewrite of containers/storage and I'm not sure if we will get to this point anytime soon. |
I think if we put API compatibility aside for now, then we could create a store which is ready-only locked per default. Just some operations should then be able to lock the certain parts of the store. For example, storing a layer should only RW lock that part of the store. Other operations (read-only or read-write which do not affect the WIP layers) should be still possible. The API compatibility is the main burden here. c/storage is designed to cleanup the layers after storage shutdown, which will always lock the complete layer store for example.
No I don't think so and agree with your concern. The only reason to implement it like this is to stay compatible to current consumers of this project.
Manually, library consumers have to decide on their own if they want to shutdown with cleanup or without. For example a All in all I think we will have a hard time keeping the current API stable while improving the locking. We could probably start with a parallel implementation (which breaks the API) and integrate the enhancements mentioned by @mtrmac. If we speak only about refactoring the layer store, then the main strategy would be:
What do you think? |
Fine-grained locking is very difficult to get right and a common source of concurrency bugs. Having a per-layer lock, for instance, is subject to ABBA dead locks (assuming a process is allowed to lock multiple layers at once). One thing we must consider as well is backward compatibility. If we fundamentally change how c/storage works, older and newer versions cannot run on the same machine. I prefer to keep the coarse-grained locks unless there is no alternative. There may still be more space for optimization (see #420 (comment)) which does not entirely rule out lock monopolization. |
What happens with the current consistency fix-ups that write data on load (see … except that, well, that’s what the code is already doing: So… what’s the actual failure path this PR is trying to address? That |
I opened containers/image#1167 which is tackling the issue with a different perspective. Ultimately, solving the issue in storage is favorable but as shown here, it's very difficult. If callers need to worry about executions paths reading or writing data is likely causing future regressions as it's hard to get right. containers/image#1167 makes pushing a bit slower but it solves the problem of being network bound until unlocking the storage. |
The new storage options
ReadOnlyLayers
indicates that thelayer store should be locked for read only access. This allows use cases
like listing container images while pushing an image to a remote
location in parallel.
Closes #420