As discussed in #2254, the progress controller code has some known specific issues, in addition to it (and progress writing in general) being hard to reason about:
Trying to keep track of the state of a progress stream (i.e. whether one needs to be closed, which one is being used within a flightcontrol group, etc.) is difficult and involves tracking hierarchies of Contexts. See e.g. "goroutine leak on solves" (#2112) and "Add retry on image push 5xx errors" (#2043).
The progress.Controller is passed through descriptor handlers, which then get passed around all over the place through refs. This means events occurring on a single cache record need to synchronize with controller callbacks that originate from all over the codebase, which makes the above point significantly more convoluted.
All of the above makes the code involving progress (of which there is a lot) harder to change and more bug-prone.
There are a lot of possible approaches to improving this. One initial suggestion would be to centralize progress handling in a singleton ProgressManager object, which has a pub-sub API for publishing events and subscribing progress streams to them. So, as a more concrete example:
At the beginning of a build, a progress stream is registered with the ProgressManager
As the build progresses, it subscribes its stream to certain events it wants to show progress for. E.g. the pull code would, instead of setting up a progress controller in the descriptor handler, just call the ProgressManager and subscribe the current stream to any events for the descriptors it's creating refs for.
Event publishing would happen independently, e.g. when a descriptor is being de-lazied and actually pulled, the CacheManager would just publish those events to the ProgressManager, which would internally handle forwarding those events to any open progress streams subscribed to them. Synchronizing writers and readers is much simpler when it happens in one central place.
At the end of the build, the progress stream is closed with the ProgressManager
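The steps above can be sketched roughly as follows. This is only an illustration of the pub-sub idea, not existing BuildKit code: the `Event` and `Stream` types, the method names `Register`, `Subscribe`, `Publish` and `Close`, keying events by a string ID, and dropping events when a stream's buffer is full are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical progress event, keyed by the ID of the thing it
// describes (for example a descriptor digest).
type Event struct {
	ID      string
	Message string
}

// Stream is a hypothetical per-build progress stream.
type Stream struct {
	events chan Event
}

// Events returns the channel the build reads its progress from.
func (s *Stream) Events() <-chan Event { return s.events }

// ProgressManager is the single place where writers and readers synchronize.
type ProgressManager struct {
	mu   sync.Mutex
	open map[*Stream]struct{}            // streams that are registered and not yet closed
	subs map[string]map[*Stream]struct{} // event ID -> subscribed streams
}

func NewProgressManager() *ProgressManager {
	return &ProgressManager{
		open: map[*Stream]struct{}{},
		subs: map[string]map[*Stream]struct{}{},
	}
}

// Register creates a stream at the beginning of a build.
func (pm *ProgressManager) Register(buffer int) *Stream {
	s := &Stream{events: make(chan Event, buffer)}
	pm.mu.Lock()
	defer pm.mu.Unlock()
	pm.open[s] = struct{}{}
	return s
}

// Subscribe asks for events with the given ID to be forwarded to s.
// Subscribing a stream that is already closed is a no-op.
func (pm *ProgressManager) Subscribe(s *Stream, id string) {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	if _, ok := pm.open[s]; !ok {
		return
	}
	if pm.subs[id] == nil {
		pm.subs[id] = map[*Stream]struct{}{}
	}
	pm.subs[id][s] = struct{}{}
}

// Publish forwards ev to every open stream subscribed to ev.ID. The send is
// non-blocking so a slow reader cannot stall publishers; in this sketch an
// event is dropped if the stream's buffer is full.
func (pm *ProgressManager) Publish(ev Event) {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	for s := range pm.subs[ev.ID] {
		select {
		case s.events <- ev:
		default:
		}
	}
}

// Close removes s from all subscriptions and closes its channel at the end
// of a build. Calling it more than once is safe.
func (pm *ProgressManager) Close(s *Stream) {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	if _, ok := pm.open[s]; !ok {
		return
	}
	delete(pm.open, s)
	for id, streams := range pm.subs {
		delete(streams, s)
		if len(streams) == 0 {
			delete(pm.subs, id)
		}
	}
	close(s.events)
}

func main() {
	pm := NewProgressManager()
	s := pm.Register(8)                                      // beginning of the build
	pm.Subscribe(s, "sha256:abc")                            // e.g. from the pull code
	pm.Publish(Event{ID: "sha256:abc", Message: "pulling"})  // e.g. from the cache manager
	pm.Publish(Event{ID: "sha256:other", Message: "unseen"}) // nobody subscribed to this ID
	pm.Close(s)                                              // end of the build
	for ev := range s.Events() {
		fmt.Println(ev.ID, ev.Message)
	}
}
```

Because both `Publish` and `Close` take the same lock, a publisher can never send on a channel that has already been closed, which is the kind of synchronization that is currently spread across Context hierarchies and controller callbacks.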
That's just one starting idea; in general, anything that centralizes synchronization logic and decouples the different parts of the code that read/write progress would probably be a promising approach.
cc @coryb @tonistiigi