[WIP] Split controllers into thin workqueue controllers and "receivers". #1110
Conversation
	return err
}

for _, dr := range c.receivers {
IIUC, the effects of this loop are:
- All receivers must succeed to mark this item successful in the workqueue.
- If a single receiver returns an error, all following receivers are skipped and the item is retried later (using the default workqueue retry rules).
- Receivers that return success before a later receiver returns an error will be tried again with the same item (using the default workqueue retry rules).
- Only one receiver may process an item at a time.
I'm concerned about the conflation of success/failure of all receivers into the failure state of a single queue item. Issues that come to mind include:
- If a receiver consistently fails to process an event successfully, receivers later in the list never see that event.
- Later receivers must wait until all prior receivers have finished before processing an event, leading to higher reconcile latencies for those receivers.
Randomizing or reversing the order of the list would mitigate these issues somewhat, but I suspect they'd still cause hard-to-debug problems occasionally.
Yes, I specifically included this ~disclaimer in the pseudo-code of the body above for this reason:
// This is an example, we could also fan out goroutines with errgroup, or something else.
for _, fr := range c.receivers {
It would certainly mean a coarser grain of retries to go this route, but I'm unsure if that's a thing we should optimize for.
Good point, errgroup would fix the starvation and latency issues. The retries thing feels vaguely incorrect but I agree it doesn't seem like a blocker right now.
It's possible that in larger clusters the additional reconciles would cause unacceptable apiserver load, but at that point there's nothing stopping the internals from changing to one workqueue per receiver. The Receiver interface remains the same.
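For concreteness, a rough sketch of what fanning the receivers out with errgroup could look like. The `Receiver` interface, `Controller` struct, and import paths here are assumptions for illustration, not code from this PR:

```go
package revision

import (
	"golang.org/x/sync/errgroup"

	"github.com/knative/serving/pkg/apis/serving/v1alpha1"
)

// Receiver is the (assumed) per-type interface that receivers implement.
type Receiver interface {
	SyncRevision(rev *v1alpha1.Revision) error
}

// Controller is the thin workqueue controller holding registered receivers.
type Controller struct {
	receivers []Receiver
}

// syncHandler fans a single work item out to every receiver concurrently.
// The item is still one workqueue entry, so any receiver error still retries
// the whole item, but a slow or failing receiver no longer delays or starves
// the others.
func (c *Controller) syncHandler(original *v1alpha1.Revision) error {
	var eg errgroup.Group
	for _, dr := range c.receivers {
		dr := dr // capture the loop variable for the goroutine
		eg.Go(func() error {
			// Each receiver gets its own fresh copy, which matters even more
			// once receivers run in parallel.
			return dr.SyncRevision(original.DeepCopy())
		})
	}
	// Wait returns the first non-nil error from any receiver.
	return eg.Wait()
}
```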
I'm not aware of any controller across the kube ecosystem that has succeeded in having shared workqueues, since it becomes really difficult to reason about the state of the caches relative to the action. Generally the most successful patterns are where you try to get all of your work into a single action (level driven) and queue up synchronizations of that. I've seen that scale to the largest kube clusters in the world, so you're unlikely to go too far.
Just a quick question about the model before I dig deeper.
for _, dr := range c.receivers {
	// Don't modify the informer's copy, and give each receiver a fresh copy.
	cp := original.DeepCopy()
So is this really what we want here? It seems like if a receiver (say the first one) modifies the object, then by copying the original we're now sending an out-of-date object to the next receiver (say the second one), and if that receiver tries to update it, the update will be rejected by k8s because the object is out of date, or the receiver will have to explicitly fetch a newer version of the object.
This is more important in the case of parallelism (see the other thread about using goroutines here).
It feels like it would be more correct to give each receiver the resource key and a lookup/update function. The controller (that name seems incorrect given this new pattern -- maybe "observer"?) could then perform caching, fetches, and possibly even merging as needed.
Unfortunately, if all of our controllers are actually using GetControllerOf, they'll probably all need to fetch the object in order to figure out if they should be acting on it.
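To make that suggestion concrete, a rough sketch of the shape being described; the names and signatures here are invented for illustration and are not part of this PR:

```go
package revision

import "github.com/knative/serving/pkg/apis/serving/v1alpha1"

// RevisionAccessor is a hypothetical handle the controller could hand to
// receivers instead of a deep-copied object: receivers fetch the latest
// cached copy on demand and push updates back through the controller, which
// can then cache, coalesce, or merge writes as needed.
type RevisionAccessor interface {
	// Get returns the current (cached) Revision for the key.
	Get() (*v1alpha1.Revision, error)
	// UpdateStatus writes back a modified Revision status.
	UpdateStatus(rev *v1alpha1.Revision) (*v1alpha1.Revision, error)
}

// A key-based receiver method might then look like:
//
//	SyncRevision(key string, access RevisionAccessor) error
```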
I think I'd rather see some sort of explicit registration of type -> observed object, rather than the implicit format we have here (mostly as I noticed that e.g. route implements a Receiver for 3-4 types, but only one of those is type-asserted).
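Purely to illustrate the shape of that suggestion, one hedged possibility; all names here are invented, not taken from this PR:

```go
package controller

import "k8s.io/apimachinery/pkg/runtime"

// Registration is a hypothetical explicit declaration of one watched type.
type Registration struct {
	// Example is a zero value of the watched resource, e.g. &v1alpha1.Revision{}.
	Example runtime.Object
	// Sync is invoked with a fresh copy of an observed object of that type.
	Sync func(obj runtime.Object) error
}

// Registrar would replace the implicit interface type-assertions: each
// receiver explicitly lists the type -> handler pairs it wants to observe.
type Registrar interface {
	Registrations() []Registration
}
```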
revision.NewController(kubeClient, elaClient, kubeInformerFactory, elaInformerFactory, buildInformerFactory, cfg, &revControllerConfig, logger),
route.NewController(kubeClient, elaClient, kubeInformerFactory, elaInformerFactory, cfg, *controllerConfig, autoscaleEnableScaleToZero, logger),
service.NewController(kubeClient, elaClient, kubeInformerFactory, elaInformerFactory, cfg, *controllerConfig, logger),
// The receivers are what implement our core logic. Each of these subscribe to some subset of the resources for which we
"Receiver" feels like a funny name for core logic.
What about "Implementation", "Director", "Manager", or "Producer"?
Totally up for a naming 🚲🏠 once we settle on if/what we're naming :)
receivers := []interface{}{
	revrcv.New(kubeClient, elaClient, kubeInformerFactory, elaInformerFactory, cfg, &revControllerConfig, logger),
	rtrcv.New(kubeClient, elaClient, kubeInformerFactory, elaInformerFactory, cfg, *controllerConfig, autoscaleEnableScaleToZero, logger),
	cfgrcv.New(kubeClient, elaClient, buildClient, kubeInformerFactory, elaInformerFactory, cfg, *controllerConfig, logger),
Nit/note: it feels to me like a common "Options" struct would be useful here for configuring these receivers. In particular, it feels like there are small differences in the arguments (e.g. the configuration takes a buildClient as the 3rd argument, but the others take additional arguments around the 6th position, with logger always last) which makes this harder to use.
Yeah, totally agree. We see the same thing with the controller constructors today (initially we were trying to use a common ctor and a loop to cut down on this boilerplate, but ultimately unrolled and diverged).
Following on @mdemirhan's initial refactoring, we should perhaps define controller.Options with the common arguments taken by controller.Base, and a majority of these argument lists would shrink dramatically. I can open an issue to track this discussion.
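A hedged sketch of what such an Options struct could look like, inferred from the constructor argument lists above; the field names and knative import paths are assumptions:

```go
package controller

import (
	"go.uber.org/zap"
	kubeinformers "k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"

	// The knative clientset/informer import paths below are guesses for illustration.
	clientset "github.com/knative/serving/pkg/client/clientset/versioned"
	informers "github.com/knative/serving/pkg/client/informers/externalversions"
)

// Options is a hypothetical bundle of the constructor arguments the receivers
// above share; type-specific extras (buildClient, feature flags, per-type
// config) would remain explicit parameters.
type Options struct {
	KubeClientSet       kubernetes.Interface
	ElaClientSet        clientset.Interface
	KubeInformerFactory kubeinformers.SharedInformerFactory
	ElaInformerFactory  informers.SharedInformerFactory
	Logger              *zap.SugaredLogger
}
```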
if err != nil {
	logger.Fatalf("Error creating Ingress controller: %v", err)
}
controllers := []controller.Interface{rtc, svc, config, rc, dc, ec, bc, ic}
	if dr, ok := rcv.(Receiver); ok {
		controller.receivers = append(controller.receivers, dr)
	}
}
I'd suggest doing this block above all the informer setup; that way there's no need to do any cleanup in the error case. (In the possible future where we wanted to be able to flag-enable certain features/directors.)
// syncHandler compares the actual state with the desired, and attempts to
// converge the two. It then updates the Status block of the Build
// resource with the current status of the resource.
Is this comment still accurate?
"If the Deployment controller is no longer needed it should be removed.") | ||
} | ||
return controller, nil | ||
} |
It feels like there's a lot of duplicated code for these controllers/observers. I'm guessing that's so we can more easily see what's going on, and that the final code will be factored so that each observer is ~2 dozen lines?
- Receiver interface
- Type/lister functions
- Registration of constructor
@evankanderson Yes, in fact each is a copy/paste and search/replace of a stripped down implementation. I think the goal of these should be: so boilerplate we stop reviewing them, which to me says: "codegen" (thus the caption on the diagram). If we can reach that point through other means, even better :)
	return err
}

revClient := c.ElaClientSet.ServingV1alpha1().Revisions(config.Namespace)
This feels like it should go through the Revision observer, rather than doing direct gets against the clientset. WDYT?
On the other hand, I worry about going full ORM here -- maybe standardizing the update interface more formally is sufficient?
)

// SyncRevision implements revision.Receiver
func (c *Receiver) SyncRevision(revision *v1alpha1.Revision) error {
Slight packaging note -- I'm not sure that I like splitting the controller/manager logic across multiple files in this way. I feel like it becomes harder to actually grok what the controller is doing.
I do like defining the "Receiver" interfaces to create some common interfaces for managing updates for different types. Passing in some sort of RevisionObservation rather than the concrete Revision object would allow us to provide additional information later if needed, such as the previous state of the object from the EventHandler's UpdateFunc, or timing information.
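For illustration, one hedged shape such a wrapper could take; the struct and its fields are invented here, not part of this PR:

```go
package revision

import (
	"time"

	"github.com/knative/serving/pkg/apis/serving/v1alpha1"
)

// RevisionObservation is a hypothetical envelope that SyncRevision could take
// instead of the bare object, leaving room to add context later without
// changing the Receiver interface again.
type RevisionObservation struct {
	// Revision is a fresh copy of the observed object.
	Revision *v1alpha1.Revision
	// Old is the previous state from the EventHandler's UpdateFunc, if any.
	Old *v1alpha1.Revision
	// ObservedAt records when the event was dequeued (timing information).
	ObservedAt time.Time
}
```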
This organization was to try and improve the clarity of the entrypoints to reconciliation. We can avoid this in follow up.
	// TODO(mattmoor): Move controller deps into receiver
	"github.com/knative/serving/pkg/controller"
)
"helper.go" seems like a kitchen sink -- I'd rather see these methods distributed to be close to their users, or next to the definition of the Receiver struct if most appropriate.
}

// Receiver implements ingress.Receiver
var _ ingress.Receiver = (*Receiver)(nil)
I think it also implements configuration.Receiver?
Though I think these type assertions are optional anyway, right?
@evankanderson FYI, this was meant to be illustrative in its completeness for the purposes of a high-level RFC. It is not my intention to check this in as a monolith, but to take the parts of it that folks find valuable and increment towards this sort of model (probably the same way I built this up with lots of functional checkpoints). Otherwise I'll live in merge hell until I put everyone else into merge hell, and nobody wants that :) Totally ack your nits, and I'll try to keep these in mind if/when we pursue parts of this in follow up. Apologies for not making my intentions with this change clearer, and thanks for all of the feedback.
@mdemirhan suggested that we may want to add a
This is an experimental refactor to illustrate a proposal. I don't plan to make this a monolithic commit, but wanted to illustrate what things might look like in this new world.
The background of this change is that as ./pkg/controller/... has grown, it has become more difficult to follow. In addition, the way in which we watch assorted objects is inconsistent, which may mean there are varying degrees of subtle bugs waiting to bite us (related: #823). This change isn't intended to solve all these problems, but to hopefully be a step in the right direction. The general idea is illustrated in this diagram:
pkg/controller/...
In this proposal, these directories would stop holding business logic and become thin workqueues for handling events we are sensitive to. So how does it work?
We would define a fairly boilerplate (possibly codegen-able) pkg/controller/foo for each type foo we are watching. Today this contains the same boilerplate skeleton we've been using. It also defines a trivial Receiver interface to deliver these events to suitable Receivers, e.g.:
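A minimal sketch of what such an interface could look like, assuming the SyncRevision signature seen elsewhere in this diff; the import path and exact shape are guesses:

```go
package revision

import "github.com/knative/serving/pkg/apis/serving/v1alpha1"

// Receiver is the per-type interface pkg/controller/revision would define for
// delivering Revision events to interested receivers (sketch).
type Receiver interface {
	// SyncRevision reconciles the world toward the state of the given Revision.
	SyncRevision(rev *v1alpha1.Revision) error
}
```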
At construction, each controller iterates over a list of potential receivers to determine viability through:
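Continuing the sketch (and assuming a Controller struct with a receivers field), this mirrors the type-assertion loop that appears in the diff:

```go
// Controller is the thin workqueue controller for this type (assumed shape).
type Controller struct {
	receivers []Receiver
}

// NewController filters the shared list of candidate receivers down to those
// that implement this controller's Receiver interface.
func NewController(receivers []interface{}) *Controller {
	controller := &Controller{}
	for _, rcv := range receivers {
		if dr, ok := rcv.(Receiver); ok {
			controller.receivers = append(controller.receivers, dr)
		}
	}
	return controller
}
```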
As events are received, the controller will delegate reconciliation to the matching receivers:
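Continuing the sketch, the dispatch loop matches the one shown in the review comments above; as noted there, it is sequential here but could equally fan out with errgroup:

```go
// syncHandler hands each matching receiver its own fresh copy of the object.
func (c *Controller) syncHandler(original *v1alpha1.Revision) error {
	for _, dr := range c.receivers {
		// Don't modify the informer's copy, and give each receiver a fresh copy.
		cp := original.DeepCopy()
		if err := dr.SyncRevision(cp); err != nil {
			// Any receiver error fails the whole work item, so it is retried
			// under the workqueue's default retry rules.
			return err
		}
	}
	return nil
}
```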
pkg/receiver/...
We will have pkg/receiver/bar 1:1 with today's notion of controllers, so {revision, route, service, configuration}. Each bar implements some set of foo.Receiver interfaces:
- When foo == bar, the bar receiver generally always implements bar.Receiver (e.g. pkg/receiver/revision implements revision.Receiver).
- When foo != bar, the bar receiver may also implement foo.Receiver for other resources it wants to observe (e.g. pkg/receiver/revision implements deployment.Receiver).

Within pkg/receiver/bar/... in this PR, the files are organized as follows:
- receiver.go: defines the receiver type, interface assertions, and New() constructor.
- foo.go: the portion of the receiver implementation that implements foo.Receiver; by convention SyncFoo is at the top of the file.
- helper.go: holds misc shared utilities within the package.

cc @evankanderson @vaikas-google @grantr