[WIP] Split controllers into thin workqueue controllers and "receivers". #1110
Conversation
	return err
}

for _, dr := range c.receivers {
IIUC, the effects of this loop are:
- All receivers must succeed to mark this item successful in the workqueue.
- If a single receiver returns an error, all following receivers are skipped and the item is retried later (using the default workqueue retry rules).
- Receivers that return success before a later receiver returns an error will be tried again with the same item (using the default workqueue retry rules).
- Only one receiver may process an item at a time.
I'm concerned about the conflation of success/failure of all receivers into the failure state of a single queue item. Issues that come to mind include:
- If a receiver consistently fails to process an event successfully, receivers later in the list never see that event.
- Later receivers must wait until all prior receivers have finished before processing an event, leading to higher reconcile latencies for those receivers.
Randomizing or reversing the order of the list would mitigate these issues somewhat, but I suspect they'd still cause hard-to-debug problems occasionally.
Yes, I specifically included this ~disclaimer in the pseudo-code of the body above for this reason:
// This is an example, we could also fan out goroutines with errgroup, or something else.
for _, fr := range c.receivers {
It would certainly mean a coarser grain of retries to go this route, but I'm unsure if that's a thing we should optimize for.
Good point, errgroup would fix the starvation and latency issues. The retries thing feels vaguely incorrect but I agree it doesn't seem like a blocker right now.
It's possible that in larger clusters the additional reconciles would cause unacceptable apiserver load, but at that point there's nothing stopping the internals from changing to one workqueue per receiver. The Receiver interface remains the same.
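For concreteness, a rough sketch of what fanning the receivers out with errgroup could look like. The `Receiver` interface, `Controller` struct, and import paths here are assumptions for illustration, not code from this PR:

```go
package revision

import (
	"golang.org/x/sync/errgroup"

	"github.com/knative/serving/pkg/apis/serving/v1alpha1"
)

// Receiver is the (assumed) per-type interface that receivers implement.
type Receiver interface {
	SyncRevision(rev *v1alpha1.Revision) error
}

// Controller is the thin workqueue controller holding registered receivers.
type Controller struct {
	receivers []Receiver
}

// syncHandler fans a single work item out to every receiver concurrently.
// The item is still one workqueue entry, so any receiver error still retries
// the whole item, but a slow or failing receiver no longer delays or starves
// the others.
func (c *Controller) syncHandler(original *v1alpha1.Revision) error {
	var eg errgroup.Group
	for _, dr := range c.receivers {
		dr := dr // capture the loop variable for the goroutine
		eg.Go(func() error {
			// Each receiver gets its own fresh copy, which matters even more
			// once receivers run in parallel.
			return dr.SyncRevision(original.DeepCopy())
		})
	}
	// Wait returns the first non-nil error from any receiver.
	return eg.Wait()
}
```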
I'm not aware of any controller across the kube ecosystem that has succeeded in having shared workqueues, since it becomes really difficult to reason about the state of the caches relative to the action. Generally the most successful patterns are where you try to get all of your work into a single action (level driven) and queue up synchronizations of that. I've seen that scale to the largest kube clusters in the world, so you're unlikely to go too far.
Just a quick question about the model before I dig deeper.
for _, dr := range c.receivers {
	// Don't modify the informer's copy, and give each receiver a fresh copy.
	cp := original.DeepCopy()
So is this really what we want here? It seems like if a receiver (say the first one) modifies the object, then by copying the original we're now sending an out-of-date object to the next receiver (say the second one), and if that receiver tries to update it, the update will be rejected by k8s because the object is out of date, or the receiver will have to explicitly fetch a newer version of the object.
This is more important in the case of parallelism (see the other thread about using goroutines here).
It feels like it would be more correct to give each receiver the resource key and a lookup/update function. The controller (that name seems incorrect given this new pattern -- maybe "observer"?) could then perform caching, fetches, and possibly even merging as needed.
Unfortunately, if all of our controllers are actually using GetControllerOf, they'll probably all need to fetch the object in order to figure out if they should be acting on it.
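To make that suggestion concrete, a rough sketch of the shape being described; the names and signatures here are invented for illustration and are not part of this PR:

```go
package revision

import "github.com/knative/serving/pkg/apis/serving/v1alpha1"

// RevisionAccessor is a hypothetical handle the controller could hand to
// receivers instead of a deep-copied object: receivers fetch the latest
// cached copy on demand and push updates back through the controller, which
// can then cache, coalesce, or merge writes as needed.
type RevisionAccessor interface {
	// Get returns the current (cached) Revision for the key.
	Get() (*v1alpha1.Revision, error)
	// UpdateStatus writes back a modified Revision status.
	UpdateStatus(rev *v1alpha1.Revision) (*v1alpha1.Revision, error)
}

// A key-based receiver method might then look like:
//
//	SyncRevision(key string, access RevisionAccessor) error
```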
I think I'd rather see some sort of explicit registration of type -> observed object, rather than the implicit format we have here (mostly as I noticed that e.g. route implements a Receiver for 3-4 types, but only one of those is type-asserted).
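Purely to illustrate the shape of that suggestion, one hedged possibility; all names here are invented, not taken from this PR:

```go
package controller

import "k8s.io/apimachinery/pkg/runtime"

// Registration is a hypothetical explicit declaration of one watched type.
type Registration struct {
	// Example is a zero value of the watched resource, e.g. &v1alpha1.Revision{}.
	Example runtime.Object
	// Sync is invoked with a fresh copy of an observed object of that type.
	Sync func(obj runtime.Object) error
}

// Registrar would replace the implicit interface type-assertions: each
// receiver explicitly lists the type -> handler pairs it wants to observe.
type Registrar interface {
	Registrations() []Registration
}
```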
revision.NewController(kubeClient, elaClient, kubeInformerFactory, elaInformerFactory, buildInformerFactory, cfg, &revControllerConfig, logger),
route.NewController(kubeClient, elaClient, kubeInformerFactory, elaInformerFactory, cfg, *controllerConfig, autoscaleEnableScaleToZero, logger),
service.NewController(kubeClient, elaClient, kubeInformerFactory, elaInformerFactory, cfg, *controllerConfig, logger),
// The receivers are what implement our core logic. Each of these subscribe to some subset of the resources for which we
"Receiver" feels like a funny name for core logic.
What about "Implementation", "Director", "Manager", or "Producer"?
Totally up for a naming 🚲🏠 once we settle on if/what we're naming :)
receivers := []interface{}{
	revrcv.New(kubeClient, elaClient, kubeInformerFactory, elaInformerFactory, cfg, &revControllerConfig, logger),
	rtrcv.New(kubeClient, elaClient, kubeInformerFactory, elaInformerFactory, cfg, *controllerConfig, autoscaleEnableScaleToZero, logger),
	cfgrcv.New(kubeClient, elaClient, buildClient, kubeInformerFactory, elaInformerFactory, cfg, *controllerConfig, logger),
Nit/note: it feels to me like a common "Options" struct would be useful here for configuring these receivers. In particular, it feels like there are small differences in the arguments (e.g. the configuration takes a buildClient as the 3rd argument, but the others take additional arguments around the 6th position, with logger always last) which makes this harder to use.
Yeah, totally agree. We see the same thing with the controller constructors today (initially we were trying to use a common ctor and a loop to cut down on this boilerplate, but ultimately unrolled and diverged).
Following on @mdemirhan's initial refactoring, we should perhaps define controller.Options with the common arguments taken by controller.Base, and a majority of these argument lists would shrink dramatically. I can open an issue to track this discussion.
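A hedged sketch of what such an Options struct could look like, inferred from the constructor argument lists above; the field names and knative import paths are assumptions:

```go
package controller

import (
	"go.uber.org/zap"
	kubeinformers "k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"

	// The knative clientset/informer import paths below are guesses for illustration.
	clientset "github.com/knative/serving/pkg/client/clientset/versioned"
	informers "github.com/knative/serving/pkg/client/informers/externalversions"
)

// Options is a hypothetical bundle of the constructor arguments the receivers
// above share; type-specific extras (buildClient, feature flags, per-type
// config) would remain explicit parameters.
type Options struct {
	KubeClientSet       kubernetes.Interface
	ElaClientSet        clientset.Interface
	KubeInformerFactory kubeinformers.SharedInformerFactory
	ElaInformerFactory  informers.SharedInformerFactory
	Logger              *zap.SugaredLogger
}
```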
if err != nil {
	logger.Fatalf("Error creating Ingress controller: %v", err)
}
controllers := []controller.Interface{rtc, svc, config, rc, dc, ec, bc, ic}
	if dr, ok := rcv.(Receiver); ok {
		controller.receivers = append(controller.receivers, dr)
	}
}
I'd suggest doing this block above all the informer setup; that way there's no need to do any cleanup in the error case. (In the possible future where we wanted to be able to flag-enable certain features/directors.)
// syncHandler compares the actual state with the desired, and attempts to
// converge the two. It then updates the Status block of the Build
// resource with the current status of the resource.
Is this comment still accurate?
"If the Deployment controller is no longer needed it should be removed.") | ||
} | ||
return controller, nil | ||
} |
It feels like there's a lot of duplicated code for these controllers/observers. I'm guessing that's so we can more easily see what's going on, and that the final code will be factored so that each observer is ~2 dozen lines?
- Receiver interface
- Type/lister functions
- Registration of constructor
@evankanderson Yes, in fact each is a copy/paste and search/replace of a stripped down implementation. I think the goal of these should be: so boilerplate we stop reviewing them, which to me says: "codegen" (thus the caption on the diagram). If we can reach that point through other means, even better :)
	return err
}

revClient := c.ElaClientSet.ServingV1alpha1().Revisions(config.Namespace)
This feels like it should go through the Revision observer, rather than doing direct gets against the clientset. WDYT?
On the other hand, I worry about going full ORM here -- maybe standardizing the update interface more formally is sufficient?
)

// SyncRevision implements revision.Receiver
func (c *Receiver) SyncRevision(revision *v1alpha1.Revision) error {
Slight packaging note -- I'm not sure that I like splitting the controller/manager logic across multiple files in this way. I feel like it becomes harder to actually grok what the controller is doing.
I do like defining the "Receiver" interfaces to create some common interfaces for managing updates for different types. Passing in some sort of RevisionObservation rather than the concrete Revision object would allow us to provide additional information later if needed, such as the previous state of the object from the EventHandler's UpdateFunc, or timing information.
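For illustration, one hedged shape such a wrapper could take; the struct and its fields are invented here, not part of this PR:

```go
package revision

import (
	"time"

	"github.com/knative/serving/pkg/apis/serving/v1alpha1"
)

// RevisionObservation is a hypothetical envelope that SyncRevision could take
// instead of the bare object, leaving room to add context later without
// changing the Receiver interface again.
type RevisionObservation struct {
	// Revision is a fresh copy of the observed object.
	Revision *v1alpha1.Revision
	// Old is the previous state from the EventHandler's UpdateFunc, if any.
	Old *v1alpha1.Revision
	// ObservedAt records when the event was dequeued (timing information).
	ObservedAt time.Time
}
```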
This organization was to try and improve the clarity of the entrypoints to reconciliation. We can avoid this in follow up.
	// TODO(mattmoor): Move controller deps into receiver
	"github.com/knative/serving/pkg/controller"
)
"helper.go" seems like a kitchen sink -- I'd rather see these methods distributed to be close to their users, or next to the definition of the Receiver struct if most appropriate.
}

// Receiver implements ingress.Receiver
var _ ingress.Receiver = (*Receiver)(nil)
I think it also implements configuration.Receiver?
Though I think these type assertions are optional anyway, right?
@evankanderson FYI, this was meant to be illustrative in its completeness for the purposes of a high-level RFC. It is not my intention to check this in as a monolith, but to take the parts of it that folks find valuable and increment towards this sort of model (probably the same way I built this up with lots of functional checkpoints). Otherwise I'll live in merge hell until I put everyone else into merge hell, and nobody wants that :) Totally ack your nits, and I'll try to keep these in mind if/when we pursue parts of this in follow up. Apologies for not making my intentions with this change clearer, and thanks for all of the feedback.
@mdemirhan suggested that we may want to add a
This is an experimental refactor to illustrate a proposal. I don't plan to make this a monolithic commit, but wanted to illustrate what things might look like in this new world.
The background of this change is that as ./pkg/controller/... has grown, it has become more difficult to follow. In addition, the way in which we watch assorted objects is inconsistent, which may mean there are varying degrees of subtle bugs waiting to bite us (related: #823). This change isn't intended to solve all these problems, but to hopefully be a step in the right direction. The general idea is illustrated in this diagram:
pkg/controller/...
In this proposal, these directories would stop holding business logic and become thin workqueues for handling events we are sensitive to. So how does it work?
We would define a fairly boilerplate (possibly codegen-able) pkg/controller/foo for each type foo we are watching. Today this contains the same boilerplate skeleton we've been using. It also defines a trivial Receiver interface to deliver these events to suitable Receivers, e.g.:
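A minimal sketch of what such an interface could look like, assuming the SyncRevision signature seen elsewhere in this diff; the import path and exact shape are guesses:

```go
package revision

import "github.com/knative/serving/pkg/apis/serving/v1alpha1"

// Receiver is the per-type interface pkg/controller/revision would define for
// delivering Revision events to interested receivers (sketch).
type Receiver interface {
	// SyncRevision reconciles the world toward the state of the given Revision.
	SyncRevision(rev *v1alpha1.Revision) error
}
```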
At construction, each controller iterates over a list of potential receivers to determine viability through:
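Continuing the sketch (and assuming a Controller struct with a receivers field), this mirrors the type-assertion loop that appears in the diff:

```go
// Controller is the thin workqueue controller for this type (assumed shape).
type Controller struct {
	receivers []Receiver
}

// NewController filters the shared list of candidate receivers down to those
// that implement this controller's Receiver interface.
func NewController(receivers []interface{}) *Controller {
	controller := &Controller{}
	for _, rcv := range receivers {
		if dr, ok := rcv.(Receiver); ok {
			controller.receivers = append(controller.receivers, dr)
		}
	}
	return controller
}
```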
As events are received, the controller will delegate reconciliation to the matching receivers:
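Continuing the sketch, the dispatch loop matches the one shown in the review comments above; as noted there, it is sequential here but could equally fan out with errgroup:

```go
// syncHandler hands each matching receiver its own fresh copy of the object.
func (c *Controller) syncHandler(original *v1alpha1.Revision) error {
	for _, dr := range c.receivers {
		// Don't modify the informer's copy, and give each receiver a fresh copy.
		cp := original.DeepCopy()
		if err := dr.SyncRevision(cp); err != nil {
			// Any receiver error fails the whole work item, so it is retried
			// under the workqueue's default retry rules.
			return err
		}
	}
	return nil
}
```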
pkg/receiver/...
We will have pkg/receiver/bar 1:1 with today's notion of controllers, so {revision, route, service, configuration}. Each bar implements some set of foo.Receiver interfaces:
- When foo == bar, the bar receiver generally always implements bar.Receiver (e.g. pkg/receiver/revision implements revision.Receiver).
- When foo != bar, the bar receiver may also implement foo.Receiver for other resources it wants to observe (e.g. pkg/receiver/revision implements deployment.Receiver).

Within pkg/receiver/bar/... in this PR, the files are organized as follows:
- receiver.go: defines the receiver type, interface assertions, and New() constructor.
- foo.go: the portion of the receiver implementation that implements foo.Receiver; by convention SyncFoo is at the top of the file.
- helper.go: holds misc shared utilities within the package.

cc @evankanderson @vaikas-google @grantr