Only do work for instances from a single queue #1074

pmorie · 2017-07-28T05:35:55Z

Fixes the race condition uncovered during #1017 by only doing work for instances from a single work queue. Instances are now added to the polling queue in a rate-limited manner, and instead of triggering a re-reconcile of the instance directly, the polling queue adds the instance's key to the main instance work queue. This means that only a single goroutine will do work on an instance at a time.

Fixes #780
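
To illustrate the flow described above, here is a minimal sketch using only the Go standard library. It is not the code in this PR: buffered channels stand in for the client-go work queues (so key deduplication and real rate limiting are not modeled), and every name in it (`controller`, `reconcile`, `pollWorker`, `run`, the `ns/a` key) is hypothetical. The point it tries to show is that the polling worker never reconciles; it only hands the key back to the main queue, so a single goroutine does the work.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// controller is a hypothetical, simplified stand-in for the instance
// controller. Buffered channels replace client-go work queues here, so
// deduplication of keys and per-key backoff are not modeled.
type controller struct {
	mainQueue chan string    // the only queue whose worker does real work
	pollQueue chan string    // holds keys until their next poll is due
	pollDelay time.Duration  // fixed delay standing in for rate limiting
	pending   map[string]int // polls left per key; touched only by the main worker
	history   []string       // order of reconcile calls; same ownership as pending
	wg        sync.WaitGroup // counts instances whose operation is unfinished
}

// reconcile runs only on the main worker goroutine, so at most one goroutine
// works on a given instance at any time.
func (c *controller) reconcile(key string) {
	c.history = append(c.history, key)
	if c.pending[key] > 0 {
		c.pending[key]--
		c.pollQueue <- key // still in progress: schedule another poll
		return
	}
	c.wg.Done() // the asynchronous operation has finished
}

// pollWorker never reconciles. After the delay it hands the key back to the
// main queue.
func (c *controller) pollWorker() {
	for key := range c.pollQueue {
		time.Sleep(c.pollDelay)
		c.mainQueue <- key
	}
}

// mainWorker is the single goroutine that does work for instances.
func (c *controller) mainWorker() {
	for key := range c.mainQueue {
		c.reconcile(key)
	}
}

// run reconciles every key until its operation completes and returns the
// order in which reconcile was called. The queue capacity of 16 is an
// arbitrary choice that is large enough for this small example.
func run(pending map[string]int, keys []string) []string {
	c := &controller{
		mainQueue: make(chan string, 16),
		pollQueue: make(chan string, 16),
		pollDelay: time.Millisecond,
		pending:   pending,
	}
	go c.pollWorker()
	go c.mainWorker()
	c.wg.Add(len(keys))
	for _, key := range keys {
		c.mainQueue <- key
	}
	c.wg.Wait()
	// No keys are in flight once every instance has finished, so both
	// queues can be closed to let the workers exit.
	close(c.pollQueue)
	close(c.mainQueue)
	return c.history
}

func main() {
	// "ns/a" needs two more polls after its first reconcile.
	fmt.Println(run(map[string]int{"ns/a": 2}, []string{"ns/a"}))
}
```

In this sketch each key has exactly one token in flight, which is what the real work queue's deduplication would otherwise guarantee.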

pmorie · 2017-07-28T05:44:20Z

@vaikas-google your review appreciated, since we've talked about this a bunch before.

pmorie · 2017-07-28T06:01:37Z

In a follow-up, I will add integration tests for more permutations.

nilebox · 2017-07-28T06:21:26Z

pkg/controller/controller_instance_test.go

+	// Since polling is rate-limited, it is not possible to check whether the
+	// instance is in the polling queue.
+	//
+	// TODO: add a way to peak into rate-limited adds that are still pending,


Are you going to address this before merging this PR?

Nah, that is a change to client-go

I am going to add a couple additional integration tests in this PR, though.

super nit, s/peak/peek/

pmorie · 2017-07-28T08:18:43Z

We should think about how we want to rate-limit the polling queue. There's a fast-slow rate limiter that makes n fast attempts before switching to slow attempts that seems suitable.
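
For reference, client-go's `workqueue` package exposes this as `NewItemFastSlowRateLimiter(fastDelay, slowDelay, maxFastAttempts)`. The sketch below is a standard-library-only illustration of the same idea, not the client-go implementation; the type and method names (`fastSlowLimiter`, `when`, `forget`) and the delays used in `main` are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"time"
)

// fastSlowLimiter is a stdlib-only sketch of the fast-slow idea. It is not
// the client-go implementation and is not safe for concurrent use.
type fastSlowLimiter struct {
	fastDelay       time.Duration
	slowDelay       time.Duration
	maxFastAttempts int
	attempts        map[string]int // requeues seen so far, per key
}

func newFastSlowLimiter(fast, slow time.Duration, maxFast int) *fastSlowLimiter {
	return &fastSlowLimiter{
		fastDelay:       fast,
		slowDelay:       slow,
		maxFastAttempts: maxFast,
		attempts:        map[string]int{},
	}
}

// when returns how long to wait before the next poll of key. The first
// maxFastAttempts calls for a key get fastDelay; later calls get slowDelay.
func (l *fastSlowLimiter) when(key string) time.Duration {
	l.attempts[key]++
	if l.attempts[key] <= l.maxFastAttempts {
		return l.fastDelay
	}
	return l.slowDelay
}

// forget resets the count for key, for example once the asynchronous
// operation has completed.
func (l *fastSlowLimiter) forget(key string) {
	delete(l.attempts, key)
}

func main() {
	// Assumed values: three fast polls one second apart, then one poll
	// every thirty seconds.
	l := newFastSlowLimiter(time.Second, 30*time.Second, 3)
	for i := 0; i < 5; i++ {
		fmt.Println(l.when("ns/a"))
	}
}
```

The appeal for polling is that operations that finish quickly are noticed quickly, while long-running ones settle into an infrequent poll instead of backing off without bound.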

pmorie · 2017-08-01T04:47:48Z

@nilebox test added to PR; we now have integration test coverage for:

- async provision/deprovision
- sync provision/deprovision
- failed provision

vaikas · 2017-07-30T23:24:00Z

pkg/controller/controller_instance.go

+// queue for instances.  It is used to trigger polling for the status of an
+// async operation on and instance and is called by the worker servicing the
+// instance polling queue.  After requeueInstanceForPoll exits, the worker
+// forgets the key from the polling queue, so the controller must call


This comment makes me a little worried, since the division of labor here is split between a couple of components.

It does seem hard to trace data through the queue, but I'm not sure if we can do anything about it at this point without a large refactor. @pmorie @vaikas-google what are your thoughts RE cleaning up the flow? Either way, I think that work would be outside the scope.

What kind of refactor do you have in mind? We could probably do some method moves that make it clearer, but I think this is the best mechanism we currently have to solve this problem.

vaikas · 2017-07-30T23:31:13Z

pkg/controller/controller_instance_test.go

+	// Since polling is rate-limited, it is not possible to check whether the
+	// instance is in the polling queue.
+	//
+	// TODO: add a way to peak into rate-limited adds that are still pending,


super nit, s/peak/peek/

arschles

@pmorie looks good overall. I made a few comments requesting issues to track future work to improve tests.

arschles · 2017-08-01T21:37:45Z

pkg/controller/controller_instance.go

+// queue for instances.  It is used to trigger polling for the status of an
+// async operation on and instance and is called by the worker servicing the
+// instance polling queue.  After requeueInstanceForPoll exits, the worker
+// forgets the key from the polling queue, so the controller must call


It does seem hard to trace data through the queue, but I'm not sure if we can do anything about it at this point without a large refactor. @pmorie @vaikas-google what are your thoughts RE cleaning up the flow? Either way, I think that work would be outside the scope.

arschles · 2017-08-01T21:38:49Z

pkg/controller/controller_instance_test.go

+	// Since polling is rate-limited, it is not possible to check whether the
+	// instance is in the polling queue.
+	//
+	// TODO: add a way to peek into rate-limited adds that are still pending,


Is there an issue for this? If not, can you create one? Same with the one below.

There is: kubernetes/kubernetes#49783

kibbles-n-bytes · 2017-08-01T22:20:07Z

pkg/controller/controller_instance.go

+		err = c.continuePollingInstance(instance)
+		if err != nil {
+			return err
+		}
 		return fmt.Errorf("last operation not completed (still in progress) for %v/%v", instance.Namespace, instance.Name)


We don't need this error anymore, do we? Since we're calling continuePollingInstance to re-add the key to the polling queue, we could return nil here and the instance will still be reprocessed.

kibbles-n-bytes · 2017-08-01T22:48:04Z

Overall, the architecture looks fine to me for now as a fix for the race condition. I have some further questions about the rate limiting, but nothing that should block this going in. I'll merge this and rebase #1067.

k8s-ci-robot added the cncf-cla: yes label Jul 28, 2017

pmorie requested review from MHBauer and vaikas July 28, 2017 06:01

nilebox reviewed Jul 28, 2017

pmorie force-pushed the instance-polling branch 2 times, most recently from 861fb7f to d56da9b Compare August 1, 2017 04:47

pmorie requested review from arschles and nilebox August 1, 2017 12:53

vaikas reviewed Aug 1, 2017

vaikas added the LGTM1 label Aug 1, 2017

Only do work for instances from a single queue

119f2ab

pmorie force-pushed the instance-polling branch from d56da9b to 119f2ab Compare August 1, 2017 18:56

arschles approved these changes Aug 1, 2017

arschles added the LGTM2 label Aug 1, 2017

kibbles-n-bytes reviewed Aug 1, 2017

kibbles-n-bytes merged commit a6e80ea into kubernetes-retired:master Aug 1, 2017

This was referenced Aug 1, 2017

make deprovisioning an instance asynchronously not fall-through to synchronous deprovision #1067

Merged

Deprovisioning an instance asynchronously should add it to the polling queue #1066

Closed

pmorie mentioned this pull request Aug 3, 2017

Add test to ensure that asynchronously provisioning (and deprovisioning) instances eventually succeed #985

Closed

kibbles-n-bytes pushed a commit to kibbles-n-bytes/service-catalog that referenced this pull request Aug 7, 2017

Revert "Only do work for instances from a single queue (kubernetes-retired#1074)"

c12431f

This reverts commit a6e80ea.

kibbles-n-bytes pushed a commit to kibbles-n-bytes/service-catalog that referenced this pull request Aug 11, 2017

Revert "Only do work for instances from a single queue (kubernetes-retired#1074)"

774c1dc

This reverts commit a6e80ea.

kibbles-n-bytes pushed a commit to kibbles-n-bytes/service-catalog that referenced this pull request Aug 11, 2017

Revert "Only do work for instances from a single queue (kubernetes-retired#1074)"

d465584

This reverts commit a6e80ea.

kibbles-n-bytes pushed a commit to kibbles-n-bytes/service-catalog that referenced this pull request Aug 11, 2017

Revert "Only do work for instances from a single queue (kubernetes-retired#1074)"

369b385

This reverts commit a6e80ea.
