
Initial pass at proposal to change subscribers #727

Merged

Conversation

@maleck13 (Contributor) commented Feb 4, 2018

Describe what this PR does and why we need it:
Proposal based on conversations with the team about the current subscriber pattern and some of its weaknesses.
This proposal is fairly complete in my opinion, but I am keen to get feedback on the approaches outlined and to ensure I am fulfilling everyone's expectations around this change.

Which issue this PR fixes (This will close that issue when PR gets merged)
fixes #638

@openshift-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Feb 4, 2018
@maleck13 (Contributor, Author) commented Feb 4, 2018

@eriknelson @jmrodri initial proposal based on our conversations.

@jmrodri (Contributor) commented Feb 5, 2018

@maleck13 would it be possible to draw a picture of what it will look like? Even a picture on paper would work. Otherwise, I'll look at it tomorrow and see if I can get a picture done.

@jmrodri requested review from jmrodri and eriknelson, February 5, 2018 03:24
@jmrodri (Contributor) left a review:

Grammar-checking review. Another review coming regarding the actual implementation proposal.

@@ -0,0 +1,189 @@
# Switch to using Fan in Fan out / Observer pattern for subscribers

@jmrodri:

Fan in Fan out -> Fan-in & Fan-out

@@ -0,0 +1,189 @@
# Switch to using Fan in Fan out / Observer pattern for subscribers

This is not quite the same as a fan in fan out pattern but not a

@jmrodri:

fan in fan out -> fan-in / fan-out


**Observer:** suggestion to change the name from subscriber to observer
as I think it more clearly conveys what they will be doing.
For this doc I will continue to call them subscribers to avoid

@jmrodri:

comma after doc "For this doc, I "

To surmise:

- **Shared Channel with single subscriber**: There is a single buffered channel per topic (topics being broker
actions such as provision, bind etc). All jobs of the given type write to

@jmrodri:

add commas "All jobs, of the given type, write to this channel"

this channel. This has the potential to cause a bottleneck, particularly
as there is currently only a single subscriber reading from this channel and
this subscriber does not read the next message until it has completed
all of its work. Currently there is not too much happening in these

@jmrodri:

comma missing after Currently
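
The bottleneck described in the excerpt above can be sketched as follows; this is illustrative Go only, with invented names, not the broker's actual code:

```go
package current

// Illustrative only: the shared-channel pattern under discussion. A single
// subscriber drains the topic channel and does not read the next message
// until its work completes, so one slow message delays every later job.

type JobMsg struct{ State string }

func runSubscriber(provisionChan <-chan JobMsg, handle func(JobMsg)) {
	for msg := range provisionChan {
		handle(msg) // synchronous: the buffered channel backs up while this runs
	}
}
```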

The stop channel here would be tied to os signals
[example](https://gist.github.com/reiki4040/be3705f307d3cd136e85).

In this case we would still have a single channel per topic, but as we

@jmrodri:

comma after "In this case"


The work engine will now be the one reading messages from the channel
and passing it on to the Observers. To do this I propose we add a public
``Start(stop <-chan struct{})`` method, that would be called in it's own

@jmrodri:

TYPO: it's -> its

#### 2) Create a channel per job

Remove the single channel per topic and replace it with a channel per
job. The work engine would be updated to create a channel per Job,

@jmrodri:

remove the comma after Job

job. The work engine would be updated to create a channel per Job,
when starting a new job. The work engine would be responsible for
the lifecycle of these channels. As above, the subscribers would be
registered in a map and called in a non blocking way when a message was received.

@jmrodri:

TYPO: non blocking -> nonblocking


## Considerations

- perhaps want a config option to limit the number of in progress jobs

@jmrodri:

TYPO: in progress -> in-progress

@maleck13 (Contributor, Author) commented Feb 5, 2018

@jmrodri Thanks for the quick response! I will fix up the typos etc. Happy to do a picture, but I am mostly unavailable today and tomorrow due to training.
I believe, however, it is reasonably close to @eriknelson's drawing
https://user-images.githubusercontent.com/410933/35408136-021411f2-01dc-11e8-8cd9-0464766d19da.jpg
except that the engine is also the distributor. Rough attempt here:
[image: img_1320]

In the first option, there would be a listener in the work engine that distributes to the observers for each topic. The image only shows provision.

The difference for the second option is that there would be a channel and a listener created per job, which would disappear once the job was finished. If I get a chance today, I will attempt to draw that up.

@maleck13 (Contributor, Author) commented Feb 6, 2018

updated based on comments

@rthallisey (Contributor) left a review:

nice work!

@rthallisey (Contributor) commented:

@jmrodri @eriknelson any more comments here?

@eriknelson (Contributor) commented:

On my review stack!

@eriknelson (Contributor) commented Feb 12, 2018

@maleck13 This is pedantic, but is there a semantic difference between the pattern language of Observer - Subject vs. Publisher - Subscriber? I'm entirely indifferent between the two, aside from the fact that there is project precedent for Subscriber. I've always used PubSub to convey the idea, but I open that up to folks to argue otherwise if there is a meaningful difference.

``Start(stop <-chan struct{})`` method, that would be called in its own
go routine during ASB start up. I also propose adding a signal channel
that will send a message to anything running in the background
(could also use context.Context). This would signal for the background

@shawn-hurley:

+1 to the use of context.
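
For illustration, here is a minimal Go sketch of the wiring the excerpt above describes: a public Start method running in its own goroutine, reading from a topic channel, with the stop channel tied to OS signals as in the linked gist. The WorkEngine, JobMsg, and provisionChan names are assumptions drawn from the surrounding discussion, not the broker's actual API.

```go
package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

// JobMsg and WorkEngine are illustrative stand-ins for the proposal's types.
type JobMsg struct{ State string }

type WorkEngine struct {
	provisionChan chan JobMsg
	subscribers   []func(JobMsg)
}

// Start reads from the topic channel until stop is closed, fanning each
// message out to the subscribers without blocking the read loop.
func (e *WorkEngine) Start(stop <-chan struct{}) {
	for {
		select {
		case msg := <-e.provisionChan:
			for _, s := range e.subscribers {
				go s(msg)
			}
		case <-stop:
			return
		}
	}
}

func main() {
	e := &WorkEngine{provisionChan: make(chan JobMsg, 10)}
	e.subscribers = append(e.subscribers, func(m JobMsg) { log.Println("state:", m.State) })

	// Tie the stop channel to OS signals, per the gist linked earlier.
	stop := make(chan struct{})
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	go func() { <-sigs; close(stop) }()

	go e.Start(stop) // called in its own goroutine during broker startup
	<-stop
}
```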


```go
StartNewJob(token string, work Work, topic WorkTopic){
```

@shawn-hurley:

Just for my own clarification: the var `a` in the pseudocode is the work engine itself.

So from this, we see that `a` keeps track of all of the channels per token, and keeps track of all of the subscribers for a given topic. Just want to make sure there are no other concerns that the engine is keeping track of.

Also, do we need to think about passing a context around here? I envision a use case where a subscriber says "hey, something really odd happened here" and needs to send a message to other subscribers. I believe that is the use case for context?


@maleck13 (author):

Yes, those are the two things that would need to be tracked by the engine, if we were to use a channel per job.

On context: it is normally used to signal timeouts and cancellations. I would not be against accepting a context as part of a Job's definition to enable cancellation or potentially timeouts.
Context can also carry values across API boundaries, but I am not such a fan of that approach.


@shawn-hurley:

I think adding the context for timeout is something that we should do.

I think that canceling a running subscriber if another subscriber fails is something we should consider.

@eriknelson @jmrodri do you guys have thoughts on timeouts or cancellations?


@eriknelson commented Feb 21, 2018:

> I think that canceling a running subscriber if another subscriber fails is something we should consider.

My gut feeling is that they're intended to take a simple data structure that contains the result of some work and do something useful with it completely independent from other subscribers. I'm not sure I like the idea of them communicating with each other.


@maleck13 (author):

I think the concept of Jobs that can be cancelled is probably a proposal of its own and outside the remit of this work.
If we had jobs that could somehow be cancelled, then the Job itself would take the context; I don't think the subscribers would, as they only react to messages passed in. In the case of cancellation, the likelihood is that a message would be passed in with a state of cancelled, rather than the subscriber watching the context itself.
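
For illustration only, a sketch of that cancelled-state idea; the JobMsg shape and state names below are hypothetical:

```go
package job

// Hypothetical states: subscribers react to the state carried by the
// message rather than watching a context themselves.
type JobState string

const (
	StateInProgress JobState = "in progress"
	StateSucceeded  JobState = "succeeded"
	StateFailed     JobState = "failed"
	StateCancelled  JobState = "cancelled"
)

// A cancelled job would emit a final message like this, which subscribers
// handle the same way as any other state change.
type JobMsg struct {
	Token string
	State JobState
}
```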

@maleck13 (Contributor, Author) commented:

@eriknelson Happy to leave them as subscribers; it was really just a suggestion, as I like the term observer in the context of the Job: they are observing / watching the job. I agree, though, that there is not really much difference, and I can see that the channel is essentially a message bus.

@maleck13 (Contributor, Author) commented:

I believe the actions on me here are to:

  • Add a diagram showing the channel per job approach
  • Remove the change to the name of the subscribers

Remaining is a discussion about passing a context in to a Job to allow for timeout and cancellation (should this be a separate proposal, perhaps?).

Am I missing anything?

@eriknelson (Contributor) left a review:

A few questions.

My preferred approach is 2), but honestly 1) is attractive for its simplicity.

Rather than passing the channel into the subscriber, we would rename and
change this interface to accept a ```JobMsg```, value not a pointer to stop
unexpected mutation, and return an error.
The current subscribers would handle state persistence and would be

@eriknelson:

> The current subscribers would handle state persistence and would be renamed to reflect their role.

+1, this is the important aspect to me. I would like to keep the jobs decoupled from the consumers that do something useful with the result. It's a deliberate aspect of the original design that I still think is important.

```go
select {
case msg := <-provisionChan:
	for _, s := range e.subscribers[ProvisionTopic] {
		go s(msg)
```

@eriknelson:

There's a bit of an inconsistency here, I think. (I'm going to use Subscriber here for the sake of clarity; assume it is synonymous with Observer.) If we are moving away from subscribers taking a channel as an argument, in favor of a Notify(msg JobMsg) error method, this should be s.Notify(msg). Additionally, this is fire-and-forget; the engine doesn't have an opportunity to deal with errors that may come back. That may be desirable for this solution? The questions I would ask are:

  • Is it okay to just fire these off and ignore the outcome? If so, it doesn't make sense to return an error from Notify. How do we handle error scenarios in the subscribers then, if at all?
  • If it's not okay, how do we expect the engine to handle errors that have been returned from subscribers?


@maleck13 (author):

👍 I should not have had the error return on the interface; I will change this. As each WorkSubscriber is an independent function, unaware of any of the other work that is happening, I think it makes sense for them to handle the error internally (similar to how the subscribers work now: marking the action as failed, cleaning up, etc.), as the only real option for the WorkEngine on receiving an error would be to log it.
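
A hedged sketch of what dropping the error return could look like; the persistence interface and names below are invented for illustration:

```go
package subscribers

import "log"

type JobMsg struct{ State string }

// Notify no longer returns an error: each subscriber is an independent
// function and deals with its own failures.
type WorkSubscriber interface {
	Notify(msg JobMsg)
}

// stateStore stands in for whatever persistence layer a subscriber uses.
type stateStore interface {
	SetState(msg JobMsg) error
}

type persistSubscriber struct{ store stateStore }

func (s persistSubscriber) Notify(msg JobMsg) {
	if err := s.store.SetState(msg); err != nil {
		// Handle internally: mark the action failed, clean up, etc.
		// The engine could only log this error anyway.
		log.Printf("subscriber: failed to persist job state: %v", err)
	}
}
```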

job. The work engine would be updated to create a channel per Job
when starting a new job. The work engine would be responsible for
the lifecycle of these channels. As above, the subscribers would be
registered in a map and called in a non-blocking way when a message was received.

@eriknelson:

Is the expectation that there will be unique instances of registered subscriber types per job? I.e., if I have an EtcdSub and a RabbitSub, will the work engine instantiate unique ones for each job? Or for each topic?


@maleck13 (author):

I think, as they are intended to be stateless functions, they would be registered at startup for each topic. I hadn't intended for them to be created per job, just called per job. I will make this clearer in the proposal.

```go
go func() {
	// will auto stop once the channel is closed
	for msg := range a.jobs[token] {
		for _, s := range a.subscribers[topic] {
```

@eriknelson:

Can you confirm my understanding of this? It appears that there is a single instance of a given subscriber for each topic in this model, where there is only one topic per action. The subscribers listening on a given topic have their Notify methods triggered when a JobMsg comes in, and each call is spun off immediately into a goroutine? This begs the question: why are subscribers structs at all, if the same instance is triggered for all jobs? It seems to me it should just be a stateless handler function.


@maleck13 (author):

Your understanding here is correct and in line with what I was thinking. One clarification: in the current proposal the subscribers would implement the WorkObserver / WorkSubscriber interface, allowing many different types to be Subscribers. That said, the WorkSubscriber could be declared as a func type:

``WorkSubscriber func(msg JobMsg)``
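
Pulling this thread together, here is a rough Go sketch of the func-type variant under option 2. The engine shape and helper names are assumptions, and a real implementation would need to guard the maps with a mutex:

```go
package engine

type WorkTopic string

type JobMsg struct{ State string }

// WorkSubscriber declared as a func type, per the suggestion above.
type WorkSubscriber func(msg JobMsg)

// Work stands in for the job interface; a job streams its messages into
// the channel the engine hands it.
type Work interface {
	Run(token string, msgs chan<- JobMsg)
}

type WorkEngine struct {
	jobs        map[string]chan JobMsg
	subscribers map[WorkTopic][]WorkSubscriber
}

func NewWorkEngine() *WorkEngine {
	return &WorkEngine{
		jobs:        make(map[string]chan JobMsg),
		subscribers: make(map[WorkTopic][]WorkSubscriber),
	}
}

// AttachSubscriber registers a stateless subscriber for a topic once, at
// startup; subscribers are called per job, not created per job.
func (a *WorkEngine) AttachSubscriber(topic WorkTopic, s WorkSubscriber) {
	a.subscribers[topic] = append(a.subscribers[topic], s)
}

// StartNewJob creates a channel owned by the engine for this job, fans
// messages out to the topic's subscribers, and closes the channel when
// the job finishes.
func (a *WorkEngine) StartNewJob(token string, work Work, topic WorkTopic) {
	msgs := make(chan JobMsg)
	a.jobs[token] = msgs

	go func() {
		// Ranging stops automatically once the channel is closed.
		for msg := range msgs {
			for _, s := range a.subscribers[topic] {
				go s(msg) // async, so a slow subscriber cannot block the reader
			}
		}
	}()

	go func() {
		work.Run(token, msgs)
		close(msgs) // the engine owns the channel lifecycle
	}()
}
```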

```go
for _, s := range a.subscribers[topic] {
	// Note we could choose to run these sync but again if there was a slow
	// subscriber it would block receiving on the channel.
	go s(msg)
```

@eriknelson:

Same comment WRT Notify and what we should do about errors.


@maleck13 (author):

see above



@eriknelson (Contributor) commented Feb 21, 2018

We need a consensus on whether we'd like to pursue implementation 1 or 2. Option 2 feels closer to the pitch made on our call, and I would personally prefer it because it feels more flexible, as @maleck13 points out. Is it fair to say most people prefer 2?

@maleck13 force-pushed the issue-638-proposal-fanin-fanout branch from 68f7868 to fe112c2, February 26, 2018 13:44
@maleck13 (Contributor, Author) commented:

Have updated with more details and removed the Observer name change.
@jmrodri Are you happy with this now? Do you need an additional drawing?
@shawn-hurley I think we have answered your questions about context (anything outstanding now)?
@eriknelson Made the changes you have suggested.

@eriknelson (Contributor) left a review:

Thanks @maleck13, looking good. I think we should go with 2) unless someone wants to argue otherwise.

@eriknelson merged commit 585749f into openshift:master, Mar 1, 2018
@eriknelson (Contributor) commented:

@maleck13 I'm considering this accepted given the acks and discussion. Thanks again.

@maleck13 (Contributor, Author) commented Mar 2, 2018

Great! Thanks, guys.

Labels: needs-review, proposal, size/L

Successfully merging this pull request may close: Proposal: replace the current subscriber pattern (#638)

6 participants