reactor callback prototypes need redesign #124

Closed
garlick opened this issue Dec 26, 2014 · 24 comments
Comments

@garlick
Member

garlick commented Dec 26, 2014

Some issues with the current reactor callback API design:

  • Flux API probably shouldn't expose zmsg_t from the CZMQ zmsg class.
  • zmsg_t destruction by zmsg_send() and our functions that wrap it is confusing.
  • The reactor should not pass handlers messages that do not conform to RFC 3.
  • For RPCs, the reactor should assist with error responses.
@garlick
Member Author

garlick commented Dec 26, 2014

Couple more issues:

  • typemask argument allowing a handler to match multiple message types is little used and could be eliminated
  • handlers are registered to match a topic string that always begins with the module name. Maybe we should make the module name implicit so that later on we can load modules under names other than the hardwired MOD_NAME() one.

@garlick
Member Author

garlick commented Mar 18, 2015

A few ideas from discussion with @grondo today:

@grondo:

I was thinking it would be nice if we could think of a way to expose the underlying ev implementation somehow to avoid a requirement to wrap every libev call.

@garlick:

Yes, it feels wrong to wrap libev, and if we were going to continue to wrap it, I would want to morph our API into something that looks much more like libev's so it could be as thin a layer as possible, and because it's so much nicer. But let's pursue using it directly first.

If we say that utilities and comms modules (internal and external to flux-core) use libev directly, then it seems like libev should be an external dependency, not "provided" by flux-core. Instead, flux-core should provide specialized libev watchers for the flux_t handle.

Specialized watchers would be:

  • libutil/ev_*.h
  • watcher that returns when calling flux_recvmsg() would return any message without blocking
  • watcher for a message flux_match_t that returns when calling flux_recvmsg() would return a matching message without blocking (see the sketch below)
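
A hypothetical sketch of what such a specialized watcher interface and its use from an application-owned libev loop might look like. The ev_flux type and the ev_flux_* functions are illustrative names, not an existing API, and flux_open() is assumed as the handle constructor:

#include <ev.h>
#include <flux/core.h>   /* assumed header */

/* Hypothetical watcher: fires when flux_recvmsg (h) would return
 * a message without blocking. */
typedef struct ev_flux ev_flux;
typedef void (*ev_flux_f)(struct ev_loop *loop, ev_flux *w, int revents);

struct ev_flux {
    ev_io io;           /* underlying fd watcher */
    flux_t h;           /* handle being watched */
    ev_flux_f cb;       /* user callback */
};

void ev_flux_init (ev_flux *w, ev_flux_f cb, flux_t h, int events);
void ev_flux_start (struct ev_loop *loop, ev_flux *w);
void ev_flux_stop (struct ev_loop *loop, ev_flux *w);

/* Example use from the application's own libev loop: */
static void msg_ready_cb (struct ev_loop *loop, ev_flux *w, int revents)
{
    /* flux_recvmsg (w->h) would not block here */
}

int main (void)
{
    struct ev_loop *loop = ev_default_loop (0);
    flux_t h = flux_open (NULL, 0);   /* assumed constructor */
    ev_flux w;

    ev_flux_init (&w, msg_ready_cb, h, EV_READ);
    ev_flux_start (loop, &w);
    ev_run (loop, 0);
    return 0;
}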

@garlick:

Do we want to be locked in forever to libev? I suppose if we are just providing watchers, we can provide similar things for other event loops as needed.

@grondo
Contributor

grondo commented Mar 18, 2015

Ideally we would allow users of the flux api to BYOEL (Bring Your Own Event Library). I think exposing specialized flux watchers for libev is the right first step, and in the future someone could write bindings for whatever event loop they are using (e.g. libuv) so that flux could be more nicely integrated into their application (i.e. tools would not necessarily need to create a thread to handle flux messages separate from their own main loop).

This may also reduce code in the flux api implementations since we could defer "generic" handlers (fd, signal, etc) to the loop implementation.

Therefore, I don't think this approach would lock us into libev forever, or at least not users of the flux api, as long as we think we can embed necessary flux core functionality in "watchers" generic enough to be implemented in any/most modern event loop interfaces.

I haven't thought about your idea in detail, but I think this work could be nicely staged into the following work items:

  • Create specialized watchers for libev as you describe above and expose this via a flux-core library or perhaps a plugin (some design work needed here to ensure we can extend this to other event loop interfaces)
  • Pick a couple tools or utilities to convert to use libev directly and iterate on design/implementation
  • Deprecate flux_reactor* and convert existing users one by one
  • Once all users have been converted, remove reactor interfaces as needed

This probably grossly oversimplifies things, but is my general opinion of a good direction to move in.

@garlick
Member Author

garlick commented Mar 18, 2015

Two thoughts:

  • Coprocesses are a complication. We need to be able to wrap the execution of an ev handler (of arbitrary type I suppose, if we want all of them to be capable of making sleepable flux RPCs) in the shim code that manages their coproc context.
  • It would be nice if there were an optional shorthand, Node.js-like structure, at least for comms modules, that hides the reactor details in cases where you don't want to think about them or deal with libev quirks.

Are those two points compelling enough to warrant keeping a streamlined version of the built-in reactor while still providing the specialized watchers for integration with an external libev loop?

@grondo
Contributor

grondo commented Mar 19, 2015

I was thinking mainly of flux api users in my comments above. Comms modules themselves seem specialized enough that they would always use the "event loop implied" style you have now. I can't see any reason why a comms module would need to use a different event loop implementation from cmbd itself.

Sleepable RPCs in an event-driven framework are a generic problem, I guess. I don't have a good answer for that one, except that the common way to do it (though I'm far from an expert) seems to be to have async callables that invoke a callback when the blocking function returns (e.g. see async dns lookup implementations). Coprocs seem like a neat solution but may not help in this case if native support from your event loop is required. (Also, what about any other blocking function an API user might use?)

Sorry, I may have gotten this issue a bit off topic. However, I think we've resolved that

  • Comms modules should continue using the "event-loop implied" style they have now, i.e. there is no need to allow multiple event loop implementations
  • We're not sure what to do about Flux API users...

@garlick
Member Author

garlick commented Mar 19, 2015

Oh, it suddenly dawned on me that the handle implementations really just need to export a file descriptor that, when ready, indicates that flux_recvmsg() would not block. Then it becomes trivial to integrate with any type of event loop. This is sort of how zeromq works except that their file descriptor is edge-triggered for some reason, and ours should be level-triggered. Maybe implement this with pipe(2) or eventfd(2)?

For api users we could offer a choice of the internal reactor (the same one implied in comms modules), or the above.

@grondo
Contributor

grondo commented Mar 19, 2015

How would you notify the pipefd/eventfd without having a callback wired into the event loop the application is using, or a thread? Or could you piggyback on something zeromq is already doing?

@garlick
Member Author

garlick commented Mar 19, 2015

Maybe aggregate internal fd's into one epoll fd?

@garlick
Member Author

garlick commented Mar 19, 2015

Specifically, the zlist_t putmsg queue in the API handle could have an eventfd associated with it. The socket connected to the broker is a UNIX domain socket fd. There are no zeromq sockets in the API handle implementation. The eventfd and the socket fd could just be wrapped up in the epoll fd that would be watched by the external event loop. Does that make sense?

(I admit I tossed out that idea without having thought it through!)
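
A minimal sketch of that aggregation, with illustrative fd names: the eventfd tracks the putmsg queue, the socket fd is the broker connection, and both are added to a single epoll fd that an external event loop could watch.

#include <sys/epoll.h>
#include <unistd.h>

/* Fold the broker socket fd and the queue-notification eventfd into
 * one epoll fd; it polls readable when either member fd is readable. */
int make_pollfd (int sockfd, int queue_efd)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int epfd = epoll_create1 (EPOLL_CLOEXEC);

    if (epfd < 0)
        return -1;
    ev.data.fd = sockfd;
    if (epoll_ctl (epfd, EPOLL_CTL_ADD, sockfd, &ev) < 0)
        goto error;
    ev.data.fd = queue_efd;
    if (epoll_ctl (epfd, EPOLL_CTL_ADD, queue_efd, &ev) < 0)
        goto error;
    return epfd;
error:
    close (epfd);
    return -1;
}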

@grondo
Contributor

grondo commented Mar 19, 2015

Yes, that makes sense.

So would flux also provide a callback or function that the user would run when there is activity on that fd to process flux messages and events, or would they invoke a "run once" instance of the flux reactor? Or were you thinking that users would directly use flux_recvmsg() and be responsible for their own dispatch?

@garlick
Member Author

garlick commented Mar 19, 2015

I was thinking that the user would call flux_recvmsg() to read a message and then either do what they want with it, or pass it to some version of the current dispatch engine that we export to them.

E.g.

flux_dispatch_add (h, match, handler_cb, arg);
fd = flux_eventfd (h);
// register fd with event loop so reactor_cb() is called when ready

reactor_cb()
{
    zmsg = flux_recvmsg (h);
    flux_dispatch_msg (h, zmsg);
}

At least in this model we are not in the event loop business.

@grondo
Contributor

grondo commented Mar 19, 2015

I understand.

Unfortunately I think each user would end up writing their own dispatcher, which is not nearly as nice an interface as we have now (IMO). What would end up happening, I'd guess, is that we'd be writing a "convenience" dispatcher for users anyway, so things like kvs watches would work without being cumbersome for the calling library.

The example above also exposes zmsg_t...

@grondo
Contributor

grondo commented Mar 19, 2015

Following on your ideas, if the flux api provided an fd to register with any event loop, and a callback to process/dispatch all flux-related callbacks, I think we could get what you want (getting out of the event loop business) and what I want (simple APIs to register callbacks for various flux events).

I.e. API users would register callbacks to the flux handle as they do now, then they would call a "wireup" function to register the flux reactor to their own event loop, along with a callback that would dispatch flux callbacks as needed when there is flux msg activity.

This approach could also allow us to abstract the zmsg_t type behind the api handle.

This may be what you were talking about all along, and it just took me a while to understand it.

@garlick
Member Author

garlick commented Mar 19, 2015

Yeah I think that is what I was trying to say two comments up.

So should we bite the bullet and encapsulate zmsg_t in a fluxmsg_t?

@garlick
Member Author

garlick commented May 6, 2015

The plan we discussed above is, to summarize:

  • deprecate the flux reactor interface (reactor.h)
  • add a flux_get_eventfd() handle op, which returns a file descriptor that will be ready when flux_recvmsg() can be called without blocking
  • add a dispatch interface that could be fed a flux_msg_t and call appropriate application callbacks

API users would integrate the eventfd and dispatcher into their own event loops.

Comms module users would get an implied libev loop and dispatcher. They would have direct access to the libev event loop so we don't have to wrap those calls to allow them to extend it.
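
A rough sketch of what that integration might look like from an API user's plain poll(2) loop, using the proposed names from this thread (flux_get_eventfd(), flux_dispatch_add(), flux_dispatch_msg()). None of these exist yet, and the signatures below are guesses:

#include <poll.h>
#include <flux/core.h>   /* assumed header */

/* Hypothetical handler registered with the proposed dispatch interface. */
static void handler_cb (flux_t h, flux_msg_t *msg, void *arg)
{
    /* application-specific message handling */
}

static void run_user_loop (flux_t h, flux_match_t match)
{
    struct pollfd pfd;

    flux_dispatch_add (h, match, handler_cb, NULL);

    pfd.fd = flux_get_eventfd (h);    /* proposed handle op */
    pfd.events = POLLIN;

    while (poll (&pfd, 1, -1) >= 0) {
        if (pfd.revents & POLLIN) {
            flux_msg_t *msg = flux_recvmsg (h);   /* won't block now */
            flux_dispatch_msg (h, msg);
        }
    }
}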

Are we still thinking this sounds like a good idea?

@garlick
Member Author

garlick commented May 6, 2015

A consequence of exposing the ev_loop to modules is that they will need access to libev headers. Therefore, out-of-tree modules require an out-of-tree libev, and src/common/libev should be removed to avoid a version mismatch between the installed libev and the internal one. Right?

@grondo
Contributor

grondo commented May 6, 2015

Ok, I'm having trouble wrapping my head around this one. I found it nice to have a simple event loop interface within flux that provides specific flux-y things. I think that in porting all the Lua bindings and other event-based code to raw libev, I would be tempted to write a convenience library anyway that simplified common operations, so I'm not keen on completely deprecating everything in reactor.h.

However, I do think we definitely need flux_get_eventfd() -- that is a no-brainer for using Flux API from existing applications that have their own event loops.

Can we compromise and add a flux_get_item (flux_t h, FLUX_ITEM_EVENT_LOOP, ...) or similar call for modules that want to use underlying libev calls, but keep the current reactor interface for simpler use cases? I imagine if we don't provide that, a convenience library would pop up at some point anyway...

Also, your comment about libev headers is a good one, I don't have any good ideas about that today.

@garlick
Member Author

garlick commented May 6, 2015

Thanks for that @grondo. Will ponder also, and your compromise sounds perfectly reasonable.

@garlick
Member Author

garlick commented May 7, 2015

Maybe the accessor for the event loop (with aforementioned problems) can be omitted as long as there is a way for a user to substitute their own event loop using flux_get_eventfd() and dispatcher.

@garlick
Member Author

garlick commented Jun 18, 2015

I am working on this code right now and wanted to put a mini brain dump in this issue on the internals to clarify my own thinking, leave some detailed notes for posterity, and (possibly if anyone has time to read this) get some feedback.

Pattern after Zeromq Event Loop Integration

The entries for ZMQ_FD and ZMQ_EVENTS in zmq-getsockopt(3) tersely explain the zeromq interfaces for integrating with external event loops. There is also a useful blog post on integrating zeromq with libev. I implemented the blog post's technique in flux-core in libutil/ev_zmq.c. Flux could pattern its interface for integrating flux handles with external event loops after zeromq's, that is, provide:

int flux_pollfd (flux_t h);
int flux_pollevents (flux_t h);

or equivalent (perhaps like flux_get_item())
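
For reference, the zeromq pattern being imitated looks roughly like this. ZMQ_FD and ZMQ_EVENTS are the real zmq_getsockopt(3) options; the surrounding function is just for illustration:

#include <zmq.h>

/* Query the socket's edge-triggered notification fd and its current
 * readiness bits, per zmq-getsockopt(3). */
void check_zmq_socket (void *sock)
{
    int fd, events;
    size_t size = sizeof (fd);

    zmq_getsockopt (sock, ZMQ_FD, &fd, &size);      /* fd to watch */

    size = sizeof (events);
    zmq_getsockopt (sock, ZMQ_EVENTS, &events, &size);
    if (events & ZMQ_POLLIN) {
        /* a message can be received without blocking */
    }
    if (events & ZMQ_POLLOUT) {
        /* a message can be sent without blocking */
    }
}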

Refactoring the Flux Reactor

In current master e.g. ad54d7d the internal libev reactor loop is replicated in each "connector", which provides a set of callbacks that allow that reactor to be accessed from the generic reactor interface in the handle. Each connector also interposes a message queue in front of its socket for flux_requeue(). I am going to try to move the internal libev reactor loop and this queue to the generic code and have each connector export an interface similar to the zeromq one above, so that the connector callbacks look like this:

struct flux_handle_ops {
    int         (*pollfd)(void *impl);
    int         (*pollevents)(void *impl);
    int         (*send)(void *impl, const flux_msg_t *msg, int flags);
    flux_msg_t* (*recv)(void *impl, int flags);
    // snip: other calls not relevant to discussion
};

The internal (generic) reactor loop will then watch the connector's pollfd file descriptor. The pollfd becomes readable, in an edge-triggered manner, whenever the POLLIN, POLLOUT, or POLLERR bits for the connector are raised. When the pollfd becomes readable, it indicates that pollevents should be read, and sends/recvs can be handled one by one, checking pollevents again after each is handled until the poll bits of interest are no longer set.
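
A sketch of that generic handling loop, assuming flux_pollevents() returns a bitmask and using hypothetical FLUX_POLLIN / FLUX_POLLOUT / FLUX_POLLERR bits to stand in for whatever encoding is chosen:

/* Invoked by the generic reactor when the connector's (edge-triggered)
 * pollfd becomes readable: work one message at a time, re-checking
 * pollevents, until no bits of interest remain set. */
static void connector_ready_cb (flux_t h)
{
    int revents;

    while ((revents = flux_pollevents (h)) != 0) {
        if (revents & FLUX_POLLERR) {
            /* handle connector error */
            break;
        }
        if (revents & FLUX_POLLIN) {
            flux_msg_t *msg = flux_recvmsg (h);   /* one message, won't block */
            /* dispatch msg to matching handlers, then destroy it */
        }
        else if (revents & FLUX_POLLOUT) {
            /* send one queued outgoing message */
        }
        else
            break;
    }
}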

re-queue queue

The generic reactor interface will need to multiplex between the connector and the flux_requeue() queue. The main trick here is that when a message is added to the queue, it should unblock the reactor just like when POLLIN is raised on the connector. I plan to use an eventfd(2) to track the queue state, and fold both the connector pollfd and the eventfd into an epoll(2) file descriptor which the generic libev reactor loop will watch. Combining the two file descriptors permits a single file descriptor to be exported for external event loop integration (otherwise we would just watch two file descriptors in the internal event loop).
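
A minimal sketch of the eventfd bookkeeping for the requeue queue (the helper names are made up; semaphore mode keeps the fd readable exactly while the queue is non-empty):

#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Counter starts at zero (queue empty); EFD_SEMAPHORE makes each
 * read decrement the counter by one instead of resetting it to zero. */
int queue_efd_create (void)
{
    return eventfd (0, EFD_NONBLOCK | EFD_SEMAPHORE);
}

/* Call after appending a message to the requeue queue. */
void queue_efd_signal (int efd)
{
    uint64_t one = 1;
    (void)write (efd, &one, sizeof (one));
}

/* Call after removing a message from the requeue queue. */
void queue_efd_clear (int efd)
{
    uint64_t val;
    (void)read (efd, &val, sizeof (val));
}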

edge triggered necessary?

The downside of edge-triggered notification is that integration with a level-triggered event loop requires machinations like the ones in the blog post above, which took me a little while to get my head around. Ideally you'd like a file descriptor that behaves more like a regular one, that is, level triggered, and responds as expected to being poll(2)ed with subsets of POLLIN and POLLOUT. Then you could just register it with your event loop and call flux_read() and flux_write() each time you get a callback.

However, I don't think it's possible to do this. For one thing, a file descriptor from epoll_create() only raises POLLIN when it is itself polled, regardless of which event types its member fds raise, so you can't use that trick to aggregate multiple fds. There's also no way to be notified that the external poll loop is only polling on, say, POLLIN but not POLLOUT, so there's no scheme for propagating the poll bits to other fds. Then there's the fact that zeromq offers this interface and no other, and at least one of our connectors embeds a zeromq socket. To create the desired effect, I think you'd have to do something more "active" like create a thread that monitors the various underlying fds and tries to create the right events on an eventfd or pipe, which is undesirable for a number of reasons.

I'm open to suggestion here if I'm missing something obvious.

@grondo
Contributor

grondo commented Jun 18, 2015

Thanks @garlick, great summary of the issues!

Overall the approach above of having each connector expose an fd and wrap a generic reactor implementation around each connector's ops sounds like a massive improvement and simplification of the reactor and the individual connectors' code.

I didn't know anything about wrapping up multiple fds into a single epoll fd until now -- it might take me a while to wrap my head around that one.

If we try to expose a "flux reactor fd" for users to integrate into the poll or event loops of their own applications, I worry that the edge vs. level triggered behavior will make that very bug-prone, if not impossible, for a "casual" user. Perhaps we can document the usage very explicitly, or build some tools around the implementation to make this work the way people expect. Otherwise I would have to assume it might not be useful or not used very often.

That being said, staying away from creating another thread is a good idea.

Also, for my own benefit, could you elaborate on why flux_requeue() posts messages to a local list instead of reposting the message to the zeromq socket and reusing its internal queue?

@garlick
Member Author

garlick commented Jun 19, 2015

The separate queue is necessary because zeromq doesn't let you treat a zeromq socket as a raw queue, i.e. you can't avoid zeromq's semantics for message flow (like the dealer-router push/pop of address frames), you don't always have access to the "sending end" of a socket, and sending only allows you to append to the internal queue; it's not possible to push a message to the other end.

@trws
Member

trws commented Jul 13, 2015

has this been adequately addressed by #225?

@garlick
Member Author

garlick commented Jul 13, 2015

Closing. We can open up new issues for further tinkering with the reactor API.

@garlick garlick closed this as completed Jul 13, 2015