reactor callback prototypes need redesign #124

Closed
garlick opened this issue Dec 26, 2014 · 24 comments
Comments

@garlick
Member

garlick commented Dec 26, 2014

Some issues with the current reactor callback API design:

  • Flux API probably shouldn't expose zmsg_t from the CZMQ zmsg class.
  • zmsg_t destruction by zmsg_send() and our functions that wrap it is confusing.
  • The reactor should not pass handlers messages that do not conform to RFC 3.
  • For RPCs, the reactor should assist with error responses.
@garlick
Member Author

garlick commented Dec 26, 2014

Couple more issues:

  • typemask argument allowing a handler to match multiple message types is little used and could be eliminated
  • handlers are registered to match a topic string that always begins with the module name. Maybe we should make the module name implicit so that later on we can load modules under names other than the hardwired MOD_NAME() one.

@garlick
Member Author

garlick commented Mar 18, 2015

A few ideas from discussion with @grondo today:

@grondo:

I was thinking it would be nice if we could think of a way to expose the underlying ev implementation somehow to avoid a requirement to wrap every libev call.

@garlick:

Yes, it feels wrong to wrap libev, and if we were going to continue to wrap it, I would want to morph our API into something that looks much more like libev's so it could be as thin a layer as possible, and because it's so much nicer. But let's pursue using it directly first.

If we say that utilities and comms modules (internal and external to flux-core) use libev directly, then it seems like libev should be an external dependency, not "provided" by flux-core. Instead, flux-core should provide specialized libev watchers for the flux_t handle.

Specialized watchers would be:

  • libutil/ev_*.h
  • watcher that returns when calling flux_recvmsg() would return any message without blocking
  • watcher for a message flux_match_t that returns when calling flux_recvmsg() would return a matching message without blocking (see the sketch below)
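
A hypothetical sketch of what such a specialized watcher interface and its use from an application-owned libev loop might look like. The ev_flux type and the ev_flux_* functions are illustrative names, not an existing API, and flux_open() is assumed as the handle constructor:

#include <ev.h>
#include <flux/core.h>   /* assumed header */

/* Hypothetical watcher: fires when flux_recvmsg (h) would return
 * a message without blocking. */
typedef struct ev_flux ev_flux;
typedef void (*ev_flux_f)(struct ev_loop *loop, ev_flux *w, int revents);

struct ev_flux {
    ev_io io;           /* underlying fd watcher */
    flux_t h;           /* handle being watched */
    ev_flux_f cb;       /* user callback */
};

void ev_flux_init (ev_flux *w, ev_flux_f cb, flux_t h, int events);
void ev_flux_start (struct ev_loop *loop, ev_flux *w);
void ev_flux_stop (struct ev_loop *loop, ev_flux *w);

/* Example use from the application's own libev loop: */
static void msg_ready_cb (struct ev_loop *loop, ev_flux *w, int revents)
{
    /* flux_recvmsg (w->h) would not block here */
}

int main (void)
{
    struct ev_loop *loop = ev_default_loop (0);
    flux_t h = flux_open (NULL, 0);   /* assumed constructor */
    ev_flux w;

    ev_flux_init (&w, msg_ready_cb, h, EV_READ);
    ev_flux_start (loop, &w);
    ev_run (loop, 0);
    return 0;
}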

@garlick:

Do we want to be locked in forever to libev? I suppose if we are just providing watchers, we can provide similar things for other event loops as needed.

@grondo
Contributor

grondo commented Mar 18, 2015

Ideally we would allow users of the flux api to BYOEL (Bring Your Own Event Library). I think exposing specialized flux watchers for libev is the right first step, and in the future someone could write bindings for whatever event loop they are using (e.g. libuv) so that flux could be more nicely integrated into their application (i.e. tools would not necessarily need to create a thread to handle flux messages separate from their own main loop).

This may also reduce code in the flux api implementations since we could defer "generic" handlers (fd, signal, etc) to the loop implementation.

Therefore, I don't think this approach would lock us into libev forever, or at least not users of the flux api, as long as we think we can embed necessary flux core functionality in "watchers" generic enough to be implemented in any/most modern event loop interfaces.

I haven't thought about your idea in detail, but I think this work could be nicely staged into the following work items:

  • Create specialized watchers for libev as you describe above and expose this via a flux-core library or perhaps a plugin (some design work needed here to ensure we can extend this to other event loop interfaces)
  • Pick a couple tools or utilities to convert to use libev directly and iterate on design/implementation
  • Deprecate flux_reactor* and convert existing users one by one
  • Once all users have been converted, remove reactor interfaces as needed

This probably grossly oversimplifies things, but is my general opinion of a good direction to move in.

@garlick
Member Author

garlick commented Mar 18, 2015

Two thoughts:

  • Coprocesses are a complication. We need to be able to wrap the execution of an ev handler (of arbitrary type I suppose, if we want all of them to be capable of making sleepable flux RPCs) in the shim code that manages their coproc context.
  • It would be nice if there were an optional shorthand, Node.js-like structure, at least for comms modules, that hides the reactor details in cases where you don't want to think about them or deal with libev quirks.

Are those two points compelling enough to warrant keeping a streamlined version of the built-in reactor while still providing the specialized watchers for integration with an external libev loop?

@grondo
Contributor

grondo commented Mar 19, 2015

I was thinking mainly of flux api users in my comments above. Comms modules themselves seem specialized enough that they would always use the "event loop implied" style you have now. I can't see any reason why a comms module would need to use a different event loop implementation from cmbd itself.

Sleepable RPCs in an event-driven framework are a generic problem, I guess. I don't have a good answer for that one, except that the common way to do it (though I'm far from an expert) seems to be to have async callables that invoke a callback when the blocking function returns (e.g. see async dns lookup implementations). Coprocs seem like a neat solution but may not help in this case if native support from your event loop is required. (Also, what about any other blocking function an API user might use?)

Sorry, I may have gotten this issue a bit off topic. However, I think we've resolved that

  • Comms modules should continue using the "event-loop implied" style they have now, i.e. there is no need to allow multiple event loop implementations
  • We're not sure what to do about Flux API users...

@garlick
Member Author

garlick commented Mar 19, 2015

Oh, it suddenly dawned on me that the handle implementations really just need to export a file descriptor that, when ready, indicates that flux_recvmsg() would not block. Then it becomes trivial to integrate with any type of event loop. This is sort of how zeromq works except that their file descriptor is edge-triggered for some reason, and ours should be level-triggered. Maybe implement this with pipe(2) or eventfd(2)?

For api users we could offer a choice of the internal reactor (the same one implied in comms modules), or the above.

@grondo
Contributor

grondo commented Mar 19, 2015

How would you notify the pipefd/eventfd without having a callback wired into the event loop the application is using, or a thread? Or could you piggyback on something zeromq is already doing?

@garlick
Member Author

garlick commented Mar 19, 2015

Maybe aggregate internal fd's into one epoll fd?

@garlick
Member Author

garlick commented Mar 19, 2015

Specifically, the zlist_t putmsg queue in the API handle could have an eventfd associated with it. The socket connected to the broker is a UNIX domain socket fd. There are no zeromq sockets in the API handle implementation. The eventfd and the socket fd could just be wrapped up in the epoll fd that would be watched by the external event loop. Does that make sense?

(I admit I tossed out that idea without having thought it through!)
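
A minimal sketch of that aggregation, with illustrative fd names: the eventfd tracks the putmsg queue, the socket fd is the broker connection, and both are added to a single epoll fd that an external event loop could watch.

#include <sys/epoll.h>
#include <unistd.h>

/* Fold the broker socket fd and the queue-notification eventfd into
 * one epoll fd; it polls readable when either member fd is readable. */
int make_pollfd (int sockfd, int queue_efd)
{
    struct epoll_event ev = { .events = EPOLLIN };
    int epfd = epoll_create1 (EPOLL_CLOEXEC);

    if (epfd < 0)
        return -1;
    ev.data.fd = sockfd;
    if (epoll_ctl (epfd, EPOLL_CTL_ADD, sockfd, &ev) < 0)
        goto error;
    ev.data.fd = queue_efd;
    if (epoll_ctl (epfd, EPOLL_CTL_ADD, queue_efd, &ev) < 0)
        goto error;
    return epfd;
error:
    close (epfd);
    return -1;
}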

@grondo
Contributor

grondo commented Mar 19, 2015

Yes, that makes sense.

So would flux also provide a callback or function that the user would run when there is activity on that fd to process flux messages and events, or would they invoke a "run once" instance of the flux reactor? Or were you thinking that users would directly use flux_recvmsg() and be responsible for their own dispatch?

@garlick
Member Author

garlick commented Mar 19, 2015

I was thinking that the user would call flux_recvmsg() to read a message and then either do what they want with it, or pass it to some version of the current dispatch engine that we export to them.

E.g.

flux_dispatch_add (h, match, handler_cb, arg);
fd = flux_eventfd (h);
// register fd with event loop so reactor_cb() is called when ready

reactor_cb()
{
    zmsg = flux_recvmsg (h);
    flux_dispatch_msg (h, zmsg);
}

At least in this model we are not in the event loop business.

@grondo
Contributor

grondo commented Mar 19, 2015

I understand.

Unfortunately I think each user would end up writing their own dispatcher, which is not nearly as nice an interface as we have now (IMO). What would end up happening, I'd guess, is that we'd be writing a "convenience" dispatcher for users anyway, so things like kvs watches would work without being cumbersome for the calling library.

The example above also exposes zmsg_t...

@grondo
Contributor

grondo commented Mar 19, 2015

Following on your ideas, if the flux api provided an fd to register with any event loop, and a callback to process/dispatch all flux-related callbacks, I think we could get what you want (getting out of the event loop business) and what I want (simple APIs to register callbacks for various flux events).

I.e. API users would register callbacks to the flux handle as they do now, then they would call a "wireup" function to register the flux reactor to their own event loop, along with a callback that would dispatch flux callbacks as needed when there is flux msg activity.

This approach could also allow us to abstract the zmsg_t type behind the api handle.

This may be what you were talking about all along, and it just took me a while to understand it.

@garlick
Member Author

garlick commented Mar 19, 2015

Yeah I think that is what I was trying to say two comments up.

So should we bite the bullet and encapsulate zmsg_t in a fluxmsg_t?

@garlick
Member Author

garlick commented May 6, 2015

The plan we discussed above is, to summarize:

  • deprecate the flux reactor interface (reactor.h)
  • add a flux_get_eventfd() handle op, which returns a file descriptor that will be ready when flux_recvmsg() can be called without blocking
  • add a dispatch interface that could be fed a flux_msg_t and call appropriate application callbacks

API users would integrate the eventfd and dispatcher into their own event loops.

Comms module users would get an implied libev loop and dispatcher. They would have direct access to the libev event loop so we don't have to wrap those calls to allow them to extend it.
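
A rough sketch of what that integration might look like from an API user's plain poll(2) loop, using the proposed names from this thread (flux_get_eventfd(), flux_dispatch_add(), flux_dispatch_msg()). None of these exist yet, and the signatures below are guesses:

#include <poll.h>
#include <flux/core.h>   /* assumed header */

/* Hypothetical handler registered with the proposed dispatch interface. */
static void handler_cb (flux_t h, flux_msg_t *msg, void *arg)
{
    /* application-specific message handling */
}

static void run_user_loop (flux_t h, flux_match_t match)
{
    struct pollfd pfd;

    flux_dispatch_add (h, match, handler_cb, NULL);

    pfd.fd = flux_get_eventfd (h);    /* proposed handle op */
    pfd.events = POLLIN;

    while (poll (&pfd, 1, -1) >= 0) {
        if (pfd.revents & POLLIN) {
            flux_msg_t *msg = flux_recvmsg (h);   /* won't block now */
            flux_dispatch_msg (h, msg);
        }
    }
}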

Are we still thinking this sounds like a good idea?

@garlick
Member Author

garlick commented May 6, 2015

A consequence of exposing the ev_loop to modules is that they will need access to libev headers. Therefore, out-of-tree modules require an out-of-tree libev, and src/common/libev should be removed to avoid a version mismatch between the installed libev and the internal one. Right?

@grondo
Contributor

grondo commented May 6, 2015

Ok, I'm having trouble wrapping my head around this one. I found it nice to have a simple event loop interface within flux that provides specific flux-y things. I think that in porting all the Lua bindings and other event-based code to raw libev, I would be tempted to write a convenience library anyway that simplified common operations, so I'm not keen on completely deprecating everything in reactor.h.

However, I do think we definitely need flux_get_eventfd() -- that is a no-brainer for using Flux API from existing applications that have their own event loops.

Can we compromise and add a flux_get_item (flux_t h, FLUX_ITEM_EVENT_LOOP, ...) or similar call for modules that want to use underlying libev calls, but keep the current reactor interface for simpler use cases? I imagine if we don't provide that, a convenience library would pop up at some point anyway...

Also, your comment about libev headers is a good one, I don't have any good ideas about that today.

@garlick
Member Author

garlick commented May 6, 2015

Thanks for that @grondo. Will ponder also, and your compromise sounds perfectly reasonable.

@garlick
Member Author

garlick commented May 7, 2015

Maybe the accessor for the event loop (with aforementioned problems) can be omitted as long as there is a way for a user to substitute their own event loop using flux_get_eventfd() and dispatcher.

@garlick
Member Author

garlick commented Jun 18, 2015

I am working on this code right now and wanted to put a mini brain dump in this issue on the internals to clarify my own thinking, leave some detailed notes for posterity, and (possibly if anyone has time to read this) get some feedback.

Pattern after Zeromq Event Loop Integration

The entries for ZMQ_FD and ZMQ_EVENTS in zmq-getsockopt(3) tersely explain the zeromq interfaces for integrating with external event loops. There is also a useful blog post on integrating zeromq with libev. I implemented the blog post's technique in flux-core in libutil/ev_zmq.c. Flux could pattern its interface for integrating flux handles with external event loops after zeromq's, that is, provide:

int flux_pollfd (flux_t h);
int flux_pollevents (flux_t h);

or equivalent (perhaps like flux_get_item())
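
For reference, the zeromq pattern being imitated looks roughly like this. ZMQ_FD and ZMQ_EVENTS are the real zmq_getsockopt(3) options; the surrounding function is just for illustration:

#include <zmq.h>

/* Query the socket's edge-triggered notification fd and its current
 * readiness bits, per zmq-getsockopt(3). */
void check_zmq_socket (void *sock)
{
    int fd, events;
    size_t size = sizeof (fd);

    zmq_getsockopt (sock, ZMQ_FD, &fd, &size);      /* fd to watch */

    size = sizeof (events);
    zmq_getsockopt (sock, ZMQ_EVENTS, &events, &size);
    if (events & ZMQ_POLLIN) {
        /* a message can be received without blocking */
    }
    if (events & ZMQ_POLLOUT) {
        /* a message can be sent without blocking */
    }
}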

Refactoring the Flux Reactor

In current master e.g. ad54d7d the internal libev reactor loop is replicated in each "connector", which provides a set of callbacks that allow that reactor to be accessed from the generic reactor interface in the handle. Each connector also interposes a message queue in front of its socket for flux_requeue(). I am going to try to move the internal libev reactor loop and this queue to the generic code and have each connector export an interface similar to the zeromq one above, so that the connector callbacks look like this:

struct flux_handle_ops {
    int         (*pollfd)(void *impl);
    int         (*pollevents)(void *impl);
    int         (*send)(void *impl, const flux_msg_t *msg, int flags);
    flux_msg_t* (*recv)(void *impl, int flags);
    // snip: other calls not relevant to discussion
};

The internal (generic) reactor loop will then watch the connector's pollfd file descriptor. The pollfd becomes readable, in an edge-triggered manner, whenever the POLLIN, POLLOUT, or POLLERR bits for the connector are raised. When the pollfd becomes readable, it indicates that pollevents should be read, and sends/recvs can be handled one by one, checking pollevents again after each is handled until the poll bits of interest are no longer set.
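
A sketch of that generic handling loop, assuming flux_pollevents() returns a bitmask and using hypothetical FLUX_POLLIN / FLUX_POLLOUT / FLUX_POLLERR bits to stand in for whatever encoding is chosen:

/* Invoked by the generic reactor when the connector's (edge-triggered)
 * pollfd becomes readable: work one message at a time, re-checking
 * pollevents, until no bits of interest remain set. */
static void connector_ready_cb (flux_t h)
{
    int revents;

    while ((revents = flux_pollevents (h)) != 0) {
        if (revents & FLUX_POLLERR) {
            /* handle connector error */
            break;
        }
        if (revents & FLUX_POLLIN) {
            flux_msg_t *msg = flux_recvmsg (h);   /* one message, won't block */
            /* dispatch msg to matching handlers, then destroy it */
        }
        else if (revents & FLUX_POLLOUT) {
            /* send one queued outgoing message */
        }
        else
            break;
    }
}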

re-queue queue

The generic reactor interface will need to multiplex between the connector and the flux_requeue() queue. The main trick here is that when a message is added to the queue, it should unblock the reactor just like when POLLIN is raised on the connector. I plan to use an eventfd(2) to track the queue state, and fold both the connector pollfd and the eventfd into an epoll(2) file descriptor which the generic libev reactor loop will watch. Combining the two file descriptors permits a single file descriptor to be exported for external event loop integration (otherwise we would just watch two file descriptors in the internal event loop).
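
A minimal sketch of the eventfd bookkeeping for the requeue queue (the helper names are made up; semaphore mode keeps the fd readable exactly while the queue is non-empty):

#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Counter starts at zero (queue empty); EFD_SEMAPHORE makes each
 * read decrement the counter by one instead of resetting it to zero. */
int queue_efd_create (void)
{
    return eventfd (0, EFD_NONBLOCK | EFD_SEMAPHORE);
}

/* Call after appending a message to the requeue queue. */
void queue_efd_signal (int efd)
{
    uint64_t one = 1;
    (void)write (efd, &one, sizeof (one));
}

/* Call after removing a message from the requeue queue. */
void queue_efd_clear (int efd)
{
    uint64_t val;
    (void)read (efd, &val, sizeof (val));
}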

edge triggered necessary?

The downside of edge-triggered notification is that integration with a level-triggered event loop requires machinations like the ones in the blog post above, which took me a little while to get my head around. Ideally you'd like a file descriptor that behaves more like a regular one, that is, level triggered, and responds as expected to being poll(2)ed with subsets of POLLIN and POLLOUT. Then you could just register it with your event loop and call flux_read() and flux_write() each time you get a callback.

However, I don't think it's possible to do this. For one thing, a file descriptor from epoll_create() only raises POLLIN when it is itself polled, regardless of which event types its member fds raise, so you can't use that trick to aggregate multiple fds. There's also no way to be notified that the external poll loop is only polling on, say, POLLIN but not POLLOUT, so there's no scheme for propagating the poll bits to other fds. Then there's the fact that zeromq offers this interface and no other, and at least one of our connectors embeds a zeromq socket. To create the desired effect, I think you'd have to do something more "active" like create a thread that monitors the various underlying fds and tries to create the right events on an eventfd or pipe, which is undesirable for a number of reasons.

I'm open to suggestion here if I'm missing something obvious.

@grondo
Contributor

grondo commented Jun 18, 2015

Thanks @garlick, great summary of the issues!

Overall the approach above of having each connector expose an fd and wrap a generic reactor implementation around each connector's ops sounds like a massive improvement and simplification of the reactor and the individual connectors' code.

I didn't know anything about wrapping up multiple fds into a single epoll fd until now -- it might take me a while to wrap my head around that one.

If we try to expose a "flux reactor fd" for users to integrate into the poll or event loops of their own applications, I worry that the edge vs. level triggered behavior will make that very bug-prone, if not impossible, for a "casual" user. Perhaps we can document the usage very explicitly, or build some tools around the implementation to make this work the way people expect. Otherwise I would have to assume it might not be useful or not used very often.

That being said, staying away from creating another thread is a good idea.

Also, for my own benefit, could you elaborate on why flux_requeue() posts messages to a local list instead of reposting the message to the zeromq socket and reusing its internal queue?

@garlick
Member Author

garlick commented Jun 19, 2015

The separate queue is necessary because zeromq doesn't let you treat a zeromq socket as a raw queue, i.e. you can't avoid zeromq's semantics for message flow (like the dealer-router push/pop of address frames), you don't always have access to the "sending end" of a socket, and sending only allows you to append to the internal queue; it's not possible to push a message to the other end.

@trws
Member

trws commented Jul 13, 2015

has this been adequately addressed by #225?

@garlick
Member Author

garlick commented Jul 13, 2015

Closing. We can open up new issues for further tinkering with the reactor API.

@garlick garlick closed this as completed Jul 13, 2015