implement matchtag protocol enhancement #123

garlick · 2014-12-19T23:02:12Z

This change allows clients to assign a unique integer 'matchtag' to a request which will be echoed back in the corresponding response. This makes it easier for a client to manage multiple outstanding requests.

The flux_t handle implements a "tag pool". Unique matchtags can be obtained using flux_matchtag_alloc() and retired using flux_matchtag_free().
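As a rough illustration of what such a tag pool might look like, here is a minimal sketch. The `struct tagpool` type, the `tagpool_*` names, and the choice to return 0 on exhaustion are assumptions made for this example, not the code in this PR; the only details taken from the description are the 8-bit tag width and the reservation of tag 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TAG_ANY 0    /* reserved: "match anything", never handed out */
#define TAG_MAX 255  /* largest value of an 8-bit tag */

/* Hypothetical pool type; in the PR the pool lives inside the flux_t handle. */
struct tagpool {
    bool in_use[TAG_MAX + 1];  /* in_use[0] is never set; tag 0 is reserved */
};

static void tagpool_init (struct tagpool *p)
{
    memset (p, 0, sizeof (*p));
}

/* Return the lowest free tag in 1..255.
 * Assumption: return TAG_ANY (0) when all 255 tags are outstanding.
 */
static uint8_t tagpool_alloc (struct tagpool *p)
{
    for (int t = 1; t <= TAG_MAX; t++) {
        if (!p->in_use[t]) {
            p->in_use[t] = true;
            return (uint8_t)t;
        }
    }
    return TAG_ANY;
}

/* Retire a tag so it can be allocated again. Freeing tag 0 is a no-op. */
static void tagpool_free (struct tagpool *p, uint8_t tag)
{
    if (tag != TAG_ANY) {
        assert (p->in_use[tag]);  /* catch double free in debug builds */
        p->in_use[tag] = false;
    }
}
```

With only 255 usable values, a linear scan over a small array is probably sufficient; a wider tag would likely call for a free list or bitmap instead.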

flux_json_rpc() hides tag management from callers.

flux_response_recvmsg() now has a matchtag argument and will manage the requeuing of messages that come in but do not match the requested tag. Alternatively, calling it with a matchtag of 0 disables tag matching.
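The matching rule can be sketched with a small in-memory queue. This is only a model of the behavior described above: `struct msg`, `struct msgqueue`, `queue_push()`, and `queue_recv()` are hypothetical names, and a plain array stands in for the handle's real message transport.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAP 16

/* Hypothetical stand-in for a response message. */
struct msg {
    uint8_t matchtag;
    int payload;
};

struct msgqueue {
    struct msg items[QUEUE_CAP];
    size_t count;
};

/* Append a message; returns false if the queue is full. */
static bool queue_push (struct msgqueue *q, struct msg m)
{
    if (q->count == QUEUE_CAP)
        return false;
    q->items[q->count++] = m;
    return true;
}

/* Remove the oldest message whose tag matches and store it in *out.
 * A matchtag of 0 matches any message. Messages that do not match
 * stay queued in their original order for a later caller.
 * Returns false if nothing matches (where the real call would block,
 * or fail with EAGAIN in nonblocking mode).
 */
static bool queue_recv (struct msgqueue *q, uint8_t matchtag, struct msg *out)
{
    for (size_t i = 0; i < q->count; i++) {
        if (matchtag == 0 || q->items[i].matchtag == matchtag) {
            *out = q->items[i];
            for (size_t j = i + 1; j < q->count; j++)
                q->items[j - 1] = q->items[j];  /* close the gap */
            q->count--;
            return true;
        }
    }
    return false;
}
```

For example, with queued tags 3, 7, 3, a receive on tag 7 returns the middle message and leaves both tag-3 messages in place, while a receive on tag 0 returns whichever message is oldest.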

flux_json_request() also has a matchtag argument.

The previous method of matching on topic strings limited concurrency since at most one request per topic string could be outstanding. This was especially problematic in situations where a request generated an unknown number of replies, such as with kvs_watch().

The integer matchtag scheme was borrowed from the 9P protocol in Plan 9. One difference is 9P uses a 16-bit tag value while I went with an 8-bit one. Easy to change that later if we need more concurrency, but 255 outstanding requests per handle should get us pretty far for now and allows a trivial tagpool implementation.

Add 'matchtag' argument to flux_response_recvmsg() and flux_json_request(). Change flux_json_rpc() to internally allocate/free a unique matchtag.

Legacy request functions set 'matchtag' to zero in flux_response_recvmsg() and flux_json_request() to get the legacy behavior.

Special KVS note: kvs_watch() was changed to allocate a matchtag and hold it for the life of the handle since a kvs_watch() request has multiple replies and persists until the handle is destroyed. The design of kvs_watch() will be revisited - see issue flux-framework#75.

grondo · 2014-12-19T23:32:10Z

Cool implementation!

My first thought whilst perusing was that flux_response_recvmsg() should have implied flux_response_recvmsg_any() and a new flux_response_recvmsg_matchtag() should have been implemented. However, now I think I'm wrong and there don't seem to be many callers of flux_response_recvmsg() anyway, and for most handle users this will be pretty transparent?

Will this have applications to a "protocol" that sends multiple responses to the same request?
For example, a request to execute a process might get (at least) two responses, the reply to
initial request (with a rank and pid), and an exit status message when the process terminates.

Does this matchtag scheme mean a max 255 processes could be launched from a handle if the
above scheme were the implementation? (Probably there is a better way to manage the process
status messages, but I'm using this for illustrative purposes)

(Sorry not familiar with the matchtag paradigm from 9p, so I'm probably asking dumb questions)

Finally, I did see a couple places where matchtag == 0 appeared to be used explicitly in 72bd9e8, is this
tag special, or (more likely) did I not understand the code?

garlick · 2014-12-19T23:49:37Z

Yes, matchtag 0 is special - it means "match anything". Setting matchtag to zero in flux_response_recvmsg() makes it equivalent to the previous implementation, and setting it to zero in flux_json_request() avoids the need to allocate a matchtag, which is necessary when sending requests that have no replies. This was (somewhat obscurely) documented in request.h:

/* Receive a response message matching 'matchtag', blocking until one is
 * available.  If 'nonblock' and none is available, return NULL
 * with errno == EAGAIN.  If 'matchtag' is 0, match any message.
 * Returns message on success, or NULL on failure with errno set.
 */
zmsg_t *flux_response_recvmsg (flux_t h, uint8_t matchtag, bool nonblock);

and in the RFC 3 PR just submitted:

; Match-tag to correlate request/response
matchtag        = OCTET / matchtag-any
matchtag-any    = %x00

For requests with multiple replies, all replies would have the same matchtag. The client side would need to manage the life cycle of the matchtag for that request. kvs_watch() is such an example - a pathological one since, as you pointed out in issue #75, there is no kvs_unwatch() so the matchtag is effectively leaked.
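One way a client could manage that life cycle is a table indexed by matchtag, where an entry stays registered across any number of replies until the client releases it. The sketch below is purely illustrative: `struct dispatch`, the `dispatch_*` functions, and `sum_reply()` are hypothetical names, not part of libflux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*reply_cb) (int payload, void *arg);

struct pending {
    reply_cb cb;   /* NULL means no request is outstanding on this tag */
    void *arg;
};

/* Hypothetical per-handle table, one slot per 8-bit tag; slot 0 is unused. */
struct dispatch {
    struct pending slot[256];
};

static void dispatch_init (struct dispatch *d)
{
    memset (d, 0, sizeof (*d));
}

/* Bind a callback to a tag. Fails for tag 0, a NULL callback, or a busy tag. */
static int dispatch_register (struct dispatch *d, uint8_t tag,
                              reply_cb cb, void *arg)
{
    if (tag == 0 || cb == NULL || d->slot[tag].cb != NULL)
        return -1;
    d->slot[tag].cb = cb;
    d->slot[tag].arg = arg;
    return 0;
}

/* Deliver one reply. The slot stays registered, so a request with
 * multiple replies keeps receiving them on the same tag.
 */
static int dispatch_reply (struct dispatch *d, uint8_t tag, int payload)
{
    if (tag == 0 || d->slot[tag].cb == NULL)
        return -1;
    d->slot[tag].cb (payload, d->slot[tag].arg);
    return 0;
}

/* The client decides when the request is finished and releases the slot;
 * only after this should the tag be returned to the pool.
 */
static void dispatch_release (struct dispatch *d, uint8_t tag)
{
    d->slot[tag].cb = NULL;
    d->slot[tag].arg = NULL;
}

/* Example callback: accumulate reply payloads into an int. */
static void sum_reply (int payload, void *arg)
{
    *(int *)arg += payload;
}
```

Under this model, the kvs_watch() case corresponds to a slot that is registered once and never released, which is why its tag is effectively lost for the life of the handle.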

I'm not sure about the max 255 processes question. A guess is maybe yes for a non-scalable, simple implementation of remote execution like we were discussing, but probably no for a scalable lightweight job launch scheme. Hard to answer that without more discussion.

implement matchtag protocol enhancement

garlick added 2 commits December 19, 2014 14:29

libflux: add matchtag to protocol

a493f2a

garlick added the review label Dec 19, 2014

grondo added a commit that referenced this pull request Dec 19, 2014

Merge pull request #123 from garlick/matchtag

49ba060

implement matchtag protocol enhancement

grondo merged commit 49ba060 into flux-framework:master Dec 19, 2014

grondo removed the review label Dec 19, 2014

garlick deleted the matchtag branch January 21, 2015 23:25
