Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flux_respond(h, msg, 0, NULL) results in protocol error? #495

Closed
trws opened this issue Dec 15, 2015 · 39 comments
Closed

flux_respond(h, msg, 0, NULL) results in protocol error? #495

trws opened this issue Dec 15, 2015 · 39 comments
Assignees
Milestone

Comments

@trws
Copy link
Member

trws commented Dec 15, 2015

Perhaps this is intentional, but I did not expect it. When I send a response with a null payload, the receiver gets an EPROTO from the flux_rpc_get call. Clearly I can get around this by allocating an empty JSON object and passing that in, but somehow that feels wrong...

@garlick
Copy link
Member

garlick commented Dec 15, 2015

For better or worse, what happens is if the flux_rpc_get() call supplies a payload pointer it "expects" there to be a payload, and a missing one is an EPROTO. This is documented in flux_rpc(3)

json_out, if non-NULL, is assigned a string containing valid serialized
JSON from the response payload. It is a protocol error if a payload
that is expected (signified by a non-NULL json_out) does not arrive;
similarly, it is a protocol error if an unexpected payload (signified
by a NULL json_out) arrives. The storage associated with json_out
belongs to the flux_rpc_t object and is invalidated when that object is
destroyed.

Is that weird?

@trws
Copy link
Member Author

trws commented Dec 15, 2015

I can see how that makes sense. I came upon this because the current version of the python bindings always pass a payload to be populated in order to return a response object. The interface currently has no way for the user to say whether there should or should not be a payload. I suppose I can add one, but I had assumed that the result would either be an empty json object or that *payload would equal NULL for a NULL payload.

@trws
Copy link
Member Author

trws commented Dec 15, 2015

I was thinking about this on the way home, and I think I know why this surprised me as much as it did. The payload is explicitly noted as being optional, and is in fact ignored if the error is non-0. So, if a protocol normally responds with a payload, flux_respond(h, msg, !0, NULL) is valid, but flux_respond(h, msg, 0, NULL) is not. By the same token, a get without a payload will happen to be valid for one that has one so long as it is receiving an error. Am I thinking too far into this or does this seem counter-intuitive to anyone else?

@lipari
Copy link
Contributor

lipari commented Dec 15, 2015

Letting you know that I read the above discussion, but don't have a strong opinion one way or the other. As it sits, the onus is on the flux_rpc_get() caller to know whether or not a response is expected. If a response is expected and the json_out is NULL, then you know an error occurred. I guess I could live with that even if it meant passing an empty JSON object to every response, and finding it empty after the get meant that no response was provided.

@grondo
Copy link
Contributor

grondo commented Dec 15, 2015

By the same token, a get without a payload will happen to be valid for one that has one so long as it is receiving an error. Am I thinking too far into this or does this seem counter-intuitive to anyone else?

Without using the interface yet, I can say I agree with @trws that the current behavior seems pretty awkward and possibly violates principle of least surprise. I think it would be much improved if empty payload is allowed (setting json_ptr to NULL if provided) and will save lots of frustration for future developers...

@garlick
Copy link
Member

garlick commented Dec 15, 2015

Is the automatic EPROTO on extra/missing payload, or the "dual function" of flux_respond() (handling both error and non-error response cases) the problem, or both?

I feel a little guilty about both because clearly they do make simple functions more complicated, but I proposed them that way because I found myself repeating the same little chunks of code over and over, e.g.

if (rc < 0) {
    if (flux_respond_errnum (h, msg, errno) < 0)
        flux_log_error (h, "...");
} else {
    if (flux_respond (h, msg, json_str) < 0)
        flux_log_error ("...");
}

versus

if flux_respond (h, msg, rc < 0 ? errnum : 0, json_str) < 0)
    flux_log_error (h, "...");

or

if (flux_response_decode (msg, &json_str) < 0)
    goto done;
if (json_str == NULL) {
    errno = EPROTO;
    goto done;
}

versus without the block testing for NULL.

@grondo
Copy link
Contributor

grondo commented Dec 15, 2015

IMO automatic EPROTO on missing payload is the awkward bit here. I predict it will be surprising to 3rd party users that pass in a pointer to JSON object to get EPROTO error. It also hides the fact that the error was due to missing but presumably expected payload, vs server side EPROTO which could be due to malformed input JSON payload.

To save duplicated code for in-tree users, it would make more sense to have a util wrapper in libflux-internal or macro in macros.h. Then you're specifically saying you know what you are doing.

So far I think the flux_respond usage is nice as you note above.

@trws
Copy link
Member Author

trws commented Dec 15, 2015

I like the flux_respond usage overall, but I would actually prefer that always allow a payload to be either empty or full. In the case of an error, I can understand ignoring it, but it might be useful to send back a payload with further information on the reason for the failure, or application specific information to help in recovery. By the same token, I agree that a missing payload when one is expected is a potential protocol violation, but it isn't an error in the RPC protocol, rather it would be one in the user's protocol that is being defined on top of RPCs, right?

Overall, I'm agreeing with @grondo here that it's the EPROTO for presence or lack of payload, but as a side issue, ignoring payloads on error code seems a bit of a missed opportunity for richer exceptions (unless there's an underlying RPC protocol issue here I'm unaware of that would make it hard).

@grondo
Copy link
Contributor

grondo commented Dec 15, 2015

Well I think of the errnum as a low-level error. Your protocol on top could always define its own JSON error response that includes, say an error string. But for low level RPCs this is maybe overkill...

Good comments all around though!

@garlick
Copy link
Member

garlick commented Dec 15, 2015

Right, I was just going to say the same thing. An error response is defined in RFC 3 as numeric only. That doesn't prevent a service from defining its own structured error response, encoded as a regular response payload. Obviously these things aren't set in stone, but that one's quite a bit harder to change than the EPROTO behavior.

@garlick
Copy link
Member

garlick commented Dec 15, 2015

It should be noted that the EPROTO change is simple but will involve touching quite a lot of code (like every request handler and every RPC user). I'm OK with doing it and soon if people agree it should change though.

@trws
Copy link
Member Author

trws commented Dec 15, 2015

I would be for it, but why so much fallout? It would now be valid to pass NULL in places where one couldn't before, but is that a sufficiently common pattern to cause a lot of fallout, or is there breakage I just missed?

@garlick
Copy link
Member

garlick commented Dec 15, 2015

It's the decode paths that are the problem. One would need to start checking for NULL before passing a payload to json_tokener_parse() or equivalent. It applies to request decoding (server side) and response decoding (client side).

@grondo
Copy link
Contributor

grondo commented Dec 15, 2015

Could you define a new macro or libflux-internal function that wraps the old behavior, then search-replace existing users?

@trws
Copy link
Member Author

trws commented Dec 15, 2015

Maybe something like flux_rpc_get_assert_payload? Or add a flag to flux_rpc? I realize the latter doesn't actually save any changes, but it would give calling code a way to explicitly say whether they care or not.

@garlick
Copy link
Member

garlick commented Dec 15, 2015

Flags and/or variadic decode functions might be the way to go. Macros/internal functions might make a transition to a new idiom a bit easier.

Or will variadics cause problems for bindings?

@trws
Copy link
Member Author

trws commented Dec 15, 2015

What would you want it to be variadic for? Do you have a design in mind? The bindings tend not to like them, so I would prefer to avoid it, but if it's sufficiently useful I can always write a wrapper.

This is where having c++-style default arguments would be nice. C99 macro-based default arguments might be a nice compromise if that's the case.

@garlick
Copy link
Member

garlick commented Dec 15, 2015

I was thinking of something like

// fmt: n=nodeid t=topic string j=json payload r=raw payload l=payload length
flux_rpc_getf (flux_rpc_t *rpc, const char *fmt, ...);

That would allow the raw and normal interfaces to be collapsed and could provide a bit more expressivity for what the caller expects (viz a viz EPROTO).

@garlick
Copy link
Member

garlick commented Dec 15, 2015

Now that I look at it it says "meh" to me.

@trws
Copy link
Member Author

trws commented Dec 15, 2015

How about something like this?

enum flux_rpc_flags {
  FLUX_RPC_NO_FLAGS = 0,
  FLUX_RPC_PAYLOAD_UNEXPECTED = 1<<0,
  FLUX_RPC_PAYLOAD_RAW = 1<<1,
  FLUX_RPC_PAYLOAD_REQUIRED = 1<<2,
  FLUX_RPC_PAYLOAD_LENGTH = 1<<3
};
flux_rpc_getf (flux_rpc_t *rpc, int *errnum, size_t *payload_length, const char ** payload, flux_rpc_flags flags);

@garlick
Copy link
Member

garlick commented Dec 15, 2015

errnum is mutually exclusive with payload in the RFC 3 definition of a response message. The idiom we're using is to return -1 and set errno=errnum if the remote sends one back. That's harder to change as noted above.

There is a flags arg already in flux_rpc(). Maybe we could simply add

FLUX_RPC_GET_PAYLOAD_RAW
FLUX_RPC_GET_PAYLOAD_JSON
FLUX_RPC_GET_PAYLOAD_NONE

If any of those are set, you get EPROTO if the response doesn't have the expected payload. If none of these flags are set, you get the behavior that you anticipated when you opened the bug?

@garlick
Copy link
Member

garlick commented Dec 15, 2015

And maybe similar flags for flux_request_decode()?

@trws
Copy link
Member Author

trws commented Dec 15, 2015

That certainly works for me. Extra points if the flux_respond call can inform the responder of the expectation with an error if they try to misuse it on that end.

@trws
Copy link
Member Author

trws commented Dec 15, 2015

Scratch that last. Your solution looks good, but don't make respond potentially fail because of it. Giving the client the ability to cause an error return against a service just seems like a bad idea...

@garlick
Copy link
Member

garlick commented Dec 15, 2015

No flux_respond() itself only fails if something's gone wrong at a lower level.

@grondo
Copy link
Contributor

grondo commented Dec 15, 2015

This sounds good to me as well. @garlick, I'd be willing to help with grunt work if you'd like since you are already focused on the content service.

@garlick
Copy link
Member

garlick commented Dec 15, 2015

Sure help would be great!

Incidentally, you may already know this, but I have changes pending in PR #493 (commit bca843d) to disable the EPROTO handling in flux_request_decode_raw, and flux_response_decode_raw.

@grondo
Copy link
Contributor

grondo commented Dec 15, 2015

I did see that, but didn't put 2 and 2 together yet. I will base any work I do on top of #493, as that one looks like it will be ready for merge pretty soon.

@grondo
Copy link
Contributor

grondo commented Dec 17, 2015

Ok, I started looking into implementing the suggestions here and I'm quickly getting lost.
Here are some observations we might want to discuss and think about

  • It would be easy and not too disruptive to add flags to flux_rpc_t as described above, but as flux_rpc_get uses flux_response_decode() we still need to "fix" or offer alternative for flux_response_decode().
  • There exists *_raw functions for flux_rpc_get and flux_response_decode to get "raw" (binary data with encoded length), but flux_msg_set_payload() takes "raw" data argument and there is flux_msg_get/set_payload_json for JSON specifically. This asymmetry bothered me and we might want to fix it.
  • Should "JSON string" payload be simplified to just "NUL terminated string" -- since we now allow "raw" binary messages, should we also allow plain string (non-json) payload? It seems like string payload is special case of raw payload, and JSON payload is special case of string payload. Is there perhaps some clever way to handle this so that different payload types can inherit common methods from their subclass?

At this point I'm not sure that adding int flags argument to flux_response_decode() is going to be a real fix. @garlick had suggested perhaps now is a good time to step back and design good high level interfaces (perhaps it won't be too far off from what we have now). That is probably a great idea, but where to start?

BTW, my inclination right now would be to focus on flux_msg_t abstraction and build enough methods on top of that to make all the other convenience functions like flux_response_decode etc. easily implemented. This might mean flux_rpc_get() returns a flux_msg_t (another @garlick suggestion).

I don't feel confident enough to say this is the right way to go though.

@garlick
Copy link
Member

garlick commented Dec 17, 2015

Here's a thought. How about the following low level changes:

  • payload is mandatory
  • payload can be zero length
  • just drop the JSON flag

At the high level (rpc, request/response encode/decode):

  • drop the json_str functions and drop _raw from the raw functions

While that seems like a cleaner foundation to build upon, it will increase repeated code for users. We should take a look at that and see what we can do about it.

@grondo
Copy link
Contributor

grondo commented Dec 17, 2015

Yes, that sounds like a pretty clever way to handle the payload/no payload thing. For duplicated code, I think some kind of convenience functions for "special" payload types may be useful. (This is what I was attempting to get at with the extensible message types I ineloquently hinted at above). We could end up back near where we are now, but with a more solid foundation and (perhaps?) a methodology for adding perhaps other payload types (even for users outside flux-core?) [maybe getting too far off track here.]

@trws
Copy link
Member Author

trws commented Dec 20, 2016

I just happened across this issue randomly, thought it was addressed some time ago but don't see any backreferences. Can we close?

@garlick
Copy link
Member

garlick commented Dec 21, 2016

Not fixed.

flux_request_decode() and flux_response_decode() still return EPROTO if a non-NULL const char **json_str argument is passed in but the message contains no JSON payload.

flux_rpc_get() calls flux_response_decode() so it too has this behavior.

Now that we have flux_rpc_getf() and flux_request_decodef() that clearly should return EPROTO in this case, I wonder if we could just convert the majority of the users of the above functions to the new ones, then change the behavior as proposed without too much fallout?

@garlick garlick added this to the release 0.7.0 milestone Jan 4, 2017
@chu11 chu11 self-assigned this Feb 28, 2017
@chu11
Copy link
Member

chu11 commented Mar 1, 2017

payload is mandatory
payload can be zero length

@garlick, in the above do you mean to say flux_respond() (and some related functions) should no longer be able to take a NULL for json_str and users should minimally have to pass in a "{}"?

@garlick
Copy link
Member

garlick commented Mar 1, 2017

I think the api has evolved a bit since this discussion began, and I'm not too sure that comment stands as the best approach. What I as thinking in the dec 21 2016 comment was that we could just relax the decode function (including flux_rpc_get) handling of optional payload pointer arg so

  • if payload arg but no payload in msg, assign NULL to arg, not an error
  • if no payload arg but payload, not an error

This would mean it falls to callers to detect missing payload before attempting to decode it. However, my point above was that some of those may productively migrate to flux_rpc_getf() or similar which since the decoding is internal, must issue EPROTO if it is missing.

Does that make sense?

@chu11
Copy link
Member

chu11 commented Mar 1, 2017

@garlick Sounds good!

@chu11
Copy link
Member

chu11 commented Mar 3, 2017

As an aside, yesterday I tried to begin cleanup by replacing flux_rpc_get (r, NULL) calls with flux_rpc_getf (r, "{ ! }"). It took me awhile to realize this doesn't work, as internally flux_response_decode() would always be passed a non-NULL json_str argument.

So another fallout from this EPROTO behavior is in some scenarios flux_rpc() cannot be paired with flux_rpc_getf() seemlessly, atleast in this "obvious" scenario where you know no payload is going to be returned.

@garlick
Copy link
Member

garlick commented Mar 3, 2017 via email

@chu11
Copy link
Member

chu11 commented Mar 3, 2017

Yeah, I originally was trying to replace as much usage of flux_rpc_get() as possible, but it seems best to only modify those that actually expect a payload back.

This was referenced Mar 10, 2017
chu11 added a commit to chu11/flux-core that referenced this issue Mar 17, 2017
Rearchitect flux_request_decode.  User can now pass in a NULL
or non-NULL json_str pointer.  The function will succeed regardless
if a payload is available or not.  The function will no longer return
EPROTO under a payload & pointer mismatch.  User is responsible to
check for NULL or non-NULL per their expectations.

Fixes flux-framework#495
chu11 added a commit to chu11/flux-core that referenced this issue Mar 18, 2017
Rearchitect flux_request_decode.  User can now pass in a NULL
or non-NULL json_str pointer.  The function will succeed regardless
if a payload is available or not.  The function will no longer return
EPROTO under a payload & pointer mismatch.  User is responsible to
check for NULL or non-NULL per their expectations.

Fixes flux-framework#495
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants