ZOOKEEPER-4210: Preserve return code from nonblocking send #1602
Additional notes:
Failure in PR-build:
I will try to reproduce the failure locally.
Thanks. I created https://issues.apache.org/jira/browse/ZOOKEEPER-4210
Ping @ztzg @eolivelli @anmolnar :)
Confirmed that
Running the build on branch-3.6 I see some unit tests failing, both with this change and without
Hi @smikes,
Looks good, in general. Thank you!
But…
This is technically an API & ABI break, as it is widening the range of error codes returned by API functions. As such, I would recommend not including it in upcoming 3.6 releases. (I would be okay with 3.7+ as 1/ C only uses integer codes, which most programs don't interpret and 2/ we are not on the "semantic versioning" train. @eolivelli, WDYT?)
In any case, we usually take PRs against master and backport to the relevant branches; would you mind doing that?
I have also left a couple of comments on the diff, one of which is actionable.
Finally, there is the question of documenting the change in zookeeper.h; the comment on e.g. zoo_aset is now outdated and should probably be rephrased:
/* [...]
* \return ZOK on success or one of the following errcodes on failure:
* ZBADARGUMENTS - invalid input parameters
* ZINVALIDSTATE - zhandle state is either ZOO_SESSION_EXPIRED_STATE or ZOO_AUTH_FAILED_STATE
* ZMARSHALLINGERROR - failed to marshall a request; possibly, out of memory
*/
ZOOAPI int zoo_aset(zhandle_t *zh, const char *path, const char *buffer, int buflen,
int version, stat_completion_t completion, const void *data);
Would you agree?
Cheers, -D
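The concern about widening the range of returned codes can be seen from the caller's side. The following is a minimal sketch, not code from the client library: `classify_async_rc` and the `action` enum are hypothetical, and the numeric values are assumed to match the ZOO_ERRORS enum in zookeeper.h. A caller written against the documented codes has no branch for ZCONNECTIONLOSS and can only treat it as unexpected.

```c
#include <assert.h>

/* Error codes, assumed to match the ZOO_ERRORS enum in zookeeper.h. */
enum { ZOK = 0, ZCONNECTIONLOSS = -4, ZMARSHALLINGERROR = -5,
       ZBADARGUMENTS = -8, ZINVALIDSTATE = -9 };

/* Hypothetical caller-side policy; not part of the ZooKeeper API. */
enum action { PROCEED, FIX_CALL, NEW_SESSION, UNEXPECTED };

/* Maps a return code onto the only cases the zoo_aset comment documents. */
enum action classify_async_rc(int rc)
{
    assert(rc <= 0); /* the codes used here are zero or negative */

    switch (rc) {
    case ZOK:
        return PROCEED;
    case ZBADARGUMENTS:
    case ZMARSHALLINGERROR:
        return FIX_CALL;
    case ZINVALIDSTATE:
        return NEW_SESSION;
    default:
        /* Any code outside the documented set ends up here. */
        return UNEXPECTED;
    }
}
```

Under these assumptions, `classify_async_rc(ZCONNECTIONLOSS)` falls through to the default branch, which is why the header comment would need updating if new codes were returned.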
P.-S. — You commented:
Running the build on branch-3.6 I see some unit tests failing, both with this change and without
Right; we have some stability issues with the current test suite. Passing -Dsurefire-forkcount=1 helps with Java tests, at the cost of a looong runtime. Something we have to fix ASAP, but please ignore CI failures which don't seem relevant in the meantime, as @maoling suggested.
Sounds good, will take a look. Internally we have 3.4 and 3.6 branches active, so I developed against those; I will rebase this against the main branch as you suggest.
Summary of current changes:
Separately: I was seeing the
LGTM!
@eolivelli, @anmolnar, @symat: PTAL.
This is this PR. LGTM.
Now #1609 and ZOOKEEPER-4217. Thank you for splitting it. I note that the PR currently is in "draft" state. Do you have specific improvements in mind?
Agree. (But thank you for preparing the commit anyway!)
Cool! I am happy to take more feedback on this, or it can be merged.
No. Today I removed the "draft" status. After this, I will make some documentation updates to the C client dev notes.
Overall looks good to me,
but I left one question
In a nonblocking send, we may be notified that the connection was lost. Currently this is not handled because the return value of flush_send_queue() is ignored.
This change DOES NOT change the ABI or the return codes that are returned by zoo_a* async functions.
This change DOES close the file descriptor and change zh->state to reflect that the connection was lost.
Reorganized to minimize the change in behavior.
Always call adaptor_send_queue() even if there was a marshalling error.
Use `fd == -1` as test for whether to call close_zsock()
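The behaviour this commit message describes can be sketched with a mock handle. This is an illustration under stated assumptions, not the actual patch: `struct mock_handle`, `mock_close_sock` and `finish_async_call` are hypothetical stand-ins for zhandle_t, close_zsock() and the tail of a zoo_a* function, and the numeric error values are assumed to match zookeeper.h.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Error codes, assumed to match the ZOO_ERRORS enum in zookeeper.h. */
enum { ZOK = 0, ZCONNECTIONLOSS = -4, ZMARSHALLINGERROR = -5 };

/* Hypothetical stand-in for the zhandle_t fields involved. */
struct mock_handle {
    int fd;        /* -1 once the socket has been closed */
    int connected; /* simplified stand-in for zh->state */
};

/* Stand-in for close_zsock(): close at most once, then mark the fd invalid. */
static void mock_close_sock(struct mock_handle *h)
{
    if (h->fd != -1) {
        close(h->fd);
        h->fd = -1;
    }
}

/*
 * marshal_rc: result of serializing the request.
 * flush_rc:   result of the nonblocking flush of the send queue.
 * The returned code is unchanged; a failed flush only updates the handle.
 */
int finish_async_call(struct mock_handle *h, int marshal_rc, int flush_rc)
{
    assert(h != NULL);

    if (flush_rc < 0) {
        mock_close_sock(h);
        h->connected = 0;
    }
    return (marshal_rc < 0) ? ZMARSHALLINGERROR : ZOK;
}
```

With this shape the caller still sees ZOK or ZMARSHALLINGERROR, as before, while the handle itself records that the connection is gone.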
LGTM; thanks!
This is now in
Async API calls attempt to flush the send buffer, which calls flush_send_queue() and can report
ZOPERATIONTIMEOUT
ZSYSTEMERROR
ZCONNECTIONLOSS
Specifically: send_buffer() calls send(2) with MSG_NOSIGNAL, which can fail with EPIPE; send_buffer() then returns -1, causing ZCONNECTIONLOSS from flush_send_queue().
Current async API calls drop the return value from flush_send_queue(), as below:
    adaptor_send_queue(zh, 0);
    return (rc < 0)?ZMARSHALLINGERROR:ZOK;
The async API then returns ZOK instead of ZCONNECTIONLOSS.
Author: Sam Mikes <[email protected]>
Reviewers: Enrico Olivelli <[email protected]>, Damien Diederen <[email protected]>
Closes apache#1602 from smikes/asyncsend-returncode-3.6
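The EPIPE case mentioned in this commit message can be reproduced in isolation. The sketch below assumes a platform that defines MSG_NOSIGNAL (Linux, for example) and uses a Unix-domain socketpair in place of a ZooKeeper connection; the function name `errno_after_peer_close` is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns the errno seen when sending to a peer that has closed its end,
 * 0 if the send unexpectedly succeeded, or -1 if the socketpair setup failed.
 */
int errno_after_peer_close(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    close(sv[1]); /* simulate the server side going away */

    const char msg[] = "ping";
    errno = 0;
    /* MSG_NOSIGNAL suppresses SIGPIPE, so the failure shows up as errno. */
    ssize_t n = send(sv[0], msg, sizeof msg, MSG_NOSIGNAL);
    int err = (n < 0) ? errno : 0;

    close(sv[0]);
    return err;
}
```

On Linux this is expected to yield EPIPE, which is the condition that send_buffer() turns into a -1 return.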
Async API calls attempt to flush the send buffer, which
calls flush_send_queue(); and can report
ZOPERATIONTIMEOUT
ZSYSTEMERROR
ZCONNECTIONLOSS
Specifically: send_buffer() calls send(2) with MSG_NOSIGNAL,
which can fail with EPIPE; send_buffer() then returns -1, causing
ZCONNECTIONLOSS from flush_send_queue().
Current async API calls drop the return value from flush_send_queue(),
as below:
    adaptor_send_queue(zh, 0);
    return (rc < 0)?ZMARSHALLINGERROR:ZOK;
The async API then returns ZOK instead of ZCONNECTIONLOSS.
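The difference between dropping and preserving the flush result can be shown with two small functions. This is a hedged sketch: `async_rc_dropping` and `async_rc_preserving` are hypothetical names, the numeric error values are assumed to match zookeeper.h, and the second function is only one possible way to propagate the code, not necessarily what was merged.

```c
#include <assert.h>

/* Error codes, assumed to match the ZOO_ERRORS enum in zookeeper.h. */
enum { ZOK = 0, ZCONNECTIONLOSS = -4, ZMARSHALLINGERROR = -5 };

/* Behaviour described in the text: the flush result is discarded. */
int async_rc_dropping(int marshal_rc, int flush_rc)
{
    (void)flush_rc; /* ignored, so a lost connection is invisible here */
    return (marshal_rc < 0) ? ZMARSHALLINGERROR : ZOK;
}

/* One possible alternative: report marshalling errors first, else the flush result. */
int async_rc_preserving(int marshal_rc, int flush_rc)
{
    assert(flush_rc <= 0); /* the codes used here are zero or negative */

    if (marshal_rc < 0)
        return ZMARSHALLINGERROR;
    return (flush_rc < 0) ? flush_rc : ZOK;
}
```

For a successful marshal and a flush that reports ZCONNECTIONLOSS, the first function returns ZOK and the second returns ZCONNECTIONLOSS.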