Incorrect errno #164

Open
Miosss opened this issue May 28, 2020 · 5 comments

Comments

@Miosss

Miosss commented May 28, 2020

The problem

I issue a non-blocking read on a DEALER socket connected to a ROUTER socket:
data, err := client.RecvMessage(zmq.DONTWAIT)

The ROUTER takes at least one second to complete the task (because of a sleep()), and I perform the read immediately.

I expected to get an EAGAIN error, but instead I got err == nil and len(data) == 0, i.e. what looks like a successful empty read.
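For context, the behaviour expected here is the usual non-blocking pattern: a DONTWAIT receive on an empty queue fails with EAGAIN, and the caller treats that as "nothing yet" rather than as a message. The sketch below is pure Go and does not call zmq4; `tryRecv` is a hypothetical stand-in for `client.RecvMessage(zmq.DONTWAIT)`, and `pollOnce` is a hypothetical caller.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// tryRecv is a hypothetical stand-in for client.RecvMessage(zmq.DONTWAIT):
// it pops one message from the queue, or fails with EAGAIN if it is empty.
func tryRecv(queue *[][]string) ([]string, error) {
	if len(*queue) == 0 {
		return nil, syscall.EAGAIN
	}
	msg := (*queue)[0]
	*queue = (*queue)[1:]
	return msg, nil
}

// pollOnce separates "no message yet" (EAGAIN) from a real message
// and from any other error.
func pollOnce(queue *[][]string) (msg []string, ready bool, err error) {
	msg, err = tryRecv(queue)
	if errors.Is(err, syscall.EAGAIN) {
		return nil, false, nil // nothing queued yet; try again later
	}
	if err != nil {
		return nil, false, err
	}
	return msg, true, nil
}

func main() {
	var queue [][]string
	_, ready, _ := pollOnce(&queue)
	fmt.Println(ready) // false: empty queue, EAGAIN treated as "not ready"

	queue = append(queue, []string{"", "reply"})
	msg, ready, _ := pollOnce(&queue)
	fmt.Println(ready, len(msg))
}
```

With the bug reported here, the receive returned an empty message and a nil error, so a loop like this would wrongly report "ready" with a zero-length message instead of "not ready".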

Situation

By debugging the library, it seems to me that the error originates in this call (RecvBytes, zmq4.go:1077):
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))

Here, size == -1 but err == nil.
Therefore errget(err), called with nil, returns nil instead of the real error.

Maybe errget should do something when it is called with a nil argument?

I believe the root cause of this particular problem is that zmq_errno is not used.
Its documentation says it should be used to retrieve errno reliably, for example when the application links against a different C runtime than libzmq.

This is probably my case: this happens on Windows, where I have libzmq.dll built with MSVC and a stub libzmq.a generated with dlltool. So the setup is exotic (but hey, welcome to compiling C libs on Windows + Go + cgo).
What's more, C. calls in Go return the plain errno, which is essentially wrong in this case.

When I tried e := C.zmq_errno() just after the failed read, I got the correct EAGAIN (11) error.

Solutions?

I could probably check C.zmq_errno() after each call, but I am not sure whether that is sufficient, or whether the error is cleared after successful calls.
EDIT:
No, the error is not cleared. And since the returned err is nil, there is no way to know whether the C.zmq_errno() result is valid in this situation (plus all the possible threading issues).

One solution may be to drop the err from every _, err := C. ... call and call C.zmq_errno() instead, but that would require changes in many places.

Maybe modifying errget would be sufficient? For example, if the err argument is nil, check C.zmq_errno() instead.

@pebbe
Owner

pebbe commented May 28, 2020

Forget about the previous comment. It's all wrong. A call interrupted by a signal gives EINTR, not EAGAIN. I undid the changes.

So what is the problem exactly? Provide code that demonstrates it.

@Miosss
Author

Miosss commented May 28, 2020

@pebbe
I am not sure whether it is easily reproducible. I believe the main reason behind this is exactly what zmq_errno() is for. I found it through this Stack Overflow post.

Look at the definition of this function:
int zmq_errno (void) { return errno; }

It simply returns errno, but as seen from within the library itself.

In my case I have libzmq.dll built with MSVC, but I use gcc from MSYS2 for cgo. Therefore errno may not be propagated properly, which is the situation described here.
Your error handling relies on what Go's C-calling mechanism (cgo) gives you here:
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))

zmq_msg_recv only returns the size of the message; the err is supplied by Go:

Any C function (even void functions) may be called in a multiple assignment context to retrieve both the return value (if any) and the C errno variable as an error (use _ to skip the result value if the function returns void). For example:

(from the cgo documentation on godoc)

So this err is basically the same as just reading errno (which you in fact do in errget).

The problem is that errno in the DLL may be a different errno from the one in the app.
libzmq sets an errno that resides only in the DLL, and in my app errno is always 0.

As I understand it, this problem is why zmq_errno exists.

The problem itself

I run
msg, err := client.RecvMessage(zmq.DONTWAIT)

I am sure there is no message in the queue, so I should receive msg = nil and err = EAGAIN.
This doesn't happen. I get msg = []byte("") (empty message) and err = nil.

By debugging your code I can see that:
size, err = C.zmq_msg_recv(&msg, soc.soc, C.int(flags))
in this example returns size = -1 and err = nil.
size = -1 clearly indicates that there IS an error, but Go gives you err = nil. In the next if, you check size to see whether there is an error (and there is), and to get the actual error you look at err, which is nil.

So, size tells that there is an error, and err says there is none.
To me, the cause is what I wrote at the beginning: libzmq sets a different errno than the one cgo returns. You should probably check zmq_errno instead.

@Miosss
Author

Miosss commented May 28, 2020

And look at this quote from zmq.h:

/*  This function retrieves the errno as it is known to 0MQ library. The goal */
/*  of this function is to make the code 100% portable, including where 0MQ   */
/*  compiled with certain CRT library (on Windows) is linked to an            */
/*  application that uses different CRT library.                              */
ZMQ_EXPORT int zmq_errno (void);

@pebbe
Owner

pebbe commented May 29, 2020

I think I may have a fix. Can you try the latest version, please?

@Miosss
Author

Miosss commented Jun 1, 2020

The same situation happens when binding to the same TCP port a second time: the bind fails silently, without an error. The process therefore thinks it can accept messages, while the underlying socket is dead.

This occurs in Bind (zmq4.go:963):
i, err = C.zmq4_bind(soc.soc, s)

i == -1 but err == nil, the same as before.

I have downloaded your latest version and it seems fixed: this case (binding) now correctly returns an error (though I am not sure why it is error 100, "Cannot create another system semaphore"; this must be a libzmq thing).
I have not yet tested the EAGAIN case, but I believe it is the same as for Bind.


2 participants