I started seeing this problem after migrating some apps to Kubernetes.
My original setup on bare metal is like this: There's one instance of the "responder" app, listening on a Rep0 socket, and several instances of the "requester" app, dialing the responder with Req0 sockets. All these instances run on the same host machine. Every day on a timer, the requester instances start up first, and after a few minutes the responder starts. The requester's code is like this:
    async with contextlib.AsyncExitStack() as stack:
        req = stack.enter_context(pynng.Req0(dial='tcp://responder:7470'))
        while True:
            await req.asend(b'Hello')
            resp = await req.arecv()
            # do stuff
Requesters dialing before the responder started listening was never a problem. The Req0 socket simply fails the initial synchronous dial, switches to asynchronous dialing, and eventually connects.
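To make that fallback concrete, here is a rough model of what I believe happens, written in plain Python with no pynng involved. The exception classes and `dial_with_fallback` are hypothetical stand-ins, and the assumption that only a "connection refused" error triggers the fallback is my reading of the behavior, not something I have confirmed in the library.

```python
class ConnectionRefused(Exception):
    """Hypothetical stand-in for a 'connection refused' dial error."""


class ConnectionShutdown(Exception):
    """Hypothetical stand-in for any other kind of dial error."""


def dial_with_fallback(blocking_dial, background_dial):
    """Try a blocking dial first, then fall back to background dialing.

    Assumption: only ConnectionRefused triggers the fallback. Any other
    error raised by the blocking dial propagates to the caller.
    """
    try:
        blocking_dial()
        return "connected"
    except ConnectionRefused:
        # Nobody is listening yet, so keep dialing in the background.
        background_dial()
        return "dialing in background"
```

If the model is right, a dial that fails with anything other than a refusal would skip the fallback entirely and surface as an exception.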
Then I had to migrate this setup to Kubernetes. I created a responder deployment with one pod, a responder service pointing to the Rep0 port of the responder pod, and a requester deployment with several pods. The requesters dial the service address of the responder.
In this setup, there's a chance that the requesters' dialing attempts fail outright:
  File "/app/requester.py", line 54, in do_work
    req = stack.enter_context(pynng.Req0(dial=f'tcp://{responder_host}:{responder_port}'))
  File "/app/venv/lib/python3.9/site-packages/pynng/nng.py", line 938, in __init__
    super().__init__(**kwargs)
  File "/app/venv/lib/python3.9/site-packages/pynng/nng.py", line 349, in __init__
    self.dial(dial, block=block_on_dial)
  File "/app/venv/lib/python3.9/site-packages/pynng/nng.py", line 374, in dial
    return self.dial(address, block=True)
  File "/app/venv/lib/python3.9/site-packages/pynng/nng.py", line 371, in dial
    return self._dial(address, flags=0)
  File "/app/venv/lib/python3.9/site-packages/pynng/nng.py", line 390, in _dial
    check_err(ret)
  File "/app/venv/lib/python3.9/site-packages/pynng/exceptions.py", line 201, in check_err
    raise exc(string, err)
pynng.exceptions.NNGException: Connection shutdown
This is an uncaught exception, and the requester basically dies without retrying. Why does this happen?
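As a stopgap, the socket creation could be wrapped in a retry loop so that a failed dial does not kill the requester. This is only a minimal sketch: `connect_with_retry` and its parameters are hypothetical names of mine, not part of pynng, and it takes a factory callable so it can be tried without any socket library.

```python
import time


def connect_with_retry(make_socket, attempts=5, delay=1.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call make_socket() until it succeeds or attempts run out.

    make_socket: zero-argument callable that creates and dials the socket.
    retry_on:    exception types that should trigger another attempt.
    sleep:       injectable so tests (or async code) can replace the wait.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return make_socket()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before the next dial attempt
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

In the requester this might be called as `connect_with_retry(lambda: pynng.Req0(dial='tcp://responder:7470'), retry_on=(pynng.exceptions.NNGException,))`, with the returned socket then passed to `stack.enter_context`. Note that the default `time.sleep` blocks the event loop, which is probably acceptable at startup but not inside the request loop. Another option that may avoid the blocking dial altogether is passing `block_on_dial=False` to `Req0`, going by the `block=block_on_dial` call in the traceback, though I have not verified that it sidesteps this error.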