Repeated `asyncio` errors #576

jnsgruk · 2021-11-05T11:30:04Z

I've been using pylibjuju through the pytest-operator for testing, and I'm seeing this error a lot:

ERROR    asyncio:base_events.py:1738 Task exception was never retrieved
future: <Task finished name='Task-30' coro=<Connection._pinger() done, defined at /home/jon/code/canonical/operator-libs-linux/.tox/integration/lib/python3.9/site-packages/juju/client/connection.py:502> exception=Exception('No facade Pinger in facades {}')>
Traceback (most recent call last):
  File "/home/jon/code/canonical/operator-libs-linux/.tox/integration/lib/python3.9/site-packages/juju/client/connection.py", line 516, in _pinger
    pinger_facade = client.PingerFacade.from_connection(self)
  File "/home/jon/code/canonical/operator-libs-linux/.tox/integration/lib/python3.9/site-packages/juju/client/_client.py", line 65, in from_connection
    raise Exception('No facade {} in facades {}'.format(facade_name,
Exception: No facade Pinger in facades {}

and

ERROR    asyncio:base_events.py:1738 Task exception was never retrieved
future: <Task finished name='Task-402' coro=<WebSocketCommonProtocol.recv() done, defined at /home/jon/code/canonical/operator-libs-linux/.tox/integration/lib/python3.9/site-packages/websockets/protocol.py:443> exception=ConnectionClosedOK('code = 1000 (OK), no reason')>
Traceback (most recent call last):
  File "/home/jon/code/canonical/operator-libs-linux/.tox/integration/lib/python3.9/site-packages/websockets/protocol.py", line 509, in recv
    await self.ensure_open()
  File "/home/jon/code/canonical/operator-libs-linux/.tox/integration/lib/python3.9/site-packages/websockets/protocol.py", line 803, in ensure_open
    raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedOK: code = 1000 (OK), no reason

These occur consistently on every run. More examples can be found here: https://github.com/jnsgruk/charm-kubernetes-dashboard/runs/4069754517?check_suite_focus=true#step:4:256

The text was updated successfully, but these errors were encountered:

SimonRichardson · 2021-11-05T17:19:34Z

We should always have a Pinger facade. So either we're not connected any longer or we're attempting to create a Pinger without first building the facades. There are couple of locations that I think need investigation:

This will override the pinger and not close the old one. We should at least guard against that. https://github.com/juju/python-libjuju/blob/master/juju/client/connection.py#L802-L803
Reconnect doesn't rebuild the facades, it should https://github.com/juju/python-libjuju/blob/master/juju/client/connection.py#L684-L697
When tailing the debug-log we should ensure that we build the facades or disable the pinger functionality https://github.com/juju/python-libjuju/blob/master/juju/client/connection.py#L327

#580 ### Description A bunch of background noise is happening in the concurrent asyncio events resulting in repeated error logs to build up, despite that nothing in the foreground is failing. These are often: 1. Actual exceptions that are ignored in the main task/thread. 2. Error logs bleeding from other libraries (potentially when trying to handle the ignored exceptions in (1) (b/c `asyncio` manages their events as well). 3. Tasks that are not gathered properly at teardown. This PR attempts to streamline all the tasks and events that are happening on our `connection.py` and make sure we're: * not producing any background exceptions, * not producing any unnecessary/useless/excess error logs, and * tearing down properly. Hopefully will fix #576 Jira card [#138](https://warthogs.atlassian.net/browse/JUJU-138) ### QA Steps In the CI test runs (especially in the ones with failed tests), we should only see the pass info and exception stacktraces that are relevant to the test logic (i.e. there shouldn't be any asyncio related errors in there). ### Notes & Discussion * This first commit ensures the `Pinger` facade is available when the `_pinger_task` is created, by moving the `_pinger_task` creation after `_build_facade` calls. * Also disables the `_pinger_task` completely for `debug_log` connections, because debug_log connections are not doing any RPCs, they directly connect through the WebSocketClientProtocol, so we're not getting any facades from the apiserver (like we do in the case of `_connect_with_login` etc) * A relevant bug here is https://bugs.python.org/issue36709 : which is about `asyncio.sslproto.SSLProtocol` raising an error whenever the event_loop is closed during an alive websocket connection. So in our `client/connection.py` we avoid keeping the connection alive (by removing `ping_interval=None`). * The websocket object (_ws) is tightly coupled with the asyncio-managed tasks (e.g. Pinger) as well as the Monitor, so interacting with it directly (e.g. doing conn._ws.close() instead of conn.close()) pushes things out of sync/control. So it shouldn't be used directly (e.g. avoid `conn.ws.close()`, instead use `conn.close()`) * Functions such as `model.get_controller` returns a newly constructed Controller object that is already connected with the model's own connection parameters. This puts the responsibility of closing out the connection (and destroying the spawned asyncio tasks etc) to the caller. If not done, everyone (asyncio, websockets etc) yells at pylibjuju about lingering tasks and open connections. So we introduce a `ConnectedController` object that when it's used with the `async with` form automatically disconnects after done. * asyncio loop event_handler is only called on Task exceptions when the Task object is cleared from the memory. But the garbabe collector doesn't clear the Task if we keep a reference for it (e.g. putting it in a neverending Task ---like Pinger) until the very end. So installing our own handler makes sure that the exceptions are retrieved and properly handled and logged regardless of when the Task is being destroyed.

cderici mentioned this issue Nov 5, 2021

[JUJU-138] Streamlining asyncio tasks/events #580

Merged

jujubot closed this as completed in #580 Nov 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repeated `asyncio` errors #576

Repeated `asyncio` errors #576

jnsgruk commented Nov 5, 2021

SimonRichardson commented Nov 5, 2021

Repeated asyncio errors #576

Repeated asyncio errors #576

Comments

jnsgruk commented Nov 5, 2021

SimonRichardson commented Nov 5, 2021

Repeated `asyncio` errors #576

Repeated `asyncio` errors #576