Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warnings when running tests using the graphql-transport-ws websocket protocol #2720

Open
rjwills28 opened this issue Apr 21, 2023 · 2 comments

Comments

@rjwills28
Copy link
Contributor

rjwills28 commented Apr 21, 2023

We are using Strawberry to build our GraphQL server and have noticed a possible issue when running tests with pytest using an aiohttp test client to open a websocket and subscribe using the new graphql-transport-ws protocol. We get the follow warning after the tests have finished:

Task was destroyed but it is pending!
task: <Task pending name='Task-169' coro=<BaseGraphQLTransportWSHandler.handle_connection_init_timeout() running at /home/testApp/venv/lib64/python3.8/site-packages/strawberry/subscriptions/protocols/graphql_transport_ws/handlers.py:82> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f3c0f9452b0>()]>>

I have taken a look into the code based on this message and I think I have found out what is happening. Essentially an asyncio.sleep() task is created and run (here). The default time for the sleep is passed in as 60 secs from the GraphQLView instantiation. After the 60 sec sleep this method just checks whether the init_connection message was received. The 60 sec asyncio.sleep() is not a problem when actually running the server as the server runs for long enough for this time to elapse, or it is killed. However in the test scenario we create a subscription, check the results and then the tests complete, closing the event loop, which all takes less than 60 secs and hence we get the warning that the sleep task was 'destroyed but it it pending!'.

As a work-around we create a custom GraphQLView and set the connection_init_wait_timeout to a much smaller value (e.g. 5 sec). This stops the warning as the asyncio.sleep() task can complete in this time, before we close the event loop.

Question: We were just wondering whether there is a reason for a default connection_init_wait_timeout of 60 secs that we have missed?

I also wonder whether there may be a better way of managing the long asyncio.sleep() task to check the connection_init_received status. Here are 2 suggestions that might improve the behaviour:

  1. In the handle_connection_init_timeout method run a while loop checking the whether the connection_init message has been received with a short sleep in between and timeout if we do not receive it after 60 secs. This will cause the task to complete as soon as the connection_init message has been received instead of waiting the arbitrary 60 secs. It also means we do not have a long asyncio.sleep() task.
  2. Add some clean up code when a subscription is closed, for example here that checks if this asyncio.sleep(60) task is still running and cancels it if it is.

Any thoughts/feedback appreciated. Thanks!

Upvote & Fund

  • We're using Polar.sh so you can upvote and help fund this issue.
  • We receive the funding once the issue is completed & confirmed by you.
  • Thank you in advance for helping prioritize & fund our backlog.
Fund with Polar
@DoctorJohn
Copy link
Member

Hey @rjwills28 thanks for the detailed report. This issue has been brought up before, recently in #2460 for example. Whenever it was brought up it kinda stalled, presumably because it doesn't really affect production (the timeout will just run out and the task ends)?

Question: We were just wondering whether there is a reason for a default connection_init_wait_timeout of 60 secs that we have missed?

That's a great question. The graphql-transport-ws protocol doesn't define a default, so we just went with a minute and made it configurable. Maybe lowering the default is a good call. Need to do some research on the defaults of other implementations.

Here are 2 suggestions that might improve the behaviour

I'm very much in favour of the second suggestion. Feel free to open a PR! (there are a decent amount of graphql-transport-ws PRs open right now, I'll review them as soon as I find some time. Maybe check first whether one of them is already addressing this issue).

@rjwills28
Copy link
Contributor Author

Thanks for the prompt response and feedback on this issue.

I've had a look through the open PRs and it looks like PR #2703 would fix the pytest warnings as it includes a clean-up of the self.connection_init_timeout_task in the shutdown method. I haven't reviewed the full PR but a quick test on my local setup shows it does stop the warnings.

On a related note, I just updated to the most recent version of Strawberry (0.171.2) and now as well as the simple warnings I am also getting the following exceptions:

Task exception was never retrieved
future: <Task finished name='Task-200' coro=<BaseGraphQLTransportWSHandler.operation_task() done, defined at .../lib/python3.8/site-packages/strawberry/subscriptions/protocols/graphql_transport_ws/handlers.py:224> exception=KeyError('sub1')>
Traceback (most recent call last):
  File ".../lib/python3.8/site-packages/strawberry/subscriptions/protocols/graphql_transport_ws/handlers.py", line 232, in operation_task
    await self.handle_async_results(result_source, operation)
  File ".../lib/python3.8/site-packages/strawberry/subscriptions/protocols/graphql_transport_ws/handlers.py", line 258, in handle_async_results
    async for result in result_source:
  File ".../lib/python3.8/site-packages/graphql/execution/map_async_iterator.py", line 42, in __anext__
    await wait([aclose, anext], return_when=FIRST_COMPLETED)
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 426, in wait
    return await _wait(fs, timeout, return_when, loop)
  File "/usr/local/lib/python3.8/asyncio/tasks.py", line 534, in _wait
    await waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../lib/python3.8/site-packages/strawberry/subscriptions/protocols/graphql_transport_ws/handlers.py", line 237, in operation_task
    result_source = self.subscriptions[operation.id]
KeyError: 'sub1'

I think this is coming from the change in this recent commit. On line 301 of strawberry/subscriptions/protocols/graphql_transport_ws/handlers.py we pop the operation_id from the subscriptions list on cleanup but no longer await the task.cancel(). Then when we try to get this same operation_id on line 237 (after we've caught the CancelledError raised on line 271) we get this KeyError.

As I said, I do not see the warnings or exceptions when running against the code in PR #2703 so maybe this is worth looking at unless we want an intermediate fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants