Exception while processing completed tasks hangs the engine #481

Closed
kmazurek opened this issue Jun 18, 2021 · 1 comment
Labels: beta.2.patch.1, bug

Comments

@kmazurek
Contributor

Description

OS: Ubuntu 20.04
yagna daemon version: 0.7.0
Python version: 3.8.5
yapapi library version: 0.6.1-alpha.0
yapapi branch: b0.6

Description of the issue:
When an exception is raised inside the async for loop over completed tasks (within Golem's context), the program hangs indefinitely.
The exception triggers the context manager __aexit__ function correctly, which in turn closes its AsyncExitStack:

yapapi/yapapi/engine.py

Lines 309 to 310 in d37b546

async def __aexit__(self, exc_type, exc_val, exc_tb):
    await self._stack.aclose()

As a result, the _Engine's _shutdown method is invoked, which waits for all jobs to finish:
await asyncio.gather(*[job.finished.wait() for job in self._jobs])
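
The hang can be reproduced in isolation. The following is a minimal sketch, not yapapi's actual code: `Job` and `Engine` are hypothetical stand-ins that only mirror the pattern described above (an `AsyncExitStack` closed in `__aexit__`, and a shutdown callback that gathers on `finished` events that are never set). The `asyncio.wait_for` call in `demo` exists only so that the sketch terminates.

```python
import asyncio
from contextlib import AsyncExitStack


class Job:
    """Hypothetical stand-in for an engine job; not yapapi's real class."""

    def __init__(self) -> None:
        self.finished = asyncio.Event()  # never set in this sketch


class Engine:
    """Hypothetical stand-in mirroring the shutdown pattern from the issue."""

    def __init__(self) -> None:
        self._stack = AsyncExitStack()
        self._jobs = [Job()]

    async def __aenter__(self) -> "Engine":
        # Closing the stack later will await this callback.
        self._stack.push_async_callback(self._shutdown)
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self._stack.aclose()

    async def _shutdown(self) -> None:
        # Blocks forever if any job never signals `finished`.
        await asyncio.gather(*[job.finished.wait() for job in self._jobs])


async def run_with_failure() -> None:
    async with Engine():
        # Simulates an error raised while processing a completed task.
        raise RuntimeError("error while processing a completed task")


async def demo(timeout: float = 0.2) -> str:
    # The timeout is only here so the demonstration ends.
    try:
        await asyncio.wait_for(run_with_failure(), timeout)
    except asyncio.TimeoutError:
        return "hung"
    except RuntimeError:
        return "raised"
    return "completed"


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch `demo()` returns "hung" rather than "raised": the `RuntimeError` cannot propagate until `__aexit__` returns, and `__aexit__` is stuck in `_shutdown`.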

This appears to hang indefinitely, as it's waiting for the password cracking job, even though yacat's example code is no longer running.
If we keep the program alive, after 10 minutes the logs show invoices being accepted by the requestor, followed by new proposals being received:

[2021-06-18T13:32:17.721+0200 INFO yapapi.executor] Golem is shutting down...
[2021-06-18T13:32:17.723+0200 INFO yapapi.summary] Task finished by provider 'provider-2', task data: 0
[2021-06-18T13:42:19.929+0200 INFO yapapi.summary] Received proposals from 2 providers so far
[2021-06-18T13:42:19.934+0200 INFO yapapi.summary] Received proposals from 2 providers so far
[2021-06-18T13:42:20.296+0200 INFO yapapi.summary] Accepted invoice from 'provider-1', amount: 0.005289805996944445
[2021-06-18T13:42:20.299+0200 INFO yapapi.summary] Accepted invoice from 'provider-2', amount: 0.02336818049527778
[2021-06-18T13:43:44.579+0200 INFO yapapi.summary] Received proposals from 2 providers so far
[2021-06-18T13:43:44.720+0200 INFO yapapi.summary] Received proposals from 2 providers so far

This happens because the providers break the agreements when no activity is created (fragment of provider agent logs below):

WARN  ya_provider::tasks::task_manager] Breaking agreement [7e57d06fd28880ffa42c4946d16578fc8d16475380aff5ea13d86cde48f19feb], reason: No activity created within 600s.
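
As a quick sanity check, the gap between the "Golem is shutting down..." entry and the next burst of requestor log entries can be computed from the timestamps quoted above. This is only a sketch; the helper name `gap_seconds` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the requestor logs quoted above.
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def gap_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


if __name__ == "__main__":
    shutdown = "2021-06-18T13:32:17.721+0200"    # "Golem is shutting down..."
    next_entry = "2021-06-18T13:42:19.929+0200"  # first entry after the pause
    print(gap_seconds(shutdown, next_entry))
```

The result is roughly 602 seconds, which is consistent with the provider's 600 s limit for creating an activity.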

Since the program never leaves the _Engine's __aexit__, we never see a traceback of the original exception that caused the issue.
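
One possible way to keep the original error visible and to bound the wait is sketched below. This is an illustration under assumptions, not necessarily how the issue was resolved upstream: the `Engine` class, the `shutdown_timeout` parameter and the logger name are all hypothetical. The idea is to log the incoming exception before starting the shutdown, and to wrap the stack's `aclose()` in `asyncio.wait_for`.

```python
import asyncio
import logging
from contextlib import AsyncExitStack

logger = logging.getLogger("engine_sketch")  # hypothetical logger name


class Engine:
    """Hypothetical engine whose shutdown would otherwise never finish."""

    def __init__(self, shutdown_timeout: float = 0.2) -> None:
        self._stack = AsyncExitStack()
        self._job_finished = asyncio.Event()  # never set in this sketch
        self._shutdown_timeout = shutdown_timeout

    async def __aenter__(self) -> "Engine":
        self._stack.push_async_callback(self._shutdown)
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if exc_val is not None:
            # Report the original error before any potentially long wait.
            logger.error(
                "Shutting down because of an error",
                exc_info=(exc_type, exc_val, exc_tb),
            )
        try:
            await asyncio.wait_for(self._stack.aclose(), self._shutdown_timeout)
        except asyncio.TimeoutError:
            logger.warning("Shutdown did not finish in time; giving up")
        # Returning None lets the original exception propagate to the caller.

    async def _shutdown(self) -> None:
        await self._job_finished.wait()


async def demo() -> str:
    try:
        async with Engine():
            raise RuntimeError("boom")
    except RuntimeError as exc:
        return f"raised: {exc}"
    return "completed"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(asyncio.run(demo()))
```

With the bounded wait, the `RuntimeError` reaches the caller once the timeout expires, and its traceback is logged as soon as `__aexit__` is entered. Whether abandoning unfinished jobs after a timeout is acceptable depends on the application.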

Steps To Reproduce

  1. Run the yacat example, modified to raise an arbitrary exception in the async for loop inside Golem's context
  2. The example should fail after the first password cracking task is finished
  3. The program hangs indefinitely, waiting for its jobs to finish

Logs and any additional context

yapapi_20210618_113147+0000.log
goth_20210618_074228+0000.zip
(goth logs are from a number of consecutive runs, correlating based on timestamps might be necessary)

@kmazurek
Contributor Author

Resolved by #482
