
Yapapi doesn't remove temporary files, fails after creating 507th file and mishandles the failure #519

Closed
johny-b opened this issue Jul 2, 2021 · 3 comments · Fixed by #543


johny-b commented Jul 2, 2021

I'm using yapapi at commit 1d28985 and yagna 0.7.1.

To reproduce the bug:

pip3 install git+https://github.com/golemfactory/yagna-requests.git@9972ada
git clone git@github.com:golemfactory/yagna-requests.git
cd yagna-requests
git checkout 9972ada
PYTHONPATH=~/yapapi:$PYTHONPATH python3 examples/calculator/run_calculator.py

Each time I run this I observe the following:

  1. The number of files in /tmp/[some_directory] keeps growing. I'm pretty sure these are yapapi files, because I added a print statement to yapapi that shows the same names:
diff --git a/yapapi/storage/gftp.py b/yapapi/storage/gftp.py
index 795ad28..904cb72 100644
--- a/yapapi/storage/gftp.py
+++ b/yapapi/storage/gftp.py
@@ -151,6 +151,7 @@ class __Process(jsonrpc_base.Server):
 @contextlib.contextmanager
 def _temp_file(temp_dir: Path) -> Iterator[Path]:
     file_name = temp_dir / str(uuid.uuid4())
+    print("YIELDING", file_name)
     yield file_name
     if file_name.exists():
         os.remove(file_name)

The content of these files matches the files downloaded via ctx.download_file().

  2. When exactly 507 files have been created, yapapi terminates the agreement(s) with a traceback like this:
[2021-07-02T19:34:13.221+0200 DEBUG yapapi.rest] Destroying activity 1bcb59dd4dd740609bc6595899d47c2b on error:
Traceback (most recent call last):
  File "/home/jbetley/yapapi/yapapi/engine.py", line 516, in _worker
    await worker(agreement, activity, work_context)
  File "/home/jbetley/yapapi/yapapi/services.py", line 523, in _worker
    await self._engine.process_batches(agreement.id, activity, instance_batches)
  File "/home/jbetley/yapapi/yapapi/engine.py", line 555, in process_batches
    await batch.prepare()
  File "/home/jbetley/yapapi/yapapi/ctx.py", line 268, in prepare
    await step.prepare()
  File "/home/jbetley/yapapi/yapapi/ctx.py", line 90, in prepare
    self._src = await self.do_upload(self._storage)
  File "/home/jbetley/yapapi/yapapi/ctx.py", line 122, in do_upload
    return await storage.upload_file(self._src_path)
  File "/home/jbetley/yapapi/yapapi/storage/gftp.py", line 265, in upload_file
    links = await process.publish(files=[str(path)])
  File "/home/jbetley/yapapi/yapapi/storage/gftp.py", line 148, in send_message
    return message.parse_response(msg)
  File "/usr/local/lib/python3.8/dist-packages/jsonrpc_base/jsonrpc.py", line 216, in parse_response
    raise ProtocolError(code, message, data)
jsonrpc_base.jsonrpc.ProtocolError: (-32000, "Can't open file /tmp/tmp42rfofqk.", {'jsonrpc': '2.0', 'id': 6098065599679948863, 'error': {'code': -32000, 'message': "Can't open file /tmp/tmp42rfofqk."}})

I always have 507 files when the activities are destroyed (at least the last 3 times I checked), but this might also depend on the file size somehow. My files are all 195 bytes, and 507 * 195 = 98,865, i.e. almost 10^5 (see the diagnostic sketch after this list).

  3. This error is not handled in the expected way: service.state stays running forever, despite the activity being destroyed.
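
A minimal diagnostic sketch for anyone digging into the 507 cutoff, assuming Linux and that gftp pins one or more descriptors per published file (unverified -- though the common soft limit of 1024 open files is close to 2 * 507 = 1014, which would fit better than a byte count; the temp directory name is a placeholder):

import os
import resource

# Per-process open-file limit; the soft limit defaults to 1024 on many
# Linux distributions.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"RLIMIT_NOFILE: soft={soft}, hard={hard}")

# Descriptors currently held by this process (Linux /proc interface).
print("fds in use:", len(os.listdir(f"/proc/{os.getpid()}/fd")))

# Count of yapapi's temporary files; substitute the directory actually
# observed under /tmp.
temp_dir = "/tmp/some_directory"  # hypothetical path
if os.path.isdir(temp_dir):
    print("temp files:", len(os.listdir(temp_dir)))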
johny-b changed the title from "Yapapi doesn't remove temporary files, fails after creating 507th file and incorrectly handles the failure" to "Yapapi doesn't remove temporary files, fails after creating 507th file and mishandles the failure" on Jul 2, 2021
azawlocki self-assigned this on Jul 6, 2021
azawlocki commented:

Possibly related to golemfactory/yagna#1935


azawlocki commented Jul 7, 2021

@johny-b There are a couple of different issues here:

  1. yapapi does remove the temporary files, but only when the Golem instance is shut down. We're going to delete them as soon as they're no longer needed (see the remarks in Unpublish files published with gftp after transfer command is finished #531), but even before that is implemented this shouldn't be a problem -- unless your requestor app somehow manages to create hundreds of thousands of them (one is created for each call to send_bytes / send_json).

  2. The problem is probably too many open files: yapapi uses gftp to publish the files it needs to upload to providers, and each published file is kept open by gftp until it's closed with the gftp close command -- which yapapi never issues (again, see Unpublish files published with gftp after transfer command is finished #531). This is the most severe issue here; a sketch of the intended fix follows this list.

  3. Handling the errors caused by too many open files (and other errors raised when interacting with gftp) is implemented in PR Handle errors raised in batch.prepare() in process_batches() #530 (already merged to b0.6), so your services should move to the stopping state instead of being stuck in running.
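
A minimal sketch of the pattern #531 describes: publish a file only for the duration of a single transfer, then tell gftp to close it so the descriptor is released immediately. The process.publish call matches the one visible in the traceback above; process.close and the shape of the returned links are assumptions here, not the actual yapapi API:

import contextlib

@contextlib.asynccontextmanager
async def published_file(process, path: str):
    """Publish `path` via gftp only for the duration of one transfer."""
    # Same call gftp.py makes in upload_file(); gftp opens and holds the file.
    links = await process.publish(files=[path])
    try:
        yield links[0]["url"]  # assumed link shape
    finally:
        # Assumed binding for the `gftp close` command; releases the open
        # descriptor as soon as the transfer is done.
        await process.close(urls=[links[0]["url"]])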


johny-b commented Jul 7, 2021

@azawlocki

OK, I understand. From my POV, #531 is the important one.

I can confirm that the third problem disappeared in b0.6.
