DiracX pods crashing #319

Closed
fstagni opened this issue Nov 14, 2024 · 1 comment · Fixed by #340

Comments

@fstagni
Contributor

fstagni commented Nov 14, 2024

We are seeing the DiracX pod crash repeatedly, with the following log:

2024-11-14 08:25:40,952 - INFO:     10.76.163.2:38662 - "GET /api/docs/ HTTP/1.1" 307 Temporary Redirect
2024-11-14 08:25:40,952 - INFO:     10.76.163.2:38650 - "GET /api/docs/ HTTP/1.1" 307 Temporary Redirect
2024-11-14 08:25:40,954 - INFO:     10.76.163.2:38664 - "GET /api/docs HTTP/1.1" 200 OK
2024-11-14 08:25:40,954 - INFO:     10.76.163.2:38670 - "GET /api/docs HTTP/1.1" 200 OK
2024-11-14 08:25:50,951 - INFO:     10.76.163.2:45892 - "GET /api/docs/ HTTP/1.1" 307 Temporary Redirect
2024-11-14 08:25:50,952 - INFO:     10.76.163.2:45880 - "GET /api/docs/ HTTP/1.1" 307 Temporary Redirect
2024-11-14 08:25:50,954 - INFO:     10.76.163.2:45902 - "GET /api/docs HTTP/1.1" 200 OK
2024-11-14 08:25:50,954 - INFO:     10.76.163.2:45916 - "GET /api/docs HTTP/1.1" 200 OK
2024-11-14 08:25:52,390 - INFO:     2001:1458:d00:14::16f:0 - "GET /api/auth/legacy-exchange?preferred_username=fstagni&scope=vo%3Adteam+group%3Adteam_pilot+property%3AGenericPilot+property%3ALimitedDelegation&expires_minutes=255157 HTTP/1.1" 500 Internal Server Error
2024-11-14 08:25:52,390 - ERROR:    Exception in ASGI application
  + Exception Group Traceback (most recent call last):
  |   File "/opt/conda/lib/python3.11/site-packages/starlette/_utils.py", line 76, in collapse_excgroups
  |     yield
  |   File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/base.py", line 186, in __call__
  |     async with anyio.create_task_group() as task_group:
  |   File "/opt/conda/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 763, in __aexit__
  |     raise BaseExceptionGroup(
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/opt/conda/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 401, in run_asgi
    |     result = await app(  # type: ignore[func-returns-value]
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    |     return await self.app(scope, receive, send)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/fastapi/applications.py", line 1054, in __call__
    |     await super().__call__(scope, receive, send)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/applications.py", line 113, in __call__
    |     await self.middleware_stack(scope, receive, send)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/errors.py", line 187, in __call__
    |     raise exc
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/errors.py", line 165, in __call__
    |     await self.app(scope, receive, _send)
    |   File "/opt/conda/lib/python3.11/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 579, in __call__
    |     await self.app(scope, otel_receive, otel_send)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/cors.py", line 85, in __call__
    |     await self.app(scope, receive, send)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/base.py", line 185, in __call__
    |     with collapse_excgroups():
    |   File "/opt/conda/lib/python3.11/contextlib.py", line 158, in __exit__
    |     self.gen.throw(typ, value, traceback)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/_utils.py", line 82, in collapse_excgroups
    |     raise exc
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/base.py", line 187, in __call__
    |     response = await self.dispatch_func(request, call_next)
    |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/diracx/routers/__init__.py", line 475, in dispatch
    |     response = await call_next(request)
    |                ^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/base.py", line 163, in call_next
    |     raise app_exc
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/base.py", line 149, in coro
    |     await self.app(scope, receive_or_disconnect, send_no_error)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    |     await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    |     raise exc
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/routing.py", line 715, in __call__
    |     await self.middleware_stack(scope, receive, send)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/routing.py", line 735, in app
    |     await route.handle(scope, receive, send)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/routing.py", line 288, in handle
    |     await self.app(scope, receive, send)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/routing.py", line 76, in app
    |     await wrap_app_handling_exceptions(app, request)(scope, receive, send)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    |     raise exc
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    |     await app(scope, receive, sender)
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/routing.py", line 73, in app
    |     response = await f(request)
    |                ^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/fastapi/routing.py", line 291, in app
    |     solved_result = await solve_dependencies(
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/fastapi/dependencies/utils.py", line 640, in solve_dependencies
    |     solved = await run_in_threadpool(call, **solved_result.values)
    |              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
    |     return await anyio.to_thread.run_sync(func, *args)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    |     return await get_async_backend().run_sync_in_worker_thread(
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    |     return await future
    |            ^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    |     result = context.run(func, *args)
    |              ^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/diracx/core/config/__init__.py", line 105, in read_config
    |     hexsha, modified = self.latest_revision()
    |                        ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/diracx/core/config/__init__.py", line 215, in latest_revision
    |     self._pull()
    |   File "/opt/conda/lib/python3.11/site-packages/cachetools/__init__.py", line 814, in wrapper
    |     v = method(self, *args, **kwargs)
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/diracx/core/config/__init__.py", line 212, in _pull
    |     self.repo.remotes.origin.pull()
    |   File "/opt/conda/lib/python3.11/site-packages/git/remote.py", line 1120, in pull
    |     proc = self.repo.git.pull(
    |            ^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/git/cmd.py", line 986, in <lambda>
    |     return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
    |                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/git/cmd.py", line 1598, in _call_process
    |     return self.execute(call, **exec_kwargs)
    |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/site-packages/git/cmd.py", line 1262, in execute
    |     proc = safer_popen(
    |            ^^^^^^^^^^^^
    |   File "/opt/conda/lib/python3.11/subprocess.py", line 1026, in __init__
    |     self._execute_child(args, executable, preexec_fn, close_fds,
    |   File "/opt/conda/lib/python3.11/subprocess.py", line 1885, in _execute_child
    |     self.pid = _fork_exec(
    |                ^^^^^^^^^^^
    | BlockingIOError: [Errno 11] Resource temporarily unavailable
    +------------------------------------
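Reading the innermost frames: the failure happens in `subprocess` while forking the `git pull` child that GitPython launches, not inside git itself. `BlockingIOError: [Errno 11]` is how Python reports `EAGAIN`, the error `fork()` returns when a process limit has been reached. The sketch below (Linux, Python 3.11 as in the traceback) illustrates that mapping and prints the current per-user process limit; `is_fork_limit_error` is a hypothetical helper, not part of DiracX or GitPython.

```python
import errno
import resource


def is_fork_limit_error(exc: BaseException) -> bool:
    """Hypothetical helper: does exc look like process creation hitting a limit?

    fork()/clone() fail with EAGAIN when the per-user process limit
    (RLIMIT_NPROC) or a cgroup pids limit has been reached.
    """
    return isinstance(exc, OSError) and exc.errno == errno.EAGAIN


# PEP 3151: constructing OSError with errno EAGAIN automatically yields a
# BlockingIOError, the exception type at the bottom of the traceback.
err = OSError(errno.EAGAIN, "Resource temporarily unavailable")
print(type(err).__name__, err)
print("fork limit error?", is_fork_limit_error(err))

# Current soft/hard limits on the number of processes for this user
# (resource.RLIM_INFINITY means "unlimited").
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("RLIMIT_NPROC soft/hard:", soft, hard)
```

Inside a Kubernetes pod the limit that is actually hit may instead be the container's cgroup pids limit, which `getrlimit` does not report.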

@chaen
Contributor

chaen commented Nov 28, 2024

Note to self: the number of processes steadily increases until it reaches the maximum, probably due to GitPython.
It is most probably the same issue as gitpython-developers/GitPython#421, but we are likely hitting another limit first.
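One way to check the observation that the process count keeps climbing is to sample the processes in the pod and group them by state: children that have exited but were never waited for remain as zombies (`Z`) and keep occupying a process slot until their parent reaps them. A minimal, Linux-only sketch, assuming `/proc` is visible inside the container; `count_processes` is a hypothetical helper, and it counts processes only, whereas some limits (such as a cgroup pids limit) also count threads.

```python
import os
import subprocess
import sys
import time


def count_processes(proc_root: str = "/proc") -> dict[str, int]:
    """Count processes visible under /proc, grouped by their one-letter state."""
    counts: dict[str, int] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories correspond to processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name is wrapped in parentheses and may contain spaces,
        # so split after the last ")" to reach the state field (R, S, Z, ...).
        state = stat.rsplit(")", 1)[1].split()[0]
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    # Start a short-lived child and deliberately do not wait() for it yet:
    # after it exits it remains a zombie ("Z") until the parent reaps it.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    time.sleep(1)
    print("before wait():", count_processes())
    child.wait()  # reaping the child removes its zombie entry
    print("after wait(): ", count_processes())
```

Sampling `count_processes()` periodically and watching whether the `Z` count grows between config reloads would help distinguish leaked, unreaped `git` children from processes that are genuinely still running.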

chaen added three commits to chaen/diracx that referenced this issue Dec 5, 2024
chaen closed this as completed in #340 (commit 5ef5b17) on Dec 5, 2024

2 participants