Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IRR Explorer front end occasionally hangs #193

Open
teunvink opened this issue Oct 14, 2022 · 8 comments
Open

IRR Explorer front end occasionally hangs #193

teunvink opened this issue Oct 14, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@teunvink
Copy link
Member

Once in a while the IRR Explorer front end hangs, the only way to fix this is to kill the process and restart it. We need more information on what causes this (logs,stacktraces, memory/cpu usage, etc). We can collect the data in this issue, so we can find a cause (and solution).

@teunvink teunvink added the bug Something isn't working label Oct 14, 2022
@mxsasha
Copy link
Collaborator

mxsasha commented Oct 14, 2022

My first thoughts here are:

  • Is the process stuck at 100% CPU or waiting for IO?
  • Is memory consumption increasing while it is stuck?
  • If not CPU, what does strace say it's doing?

Logs may be a bit misleading - I don't know if we log a request upon receiving or upon answering it (which may not be reached). We can definitely add logging at request receive time though.

@mxsasha
Copy link
Collaborator

mxsasha commented Jan 9, 2023

I'll also add irrdnet/irrd#721 to this project so we can get Python tracebacks while the process is running.

@teunvink
Copy link
Member Author

We're still experiencing this at least once or twice a month. Last occurence was today. CPU load graphs show 100% CPU utilisation for a long time before the process was killed:
cpu-day

More statistics can be found here: https://irrexplorer.nlnog.net/munin/localhost/localhost/index.html

A suspicion we have, is that although we see 'breaking' in the output, the hanging process isn't aborted.

@mxsasha: do you have any additional steps on how to investigate this better next time it occurs?

@teunvink
Copy link
Member Author

teunvink commented Feb 3, 2023

Last entries before the latest crash:

INFO:     xx.xx.xx.xx:0 - "GET /api/sets/member-of/AS-ECXON HTTP/1.0" 200 OK
starting step 4 with 63 items to resolve, 51 already done
starting step 1 with 1 items to resolve, 1 already done
starting step 2 with 89 items to resolve, 1 already done
starting step 6 with 374 items to resolve, 1048 already done
breaking

@mxsasha
Copy link
Collaborator

mxsasha commented Feb 6, 2023

No clear answer yet, but ct does seem to be somewhere in the set resolving. #220 adds a few extra debug lines, and also responding to SIGUSR1 by printing a traceback. So if you do that on a hanging process, it should tell us where the process is hanging. And the new lines logged may have answers about why.

@teunvink
Copy link
Member Author

teunvink commented Jun 1, 2024

INFO:     a.b.c.d:0 - "GET /api/prefixes/asn/AS16839 HTTP/1.0" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/aiohttp/connector.py", line 980, in _wrap_create_connection
    return await self._loop.create_connection(*args, **kwargs)  # type: ignore[return-value]  # noqa
  File "uvloop/loop.pyx", line 2039, in create_connection
  File "uvloop/loop.pyx", line 2016, in uvloop.loop.Loop.create_connection
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 435, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__                                                                                                                      [0/1913]
    await self.app(scope, receive, send)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 57, in wrapped_app
    raise exc
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 46, in wrapped_app
    await app(scope, receive, sender)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/routing.py", line 733, in __call__
    await route.handle(scope, receive, send)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/routing.py", line 285, in handle
    await self.app(scope, receive, send)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/routing.py", line 74, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 57, in wrapped_app
    raise exc
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/_exception_handler.py", line 46, in wrapped_app
    await app(scope, receive, sender)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/starlette/routing.py", line 69, in app
    response = await func(request)
  File "/home/irrexplorer/irrexplorer/irrexplorer/api/queries.py", line 98, in prefixes_asn
    asn_prefixes = await PrefixCollector(request.app.state.database).asn_summary(
  File "/home/irrexplorer/irrexplorer/irrexplorer/api/collectors.py", line 57, in asn_summary
    aggregates = await self._collect_aggregate_prefixes_for_asn(asn)
  File "/home/irrexplorer/irrexplorer/irrexplorer/api/collectors.py", line 104, in _collect_aggregate_prefixes_for_asn
    routes_irrd, routes_bgp = await _execute_tasks(tasks)
  File "/home/irrexplorer/irrexplorer/irrexplorer/api/collectors.py", line 254, in _execute_tasks
    return await asyncio.gather(*tasks)
  File "/home/irrexplorer/irrexplorer/irrexplorer/backends/irrd.py", line 147, in query_asn
    result = await session.execute(GQL_QUERY_ASN, {"asn": asn})
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/gql/client.py", line 1629, in execute
    result = await self._execute(
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/gql/client.py", line 1535, in _execute
    result = await asyncio.wait_for(
  File "/usr/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/gql/transport/aiohttp.py", line 308, in execute
    async with self.session.post(self.url, ssl=self.ssl, **post_args) as resp:
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/aiohttp/client.py", line 1167, in __aenter__
    self._resp = await self._coro
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/aiohttp/client.py", line 562, in _request
    conn = await self._connector.connect(
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/aiohttp/connector.py", line 540, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/aiohttp/connector.py", line 901, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/aiohttp/connector.py", line 1209, in _create_direct_connection
    raise last_exc
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/aiohttp/connector.py", line 1178, in _create_direct_connection
    transp, proto = await self._wrap_create_connection(
  File "/home/irrexplorer/venv/lib/python3.10/site-packages/aiohttp/connector.py", line 988, in _wrap_create_connection
    raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host localhost:4380 ssl:False [Connection refused]

@teunvink
Copy link
Member Author

teunvink commented Jun 1, 2024

It seems irrd had crashed:

root@irrexplorer0:~# ps aux|grep irrd
root     3966033  0.0  0.0   9080  2424 pts/0    S+   19:01   0:00 grep --color=auto irrd
```

Restarting irrd made irrexplorer function again, so it seems in this case ('internal server error was served') the problem was irrd was unavailable.

@mxsasha
Copy link
Collaborator

mxsasha commented Sep 30, 2024

Discussion: this is likely an issue that IRRexplorer does not handle IRRD query timeouts well and/or if IRRD closes the socket. IRRexplorer ends up looping forever on CPU.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants