Extremely high CPU usage in workers #8642

Open
dominictory opened this issue Oct 9, 2024 · 14 comments
Labels: bug (use for describing something not working as expected)


@dominictory

dominictory commented Oct 9, 2024

Description

Recently, we stopped all connectors and workers, cleared them in the UI, and then restarted them so they would re-register. I noticed I wasn't fully utilising the available resources on my system, so I created a second ingestion platform (I also have a frontend platform) with 3 extra workers, for a total of 6. This brought me close to full CPU utilisation (16 cores!), and bundles were being processed quickly. That was yesterday.

This morning I noticed upwards of 1 million bundles, workers failing, connectors going inactive, and a load average in the 100s to 1000s (it was <16 yesterday). I was discussing the urlhaus payloads connector with someone who was seeing HUGE messages, so I disabled that connector. I am no longer seeing messages like that, but CPU is still spiking, and I can't track down what's causing it.

As I understand it, the rules engine, and others, do not use the worker, so it shouldn't be that? Is there a way I can better understand what's causing the workers to work SO hard all of a sudden? Or why ingestion is initially fast (15-30 bundles/sec) and then eventually grinds to a halt as CPU usage on the system spikes massively?

When checking docker stats for the workers, I am seeing 100s of PIDs for each (processes/threads created):

CONTAINER ID   NAME                                          CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
96bd264f2efc   opencti_worker2.3.m305r89vr2wirh6babi6nuzur   250.55%   558.7MiB / 62.88GiB   0.87%     147MB / 76.6MB    1.2MB / 7.48MB    560
4c7c5d7c3c76   opencti_worker.3.sxwccvjv71c2gnuo5rux690ng    257.02%   583.4MiB / 62.88GiB   0.91%     154MB / 79.9MB    1.17MB / 4.3MB    583
9a2a87a92ee4   opencti_worker2.1.3t1fulcmgih7158lljow0fqkg   224.01%   562.2MiB / 62.88GiB   0.87%     148MB / 77.1MB    1.99MB / 13.4MB   564
c10f7539b0d0   opencti_worker.2.pb1j6cksmilpbz9k8jb5bfkwr    82.82%    585.1MiB / 62.88GiB   0.91%     154MB / 79.7MB    1.25MB / 4.78MB   582
dde9cc1e770f   opencti_worker.1.qecr3ycm1i0o4qfm7tmu8y2vq    224.98%   585.7MiB / 62.88GiB   0.91%     158MB / 82.2MB    21.2MB / 6.56MB   583
74f978cb90dd   opencti_worker2.2.b6oe51itolc2ztpadpemby5ad   168.74%   573.4MiB / 62.88GiB   0.89%     148MB / 77.3MB    26.4MB / 7.75MB   566
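
For anyone wanting to total these figures, here is a minimal Python sketch that parses rows in this format. It assumes the default `docker stats` column layout (container ID, name, CPU %, memory usage / limit, memory %, network I/O, block I/O, PIDs). The function names are illustrative and are not part of Docker or OpenCTI.

```python
def parse_stats_row(row):
    """Extract name, CPU % and PID count from one `docker stats` row."""
    parts = row.split()
    # parts[0] is the container ID, parts[1] the name, parts[2] the CPU %,
    # and the final token is the PIDS column.
    return {
        "name": parts[1],
        "cpu_percent": float(parts[2].rstrip("%")),
        "pids": int(parts[-1]),
    }


def summarise(rows):
    """Total CPU % and PIDs across all non-empty rows."""
    parsed = [parse_stats_row(r) for r in rows if r.strip()]
    return {
        "containers": len(parsed),
        "total_cpu_percent": sum(p["cpu_percent"] for p in parsed),
        "total_pids": sum(p["pids"] for p in parsed),
    }


# Rows copied from the docker stats output in this issue (whitespace collapsed).
ROWS = [
    "96bd264f2efc opencti_worker2.3.m305r89vr2wirh6babi6nuzur 250.55% 558.7MiB / 62.88GiB 0.87% 147MB / 76.6MB 1.2MB / 7.48MB 560",
    "4c7c5d7c3c76 opencti_worker.3.sxwccvjv71c2gnuo5rux690ng 257.02% 583.4MiB / 62.88GiB 0.91% 154MB / 79.9MB 1.17MB / 4.3MB 583",
    "9a2a87a92ee4 opencti_worker2.1.3t1fulcmgih7158lljow0fqkg 224.01% 562.2MiB / 62.88GiB 0.87% 148MB / 77.1MB 1.99MB / 13.4MB 564",
    "c10f7539b0d0 opencti_worker.2.pb1j6cksmilpbz9k8jb5bfkwr 82.82% 585.1MiB / 62.88GiB 0.91% 154MB / 79.7MB 1.25MB / 4.78MB 582",
    "dde9cc1e770f opencti_worker.1.qecr3ycm1i0o4qfm7tmu8y2vq 224.98% 585.7MiB / 62.88GiB 0.91% 158MB / 82.2MB 21.2MB / 6.56MB 583",
    "74f978cb90dd opencti_worker2.2.b6oe51itolc2ztpadpemby5ad 168.74% 573.4MiB / 62.88GiB 0.89% 148MB / 77.3MB 26.4MB / 7.75MB 566",
]

summary = summarise(ROWS)
print(summary["containers"], round(summary["total_cpu_percent"], 2), summary["total_pids"])
```

For the six rows here this comes to roughly 1208% CPU (about 12 of the 16 cores, since 100% corresponds to one core) and 3438 PIDs in total.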

Worker log sample:

INFO Message processed, thread terminated | timestamp=2024-10-09T08:48:00.579177Z name=worker taskName=null

INFO Message acknowledged | timestamp=2024-10-09T08:48:00.579605Z name=worker taskName=null attributes={"tag":1316}

INFO Processing a new message, launching a thread... | timestamp=2024-10-09T08:48:00.582989Z name=worker taskName=null attributes={"tag":1317}

INFO Creating stix_core_relationship | timestamp=2024-10-09T08:48:00.591413Z name=api taskName=null attributes={"relationship_type":"indicates","from_id":"indicator--2933c936-d2ea-588e-8ebc-b0e0fb289877","to_id":"malware--bd24bc6a-229a-5d93-ba8f-2f54d47dc121"}

INFO Report expectation | timestamp=2024-10-09T08:47:59.211169Z name=api taskName=null attributes={"work_id":"work_c1af30c7-b667-4167-9fcb-693972e61091_2024-10-08T12:00:45.342Z"}

INFO Report expectation | timestamp=2024-10-09T08:47:59.215432Z name=api taskName=null attributes={"work_id":"work_f6b839e6-4356-4c8b-ac04-5446146af1be_2024-10-08T11:07:10.357Z"}

INFO Creating stix_observable_relationship | timestamp=2024-10-09T08:47:59.235232Z name=api taskName=null attributes={"relationship_type":"obs_resolves-to","from_id":"b47cf288-e86d-4eff-8347-26e9e48640cd","to_id":"ipv4-addr--cb4f64d5-c081-5c00-a62a-f6e650a98fe0"}

INFO Message processed, thread terminated | timestamp=2024-10-09T08:47:59.345125Z name=worker taskName=null

INFO Message acknowledged | timestamp=2024-10-09T08:47:59.345502Z name=worker taskName=null attributes={"tag":301}

INFO Processing a new message, launching a thread... | timestamp=2024-10-09T08:47:59.348566Z name=worker taskName=null attributes={"tag":302}

INFO Importing an object | timestamp=2024-10-09T08:47:59.349569Z name=api taskName=null attributes={"type":"indicator","id":"indicator--582021d3-0328-523e-8170-44b9213593a8"}

INFO Creating External Reference | timestamp=2024-10-09T08:47:59.350089Z name=api taskName=null attributes={"source_name":"Abuse.ch URLhaus"}

INFO Message processed, thread terminated | timestamp=2024-10-09T08:47:59.366841Z name=worker taskName=null

INFO Report expectation | timestamp=2024-10-09T08:48:02.097043Z name=api taskName=null attributes={"work_id":"work_c1af30c7-b667-4167-9fcb-693972e61091_2024-10-08T12:00:45.342Z"}

INFO Message acknowledged | timestamp=2024-10-09T08:47:59.367103Z name=worker taskName=null attributes={"tag":1386}

INFO Processing a new message, launching a thread... | timestamp=2024-10-09T08:47:59.369791Z name=worker taskName=null attributes={"tag":1387}

INFO Creating stix_core_relationship | timestamp=2024-10-09T08:47:59.370594Z name=api taskName=null attributes={"relationship_type":"indicates","from_id":"indicator--77b9ab2f-ab24-55a1-b2b8-68bd6320fd45","to_id":"attack-pattern--2231c569-b57d-551c-9048-11a1b67f0a81"}

INFO Report expectation | timestamp=2024-10-09T08:48:00.559844Z name=api taskName=null attributes={"work_id":"work_3440de91-0e65-4b07-806a-ca953cfc7934_2024-10-09T08:47:07.039Z"}

INFO Message processed, thread terminated | timestamp=2024-10-09T08:48:00.670578Z name=worker taskName=null

INFO Message acknowledged | timestamp=2024-10-09T08:48:00.671084Z name=worker taskName=null attributes={"tag":84}

INFO Processing a new message, launching a thread... | timestamp=2024-10-09T08:48:00.674392Z name=worker taskName=null attributes={"tag":85}

INFO Creating stix_core_relationship | timestamp=2024-10-09T08:48:00.675942Z name=api taskName=null attributes={"relationship_type":"located-at","from_id":"location--89de54d0-9313-5d0c-a47d-a49dd90a0bbb","to_id":"location--e37643d5-bc35-5997-a2d8-a3e15ef0eb4a"}

INFO Report expectation | timestamp=2024-10-09T08:48:01.061805Z name=api taskName=null attributes={"work_id":"work_c1af30c7-b667-4167-9fcb-693972e61091_2024-10-08T12:00:45.342Z"}

INFO Getting connectors ... | timestamp=2024-10-09T08:48:01.504127Z name=api taskName=null

INFO Report expectation | timestamp=2024-10-09T08:48:01.521832Z name=api taskName=null attributes={"work_id":"work_3440de91-0e65-4b07-806a-ca953cfc7934_2024-10-09T08:47:07.039Z"}

INFO Message processed, thread terminated | timestamp=2024-10-09T08:48:02.028025Z name=worker taskName=null

INFO Message processed, thread terminated | timestamp=2024-10-09T08:48:02.774602Z name=worker taskName=null

INFO Message acknowledged | timestamp=2024-10-09T08:48:02.775421Z name=worker taskName=null attributes={"tag":1317}

INFO Processing a new message, launching a thread... | timestamp=2024-10-09T08:48:02.782645Z name=worker taskName=null attributes={"tag":1318}

INFO Message processed, thread terminated | timestamp=2024-10-09T08:48:00.088242Z name=worker taskName=null

INFO Message acknowledged | timestamp=2024-10-09T08:48:00.088496Z name=worker taskName=null attributes={"tag":1379}

INFO Processing a new message, launching a thread... | timestamp=2024-10-09T08:48:00.091380Z name=worker taskName=null attributes={"tag":1380}

INFO Creating stix_core_relationship | timestamp=2024-10-09T08:48:00.092195Z name=api taskName=null attributes={"relationship_type":"indicates","from_id":"indicator--2933c936-d2ea-588e-8ebc-b0e0fb289877","to_id":"malware--c085cef8-5d50-54c0-a58a-f02b70a4c83b"}

INFO Report expectation | timestamp=2024-10-09T08:48:00.909092Z name=api taskName=null attributes={"work_id":"work_c1af30c7-b667-4167-9fcb-693972e61091_2024-10-08T12:00:45.342Z"}

INFO Message processed, thread terminated | timestamp=2024-10-09T08:48:01.047236Z name=worker taskName=null

INFO Message acknowledged | timestamp=2024-10-09T08:48:01.047501Z name=worker taskName=null attributes={"tag":1380}

INFO Processing a new message, launching a thread... | timestamp=2024-10-09T08:48:01.050718Z name=worker taskName=null attributes={"tag":1381}

INFO Creating stix_core_relationship | timestamp=2024-10-09T08:48:01.052131Z name=api taskName=null attributes={"relationship_type":"indicates","from_id":"indicator--2933c936-d2ea-588e-8ebc-b0e0fb289877","to_id":"attack-pattern--fa7c2996-72b0-58d6-a487-093765e37059"}
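
To get a rough view of how many message threads are started versus finished in a log window like this one, a sketch along the following lines could help. It relies only on the two message strings visible in the sample. The function name and the idea of treating the difference as in-flight work are assumptions, and because the sample interleaves output from several workers and starts mid-stream, the result is only an estimate.

```python
from collections import Counter

# Message fragments taken verbatim from the worker log sample.
STARTED = "Processing a new message, launching a thread"
FINISHED = "Message processed, thread terminated"


def count_thread_events(lines):
    """Count thread start/finish log events and their difference."""
    counts = Counter(started=0, finished=0)
    for line in lines:
        if STARTED in line:
            counts["started"] += 1
        elif FINISHED in line:
            counts["finished"] += 1
    # Can be negative when the window begins after some threads had started.
    counts["in_flight"] = counts["started"] - counts["finished"]
    return counts


SAMPLE = [
    "INFO Message processed, thread terminated | timestamp=2024-10-09T08:48:00.579177Z name=worker taskName=null",
    'INFO Message acknowledged | timestamp=2024-10-09T08:48:00.579605Z name=worker taskName=null attributes={"tag":1316}',
    'INFO Processing a new message, launching a thread... | timestamp=2024-10-09T08:48:00.582989Z name=worker taskName=null attributes={"tag":1317}',
    'INFO Processing a new message, launching a thread... | timestamp=2024-10-09T08:47:59.348566Z name=worker taskName=null attributes={"tag":302}',
]

print(dict(count_thread_events(SAMPLE)))
```

A steadily growing `in_flight` value over a longer capture would suggest threads are being launched faster than they complete, which would be consistent with the high PID counts, though it would not prove the cause.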

ERROR Error pinging the API | timestamp=2024-10-09T07:43:32.936556Z name=worker exc_info=Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.12/site-packages/urllib3/connection.py", line 730, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connection.py", line 909, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
               ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/ssl_.py", line 469, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/ssl_.py", line 513, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/ssl.py", line 455, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/ssl.py", line 1041, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.12/ssl.py", line 1319, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLEOFError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 490, in _make_request
    raise new_e
urllib3.exceptions.SSLError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/opencti-worker/worker.py", line 58, in ping
    self.api.query(
  File "/usr/local/lib/python3.12/site-packages/pycti/api/opencti_api_client.py", line 337, in query
    r = self.session.post(
        ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 698, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)'))) taskName=null attributes={"reason":"HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)')))","headers":"{'User-Agent': 'pycti/6.3.1', 'Authorization': 'Bearer 4e95798b-21a6-4571-819f-d974f2c2abea'}"}

INFO Creating stix_core_relationship | timestamp=2024-10-09T08:48:03.002037Z name=api taskName=null attributes={"relationship_type":"indicates","from_id":"indicator--2933c936-d2ea-588e-8ebc-b0e0fb289877","to_id":"attack-pattern--4daaeb43-d795-5e83-97ec-3c83164a50c3"}

ERROR Error pinging the API | timestamp=2024-10-09T07:43:47.775595Z name=worker exc_info=Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 466, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 1095, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.12/site-packages/urllib3/connection.py", line 730, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connection.py", line 909, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
               ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/ssl_.py", line 469, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/ssl_.py", line 513, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/ssl.py", line 455, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/ssl.py", line 1041, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.12/ssl.py", line 1319, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLEOFError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 789, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 490, in _make_request
    raise new_e
urllib3.exceptions.SSLError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/opencti-worker/worker.py", line 58, in ping
    self.api.query(
  File "/usr/local/lib/python3.12/site-packages/pycti/api/opencti_api_client.py", line 337, in query
    r = self.session.post(
        ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/requests/adapters.py", line 698, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)'))) taskName=null attributes={"reason":"HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1000)')))","headers":"{'User-Agent': 'pycti/6.3.1', 'Authorization': 'Bearer 4e95798b-21a6-4571-819f-d974f2c2abea'}"}

tamp": "2024-10-09T07:43:47.782489Z", "level": "ERROR", "name": "worker", "message": "Error pinging the API", "exc_info": "urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:2406)\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.12/site-packages/requests/adapters.py\", line 667, in send\n    resp = conn.urlopen(\n           ^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py\", line 843, in urlopen\n    retries = retries.increment(\n              ^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/urllib3/util/retry.py\", line 519, in increment\n    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]\n    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nurllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2406)')))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/opt/opencti-worker/worker.py\", line 58, in ping\n    self.api.query(\n  File \"/usr/local/lib/python3.12/site-packages/pycti/api/opencti_api_client.py\", line 337, in query\n    r = self.session.post(\n        ^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/requests/sessions.py\", line 637, in post\n    return self.request(\"POST\", url, data=data, json=json, **kwargs)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/requests/sessions.py\", line 589, in request\n    resp = self.send(prep, **send_kwargs)\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/requests/sessions.py\", line 703, in send\n    r = adapter.send(request, **kwargs)\n        
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.12/site-packages/requests/adapters.py\", line 698, in send\n    raise SSLError(e, request=request)\nrequests.exceptions.SSLError: HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2406)')))", "taskName": null, "attributes": {"reason": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2406)')))", "headers": "{'User-Agent': 'pycti/6.3.1', 'Authorization': 'Bearer 4e95798b-21a6-4571-819f-d974f2c2abea'}"}}
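
The tracebacks above show the TLS handshake to opencti-data:4434 being closed before it completes. One way to check whether the endpoint accepts handshakes at all, independently of the worker, is a small probe like the following. This is a generic standard-library sketch, not part of pycti or the OpenCTI worker. `probe_tls` is a hypothetical helper, and the host and port in the usage comment are simply the ones from the log.

```python
import socket
import ssl
import time


def probe_tls(host, port, timeout=5.0, verify=True):
    """Attempt one TCP connect plus TLS handshake.

    Returns (ok, elapsed_seconds, detail), where detail is the negotiated
    TLS version on success or the exception text on failure.
    """
    context = ssl.create_default_context()
    if not verify:
        # Only for diagnosing self-signed setups; do not use for real traffic.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                detail = tls.version() or "unknown"
        return True, time.monotonic() - started, detail
    except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
        return False, time.monotonic() - started, f"{type(exc).__name__}: {exc}"


# Example call (host and port taken from the log above):
#   print(probe_tls("opencti-data", 4434, verify=False))
```

Running the probe in a loop while the system is under load would show whether handshakes fail only when the platform is saturated, which is one plausible reading of the `UNEXPECTED_EOF_WHILE_READING` errors but not the only one.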

image

image

image

Environment

  1. OS: Ubuntu 22.04 LTS
  2. OpenCTI version: 6.3.1
  3. Other environment details: 1x frontend + 2x ingestion platforms (3 workers each, 6 total)
  4. System resources: 16 cores, 64 GB RAM
@dominictory added the labels bug (use for describing something not working as expected) and needs triage (use to identify issue needing triage from Filigran Product team) on Oct 9, 2024
@MaxwellDPS

MaxwellDPS commented Oct 10, 2024

Appears related to #8629

@nino-filigran added the label needs more info (Intel needed about the use case) and removed the label needs triage (use to identify issue needing triage from Filigran Product team) on Oct 14, 2024
@nino-filigran

We are currently assessing on our side whether we can reproduce this (hence the status change).

@richard-julien
Member

Maybe related to missing queue in the rabbit. Do you remember doing some operation/cleanup/maintenance on the rabbitmq?

@dominictory
Author

Maybe related to missing queue in the rabbit. Do you remember doing some operation/cleanup/maintenance on the rabbitmq?

I saw another, similar thread in which you suggested the following potential solution:

  • Stop all connectors/workers
  • Delete connectors in OpenCTI UI
  • Restart connectors/workers

This resolved the issue initially but ingestion slowed right down within hours, yet resource usage remained high across all workers. Other than that, I didn't do anything else with rabbitmq.

@richard-julien
Member

Do you have an RSS/TAXII/CSV feed configured?

@dominictory
Author

Do you have an RSS/TAXII/CSV feed configured?

I have 6 CSV feeds configured. I did check rabbitmq queues after the above and there was nothing there. Should I have done something about the CSV feeds as well, as the delete option was greyed out?
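
For reference, queue state can also be checked programmatically. The sketch below assumes the RabbitMQ management plugin is enabled and reachable (commonly on port 15672) and uses its `/api/queues` endpoint, which returns one JSON object per queue with fields such as `name`, `messages` and `consumers`. The helper names, URL, credentials and queue names are placeholders, not values from this deployment.

```python
import base64
import json
import urllib.request


def fetch_queues(base_url, user, password, timeout=10):
    """Fetch the queue list from the RabbitMQ management HTTP API."""
    request = urllib.request.Request(f"{base_url}/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def backlog_without_consumers(queues):
    """Return names of queues that hold messages but have no consumer."""
    return [
        q["name"]
        for q in queues
        if q.get("messages", 0) > 0 and q.get("consumers", 0) == 0
    ]


# Hypothetical data in the shape returned by /api/queues.
EXAMPLE = [
    {"name": "push_example-a", "messages": 120000, "consumers": 0},
    {"name": "push_example-b", "messages": 0, "consumers": 3},
    {"name": "push_example-c", "messages": 42, "consumers": 1},
]

print(backlog_without_consumers(EXAMPLE))

# Against a live broker (placeholder URL and credentials):
#   queues = fetch_queues("http://localhost:15672", "guest", "guest")
#   print(backlog_without_consumers(queues))
```

An empty queue list, as described here, would be worth comparing against the connectors registered in the UI, since a connector without a matching queue is the situation the earlier question about a missing queue was probing for.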

@dominictory
Author

Oddly, only 1 of 6 workers is currently connected according to the UI, yet CPU is maxed out by worker.py (x6).

@SamuelHassine added this to the Release 6.3.6 milestone on Oct 14, 2024
@SamuelHassine added the label solved (use to identify issue that has been solved (must be linked to the solving PR)) and removed the label needs more info (Intel needed about the use case) on Oct 14, 2024
@dominictory
Author

@richard-julien @nino-filigran @MaxwellDPS just wanted to update you: I thought 6.3.6 had resolved my issue, but it has returned. CPU usage is massive and, according to the UI, the workers are not even connected.

09:34:30 up 5 days, 1:37, 1 user, load average: 3057.61, 2718.51, 2652.28

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1262703 root      20   0 2719092   1.3g   5864 S 309.9   2.0   1270:53 python3 worker.py
1262485 root      20   0 3026020   1.5g   5864 S 271.3   2.4   1279:50 python3 worker.py
 546560 cplcadm+  20   0   87.7g  36.0g   3.8g S 238.6  57.2   5403:12 /usr/share/elasticsearch/jdk/+
1262702 root      20   0 3102504   1.6g   5856 S 230.0   2.5   1269:26 python3 worker.py
1258587 root      20   0 4316364   2.4g   5932 S 109.2   3.9   1566:36 python3 worker.py
1264420 root      20   0   21.5g 738464  18924 R 101.7   1.1   1071:40 node build/back.js
1263506 root      20   0   22.1g   1.4g  19112 R  97.7   2.2   1019:54 node build/back.js
1259811 root      20   0 3340292   1.7g   5936 S  76.9   2.7   1558:49 python3 worker.py
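
To put the load average in proportion to the machine, a small sketch like this one divides it by the core count. The uptime line and the 16-core figure come from this issue; the function names are illustrative. Note that on Linux the load average counts tasks waiting on uninterruptible I/O as well as runnable ones, so a high value does not necessarily mean pure CPU contention.

```python
def parse_load_averages(uptime_line):
    """Return the 1, 5 and 15 minute load averages from an `uptime` line."""
    tail = uptime_line.rsplit("load average:", 1)[1]
    return tuple(float(value) for value in tail.split(","))


def load_per_core(load, cores):
    """Normalise a load average by the number of CPU cores."""
    return load / cores


LINE = "09:34:30 up 5 days, 1:37, 1 user, load average: 3057.61, 2718.51, 2652.28"
CORES = 16  # from the Environment section of this issue

one_min, five_min, fifteen_min = parse_load_averages(LINE)
print(round(load_per_core(one_min, CORES), 1))
```

With these numbers the one-minute figure works out to about 191 waiting or running tasks per core, far more than the worker processes alone account for, which fits with each worker holding hundreds of threads.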

image

@richard-julien
Member

Can you check the logs of the workers that consume a lot of CPU?

@richard-julien self-assigned this on Oct 16, 2024
@dominictory
Author

Can you check the logs of the workers that consume a lot of CPU?

I have lots of logs for the worker(s). Let me know if you want me to dig out any in particular:

      1 "message": "('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))"
      1 "message": "AMQP connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorSocketConnectError: ConnectionResetError(104, 'Connection reset by peer'); first exception - None."
      1 "message": "AMQPConnectionWorkflow - reporting failure: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorSocketConnectError: ConnectionResetError(104, 'Connection reset by peer'); first exception - None"
      1 "message": "AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionResetError(104, 'Connection reset by peer')"
      1 "message": "Connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorSocketConnectError: ConnectionResetError(104, 'Connection reset by peer'); first exception - None"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc5372d450>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc53751f90>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc53753390>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc537d4190>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc537d5bd0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc537d74d0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc55d65590>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc57a9fb10>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc5d454910>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc5d456210>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc622f2fd0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc652dc2d0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc6a354b90>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc6a5b4690>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc6a5b5e50>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc6a5b7ed0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc6b64dbd0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc6b64e210>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc6e035450>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc6e037250>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc74a7b390>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc77ac9810>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc7ab0f4d0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc7ab98a50>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc7ab991d0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc7ab99810>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc7b804050>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc7b804690>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc7c64f250>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc7fb6a850>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc7fb6bb10>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc865fa350>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc89b01310>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc89b2f250>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc8b0b8190>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc8b0bb4d0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc913ed450>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc914482d0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc9a1dde50>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc9a282210>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcca25082d0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcca2508550>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcca250a490>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcca250aad0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcca7aef9d0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcca9219090>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcca92196d0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fccae85b4d0>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fccbb7d7890>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fccbf0a2d50>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fccc1ff9f90>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fccc2f84690>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f230a182120>: Failed to establish a new connection: [Errno 111] Connection refused'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f23689c1d30>: Failed to establish a new connection: [Errno 111] Connection refused'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f3a8dd35d30>: Failed to establish a new connection: [Errno 111] Connection refused'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f9223075d30>: Failed to establish a new connection: [Errno 111] Connection refused'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f9c8136dd30>: Failed to establish a new connection: [Errno 111] Connection refused'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fbe26f55d30>: Failed to establish a new connection: [Errno 111] Connection refused'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7feca3dd9d30>: Failed to establish a new connection: [Errno 111] Connection refused'))"
      1 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7ff8f20cdd30>: Failed to establish a new connection: [Errno 111] Connection refused'))"
      1 "message": "Socket failed to connect: <socket.socket fd=751, family=2, type=1, proto=6, laddr=('172.29.0.22', 32964)>; error=104 (Connection reset by peer)"
      1 "message": "TCP Connection attempt failed: ConnectionResetError(104, 'Connection reset by peer'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, 'rabbitmq', ('172.29.0.51', 5672))"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=116, family=2, type=1, proto=6, laddr=('172.29.0.22', 58676)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=121, family=2, type=1, proto=6, laddr=('172.29.0.22', 34248)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=146, family=2, type=1, proto=6, laddr=('172.29.0.22', 38798)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=1924, family=2, type=1, proto=6, laddr=('172.29.0.22', 40104)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=220, family=2, type=1, proto=6, laddr=('172.29.0.22', 34590)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=256, family=2, type=1, proto=6, laddr=('172.29.0.22', 50172)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=269, family=2, type=1, proto=6, laddr=('172.29.0.22', 57498)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=272, family=2, type=1, proto=6, laddr=('172.29.0.22', 43842)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=272, family=2, type=1, proto=6, laddr=('172.29.0.22', 58706)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=414, family=2, type=1, proto=6, laddr=('172.29.0.22', 49648)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=435, family=2, type=1, proto=6, laddr=('172.29.0.22', 42350)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=454, family=2, type=1, proto=6, laddr=('172.29.0.22', 49686)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=46, family=2, type=1, proto=6, laddr=('172.29.0.22', 43986)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=475, family=2, type=1, proto=6, laddr=('172.29.0.22', 42374)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=489, family=2, type=1, proto=6, laddr=('172.29.0.22', 40636)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=509, family=2, type=1, proto=6, laddr=('172.29.0.22', 48774)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=527, family=2, type=1, proto=6, laddr=('172.29.0.22', 33268)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=615, family=2, type=1, proto=6, laddr=('172.29.0.22', 59710)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=634, family=2, type=1, proto=6, laddr=('172.29.0.22', 60384)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=647, family=2, type=1, proto=6, laddr=('172.29.0.22', 57594)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=71, family=2, type=1, proto=6, laddr=('172.29.0.22', 54652)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=718, family=2, type=1, proto=6, laddr=('172.29.0.22', 40440)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=796, family=2, type=1, proto=6, laddr=('172.29.0.22', 50578)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=812, family=2, type=1, proto=6, laddr=('172.29.0.22', 50038)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=876, family=2, type=1, proto=6, laddr=('172.29.0.22', 38786)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=91, family=2, type=1, proto=6, laddr=('172.29.0.22', 55522)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=956, family=2, type=1, proto=6, laddr=('172.29.0.22', 42598)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._consume() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=981, family=2, type=1, proto=6, laddr=('172.29.0.22', 42592)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=1014, family=2, type=1, proto=6, laddr=('172.29.0.22', 53896)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=105, family=2, type=1, proto=6, laddr=('172.29.0.22', 38404)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=108, family=2, type=1, proto=6, laddr=('172.29.0.22', 54662)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=109, family=2, type=1, proto=6, laddr=('172.29.0.22', 53378)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=1127, family=2, type=1, proto=6, laddr=('172.29.0.22', 56856)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=1175, family=2, type=1, proto=6, laddr=('172.29.0.22', 42082)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=122, family=2, type=1, proto=6, laddr=('172.29.0.22', 57462)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=128, family=2, type=1, proto=6, laddr=('172.29.0.22', 43098)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=134, family=2, type=1, proto=6, laddr=('172.29.0.22', 58684)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=135, family=2, type=1, proto=6, laddr=('172.29.0.22', 60890)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=1359, family=2, type=1, proto=6, laddr=('172.29.0.22', 41248)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=139, family=2, type=1, proto=6, laddr=('172.29.0.22', 34254)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=143, family=2, type=1, proto=6, laddr=('172.29.0.22', 53170)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=1438, family=2, type=1, proto=6, laddr=('172.29.0.22', 42548)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=150, family=2, type=1, proto=6, laddr=('172.29.0.22', 60900)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=158, family=2, type=1, proto=6, laddr=('172.29.0.22', 34180)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=163, family=2, type=1, proto=6, laddr=('172.29.0.22', 34270)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=166, family=2, type=1, proto=6, laddr=('172.29.0.22', 53394)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=167, family=2, type=1, proto=6, laddr=('172.29.0.22', 57478)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=169, family=2, type=1, proto=6, laddr=('172.29.0.22', 34202)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=175, family=2, type=1, proto=6, laddr=('172.29.0.22', 57758)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=1815, family=2, type=1, proto=6, laddr=('172.29.0.22', 52904)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=184, family=2, type=1, proto=6, laddr=('172.29.0.22', 41572)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=191, family=2, type=1, proto=6, laddr=('172.29.0.22', 53402)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=195, family=2, type=1, proto=6, laddr=('172.29.0.22', 55536)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=199, family=2, type=1, proto=6, laddr=('172.29.0.22', 53138)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=205, family=2, type=1, proto=6, laddr=('172.29.0.22', 34566)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=209, family=2, type=1, proto=6, laddr=('172.29.0.22', 35424)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=210, family=2, type=1, proto=6, laddr=('172.29.0.22', 34574)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=215, family=2, type=1, proto=6, laddr=('172.29.0.22', 41578)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=224, family=2, type=1, proto=6, laddr=('172.29.0.22', 53144)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=225, family=2, type=1, proto=6, laddr=('172.29.0.22', 57554)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=230, family=2, type=1, proto=6, laddr=('172.29.0.22', 57568)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=235, family=2, type=1, proto=6, laddr=('172.29.0.22', 57582)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=251, family=2, type=1, proto=6, laddr=('172.29.0.22', 50156)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=257, family=2, type=1, proto=6, laddr=('172.29.0.22', 47624)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=264, family=2, type=1, proto=6, laddr=('172.29.0.22', 45344)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=266, family=2, type=1, proto=6, laddr=('172.29.0.22', 43834)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=272, family=2, type=1, proto=6, laddr=('172.29.0.22', 53160)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=278, family=2, type=1, proto=6, laddr=('172.29.0.22', 45348)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=281, family=2, type=1, proto=6, laddr=('172.29.0.22', 53168)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=283, family=2, type=1, proto=6, laddr=('172.29.0.22', 35816)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=288, family=2, type=1, proto=6, laddr=('172.29.0.22', 35818)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=294, family=2, type=1, proto=6, laddr=('172.29.0.22', 59364)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=297, family=2, type=1, proto=6, laddr=('172.29.0.22', 45354)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=304, family=2, type=1, proto=6, laddr=('172.29.0.22', 58718)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=334, family=2, type=1, proto=6, laddr=('172.29.0.22', 54018)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=336, family=2, type=1, proto=6, laddr=('172.29.0.22', 53188)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=345, family=2, type=1, proto=6, laddr=('172.29.0.22', 55108)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=349, family=2, type=1, proto=6, laddr=('172.29.0.22', 58724)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=354, family=2, type=1, proto=6, laddr=('172.29.0.22', 50776)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=365, family=2, type=1, proto=6, laddr=('172.29.0.22', 38808)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=366, family=2, type=1, proto=6, laddr=('172.29.0.22', 41582)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=369, family=2, type=1, proto=6, laddr=('172.29.0.22', 48734)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=38, family=2, type=1, proto=6, laddr=('172.29.0.22', 50608)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=389, family=2, type=1, proto=6, laddr=('172.29.0.22', 55788)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=390, family=2, type=1, proto=6, laddr=('172.29.0.22', 60382)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=412, family=2, type=1, proto=6, laddr=('172.29.0.22', 34832)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=417, family=2, type=1, proto=6, laddr=('172.29.0.22', 44208)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=43, family=2, type=1, proto=6, laddr=('172.29.0.22', 50624)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=43, family=2, type=1, proto=6, laddr=('172.29.0.22', 55502)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=444, family=2, type=1, proto=6, laddr=('172.29.0.22', 59310)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=446, family=2, type=1, proto=6, laddr=('172.29.0.22', 44212)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=446, family=2, type=1, proto=6, laddr=('172.29.0.22', 56426)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=458, family=2, type=1, proto=6, laddr=('172.29.0.22', 35442)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=464, family=2, type=1, proto=6, laddr=('172.29.0.22', 34842)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=470, family=2, type=1, proto=6, laddr=('172.29.0.22', 49700)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=476, family=2, type=1, proto=6, laddr=('172.29.0.22', 48754)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=478, family=2, type=1, proto=6, laddr=('172.29.0.22', 49708)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=480, family=2, type=1, proto=6, laddr=('172.29.0.22', 49862)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=494, family=2, type=1, proto=6, laddr=('172.29.0.22', 42390)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=503, family=2, type=1, proto=6, laddr=('172.29.0.22', 59854)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=52, family=2, type=1, proto=6, laddr=('172.29.0.22', 34234)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=526, family=2, type=1, proto=6, laddr=('172.29.0.22', 56442)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=529, family=2, type=1, proto=6, laddr=('172.29.0.22', 50166)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=531, family=2, type=1, proto=6, laddr=('172.29.0.22', 42402)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=532, family=2, type=1, proto=6, laddr=('172.29.0.22', 35284)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=541, family=2, type=1, proto=6, laddr=('172.29.0.22', 56452)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=543, family=2, type=1, proto=6, laddr=('172.29.0.22', 57564)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=55, family=2, type=1, proto=6, laddr=('172.29.0.22', 57444)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=554, family=2, type=1, proto=6, laddr=('172.29.0.22', 38616)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=572, family=2, type=1, proto=6, laddr=('172.29.0.22', 42404)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=575, family=2, type=1, proto=6, laddr=('172.29.0.22', 37234)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=582, family=2, type=1, proto=6, laddr=('172.29.0.22', 56058)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=590, family=2, type=1, proto=6, laddr=('172.29.0.22', 42390)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=596, family=2, type=1, proto=6, laddr=('172.29.0.22', 48788)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=607, family=2, type=1, proto=6, laddr=('172.29.0.22', 42420)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=62, family=2, type=1, proto=6, laddr=('172.29.0.22', 34240)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=645, family=2, type=1, proto=6, laddr=('172.29.0.22', 42430)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=65, family=2, type=1, proto=6, laddr=('172.29.0.22', 38328)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=662, family=2, type=1, proto=6, laddr=('172.29.0.22', 34850)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=670, family=2, type=1, proto=6, laddr=('172.29.0.22', 42446)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=677, family=2, type=1, proto=6, laddr=('172.29.0.22', 58062)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=68, family=2, type=1, proto=6, laddr=('172.29.0.22', 43044)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=689, family=2, type=1, proto=6, laddr=('172.29.0.22', 53388)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=71, family=2, type=1, proto=6, laddr=('172.29.0.22', 55520)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=73, family=2, type=1, proto=6, laddr=('172.29.0.22', 57744)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=731, family=2, type=1, proto=6, laddr=('172.29.0.22', 34064)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=745, family=2, type=1, proto=6, laddr=('172.29.0.22', 60286)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=75, family=2, type=1, proto=6, laddr=('172.29.0.22', 38348)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=798, family=2, type=1, proto=6, laddr=('172.29.0.22', 42460)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=891, family=2, type=1, proto=6, laddr=('172.29.0.22', 52472)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      1 "message": "_AsyncBaseTransport._produce() failed, aborting connection: error=ConnectionResetError(104, 'Connection reset by peer'); sock=<socket.socket fd=95, family=2, type=1, proto=6, laddr=('172.29.0.22', 38386)>; Caller's stack:\nTraceback (most recent call last):\n  File \"
      2 "message": "('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"
      2 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fcc7fb68690>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      2 "message": "HTTPSConnectionPool(host='opencti-data', port=4434): Max retries exceeded with url: /graphql (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fccae85bb10>, 'Connection to opencti-data timed out. (connect timeout=300)'))"
      3 "message": "Creating Label"
      4 "message": "AMQP connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: AMQPConnectorStackTimeout(\"
      4 "message": "AMQP connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: IncompatibleProtocolError: The protocol returned by the server is not supported: ('StreamLostError: (\"
      4 "message": "AMQPConnectionWorkflow - reporting failure: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: AMQPConnectorStackTimeout(\"
      4 "message": "AMQPConnectionWorkflow - reporting failure: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: IncompatibleProtocolError: The protocol returned by the server is not supported: ('StreamLostError: (\"
      4 "message": "AMQPConnector - reporting failure: AMQPConnectorAMQPHandshakeError: AMQPConnectorStackTimeout(\"
      4 "message": "AMQPConnector - reporting failure: AMQPConnectorAMQPHandshakeError: IncompatibleProtocolError: The protocol returned by the server is not supported: ('StreamLostError: (\"
      4 "message": "AMQPConnectorStackTimeout"
      4 "message": "Connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: AMQPConnectorStackTimeout(\"
      4 "message": "Connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: IncompatibleProtocolError: The protocol returned by the server is not supported: ('StreamLostError: (\"
      4 "message": "IncompatibleProtocolError"
      4 "message": "Probably incompatible Protocol Versions"
      4 "message": "Timeout while setting up AMQP to 'rabbitmq'/(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, 'rabbitmq', ('172.29.0.51', 5672)); ssl=False"
      5 "message": "AMQP connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: ProbableAccessDeniedError: Client was disconnected at a connection stage indicating a probable denial of access to the specified virtual host: ('StreamLostError: (\"
      5 "message": "AMQPConnectionWorkflow - reporting failure: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: ProbableAccessDeniedError: Client was disconnected at a connection stage indicating a probable denial of access to the specified virtual host: ('StreamLostError: (\"
      5 "message": "AMQPConnector - reporting failure: AMQPConnectorAMQPHandshakeError: ProbableAccessDeniedError: Client was disconnected at a connection stage indicating a probable denial of access to the specified virtual host: ('StreamLostError: (\"
      5 "message": "Connection closed while tuning the connection indicating a probable permission error when accessing a virtual host"
      5 "message": "Connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: ProbableAccessDeniedError: Client was disconnected at a connection stage indicating a probable denial of access to the specified virtual host: ('StreamLostError: (\"
      5 "message": "ProbableAccessDeniedError"
      5 "message": "Update action expectations"
      6 "message": "AMQP connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: ProbableAuthenticationError: Client was disconnected at a connection stage indicating a probable authentication error: ('StreamLostError: (\"
      6 "message": "AMQPConnectionWorkflow - reporting failure: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: ProbableAuthenticationError: Client was disconnected at a connection stage indicating a probable authentication error: ('StreamLostError: (\"
      6 "message": "AMQPConnector - reporting failure: AMQPConnectorAMQPHandshakeError: ProbableAuthenticationError: Client was disconnected at a connection stage indicating a probable authentication error: ('StreamLostError: (\"
      6 "message": "Connection closed while authenticating indicating a probable authentication error"
      6 "message": "Connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: ProbableAuthenticationError: Client was disconnected at a connection stage indicating a probable authentication error: ('StreamLostError: (\"
      6 "message": "ProbableAuthenticationError"
      7 "message": "AMQP connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorSocketConnectError: TimeoutError(\"
      7 "message": "AMQPConnectionWorkflow - reporting failure: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorSocketConnectError: TimeoutError(\"
      7 "message": "AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: TimeoutError(\"
      7 "message": "Connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorSocketConnectError: TimeoutError(\"
      8 "message": "AMQPConnectionError"
     18 "message": "ConnectTimeout"
     27 "message": "Error in _create_connection()."
     32 "message": "Creating Intrusion-Set"
     33 "message": "Listing Vocabularies with filters"
     76 "message": "ValueError"
     93 "message": "Creating Malware"
     93 "message": "Uploading a file in Stix-Domain-Object"
     96 "message": "Unexpected connection close detected: AMQPHeartbeatTimeout: ('No activity or too many missed heartbeats in the last 60 seconds',)"
     96 "message": "Uploading a file in Stix-Cyber-Observable"
    104 "message": "Cannot generate external reference"
    116 "message": "Unexpected connection close detected: StreamLostError: (\"
    131 "message": "Error executing import"
    131 "message": "connection_lost: StreamLostError: (\"
    151 "message": "Creating Malware analysis"
    159 "message": "Cannot report expectation"
    189 "message": "Creating Report"
    200 "message": "Getting connectors ..."
    238 "message": "Message reprocess for lock rejection"
    248 "message": "A connection error occurred"
    248 "message": "Message reprocess for request timed out"
    333 "message": "Message reprocess for missing reference"
    337 "message": "A connection error or timeout occurred"
    408 "message": "Creating stix_observable_relationship"
    436 "message": "Creating Attack-Pattern"
    436 "message": "Creating Note"
   1006 "message": "Listing Labels with filters"
   1010 "message": "Creating Location"
   1048 "message": "Thread for queue terminated"
   1096 "message": "Creating Vulnerability"
   1276 "message": "Thread for queue started"
   1303 "message": "Starting PingAlive thread"
   1309 "message": "Thread for queue not alive, creating a new one..."
   1370 "message": "Creating Identity"
   1386 "message": "Health check (platform version)..."
   5751 "message": "Creating External Reference"
   8494 "message": "Creating Indicator"
  14169 "message": "Importing an object"
  19498 "message": "Creating stix_core_relationship"
  33094 "message": "Creating Stix-Cyber-Observable"
  65587 "message": "Message acknowledged"
  65588 "message": "Message processed, thread terminated"
  65638 "message": "Report expectation"
  65652 "message": "Processing a new message, launching a thread..."
  85773 "message": "Error pinging the API"
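
Many of the single-count lines above differ only in the file descriptor and source port, which hides how often the same failure actually occurs. A minimal sketch of how such a summary could be collapsed, assuming the worker logs are one JSON object per line with a `message` key (the helper names `normalize` and `count_messages` are hypothetical, not part of OpenCTI):

```python
import json
import re
from collections import Counter

# Volatile fragments that make otherwise identical messages look unique.
_VOLATILE = [
    (re.compile(r"fd=\d+"), "fd=N"),
    (re.compile(r"laddr=\('([\d.]+)', \d+\)"), r"laddr=('\1', PORT)"),
    (re.compile(r"0x[0-9a-f]+"), "0xADDR"),
]


def normalize(message: str) -> str:
    """Replace file descriptors, source ports and object addresses with placeholders."""
    for pattern, replacement in _VOLATILE:
        message = pattern.sub(replacement, message)
    return message


def count_messages(lines):
    """Count normalized log messages; lines that are not JSON objects are skipped."""
    counts = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        if isinstance(message, str):
            counts[normalize(message)] += 1
    return counts
```

Calling `count_messages(open("worker.log"))` (the file name is an assumption) and printing `counts.most_common()` should then group all the `ConnectionResetError` lines into one entry.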

@richard-julien
Member

It looks like the worker is having a lot of trouble connecting to RabbitMQ.

For example:

  • Timeout while setting up AMQP to 'rabbitmq'/(<AddressFamily.AF_INET: 2>,
  • AMQPConnectorAMQPHandshakeError: IncompatibleProtocolError: The protocol returned by the server is not supported

It's really difficult to know what's going on here, as it looks like a connectivity issue with RabbitMQ.
I think we don't handle this kind of failure well, and the result is high CPU consumption in the worker.
I will take a look and try to reproduce it to limit the impact. The only thing I will be able to do is prevent the massive CPU usage; it will not resolve the problem of your worker being unable to process messages.
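
To illustrate the kind of mitigation meant here, below is a minimal sketch of reconnecting with capped exponential backoff, so that a broker that keeps rejecting connections does not cause a tight retry loop. This is not the actual worker code or the fix applied in OpenCTI; `connect_with_backoff`, `backoff_delays` and all parameters are hypothetical, and `connect` stands for whatever callable opens the AMQP connection. With pika, one would presumably pass `retry_on=(pika.exceptions.AMQPConnectionError,)`.

```python
import random
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield an endless sequence of exponentially growing delays, capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def connect_with_backoff(
    connect,
    max_attempts=8,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
    jitter=random.random,
):
    """Call `connect()` until it succeeds, sleeping longer after each failure.

    The sleep is scaled by a factor between 0.5 and 1.0 so that several
    workers restarted together do not all retry at the same instant.
    """
    delays = backoff_delays()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except retry_on as error:
            last_error = error
            if attempt == max_attempts:
                break
            sleep(next(delays) * (0.5 + 0.5 * jitter()))
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

The point of the design is only that each failed attempt costs a sleep rather than a new thread or an immediate retry; the right delays for a real deployment would need to be tuned.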

@SamuelHassine SamuelHassine removed the solved use to identify issue that has been solved (must be linked to the solving PR) label Oct 17, 2024
@dominictory
Author

Thanks Julien. A small development: I disabled TLS for the ingestion platforms, and since then the workers have stayed connected and CPU usage has been acceptable. That said, ingestion did eventually become very slow. Now node build/back.js seems to be using the most CPU, but as I say, the load average is acceptable. I noticed that connectors switch between active and inactive, and I thought this might be because server capacity was maxed out, but that doesn't seem to be the case. There are moments where bundles processed/sec goes up to 3-4, but then it quickly comes back down. The Redis stream seems to be relatively slow, and I currently have trimming set to the default. There is 5G of 64G memory available on the server, and all 8G of swap was used, so I'm looking at slightly reducing the memory footprint of Elastic, as Redis may need more memory.

Server capacity
480.13 / 500 Mo

10:20:40 up 6 days, 2:24, 1 user, load average: 10.55, 11.15, 11.05

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
4166064 root      20   0   28.8g   8.0g  17544 R 350.3  12.8 696:05.02 node build/back.js
 546560 cplcadm+  20   0   88.0g  33.2g   2.0g S 342.8  52.7  11904:15 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -Djava.security.manager+
3215186 root      20   0   27.5g   6.7g  16476 R 104.6  10.6   1687:32 node build/back.js
1251916 root      20   0   21.2g 416264  15252 S  59.9   0.6 208:12.56 node build/back.js
3184091 root      20   0 1601532 305824   3756 S  12.5   0.5 129:10.09 python3 worker.py
3185627 root      20   0 1202880 243116   3836 S  10.9   0.4 104:41.63 python3 worker.py
3186390 root      20   0 1164204 236864   3780 S   9.9   0.4  99:19.20 python3 worker.py
3201469 root      20   0 1155816 229244   3840 S   9.9   0.3 100:08.60 python3 worker.py
 507167 999       20   0 5317756 916692  15224 S   9.5   1.4 343:46.36 /opt/erlang/lib/erlang/erts-14.2.5.4/bin/beam.smp -W w -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -pc unicode -P +
3183890 root      20   0 1549960 289752   3792 S   9.5   0.4 126:30.81 python3 worker.py
3183990 root      20   0 1543452 269284   3792 S   7.2   0.4 127:50.18 python3 worker.py
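
For a quick read of a snapshot like this, the `%CPU` column can be summed per executable. A rough sketch, assuming the default `top` column order shown above (PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND); `cpu_by_command` is a hypothetical helper:

```python
import os
from collections import defaultdict


def cpu_by_command(top_rows):
    """Sum the %CPU column of `top` process rows, grouped by executable name."""
    totals = defaultdict(float)
    for row in top_rows:
        # 11 splits give 12 fields; the last one keeps the full command line.
        fields = row.split(None, 11)
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # header line or malformed row
        cpu_percent = float(fields[8])
        executable = os.path.basename(fields[11].split()[0])
        totals[executable] += cpu_percent
    return dict(totals)
```

Applied to the rows above, the three `node build/back.js` processes add up to about 514.8% CPU and the six `python3 worker.py` processes to about 59.9%, which matches the observation that the platform, not the workers, is now the main CPU consumer.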


@richard-julien
Member

Can you try reducing the number of workers to 1 and check the CPU usage of the node?
If the CPU usage goes back to normal, then for the number of queues and workers you want to run, you may need to move to a cluster deployment with multiple OpenCTI instances to support that many workers (https://docs.opencti.io/latest/deployment/clustering/).

@dominictory
Author

You're probably right about clustering. So far I have only done this with the platform (1x frontend, 2x ingestion). As I said, though, CPU usage is currently at an acceptable level. I will try with 1 or 2 fewer workers.
