Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suricata service can be stuck for hours if suricata didn't start #217

Open
kam193 opened this issue Apr 22, 2024 · 5 comments
Open

Suricata service can be stuck for hours if suricata didn't start #217

kam193 opened this issue Apr 22, 2024 · 5 comments
Assignees
Labels
assess We still haven't decided if this will be worked on or not bug Something isn't working

Comments

@kam193
Copy link

kam193 commented Apr 22, 2024

Describe the bug
Recently, I've recognized a few times that Suricata service stopped processing files. After analysing what's going on, I've found that Suricata is not running, and the service seems to have no timeout for the initialization. I'm not sure why Suricata isn't running, I haven't found any tracks in logs, neither from the service nor the Suricata inside the container. Restarting container is enough to repair it.

The behaviour of the service looks a little unhealthy as it's trying to reach Suricata in an infinite loop, throwing RecoverableError what prevents scaler from stepping in and recreating the service.

To Reproduce
Steps to reproduce the behavior:

Unfortunately, I don't know an easy way. I assume killing the Suricata process or modifying the service not to start Suricata at all could reproduce the behaviour.

Expected behavior
I would like the service to have a timeout / limit for Suricata to start, and just give up throwing a normal error. Using RecoverableError for the time of the initialization is perfectly fine, but should be stopped after some time.

Screenshots
If applicable, add screenshots to help explain your problem.

Environment (please complete the following information if pertinent):

  • Assemblyline Version: 4.5.0.18, Suricata service 4.5.0.5

Additional context
The service container has logs like below. Note the time - between the start and last sample are two hours of throwing just RecoverableError.

{"@timestamp": "2024-04-22 14:30:04,319", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Launching Suricata: /usr/local/bin/suricata -vvvv -c /etc/suricata/suricata.yaml --unix-socket=/var/run/suricata/suricata.socket --pidfile /var/run/suricata/suricata.pid --set logging.outputs.1.file.filename=/var/log/suricata/suricata.log"}
{"@timestamp": "2024-04-22 14:30:04,320", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Suricata not started yet: [Errno 2] No such file or directory"}
...
{"@timestamp": "2024-04-22 14:30:18,322", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Ruleset 0: 71348 rules loaded"}
{"@timestamp": "2024-04-22 14:30:18,322", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "WARNING", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Ruleset 0: 3458 rules failed to load.This can be due to duplication of rules among muliple rulesets being loaded."}
{"@timestamp": "2024-04-22 14:30:18,323", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Suricata started with service version: 4.5.0.5"}
{"@timestamp": "2024-04-22 14:30:18,353", "event": { "module": "assemblyline", "dataset": "assemblyline.odm" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "WARNING", "logger": "assemblyline.odm" }, "process": { "pid": "1" }, "message": "The following parameters were ignored from object 'Service': monitored_keys"}
{"@timestamp": "2024-04-22 14:30:18,378", "event": { "module": "assemblyline", "dataset": "assemblyline.dispatching.client" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.dispatching.client" }, "process": { "pid": "1" }, "message": "[3BH0jBEinXRrBLTJlPFgnj/704e5e5b3234433c01fcfd1b20a306e77e985038120492dc53965c3edd38a4ea] Suricata:f2680f09c0a4 task found"}
{"@timestamp": "2024-04-22 14:30:18,396", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "[3BH0jBEinXRrBLTJlPFgnj] New task received"}
{"@timestamp": "2024-04-22 14:30:18,396", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "[3BH0jBEinXRrBLTJlPFgnj] Downloading file: 704e5e5b3234433c01fcfd1b20a306e77e985038120492dc53965c3edd38a4ea"}
{"@timestamp": "2024-04-22 14:30:18,409", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "[3BH0jBEinXRrBLTJlPFgnj] Starting task for file: 704e5e5b3234433c01fcfd1b20a306e77e985038120492dc53965c3edd38a4ea (network/tcpdump)"}
{"@timestamp": "2024-04-22 14:31:19,037", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Recoverable Service Error (3BH0jBEinXRrBLTJlPFgnj/704e5e5b3234433c01fcfd1b20a306e77e985038120492dc53965c3edd38a4ea) [Errno 104] Connection reset by peer:   File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_v4_service/common/base.py\", line 181, in handle_task\n    self.execute(request)\n  File \"/opt/al_service/suricata_/suricata_.py\", line 512, in execute\n    raise RecoverableError(e)\nRecoverableError: [Errno 104] Connection reset by peer"}
{"@timestamp": "2024-04-22 14:31:19,038", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "[3BH0jBEinXRrBLTJlPFgnj] Saving error to: /tmp/3BH0jBEinXRrBLTJlPFgnj_704e5e5b3234433c01fcfd1b20a306e77e985038120492dc53965c3edd38a4ea_error.json"}
{"@timestamp": "2024-04-22 14:31:19,042", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "There are no new signatures. (1713789546.8030632 >= 1713789546.8030632)"}
{"@timestamp": "2024-04-22 14:31:19,042", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "[3BH0jBEinXRrBLTJlPFgnj] Task completed with errors"}
{"@timestamp": "2024-04-22 14:31:19,042", "event": { "module": "assemblyline", "dataset": "assemblyline.tasking_client" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.tasking_client" }, "process": { "pid": "1" }, "message": "[3BH0jBEinXRrBLTJlPFgnj] f2680f09c0a4 - Suricata failed to complete task "}
...
{"@timestamp": "2024-04-22 14:33:13,160", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Suricata not started yet: [Errno 111] Connection refused"}
{"@timestamp": "2024-04-22 14:33:23,160", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Suricata not started yet: [Errno 111] Connection refused"}
{"@timestamp": "2024-04-22 14:33:23,161", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Recoverable Service Error (3BH0jBEinXRrBLTJlPFgnj/1ffd5e0894eef1ef357e5554ada335827398568b8287331221dff8abe4c42622) RetryError[Attempts: 15, Value: False]:   File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_v4_service/common/base.py\", line 181, in handle_task\n    self.execute(request)\n  File \"/opt/al_service/suricata_/suricata_.py\", line 476, in execute\n    self.start_suricata_if_necessary()\n  File \"/opt/al_service/suricata_/suricata_.py\", line 145, in start_suricata_if_necessary\n    raise RecoverableError(e)\nRecoverableError: RetryError[Attempts: 15, Value: False]"}
...
{"@timestamp": "2024-04-22 16:25:03,645", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Suricata not started yet: [Errno 111] Connection refused"}
{"@timestamp": "2024-04-22 16:25:03,645", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Recoverable Service Error (77rC62QgvSPJeHvVdIp6dY/3382b9a14870049b854563106c33caa3da560b215a60e175e0aedfeff27a562b) RetryError[Attempts: 15, Value: False]:   File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_v4_service/common/base.py\", line 181, in handle_task\n    self.execute(request)\n  File \"/opt/al_service/suricata_/suricata_.py\", line 476, in execute\n    self.start_suricata_if_necessary()\n  File \"/opt/al_service/suricata_/suricata_.py\", line 145, in start_suricata_if_necessary\n    raise RecoverableError(e)\nRecoverableError: RetryError[Attempts: 15, Value: False]"}
{"@timestamp": "2024-04-22 16:25:03,646", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "f2680f09c0a4" }, "log": { "level": "INFO", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "[77rC62QgvSPJeHvVdIp6dY] Saving error to: /tmp/77rC62QgvSPJeHvVdIp6dY_3382b9a14870049b854563106c33caa3da560b215a60e175e0aedfeff27a562b_error.json"}

The /var/log/suricata/suricata.log is finished at 14:30 with a fail of loading some signatures, what isn't a reason to stop Suricata, and looks the same as when everything works.

@kam193 kam193 added assess We still haven't decided if this will be worked on or not bug Something isn't working labels Apr 22, 2024
@cccs-rs
Copy link
Contributor

cccs-rs commented May 28, 2024

Can you confirm if this is still happening on more recent versions of the Suricata service?

@kam193
Copy link
Author

kam193 commented May 28, 2024

I've just disabled my workaround (script looking for stuck Suricata services every 30 minutes, it was impossible to work without it) - I'll come back in a few days with results.

@kam193
Copy link
Author

kam193 commented May 30, 2024

I'm confirming this is still an issue - with Suricata 4.5.0.10 and AL v26

@cccs-rs
Copy link
Contributor

cccs-rs commented Jun 6, 2024

Want to try testing with the latest stable?

The service was updated to use Suricata 8.0-dev and a flag was added to the command to run as daemon which I'm hoping should resolve the issue mentioned (unsure if that flag was available in previous versions).

@kam193
Copy link
Author

kam193 commented Jun 6, 2024

So... I'm not really able to test it, as on my server the newest Suricata (service v16 or even v15) doesn't even start:

{"@timestamp": "2024-06-06 20:59:55,903", "event": { "module": "assemblyline", "dataset": "assemblyline.service.suricata" }, "host": { "ip": "x.x.x.x", "hostname": "6879c90012be" }, "log": { "level": "ERROR", "logger": "assemblyline.service.suricata" }, "process": { "pid": "1" }, "message": "Exiting:\nTraceback (most recent call last):\n  File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_core/server_base.py\", line 100, in run\n    self.try_run()\n  File \"/var/lib/assemblyline/.local/lib/python3.11/site-packages/assemblyline_v4_service/run_privileged_service.py\", line 148, in try_run\n    self.service_tool_version = self.service.get_tool_version()\n                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/al_service/suricata_/suricata_.py\", line 114, in get_tool_version\n    return f\"{self.get_suricata_version()}.r{self.rules_hash}\"\n              ^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/opt/al_service/suricata_/suricata_.py\", line 107, in get_suricata_version\n    return safe_str(subprocess.check_output([\"suricata\", \"-V\"]).strip().replace(b\"This is Suricata version \", b\"\"))\n                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/subprocess.py\", line 466, in check_output\n    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,\n           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.11/subprocess.py\", line 571, in run\n    raise CalledProcessError(retcode, process.args,\nsubprocess.CalledProcessError: Command '['suricata', '-V']' died with <Signals.SIGILL: 4>.\n"}

What is caused by:

kernel: traps: suricata[2946516] trap invalid opcode ip:56509c8df61c sp:7ffd368f99e0 error:0 in suricata[56509c8c5000+5e1000]

Indicating incompatible binary instruction... Strange.

To be honest, even when I believe that looking for the root cause is important, my suggestion is to rather prepare for cases like surprisingly dead suricata process. When it's not the main process in the docker container, it could be killed for any reason (even manually - good for testing), and the container will live further. I'd add a check in the error handling to see if the process is still running or exited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assess We still haven't decided if this will be worked on or not bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants