occasionally freezes the whole system #5

allergicapple · 2024-05-05T19:18:02Z

This service occasionally freezes the whole system for several seconds at a time, with only a few seconds in between before everything freezes again.

The gap between freezes is long enough to switch to a TTY console and reboot the system, in the hope that it won't freeze again after the next boot.

Of course this is not acceptable, so I stopped and disabled the uksmd service for good.

There are entries in the systemd journal which repeat:

$ sudo journalctl

[...]
May 05 19:25:58 cachyos systemd[1]: uksmd.service: Watchdog timeout (limit 30s)!
May 05 19:25:58 cachyos systemd[1]: uksmd.service: Killing process 829 (uksmd) with signal SIGABRT.
May 05 19:25:58 cachyos systemd[1]: uksmd.service: Main process exited, code=killed, status=6/ABRT
May 05 19:25:58 cachyos systemd[1]: uksmd.service: Failed with result 'watchdog'.
May 05 19:25:58 cachyos systemd[1]: uksmd.service: Scheduled restart job, restart counter is at 1.
May 05 19:25:58 cachyos systemd[1]: Starting Userspace KSM helper daemon...
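
To see how often the watchdog fires, the journal can be narrowed to this unit and the timeout lines counted. This is only a sketch: `count_watchdog_timeouts` is a hypothetical helper name, and it assumes the message text matches the excerpt above.

```shell
# Hypothetical helper: count uksmd watchdog timeouts in journal text read from stdin.
# Assumes the message wording shown in the excerpt above.
count_watchdog_timeouts() {
    # grep -c prints 0 and exits non-zero when nothing matches; keep the function successful.
    grep -c 'uksmd.service: Watchdog timeout' || true
}

# Usage (current boot, this unit only):
#   journalctl -b -u uksmd.service --no-pager | count_watchdog_timeouts
```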

ptr1337 · 2024-05-05T19:39:23Z

This is really weird and puzzles me. Could you share more information on how this happened, and on which hardware?

@pfactum Do you have an idea how to debug this?

allergicapple · 2024-05-05T20:07:23Z

Yes, it's a bit strange. I'm not sure, but I think this has been happening for a few weeks, maybe once or twice a week, very unpredictably.

pfactum · 2024-05-05T20:53:01Z

Watchdog timeout means uksmd was not able to inform systemd that it's alive (https://codeberg.org/pf-kernel/uksmd/src/commit/ec2bfd88585d7b900baaaede1f57566c95e8c506/uksmd.c#L420), and systemd kills it forcibly.

Normally uksmd should send pings every 15 seconds (https://codeberg.org/pf-kernel/uksmd/src/commit/ec2bfd88585d7b900baaaede1f57566c95e8c506/uksmd.c#L501). If it doesn't, it either doesn't get scheduled, or it is stuck somewhere, maybe while traversing /proc.
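
The relationship between the 30 s limit and the 15 s ping period can be sketched as below. This assumes the common convention of pinging at half the watchdog limit; the thread only states the two numbers, not how uksmd derives them. `ping_interval_seconds` is a hypothetical helper name.

```shell
# Sketch: derive a keep-alive interval from the watchdog limit.
# Assumption: the daemon pings at half the limit (a common convention).
# ping_interval_seconds is a hypothetical helper, not part of uksmd.
ping_interval_seconds() {
    # $1: watchdog limit in microseconds. Falls back to WATCHDOG_USEC,
    # which systemd exports to services that have WatchdogSec= set.
    usec=${1:-${WATCHDOG_USEC:-0}}
    echo $(( usec / 2 / 1000000 ))
}

# Usage:
#   ping_interval_seconds 30000000                  # 30 s limit from the journal
#   systemctl show -p WatchdogUSec uksmd.service    # limit configured for the unit
```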

Should this re-occur, build uksmd with debug symbols and get a coredump before systemd kills it again. Or at least collect /proc/<uksmd_PID>/stack. Or check strace.
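
Since a single read of the stack can miss the moment uksmd is stuck, sampling it a few times may help. This is a minimal sketch: `sample_stack` is a hypothetical helper, and the optional fourth argument exists only so the function can be pointed at a directory other than /proc.

```shell
# Hypothetical helper: print the kernel stack of a task several times.
# Arguments: PID [COUNT] [INTERVAL_SECONDS] [PROC_ROOT]
sample_stack() {
    pid=$1
    count=${2:-5}
    interval=${3:-1}
    proc_root=${4:-/proc}
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "--- sample $i ---"
        # Reading /proc/<PID>/stack normally requires root.
        cat "$proc_root/$pid/stack" 2>/dev/null || echo "(could not read stack for PID $pid)"
        i=$((i + 1))
        sleep "$interval"
    done
}

# Usage (as root, while the service is running):
#   sample_stack "$(systemctl show -p MainPID --value uksmd.service)"
#   strace -f -tt -p "$(pidof uksmd)"
```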

The service itself should not cause system freezes. More likely, something is going on on the kernel side. To investigate that, at least check for blocked tasks (echo w | sudo tee /proc/sysrq-trigger, and then dmesg/journalctl -kb), and maybe run perf top.
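
The blocked-task check can be scripted roughly as follows. This is a sketch: `blocked_tasks` is a hypothetical helper, and the `task:<name> state:D` line format is an assumption about how recent kernels print task dumps, so the pattern may need adjusting.

```shell
# Hypothetical helper: keep only task header lines in uninterruptible sleep (state D)
# from kernel log text read from stdin.
# Assumption: the kernel prints task headers as "task:<name> state:<letter> ...".
blocked_tasks() {
    grep -E 'task:.*state:D' || true
}

# Usage (during or right after a freeze):
#   echo w | sudo tee /proc/sysrq-trigger >/dev/null
#   sudo journalctl -kb --no-pager | blocked_tasks
```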

allergicapple · 2024-05-06T06:11:00Z

Thanks for joining in,

like I said, the whole system locks up. When the watchdog barks, the service is killed and the system becomes responsive again until the service is restarted; then the cycle repeats.
That's how I interpret the situation.
While the system is locked up, no interaction is possible; not even Num Lock reacts.
Can something be analyzed after the fact?

pfactum · 2024-05-06T06:24:07Z

I don't think so, but you can also try to collect a vmcore via kdump.

allergicapple · 2024-05-06T09:47:36Z

I'd advocate for closing this issue. I have no problem with not using uksmd, and it seems to affect only my setup.
If it appears for someone else, we have this ticket for reference.

ventureoo mentioned this issue Jun 13, 2024

Replace uksmd with built-in KSM support in systemd CachyOS/CachyOS-Settings#64

Merged
