From time to time, vdsm deadlocks in OST and becomes completely unresponsive. The problem is observed on el8stream and always affects host-0. Although OST reports it as a 'test_use_ovn_provider' failure, a quick look at 'vdsm.log' shows that the problem starts earlier: 'vdsm.log' entries stop at some point in time, while 'messages' and other log files show that the host stayed up for about 8 more minutes.
After attaching to the 'vdsm' process with gdb, we can see that all threads are waiting on locks.
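Because vdsm is written in Python, the same lock waits can also be inspected at the Python level. The following is a minimal sketch, assuming you can run code inside the affected process (for example from a debugging hook). `dump_thread_stacks` is a hypothetical helper, not part of vdsm, built on CPython's `sys._current_frames()`. A thread blocked on a lock shows up with its innermost frame at the line that tries to acquire it.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return {thread name: formatted Python stack} for every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    stacks = {}
    for ident, frame in sys._current_frames().items():
        name = names.get(ident, f"thread-{ident}")
        stacks[name] = "".join(traceback.format_stack(frame))
    return stacks


if __name__ == "__main__":
    lock = threading.Lock()
    lock.acquire()  # the main thread holds the lock...

    def worker():
        with lock:  # ...so this thread blocks here
            pass

    t = threading.Thread(target=worker, name="blocked-worker", daemon=True)
    t.start()
    t.join(timeout=0.5)  # give the worker time to reach the lock
    # The "blocked-worker" stack should end at the `with lock:` line in worker().
    print(dump_thread_stacks()["blocked-worker"])
    lock.release()
    t.join()
```

Unlike gdb, this only works while the interpreter can still run Python code in some thread, so it complements the gdb approach rather than replacing it.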
Even after analyzing a couple of such failures, it's hard to pinpoint a single cause, but the logs always end in the storage parts of vdsm.
Version-Release number of selected component (if applicable):
Latest vdsm version.
How reproducible:
Rarely, ~1 in 10 runs.
Steps to Reproduce:
1. Run basic-suite-master on el8stream.
2. Check whether it failed on 'test_use_ovn_provider'.
3. Check whether the 'vdsm.log' entries end a couple of minutes earlier than those in other logs, e.g. 'messages'.
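The last check above (comparing where 'vdsm.log' ends against 'messages') can be scripted. This is a minimal sketch under stated assumptions: 'vdsm.log' lines are assumed to start with a timestamp such as `2022-07-26 10:32:10,123+0000`, 'messages' lines are assumed to use the syslog `Jul 26 10:40:10` prefix (English month names, no year), and both logs are assumed to be in UTC. All function names are hypothetical and the sample lines are illustrative, not taken from a real run; adjust the formats to what your logs actually contain.

```python
from datetime import datetime, timedelta, timezone


def parse_vdsm_ts(line):
    # Assumed vdsm.log prefix, e.g. "2022-07-26 10:32:10,123+0000 INFO ..."
    return datetime.strptime(line[:28], "%Y-%m-%d %H:%M:%S,%f%z")


def parse_messages_ts(line, year):
    # Assumed syslog prefix, e.g. "Jul 26 10:40:10 host-0 ...": it carries no
    # year, so the caller supplies one; the time is assumed to be UTC.
    ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    return ts.replace(tzinfo=timezone.utc)


def last_timestamp(lines, parse):
    """Timestamp of the last line that parses, scanning from the end."""
    for line in reversed(lines):
        try:
            return parse(line)
        except ValueError:
            continue  # e.g. traceback lines without a timestamp prefix
    return None


def log_gap(vdsm_lines, messages_lines, year):
    """How long 'messages' kept logging after 'vdsm.log' went silent."""
    vdsm_last = last_timestamp(vdsm_lines, parse_vdsm_ts)
    messages_last = last_timestamp(
        messages_lines, lambda line: parse_messages_ts(line, year))
    if vdsm_last is None or messages_last is None:
        return None
    return messages_last - vdsm_last


if __name__ == "__main__":
    # Illustrative lines only.
    vdsm = ["2022-07-26 10:32:10,000+0000 INFO  (jsonrpc/1) [vdsm.api] ...",
            "Traceback (most recent call last):"]
    messages = ["Jul 26 10:40:10 host-0 systemd[1]: ..."]
    gap = log_gap(vdsm, messages, 2022)
    print(gap)  # 0:08:00
    print(gap > timedelta(minutes=2))  # True
```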
Actual results:
vdsm ends up in a deadlock.
Expected results:
vdsm continues to operate normally
The deadlock timing always coincides with the storage domain (SD) detach/reattach tests:
https://github.com/oVirt/ovirt-system-tests/blob/master/basic-suite-master/test-scenarios/test_007_sd_reattach.py
Original bz: https://bugzilla.redhat.com/show_bug.cgi?id=2111187