Initializing OpenVINO on a specific setup fails with SIGSEGV in
operator() at src/inference/src/os/lin/lin_system_conf.cpp:408 [0x7f27ff4c2596 /lib/libopenvino.so.2400+0xac2596]
parse_cache_info_linux at src/inference/src/os/lin/lin_system_conf.cpp:503 [0x7f27ff4bb02b /lib/libopenvino.so.2400+0xabb02b]
CPU at src/inference/src/os/lin/lin_system_conf.cpp:200 [0x7f27ff4bb02b /lib/libopenvino.so.2400+0xabb02b]
cpu_info at src/inference/src/system_conf.cpp:180 [0x7f27ff4c4c96 /lib/libopenvino.so.2400+0xac4c96]
...
The stack trace is from version 2024.0; I don't see any signs that anything has changed in the latest version.
This happens because OpenVINO treats the existence of the /sys/devices/system/cpu/cpu<N>/cache/index0/shared_cpu_list file as a sign that N+1 cores exist: if the file does not exist, OpenVINO assumes that there are N cores on the machine. This may not be true if core N is temporarily disabled via the cpu<N>/online toggle, because CPU N+1 may still be available.
After that, a neighbor list is read for each core, and all of its neighbors are updated. If core N-1 has neighbor N+1, a SEGFAULT occurs when the code tries to get the info structure for core N+1, because there are only N structures in the array.
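The mismatch described above can be illustrated with a small simulation. This is only a sketch of the failure mode, not OpenVINO's actual code: the function names and the `snapshot` dictionary are hypothetical, and the snapshot assumes that an offline core has no cache entry while the remaining cores still list each other as neighbors.

```python
from __future__ import annotations


def parse_cpu_list(text: str) -> list[int]:
    """Parse a sysfs CPU list such as "0-3" or "0,2-3" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def count_cores_by_probe(shared_lists: dict[int, str]) -> int:
    """Count cores as the issue describes: stop at the first missing entry."""
    n = 0
    while n in shared_lists:
        n += 1
    return n


def out_of_range_neighbors(shared_lists: dict[int, str]) -> dict[int, list[int]]:
    """For each probed core, list neighbor ids that exceed the probed count."""
    n = count_cores_by_probe(shared_lists)
    bad = {}
    for cpu in range(n):
        beyond = [c for c in parse_cpu_list(shared_lists[cpu]) if c >= n]
        if beyond:
            bad[cpu] = beyond
    return bad


# Hypothetical snapshot: four CPUs share a cache, cpu1 is offline,
# so there is no shared_cpu_list entry for it.
snapshot = {0: "0,2-3", 2: "0,2-3", 3: "0,2-3"}
print(count_cores_by_probe(snapshot))    # 1
print(out_of_range_neighbors(snapshot))  # {0: [2, 3]}
```

In this snapshot the probe stops at the missing cpu1 and reports a single core, yet core 0 names cores 2 and 3 as neighbors; indexing an array sized by the probed count with those ids is the out-of-bounds access described above.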
Step-by-step reproduction
cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list: you'll see 0-3, for example
Choose a core inside this range, not the min/max one: 1, for example
Turn off this core: echo 0 | sudo tee /sys/devices/system/cpu/cpu1/online
Call any function that initializes device info, for example Core::get_available_devices
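Before calling into OpenVINO, it may help to confirm that the machine really is in the triggering state, that is, the online CPU list has a hole below the highest online core. The following is a minimal sketch under that assumption; `offline_holes` is a hypothetical helper, and it reads the standard Linux `/sys/devices/system/cpu/online` file only if that file exists.

```python
from __future__ import annotations

import os


def parse_cpu_list(text: str) -> list[int]:
    """Parse a sysfs CPU list such as "0-3" or "0,2-3" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def offline_holes(online: str) -> list[int]:
    """Return CPU ids below the highest online id that are not online."""
    cpus = parse_cpu_list(online)
    if not cpus:
        return []
    return sorted(set(range(cpus[-1] + 1)) - set(cpus))


# After turning off cpu1 on a machine with CPUs 0-3, the online list is "0,2-3".
print(offline_holes("0,2-3"))  # [1]

# On a Linux machine, the same check can be run against the live system.
ONLINE_PATH = "/sys/devices/system/cpu/online"
if os.path.exists(ONLINE_PATH):
    with open(ONLINE_PATH) as f:
        print("holes on this machine:", offline_holes(f.read()))
```

A non-empty result means that at least one core inside the range is offline, which is the condition the reproduction steps set up.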
Relevant log output
No response
Issue submission checklist
I'm reporting an issue. It's not a question.
I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
There is reproducer code and related data files such as images, videos, models, etc.
FYI: I caught the same problem in a slightly different scenario. On machines with 256+ cores, sometimes only 255 of them work because of x2APIC issues, like this one: https://community.amd.com/t5/server-processors/dual-socket-epyc-7702-64-cores-shows-254-cpu-online-1-cpu/m-p/350409. Usually you get cores 0-254, with smpboot: native_cpu_up: bad cpu 255 in dmesg. Now, on one of our servers it is somehow core 239, not 255, which results in the online CPU list 0-238,240-255: a hole in the CPU list that triggers this bug.
OpenVINO Version
2024.0
Operating System
Ubuntu 20.04 (LTS)
Device used for inference
CPU
Framework
None
Model used
No response