[Bug]: SIGSEGV in parse_cache_info_linux if some cores are disabled #26889

Open
3 tasks done
vient opened this issue Oct 2, 2024 · 3 comments
Labels: bug, category: CPU, support_request


vient commented Oct 2, 2024

OpenVINO Version

2024.0

Operating System

Ubuntu 20.04 (LTS)

Device used for inference

CPU

Framework

None

Model used

No response

Issue description

Initializing OpenVINO on a specific setup fails with SIGSEGV in

operator() at src/inference/src/os/lin/lin_system_conf.cpp:408                                      [0x7f27ff4c2596      /lib/libopenvino.so.2400+0xac2596]
parse_cache_info_linux at src/inference/src/os/lin/lin_system_conf.cpp:503                          [0x7f27ff4bb02b      /lib/libopenvino.so.2400+0xabb02b]
CPU at src/inference/src/os/lin/lin_system_conf.cpp:200                                             [0x7f27ff4bb02b      /lib/libopenvino.so.2400+0xabb02b]
cpu_info at src/inference/src/system_conf.cpp:180                                                   [0x7f27ff4c4c96      /lib/libopenvino.so.2400+0xac4c96]
...

The stack trace is from version 2024.0; I don't see any sign that this code has changed in the latest version.

This happens because OpenVINO treats the presence of the /sys/devices/system/cpu/cpu<N>/cache/index0/shared_cpu_list file as a sign that at least N+1 cores exist: if the file does not exist, OpenVINO assumes there are N cores on the machine. This may not be true if a core is temporarily disabled via the cpu<N>/online toggle, since CPU N+1 may still be available.
After that, a neighbor list is read for each core, and all of its neighbors are updated. If core N-1 has neighbor N+1, a segfault occurs when the code tries to get the info structure for core N+1, because there are only N structures in the array.
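
To illustrate the failure mode described above, here is a minimal Python sketch. It is not the OpenVINO code (which is C++), only a model of the logic as I understand it. It assumes that the cache directory of an offline core is absent from sysfs, and it represents sysfs as a plain dict from CPU id to the contents of shared_cpu_list. The names parse_cpu_list, count_until_first_gap, out_of_range_neighbors and fake_sysfs are hypothetical and exist only for this example.

```python
from typing import Dict, List


def parse_cpu_list(text: str) -> List[int]:
    """Expand a sysfs CPU list such as "0,2-3" into [0, 2, 3]."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def count_until_first_gap(shared_lists: Dict[int, str]) -> int:
    """Count CPUs by probing cpu0, cpu1, ... and stopping at the first
    one without a cache file (the assumption described in the issue)."""
    n = 0
    while n in shared_lists:
        n += 1
    return n


def out_of_range_neighbors(shared_lists: Dict[int, str]) -> List[int]:
    """Return neighbor ids that would index past the end of an array
    sized by count_until_first_gap()."""
    n = count_until_first_gap(shared_lists)
    bad = set()
    for cpu in range(n):
        for neighbor in parse_cpu_list(shared_lists[cpu]):
            if neighbor >= n:
                bad.add(neighbor)
    return sorted(bad)


# Hypothetical sysfs contents after cpu1 is taken offline (assumption:
# cpu1 has no cache/index0/shared_cpu_list file while it is offline).
fake_sysfs = {0: "0,2-3", 2: "0,2-3", 3: "0,2-3"}

print(count_until_first_gap(fake_sysfs))   # 1
print(out_of_range_neighbors(fake_sysfs))  # [2, 3]
```

In this model the array would hold a single structure (for core 0), while core 0 lists cores 2 and 3 as neighbors, so any unchecked index by neighbor id reads past the end of the array.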

Step-by-step reproduction

  1. Run cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list. It prints a range such as 0-3.
  2. Choose a core strictly inside this range, not the minimum or maximum one (1, for example).
  3. Turn off this core: echo 0 | sudo tee /sys/devices/system/cpu/cpu1/online
  4. Call any function that initializes device info, for example Core::get_available_devices.

Relevant log output

No response

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
vient added the bug and support_request labels on Oct 2, 2024
andrei-kochin added the category: CPU label on Oct 3, 2024
Contributor

wangleis commented Oct 7, 2024

Hi @vient, thanks for your report. Support for offline (disabled) cores is not implemented yet. Ticket CVS-154222 has been created to follow up.

Author

vient commented Oct 15, 2024

FYI: I caught the same problem in a slightly different scenario. On machines with 256+ cores, sometimes only 255 of them work because of x2APIC issues, like this one: https://community.amd.com/t5/server-processors/dual-socket-epyc-7702-64-cores-shows-254-cpu-online-1-cpu/m-p/350409 . Usually you get cores 0-254, with "smpboot: native_cpu_up: bad cpu 255" in dmesg. On one of our servers, however, the failing core is somehow 239 rather than 255, which results in the online CPU list 0-238,240-255. That hole in the CPU list triggers this bug.
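
A quick way to check whether a machine is in this state is to look for holes in the online CPU list. The sketch below is only an illustration: find_holes and parse_cpu_list are hypothetical helper names, and the input strings are the examples from this comment rather than output captured from a real machine.

```python
from typing import List


def parse_cpu_list(text: str) -> List[int]:
    """Expand a CPU list such as "0-238,240-255" into individual ids."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        # A single id like "5" has no "-", so reuse it as the range end.
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def find_holes(online: str) -> List[int]:
    """Return CPU ids below the highest online id that are not online."""
    cpus = parse_cpu_list(online)
    if not cpus:
        return []
    present = set(cpus)
    return [cpu for cpu in range(max(cpus) + 1) if cpu not in present]


print(find_holes("0-238,240-255"))  # [239]
print(find_holes("0-254"))          # []
```

On a real system the input string can be read from /sys/devices/system/cpu/online. A non-empty result means the CPU numbering has a gap of the kind described here.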

github-merge-queue bot pushed a commit that referenced this issue Dec 18, 2024
### Details:
 - *support offline CPU in Linux*
 - *Ignore SOC Ecore of MTL*
 - *enable Ecore of LNL*

### Tickets:
 - *CVS-154222*
 - *[issues-26889](#26889)*
Contributor

@vient Could you please try master branch?
