Crash at clGetDeviceIDs in OpenVINO 2022.3 #17854

Closed
venki-thiyag opened this issue Jun 2, 2023 · 30 comments

@venki-thiyag
Contributor

System information (version)
  • OpenVINO Source=> Custom built 2022.3 for Electron node addon
  • OpenVINO Version=> 2022.3
  • Operating System / Platform => Windows 10
  • Compiler => Visual Studio
  • Problem classification => Crash
  • Device use: => GPU and OpenCL
  • Framework => Electron
Detailed description

Crash callstack:

 	OpenCL.dll!00007ffefe564658()	Unknown
>	[Inline Frame] openvino_intel_gpu_plugin.dll!cl::Platform::getDevices(unsigned __int64) Line 2580	C++
 	openvino_intel_gpu_plugin.dll!cldnn::ocl::ocl_device_detector::create_device_list() Line 205	C++
 	openvino_intel_gpu_plugin.dll!cldnn::ocl::ocl_device_detector::get_available_devices(void * user_context, void * user_device, int ctx_device_id, int target_tile_id) Line 160	C++
 	openvino_intel_gpu_plugin.dll!cldnn::device_query::device_query(cldnn::engine_types engine_type, cldnn::runtime_types runtime_type, void * user_context, void * user_device, int ctx_device_id, int target_tile_id) Line 25	C++
 	openvino_intel_gpu_plugin.dll!ov::intel_gpu::Plugin::Plugin() Line 132	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::_Construct_in_place(ov::intel_gpu::Plugin &) Line 158	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::_Ref_count_obj2<ov::intel_gpu::Plugin>::{ctor}() Line 2029	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::make_shared() Line 2748	C++
 	openvino_intel_gpu_plugin.dll!CreatePluginEngine(std::shared_ptr<InferenceEngine::IInferencePlugin> & plugin) Line 1011	C++
 	openvino.dll!ov::CoreImpl::GetCPPPluginByName(const std::string & pluginName) Line 1168	C++
 	openvino.dll!ov::CoreImpl::GetMetric(const std::string & deviceName, const std::string & name, const std::map<std::string,ov::Any,std::less<std::string>,std::allocator<std::pair<std::string const ,ov::Any>>> & options) Line 980	C++
 	openvino.dll!ov::CoreImpl::GetAvailableDevices() Line 1073	C++
 	openvino.dll!ov::Core::get_available_devices() Line 2120	C++
 	RCVNativeVBG.dll!VBG::OpenVinoProcessing::loadModel(const std::wstring & modelPath) Line 39	C++
 	[Inline Frame] RCVNativeVBG.dll!VBG::IVBGMLProcessing::loadBindModel(const std::wstring &) Line 164	C++
 	RCVNativeVBG.dll!initInternalOpenVino(VBG::ConfigParser * ptrConfigParser) Line 291	C++
 	[Inline Frame] RCVNativeVBG.dll!std::invoke(void(*)(VBG::ConfigParser *) &&) Line 1534	C++
 	RCVNativeVBG.dll!std::thread::_Invoke<std::tuple<void (__cdecl*)(VBG::ConfigParser *),VBG::ConfigParser *>,0,1>(void * _RawVals) Line 56	C++
 	ucrtbase.dll!00007fff76aa1bb2()	Unknown
 	kernel32.dll!00007fff78ac7034()	Unknown
 	ntdll.dll!00007fff78c02651()	Unknown

Crashing function inside OpenVINO, filename: opencl.hpp, line no: 2580

        cl_int err = ::clGetDeviceIDs(object_, type, 0, NULL, &n);  // <-- crash occurs inside this call
        if (err != CL_SUCCESS  && err != CL_DEVICE_NOT_FOUND) {
            return detail::errHandler(err, __GET_DEVICE_IDS_ERR);
        }
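
For readers unfamiliar with this code, the snippet is the first half of the usual OpenCL "count, then fill" query, where `CL_DEVICE_NOT_FOUND` is treated as an empty result rather than an error. The following is a minimal sketch of that pattern, not the `opencl.hpp` source. `QueryFn`, `list_devices` and `make_fake_driver` are hypothetical names, and the fake driver stands in for `clGetDeviceIDs` so the sketch does not need an OpenCL installation. Only the two status values are taken from the OpenCL headers.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Status values as defined in the OpenCL headers.
constexpr int kSuccess = 0;          // CL_SUCCESS
constexpr int kDeviceNotFound = -1;  // CL_DEVICE_NOT_FOUND

using DeviceId = std::uintptr_t;  // stand-in for cl_device_id

// Hypothetical stand-in for clGetDeviceIDs (platform and device type omitted).
using QueryFn = std::function<int(unsigned num_entries, DeviceId* devices,
                                  unsigned* num_devices)>;

// Count-then-fill query: "no devices" yields an empty list, other errors throw.
std::vector<DeviceId> list_devices(const QueryFn& query) {
    assert(query && "query must be callable");
    unsigned n = 0;
    int err = query(0, nullptr, &n);  // first call: ask only for the count
    if (err != kSuccess && err != kDeviceNotFound)
        throw std::runtime_error("count query failed: " + std::to_string(err));
    if (n == 0) return {};
    std::vector<DeviceId> ids(n);
    err = query(n, ids.data(), nullptr);  // second call: fill the buffer
    if (err != kSuccess)
        throw std::runtime_error("fill query failed: " + std::to_string(err));
    return ids;
}

// Hypothetical fake driver that reports a fixed set of devices.
QueryFn make_fake_driver(std::vector<DeviceId> present) {
    return [present](unsigned num_entries, DeviceId* devices,
                     unsigned* num_devices) -> int {
        if (present.empty()) {
            if (num_devices) *num_devices = 0;
            return kDeviceNotFound;
        }
        if (num_devices) *num_devices = static_cast<unsigned>(present.size());
        for (unsigned i = 0; devices && i < num_entries && i < present.size(); ++i)
            devices[i] = present[i];
        return kSuccess;
    };
}
```

Note that this pattern only covers failures reported through a status code. The crash in this issue happens inside the call itself, so no status code is ever returned to check.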

Attaching the dxdiag output from one of the systems where the issue was observed. The issue appears to occur on all systems with OpenCL 2.2.8.
DxDiag.txt

OpenVINO 2022.3 was custom compiled so that it can be consumed from an Electron application; some path issues had to be fixed for that.

The most likely cause seems to be an incompatible OpenCL version, whereas on my system the OpenCL version is 3.0.3.

@venki-thiyag venki-thiyag added bug Something isn't working support_request labels Jun 2, 2023
@venki-thiyag venki-thiyag changed the title Crash at clGetDeviceIDs in OpenVINO Crash at clGetDeviceIDs in OpenVINO 2022.3 Jun 2, 2023
@ilya-lavrenov ilya-lavrenov added the category: GPU OpenVINO GPU plugin label Jun 2, 2023
@p-durandin
Contributor

@venki-thiyag please send detailed information about the driver installed on the system and the output of hello_query_device.exe

@venki-thiyag
Contributor Author

DxDiag (1).txt
Attached the DxDiag details from the system. We need to reach out to the customer for the hello_query_device.exe output, which will take some time.

@venki-thiyag
Contributor Author

hello_query_device.exe.2232.zip
hello_query_device crashes when run on the customer's machine; the crash dump is attached. Please note that this is a custom build. Below is the call stack with symbols:

 	OpenCL.dll!clGetDeviceIDs(_cl_platform_id * platform, unsigned __int64 device_type, unsigned int num_entries, _cl_device_id * * devices, unsigned int * num_devices) Line 99	C
 	[Inline Frame] openvino_intel_gpu_plugin.dll!cl::Platform::getDevices(unsigned __int64) Line 2580	C++
>	openvino_intel_gpu_plugin.dll!cldnn::ocl::ocl_device_detector::create_device_list() Line 205	C++
 	openvino_intel_gpu_plugin.dll!cldnn::ocl::ocl_device_detector::get_available_devices(void * user_context, void * user_device, int ctx_device_id, int target_tile_id) Line 160	C++
 	openvino_intel_gpu_plugin.dll!cldnn::device_query::device_query(cldnn::engine_types engine_type, cldnn::runtime_types runtime_type, void * user_context, void * user_device, int ctx_device_id, int target_tile_id) Line 25	C++
 	openvino_intel_gpu_plugin.dll!ov::intel_gpu::Plugin::Plugin() Line 132	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::_Construct_in_place(ov::intel_gpu::Plugin &) Line 158	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::_Ref_count_obj2<ov::intel_gpu::Plugin>::{ctor}() Line 2029	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::make_shared() Line 2748	C++
 	openvino_intel_gpu_plugin.dll!CreatePluginEngine(std::shared_ptr<InferenceEngine::IInferencePlugin> & plugin) Line 1011	C++
 	[Frames may be missing, no binary loaded for openvino.dll]	
 	openvino.dll!00007ff9b1c27ef0()	Unknown

The crash location is the same as the one I reported.
Please suggest next steps. OpenVINO works fine on the CPU, but we need to know on what basis we can skip the GPU and use the CPU.

@venki-thiyag
Contributor Author

open_cl_crash.zip
Another crash in open_cl

@venki-thiyag
Contributor Author

@vladimir-paramuzov @p-durandin
Any update on this issue?

@p-durandin
Contributor

p-durandin commented Jun 6, 2023

This looks strange. Have you tested OpenVINO without modifications, or on another machine? We have not met similar problems on our TGL machines.
An additional test can be performed using clinfo on Windows, but it will need to be built from sources.

@venki-thiyag
Contributor Author

Question: the OpenCL version varies from one system to another. Is OpenVINO built with the default OpenCL expected to work on all of these machines, or do the OpenCL binaries need to be distributed along with the built OpenVINO binaries?

@venki-thiyag
Contributor Author

> Looks strange, do you have tested OpenVINO without modifications or on other machine? We have no meet similar problems on our TGL machines. The additional test can be performed using clinfo on Windows, but it will be required to build from sources

I am unable to find clinfo among the binaries built from the OpenVINO source. Where can I get this tool?

@ilya-lavrenov
Contributor

ilya-lavrenov commented Jun 7, 2023

> Question: OpenCL version varies from one system to another, OpenVINO built with default OpenCL is expected to work on all these machines or OpenCL binaries needs to be distributed along with built OpenVINO binaries?

OpenVINO links against the OpenCL ICD loader, and its ABI is stable:

$ ldd bin/intel64/Release/libopenvino_intel_gpu_plugin.so
        ...
        libOpenCL.so.1 => /lib/x86_64-linux-gnu/libOpenCL.so.1 (0x00007f2e129a6000)

The OpenCL 2.0 standard is targeted, which is available everywhere:

set(INTEL_GPU_TARGET_OCL_VERSION "200" CACHE STRING "Target version of OpenCL which should be used by GPU plugin")
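
The value `"200"` follows the three-digit encoding that the Khronos headers use for `CL_TARGET_OPENCL_VERSION` (for example 120 for 1.2, 200 for 2.0, 300 for 3.0). A small sketch of how such a value maps to a major/minor pair is shown below. The helper names are hypothetical and are not part of OpenVINO or OpenCL, and the sketch assumes `INTEL_GPU_TARGET_OCL_VERSION` uses the same encoding.

```cpp
#include <cassert>
#include <string>

struct OclVersion {
    int major;
    int minor;
};

// Decodes a three-digit target version such as 200 into {2, 0}.
// Assumes the CL_TARGET_OPENCL_VERSION encoding: major * 100 + minor * 10.
OclVersion decode_target_version(int encoded) {
    assert(encoded >= 100 && "encoded versions start at 100 (OpenCL 1.0)");
    return OclVersion{encoded / 100, (encoded % 100) / 10};
}

// Formats the version as "major.minor".
std::string format_version(const OclVersion& v) {
    return std::to_string(v.major) + "." + std::to_string(v.minor);
}
```

With this encoding, `format_version(decode_target_version(200))` yields `"2.0"`.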

> I am unable to find clinfo in binaries built from OpenVINO source, from where can I get this tool?

It's not an OpenVINO tool; it's a standard OpenCL tool:

$ apt-file search clinfo
clinfo: /usr/bin/clinfo

So, you can use apt-get install clinfo

@venki-thiyag
Contributor Author

Where can I get it for the Windows platform?

@venki-thiyag
Contributor Author

I built OpenVINO 2022.3.0 from the branch releases/2022/3, and the crash is still observed when running hello_query_device.exe. I am trying to get the crash dump from the test user,
but I am fairly sure it is the same as the one I reported earlier.
Any ideas on why this can happen?

@venki-thiyag
Contributor Author

Backtrace is given above, pasting the same here:

 	OpenCL.dll!clGetDeviceIDs(_cl_platform_id * platform, unsigned __int64 device_type, unsigned int num_entries, _cl_device_id * * devices, unsigned int * num_devices) Line 99	C
 	[Inline Frame] openvino_intel_gpu_plugin.dll!cl::Platform::getDevices(unsigned __int64) Line 2580	C++
>	openvino_intel_gpu_plugin.dll!cldnn::ocl::ocl_device_detector::create_device_list() Line 205	C++
 	openvino_intel_gpu_plugin.dll!cldnn::ocl::ocl_device_detector::get_available_devices(void * user_context, void * user_device, int ctx_device_id, int target_tile_id) Line 160	C++
 	openvino_intel_gpu_plugin.dll!cldnn::device_query::device_query(cldnn::engine_types engine_type, cldnn::runtime_types runtime_type, void * user_context, void * user_device, int ctx_device_id, int target_tile_id) Line 25	C++
 	openvino_intel_gpu_plugin.dll!ov::intel_gpu::Plugin::Plugin() Line 132	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::_Construct_in_place(ov::intel_gpu::Plugin &) Line 158	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::_Ref_count_obj2<ov::intel_gpu::Plugin>::{ctor}() Line 2029	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::make_shared() Line 2748	C++
 	openvino_intel_gpu_plugin.dll!CreatePluginEngine(std::shared_ptr<InferenceEngine::IInferencePlugin> & plugin) Line 1011	C++
 	[Frames may be missing, no binary loaded for openvino.dll]	
 	openvino.dll!00007ff9b1c27ef0()	Unknown

@venki-thiyag
Contributor Author

Please note that this does not happen on all systems; the issue is seen only on a few of them.

@ilya-lavrenov
Contributor

@venki-thiyag have you tried the 2023.0 release? We improved the device detection logic there.
The fix (#17333) has not been ported to 2022.3 yet. If it helps, we can try to merge it ASAP so that it lands in the 2022.3.1 release.

@venki-thiyag
Contributor Author

@ilya-lavrenov Not tried yet; I will try it ASAP.
I think the following change is the one that might fix the issue:
https://github.com/openvinotoolkit/openvino/pull/17333/files#diff-510cc648b48cf0f22ae37a6fa2a7ed37c4d8fcbc228bd123db42787be4c24424L193

@venki-thiyag
Contributor Author

venki-thiyag commented Jun 10, 2023

@ilya-lavrenov The issue is still seen with the 2023.0 version. The call stack is the following:

 	OpenCL.dll!00007ff9e7f34658()	Unknown
 	openvino_intel_gpu_plugin.dll!cl::Platform::getDevices(unsigned __int64 type, std::vector<cl::Device,std::allocator<cl::Device>> * devices) Line 2625	C++
 	openvino_intel_gpu_plugin.dll!cldnn::ocl::ocl_device_detector::create_device_list() Line 207	C++
 	openvino_intel_gpu_plugin.dll!cldnn::ocl::ocl_device_detector::get_available_devices(void * user_context, void * user_device, int ctx_device_id, int target_tile_id) Line 156	C++
 	openvino_intel_gpu_plugin.dll!cldnn::device_query::device_query(cldnn::engine_types engine_type, cldnn::runtime_types runtime_type, void * user_context, void * user_device, int ctx_device_id, int target_tile_id) Line 25	C++
 	openvino_intel_gpu_plugin.dll!ov::intel_gpu::Plugin::Plugin() Line 146	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::_Construct_in_place(ov::intel_gpu::Plugin &) Line 158	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::_Ref_count_obj2<ov::intel_gpu::Plugin>::{ctor}() Line 2029	C++
 	[Inline Frame] openvino_intel_gpu_plugin.dll!std::make_shared() Line 2748	C++
 	openvino_intel_gpu_plugin.dll!CreatePluginEngine(std::shared_ptr<ov::IPlugin> & plugin) Line 987	C++
 	openvino.dll!ov::CoreImpl::get_plugin(const std::string & pluginName) Line 443	C++
 	openvino.dll!ov::CoreImpl::GetMetric(const std::string & deviceName, const std::string & name, const std::map<std::string,ov::Any,std::less<std::string>,std::allocator<std::pair<std::string const ,ov::Any>>> & options) Line 196	C++
 	openvino.dll!ov::CoreImpl::get_available_devices() Line 696	C++
 	openvino.dll!ov::CoreImpl::GetAvailableDevices() Line 205	C++
 	openvino.dll!ov::Core::get_available_devices() Line 254	C++
 	RCVNativeVBG.dll!VBG::OpenVinoProcessing::loadModel(const std::wstring & modelPath) Line 56	C++
 	RCVNativeVBG.dll!VBG::IVBGMLProcessing::loadBindModel(const std::wstring & modelPath) Line 164	C++

The crash dump and symbol files are attached at https://drive.google.com/file/d/1a1-a1XNcpF7NqCTg5bHb0SYESvmhiLUZ/view?usp=sharing

@vladimir-paramuzov
Contributor

@venki-thiyag

> I am unable to find clinfo in binaries built from OpenVINO source, from where can I get this tool?

clinfo can be built from sources (https://github.com/Oblomov/clinfo) for Windows. Could you check whether it works well in your setup?

@avitial avitial removed the bug Something isn't working label Jun 13, 2023
@venki-thiyag
Contributor Author

venki-thiyag commented Jun 16, 2023

@vladimir-paramuzov I was able to compile clinfo from sources using the following steps:

  1. Installed Intel SDK
  2. Changed lib to LIBS = OpenCL.lib

After this, the clinfo executable was generated. On my system clinfo works fine, but on the user's machine where the crash was observed, only the following output was seen:

PS C:\temp\clinfo> .\clinfo.exe

Number of platforms                               1

PS C:\temp\clinfo>

This looks like a crash, most likely with the same call stack as the previous one (I have asked the customer for the call stack).

Any next steps?

@ilya-lavrenov
Contributor

> This looks like a crash and most likely pointing to same callstack as previous one (have asked for callstack from customer).

This means the issue is not with OpenVINO, but rather with the installation / setup on your machine.
Such issues are out of our scope.

@venki-thiyag
Contributor Author

@ilya-lavrenov If we use only the CPU, the crash is not seen. Ideally, the OpenVINO GPU plugin should report the device as unsupported and fall back to CPU mode rather than crash.
The crash is seen on a set of machines.
Is there any check that is being missed?

@venki-thiyag
Contributor Author

venki-thiyag commented Jun 16, 2023

List of machine models where the problem is seen:

Dell Vostro 5890 i5-10400
Dell Latitude 5420 i7-1165G7

Will keep updating as we get more results from customers.

@ilya-lavrenov
Contributor

> Ideally OpenVINO GPU should report as not supported device and fallback to CPU mode, but should not crash.

Use the AUTO device for this; it is designed for such needs.
The issue is not with the Intel GPU plugin but with OpenCL on your machine, because even clinfo does not work.

@venki-thiyag
Contributor Author

OK, I will try the AUTO device, but my suspicion is that this will fail too, as the AUTO logic will also try to get the available devices and most likely cause the crash.

@ilya-lavrenov
Contributor

@songbell do we have any code inside AUTO that can catch a SEGFAULT and fall back to CPU?

@songbell
Contributor

songbell commented Jun 19, 2023

> @songbell do we have a code inside auto that can catch SEGFAULT and fallback to CPU?

A segfault is out of scope. We can catch inference exceptions from the GPU, but this crash appears to happen in the early plugin-engine creation stage, so I suppose AUTO will not help in this case.
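
The distinction drawn here, that a thrown exception can be caught and used to fall back while a segfault cannot, can be illustrated with a short sketch. It is not the AUTO plugin's implementation. `Candidate` and `pick_device` are hypothetical names, and the initializers are plain callables standing in for whatever a real plugin does at start-up.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pairing of a device name with a callable that initializes it.
using Candidate = std::pair<std::string, std::function<void()>>;

// Returns the name of the first device whose initializer completes without
// throwing. Only C++ exceptions are handled: if an initializer dereferences
// a bad pointer, the process is terminated before any catch block can run.
std::string pick_device(const std::vector<Candidate>& candidates) {
    for (const auto& candidate : candidates) {
        assert(candidate.second && "initializer must be callable");
        try {
            candidate.second();
            return candidate.first;
        } catch (const std::exception&) {
            // Recoverable failure: try the next candidate in priority order.
        }
    }
    throw std::runtime_error("no usable device");
}
```

A fallback chain like this works for a GPU initializer that throws, but it gives no protection against the crash reported in this issue, which occurs inside the OpenCL call and never reaches the `catch`.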

@venki-thiyag
Contributor Author

@songbell any other suggestions here? Are we missing any additional check related to the OpenCL GPU capability of the system?

@songbell
Contributor

> Are we missing any additional check related to OpenCL GPU capability on system?

@vladimir-paramuzov can we throw an exception from a signal handler when OCL crashes? Or is there some other check we can do at an early OCL stage instead of running into the segfault?

@avitial
Contributor

avitial commented Jul 10, 2023

@songbell @vladimir-paramuzov gentle ping.

@peterchen-intel
Contributor

@vladimir-paramuzov Can GPU support "OpenVINO GPU should report as not supported device and fallback to CPU mode, but should not crash." in this scenario as @ilya-lavrenov mentioned?

@vladimir-paramuzov
Contributor

@peterchen-intel, @songbell, @avitial Technically we could add a custom signal handler, but that does not seem to be a good idea, at least on Linux:

  1. SIGSEGV means that the program might be in a corrupted state.
  2. The POSIX standard says: "The behavior of a process is undefined after it returns normally from a signal-catching function for a SIGBUS, SIGFPE, SIGILL, or SIGSEGV signal that was not generated by kill(), sigqueue(), or raise()."
  3. Signal handlers are set for the whole process, which might impact other components (even if we reset the handler to the default after the attempt to detect devices, a parallel thread can be affected).

On Windows, the Structured Exception Handling (SEH) API may allow the process to continue safely after catching certain exception types, but there are also noncontinuable exceptions, and I am not sure which kind is raised in this particular case. I have not studied the SEH Win API in depth, so I am not sure whether it has side effects similar to POSIX signals.

From my point of view, it is better to update the drivers or manually disable the GPU plugin (for example, by removing its DLL) on such systems than to risk unwanted side effects from a recovery attempt. But if someone knows a good and safe (and ideally portable) way to recover after a segmentation fault, feel free to create a PR; contributions are welcome.
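
One approach that avoids returning from a signal handler altogether, and that was not proposed in this thread, is to run the risky detection step in a separate process and inspect how it ended. Below is a minimal POSIX-only sketch of that idea. `ProbeResult` and `run_isolated` are hypothetical names, and `probe` stands in for whatever device query the application wants to guard. On Windows the equivalent would be to launch a small helper executable, which is not shown here.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>
#include <csignal>
#include <functional>

enum class ProbeResult { Ok, Failed, Crashed };

// Runs `probe` in a forked child so a fatal signal there cannot take down
// the calling process. The child reports its result through its exit code.
ProbeResult run_isolated(const std::function<bool()>& probe) {
    assert(probe && "probe must be callable");
    const pid_t pid = fork();
    if (pid < 0) return ProbeResult::Failed;  // could not create the child
    if (pid == 0) {
        bool ok = false;
        try {
            ok = probe();
        } catch (...) {
            ok = false;
        }
        _exit(ok ? 0 : 1);  // _exit skips atexit handlers inherited from the parent
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return ProbeResult::Failed;  // retry only on interruption
    }
    if (WIFSIGNALED(status)) return ProbeResult::Crashed;  // e.g. killed by SIGSEGV
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? ProbeResult::Ok
                                                           : ProbeResult::Failed;
}
```

An application could call `run_isolated` once at start-up and skip the GPU when the result is `Crashed`. Forking a multi-threaded process has its own restrictions, so a dedicated helper executable is the more conservative variant of the same idea.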
