[Bug]: NPU compile: L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_UNKNOWN #27099

Closed
3 tasks done
Zctoylm0927 opened this issue Oct 17, 2024 · 4 comments

@Zctoylm0927

OpenVINO Version

2024.3

Operating System

Ubuntu 20.04 (LTS)

Device used for inference

NPU

Framework

PyTorch

Model used

torch.nn.MultiheadAttention

Issue description

I wrote a Transformer model by hand that has three parts: self-attention, cross-attention, and an MLP. The full model runs on the NPU, but when I run only the cross-attention part, the following error occurs:

RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223:
Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21:
L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe - an action is required to complete the desired operation

Step-by-step reproduction

My cross-attention code:

import torch
import torch.nn as nn

class cross_block(nn.Module):
    def __init__(self, hidden_size=1200, num_heads=16):
        super(cross_block, self).__init__()
        self.head_dim = hidden_size // num_heads
        self.dim = hidden_size
        self.d_model = hidden_size
        self.num_heads = num_heads

        # batch_first defaults to False: inputs are (seq_len, batch, embed_dim)
        self.mha = nn.MultiheadAttention(embed_dim=self.d_model, num_heads=self.num_heads)

    def cross_attn(self, q, k, v):
        N, B, C = q.shape
        x, output_weights = self.mha(q, k, v)  # attention weights are unused
        x = x.view(2, N // 2, C)  # just for testing; assumes B == 1 and an even N
        return x

    def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return self.cross_attn(q, k, v)

Followed by my conversion code:

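import openvino as ov

# Shapes match the `input` argument below: (seq_len, batch, embed_dim)
q_shape, k_shape, v_shape = (1920, 1, 1200), (300, 1, 1200), (300, 1, 1200)
CROSS_OV_PATH = "cross.xml"  # hypothetical output path; the original value was not shown
core = ov.Core()

example_input = {
    "q": torch.randn(q_shape),
    "k": torch.randn(k_shape),
    "v": torch.randn(v_shape),
}

model = cross_block()
print("--------after model-------")
model = ov.convert_model(model, input=[[1920, 1, 1200], [300, 1, 1200], [300, 1, 1200]], example_input=example_input)
ov.save_model(model, CROSS_OV_PATH)
print("--------after convert-------")
compiled_model = core.compile_model(model, device_name="NPU")  # check
print("--------after compile-------")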
The error occurs when I run inference with the compiled OpenVINO cross block:

t = compiled_model(example_input)

With the original PyTorch model, there is no such problem. Here is the IR .xml file of my cross block:

cross.xml.txt
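To check whether the failure is specific to the NPU plugin, one option is to feed identical inputs to PyTorch, OpenVINO on CPU, and OpenVINO on NPU and compare the outputs. This is a minimal sketch, assuming the `cross_block` class above is saved in a hypothetical module `cross_model.py`; `max_abs_diff` is a hypothetical helper. Static input shapes are passed to `convert_model` because the NPU plugin generally expects static shapes.

```python
import numpy as np


def max_abs_diff(reference, candidate):
    """Largest elementwise |reference - candidate|, compared in float32."""
    reference = np.asarray(reference, dtype=np.float32)
    candidate = np.asarray(candidate, dtype=np.float32)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    return float(np.max(np.abs(reference - candidate)))


if __name__ == "__main__":
    import openvino as ov
    import torch

    # Hypothetical module containing the cross_block class from this issue.
    from cross_model import cross_block

    # Same static shapes as in the issue: (seq_len, batch, embed_dim).
    rng = np.random.default_rng(0)
    q = rng.standard_normal((1920, 1, 1200), dtype=np.float32)
    k = rng.standard_normal((300, 1, 1200), dtype=np.float32)
    v = rng.standard_normal((300, 1, 1200), dtype=np.float32)
    torch_inputs = tuple(torch.from_numpy(a) for a in (q, k, v))

    torch_model = cross_block().eval()
    with torch.no_grad():
        reference = torch_model(*torch_inputs).numpy()

    ov_model = ov.convert_model(
        torch_model,
        input=[list(a.shape) for a in (q, k, v)],  # static shapes
        example_input=torch_inputs,
    )
    core = ov.Core()
    for device in ("CPU", "NPU"):
        try:
            compiled = core.compile_model(ov_model, device)
            result = compiled([q, k, v])[0]  # numpy inputs by position; first output
            print(device, "max abs diff vs PyTorch:", max_abs_diff(reference, result))
        except RuntimeError as err:
            print(device, "failed:", err)
```

A small nonzero difference on NPU is expected, since the compiled model reports an FP16 `INFERENCE_PRECISION_HINT`; a `RuntimeError` on NPU only, with CPU matching PyTorch, points at the plugin or driver rather than the model.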

Relevant log output

Traceback (most recent call last):
  File "/home/mla/model.py", line 50, in <module>
    t = compiled_model(example_input)
  File "/home/xxx/anaconda3/envs/env1/lib/python3.10/site-packages/openvino/runtime/ie_api.py", line 388, in __call__
    return self._infer_request.infer(
  File "/home/xxx/anaconda3/envs/env1/lib/python3.10/site-packages/openvino/runtime/ie_api.py", line 132, in infer
    return OVDict(super().infer(_data_dispatch(
RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223:
Exception from src/plugins/intel_npu/src/backend/include/zero_utils.hpp:21:
L0 zeFenceHostSynchronize result: ZE_RESULT_ERROR_UNKNOWN, code 0x7ffffffe - an action is required to complete the desired operation

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
Zctoylm0927 added the bug (Something isn't working) and support_request labels Oct 17, 2024
andrei-kochin added the category: NPU (OpenVINO NPU plugin) label Oct 17, 2024
@avitial
Contributor

avitial commented Oct 25, 2024

@Zctoylm0927 thanks for reaching out. Do you observe the same behavior on the latest 2024.4 release or a nightly build? If you can, please share a minimal reproducer and the IR model. Also, please provide the NPU driver version you are using.

avitial removed the bug (Something isn't working) label Oct 28, 2024
@Zctoylm0927
Author

Thanks for the reply. I have tried the 2024.4 release [screenshot] and still get the same error [screenshot].
Previously I only shared the .xml file; this archive contains both the .xml and the .bin file:
cross.zip
I think my NPU driver version is v1.6.0, because the date of the installed library matches that release date:

$ ls -ll | grep libnpu_driver_compiler.so
-rw-r--r-- 1 root root 94700456 Aug 15 01:06 libnpu_driver_compiler.so

By the way, I don't know how to check the NPU driver version. How can I check it?
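One way to look up version information from Python is to ask the OpenVINO runtime which properties the NPU plugin exposes. This is a sketch, assuming the plugin may report properties named `NPU_DRIVER_VERSION` and `NPU_COMPILER_VERSION`; since support can vary between plugin builds, the code only queries them if they appear in `SUPPORTED_PROPERTIES`, and the format of the returned value may differ between versions. `npu_version_info` is a hypothetical helper. If the driver was installed from the .deb packages of the Linux NPU driver release, `dpkg -l | grep npu` should also list the installed package versions.

```python
def npu_version_info(core, device="NPU"):
    """Collect version-related properties of `device` from an OpenVINO Core.

    `core` is expected to behave like openvino.Core: it exposes
    `available_devices` and `get_property(device, name)`.
    """
    # Device names may carry a suffix after '.', so compare only the prefix.
    if not any(d.split(".")[0] == device for d in core.available_devices):
        return {}
    info = {"FULL_DEVICE_NAME": core.get_property(device, "FULL_DEVICE_NAME")}
    # Query optional properties only if this plugin build reports them.
    supported = core.get_property(device, "SUPPORTED_PROPERTIES")
    for name in ("NPU_DRIVER_VERSION", "NPU_COMPILER_VERSION"):
        if name in supported:
            info[name] = core.get_property(device, name)
    return info


if __name__ == "__main__":
    import openvino as ov

    print("OpenVINO:", ov.get_version())
    print(npu_version_info(ov.Core()))
```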

@avitial
Contributor

avitial commented Dec 20, 2024

@Zctoylm0927 sorry for missing your update, and thanks for providing the entire model. It seems the issue you reported has been addressed: with OpenVINO 2024.5/2024.6 and the latest NPU driver (v1.10.0), the issue does not reproduce.

Please try upgrading to the latest OpenVINO release and NPU driver on your end and check whether the issue is resolved.

If the issue persists, it may be a problem in your application code, since benchmark_app compiles the model and runs inference without issue:

$ benchmark_app -m cross/cross.xml -d NPU -t 5
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
[ INFO ] Build ................................. 2024.6.0-17404-4c0f47d2335-releases/2024/6
[ INFO ]
[ INFO ] Device info:
[ INFO ] NPU
[ INFO ] Build ................................. 2024.6.0-17404-4c0f47d2335-releases/2024/6
[ INFO ]
[ INFO ]
[Step 3/11] Setting device configuration
[ WARNING ] Performance hint was not explicitly specified in command line. Device(NPU) performance hint will be set to PerformanceMode.THROUGHPUT.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 2.96 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Model inputs:
[ INFO ]     x (node: x) : f32 / [...] / [2,3840,1152]
[ INFO ]     y (node: y) : f32 / [...] / [1,240,1152]
[ INFO ] Model outputs:
[ INFO ]     ***NO_NAME*** (node: aten::add/Add) : f32 / [...] / [2,3840,1152]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Model batch size: 1
[Step 6/11] Configuring input of the model
[ INFO ] Model inputs:
[ INFO ]     x (node: x) : f32 / [C,H,W] / [2,3840,1152]
[ INFO ]     y (node: y) : f32 / [C,H,W] / [1,240,1152]
[ INFO ] Model outputs:
[ INFO ]     ***NO_NAME*** (node: aten::add/Add) : f32 / [...] / [2,3840,1152]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 128.30 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ]   DEVICE_ID:
[ INFO ]   ENABLE_CPU_PINNING: False
[ INFO ]   EXECUTION_DEVICES: NPU
[ INFO ]   EXECUTION_MODE_HINT: ExecutionMode.PERFORMANCE
[ INFO ]   INFERENCE_PRECISION_HINT: <Type: 'float16'>
[ INFO ]   LOADED_FROM_CACHE: False
[ INFO ]   MODEL_PRIORITY: Priority.MEDIUM
[ INFO ]   NETWORK_NAME: Model0
[ INFO ]   NPU_COMPILATION_MODE_PARAMS:
[ INFO ]   OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ]   PERFORMANCE_HINT: PerformanceMode.THROUGHPUT
[ INFO ]   PERFORMANCE_HINT_NUM_REQUESTS: 1
[ INFO ]   PERF_COUNT: False
[...]
[ INFO ] First inference took 114.55 ms
[Step 11/11] Dumping statistics report
[ INFO ] Execution Devices:NPU
[ INFO ] Count:            96 iterations
[ INFO ] Duration:         5239.40 ms
[ INFO ] Latency:
[ INFO ]    Median:        212.04 ms
[ INFO ]    Average:       214.74 ms
[ INFO ]    Min:           171.43 ms
[ INFO ]    Max:           352.77 ms
[ INFO ] Throughput:   18.32 FPS
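Since benchmark_app fills the inputs with generated random data and runs the compiled model repeatedly, a rough Python equivalent can help separate model problems from application problems. This is a sketch, not benchmark_app's actual implementation: it assumes the IR path `cross/cross.xml` from the command above and static input shapes, and `random_inputs_for` is a hypothetical helper.

```python
import numpy as np


def random_inputs_for(shapes, seed=0):
    """One float32 standard-normal array per static input shape."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal(tuple(shape), dtype=np.float32) for shape in shapes]


if __name__ == "__main__":
    import openvino as ov

    core = ov.Core()
    compiled = core.compile_model("cross/cross.xml", "NPU")
    # get_shape() requires static shapes, which the NPU model above has.
    shapes = [port.get_shape() for port in compiled.inputs]
    inputs = random_inputs_for(shapes)

    request = compiled.create_infer_request()
    for _ in range(5):  # a few repeated inferences, as benchmark_app does
        request.infer(inputs)
    print("output shape:", request.get_output_tensor(0).data.shape)
```

If this runs cleanly while the original script fails, the difference lies in how the application builds or passes its inputs.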

avitial self-assigned this Dec 20, 2024
@avitial
Contributor

avitial commented Dec 24, 2024

Closing this, as it seems the issue has been addressed. Feel free to reopen and ask additional questions related to this topic.

avitial closed this as completed Dec 24, 2024