[Crash][pp_liteseg] CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED #50853

engineer1109 · 2023-02-24T01:31:56Z

Describe the Bug

The model "pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k" from PaddleSeg now crashes.
It runs on Paddle Inference in pure C++.
I am sure the model did not crash a month or so ago.

Model Link
Link: https://pan.baidu.com/s/1Z6N4XW3r11ln7qoUML9qGQ?pwd=gpi3 Extraction code: gpi3

The crash point is in conv2d_fusion:
phi::dynload::cudnnConvolutionBiasActivationForward
conv_fusion_kernel.cu:593

Paddle Infer Config:

    paddle_infer::Config config;
    config.SetModel(m_pdmodelPath, m_pdiparamsPath);
    config.EnableUseGpu(100, 0, paddle::AnalysisConfig::Precision::kFloat32);
    config.EnableCUDNN();
    m_predictor = paddle_infer::CreatePredictor(config);

Crash log with GLOG_v=4:

I0224 09:24:40.566617 328095 conv_fusion_kernel.cu:403] Compute ConvFusionOp with cuDNN: data_format=NCHW compute_format=NCHW
I0224 09:24:40.566874 328095 operator.cc:286] Place(gpu:0) Op(conv2d_fusion), inputs:{Bias[fuse_conv_bn/conv2d_eltwise_y_in/30:float[2]({})(Place(gpu:0))], Filter[conv2d_43.w_0:float[2, 4, 3, 3]({})(Place(gpu:0))], Input[concat_6.tmp_0:float[1, 4, 16, 32]({})(Place(gpu:0))], ResidualData[]}, outputs:{Output[relu_31.tmp_0:float[1, 2, 16, 32]({})(Place(gpu:0))]}.
I0224 09:24:40.566902 328095 naive_executor.cc:61] 140736645738496 run Op(conv2d_fusion), inputs:{Bias[fuse_conv_bn/conv2d_eltwise_y_in/31:float[1]({})(Place(gpu:0))], Filter[conv2d_44.w_0:float[1, 2, 3, 3]({})(Place(gpu:0))], Input[relu_31.tmp_0:float[1, 2, 16, 32]({})(Place(gpu:0))], ResidualData[]}, outputs:{Output[sigmoid_0.tmp_0:[0]({})()]}. on scope 0x5555943c8c30
I0224 09:24:40.566920 328095 operator.cc:219] Place(gpu:0) Op(conv2d_fusion), inputs:{Bias[fuse_conv_bn/conv2d_eltwise_y_in/31:float[1]({})(Place(gpu:0))], Filter[conv2d_44.w_0:float[1, 2, 3, 3]({})(Place(gpu:0))], Input[relu_31.tmp_0:float[1, 2, 16, 32]({})(Place(gpu:0))], ResidualData[]}, outputs:{Output[sigmoid_0.tmp_0:[0]({})()]}.
I0224 09:24:40.566931 328095 cuda_info.cc:250] SetDeviceId 0
I0224 09:24:40.566947 328095 operator.cc:2208] op type:conv2d_fusion, expected_kernel_key:{data_type[float]; data_layout[Undefined(AnyLayout)]; place[Place(gpu:0)]; library_type[CUDNN]}
I0224 09:24:40.566962 328095 operator.cc:3162] Done inputs
I0224 09:24:40.566965 328095 operator.cc:3169] Output Outputs not found
I0224 09:24:40.566967 328095 operator.cc:3225] Done outputs
I0224 09:24:40.566973 328095 operator.cc:3466] Done attributes
I0224 09:24:40.566975 328095 operator.cc:3043] Runtime attr `is_test` is passed to GPUDNNContext.
I0224 09:24:40.566979 328095 operator.cc:3043] Runtime attr `fuse_relu_before_depthwise_conv` is passed to GPUDNNContext.
I0224 09:24:40.566983 328095 operator.cc:3043] Runtime attr `use_addto` is passed to GPUDNNContext.
I0224 09:24:40.566987 328095 operator.cc:3043] Runtime attr `workspace_size_MB` is passed to GPUDNNContext.
I0224 09:24:40.566990 328095 operator.cc:3043] Runtime attr `exhaustive_search` is passed to GPUDNNContext.
I0224 09:24:40.566992 328095 operator.cc:3516] Done runtime attributes
I0224 09:24:40.566994 328095 operator.cc:3546] Done runtime extra inputs
I0224 09:24:40.567003 328095 conv_fusion_kernel.cu:403] Compute ConvFusionOp with cuDNN: data_format=NCHW compute_format=NCHW
I0224 09:24:40.568835 328095 op_call_stack.cc:62] ExternalError: CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED. 
  [Hint: Please search for the error code(9) on website (https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnStatus_t) to get Nvidia's official solution and advice about CUDNN Error.] (at /media/wjl/D2/github/fork/7/Paddle/paddle/phi/kernels/fusion/gpu/conv_fusion_kernel.cu:593)
terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
  what():  In user code:

    File "export.py", line 144, in <module>
      main(args)
    File "export.py", line 123, in main
      paddle.jit.save(new_net, save_path)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/jit.py", line 631, in wrapper
      func(layer, path, input_spec, **configs)
    File "/home/wjl/.local/lib/python3.8/site-packages/decorator.py", line 232, in fun
      return caller(func, *(extras + args), **kw)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
      return wrapped_func(*args, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
      return func(*args, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/jit.py", line 860, in save
      concrete_program = static_func.concrete_program_specify_input_spec(
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 527, in concrete_program_specify_input_spec
      concrete_program, _ = self.get_concrete_program(
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 436, in get_concrete_program
      concrete_program, partial_program_layer = self._program_cache[cache_key]
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 801, in __getitem__
      self._caches[item_id] = self._build_once(item)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 785, in _build_once
      concrete_program = ConcreteProgram.from_func_spec(
    File "/home/wjl/.local/lib/python3.8/site-packages/decorator.py", line 232, in fun
      return caller(func, *(extras + args), **kw)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
      return wrapped_func(*args, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
      return func(*args, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 733, in from_func_spec
      outputs = static_func(*inputs)
    File "export.py", line 74, in forward
      outs = self.net(x)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/home/wjl/github/PaddleSeg/paddleseg/models/pp_liteseg.py", line 114, in forward
      feats_head = self.ppseg_head(feats_selected)  # [..., x8, x16, x32]
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/home/wjl/github/PaddleSeg/paddleseg/models/pp_liteseg.py", line 191, in forward
      high_feat = arm(low_feat, high_feat)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/home/wjl/github/PaddleSeg/paddleseg/models/layers/tensor_fusion.py", line 75, in forward
      out = self.fuse(x, y)
    File "/home/wjl/github/PaddleSeg/paddleseg/models/layers/tensor_fusion.py", line 182, in fuse
      atten = F.sigmoid(self.conv_xy_atten(atten))
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/container.py", line 98, in forward
      input = layer(input)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/home/wjl/github/PaddleSeg/paddleseg/models/layers/layer_libs.py", line 107, in forward
      x = self._conv(x)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 930, in __call__
      return self._dygraph_call_func(*inputs, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/dygraph/layers.py", line 915, in _dygraph_call_func
      outputs = self.forward(*inputs, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/nn/layer/conv.py", line 666, in forward
      out = F.conv._conv_nd(
    File "/usr/local/lib/python3.8/dist-packages/paddle/nn/functional/conv.py", line 168, in _conv_nd
      helper.append_op(
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/layer_helper.py", line 44, in append_op
      return self.main_program.current_block().append_op(*args, **kwargs)
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/framework.py", line 3615, in append_op
      op = Operator(
    File "/usr/local/lib/python3.8/dist-packages/paddle/fluid/framework.py", line 2635, in __init__
      for frame in traceback.extract_stack():

    ExternalError: CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED. 
      [Hint: Please search for the error code(9) on website (https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnStatus_t) to get Nvidia's official solution and advice about CUDNN Error.] (at /media/wjl/D2/github/fork/7/Paddle/paddle/phi/kernels/fusion/gpu/conv_fusion_kernel.cu:593)
      [operator < conv2d_fusion > error]

Code is on develop 605242a
System Ubuntu 20.04
GCC 9.4.0
CUDA 11.7
CUDNN 8.7.0

Additional Supplementary Information

No response


paddle-bot · 2023-02-24T01:32:00Z

Hi! We have received your issue and will arrange for technical staff to answer it as soon as possible; please be patient. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and the error message. You can also look for an answer in the official API documentation, the FAQ, past GitHub issues, and the AI community. Have a nice day!

engineer1109 · 2023-02-24T01:37:26Z

The model was exported from PaddleSeg, and it is one of Baidu's own in-house models.

SunNy820828449 · 2023-02-24T11:38:51Z

Could it be that your local cuDNN version is unsuitable? This should not be a problem with the model.

engineer1109 · 2023-02-24T11:46:05Z

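> Could it be that your local cuDNN version is unsuitable? This should not be a problem with the model.

Of course it is not a problem with the model; the problem is in the inference library code.

An unsuitable cuDNN version is unlikely to be the cause. I build directly from source. Earlier commits did not have this problem, and the environment has not changed. Is cuDNN 8.7 not recent enough? Other models work fine.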

engineer1109 · 2023-02-26T15:51:31Z

@SunNy820828449
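Link: https://pan.baidu.com/s/19clK3-C4t6JCYmDKqKcPPg Password: cmoa
It is now confirmed that the recent IR optimization code has a bug.
config.SwitchIrOptim(false);
Adding this line, which turns IR optimization off, makes the model run normally. Without it, the crash above occurs.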
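The workaround described above can be sketched as a small helper. This is a minimal sketch, not code from the report: `ConfigureWithoutIrOptim` is a hypothetical name, and the function is templated on the config type only so that it compiles without the Paddle headers. It is meant to be called with a `paddle_infer::Config`, and the GPU settings mirror the config in the original report (the explicit precision argument is omitted here).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: applies the settings from the report and then turns
// IR optimization off. ConfigT is assumed to be paddle_infer::Config; the
// template parameter only keeps this sketch free of Paddle headers.
template <typename ConfigT>
void ConfigureWithoutIrOptim(ConfigT& config,
                             const std::string& model_path,
                             const std::string& params_path) {
  config.SetModel(model_path, params_path);
  config.EnableUseGpu(100, 0);  // 100 MB initial GPU memory pool, device 0
  config.EnableCUDNN();
  config.SwitchIrOptim(false);  // workaround: turn IR optimization off
}
```

With Paddle Inference available, the helper would be called on a `paddle_infer::Config` before passing it to `paddle_infer::CreatePredictor(config)`. Turning IR optimization off likely also skips the fusion passes that produce ops such as `conv2d_fusion`, so this trades some inference speed for avoiding the crash.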

SunNy820828449 · 2023-02-27T06:04:17Z

I have already reported the problem to the colleagues working on inference.

2054686334 · 2023-03-17T06:03:05Z

paddlepaddle-gpu==2.4.2
cuda==11.7.1
cudnn==8.8.0
python310
I hit the same error when using PP-Human.

sunjinghua · 2023-03-20T02:35:14Z

paddlepaddle-gpu==2.4.2
cuda==11.6
cudnn==8.7
python3.7
The problem exists here as well.

yanghebao · 2023-04-25T03:51:02Z

paddlepaddle-gpu==2.4.1
cuda==11.6
cudnn==8.8
python==3.9
The problem occurs with config.switch_ir_optim(True).

engineer1109 · 2023-05-22T08:51:08Z

More networks are showing this problem; picodet has it too. @luotao1

engineer1109 · 2023-05-22T08:52:00Z

I updated to CUDA 12.1 and cuDNN 8.9, and the problem is still there.

jimmyflycv · 2023-07-12T10:30:07Z

I have two environments. The first one produces this error:
paddle-bfloat 0.1.7
paddleocr 2.6.1.3
paddlepaddle-gpu 2.3.2.post112
cuda 11.3
cudnn 8.9

The second one does not produce this error:
paddle-bfloat 0.1.7
paddleocr 2.6.1.3
paddlepaddle-gpu 2.3.2.post112
cuda 11.3
cudnn 8.2

engineer1109 · 2023-07-12T10:32:47Z

@jzhang533 Can someone fix this issue? It has been sitting here for half a year.

jzhang533 · 2023-07-12T11:13:46Z

I'll see whether I can find someone.

yuanlehome · 2023-07-13T02:46:03Z

I'm following up on this and will see whether I can reproduce and fix it.

yuanlehome · 2023-07-13T08:26:08Z

From what we can see so far, this is a bug with cuDNN >= 8.7; 8.6 and below are fine. The fix, PR #55407, will be merged into the 2.5 branch shortly.
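The version boundary stated above (cuDNN >= 8.7 affected, 8.6 and below fine) can be turned into a small check, for example to decide at startup whether a workaround is needed. This is a minimal sketch under that single assumption; `CudnnVersion`, `ParseCudnnVersion`, and `IsReportedAffected` are hypothetical names, and versions not mentioned in this thread are untested.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Leading "major.minor" part of a cuDNN version string such as "8.7.0".
struct CudnnVersion {
  int major_version = 0;
  int minor_version = 0;
};

// Hypothetical helper: reads "major.minor" from the start of the string.
// Parts that are missing or cannot be parsed are left at 0.
CudnnVersion ParseCudnnVersion(const std::string& text) {
  CudnnVersion v;
  std::istringstream in(text);
  char dot = 0;
  in >> v.major_version;
  if (in >> dot && dot == '.') {
    in >> v.minor_version;
  }
  return v;
}

// True for the range this thread reports as affected: cuDNN 8.7 and newer.
bool IsReportedAffected(const CudnnVersion& v) {
  return v.major_version > 8 ||
         (v.major_version == 8 && v.minor_version >= 7);
}
```

Applied to the environments reported in this thread, the check marks 8.7, 8.8, and 8.9 as affected and 8.2 as unaffected, which matches the observations posted here.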

engineer1109 · 2023-07-14T07:12:33Z

Thanks for the fix.

liangbaikaizzzZZZ · 2023-09-26T08:30:51Z

paddlepaddle-gpu==2.5.1
cuda==11.6
cudnn==8.7
python3.8
The problem still exists.

engineer1109 · 2023-09-27T04:32:21Z

@liangbaikaizzzZZZ The release is unstable; try develop.

xiemeilong · 2023-09-27T09:24:42Z

Same problem:
paddlepaddle-gpu==2.5.1
cuda==12.2
cudnn==8.9
python3.10

engineer1109 · 2023-09-28T02:39:54Z

@xiemeilong As I already said, the release is unstable; use develop.

WangShengFeng1 · 2024-03-10T12:55:34Z

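> From what we can see so far, this is a bug with cuDNN >= 8.7; 8.6 and below are fine. The fix, PR #55407, will be merged into the 2.5 branch shortly.

I am running on the AI Studio platform. How can I downgrade the cuDNN version there?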

Ultraman6 · 2024-09-06T12:30:51Z

So how was this actually resolved? Using Paddle 2.5 or later introduces another bug where it cannot run alongside PyTorch.

engineer1109 added status/new-issue 新建 type/bug-report 报bug labels Feb 24, 2023

paddle-bot bot added the PFCC Paddle Framework Contributor Club，https://github.com/PaddlePaddle/community/tree/master/pfcc label Feb 24, 2023

engineer1109 changed the title ~~[PaddleSeg]~~ [PaddleSeg] CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED Feb 24, 2023

engineer1109 changed the title ~~[PaddleSeg] CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED~~ [Crash][pp_liteseg] CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED Feb 24, 2023

paddle-bot bot added status/following-up 跟进中 and removed status/new-issue 新建 labels Mar 2, 2023

yuanlehome mentioned this issue Jul 13, 2023

[BugFix] Fix issue-50853: CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED #55407

Merged

engineer1109 closed this as completed Jul 14, 2023

paddle-bot bot added status/close 已关闭 and removed status/following-up 跟进中 labels Jul 14, 2023

yuanlehome mentioned this issue Jul 17, 2023

[Cherry-Pick] [BugFix] Fix issue-50853: CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED #55412

Merged

engineer1109 mentioned this issue Sep 27, 2023

cuda and cudnn: the recommended versions raise CUDNN error(9), CUDNN_STATUS_NOT_SUPPORTED. #57750

Closed
