
Deploying a model via pipeline mode raises a lod error: _get_bbox_result cannot return bbox_results #1902

Closed
ClassmateXiaoyu opened this issue Jan 4, 2023 · 6 comments


@ClassmateXiaoyu

Environment
CUDA 11.7
cuDNN 8.4.1
GPU: GTX 1070
Python 3.8.13
PaddlePaddle 2.4.1.post117
paddle-serving-server-gpu 0.9.0
paddle_serving_app 0.9.0

I trained a PPYOLOv2 model with PaddleX and converted the inference model to a serving model via python -m paddle_serving_client.convert --dirname  --model_filename  --params_filename  --serving_server serving_server --serving_client serving_client.
I noticed that the same model, deployed in two different ways, hits a lod error. Details:
1. When I deploy in pipeline mode, fetch_dict has no fetch_name.lod key. fetch_dict: {'save_infer_model/scale_0.tmp_1': array([[  0.        ,   0.85202295, 216.68979   ,  64.207535  ,        436.6143    , 332.37054   ]], dtype=float32)}.
With the lod information missing, the client-server call raises:
Traceback (most recent call last):
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle_serving_server/pipeline/error_catch.py", line 97, in wrapper
    res = func(*args, **kw)
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle_serving_server/pipeline/operator.py", line 1179, in postprocess_help
    postped_data, prod_errcode, prod_errinfo = self.postprocess(
  File "pipeline_web_service_linux.py", line 72, in postprocess
    self.img_postprocess(
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle_serving_app/reader/image_reader.py", line 426, in __call__
    bbox_result = self._get_bbox_result(image_with_bbox, fetch_name,
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle_serving_app/reader/image_reader.py", line 344, in _get_bbox_result
    lod = [fetch_map[fetch_name + '.lod']]
KeyError: 'save_infer_model/scale_0.tmp_1.lod'
Classname: Op._run_postprocess.<locals>.postprocess_help
FunctionName: postprocess_help
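A possible workaround for this KeyError (a sketch only, not an official fix; `add_missing_lod` is a hypothetical helper) is to reconstruct the missing `.lod` entry inside the pipeline op's postprocess before handing fetch_dict to the image postprocessor. For a single-image batch the lod is simply [0, num_boxes]:

```python
import numpy as np

def add_missing_lod(fetch_dict, fetch_name):
    # Hypothetical helper: if the pipeline response lacks the
    # '<fetch_name>.lod' key, reconstruct it for a single-image
    # batch as [0, num_boxes] so _get_bbox_result can index it.
    lod_key = fetch_name + '.lod'
    if lod_key not in fetch_dict:
        num_boxes = fetch_dict[fetch_name].shape[0]
        fetch_dict[lod_key] = np.array([0, num_boxes])
    return fetch_dict

# Example with the single-detection shape from this issue:
fetch_dict = {'save_infer_model/scale_0.tmp_1':
              np.zeros((1, 6), dtype=np.float32)}
fetch_dict = add_missing_lod(fetch_dict, 'save_infer_model/scale_0.tmp_1')
print(fetch_dict['save_infer_model/scale_0.tmp_1.lod'])  # [0 1]
```

This only holds for batch size 1; for multi-image batches the real per-image offsets would be needed.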

2. When I deploy in non-pipeline mode, fetch_map does contain the fetch_name.lod key. fetch_map: {'save_infer_model/scale_0.tmp_1': array([[0.0000000e+00, 6.3646980e-02, 5.2615891e+00, 1.2278875e+02,
        1.6876831e+02, 3.5357916e+02],
       [0.0000000e+00, 4.2369448e-02, 6.6680511e+01, 6.9318405e+01,
        6.0023975e+02, 5.3855756e+02],
       [0.0000000e+00, 1.8086428e-02, 1.2872772e+02, 1.4232706e+02,
        2.9876392e+02, 3.3751181e+02],
       [0.0000000e+00, 1.5854711e-02, 1.8734198e+02, 3.1824486e+01,
        3.5457477e+02, 1.9962274e+02],
       [0.0000000e+00, 1.5454855e-02, 2.1284140e+02, 1.9946268e+02,
        3.9645621e+02, 4.0698849e+02],
       [0.0000000e+00, 1.4058443e-02, 1.5301871e+02, 2.4853967e+02,
        3.2183228e+02, 4.3125073e+02],
       [0.0000000e+00, 1.2545503e-02, 1.1664839e+02, 2.4064153e+02,
        3.0317432e+02, 4.3767188e+02],
       [0.0000000e+00, 1.1161749e-02, 3.8942078e+01, 1.3401808e+02,
        1.8269760e+02, 3.4691406e+02],
       [0.0000000e+00, 1.0988280e-02, 1.4913477e+02, 1.8804048e+02,
        3.2895029e+02, 3.5642706e+02],
       [0.0000000e+00, 1.0884989e-02, 1.5156635e+02, 2.1480481e+02,
        3.2716016e+02, 3.9296497e+02]], dtype=float32), 'save_infer_model/scale_0.tmp_1.lod': array([ 0, 10])}.
The client-server call then works without errors and returns the prediction normally:
{'result': [{'bbox': [5.261589050292969, 122.78874969482422, 164.50672149658203, 231.79041290283203], 'category_id': 0, 'score': 0.06364697962999344}]}
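For context, the `.lod` array is PaddlePaddle's encoding of variable-length batches: entries i and i+1 give the row range of detections belonging to image i, so `array([ 0, 10])` above means all 10 rows belong to one image. A minimal sketch of how the postprocessor can use it to slice the (N, 6) [class_id, score, xmin, ymin, xmax, ymax] array (`split_by_lod` is a hypothetical name, not the library's API):

```python
import numpy as np

def split_by_lod(bboxes, lod):
    # lod holds cumulative row offsets: the boxes for image i are
    # rows lod[i]:lod[i+1] of the (N, 6) detection array
    # [class_id, score, xmin, ymin, xmax, ymax].
    return [bboxes[lod[i]:lod[i + 1]] for i in range(len(lod) - 1)]

bboxes = np.arange(60, dtype=np.float32).reshape(10, 6)  # 10 detections
lod = np.array([0, 10])                                  # one image
per_image = split_by_lod(bboxes, lod)
print(len(per_image), per_image[0].shape)  # 1 (10, 6)
```

Without the `.lod` key there is no way to know which rows belong to which image, which is exactly why `_get_bbox_result` fails in pipeline mode.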

Could the maintainers explain why the same model loses its lod information under one deployment mode but not the other, and how to handle this? Thanks!
@ClassmateXiaoyu
Author

OS: CentOS 7.9

@fanruifeng

Hi, has this problem been solved? I'm in the same situation right now.

@ClassmateXiaoyu
Author

Hi, has this problem been solved? I'm in the same situation right now.

Not solved yet; I haven't figured out a fix, and the maintainers haven't replied to this issue either.

@fanruifeng

OK, could you add me on QQ (1125729232) to discuss? Currently, when I deploy in non-pipeline mode, my server starts fine, but the client request returns an error: {'err_no': 10000, 'err_msg': 'Log_id: 10000 Raise_msg: transpose_0.tmp_0 ClassName: Op._run_postprocess..postprocess_help FunctionName: postprocess_help', 'key': [], 'value': [], 'tensors': []}

@wjplove8

#1635 (comment)

OK, could you add me on QQ (1125729232) to discuss? Currently, when I deploy in non-pipeline mode, my server starts fine, but the client request returns an error: {'err_no': 10000, 'err_msg': 'Log_id: 10000 Raise_msg: transpose_0.tmp_0 ClassName: Op._run_postprocess..postprocess_help FunctionName: postprocess_help', 'key': [], 'value': [], 'tensors': []}

Hi, has this problem been solved? I'm in the same situation right now.

@HuiHuiSun

Hi, has this problem been solved? I'm in the same situation right now.

Not solved yet; I haven't figured out a fix, and the maintainers haven't replied to this issue either.

Hi, has this problem been solved by now?

@paddle-bot paddle-bot bot closed this as completed Oct 29, 2024