Problems encountered when stress-testing with Paddle Serving #872

Closed
QingYuan-L opened this issue Nov 3, 2020 · 4 comments
@QingYuan-L

Environment: Ubuntu 18.04, 1650 GPU, gigabit network, brpc mode, PP-YOLO
1. Single-threaded requests: no errors, QPS: 5
2. 5 threads: 18% error rate
Every error is the following:

E1103 17:58:25.655791  6378 general_model.cpp:561] failed call predictor with req: insts { tensor_array { float_data: -1.278791 float_data: -1.2959157 float_data: -1.3130405 float_data: -1.3301653 float_data: -1.3301653 float_data: -1.34729 float_data: -1.34729 float_data: -1.34729 float_data: -1.34729 float_data: -1.34729 float_data: -1.3301653 float_data: -1.3301653 float_data: -1.34729 float_data: -1.3644147 float_data: -1.3986642 float_data: -1.3130405 float_data: -1.2274168 float_data: -1.1589177 float_data: -1.0732939 float_data: -1.0219196 float_data: -1.0219196 float_data: -1.0390444 float_data: -1.1246682 float_data: -1.1931672 float_data: -1.2616662 ..... (rest omitted here; it is all image data)
2020-11-03 17:58:25,662-ERROR: Exception on /start [POST]

3. 10 threads: 32% error rate, and after a while the service crashes outright. In addition to the error above, the crash log is:

2020-11-03 17:58:26,181-INFO: 192.168.1.195 - - [03/Nov/2020 17:58:26] "POST /start HTTP/1.1" 200 -
W1103 17:58:26.188755 14849 init.cc:226] Warning: PaddlePaddle catches a failure signal, it may not work properly
W1103 17:58:26.188767 14849 init.cc:228] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W1103 17:58:26.188771 14849 init.cc:231] The detail failure signal is:

W1103 17:58:26.188772 14849 init.cc:234] *** Aborted at 1604397506 (unix time) try "date -d @1604397506" if you are using GNU date ***
W1103 17:58:26.192802 14849 init.cc:234] PC: @                0x0 (unknown)
W1103 17:58:26.192942 14849 init.cc:234] *** SIGSEGV (@0x18) received by PID 14831 (TID 0x7fdeb28dc700) from PID 24; stack trace: ***
W1103 17:58:26.193902 14849 init.cc:234]     @     0x7fdf19ba48a0 (unknown)
W1103 17:58:26.194393 14849 init.cc:234]     @     0x7fdeb586659e brpc::Controller::Call::OnComplete()
W1103 17:58:26.194905 14849 init.cc:234]     @     0x7fdeb58669a1 brpc::Controller::EndRPC()
W1103 17:58:26.195415 14849 init.cc:234]     @     0x7fdeb5867ea4 brpc::Controller::OnVersionedRPCReturned()
W1103 17:58:26.195947 14849 init.cc:234]     @     0x7fdeb588f08f brpc::policy::ProcessRpcResponse()
W1103 17:58:26.196414 14849 init.cc:234]     @     0x7fdeb582493a brpc::ProcessInputMessage()
W1103 17:58:26.196831 14849 init.cc:234]     @     0x7fdeb5825e7d brpc::InputMessenger::OnNewMessages()
W1103 17:58:26.197309 14849 init.cc:234]     @     0x7fdeb585782d brpc::Socket::ProcessEvent()
W1103 17:58:26.197836 14849 init.cc:234]     @     0x7fdeb5938825 bthread::TaskGroup::task_runner()
W1103 17:58:26.198354 14849 init.cc:234]     @     0x7fdeb5949561 bthread_make_fcontext
@bjjwwang
Collaborator

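Roughly how long does it take for this problem to appear? Could it be caused by excessive load, or does it also occur when the load is fairly low?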
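
One way to answer the load question is to rerun the test at several thread counts and record the error rate, QPS, and time until the first failure. The sketch below is only an illustration: `send_request` is a hypothetical callable standing in for whatever client call the reporter uses (the client code was not posted in this thread), and any exception it raises is counted as a failed request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send_request, num_threads, requests_per_thread):
    """Call send_request from num_threads workers and summarize the failures.

    send_request is a hypothetical zero-argument callable (an assumption,
    not part of Paddle Serving); any exception counts as one failed request.
    """
    start = time.monotonic()

    def worker(_):
        errors = 0
        first_error = None
        for _ in range(requests_per_thread):
            try:
                send_request()
            except Exception:
                errors += 1
                if first_error is None:
                    # Seconds since the start of the run for this worker's first failure.
                    first_error = time.monotonic() - start
        return errors, first_error

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(worker, range(num_threads)))

    elapsed = time.monotonic() - start
    total = num_threads * requests_per_thread
    errors = sum(e for e, _ in results)
    first_times = [t for _, t in results if t is not None]
    return {
        "threads": num_threads,
        "total": total,
        "errors": errors,
        "error_rate": errors / total,
        "qps": total / elapsed if elapsed > 0 else float("inf"),
        "first_error_s": min(first_times) if first_times else None,
    }
```

Running this for 1, 2, 5, and 10 threads and comparing `error_rate` and `first_error_s` would show whether failures start immediately at low concurrency or only after sustained high load.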

@TeslaZhao TeslaZhao self-assigned this Nov 13, 2020
@TeslaZhao
Collaborator

Hello, were the memory and file descriptor (FD) metrics normal during the stress test? Could you also share the client-side code?
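
For reference, memory and FD usage of the serving process can be sampled during the run. The following is a minimal sketch that assumes a Linux host with `/proc` mounted (the reporter is on Ubuntu 18.04); the function names are illustrative and not part of Paddle Serving.

```python
import os


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     1234 kB".
            return int(line.split()[1])
    return None


def sample_process(pid):
    """Sample open file descriptors and resident memory for one process (Linux only)."""
    # Each entry under /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/status") as f:
        rss_kb = parse_vm_rss_kb(f.read())
    return {"pid": pid, "open_fds": open_fds, "rss_kb": rss_kb}
```

Calling `sample_process` on the server PID every few seconds while the load runs would show whether `open_fds` or `rss_kb` climbs steadily before the crash.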

@TeslaZhao
Collaborator

Did the error `Fail to pthread_key_create: Resource temporarily unavailable` appear during prediction?
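
A quick way to check is to scan the server and client logs for the error strings mentioned in this thread. The sketch below counts three signatures taken verbatim from the messages above; the signature labels and the function name are illustrative assumptions.

```python
from collections import Counter

# Substrings copied from the log excerpts and question in this thread.
SIGNATURES = {
    "predictor_call_failed": "failed call predictor",
    "segfault": "SIGSEGV",
    "pthread_key_create": "Fail to pthread_key_create",
}


def count_signatures(lines):
    """Count how many log lines contain each known error signature."""
    counts = Counter({name: 0 for name in SIGNATURES})
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return dict(counts)
```

It can be applied to a log file with `count_signatures(open(path, errors="replace"))`, where `path` is wherever the serving logs are written on the machine under test.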

@paddle-bot paddle-bot bot closed this as completed Mar 5, 2024