
High latency of REST locally #803

Closed
lennon310 opened this issue Aug 14, 2019 · 27 comments · Fixed by #806
@lennon310
Contributor

I'm trying to profile the Seldon client locally. I'm serving the model with seldon-core-microservice <MyModel> REST, using an XGBoost model with ~30 features and 200 rows. My script is:

@profile
def predict_localhost() -> None:
    sc = SeldonClient()
    p = sc.microservice(
        method="predict",
        payload_type="ndarray",
        data=nparray,
        names=names,
        transport="rest"
    )

if __name__ == "__main__":
    for i in range(100):
        predict_localhost()
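
(The @profile decorator above comes from my profiler hook and isn't shown; a minimal stand-in built on the standard library's cProfile, producing output like the dumps below, could look like this:)

```python
import cProfile
import functools
import pstats

def profile(func):
    """Minimal stand-in for the @profile decorator used above: run the call
    under cProfile and print the top entries sorted by cumulative time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        prof = cProfile.Profile()
        result = prof.runcall(func, *args, **kwargs)
        # Print the 40 most expensive entries, like the "restriction <40>" above
        pstats.Stats(prof).sort_stats("cumulative").print_stats(40)
        return result
    return wrapper
```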

The profiling result shows:

*** PROFILER RESULTS ***
predict_localhost (client.py:36)
function called 100 times

         16729181 function calls (14395219 primitive calls) in 10.947 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 675 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.000    0.000   10.948    0.109 client.py:36(predict_localhost)
      100    0.001    0.000   10.945    0.109 seldon_client.py:323(microservice)
      100    0.215    0.002   10.943    0.109 seldon_client.py:479(microservice_api_rest_seldon_message)
      100    0.000    0.000    5.884    0.059 api.py:104(post)
      100    0.001    0.000    5.884    0.059 api.py:16(request)
      100    0.001    0.000    5.862    0.059 sessions.py:466(request)
      100    0.002    0.000    4.914    0.049 sessions.py:617(send)
      100    0.001    0.000    4.896    0.049 adapters.py:394(send)
      100    0.002    0.000    4.830    0.048 connectionpool.py:449(urlopen)
      100    0.002    0.000    4.811    0.048 connectionpool.py:320(_make_request)
      100    0.001    0.000    4.686    0.047 client.py:1287(getresponse)
      100    0.002    0.000    4.681    0.047 client.py:290(begin)
      700    0.002    0.000    4.655    0.007 {method 'readline' of '_io.BufferedReader' objects}
      100    0.002    0.000    4.654    0.047 client.py:257(_read_status)
      205    0.001    0.000    4.653    0.023 socket.py:572(readinto)
      205    4.651    0.023    4.651    0.023 {method 'recv_into' of '_socket.socket' objects}
      100    0.001    0.000    3.108    0.031 utils.py:67(seldon_message_to_json)
      100    0.007    0.000    3.046    0.030 json_format.py:99(MessageToJson)
      100    0.001    0.000    3.039    0.030 json_format.py:191(ToJsonString)
  300/100    0.002    0.000    1.709    0.017 json_format.py:195(_MessageToJsonObject)
  200/100    0.002    0.000    1.708    0.017 json_format.py:206(_RegularMessageToJsonObject)
528500/100    0.472    0.000    1.707    0.017 json_format.py:271(_FieldToJsonObject)
20100/100    0.020    0.000    1.702    0.017 json_format.py:349(_ListValueMessageToJsonObject)
20100/100    0.338    0.000    1.701    0.017 json_format.py:351(<listcomp>)
560000/20000    0.511    0.000    1.686    0.000 json_format.py:333(_ValueMessageToJsonObject)
      100    0.271    0.003    1.540    0.015 utils.py:233(array_to_grpc_datadef)
      200    0.007    0.000    1.460    0.007 __init__.py:183(dumps)
      200    0.061    0.000    1.452    0.007 encoder.py:182(encode)
20100/100    0.048    0.000    1.268    0.013 utils.py:279(array_to_list_value)
   604700    0.102    0.000    1.252    0.000 encoder.py:412(_iterencode)
    20000    0.100    0.000    1.168    0.000 well_known_types.py:817(extend)
1208800/604700    0.201    0.000    1.150    0.000 encoder.py:333(_iterencode_dict)
   540000    0.353    0.000    1.069    0.000 well_known_types.py:814(append)
1203300/603300    0.520    0.000    0.949    0.000 encoder.py:277(_iterencode_list)
      100    0.001    0.000    0.778    0.008 sessions.py:426(prepare_request)
      100    0.001    0.000    0.755    0.008 models.py:307(prepare)
      100    0.001    0.000    0.707    0.007 models.py:452(prepare_body)
      200    0.001    0.000    0.704    0.004 models.py:82(_encode_params)
      200    0.001    0.000    0.702    0.004 parse.py:846(urlencode)
      200    0.001    0.000    0.700    0.004 parse.py:807(quote_plus)

As you can see, the network contributes ~50 ms per call (see the connectionpool.py entries), and data conversion contributes ~30 ms (MessageToJson from protobuf). That puts each prediction at 100+ ms, which is too slow.
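
For comparison, I also tried bypassing SeldonClient with a bare keep-alive connection. This is only a sketch: the host, port, path, and payload shape are my assumptions for a local seldon-core-microservice REST server, not the SeldonClient API.

```python
import http.client
import json

# Keep one HTTP/1.1 connection open and reuse it, so repeated predictions
# avoid a fresh TCP handshake each call (the handshake cost shows up in the
# recv_into/connectionpool entries of the profile above).
conn = http.client.HTTPConnection("localhost", 5000)

def build_payload(rows, names=None):
    data = {"ndarray": rows}
    if names:
        data["names"] = names
    return {"data": data}

def predict_raw(rows, names=None, path="/predict"):
    body = json.dumps(build_payload(rows, names))
    conn.request("POST", path, body, {"Content-Type": "application/json"})
    return json.loads(conn.getresponse().read())
```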

@ukclivecox
Contributor

I think there are a couple of initial comments:

  • Can you try with master? It has a route through predict which does not do the conversion to protobuf:

        if is_proto:
            (features, meta, datadef, data_type) = extract_request_parts(request)
            client_response = client_predict(user_model, features, datadef.names, meta=meta)
            return construct_response(user_model, False, request, client_response)
        else:
            (features, meta, datadef, data_type) = extract_request_parts_json(request)
            client_response = client_predict(user_model, features, datadef.names, meta=meta)
            print(client_response)
            return construct_response_json(user_model, False, request, client_response)

  • I see you are using ndarray. Are you able to use one of the tensor varieties?

Finally, gRPC will be much faster. See this blog post: https://www.intel.ai/inference-performance-boost-with-seldon-on-intel-xeon-scalable-processors/#gs.vrh9uz and figure 1.

@lennon310
Contributor Author

Thanks @cliveseldon! I will try the latest commit soon.

I benchmarked with gRPC as well. It was half of the REST latency, but still a bit high; we hope the total latency could be ~10 ms.

Since predict requires a NumPy array, I just passed the same data type in the client. How much faster would you expect the tensor type to be?
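
For context, here is how I understand the two payload shapes (assumed from the SeldonMessage DataDef; if this is right, the tensor form should convert to numpy in a single reshape rather than one protobuf Value per element):

```python
import numpy as np

# ndarray payload: nested lists, one protobuf Value per element on the wire.
ndarray_payload = {"data": {"names": ["f0", "f1"],
                            "ndarray": [[1.0, 2.0], [3.0, 4.0]]}}

# tensor payload: a flat value list plus a shape.
tensor_payload = {"data": {"names": ["f0", "f1"],
                           "tensor": {"shape": [2, 2],
                                      "values": [1.0, 2.0, 3.0, 4.0]}}}

def tensor_to_numpy(payload):
    # One bulk conversion and one reshape, no element-wise traversal.
    t = payload["data"]["tensor"]
    return np.asarray(t["values"]).reshape(t["shape"])
```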

@lennon310
Contributor Author

@cliveseldon I installed the latest commit of seldon-core and re-ran the profiling test, but didn't see a change:

*** PROFILER RESULTS ***
predict_localhost (client.py:36)
function called 100 times

         16749327 function calls (14415360 primitive calls) in 11.578 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 657 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.002    0.000   11.579    0.116 client.py:36(predict_localhost)
      100    0.002    0.000   11.575    0.116 seldon_client.py:351(microservice)
      100    0.217    0.002   11.572    0.116 seldon_client.py:507(microservice_api_rest_seldon_message)
      100    0.000    0.000    6.705    0.067 api.py:104(post)
      100    0.001    0.000    6.705    0.067 api.py:16(request)
      100    0.001    0.000    6.684    0.067 sessions.py:466(request)
      100    0.002    0.000    5.783    0.058 sessions.py:617(send)
      100    0.001    0.000    5.765    0.058 adapters.py:394(send)
      100    0.002    0.000    5.703    0.057 connectionpool.py:449(urlopen)
      100    0.002    0.000    5.685    0.057 connectionpool.py:320(_make_request)
      100    0.001    0.000    5.570    0.056 client.py:1287(getresponse)
      100    0.002    0.000    5.566    0.056 client.py:290(begin)
      700    0.002    0.000    5.541    0.008 {method 'readline' of '_io.BufferedReader' objects}
      100    0.002    0.000    5.540    0.055 client.py:257(_read_status)
      194    0.001    0.000    5.539    0.029 socket.py:572(readinto)
      194    5.537    0.029    5.537    0.029 {method 'recv_into' of '_socket.socket' objects}
      100    0.001    0.000    2.977    0.030 utils.py:67(seldon_message_to_json)
      100    0.006    0.000    2.920    0.029 json_format.py:99(MessageToJson)
      100    0.001    0.000    2.914    0.029 json_format.py:191(ToJsonString)
  300/100    0.002    0.000    1.677    0.017 json_format.py:195(_MessageToJsonObject)
  200/100    0.002    0.000    1.676    0.017 json_format.py:206(_RegularMessageToJsonObject)
528500/100    0.466    0.000    1.674    0.017 json_format.py:271(_FieldToJsonObject)
20100/100    0.019    0.000    1.668    0.017 json_format.py:349(_ListValueMessageToJsonObject)
20100/100    0.334    0.000    1.667    0.017 json_format.py:351(<listcomp>)
560000/20000    0.508    0.000    1.652    0.000 json_format.py:333(_ValueMessageToJsonObject)
      100    0.277    0.003    1.484    0.015 utils.py:233(array_to_grpc_datadef)
      200    0.006    0.000    1.361    0.007 __init__.py:183(dumps)
      200    0.055    0.000    1.355    0.007 encoder.py:182(encode)
20100/100    0.048    0.000    1.207    0.012 utils.py:279(array_to_list_value)
   604700    0.095    0.000    1.168    0.000 encoder.py:412(_iterencode)
    20000    0.100    0.000    1.109    0.000 well_known_types.py:817(extend)
1208800/604700    0.188    0.000    1.073    0.000 encoder.py:333(_iterencode_dict)
   540000    0.344    0.000    1.009    0.000 well_known_types.py:814(append)
1203300/603300    0.503    0.000    0.885    0.000 encoder.py:277(_iterencode_list)
      100    0.001    0.000    0.736    0.007 sessions.py:426(prepare_request)
      100    0.001    0.000    0.714    0.007 models.py:307(prepare)
      100    0.001    0.000    0.669    0.007 models.py:452(prepare_body)
      200    0.001    0.000    0.666    0.003 models.py:82(_encode_params)
      200    0.001    0.000    0.664    0.003 parse.py:846(urlencode)
      200    0.001    0.000    0.662    0.003 parse.py:807(quote_plus)

The code path is still the same. My guess is that the seldon_methods change in your link made only a trivial difference to the latency.

@ukclivecox
Contributor

Can we clarify which lines are the issue. Is it this one:
100 0.006 0.000 2.920 0.029 json_format.py:99(MessageToJson)

@lennon310
Contributor Author

@cliveseldon With SeldonClient.microservice() I tracked the code path; it actually still goes through seldon_client rather than seldon_methods, and that line does not seem to be bypassed, unless I missed something or should be using SeldonClient some other way.

@lennon310
Contributor Author

@cliveseldon OK, it looks like the code does go through seldon_methods.
I added a profiler here and re-ran the script.
Here is the profiler result:

*** PROFILER RESULTS ***
predict (/Users/zhunxu/seldon-core/python/seldon_core/seldon_methods.py:10)
function called 100 times

         13917526 function calls (12735226 primitive calls) in 6.524 seconds

   Ordered by: cumulative time, internal time, call count
   List reduced from 185 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.060    0.001    6.524    0.065 seldon_methods.py:10(predict)
      100    0.000    0.000    4.262    0.043 utils.py:400(extract_request_parts_json)
     4700    0.681    0.000    2.941    0.001 {built-in method numpy.array}
      100    0.000    0.000    2.926    0.029 utils.py:422(extract_request_parts)
      100    0.001    0.000    2.924    0.029 utils.py:118(get_data_from_proto)
      100    0.001    0.000    2.923    0.029 utils.py:146(grpc_datadef_to_array)
      300    0.001    0.000    2.591    0.009 json_format.py:418(ParseDict)
      200    0.002    0.000    2.591    0.013 utils.py:15(json_to_seldon_message)
  700/300    0.003    0.000    2.590    0.009 json_format.py:449(ConvertMessage)
  500/300    0.011    0.000    2.588    0.009 json_format.py:468(_ConvertFieldValuePair)
40200/200    0.680    0.000    2.552    0.013 json_format.py:634(_ConvertListValueMessage)
1120000/40000    0.860    0.000    2.508    0.000 json_format.py:617(_ConvertValueMessage)
  1763200    1.292    0.000    2.188    0.000 well_known_types.py:821(__getitem__)
      100    0.004    0.000    1.439    0.014 utils.py:304(construct_response_json)
  1702700    0.552    0.000    0.896    0.000 well_known_types.py:735(_GetStructValue)
  5462405    0.640    0.000    0.640    0.000 {built-in method builtins.isinstance}
      100    0.003    0.000    0.552    0.006 user_model.py:122(client_predict)
      100    0.002    0.000    0.548    0.005 HHMlflowModel.py:20(predict)
      100    0.000    0.000    0.400    0.004 HHMlflowModel.py:33(_predict)
      100    0.391    0.004    0.400    0.004 core.py:1207(predict)
  1140000    0.378    0.000    0.378    0.000 {method 'add' of 'google.protobuf.pyext._message.RepeatedCompositeContainer' objects}
  1723400    0.350    0.000    0.350    0.000 {method 'WhichOneof' of 'google.protobuf.pyext._message.CMessage' objects}
      100    0.012    0.000    0.211    0.002 {built-in method builtins.print}
      100    0.000    0.000    0.198    0.002 arrayprint.py:1499(_array_str_implementation)
      100    0.001    0.000    0.198    0.002 arrayprint.py:518(array2string)
      100    0.001    0.000    0.196    0.002 arrayprint.py:463(wrapper)
      100    0.001    0.000    0.195    0.002 arrayprint.py:480(_array2string)
      100    0.000    0.000    0.125    0.001 arrayprint.py:707(_formatArray)
20100/100    0.028    0.000    0.125    0.001 arrayprint.py:716(recurser)
      100    0.000    0.000    0.120    0.001 utils.py:67(seldon_message_to_json)
      100    0.061    0.001    0.119    0.001 HHMlflowModel.py:39(_data_prep_prehac)
      100    0.000    0.000    0.113    0.001 json_format.py:99(MessageToJson)
      100    0.000    0.000    0.112    0.001 json_format.py:191(ToJsonString)
    20000    0.041    0.000    0.074    0.000 arrayprint.py:945(__call__)
   121000    0.057    0.000    0.070    0.000 well_known_types.py:811(__len__)
      100    0.001    0.000    0.070    0.001 arrayprint.py:411(_get_format_function)
      100    0.000    0.000    0.069    0.001 arrayprint.py:367(<lambda>)
      100    0.001    0.000    0.068    0.001 arrayprint.py:834(__init__)
      100    0.008    0.000    0.068    0.001 arrayprint.py:861(fillFormat)
  500/200    0.001    0.000    0.062    0.000 json_format.py:195(_MessageToJsonObject)

Extracting the request still takes 30-40 ms.

@ukclivecox
Contributor

Yes, I think parsing the proto is the issue:

    request_proto = json_to_seldon_message(request_raw)
    (features, meta, datadef, data_type) = extract_request_parts(request_proto)
    if data_type == "jsonData":
        features = request_raw["jsonData"]
    return features, meta, datadef, data_type

@axsaucedo can you look at this?

@ukclivecox
Contributor

ukclivecox commented Aug 14, 2019

I think we should be able to optimize this without parsing the proto.

@axsaucedo
Contributor

For sure. As discussed this morning, this was done for validation, but given the proto is also processed in the Seldon Engine before this point, it should be possible to remove it. The work here is to remove the current proto conversions from the Python wrapper, which should eliminate this overhead.
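
A rough sketch of the direction (the function name and payload handling here are illustrative, not the final seldon_core implementation):

```python
import numpy as np

def extract_request_parts_json_fast(request_raw):
    """Pull features straight out of the request dict, skipping the
    JSON -> protobuf -> numpy round-trip the profile shows is costly."""
    data = request_raw.get("data", {})
    names = data.get("names", [])
    if "ndarray" in data:
        features = np.array(data["ndarray"])
    elif "tensor" in data:
        t = data["tensor"]
        features = np.asarray(t["values"]).reshape(t["shape"])
    else:
        features = request_raw.get("jsonData")
    return features, request_raw.get("meta", {}), names
```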

@lennon310
Contributor Author

Thanks @cliveseldon and @axsaucedo.
As the profiler shows, the proto conversion accounts for roughly 1/3 to 1/2 of the total latency, so the network request/transmission latency is still an issue. Any thoughts on that part?

(Note we are using Ambassador as the gateway for now, and after talking to Ryan there is a problem that prevents us from using gRPC, see here and here. So REST may be the only option for us for now.)

@axsaucedo
Contributor

Thanks for the heads-up @lennon310. We've opened #806 with the fixes for this issue. It basically removes the conversion into proto (except for TFTensor), which should remove the overhead. Before we land it, it would be great if you could give it a try to see if it addresses the latency issues.

@lennon310
Contributor Author

OK I will re-run the profiler once this is merged and update the result here. Thanks!

@lennon310
Contributor Author

lennon310 commented Aug 16, 2019

I pulled @axsaucedo's commit locally and re-ran the profiler.
REST:
Total time:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.002    0.000    6.829    0.068 client.py:45(predict_localhost)
      100    0.002    0.000    6.825    0.068 seldon_client.py:351(microservice)
      100    0.195    0.002    6.822    0.068 seldon_client.py:507(microservice_api_rest_seldon_message)
      100    0.001    0.000    2.710    0.027 utils.py:69(seldon_message_to_json)
      100    0.006    0.000    2.658    0.027 json_format.py:99(MessageToJson)
      100    0.001    0.000    2.652    0.027 json_format.py:191(ToJsonString)
      100    0.000    0.000    2.367    0.024 api.py:104(post)
      100    0.001    0.000    2.367    0.024 api.py:16(request)
      100    0.001    0.000    2.347    0.023 sessions.py:466(request)
      100    0.002    0.000    1.567    0.016 sessions.py:617(send)
      100    0.001    0.000    1.550    0.016 adapters.py:394(send)
      100    0.002    0.000    1.494    0.015 connectionpool.py:449(urlopen)
  300/100    0.002    0.000    1.493    0.015 json_format.py:195(_MessageToJsonObject)
  200/100    0.001    0.000    1.492    0.015 json_format.py:206(_RegularMessageToJsonObject)
528500/100    0.408    0.000    1.491    0.015 json_format.py:271(_FieldToJsonObject)
20100/100    0.017    0.000    1.486    0.015 json_format.py:349(_ListValueMessageToJsonObject)
20100/100    0.293    0.000    1.486    0.015 json_format.py:351(<listcomp>)
      100    0.002    0.000    1.477    0.015 connectionpool.py:320(_make_request)
560000/20000    0.447    0.000    1.473    0.000 json_format.py:333(_ValueMessageToJsonObject)
      100    0.250    0.002    1.374    0.014 utils.py:234(array_to_grpc_datadef)
      100    0.001    0.000    1.319    0.013 client.py:1287(getresponse)
      100    0.002    0.000    1.315    0.013 client.py:290(begin)
      700    0.002    0.000    1.288    0.002 {method 'readline' of '_io.BufferedReader' objects}
      100    0.002    0.000    1.287    0.013 client.py:257(_read_status)
      202    0.001    0.000    1.286    0.006 socket.py:572(readinto)
      202    1.285    0.006    1.285    0.006 {method 'recv_into' of '_socket.socket' objects}
      200    0.006    0.000    1.274    0.006 __init__.py:183(dumps)
      200    0.052    0.000    1.268    0.006 encoder.py:182(encode)
20100/100    0.043    0.000    1.124    0.011 utils.py:280(array_to_list_value)
   604700    0.088    0.000    1.093    0.000 encoder.py:412(_iterencode)
    20000    0.087    0.000    1.036    0.000 well_known_types.py:817(extend)
1208800/604700    0.174    0.000    1.004    0.000 encoder.py:333(_iterencode_dict)
   540000    0.324    0.000    0.948    0.000 well_known_types.py:814(append)
1203300/603300    0.476    0.000    0.830    0.000 encoder.py:277(_iterencode_list)
      100    0.001    0.000    0.682    0.007 sessions.py:426(prepare_request)
      100    0.001    0.000    0.662    0.007 models.py:307(prepare)
      100    0.001    0.000    0.620    0.006 models.py:452(prepare_body)
      200    0.001    0.000    0.618    0.003 models.py:82(_encode_params)
      200    0.001    0.000    0.616    0.003 parse.py:846(urlencode)
      200    0.001    0.000    0.614    0.003 parse.py:807(quote_plus)

predict function in seldon_methods:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.001    0.000    0.729    0.007 seldon_methods.py:11(predict)
      100    0.003    0.000    0.564    0.006 user_model.py:122(client_predict)
      100    0.002    0.000    0.560    0.006 HHMlflowModel.py:20(predict)
      100    0.000    0.000    0.409    0.004 HHMlflowModel.py:34(_predict)
      100    0.400    0.004    0.409    0.004 core.py:1207(predict)
     4500    0.178    0.000    0.179    0.000 {built-in method numpy.array}
      100    0.001    0.000    0.161    0.002 utils.py:456(extract_request_parts_json)
      100    0.061    0.001    0.121    0.001 HHMlflowModel.py:40(_data_prep_prehac)
     1800    0.002    0.000    0.045    0.000 shape_base.py:286(hstack)
     1800    0.033    0.000    0.033    0.000 {built-in method numpy.concatenate}
      100    0.000    0.000    0.028    0.000 HHMlflowModel.py:30(_input_conversion)
      100    0.001    0.000    0.028    0.000 core.py:332(__init__)
      100    0.006    0.000    0.026    0.000 core.py:461(_init_from_npy2d)
     1800    0.001    0.000    0.008    0.000 shape_base.py:335(<listcomp>)
      200    0.001    0.000    0.007    0.000 _internal.py:289(__init__)
      200    0.002    0.000    0.007    0.000 _internal.py:272(_get_void_ptr)
     3600    0.003    0.000    0.006    0.000 shape_base.py:25(atleast_1d)
      900    0.006    0.000    0.006    0.000 {method 'astype' of 'numpy.ndarray' objects}
      100    0.002    0.000    0.005    0.000 core.py:179(ctypes2numpy)
     1900    0.004    0.000    0.004    0.000 {method 'reshape' of 'numpy.ndarray' objects}
      200    0.000    0.000    0.003    0.000 numeric.py:469(asarray)
      100    0.000    0.000    0.003    0.000 core.py:1665(_validate_features)
     1100    0.003    0.000    0.003    0.000 {built-in method numpy.where}
      100    0.001    0.000    0.003    0.000 utils.py:305(construct_response_json)
      100    0.000    0.000    0.003    0.000 core.py:820(feature_names)
     1800    0.001    0.000    0.002    0.000 shape_base.py:220(_warn_for_nonsequence)
     3600    0.001    0.000    0.002    0.000 numeric.py:541(asanyarray)
      100    0.001    0.000    0.002    0.000 core.py:829(<listcomp>)
      200    0.001    0.000    0.001    0.000 _internal.py:260(__array_interface__)
      300    0.000    0.000    0.001    0.000 numeric.py:344(full_like)
      300    0.001    0.000    0.001    0.000 __init__.py:484(cast)
     1800    0.001    0.000    0.001    0.000 shape_base.py:209(_arrays_for_stack_dispatcher)
     4200    0.001    0.000    0.001    0.000 {method 'format' of 'str' objects}
      300    0.001    0.000    0.001    0.000 {built-in method numpy.arange}
     2100    0.001    0.000    0.001    0.000 {built-in method builtins.hasattr}
      100    0.001    0.000    0.001    0.000 core.py:529(__del__)
      100    0.000    0.000    0.001    0.000 numerictypes.py:365(issubdtype)
      100    0.000    0.000    0.001    0.000 type_check.py:370(nan_to_num)
      200    0.001    0.000    0.001    0.000 {method 'from_buffer' of '_ctypes.PyCArrayType' objects}
     1800    0.001    0.000    0.001    0.000 {method 'index' of 'list' objects}

gRPC:
Total time:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.101    0.001    5.612    0.056 client.py:45(predict_localhost)
      100    0.096    0.001    5.509    0.055 seldon_client.py:351(microservice)
      100    0.186    0.002    5.411    0.054 seldon_client.py:677(microservice_api_grpc_seldon_message)
      100    0.001    0.000    3.935    0.039 _channel.py:556(__call__)
      100    0.001    0.000    3.934    0.039 _channel.py:538(_blocking)
      100    3.616    0.036    3.618    0.036 {method 'next_event' of 'grpc._cython.cygrpc.SegregatedCall' objects}
      100    0.240    0.002    1.281    0.013 utils.py:234(array_to_grpc_datadef)
20100/100    0.038    0.000    1.041    0.010 utils.py:280(array_to_list_value)
    20000    0.083    0.000    0.960    0.000 well_known_types.py:817(extend)
   540000    0.294    0.000    0.876    0.000 well_known_types.py:814(append)
   540000    0.266    0.000    0.415    0.000 well_known_types.py:714(_SetStructValue)
      200    0.223    0.001    0.228    0.001 _common.py:77(_transform)
      100    0.001    0.000    0.225    0.002 _channel.py:516(_prepare)
      100    0.000    0.000    0.223    0.002 _channel.py:447(_start_unary_request)
      100    0.000    0.000    0.223    0.002 _common.py:88(serialize)
   560000    0.174    0.000    0.174    0.000 {method 'add' of 'google.protobuf.pyext._message.RepeatedCompositeContainer' objects}
  1529500    0.149    0.000    0.149    0.000 {built-in method builtins.isinstance}
      100    0.080    0.001    0.081    0.001 {method 'segregated_call' of 'grpc._cython.cygrpc.Channel' objects}
    20000    0.018    0.000    0.033    0.000 well_known_types.py:842(add_list)
    20000    0.009    0.000    0.009    0.000 {method 'Clear' of 'google.protobuf.pyext._message.CMessage' objects}
      100    0.003    0.000    0.009    0.000 _channel.py:129(_handle_event)
      100    0.001    0.000    0.008    0.000 __init__.py:1741(insecure_channel)
    20000    0.008    0.000    0.008    0.000 {method 'tolist' of 'numpy.ndarray' objects}
      100    0.005    0.000    0.007    0.000 _channel.py:1009(__init__)
      100    0.000    0.000    0.005    0.000 _common.py:92(deserialize)
      100    0.005    0.000    0.005    0.000 {built-in method FromString}
    20200    0.002    0.000    0.002    0.000 {built-in method builtins.len}
      100    0.001    0.000    0.002    0.000 seldon_client.py:109(__init__)
      100    0.000    0.000    0.001    0.000 prediction_pb2_grpc.py:127(__init__)
      200    0.000    0.000    0.001    0.000 _channel.py:1032(unary_unary)
      100    0.000    0.000    0.001    0.000 threading.py:357(notify_all)
      100    0.001    0.000    0.001    0.000 _channel.py:97(__init__)
      200    0.001    0.000    0.001    0.000 threading.py:215(__init__)
      100    0.001    0.000    0.001    0.000 threading.py:334(notify)
      100    0.000    0.000    0.001    0.000 seldon_client.py:156(_gather_args)
      300    0.001    0.000    0.001    0.000 threading.py:239(__enter__)
      100    0.001    0.000    0.001    0.000 _channel.py:783(__new__)
      300    0.000    0.000    0.001    0.000 _common.py:64(encode)
      300    0.000    0.000    0.001    0.000 threading.py:242(__exit__)
      100    0.000    0.000    0.000    0.000 _channel.py:1090(__del__)

predict in seldon_methods:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    0.004    0.000    3.376    0.034 seldon_methods.py:11(predict)
     4600    0.662    0.000    2.796    0.001 {built-in method numpy.array}
      100    0.000    0.000    2.781    0.028 utils.py:505(extract_request_parts)
      100    0.001    0.000    2.778    0.028 utils.py:120(get_data_from_proto)
      100    0.001    0.000    2.778    0.028 utils.py:147(grpc_datadef_to_array)
  1763200    1.195    0.000    2.068    0.000 well_known_types.py:821(__getitem__)
  1702700    0.513    0.000    0.873    0.000 well_known_types.py:735(_GetStructValue)
      100    0.003    0.000    0.531    0.005 user_model.py:122(client_predict)
      100    0.002    0.000    0.528    0.005 HHMlflowModel.py:20(predict)
      100    0.000    0.000    0.383    0.004 HHMlflowModel.py:34(_predict)
      100    0.375    0.004    0.383    0.004 core.py:1207(predict)
  1703200    0.361    0.000    0.361    0.000 {method 'WhichOneof' of 'google.protobuf.pyext._message.CMessage' objects}
      100    0.058    0.001    0.115    0.001 HHMlflowModel.py:40(_data_prep_prehac)
   121000    0.053    0.000    0.064    0.000 well_known_types.py:811(__len__)
      100    0.008    0.000    0.060    0.001 utils.py:397(construct_response)
      100    0.008    0.000    0.048    0.000 utils.py:234(array_to_grpc_datadef)
     1800    0.002    0.000    0.042    0.000 shape_base.py:286(hstack)
      100    0.001    0.000    0.040    0.000 utils.py:280(array_to_list_value)
      100    0.003    0.000    0.039    0.000 well_known_types.py:817(extend)
    20000    0.012    0.000    0.036    0.000 well_known_types.py:814(append)
     1800    0.030    0.000    0.030    0.000 {built-in method numpy.concatenate}
      100    0.000    0.000    0.028    0.000 HHMlflowModel.py:30(_input_conversion)
      100    0.001    0.000    0.027    0.000 core.py:332(__init__)
      100    0.005    0.000    0.026    0.000 core.py:461(_init_from_npy2d)
    20000    0.011    0.000    0.018    0.000 well_known_types.py:714(_SetStructValue)
   125000    0.012    0.000    0.012    0.000 {built-in method builtins.len}
    61305    0.007    0.000    0.007    0.000 {built-in method builtins.isinstance}
     1800    0.001    0.000    0.007    0.000 shape_base.py:335(<listcomp>)
      200    0.000    0.000    0.007    0.000 _internal.py:289(__init__)
    20000    0.007    0.000    0.007    0.000 {method 'add' of 'google.protobuf.pyext._message.RepeatedCompositeContainer' objects}
      200    0.001    0.000    0.006    0.000 _internal.py:272(_get_void_ptr)
     3600    0.003    0.000    0.006    0.000 shape_base.py:25(atleast_1d)
      900    0.006    0.000    0.006    0.000 {method 'astype' of 'numpy.ndarray' objects}
      100    0.002    0.000    0.005    0.000 core.py:179(ctypes2numpy)
     1900    0.003    0.000    0.003    0.000 {method 'reshape' of 'numpy.ndarray' objects}
      100    0.000    0.000    0.003    0.000 core.py:1665(_validate_features)
      200    0.000    0.000    0.003    0.000 numeric.py:469(asarray)
      100    0.000    0.000    0.003    0.000 core.py:820(feature_names)
     1100    0.002    0.000    0.002    0.000 {built-in method numpy.where}
     1800    0.001    0.000    0.002    0.000 shape_base.py:220(_warn_for_nonsequence)

So (1) parsing in REST seems fixed, but in gRPC it is not, and (2) there is still baseline latency from the network request in both REST and gRPC. @cliveseldon

@axsaucedo
Contributor

@lennon310 thank you for re-running the profiler on the pull request. Good to see the REST path seems to be fixed; we should be able to merge this soon.

That is correct, the current code changes are focused purely on the REST code path.

In regards to the gRPC latency, is that still the case if you use the predict_raw function instead of the predict function?

Furthermore, when you say there is baseline latency from the network, what do you mean? Is this on a Kubernetes deployment? If so, are you referring to latency from the Seldon Engine handling the request, or from the network itself?

@ukclivecox
Contributor

There should be no parsing in gRPC, as the message should already be a protobuf.

@lennon310
Contributor Author

lennon310 commented Aug 16, 2019

It looks like testing locally has higher latency. I deployed @axsaucedo's branch to Kubernetes and ran a simple load test:

Requests      [total, rate, throughput]  3000, 50.02, 50.00
Duration      [total, attack, wait]      59.999528942s, 59.977632686s, 21.896256ms
Latencies     [mean, 50, 95, 99, max]    22.409443ms, 21.706025ms, 24.474023ms, 27.775505ms, 559.943582ms
Bytes In      [total, mean]              12878977, 4292.99
Bytes Out     [total, mean]              120012000, 40004.00
Success       [ratio]                    100.00%
Status Codes  [code:count]               200:3000  

It looks good. Thanks for the help @axsaucedo and @cliveseldon !

@axsaucedo axsaucedo self-assigned this Aug 16, 2019
@axsaucedo
Contributor

@lennon310 from the positive response I assume you meant lower latency + higher throughput?

Clive is correct, there should be no parsing in gRPC as the message already arrives as proto (except the conversions to ndarray and back to proto for the response).

We'll add a few more tests to the PR before landing. I've added it to our next release 👍

@lennon310
Contributor Author

lennon310 commented Aug 16, 2019

@axsaucedo Yes, I meant the latency and throughput.

So am I understanding this correctly: with ndarray as input, the data type conversion in REST is slightly more expensive than sending the input as proto in gRPC, by probably no more than 10 ms (and this is part of what makes gRPC faster)? Is this consistent with your team's benchmarking?

@ukclivecox
Contributor

gRPC is faster for many reasons: the binary protocol, faster serialization/deserialization, and also because it shares a single HTTP/2 channel. See the Intel benchmarking post in my initial reply.

@ukclivecox
Contributor

It would be good to add your work as a PR to the python library as part of the test infrastructure if you have time.

@lennon310
Contributor Author

@cliveseldon I would love to. Which file(s) should I change?

@ukclivecox
Contributor

I would add a new entry to the Makefile (profile?):

test:
	python setup.py test

.PHONY: type_check
type_check:
	mypy --ignore-missing-imports --no-strict-optional seldon_core

And add whatever files you need to run the test, plus extend tests_require in the setup.py.
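
For example, something like this (the target name and script path are placeholders):

```make
.PHONY: profile
profile:
	python profile_client.py
```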

@lennon310
Contributor Author

OK, I will do that. Thanks!

@lennon310
Contributor Author

I think we touched on this already, but I'm still wondering: for the gRPC path, even if no parsing is needed, is the time spent getting the data out of the proto expected?

100    0.000    0.000    2.781    0.028 utils.py:505(extract_request_parts)
100    0.001    0.000    2.778    0.028 utils.py:120(get_data_from_proto)
100    0.001    0.000    2.778    0.028 utils.py:147(grpc_datadef_to_array)

I tested our service, and gRPC (mean ~50 ms) seems to have higher latency than REST (mean ~25 ms), so I'm not sure whether something still needs optimizing? @cliveseldon

@axsaucedo
Contributor

As per my other comment, the conversion is necessary: in gRPC the request is received as a proto and needs to be converted into a numpy array before it's passed to the Python wrapper. You can access the raw proto through the predict_raw function, but if you want the data in a readable format you will eventually have to convert it. Similarly, when sending the response it's necessary to convert the output (often a numpy array) back into a proto, which in turn requires that conversion.

We're happy to take suggestions, but if you look at the functions, they basically convert the proto directly to a numpy array. If you find more efficient alternative implementations for converting numpy arrays to/from proto objects, we would certainly be keen to introduce them.
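
For example, one alternative along these lines: SeldonMessage does have a binData bytes field, so raw bytes could be shipped and reconstructed with a single np.frombuffer instead of one proto Value per element. The dtype/shape bookkeeping below is illustrative, not existing seldon_core API:

```python
import numpy as np

def array_to_bindata(arr):
    # Pack raw bytes plus the metadata needed to reverse the operation.
    # Carrying dtype/shape alongside the bytes (e.g. in meta) is an
    # assumption of this sketch.
    return {"binData": arr.tobytes(),
            "dtype": str(arr.dtype),
            "shape": list(arr.shape)}

def bindata_to_array(msg):
    # One bulk copy instead of element-wise proto Value handling.
    return np.frombuffer(msg["binData"], dtype=msg["dtype"]).reshape(msg["shape"])
```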

axsaucedo added a commit that referenced this issue Aug 22, 2019
* Removed Proto conversion in REST /predict path
@lennon310
Contributor Author

We just updated the image version of the operator, and this change now applies to the prepackaged server as well.

However, it seems like transformer_input still has the same latency:

(screenshot omitted)

I was wondering if the PR needs to be applied to other code paths as well? @axsaucedo @cliveseldon

@axsaucedo
Contributor

@lennon310 that is correct, this will be resolved with #739
