-
Notifications
You must be signed in to change notification settings - Fork 835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removing proto functionality from REST /predict endpoint [#803] #806
Removing proto functionality from REST /predict endpoint [#803] #806
Conversation
The profiler result shows parsing time reduces for REST but not GRPC, is it expected? |
@lennon310 that is correct, I've replied in #803, we should be able to merge this soon |
Also I was wondering whether this change applies to the prepackaged server as well? |
@lennon310 yes, although the model servers will also be bumped a version when they are recompiled. |
Added a couple more tests. PR should be ready for review. @lennon310 could you provide a review as well given that you have been currently going through these pieces? |
@@ -397,8 +463,12 @@ def construct_response(user_model: SeldonComponent, is_request: bool, client_req | |||
raise SeldonMicroserviceException("Unknown data type returned as payload:" + client_raw_response) | |||
|
|||
|
|||
def extract_request_parts_json(request_raw: Union[Dict, List]) -> Tuple[ | |||
Union[np.ndarray, str, bytes, dict], Dict, prediction_pb2.DefaultData, str]: | |||
def extract_request_parts_json(request: Union[Dict, List] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good to me.
Probably not related to this PR, but I was wondering why gRPC path should not go this way.
From my profiler commented in issue it shows in gRPC the input was first converted to datadef
in SeldonMessage
:
100 0.240 0.002 1.281 0.013 utils.py:234(array_to_grpc_datadef)
20100/100 0.038 0.000 1.041 0.010 utils.py:280(array_to_list_value)
then in predict
function it is converted back to Numpy array:
100 0.000 0.000 2.781 0.028 utils.py:505(extract_request_parts)
100 0.001 0.000 2.778 0.028 utils.py:120(get_data_from_proto)
100 0.001 0.000 2.778 0.028 utils.py:147(grpc_datadef_to_array)
The round-trip conversion seems double the latency I guess?
@lennon310 the conversion is necessary as in GRPC the request is received as Proto, and needs to be converted into a numpy array before it's passed to the python wrapper. You can access the raw proto through the We're happy to take suggestions but if you have a look at the functions, it's basically converting the Proto directly to numpy array. If you run some alternative implementations of methods to convert numpy arrays to/from proto objects that are more efficient, we would certainly be keen to introduce them. |
This is the implementation to fix #803.
Once this is landed there will still be work to replicate across remaining endpoints (as per #607)