How does Triton decide which value to return from a BaseModelOutputWithPoolingAndCrossAttentions? #5985
I am using Triton Inference Server to run inference on a BERT model that produces embeddings. Below is the config.pbtxt.
Replies: 2 comments
Please see here: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#special-conventions-for-pytorch-backend
It looks like, because the naming did not follow one of the conventions, Triton assumes the input and output order matches that provided in your model. Your config only has one output, so it looks to be selecting the first output returned in your model, which I'd guess is `last_hidden_state`.
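To make the mapping rule above concrete, here is a minimal toy sketch in plain Python. It is not Triton's implementation: `map_outputs` is a hypothetical helper, the strings stand in for tensors, and the output order `(last_hidden_state, pooler_output)` is assumed from how a traced BERT model usually returns its tuple. It models my reading of the linked conventions: names of the form `<name>__<index>` select by index, and anything else falls back to positional order.

```python
import re
from typing import Dict, List, Sequence

# Matches the "<name>__<index>" naming convention from the linked docs.
_INDEXED = re.compile(r"^.+__(\d+)$")


def map_outputs(config_names: Sequence[str],
                model_outputs: Sequence[object]) -> Dict[str, object]:
    """Toy model (hypothetical helper) of how config outputs pair with model outputs."""
    matches = [_INDEXED.match(name) for name in config_names]
    if all(matches):
        # Every name follows the convention: take the index from the name.
        indices: List[int] = [int(m.group(1)) for m in matches]
    else:
        # Otherwise assume the config order matches the model's output order.
        indices = list(range(len(config_names)))
    return {name: model_outputs[i] for name, i in zip(config_names, indices)}


# Stand-ins for the tensors a traced BERT model is assumed to return, in order.
bert_outputs = ("last_hidden_state", "pooler_output")

# A single output with an arbitrary name gets the first model output.
first_only = map_outputs(["embeddings"], bert_outputs)

# Indexed names pick outputs by index, regardless of their order in the config.
by_index = map_outputs(["OUTPUT__1", "OUTPUT__0"], bert_outputs)

print(first_only)
print(by_index)
```

Under these assumptions, declaring both outputs with indexed names in config.pbtxt (or wrapping the model so it returns only the tensor you want) would be the way to get the pooled output instead of `last_hidden_state`.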