How does Triton decide which value to return from a BaseModelOutputWithPoolingAndCrossAttentions? #5985

sfc-gh-zhwang · 2023-06-23T17:50:45Z


I am using Triton Inference Server to serve a BERT model that produces embeddings. My config.pbtxt is below.
When I load the model locally and run inference, it returns a BaseModelOutputWithPoolingAndCrossAttentions containing both last_hidden_state and pooler_output. The Triton server, however, returns only last_hidden_state (as embedding) and drops pooler_output. How does Triton decide which value to return?

name: "vector_embedding"
platform: "pytorch_libtorch"
input[
    {
        name: "input_ids"
        data_type: TYPE_INT32
        dims: [1, -1]
    },
    {
        name: "attention_mask"
        data_type: TYPE_INT32
        dims: [1, -1]
    }
]
output {
    name: "embedding"
    data_type: TYPE_FP32
    dims: [1, -1, -1]
}

Answered by dyastremsky

dyastremsky · 2023-06-27T19:20:04Z

Collaborator

Please see here: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#special-conventions-for-pytorch-backend

Because the names in your config do not follow one of those conventions, Triton assumes that the order of the inputs and outputs in the config matches the order in your model. Your config declares only one output, so Triton appears to be selecting the first output your model returns, which I'd guess is last_hidden_state.
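
To make the selection rule concrete, here is a rough Python model of it. This is not Triton's code; `output_index` and `select_outputs` are hypothetical helpers, and the `<name>__<index>` naming pattern (for example `OUTPUT__1`) is my reading of the linked documentation. It also assumes the exported TorchScript model returns a plain tuple `(last_hidden_state, pooler_output)`; strings stand in for the tensors.

```python
import re
from typing import Dict, List, Optional, Sequence

# Assumed convention from the linked docs: "<name>__<index>", e.g. OUTPUT__1.
_INDEXED_NAME = re.compile(r"^.+__(\d+)$")


def output_index(name: str) -> Optional[int]:
    """Return the index encoded in the name, or None if it has no __<index> suffix."""
    match = _INDEXED_NAME.match(name)
    return int(match.group(1)) if match else None


def select_outputs(config_names: List[str], model_outputs: Sequence[object]) -> Dict[str, object]:
    """Pair each configured output name with one of the model's outputs."""
    indices = [output_index(name) for name in config_names]
    if any(index is None for index in indices):
        # At least one name ignores the convention: fall back to config order.
        indices = list(range(len(config_names)))
    return {name: model_outputs[index] for name, index in zip(config_names, indices)}


# Stand-ins for the two tensors the model returns, in order.
model_outputs = ("last_hidden_state", "pooler_output")

# The config from the question: one output with an unconventional name.
single = select_outputs(["embedding"], model_outputs)

# A config that declares both outputs using the indexed naming pattern.
both = select_outputs(["OUTPUT__0", "OUTPUT__1"], model_outputs)

print(single)
print(both)
```

Under these assumptions, the single `embedding` output lines up with the first model output, which would explain why pooler_output never appears. Declaring a second output in config.pbtxt (with matching data_type and dims) should expose it as well.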


zengqingfu1442 · 2024-07-04T07:08:41Z


m
