
[BUG] getting wrong dimension from Triton as a response when serving a TF4Rec model with TorchScript #538

Closed
rnyak opened this issue Nov 17, 2022 · 3 comments
Labels: bug (Something isn't working), P1

rnyak (Contributor) commented Nov 17, 2022

Bug description

This is a blocker for this task: #491

When we serve a T4Rec model with the Torch backend, the response comes back with the wrong dimensions. Note that serving works only with masking = 'causal'; otherwise we get errors because the output dimensions of traced_model(torch_yoochoose_like) and model(torch_yoochoose_like) do not match.

Please follow the steps below and run the script to reproduce the issue:

  • Run the unit test example
  • Check the shape of the response via response['next-item'].shape; you will see that the number of rows returned does not match the number of rows in the request dataframe.

I am using the merlin-pytorch:22.10 image, with the latest branches pulled for all the libraries.

@rnyak rnyak added bug Something isn't working status/needs-triage labels Nov 17, 2022
@rnyak rnyak added this to the Merlin 22.12 milestone Nov 17, 2022
sararb (Contributor) commented Nov 18, 2022

Thanks @rnyak for filing the bug ticket. The issue is related to T4Rec defaulting to training=True mode. The model(torch_yoochoose_like) call will therefore use masking (as we are in training mode) to generate random labels for each session in the input torch_yoochoose_like, which results in an output tensor of shape [batch_size * N_random_labels, item_cardinality].

model(torch_yoochoose_like, training=False) is the correct way to apply the model in inference mode and obtain one prediction per session (the next interaction). However, calling traced_model(torch_yoochoose_like, training=False) does not work, because the traced torch model expects only one argument (the input dictionary).

On the T4Rec side, this should be fixed by making training=False the default mode, but one open question remains: can the traced model accept additional parameters at all?
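Both symptoms can be reproduced with a small self-contained sketch. The toy module below is a stand-in for the T4Rec model (the names and the 3-label expansion are illustrative, not T4Rec's actual code): torch.jit.trace records only the branch taken for the example inputs, so the default training=True path gets frozen into the graph, and the traced module no longer accepts the mode flag.

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """Illustrative stand-in for a T4Rec model: the training path expands
    each input row into several sampled-label rows; the inference path
    returns one row per input row."""
    def forward(self, x, training: bool = True):
        if training:
            # training path: batch_size * n_labels output rows
            return x.repeat_interleave(3, dim=0)
        # inference path: one output row per input row
        return x

model = ToyModel()
x = torch.randn(4, 2)

# Eager calls can switch modes explicitly.
print(model(x, training=False).shape)  # torch.Size([4, 2])

# Tracing freezes the Python branch taken for the example inputs,
# so the default training=True path is baked into the graph...
traced = torch.jit.trace(model, (x,))
print(traced(x).shape)  # torch.Size([12, 2])

# ...and the traced module's forward no longer takes the extra flag.
try:
    traced(x, False)
except (RuntimeError, TypeError):
    print("traced model rejects the training flag")
```

This is the same mismatch reported above: the eager inference call returns one row per session, while the traced model returns the label-expanded training output.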

@rnyak rnyak added P0 P1 and removed P0 labels Nov 22, 2022
sararb (Contributor) commented Nov 22, 2022

Setting the default mode of T4Rec models to inference (i.e. training=False + testing=False) is implemented in #543.
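The effect of that change can be sketched with the same kind of toy module (illustrative only, not the actual #543 diff): once inference is the default, tracing freezes the inference branch, so the traced model returns one prediction per input row and the response row count matches the request.

```python
import torch
import torch.nn as nn

class ToyModelFixed(nn.Module):
    """Toy stand-in after the fix: inference (training=False,
    testing=False) is now the default mode."""
    def forward(self, x, training: bool = False, testing: bool = False):
        if training or testing:
            return x.repeat_interleave(3, dim=0)  # label-expanded path
        return x  # one prediction per input row

model = ToyModelFixed()
x = torch.randn(4, 2)

# Tracing with the defaults now captures the inference branch,
# so request and response row counts agree.
traced = torch.jit.trace(model, (x,))
print(traced(x).shape)  # torch.Size([4, 2])
```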

sararb (Contributor) commented Nov 24, 2022

Fixed by #543.

@sararb sararb closed this as completed Nov 24, 2022