
[BUG] getting wrong dimension from Triton as a response when serving a TF4Rec model with TorchScript #538

Closed
rnyak opened this issue Nov 17, 2022 · 3 comments
Labels: bug (Something isn't working), P1

rnyak (Contributor) commented Nov 17, 2022

Bug description

This is a blocker for this task: #491

When we serve a T4Rec model with the Torch backend, the response comes back with the wrong dimensions. Note that serving works only with masking = 'causal'; otherwise we get errors because the output dimensions of traced_model(torch_yoochoose_like) and model(torch_yoochoose_like) do not match.

Please follow the steps below and run the script to reproduce the issue:

  • Run the unit test example
  • Check the shape of the response via response['next-item'].shape; you will see that the number of rows returned does not match the number of rows in the request dataframe.

I am using the merlin-pytorch:22.10 image, with the latest branches pulled for all the libraries.

@rnyak rnyak added bug Something isn't working status/needs-triage labels Nov 17, 2022
@rnyak rnyak added this to the Merlin 22.12 milestone Nov 17, 2022
sararb (Contributor) commented Nov 18, 2022

Thanks @rnyak for filing the bug ticket. The issue is related to T4Rec defaulting to training=True mode. The model(torch_yoochoose_like) call will therefore use masking (as we are in training mode) to generate random labels for each session in the input torch_yoochoose_like, which results in an output tensor of shape [batch_size * N_random_labels, item_cardinality].

model(torch_yoochoose_like, training=False) is the correct way to apply the model in inference mode and obtain one prediction per session (the next interaction). However, calling traced_model(torch_yoochoose_like, training=False) does not work, because the traced torch model expects only one argument (the input dictionary).

On the T4Rec side, this should be fixed by making training=False the default mode, but one open question remains: can the traced model accept additional parameters at all?
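Both symptoms can be reproduced with a small self-contained sketch. The toy module below is a stand-in for the T4Rec model (the names and the 3-label expansion are illustrative, not T4Rec's actual code): torch.jit.trace records only the branch taken for the example inputs, so the default training=True path gets frozen into the graph, and the traced module no longer accepts the mode flag.

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """Illustrative stand-in for a T4Rec model: the training path expands
    each input row into several sampled-label rows; the inference path
    returns one row per input row."""
    def forward(self, x, training: bool = True):
        if training:
            # training path: batch_size * n_labels output rows
            return x.repeat_interleave(3, dim=0)
        # inference path: one output row per input row
        return x

model = ToyModel()
x = torch.randn(4, 2)

# Eager calls can switch modes explicitly.
print(model(x, training=False).shape)  # torch.Size([4, 2])

# Tracing freezes the Python branch taken for the example inputs,
# so the default training=True path is baked into the graph...
traced = torch.jit.trace(model, (x,))
print(traced(x).shape)  # torch.Size([12, 2])

# ...and the traced module's forward no longer takes the extra flag.
try:
    traced(x, False)
except (RuntimeError, TypeError):
    print("traced model rejects the training flag")
```

This is the same mismatch reported above: the eager inference call returns one row per session, while the traced model returns the label-expanded training output.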

@rnyak rnyak added P0 P1 and removed P0 labels Nov 22, 2022
sararb (Contributor) commented Nov 22, 2022

Setting the default mode of T4Rec models to inference (i.e. training=False + testing=False) is implemented in #543.
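The effect of that change can be sketched with the same kind of toy module (illustrative only, not the actual #543 diff): once inference is the default, tracing freezes the inference branch, so the traced model returns one prediction per input row and the response row count matches the request.

```python
import torch
import torch.nn as nn

class ToyModelFixed(nn.Module):
    """Toy stand-in after the fix: inference (training=False,
    testing=False) is now the default mode."""
    def forward(self, x, training: bool = False, testing: bool = False):
        if training or testing:
            return x.repeat_interleave(3, dim=0)  # label-expanded path
        return x  # one prediction per input row

model = ToyModelFixed()
x = torch.randn(4, 2)

# Tracing with the defaults now captures the inference branch,
# so request and response row counts agree.
traced = torch.jit.trace(model, (x,))
print(traced(x).shape)  # torch.Size([4, 2])
```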

sararb (Contributor) commented Nov 24, 2022

Fixed by #543.

@sararb sararb closed this as completed Nov 24, 2022