
Missing deployment part on TensorRT #2

Open
pommedeterresautee opened this issue Mar 22, 2021 · 3 comments
Labels
question Further information is requested

Comments

@pommedeterresautee

❓ Questions and Help

You make reference in the paper and on Hugging Face to a TensorRT deployment, but I can't find the code.
Do you plan to share it too?

As far as I know, the NVIDIA repo only has examples for their own models (all BERT-based), so it's a bit hard to try on our own without an example.

pommedeterresautee added the question label Mar 22, 2021
@kssteven418
Owner

We did not open-source our code for TensorRT deployment. We are planning to deploy our model using TVM, which I think is a more suitable framework for an open-source project, but I cannot be sure of the exact date.
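
In the meantime, for anyone who wants to experiment, the generic TVM path for a traced PyTorch model looks roughly like the sketch below. This is not our deployment code: the checkpoint name, shapes, and API calls are assumptions for a recent TVM version, and I-BERT's integer-only operators would still need custom Relay converters.

```python
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor
from transformers import AutoModel, AutoTokenizer

name = "kssteven/ibert-roberta-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, torchscript=True).eval()

# Trace with a fixed-shape input; TVM's PyTorch frontend needs static shapes.
ids = tokenizer("hello world", return_tensors="pt")["input_ids"]
traced = torch.jit.trace(model, ids)

# Convert the traced graph to Relay and compile for CUDA.
mod, params = relay.frontend.from_pytorch(
    traced, [("input_ids", (list(ids.shape), "int64"))]
)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda", params=params)

# Run through the graph executor and fetch the last hidden states.
dev = tvm.cuda(0)
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("input_ids", tvm.nd.array(ids.numpy()))
rt.run()
last_hidden = rt.get_output(0).numpy()
```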

@pommedeterresautee
Author

Thank you @kssteven418 for your answer.

I don't know about the open-source concern... most of us are already using cuDNN, NVIDIA drivers, etc., so an NVIDIA GPU already implies some non-open-source parts. Plus, having one script for TVM and another for TensorRT would be interesting for benchmarking (strangely, there are very few independent measurements of big LMs on TensorRT / TVM / ORT).

Finally, I was looking for resources on how to use TensorRT with Hugging Face models and watched this virtual conference: https://events.nvidia.com/meettheexperts5?ncid=so-twit-46587-vt04. At the very end, their engineer Mr. Boudier explained that they were not going to share their optimizations for running models on TensorRT; those are reserved for Hugging Face's cloud clients... (at least they are transparent about their intent). So any indication on that part may help the community make progress on the inference side.

That said, I understand your view: TVM is a great project, well run, with a community-first approach, so it makes sense to push a project that is not well known enough (IMO) in the NLP community (compared to ORT, for instance).

Anyway, if possible, I would really appreciate any guidelines for running your model performantly on a GPU :-) (even if no code is provided).
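
For reference, the most generic GPU baseline I know of is ONNX export plus ONNX Runtime with the CUDA provider. A minimal sketch, assuming the public Hugging Face checkpoint in full precision (the quantized integer-only operators are not handled by this path):

```python
import onnxruntime as ort
import torch
from transformers import AutoModel, AutoTokenizer

name = "kssteven/ibert-roberta-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, torchscript=True).eval()
inputs = tokenizer("hello world", return_tensors="pt")

# Export to ONNX with dynamic batch/sequence axes.
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "attention_mask": {0: "batch", 1: "seq"}},
    opset_version=13,
)

# Run on GPU through ONNX Runtime's CUDA execution provider.
sess = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
out = sess.run(None, {"input_ids": inputs["input_ids"].numpy(),
                      "attention_mask": inputs["attention_mask"].numpy()})
```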

@bdalal

bdalal commented Apr 6, 2021

@kssteven418 I've also been trying to export the model to ONNX (from PyTorch) for deployment on TRT. It seems it needs a custom operator for the SymmetricQuantFunction, and possibly for the other layers too. Are you able to share your custom operators?
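
For concreteness, my understanding is that the usual mechanism is to give the autograd Function a `symbolic` method so that `torch.onnx.export` emits it as a custom-domain op. A hedged sketch, with an assumed signature for SymmetricQuantFunction (not I-BERT's actual one):

```python
import torch

class SymmetricQuantFunction(torch.autograd.Function):
    """Stand-in for I-BERT's symmetric quantizer; the real signature
    may differ -- this only illustrates the export mechanism."""

    @staticmethod
    def forward(ctx, x, k, scale):
        # Fake-quantize x to signed k-bit integers at the given scale.
        n = 2 ** (k - 1) - 1
        return torch.clamp(torch.round(x / scale), -n - 1, n) * scale

    @staticmethod
    def symbolic(g, x, k, scale):
        # torch.onnx.export calls this instead of tracing forward(),
        # emitting a node in a custom "ibert" domain.
        return g.op("ibert::SymmetricQuant", x, scale, k_i=k)
```

The exported graph then contains an `ibert::SymmetricQuant` node that ONNX Runtime or TensorRT will refuse to run until a matching kernel or plugin is registered, which is exactly why the custom operators matter here.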

I also agree with @pommedeterresautee's point about benchmarking the differences, so it would be fantastic if you could share the deployment code or the custom ONNX ops for TRT.

Thanks!
