support GPU tensors in eager mode #1873
The eager mode is described in the docs as "mostly used to debug and check intermediate results are as expected". However, it seems to have much greater potential than that: with support for GPU tensors it could be used as an alternative to numpy and cupy, with some advantages over a numpy+cupy combination, such as keeping a single code path that can also be exported to ONNX.

Is this something that could be considered for the roadmap? Are there any potential limitations?
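For context, here is a minimal example of eager mode as it works today with numpy inputs, following the usage shown in the onnxscript docs; `center` is an illustrative function, not part of onnxscript.

```python
import numpy as np
from onnxscript import script
from onnxscript import opset18 as op

@script()
def center(x):
    # Each op call below executes eagerly when the function is invoked
    # with numpy arrays, instead of being traced into a graph.
    return op.Sub(x, op.ReduceMean(x))

# Inputs and outputs are plain numpy arrays today; this issue asks for
# GPU tensors to be supported here as well.
y = center(np.array([1.0, 2.0, 3.0], dtype=np.float32))
```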
Maybe one way to implement this would be to store the data in the onnxscript Tensor class as an instance of OrtValue from onnxruntime.capi.onnxruntime_inference_collection instead of a numpy array?
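A minimal sketch of that idea, assuming onnxruntime's public `OrtValue` API; the `Tensor` class below is a simplified stand-in for onnxscript's actual Tensor class, not its real implementation.

```python
import numpy as np
from onnxruntime import OrtValue

class Tensor:
    """Hypothetical tensor wrapper backed by an OrtValue instead of a numpy array."""

    def __init__(self, value: OrtValue):
        self._value = value

    @classmethod
    def from_numpy(cls, array: np.ndarray, device: str = "cuda") -> "Tensor":
        # Copy the numpy array to the requested device up front, so that
        # subsequent ops can keep the data on the GPU.
        return cls(OrtValue.ortvalue_from_numpy(array, device_type=device, device_id=0))

    @property
    def device(self) -> str:
        return self._value.device_name()

    def numpy(self) -> np.ndarray:
        # Copies the data back to host memory.
        return self._value.numpy()
```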
I experimented with using OrtValue instances instead of numpy arrays to store the data in the Tensor class. Here are the changes I made: https://github.com/martinResearch/onnxscript/pull/1/files
FWIW, running ONNX ops via onnxscript may still be too expensive because the per-call overhead is too great. For what you described, would the Array API be what you need? https://data-apis.org/array-api/latest/
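To make the overhead concrete, here is a hypothetical sketch of what dispatching a single op through onnxruntime can look like: a one-node ONNX model is built and an `InferenceSession` created per call, and session creation alone can dominate the cost of a small op. The `eager_unary_op` helper is illustrative, not onnxscript's actual mechanism.

```python
import numpy as np
import onnx
import onnx.helper as oh
import onnxruntime as ort

def eager_unary_op(op_type: str, x: np.ndarray) -> np.ndarray:
    # Build a single-node model proto on every call.
    x_info = oh.make_tensor_value_info("x", onnx.TensorProto.FLOAT, x.shape)
    y_info = oh.make_tensor_value_info("y", onnx.TensorProto.FLOAT, x.shape)
    node = oh.make_node(op_type, ["x"], ["y"])
    graph = oh.make_graph([node], "eager_op", [x_info], [y_info])
    model = oh.make_model(graph, opset_imports=[oh.make_opsetid("", 18)])
    # Creating a session per op is where much of the overhead comes from.
    sess = ort.InferenceSession(
        model.SerializeToString(), providers=["CPUExecutionProvider"])
    return sess.run(["y"], {"x": x})[0]

y = eager_unary_op("Relu", np.array([-1.0, 2.0], dtype=np.float32))
```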
Adding some references to related projects for anyone interested in this issue: the Array API standard (https://data-apis.org/array-api/latest/), ndonnx, and onnx-array-api, discussed below.

It seems that getting eager mode based on onnxruntime inference sessions to run at speeds competitive with cupy would be hard to achieve, given the per-call overhead discussed above.

One approach to reducing python code duplication when going from python to ONNX would be to use cupy and numpy through the Array API standard interface, and then use ndonnx or onnx-array-api to export the code to ONNX without much rewriting; see the sketch after this comment. Note that this would not allow using some of the advanced ONNX operators that are not in the Array API.
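A small sketch of that approach, assuming the `array-api-compat` package: one function written against the Array API standard that runs on numpy or cupy arrays unchanged.

```python
import numpy as np
from array_api_compat import array_namespace

def mean_center(x):
    # Resolve the Array API namespace (numpy, cupy, ...) from the input.
    xp = array_namespace(x)
    return x - xp.mean(x, axis=0)

# Runs on numpy here; passing a cupy array would execute the same code
# on the GPU. In principle the same function could then be traced by
# ndonnx or onnx-array-api to produce an ONNX graph instead of being
# rewritten by hand.
y = mean_center(np.asarray([[1.0, 2.0], [3.0, 4.0]]))
```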
@justinchuby do you think there is interest in getting the changes I made in https://github.com/martinResearch/onnxscript/pull/1/files into this repository? Although it does not match cupy's speed, it still significantly speeds up eager mode and adds support for GPU execution, which can be helpful while debugging when a bug appears only on the GPU. If so, I could submit one or several PRs.
Thank you! I will look deeper and let you know. As a note, we would not want a tight coupling between onnxscript and ONNX Runtime; onnxscript needs to work without ONNX Runtime.