Replies: 1 comment
CC: @jbkyang-nvi
The Python client API is fairly low-level, and requires that the user (i.e., the developer invoking the client API) write a lot of boilerplate to make a remote inference call. Consider this client example for doing inference on an image, which is over 400 lines long.
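For context, here is roughly what a single request looks like with the low-level gRPC client (the model and tensor names are placeholders, and a real client also needs argument parsing, preprocessing, batching, and error handling, which is where the 400 lines go):

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Every request means hand-building InferInput / InferRequestedOutput
# objects whose names, shapes, and dtypes match the model config exactly.
image = np.zeros((1, 3, 224, 224), dtype=np.float32)  # already-preprocessed image
infer_input = grpcclient.InferInput("input__0", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)
requested = grpcclient.InferRequestedOutput("output__0")

response = client.infer(model_name="resnet50",
                        inputs=[infer_input],
                        outputs=[requested])
scores = response.as_numpy("output__0")
```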
While it's valuable to have access to the low-level API for building the RPC request objects directly, it would be nice to have a higher-level API that handles the many simpler cases.
I wrote a wrapper API that shows one way this could be done. Using this API, classifying an image takes just a few lines:
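Something like the following, where the `TritonModel` name and its arguments are assumptions rather than the wrapper's actual API (`ImageInput` and `ClassificationOutput` are described below, and sketched at the end of this post):

```python
from PIL import Image

# Hypothetical wrapper usage; TritonModel, ImageInput, and
# ClassificationOutput are sketched at the end of this post.
model = TritonModel(
    "resnet50",                      # assumed model name
    url="localhost:8001",
    input=ImageInput(),
    output=ClassificationOutput(),
)

result = model.infer(Image.open("cat.jpg"))
print(result.class_name, result.score)
```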
In this example, `ImageInput` preprocesses the user's input (a PIL `Image` object), and `ClassificationOutput` postprocesses Triton's output into an object with attributes like `score` and `class_name`. The user doesn't need to cargo-cult this code from Nvidia's examples (which, at the time of writing, are inconsistent across example scripts, by the way).

I don't contend that this is the ideal higher-level API, but I think it shows the potential of a simpler way to make inference calls without a lot of boilerplate. The whole implementation is not much more than the example code file that I linked earlier, but it is easier to reuse and extend, and it nicely separates the user's application logic from the fiddly parts of making inference requests.
This is just a sketch and shouldn't be used as-is. I did add support for multiple inputs and outputs, but I didn't implement the HTTP API, streaming, or async, and I didn't test on many different models.
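For concreteness, here is a minimal sketch of what such a wrapper could look like. All class names, default tensor names, and the preprocessing below are assumptions (and it simplifies to a single input and output, unlike the actual wrapper):

```python
from dataclasses import dataclass

import numpy as np
import tritonclient.grpc as grpcclient


@dataclass
class Classification:
    """What the application sees: a score and a label, not tensors."""
    score: float
    class_name: str


class ImageInput:
    """Converts a PIL image into the input tensor the model expects."""

    def __init__(self, name="input__0", size=(224, 224)):
        self.name = name
        self.size = size

    def to_infer_input(self, image):
        # Resize, scale to [0, 1], and reorder HWC -> NCHW with a batch dim.
        array = np.asarray(image.convert("RGB").resize(self.size), dtype=np.float32)
        array = array.transpose(2, 0, 1)[np.newaxis] / 255.0
        infer_input = grpcclient.InferInput(self.name, list(array.shape), "FP32")
        infer_input.set_data_from_numpy(array)
        return infer_input


class ClassificationOutput:
    """Turns the raw output tensor into a Classification object."""

    def __init__(self, name="output__0", labels=None):
        self.name = name
        self.labels = labels or []

    def to_requested_output(self):
        return grpcclient.InferRequestedOutput(self.name)

    def from_response(self, response):
        scores = response.as_numpy(self.name).ravel()
        best = int(scores.argmax())
        label = self.labels[best] if best < len(self.labels) else str(best)
        return Classification(score=float(scores[best]), class_name=label)


class TritonModel:
    """Ties one input adapter and one output adapter to a model."""

    def __init__(self, model_name, url, input, output):
        self.client = grpcclient.InferenceServerClient(url=url)
        self.model_name = model_name
        self.input = input
        self.output = output

    def infer(self, value):
        response = self.client.infer(
            model_name=self.model_name,
            inputs=[self.input.to_infer_input(value)],
            outputs=[self.output.to_requested_output()],
        )
        return self.output.from_response(response)
```

The pattern is plain dependency injection: the input and output adapters own all the tensor plumbing, so application code only ever touches PIL images and `Classification` objects, and swapping in a different preprocessing or output format means swapping an adapter, not editing the request code.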