I'd like to serve multiple kinds of predictions from one model. Currently I create multiple model files from the same model and run multiple containers, each serving one of those files. With this approach, the memory footprint is a multiple of what serving a single model would need.
One idea I had is to pass metadata in the request so I can call different methods inside the model's predict method, as in the sketch below. However, an endpoint that returns different kinds of outputs feels awkward and makes the inference graphs hard to maintain.
Could someone share their ideas or a known best practice for this kind of case?
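To make the metadata idea concrete, here is a minimal sketch, assuming the Seldon Python wrapper passes request metadata into predict (a meta argument); the model path, task names, and predict_proba call are hypothetical and just stand in for the different output types:

```python
# Sketch of the metadata-dispatch idea: one loaded model, several output types,
# selected per request via the request metadata. Names and paths are hypothetical.
import joblib


class MultiOutputModel:
    def __init__(self):
        # Load the (large) model once so all prediction types share its memory.
        self.model = joblib.load("/mnt/models/model.joblib")

    def predict(self, X, features_names=None, meta=None):
        # Route to a different output based on request metadata.
        task = (meta or {}).get("task", "predict")
        if task == "predict":
            return self.model.predict(X)
        if task == "proba":
            return self.model.predict_proba(X)
        raise ValueError(f"Unknown task: {task}")
```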
It's much harder to manage SeldonDeployments if the underlying resources are shared. We could implement some form of sharing for model servers that allow it, but I would expect that to apply mainly when two separate SeldonDeployments happen to use the same model, which seems to be your use case.
You could create 3 SeldonDeployments for your use case: one that holds the core model and two that act as proxy servers to the core model, one for each external use case (rough sketch below). Would this fit your need?
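If it helps, a rough sketch of what one of the two proxy models could look like, assuming the core model is exposed by its own SeldonDeployment and reachable at an in-cluster URL; the service name, URL, and post-processing step below are assumptions, not a confirmed setup:

```python
# One of the "proxy" SeldonDeployment models: it holds no model weights itself,
# forwards the request to the core deployment's internal service, then applies
# its own use-case-specific post-processing.
# The service URL below is made up; adjust it to your deployment and namespace.
import requests

CORE_URL = "http://core-model-default.seldon.svc.cluster.local:8000/api/v1.0/predictions"


class ProxyModelA:
    def predict(self, X, features_names=None, meta=None):
        # Forward the request to the core model using the Seldon JSON payload format.
        payload = {"data": {"ndarray": X.tolist()}}
        resp = requests.post(CORE_URL, json=payload, timeout=10)
        resp.raise_for_status()
        result = resp.json()["data"]["ndarray"]
        # Use-case-specific post-processing would go here.
        return result
```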