Support dynamic graph execution #1419
Sounds interesting. Can you expand a bit on how you see this working? Would this allow models to be running in multiple shared graphs?
I feel like this would be more of an additional service that leverages Seldon Core. I'm envisioning a registry of all deployed models.

For statically defined graphs, a dependency map that tracks enough information between graphs to ensure the exact same model service is truly intended to be reused (runtime arguments, environment variables, resource requests/limits, mounted volumes, etc.) would become unwieldy pretty quickly. I would likely see the option for models to run in multiple shared graphs as a feature of runtime-defined graphs only.
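To make the registry idea concrete, here is a minimal sketch of what such a service could look like: deployed model services are registered once, and a runtime-defined graph references them by name so multiple graphs share the same pods. All class and service names here are hypothetical illustrations, not part of Seldon Core's API:

```python
# Hypothetical sketch: a registry of deployed model services, plus a
# request-time inference graph that references registered models by name.
from dataclasses import dataclass, field


@dataclass
class ModelService:
    """A deployed model endpoint that multiple graphs can share."""
    name: str
    endpoint: str


@dataclass
class GraphNode:
    """One step of a runtime-defined inference graph."""
    model: str                          # name looked up in the registry
    children: list = field(default_factory=list)


class ModelRegistry:
    """Tracks which model services are available for reuse."""
    def __init__(self):
        self._models = {}

    def register(self, svc: ModelService):
        self._models[svc.name] = svc

    def resolve(self, node: GraphNode) -> dict:
        """Walk a graph, mapping each node to a shared deployed endpoint."""
        svc = self._models[node.model]  # KeyError if the model isn't deployed
        return {
            "endpoint": svc.endpoint,
            "children": [self.resolve(c) for c in node.children],
        }


registry = ModelRegistry()
registry.register(ModelService("preprocess", "http://preprocess:9000"))
registry.register(ModelService("classifier", "http://classifier:9000"))

# Two different request-time graphs could reuse the same deployed pods;
# here one graph routes preprocess -> classifier.
graph = GraphNode("preprocess", children=[GraphNode("classifier")])
resolved = registry.resolve(graph)
print(resolved["endpoint"])  # http://preprocess:9000
```

The key design point is that the registry, not the graph definition, owns the mapping from model name to running service, so a new graph shape never forces a new deployment.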
This can be solved via Tempo, Seldon's Python SDK for defining custom inference pipelines.
Currently, the inference graph must be statically defined within the SeldonDeployment. We have multiple models that would be reused across multiple different inference graphs, leading to increased resource usage (since each graph spins up its own underlying model pods). This also means we need to deploy a new inference graph to our cluster for any slightly modified graph our consumers may need.

It would be great to be able to dynamically define the inference graph at request time, as opposed to deploy time, to decrease both the amount of resources used and the number of production deployments needed. Some sort of model registry within the cluster could potentially be a way to discover what model services are available for use.
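For reference, a statically defined graph lives inside the SeldonDeployment manifest itself, which is why each graph brings up its own model pods. A minimal sketch (deployment and model names are made up for illustration):

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: example-graph          # hypothetical name
spec:
  predictors:
    - name: default
      replicas: 1
      graph:                   # the inference graph, fixed at deploy time
        name: preprocess
        type: MODEL
        children:
          - name: classifier   # reusing this model in another graph
            type: MODEL        # means deploying another copy of it
            children: []
```

Any variation of this graph, even one that reorders the same two models, requires deploying a new SeldonDeployment with its own pods.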