-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(csi): support multiple model registries #508
feat(csi): support multiple model registries #508
Conversation
57cd0ea
to
648f4a0
Compare
55ae816
to
5ad3b2b
Compare
Signed-off-by: Alessio Pragliola <[email protected]>
399abc3
to
1f3cb5d
Compare
Signed-off-by: Alessio Pragliola <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Alessio Pragliola <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR I have modified the code to allow users to specify a different MR url in the InferenceService to have support for multiple model registries, again this change does not break the current behavior because the url from the env var MODEL_REGISTRY_BASE_URL is used as a fallback.
I really like this approach and the fact that it is backward compatible by falling back to the default modelregistry.
I left just one comment that I think should be sorted out.
// Possible URIs: | ||
// (1) model-registry://{modelName} | ||
// (2) model-registry://{modelName}/{modelVersion} | ||
// (3) model-registry://{modelRegistryUrl}/{modelName} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not completely sure this is actually supported, because if you have just 2 tokens (after the trim) it will return the default apiClient (i.e., using the default env model registry).
I think here the problem would be, how do you know whether the user is providing option 2 or 3?
I tried running make SOURCE_URI=model-registry://model-registry-url/model DEST_PATH=./ run
and in fact model-registry-url
is interpreted as being the registered model, which is actually not what the user was trying to do.
I think that we have two options here:
- If the
model-registry-url
is provided, users cannot omit theversion
- We use a different delimiter for the
model-registry-url
, e.g.,model-registry://{modelRegistryUrl}:{modelName}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I approached this problem can be found here:
We try to reach (token 0) as a model registry and on failure we assume that it is a model name and not a valid mr url (falling back to the default url from env var), it's not perfect but I think it might be a fair compromise
by doing this in the function from this comment:
// Check if the first token is the host and remove it so that we reduce cases (3) and (4) to (1) and (2)
if len(tokens) >= 2 && p.Client.GetConfig().Host == tokens[0] {
tokens = tokens[1:]
}
- case (1) stays the same
- case (2) stays the same
- case (3) by removing token[0] becomes (1)
- case (4) by removing token[0] becomes (2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry missed that, I tried and I can confirm that if the "host" is reachable it works as expected:
$ make SOURCE_URI=model-registry://localhost:8080/mymodel DEST_PATH=./ run [17:59:56]
"/usr/bin/go" fmt ./...
"/usr/bin/go" vet ./...
"/usr/bin/go" run ./main.go model-registry://localhost:8080/mymodel ./
2024/10/24 18:00:02 Initializing, args: src_uri [model-registry://localhost:8080/mymodel] dest_path[ [./]
2024/10/24 18:00:02 Download model indexed in model registry: modelName=, storageUri=model-registry://localhost:8080/mymodel, modelDir=./
2024/10/24 18:00:02 Fetching model: registeredModelName=mymodel, versionName=<nil>
2024/10/24 18:00:02 404 Not Found
exit status 1
make: *** [Makefile:59: run] Error 1
With a model registry running at localhost:8080
.
My main concern with this assumption is that, if there is a model registry running but for any reason it is not accessible/reachable we are going to make the wrong assumption that it is a registeredModel name , right? And the error would be misleading to the user as it will find in the logs.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep that's true, the only hint that user have from the logs is this:
log.Printf("Falling back to base url %s for model registry service", cfg.Host)
We can improve this message telling the user that it failed to reach url from uri
and it's going to use fallback url
I wanted this to be as retrocompatible as possible, otherwise there are many alternatives, like a query parameter approach:
model-registry://url?modelName=x&modelVersion=y
model-registry://?modelName=x&modelVersion=y
or the other delimiter you mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment below for line 129 my2c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw forgot to mention; options 3-4 also consistent with standard URLs and also KServe examples
- regex: "https://(.+?).blob.core.windows.net/(.+)"
- regex: "https://(.+?).file.core.windows.net/(.+)"
much appreciated @Al-Pragliola
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
with couple of comments; lmk your view?
csi/test/kind_config.yaml
Outdated
apiVersion: kind.x-k8s.io/v1alpha4 | ||
kind: Cluster | ||
nodes: | ||
- role: control-plane | ||
- role: worker | ||
- role: worker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've missed why this Kind context is needed/ introduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I forgot to mention it in the PR description, when adding more scenarios in the e2e tests there are now 4 inference services with the associated deployments/pods etcetera and the scheduler in the CI complained with errors like Warning FailedScheduling 1s default-scheduler 0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes available: 1 No preemption victims found for incoming pod..
, so I had to add workers to the kind test cluster. Now that I think about it, I can also clean up inferenceservices after each scenario, wdyt @tarilabs ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think about it, I can also clean up inferenceservices after each scenario, wdyt @tarilabs ?
I believe this option is preferable in general, so to have a cleanup after each run, but it's nice to know in case one day if we want a more "extended" testings! my2c
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// Check if the first token is the host and remove it so that we reduce cases (3) and (4) to (1) and (2) | ||
if len(tokens) >= 2 && p.Client.GetConfig().Host == tokens[0] { | ||
tokens = tokens[1:] | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern with this assumption is that, if there is a model registry running but for any reason it is not accessible/reachable we are going to make the wrong assumption that it is a registeredModel name , right? And the error would be misleading to the user as it will find in the logs.
We can improve this message telling the user that it failed to reach url from uri and it's going to use fallback url
Here (line 129) you have done all the storageURI parsing to determine in which case you are.
To me, add a Log info here that shows ~circa
Parsed storageUri=... as: modelRegistryUrl=... modelName=... modelVersion=...
This way, we're being explicit on how we interpret what we have received based on the documentation provided. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you @tarilabs @Al-Pragliola , I think that providing a more meaningful and explicit log message would be enough for this use case!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
log.Printf("Parsed storageUri=%s as: modelRegistryUrl=%s, registeredModelName=%s, versionName=%v",
storageUri,
p.Client.GetConfig().Host,
registeredModelName,
versionName,
)
// Possible URIs: | ||
// (1) model-registry://{modelName} | ||
// (2) model-registry://{modelName}/{modelVersion} | ||
// (3) model-registry://{modelRegistryUrl}/{modelName} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comment below for line 129 my2c
…arsing Signed-off-by: Alessio Pragliola <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @Al-Pragliola 🚀
/lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment then I think it's good to go; thanks a lot for keeping tabs on this
csi/test/kind_config.yaml
Outdated
- role: worker | ||
- role: worker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this file left if one day we want to add feature gates, etc?
Because to my understanding, the way this file is right now, is a default setting by looking up in the doc. If confirmed, I'd suggest making a comment in this file (so we don't have one day to wonder "why is this config file necessary") or just leave the 2 workers even if the test scenarios can run efficiently without requiring 2 actual nodes. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Third option, I forgot to remove it after successfully testing the e2e tests only with the control plane 😞 🙏🏼, I just removed the file in the last commit, I think it just brings confusion and adds nothing, if in the future we might need to add workers we'll recreate it at that time.
Signed-off-by: Alessio Pragliola <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot @Al-Pragliola
and thanks to @lampajr for chiming in
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lampajr, tarilabs The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Description
As it stands, CSI is designed around the idea that there is only one model registry on the cluster and each model's metadata is registered here. In this PR I have modified the code to allow users to specify a different MR url in the InferenceService to have support for multiple model registries, again this change does not break the current behavior because the url from the env var
MODEL_REGISTRY_BASE_URL
is used as a fallback.Here's an example of an InferenceService using the new option:
I have also added more scenarios to the e2e tests to check if every supported URI combination is working correctly:
How Has This Been Tested?
Merge criteria:
DCO
check)