Add ML fault tolerance #3803
Signed-off-by: Naarcha-AWS <[email protected]>
Need to update the PR with the following information:
LGTM, thanks a lot!
Co-authored-by: Nathan Bower <[email protected]> Signed-off-by: Naarcha-AWS <[email protected]>
_ml-commons-plugin/api.md
Outdated
- If you want to reserve the memory of other ML nodes within your cluster, you can load your model into a specific node(s) by specifying the `node_ids` in the request body:
+ If you want to reserve the memory of other ML nodes within your cluster, you can deploy your model into a specific node(s) by specifying the `node_ids` in the request body:
- If you want to reserve the memory of other ML nodes within your cluster, you can deploy your model into a specific node(s) by specifying the `node_ids` in the request body:
+ If you want to reserve the memory of other ML nodes within your cluster, you can deploy your model to a specific node(s) by specifying the `node_ids` in the request body:
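To make the sentence under review concrete, here is a minimal Python sketch of the request it describes. The endpoint path is an assumption based on the "deploy" API naming adopted in this PR, the node IDs are hypothetical placeholders, and the model ID is simply the one that appears in the diff. The helper only builds the request and does not send anything.

```python
import json


def build_deploy_request(model_id, node_ids=None):
    """Return (method, path, body) for a deploy call; nothing is sent."""
    # Assumed path, following the deploy naming introduced in this PR.
    path = f"/_plugins/_ml/models/{model_id}/_deploy"
    # Leaving node_ids out lets the cluster choose the ML nodes itself.
    body = {"node_ids": list(node_ids)} if node_ids else {}
    return "POST", path, json.dumps(body)


# Hypothetical node IDs; replace with real IDs from your cluster.
method, path, body = build_deploy_request(
    "KDo2ZYQB-v9VEDwdjkZ4", ["node-id-1", "node-id-2"]
)
print(method, path)
print(body)
```

Restricting the deployment to the listed nodes is what keeps memory free on the remaining ML nodes.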
_ml-commons-plugin/api.md
Outdated
  }
  }
  }
  ```

- ### Response: Unload all models from specific nodes
+ ### Response: Undeploy all models from specific nodes
- ### Response: Undeploy all models from specific nodes
+ ### Response: Undeploying all models from specific nodes
_ml-commons-plugin/api.md
Outdated

  ```json
  {
    "model_ids": ["KDo2ZYQB-v9VEDwdjkZ4"]
  }
  ```

- ### Response: Unload specific models from all nodes
+ ### Response: Undeploy specific models from all nodes
- ### Response: Undeploy specific models from all nodes
+ ### Response: Undeploying specific models from all nodes
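The two undeploy variants named in these headings differ only in which key the request body carries. A minimal Python sketch, assuming the undeploy endpoint path shown in the code comment (it is not quoted anywhere in this thread) and using a hypothetical node ID; the model ID is the one from the diff. The helper builds the request without sending it.

```python
import json


def build_undeploy_request(model_ids=None, node_ids=None):
    """Return (method, path, body) for an undeploy call; nothing is sent."""
    # Assumed path, following the undeploy naming introduced in this PR.
    path = "/_plugins/_ml/models/_undeploy"
    body = {}
    if model_ids:
        # Only model_ids: undeploy these models from all nodes.
        body["model_ids"] = list(model_ids)
    if node_ids:
        # Only node_ids: undeploy all models from these nodes.
        body["node_ids"] = list(node_ids)
    return "POST", path, json.dumps(body)


# Specific models from all nodes (model ID taken from the diff above).
print(build_undeploy_request(model_ids=["KDo2ZYQB-v9VEDwdjkZ4"]))

# All models from specific nodes (hypothetical node ID).
print(build_undeploy_request(node_ids=["node-id-1"]))
```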
* Add ML fault tolerance
* Rework Profile API sentence
* Fix link
* Add review feedback
* Add technical feedback for ML. Change API names
* Add final ML node setting
* Add more technical feedback
* Apply suggestions from code review
  Co-authored-by: Chris Moore <[email protected]>
* Update cluster-settings.md
* Update _ml-commons-plugin/api.md
  Co-authored-by: Nathan Bower <[email protected]>
* Update _ml-commons-plugin/api.md
* Apply suggestions from code review
  Co-authored-by: Nathan Bower <[email protected]>
* Update _ml-commons-plugin/api.md
* Update _ml-commons-plugin/api.md
* Update _ml-commons-plugin/api.md
* Update api.md

---------

Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
Fixes #2654
Checklist
For more information on following Developer Certificate of Origin and signing off your commits, please check here.