feat(pdb): Add support for managing PDBs
Add support for managing PDBs through the SeldonSpec.

Contributes to #2508

Signed-off-by: Nick Groszewski <[email protected]>
groszewn committed Oct 15, 2020
1 parent dd3f71f commit 6fe3a96
Showing 21 changed files with 1,372 additions and 71 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -328,6 +328,7 @@ Below are some of the core components together with link to the logs that provid
* [Payload Logging with ELK ](https://docs.seldon.io/projects/seldon-core/en/latest/analytics/logging.html)
* [Distributed Tracing with Jaeger ](https://docs.seldon.io/projects/seldon-core/en/latest/graph/distributed-tracing.html)
* [Replica Scaling ](https://docs.seldon.io/projects/seldon-core/en/latest/graph/scaling.html)
* [Budgeting Disruptions](https://docs.seldon.io/projects/seldon-core/en/latest/graph/disruption-budgets.html)
* [Custom Inference Servers](https://docs.seldon.io/projects/seldon-core/en/latest/servers/custom.html)

### Advanced Inference
2 changes: 1 addition & 1 deletion doc/source/examples/notebooks.rst
@@ -107,7 +107,6 @@ MLOps: Scaling and Monitoring and Observability
CI / CD with Jenkins Classic <jenkins_classic>
CI / CD with Jenkins X <jenkins_x>
Replica control <scale>


Production Configurations and Integrations
------------------------------------------
@@ -121,6 +120,7 @@ Production Configurations and Integrations
Deploy Multiple Seldon Core Operators <multiple_operators>
Protocol Examples <protocol_examples>
Custom Protobuf Data Example <customdata_example>
Disruption Budgets Example <disruption_budgets_example>

Complex Graph Examples
----------------------
39 changes: 39 additions & 0 deletions doc/source/graph/disruption-budgets.md
@@ -0,0 +1,39 @@
# Budgeting Disruptions

High availability is an important aspect of running production systems.
To this end, you can add Pod Disruption Budget specifications (`pdbSpec`) alongside the Pod Template Specifications you create.
Define the disruption budget according to how you want your application to handle disruptions.

An example Seldon Deployment with disruption budgets defined can be seen below:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
spec:
  name: test-deployment
  replicas: 2
  predictors:
  - componentSpecs:
    - pdbSpec:
        minAvailable: 90%
      spec:
        containers:
        - image: seldonio/mock_classifier_rest:1.3
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: '0.5'
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: example
```
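This budget asks Kubernetes to keep at least 90% of the pods available during voluntary disruptions, so serving capacity should not decrease by more than 10%. Kubernetes rounds percentages up to a whole number of pods, so with only 2 replicas both pods must stay available.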
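The percentage arithmetic behind a `minAvailable` budget can be sketched in a few lines of Python. This is an illustrative model only, not Seldon or Kubernetes code: the helper names `required_available` and `allowed_disruptions` are hypothetical, and the sketch assumes the round-up behaviour that Kubernetes documents for percentage values.

```python
from typing import Union


def required_available(replicas: int, min_available: Union[int, str]) -> int:
    """Number of pods that must stay available for a given minAvailable value."""
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        # Integer ceiling division: percentages are rounded up to whole pods.
        return (replicas * percent + 99) // 100
    return int(min_available)


def allowed_disruptions(healthy: int, replicas: int, min_available: Union[int, str]) -> int:
    """Voluntary evictions the budget would currently permit (never negative)."""
    return max(0, healthy - required_available(replicas, min_available))


print(allowed_disruptions(2, 2, "90%"))    # 2 replicas, all healthy
print(allowed_disruptions(10, 10, "90%"))  # 10 replicas, all healthy
```

Under these assumptions the first call prints `0` and the second prints `1`: a percentage budget only leaves room for evictions once the replica count is large enough for the rounded-up value to fall below it.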
13 changes: 1 addition & 12 deletions doc/source/index.rst
@@ -110,6 +110,7 @@ Documentation Index
Payload Logging with ELK <analytics/logging.md>
Distributed Tracing with Jaeger <graph/distributed-tracing.md>
Replica Scaling <graph/scaling.md>
Budgeting Disruptions <graph/disruption-budgets.md>
Custom Inference Servers <servers/custom.md>

.. toctree::
@@ -137,18 +138,6 @@ Documentation Index
Istio Ingress <ingress/istio.md>
OpenShift <ingress/openshift.md>

.. toctree::
:maxdepth: 1
:caption: Production

Supported API Protocols <graph/protocols.md>
CI/CD MLOps at Scale <analytics/cicd-mlops.md>
Metrics with Prometheus <analytics/analytics.md>
Payload Logging with ELK <analytics/logging.md>
Distributed Tracing with Jaeger <graph/distributed-tracing.md>
Replica Scaling <graph/scaling.md>
Custom Inference Servers <servers/custom.md>

.. toctree::
:maxdepth: 1
:caption: Streaming and Batch Processing
27 changes: 27 additions & 0 deletions examples/models/disruption_budgets/model_with_patched_pdb.yaml
@@ -0,0 +1,27 @@
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
spec:
  name: test-deployment
  replicas: 2
  predictors:
  - componentSpecs:
    - pdbSpec:
        maxUnavailable: 1
      spec:
        containers:
        - image: seldonio/mock_classifier_rest:1.3
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: '0.5'
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: example
27 changes: 27 additions & 0 deletions examples/models/disruption_budgets/model_with_pdb.yaml
@@ -0,0 +1,27 @@
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: seldon-model
spec:
  name: test-deployment
  replicas: 2
  predictors:
  - componentSpecs:
    - pdbSpec:
        maxUnavailable: 2
      spec:
        containers:
        - image: seldonio/mock_classifier_rest:1.3
          imagePullPolicy: IfNotPresent
          name: classifier
          resources:
            requests:
              cpu: '0.5'
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    name: example
249 changes: 249 additions & 0 deletions examples/models/disruption_budgets/pdbs_example.ipynb
@@ -0,0 +1,249 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Defining Disruption Budgets for Seldon Deployments"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites\n",
" \n",
"* A Kubernetes cluster with kubectl configured\n",
"* pygmentize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup Seldon Core\n",
"\n",
"Use the setup notebook to [Setup Cluster](../../../notebooks/seldon_core_setup.ipynb#Setup-Cluster) with [Ambassador Ingress](../../../notebooks/seldon_core_setup.ipynb#Ambassador) and [Install Seldon Core](../../seldon_core_setup.ipynb#Install-Seldon-Core). Instructions [also online](./seldon_core_setup.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl create namespace seldon"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl config set-context $(kubectl config current-context) --namespace=seldon"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create model with Pod Disruption Budget\n",
"\n",
"To create a model with a Pod Disruption Budget, first decide how your application should respond to [voluntary disruptions](https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#voluntary-and-involuntary-disruptions). Depending on the kind of budget your application needs, define one of the following:\n",
"\n",
"* `minAvailable`: the number of pods from the set that must still be available after an eviction. It can be either an absolute number or a percentage.\n",
"* `maxUnavailable`: the number of pods from the set that can be unavailable after an eviction. It can be either an absolute number or a percentage.\n",
"\n",
"The full SeldonDeployment spec is shown below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pygmentize model_with_pdb.yaml"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl apply -f model_with_pdb.yaml"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=seldon-model -o jsonpath='{.items[0].metadata.name}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Validate Disruption Budget Configuration"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"\n",
"def getPdbConfig():\n",
" dp=!kubectl get pdb seldon-model-example-0-classifier -o json\n",
" dp=json.loads(\"\".join(dp))\n",
" return dp[\"spec\"][\"maxUnavailable\"]\n",
" \n",
"assert getPdbConfig() == 2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl get pods,deployments,pdb"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Update Disruption Budget and Validate Change\n",
"\n",
"Next, we'll update the maximum number of unavailable pods and check that the PDB is properly updated to match."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pygmentize model_with_patched_pdb.yaml"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl apply -f model_with_patched_pdb.yaml"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl rollout status deploy/$(kubectl get deploy -l seldon-deployment-id=seldon-model -o jsonpath='{.items[0].metadata.name}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"assert getPdbConfig() == 1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean Up"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl get pods,deployments,pdb"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!kubectl delete -f model_with_patched_pdb.yaml"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 1
}
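The notebook's validation cells rely on IPython's `!` shell capture and read only `maxUnavailable`. As a rough sketch of the same check in plain Python, the snippet below parses PDB JSON and returns whichever budget field is set. The helper name `pdb_budget` and the sample document `SAMPLE_PDB_JSON` are hypothetical stand-ins for illustration; in practice the JSON would come from `kubectl get pdb seldon-model-example-0-classifier -o json`.

```python
import json
from typing import Any, Dict

# Hypothetical, trimmed-down stand-in for the JSON that
# `kubectl get pdb <name> -o json` would return.
SAMPLE_PDB_JSON = """
{
  "kind": "PodDisruptionBudget",
  "metadata": {"name": "seldon-model-example-0-classifier"},
  "spec": {"maxUnavailable": 2}
}
"""


def pdb_budget(pdb_json: str) -> Dict[str, Any]:
    """Return the budget fields (minAvailable / maxUnavailable) present in the spec."""
    spec = json.loads(pdb_json).get("spec", {})
    return {key: spec[key] for key in ("minAvailable", "maxUnavailable") if key in spec}


print(pdb_budget(SAMPLE_PDB_JSON))
```

Returning a dictionary rather than a single value lets the same helper validate deployments that use `minAvailable`, such as the documentation example above.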