Logger params (SeldonIO#3738)
* Allow request logger work queue parameters in Helm chart

* add upgrading doc

* Add annotations

* Run helm-docs
ukclivecox authored Nov 16, 2021
1 parent 9c48b96 commit bf230e0
Showing 16 changed files with 245 additions and 65 deletions.
8 changes: 6 additions & 2 deletions doc/source/graph/annotations.md
Expand Up @@ -33,11 +33,15 @@ You can configure aspects of Seldon Core via annotations in the SeldonDeployment
### Service Orchestrator

* ```seldon.io/engine-separate-pod``` : Use a separate pod for the service orchestrator
* Locations : SeldonDeployment.spec.annotations
* Locations : SeldonDeployment.metadata.annotations, SeldonDeployment.spec.annotations
* [Separate svc-orc pod example](model_svcorch_sep.md)
* ```seldon.io/headless-svc``` : Run the main endpoint as a headless Kubernetes service. This is required for gRPC load balancing via Ambassador.
* Locations : SeldonDeployment.spec.annotations
* Locations : SeldonDeployment.metadata.annotations, SeldonDeployment.spec.annotations
* [gRPC headless example](grpc_load_balancing_ambassador.md)
* ```seldon.io/executor-logger-queue-size``` : Size of request logging worker queue
* Locations: SeldonDeployment.metadata.annotations, SeldonDeployment.spec.annotations
* ```seldon.io/executor-logger-write-timeout-ms``` : Write timeout for adding to logging work queue
* Locations: SeldonDeployment.metadata.annotations, SeldonDeployment.spec.annotations


### Misc
12 changes: 12 additions & 0 deletions doc/source/reference/upgrading.md
Expand Up @@ -19,6 +19,18 @@ Seldon Core adds support for Kubernetes 1.22 by upgrading all ValidatingWebhookC
* Write access to files in the local folder is required, so the application folder should be writable
* The default base image now changes the owner of the /microservice folder to user 8888

### Updated executor request logger settings

The executor's request logger now has a configurable work queue size and write timeout. Together these let you trade off the memory used by pending log requests against failed log writes when a logging endpoint is slow. If a request cannot be added to the work queue within the write timeout, logging of that request fails. The two settings are:

* `executor.requestLogger.workQueueSize` (default 10000)
* `executor.requestLogger.writeTimeoutMs` (default 2000)

These values can also be set per SeldonDeployment with the annotations:

* `seldon.io/executor-logger-queue-size`
* `seldon.io/executor-logger-write-timeout-ms`
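
The tradeoff described above can be illustrated with a small Go sketch of a bounded work queue with a write timeout. This is not the executor's actual implementation: the `LogQueue` type, `NewLogQueue`, `Enqueue`, and `ErrQueueFull` are hypothetical names chosen for illustration. The only behaviour taken from the source is that a write fails once it has waited longer than the timeout, and that a timeout `<= 0` drops the log immediately when the buffer is full.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrQueueFull is a hypothetical error returned when an entry cannot be queued in time.
var ErrQueueFull = errors.New("log work queue full")

// LogQueue is a hypothetical bounded queue of pending log entries.
type LogQueue struct {
	work         chan string
	writeTimeout time.Duration
}

// NewLogQueue builds a queue that buffers at most size entries in memory.
func NewLogQueue(size int, writeTimeoutMs int) *LogQueue {
	return &LogQueue{
		work:         make(chan string, size),
		writeTimeout: time.Duration(writeTimeoutMs) * time.Millisecond,
	}
}

// Enqueue adds an entry, waiting up to the write timeout when the queue is full.
// A timeout <= 0 drops the entry immediately on a full queue.
func (q *LogQueue) Enqueue(entry string) error {
	// Fast path: there is still room in the buffer.
	select {
	case q.work <- entry:
		return nil
	default:
	}
	if q.writeTimeout <= 0 {
		return ErrQueueFull
	}
	// Slow path: wait for a worker to free a slot, but only up to the timeout.
	timer := time.NewTimer(q.writeTimeout)
	defer timer.Stop()
	select {
	case q.work <- entry:
		return nil
	case <-timer.C:
		return ErrQueueFull
	}
}

func main() {
	// No workers drain the queue here, so the third write times out.
	q := NewLogQueue(2, 50)
	for i := 0; i < 3; i++ {
		err := q.Enqueue(fmt.Sprintf("request-%d", i))
		fmt.Println(i, err)
	}
}
```

In this sketch a larger queue size holds more pending entries in memory, while a shorter timeout fails log writes sooner when the downstream endpoint cannot keep up.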

## Upgrading to 1.11

### Python S2I Wrapper
4 changes: 2 additions & 2 deletions executor/cmd/executor/main.go
Expand Up @@ -71,8 +71,8 @@ var (
filename = flag.String("file", "", "Load graph from file")
hostname = flag.String("hostname", "", "The hostname of the running server")
logWorkers = flag.Int("logger_workers", 10, "Number of workers handling payload logging")
logWorkBufferSize = flag.Int("log_work_buffer_size", 10000, "Limit of buffered logs in memory while waiting for downstream request ingestion")
logWriteTimeoutMs = flag.Int("log_write_timeout_ms", 2000, "Timeout before giving up writing log if buffer is full. If <= 0 will immediately drop log on full log buffer.")
logWorkBufferSize = flag.Int("log_work_buffer_size", loghandler.DefaultWorkQueueSize, "Limit of buffered logs in memory while waiting for downstream request ingestion")
logWriteTimeoutMs = flag.Int("log_write_timeout_ms", loghandler.DefaultWriteTimeoutMilliseconds, "Timeout before giving up writing log if buffer is full. If <= 0 will immediately drop log on full log buffer.")
prometheusPath = flag.String("prometheus_path", "/metrics", "The prometheus metrics path")
kafkaBroker = flag.String("kafka_broker", "", "The kafka broker as host:port")
kafkaTopicIn = flag.String("kafka_input_topic", "", "The kafka input topic")
4 changes: 2 additions & 2 deletions helm-charts/seldon-abtest/README.md
Expand Up @@ -38,10 +38,10 @@ helm install $MY_MODEL_NAME seldonio/seldon-abtest --namespace $MODELS_NAMESPACE
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| modela.image.name | string | `"seldonio/mock_classifier"` | |
| modela.image.version | string | `"1.9.0"` | |
| modela.image.version | string | `"1.12.0-dev"` | |
| modela.name | string | `"classifier-1"` | |
| modelb.image.name | string | `"seldonio/mock_classifier"` | |
| modelb.image.version | string | `"1.9.0"` | |
| modelb.image.version | string | `"1.12.0-dev"` | |
| modelb.name | string | `"classifier-2"` | |
| predictor.name | string | `"default"` | |
| replicas | int | `1` | |
68 changes: 66 additions & 2 deletions helm-charts/seldon-benchmark-workflow/README.md
@@ -1,3 +1,67 @@
# Seldon Batch Workflow
# seldon-benchmark-workflow

This chart creates a batch workflow which leverages the seldon batch processor functionality.
![Version: 0.1](https://img.shields.io/static/v1?label=Version&message=0.1&color=informational&style=flat-square)

Seldon Benchmark Workflow

## Usage

To use this chart, you will first need to add the `seldonio` Helm repo:

```bash
helm repo add seldonio https://storage.googleapis.com/seldon-charts
helm repo update
```

Once that's done, you should then be able to use the inference graph template as:

```bash
helm template $MY_MODEL_NAME seldonio/seldon-benchmark-workflow --namespace $MODELS_NAMESPACE
```

Note that you can also deploy the inference graph directly to your cluster
using:

```bash
helm install $MY_MODEL_NAME seldonio/seldon-benchmark-workflow --namespace $MODELS_NAMESPACE
```

## Source Code

* <https://github.com/SeldonIO/seldon-core>

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| benchmark.concurrency | int | `1` | |
| benchmark.cpu | int | `4` | |
| benchmark.data | string | `"{\"data\": {\"ndarray\": [[0,1,2,3]]}}"` | |
| benchmark.duration | string | `"30s"` | |
| benchmark.grpcDataOverride | string | `nil` | |
| benchmark.grpcImage | string | `"seldonio/ghz:v0.95.0"` | |
| benchmark.host | string | `"istio-ingressgateway.istio-system.svc.cluster.local:80"` | |
| benchmark.rate | int | `0` | |
| benchmark.restImage | string | `"peterevans/vegeta:latest-vegeta12.8.4"` | |
| seldonDeployment.apiType | string | `"rest"` | |
| seldonDeployment.disableOrchestrator | bool | `false` | |
| seldonDeployment.enableResources | string | `"false"` | |
| seldonDeployment.image | string | `nil` | |
| seldonDeployment.limits.cpu | string | `"50m"` | |
| seldonDeployment.limits.memory | string | `"1000Mi"` | |
| seldonDeployment.modelName | string | `"classifier"` | |
| seldonDeployment.modelUri | string | `nil` | |
| seldonDeployment.name | string | `"seldon-{{workflow.uid}}"` | |
| seldonDeployment.protocol | string | `"seldon"` | |
| seldonDeployment.replicas | int | `2` | |
| seldonDeployment.requests.cpu | string | `"50m"` | |
| seldonDeployment.requests.memory | string | `"100Mi"` | |
| seldonDeployment.server | string | `nil` | |
| seldonDeployment.serverThreads | int | `1` | |
| seldonDeployment.serverWorkers | int | `4` | |
| seldonDeployment.waitTime | int | `5` | |
| workflow.name | string | `"seldon-benchmark-process"` | |
| workflow.namespace | string | `"default"` | |
| workflow.parallelism | int | `1` | |
| workflow.paramDelimiter | string | `"|"` | |
| workflow.useNameAsGenerateName | string | `"false"` | |
2 changes: 1 addition & 1 deletion helm-charts/seldon-core-analytics/README.md
@@ -1,6 +1,6 @@
# seldon-core-analytics

![Version: 1.9.0](https://img.shields.io/static/v1?label=Version&message=1.9.0&color=informational&style=flat-square)
![Version: 1.12.0-dev](https://img.shields.io/static/v1?label=Version&message=1.12.0--dev&color=informational&style=flat-square)

Prometheus and Grafana installation with a basic Grafana dashboard showing
the default Prometheus metrics exposed by Seldon for each inference graph
36 changes: 20 additions & 16 deletions helm-charts/seldon-core-operator/README.md
@@ -1,6 +1,6 @@
# seldon-core-operator

![Version: 1.9.0](https://img.shields.io/static/v1?label=Version&message=1.9.0&color=informational&style=flat-square)
![Version: 1.12.0-dev](https://img.shields.io/static/v1?label=Version&message=1.12.0--dev&color=informational&style=flat-square)

Seldon Core CRD and controller helm chart for Kubernetes.

Expand Down Expand Up @@ -34,10 +34,10 @@ helm install seldon-core-operator seldonio/seldon-core-operator --namespace seld
| ambassador.singleNamespace | bool | `false` | |
| certManager.enabled | bool | `false` | |
| controllerId | string | `""` | |
| crd.annotations | object | `{}` | |
| crd.create | bool | `true` | |
| crd.forceV1 | bool | `false` | |
| crd.forceV1beta1 | bool | `false` | |
| crd.annotations | map | `{}` | Annotations to add to the CRD |
| credentials.gcs.gcsCredentialFileName | string | `"gcloud-application-credentials.json"` | |
| credentials.s3.s3AccessKeyIDName | string | `"awsAccessKeyID"` | |
| credentials.s3.s3SecretAccessKeyName | string | `"awsSecretAccessKey"` | |
Expand All @@ -46,7 +46,7 @@ helm install seldon-core-operator seldonio/seldon-core-operator --namespace seld
| engine.image.pullPolicy | string | `"IfNotPresent"` | |
| engine.image.registry | string | `"docker.io"` | |
| engine.image.repository | string | `"seldonio/engine"` | |
| engine.image.tag | string | `"1.9.0"` | |
| engine.image.tag | string | `"1.12.0-dev"` | |
| engine.logMessagesExternally | bool | `false` | |
| engine.port | int | `8000` | |
| engine.prometheus.path | string | `"/prometheus"` | |
Expand All @@ -59,58 +59,62 @@ helm install seldon-core-operator seldonio/seldon-core-operator --namespace seld
| executor.image.pullPolicy | string | `"IfNotPresent"` | |
| executor.image.registry | string | `"docker.io"` | |
| executor.image.repository | string | `"seldonio/seldon-core-executor"` | |
| executor.image.tag | string | `"1.9.0"` | |
| executor.image.tag | string | `"1.12.0-dev"` | |
| executor.metricsPortName | string | `"metrics"` | |
| executor.port | int | `8000` | |
| executor.prometheus.path | string | `"/prometheus"` | |
| executor.requestLogger.defaultEndpoint | string | `"http://default-broker"` | |
| executor.requestLogger.workQueueSize | int | `10000` | |
| executor.requestLogger.writeTimeoutMs | int | `2000` | |
| executor.resources.cpuLimit | string | `"500m"` | |
| executor.resources.cpuRequest | string | `"500m"` | |
| executor.resources.memoryLimit | string | `"512Mi"` | |
| executor.resources.memoryRequest | string | `"512Mi"` | |
| executor.serviceAccount.name | string | `"default"` | |
| executor.user | int | `8888` | |
| explainer.image | string | `"seldonio/alibiexplainer:1.9.0"` | |
| explainer.image | string | `"seldonio/alibiexplainer:1.12.0-dev"` | |
| image.pullPolicy | string | `"IfNotPresent"` | |
| image.registry | string | `"docker.io"` | |
| image.repository | string | `"seldonio/seldon-core-operator"` | |
| image.tag | string | `"1.9.0"` | |
| image.tag | string | `"1.12.0-dev"` | |
| istio.enabled | bool | `false` | |
| istio.gateway | string | `"istio-system/seldon-gateway"` | |
| istio.tlsMode | string | `""` | |
| keda.enabled | bool | `false` | |
| kubeflow | bool | `false` | |
| manager.annotations | object | `{}` | |
| manager.cpuLimit | string | `"500m"` | |
| manager.cpuRequest | string | `"100m"` | |
| manager.logLevel | string | `"INFO"` | |
| manager.leaderElectionID | string | `"a33bd623.machinelearning.seldon.io"` | |
| manager.logLevel | string | `"INFO"` | |
| manager.memoryLimit | string | `"300Mi"` | |
| manager.memoryRequest | string | `"200Mi"` | |
| manager.annotations | map | `{}` | Annotations to add to the deployment template spec |
| managerCreateResources | bool | `false` | |
| managerUserID | int | `8888` | |
| namespaceOverride | string | `""` | |
| predictiveUnit.defaultEnvSecretRefName | string | `""` | |
| predictiveUnit.grpcPort | int | `9500` | |
| predictiveUnit.httpPort | int | `9000` | |
| predictiveUnit.metricsPortName | string | `"metrics"` | |
| predictor_servers.MLFLOW_SERVER.protocols.seldon.defaultImageVersion | string | `"1.9.0"` | |
| predictor_servers.MLFLOW_SERVER.protocols.kfserving.defaultImageVersion | string | `"0.5.0"` | |
| predictor_servers.MLFLOW_SERVER.protocols.kfserving.image | string | `"seldonio/mlserver"` | |
| predictor_servers.MLFLOW_SERVER.protocols.seldon.defaultImageVersion | string | `"1.12.0-dev"` | |
| predictor_servers.MLFLOW_SERVER.protocols.seldon.image | string | `"seldonio/mlflowserver"` | |
| predictor_servers.SKLEARN_SERVER.protocols.kfserving.defaultImageVersion | string | `"0.3.2"` | |
| predictor_servers.SKLEARN_SERVER.protocols.kfserving.defaultImageVersion | string | `"0.5.0"` | |
| predictor_servers.SKLEARN_SERVER.protocols.kfserving.image | string | `"seldonio/mlserver"` | |
| predictor_servers.SKLEARN_SERVER.protocols.seldon.defaultImageVersion | string | `"1.9.0"` | |
| predictor_servers.SKLEARN_SERVER.protocols.seldon.defaultImageVersion | string | `"1.12.0-dev"` | |
| predictor_servers.SKLEARN_SERVER.protocols.seldon.image | string | `"seldonio/sklearnserver"` | |
| predictor_servers.TEMPO_SERVER.protocols.kfserving.defaultImageVersion | string | `"0.3.2"` | |
| predictor_servers.TEMPO_SERVER.protocols.kfserving.defaultImageVersion | string | `"0.5.0"` | |
| predictor_servers.TEMPO_SERVER.protocols.kfserving.image | string | `"seldonio/mlserver"` | |
| predictor_servers.TENSORFLOW_SERVER.protocols.seldon.defaultImageVersion | string | `"1.9.0"` | |
| predictor_servers.TENSORFLOW_SERVER.protocols.seldon.defaultImageVersion | string | `"1.12.0-dev"` | |
| predictor_servers.TENSORFLOW_SERVER.protocols.seldon.image | string | `"seldonio/tfserving-proxy"` | |
| predictor_servers.TENSORFLOW_SERVER.protocols.tensorflow.defaultImageVersion | string | `"2.1.0"` | |
| predictor_servers.TENSORFLOW_SERVER.protocols.tensorflow.image | string | `"tensorflow/serving"` | |
| predictor_servers.TRITON_SERVER.protocols.kfserving.defaultImageVersion | string | `"21.08-py3"` | |
| predictor_servers.TRITON_SERVER.protocols.kfserving.image | string | `"nvcr.io/nvidia/tritonserver"` | |
| predictor_servers.XGBOOST_SERVER.protocols.kfserving.defaultImageVersion | string | `"0.3.2"` | |
| predictor_servers.XGBOOST_SERVER.protocols.kfserving.defaultImageVersion | string | `"0.5.0"` | |
| predictor_servers.XGBOOST_SERVER.protocols.kfserving.image | string | `"seldonio/mlserver"` | |
| predictor_servers.XGBOOST_SERVER.protocols.seldon.defaultImageVersion | string | `"1.9.0"` | |
| predictor_servers.XGBOOST_SERVER.protocols.seldon.defaultImageVersion | string | `"1.12.0-dev"` | |
| predictor_servers.XGBOOST_SERVER.protocols.seldon.image | string | `"seldonio/xgboostserver"` | |
| rbac.configmap.create | bool | `true` | |
| rbac.create | bool | `true` | |
Expand All @@ -119,7 +123,7 @@ helm install seldon-core-operator seldonio/seldon-core-operator --namespace seld
| singleNamespace | bool | `false` | |
| storageInitializer.cpuLimit | string | `"1"` | |
| storageInitializer.cpuRequest | string | `"100m"` | |
| storageInitializer.image | string | `"seldonio/rclone-storage-initializer:1.9.0"` | |
| storageInitializer.image | string | `"seldonio/rclone-storage-initializer:1.12.0-dev"` | |
| storageInitializer.memoryLimit | string | `"1Gi"` | |
| storageInitializer.memoryRequest | string | `"100Mi"` | |
| usageMetrics.enabled | bool | `false` | |
Expand Up @@ -135,6 +135,10 @@ spec:
value: '{{ .Values.executor.metricsPortName }}'
- name: EXECUTOR_REQUEST_LOGGER_DEFAULT_ENDPOINT
value: '{{ .Values.executor.requestLogger.defaultEndpoint }}'
- name: EXECUTOR_REQUEST_LOGGER_WORK_QUEUE_SIZE
value: '{{ .Values.executor.requestLogger.workQueueSize }}'
- name: EXECUTOR_REQUEST_LOGGER_WRITE_TIMEOUT_MS
value: '{{ .Values.executor.requestLogger.writeTimeoutMs }}'
- name: DEFAULT_USER_ID
value: '{{ .Values.defaultUserID }}'
- name: EXECUTOR_DEFAULT_CPU_REQUEST
2 changes: 2 additions & 0 deletions helm-charts/seldon-core-operator/values.yaml
Expand Up @@ -66,6 +66,8 @@ executor:
# For more information see the Production Integration for Payload Request Logging with ELK in the docs
requestLogger:
defaultEndpoint: 'http://default-broker'
workQueueSize: 10000
writeTimeoutMs: 2000

# ## Seldon Core Controller Manager Options
image:
8 changes: 4 additions & 4 deletions helm-charts/seldon-mab/README.md
Expand Up @@ -47,17 +47,17 @@ helm install $MY_MODEL_NAME seldonio/seldon-mab --namespace $MODELS_NAMESPACE
| mab.branches | int | `2` | |
| mab.epsilon | float | `0.2` | |
| mab.image.name | string | `"seldonio/mab_epsilon_greedy"` | |
| mab.image.version | string | `"1.9.0"` | |
| mab.image.version | string | `"1.12.0-dev"` | |
| mab.name | string | `"eg-router"` | |
| mab.verbose | int | `1` | |
| modela.image.name | string | `"seldonio/mock_classifier"` | |
| modela.image.version | string | `"1.9.0"` | |
| modela.image.version | string | `"1.12.0-dev"` | |
| modela.name | string | `"classifier-1"` | |
| modelb.image.name | string | `"seldonio/mock_classifier"` | |
| modelb.image.version | string | `"1.9.0"` | |
| modelb.image.version | string | `"1.12.0-dev"` | |
| modelb.name | string | `"classifier-2"` | |
| predictor.name | string | `"default"` | |
| predictorLabels.fluentd | string | `"true"` | |
| predictorLabels.version | string | `"1.9.0"` | |
| predictorLabels.version | string | `"1.12.0-dev"` | |
| replicas | int | `1` | |
| sdepLabels.app | string | `"seldon"` | |
Expand Up @@ -66,12 +66,14 @@ const (
ENV_SELDON_DEPLOYMENT_ID = "SELDON_DEPLOYMENT_ID"
ENV_SELDON_EXECUTOR_ENABLED = "SELDON_EXECUTOR_ENABLED"

ANNOTATION_JAVA_OPTS = "seldon.io/engine-java-opts"
ANNOTATION_SEPARATE_ENGINE = "seldon.io/engine-separate-pod"
ANNOTATION_HEADLESS_SVC = "seldon.io/headless-svc"
ANNOTATION_NO_ENGINE = "seldon.io/no-engine"
ANNOTATION_CUSTOM_SVC_NAME = "seldon.io/svc-name"
ANNOTATION_EXECUTOR = "seldon.io/executor"
ANNOTATION_JAVA_OPTS = "seldon.io/engine-java-opts"
ANNOTATION_SEPARATE_ENGINE = "seldon.io/engine-separate-pod"
ANNOTATION_HEADLESS_SVC = "seldon.io/headless-svc"
ANNOTATION_NO_ENGINE = "seldon.io/no-engine"
ANNOTATION_CUSTOM_SVC_NAME = "seldon.io/svc-name"
ANNOTATION_EXECUTOR = "seldon.io/executor"
ANNOTATION_LOGGER_WORK_QUEUE_SIZE = "seldon.io/executor-logger-queue-size"
ANNOTATION_LOGGER_WRITE_TIMEOUT_MS = "seldon.io/executor-logger-write-timeout-ms"

DeploymentNamePrefix = "seldon"
)
4 changes: 4 additions & 0 deletions operator/config/manager/manager.yaml
Expand Up @@ -125,6 +125,10 @@ spec:
value: "metrics"
- name: EXECUTOR_REQUEST_LOGGER_DEFAULT_ENDPOINT
value: "http://default-broker"
- name: EXECUTOR_REQUEST_LOGGER_WORK_QUEUE_SIZE
value: "10000"
- name: EXECUTOR_REQUEST_LOGGER_WRITE_TIMEOUT_MS
value: "2000"
- name: DEFAULT_USER_ID
value: ''
- name: EXECUTOR_DEFAULT_CPU_REQUEST
18 changes: 10 additions & 8 deletions operator/constants/constants.go
Expand Up @@ -79,12 +79,14 @@ const (

// Default resources
const (
DefaultExecutorCpuRequest = "0.5"
DefaultExecutorCpuLimit = "0.5"
DefaultExecutorMemoryRequest = "512Mi"
DefaultExecutorMemoryLimit = "512Mi"
DefaultEngineCpuRequest = "0.5"
DefaultEngineCpuLimit = "0.5"
DefaultEngineMemoryRequest = "512Mi"
DefaultEngineMemoryLimit = "512Mi"
DefaultExecutorCpuRequest = "0.5"
DefaultExecutorCpuLimit = "0.5"
DefaultExecutorMemoryRequest = "512Mi"
DefaultExecutorMemoryLimit = "512Mi"
DefaultEngineCpuRequest = "0.5"
DefaultEngineCpuLimit = "0.5"
DefaultEngineMemoryRequest = "512Mi"
DefaultEngineMemoryLimit = "512Mi"
DefaultExecutorReqLoggerWorkQueueSize = "10000"
DefaultExecutorReqLoggerWriteTimeoutMs = "2000"
)
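
As a rough illustration of how the string defaults above might combine with the per-deployment annotations, the following Go sketch resolves an integer setting from an annotation map and falls back to a default. It is an assumption, not the operator's code: the `resolveInt` helper and the lower-case constant names are hypothetical, and the sketch takes a single already-merged annotation map because this diff does not show how `metadata.annotations` and `spec.annotations` are combined.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation keys and default values as they appear in the diff;
// the constant names themselves are hypothetical.
const (
	annotationQueueSize      = "seldon.io/executor-logger-queue-size"
	annotationWriteTimeoutMs = "seldon.io/executor-logger-write-timeout-ms"
	defaultQueueSize         = "10000"
	defaultWriteTimeoutMs    = "2000"
)

// resolveInt returns the integer value stored under key in annotations,
// using def when the key is absent. It returns an error when the chosen
// value is not a valid integer.
func resolveInt(annotations map[string]string, key, def string) (int, error) {
	raw, ok := annotations[key]
	if !ok {
		raw = def
	}
	return strconv.Atoi(raw)
}

func main() {
	// Only the queue size is overridden; the timeout falls back to its default.
	annotations := map[string]string{annotationQueueSize: "500"}

	size, err := resolveInt(annotations, annotationQueueSize, defaultQueueSize)
	fmt.Println(size, err)

	timeout, err := resolveInt(annotations, annotationWriteTimeoutMs, defaultWriteTimeoutMs)
	fmt.Println(timeout, err)
}
```

Keeping the defaults as strings, as the constants above do, matches how the values are passed through environment variables, with parsing deferred to the point of use.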