Batch Processor Component #1915

Merged

71 commits
1c2ff45
Added initial argo example
axsaucedo Apr 17, 2020
b3a68e0
Updated file to ensure name of seldon deployment is unique
axsaucedo Apr 18, 2020
8a02187
Updated minio example
axsaucedo Apr 18, 2020
c848fe8
Updated minio example
axsaucedo Apr 18, 2020
4bbc40b
Updated minio example and added diagrams
axsaucedo Apr 18, 2020
0699b61
Added readme
axsaucedo Apr 18, 2020
400dda1
Added python gang scheduler
axsaucedo Apr 20, 2020
522e45e
Added example batch
axsaucedo Apr 21, 2020
9a0d9a2
Merge remote-tracking branch 'upstream/master' into 1391_batch_proces…
axsaucedo Apr 27, 2020
c82e28b
Updated cmd
axsaucedo Apr 27, 2020
aca71f4
Added functionality for gitignore
axsaucedo Apr 28, 2020
af822ee
Added geventhttpclient functionality
axsaucedo Apr 28, 2020
115a67b
Added changes to gang scheduler
axsaucedo Apr 30, 2020
8eb4bd1
Removed gang scheduler folder from separate to add as part of SC package
axsaucedo Apr 30, 2020
02df23b
Removed input data file
axsaucedo Apr 30, 2020
30e76a8
Added functioning parallel processing with scaled workers
axsaucedo Apr 30, 2020
c6bd502
Added example seldon batch store
axsaucedo Apr 30, 2020
53109dd
Added command line to setup.py
axsaucedo Apr 30, 2020
ec8383e
Added batch processing capabilities
axsaucedo Apr 30, 2020
d029abb
Added readme for 2nd option
axsaucedo Apr 30, 2020
1191305
Added host instead of endpoint
axsaucedo May 2, 2020
0d520a0
Added helm chart
axsaucedo May 2, 2020
2ebafa3
Added batch_processor in python
axsaucedo May 2, 2020
845937f
Added gitignore in s2i python folder
axsaucedo May 2, 2020
3bc2be3
Updated and documented helm charts
axsaucedo May 2, 2020
5480795
Added updated examples with simple insights
axsaucedo May 2, 2020
7bf1235
Updated readme
axsaucedo May 2, 2020
c2dd321
Added further work
axsaucedo May 12, 2020
6967a24
Added changes to storagepy
axsaucedo May 18, 2020
c43d250
Updated batch process and storage to upload file
axsaucedo May 18, 2020
8a47df5
Added update to use pvc instead of param
axsaucedo Jun 3, 2020
834373e
Updated readme to use argo workflows
axsaucedo Jun 4, 2020
1f6300e
Merge remote-tracking branch 'upstream/master' into 1391_batch_proces…
axsaucedo Jun 4, 2020
97c0f30
Updated python installer
axsaucedo Jun 4, 2020
513d575
Updated chart to latest
axsaucedo Jun 4, 2020
a73d5af
Updated processor to keep uid
axsaucedo Jun 4, 2020
2f5fa56
Added changes to batch scheduler
axsaucedo Jun 5, 2020
2e0c64a
Added refactored changes for batch processor component
axsaucedo Jun 5, 2020
f924862
Added further documentation
axsaucedo Jun 5, 2020
3fd9d7b
Updated helm chart to include benchmarking parameters
axsaucedo Jun 5, 2020
7ba99bd
Added e2e test for batch processor
axsaucedo Jun 5, 2020
fa517f6
Added testing gitignore
axsaucedo Jun 5, 2020
9915fdf
Removed sample files that are no longer used
axsaucedo Jun 5, 2020
9bd5707
Updated storagepy to reflect expected changes
axsaucedo Jun 5, 2020
5711352
Updated the location of the files
axsaucedo Jun 5, 2020
03df116
Added kubeflow pipelines example
axsaucedo Jun 5, 2020
bcda39f
Updated integration test which ensures all tests are successful!
axsaucedo Jun 5, 2020
94c2721
Updated kubeflow notebook
axsaucedo Jun 5, 2020
8f81c64
Updated testing log
axsaucedo Jun 5, 2020
d4771c7
Updated the pipeline to create seldondeployment with workflow id name
axsaucedo Jun 5, 2020
6588963
Added batch-wide ID to identify a specific batch and made it paramete…
axsaucedo Jun 6, 2020
797d0e9
Updated argo workflows parameters to align with patch params2
axsaucedo Jun 6, 2020
10e62c1
Updated kubeflow pipelines notebook to current final example
axsaucedo Jun 6, 2020
9fde697
Renamed folders for argo and kubeflow to reflect batch in name
axsaucedo Jun 6, 2020
359adfd
Added nblink for batch processing in notebooks
axsaucedo Jun 6, 2020
0c403e2
Added parameters to click function and documentation so it's created …
axsaucedo Jun 6, 2020
9e62a85
Added documentation for batch processor component
axsaucedo Jun 6, 2020
4848cb2
Added image for the knative streaming
axsaucedo Jun 6, 2020
babcf09
Added reference for stream processing knative eventing
axsaucedo Jun 6, 2020
74f4b02
Updated gitignore to point to the scripts testing folder
axsaucedo Jun 6, 2020
92a9ba9
Added further performance information on batch docs
axsaucedo Jun 6, 2020
6d30a5d
Updated to use minio notebook defaults
axsaucedo Jun 8, 2020
dae4216
Added mkdir for assets notebook
axsaucedo Jun 8, 2020
bbdfaf2
Updated index for individual batch processing
axsaucedo Jun 8, 2020
e96cc20
Added newline for index
axsaucedo Jun 8, 2020
0b0e95b
Removed top level header
axsaucedo Jun 8, 2020
cb00200
Fixed typos
axsaucedo Jun 8, 2020
e9e4527
Added images in the docs folder so they appear in docs
axsaucedo Jun 8, 2020
cab0ee2
Updated to add argo workflows installation and service rolebinding
axsaucedo Jun 8, 2020
98c2d5d
Added line that outlines issue with argo on kind
axsaucedo Jun 9, 2020
eac9649
Added note for fix for argo, and added command to support later argo …
axsaucedo Jun 9, 2020
3 changes: 3 additions & 0 deletions doc/source/examples/argo_workflows_batch.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../examples/batch/argo-workflows-batch/README.ipynb"
}
Binary file added doc/source/examples/assets/kubeflow-pipeline.jpg
3 changes: 3 additions & 0 deletions doc/source/examples/kubeflow_pipelines_batch.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../examples/batch/kubeflow-pipelines-batch/README.ipynb"
}
9 changes: 9 additions & 0 deletions doc/source/examples/notebooks.rst
@@ -71,6 +71,15 @@ Advanced Machine Learning Insights
Tabular, Text and Image Model Explainers <explainer_examples>
Outlier Detection on CIFAR10 <outlier_cifar10>

Batch Processing with Seldon Core
-----

.. toctree::
:titlesonly:

Batch Processing with Argo Workflows <argo_workflows_batch>
Batch Processing with Kubeflow Pipelines <kubeflow_pipelines_batch>


MLOps: Scaling and Monitoring and Observability
-----
Binary file added doc/source/images/batch-processor.jpg
Binary file added doc/source/images/batch-workflow-managers.jpg
Binary file added doc/source/images/stream-processing-knative.jpg
38 changes: 22 additions & 16 deletions doc/source/index.rst
@@ -64,20 +64,38 @@ Documentation Index

.. toctree::
:maxdepth: 1
:caption: Language Wrappers (Production)
:caption: Production

Supported API Protocols <graph/protocols.md>
CI/CD MLOps at Scale <analytics/cicd-mlops.md>
Metrics with Prometheus <analytics/analytics.md>
Payload Logging with ELK <analytics/logging.md>
Distributed Tracing with Jaeger <graph/distributed-tracing.md>
Replica Scaling <graph/scaling.md>
Custom Inference Servers <servers/custom.md>

.. toctree::
:maxdepth: 1
:caption: Batch Processing with Seldon

Overview of Batch Processing <servers/batch.md>

.. toctree::
:maxdepth: 1
:caption: Language Wrappers

Python Language Wrapper [Production] <python/index.rst>
Python Language Wrapper <python/index.rst>

.. toctree::
:maxdepth: 1
:caption: Incubating Projects

Java Language Wrapper [Incubating] <java/README.md>
Java Language Wrapper <java/README.md>
Metadata <reference/apis/metadata.md>
R Language Wrapper [ALPHA] <R/README.md>
NodeJS Language Wrapper [ALPHA] <nodejs/README.md>
Go Language Wrapper [ALPHA] <go/go_wrapper_link.rst>
Stream Processing with KNative <streaming/knative_eventing.md>
Metadata [Incubating] <reference/apis/metadata.md>

.. toctree::
:maxdepth: 1
@@ -86,18 +104,6 @@ Documentation Index
Ambassador Ingress <ingress/ambassador.md>
Istio Ingress <ingress/istio.md>

.. toctree::
:maxdepth: 1
:caption: Production

Supported API Protocols <graph/protocols.md>
CI/CD MLOps at Scale <analytics/cicd-mlops.md>
Metrics with Prometheus <analytics/analytics.md>
Payload Logging with ELK <analytics/logging.md>
Distributed Tracing with Jaeger <graph/distributed-tracing.md>
Replica Scaling <graph/scaling.md>
Custom Inference Servers <servers/custom.md>

.. toctree::
:maxdepth: 1
:caption: Advanced Inference
8 changes: 8 additions & 0 deletions doc/source/python/api/seldon_core.rst
@@ -24,6 +24,14 @@ seldon\_core.api\_tester module
:undoc-members:
:show-inheritance:

seldon\_core.batch\_processor module
------------------------------------

.. automodule:: seldon_core.batch_processor
:members:
:undoc-members:
:show-inheritance:

seldon\_core.flask\_utils module
--------------------------------

149 changes: 149 additions & 0 deletions doc/source/servers/batch.md
@@ -0,0 +1,149 @@
# Batch Processing with Seldon Core

Seldon Core provides a command line component that enables highly parallelizable batch processing against horizontally scalable Seldon Core model deployments running on Kubernetes.

For stream processing with Seldon Core please see [Stream Processing with KNative Eventing](../streaming/knative_eventing.md).

![](../images/batch-processor.jpg)

## Horizontally Scalable Workers and Replicas

The parallelizable batch processor workers enable high throughput by leveraging the horizontally scaled replicas and autoscaling of the Seldon Core deployment, giving users the flexibility to optimize their configuration as required.

The diagram below shows a standard workflow where data is downloaded from and uploaded to an object store, and where the Seldon model is created for the job and deleted once it finishes successfully.

![](../images/batch-workflow-manager-integration.jpg)

## Integration with ETL & Workflow Managers

The Seldon Batch component has been built to be modular and flexible so that it can be integrated with any workflow manager.

This allows you to use Seldon for a large number of batch applications, including jobs that run on a schedule (e.g. once a day or once a month) and jobs that are triggered programmatically.

![](../images/batch-workflow-managers.jpg)
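
As a rough sketch, a single workflow step (or scheduled job) that wraps the component typically boils down to three stages; the object-store commands (MinIO client `mc`), bucket names and deployment details below are illustrative assumptions rather than required conventions:

```bash
# Hypothetical batch job body as it might run inside one workflow step.
# 1. Pull the input data from an object store (MinIO client shown as an example).
mc cp minio/batch-data/input-data.txt assets/input-data.txt

# 2. Run the batch processor against an already deployed (and scaled) model.
seldon-batch-processor \
    --deployment-name sklearn \
    --namespace seldon \
    --workers 100 \
    --input-data-path assets/input-data.txt \
    --output-data-path assets/output-data.txt

# 3. Push the results back to the object store for downstream consumption.
mc cp assets/output-data.txt minio/batch-data/output-data.txt
```

The creation and deletion of the SeldonDeployment itself is usually handled by separate steps of the workflow manager, as in the Argo Workflows and Kubeflow Pipelines examples linked below.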

## Hands on Examples

We have provided a set of examples that show you how you can use the Seldon batch processing component:

* [Batch Processing with Argo Workflows](../examples/argo_workflows_batch.html)
* [Batch Processing with Kubeflow Pipelines Example](../examples/kubeflow_pipelines_batch.html)

## High Level Implementation Details

### CLI Parameters

To see each of the options available you can interact with the batch processor component as follows:

```bash
$ seldon-batch-processor --help

Usage: seldon-batch-processor [OPTIONS]

Command line interface for Seldon Batch Processor, which can be used to
send requests through configurable parallel workers to Seldon Core models.
It is recommended that the respective Seldon Core model is also optimized
with number of replicas to distribute and scale out the batch processing
work. The processor is able to process data from local filestore input
file in various formats supported by the SeldonClient module. It is also
suggested to use the batch processor component integrated with an ETL
Workflow Manager such as Kubeflow, Argo Pipelines, Airflow, etc. which
would allow for extra setup / teardown steps such as downloading the data
from object store or starting a seldon core model with replicas. See the
Seldon Core examples folder for implementations of this batch module with
Seldon Core.

Options:
-d, --deployment-name TEXT The name of the SeldonDeployment to send the
requests to [required]

-g, --gateway-type [ambassador|istio|seldon]
The gateway type for the seldon model, which
can be through the ingress provider
(istio/ambassador) or directly through the
service (seldon)

-n, --namespace TEXT The Kubernetes namespace where the
SeldonDeployment is deployed in

-h, --host TEXT The hostname for the seldon model to send
the request to, which can be the ingress of
the Seldon model or the service itself

-t, --transport [rest|grpc] The transport type of the SeldonDeployment
model which can be REST or GRPC

-a, --data-type [data|json|str]
Whether to use json, strData or Seldon Data
type for the payload to send to the
SeldonDeployment which aligns with the
SeldonClient format

-p, --payload-type [ndarray|tensor|tftensor]
The payload type expected by the
SeldonDeployment and hence the expected
format for the data in the input file which
can be an array

-w, --workers INTEGER The number of parallel request processor
workers to run for parallel processing

-r, --retries INTEGER The number of retries for each request
before marking an error

-i, --input-data-path PATH The local filestore path where the input
file with the data to process is located

-o, --output-data-path PATH The local filestore path where the output
file should be written with the outputs of
the batch processing

-m, --method [predict] The method of the SeldonDeployment to send
the request to which currently only supports
the predict method

-l, --log-level [debug|info|warning|error]
The log level for the batch processor
-b, --benchmark If true the batch processor will print the
elapsed time taken to run the process

-u, --batch-id TEXT Unique batch ID to identify all datapoints
processed in this batch, if not provided is
auto generated

--help Show this message and exit.

```
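
As an illustration, a minimal invocation might look like the following; the deployment name, namespace, host and file paths are placeholders that should be replaced with values from your own cluster, and each line of the input file is assumed to hold one payload in the chosen data/payload format:

```bash
# Hypothetical example: process a local file of ndarray payloads against a
# SeldonDeployment named "sklearn" exposed through Istio in the "seldon" namespace.
seldon-batch-processor \
    --deployment-name sklearn \
    --namespace seldon \
    --gateway-type istio \
    --host istio-ingressgateway.istio-system.svc.cluster.local:80 \
    --transport rest \
    --data-type data \
    --payload-type ndarray \
    --workers 100 \
    --retries 3 \
    --input-data-path assets/input-data.txt \
    --output-data-path assets/output-data.txt \
    --benchmark
```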

### Identifiers

Each data point that is sent to the Seldon Core model contains the following identifiers in the request metadata:
* Batch ID - A unique identifier which can be provided through CLI or is automatically generated
* Batch Instance ID - A generated unique identifier for each datapoint processed
* Batch Index - The local ordered descending index for the datapoint relative to the input file location

These identifiers are added to each request as follows:

```
seldon_request = {
    <data>: <current_batch_instance>,
    "meta": {
        "tags": {
            "batch_id": <BATCH_ID>,
            "batch_instance_id": <BATCH_INSTANCE_ID>,
            "batch_index": <BATCH_INDEX>
        }
    }
}
```

This allows each response to be identified and matched back to its original request in the input data.
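
For example, assuming each line of the output file is the JSON response for one datapoint and that the response echoes these tags back in its `meta` (both assumptions for illustration; the exact output depends on your deployment), the batch index could be used to pair outputs with their input lines:

```bash
# Hypothetical pairing of responses with input lines via the batch_index tag.
# Assumes batch_index is the zero-based line offset within the input file.
while read -r response; do
  index=$(echo "$response" | jq -r '.meta.tags.batch_index')
  input=$(sed -n "$((index + 1))p" assets/input-data.txt)
  echo "input[$index]: $input"
done < assets/output-data.txt
```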

### Performance

The module is implemented using Python's threading system.

Benchmarking was carried out with the vanilla Python requests module to compare the performance of threading, Twisted and asyncio. The results showed better raw performance with asyncio; however, given that the worker logic is minimal (i.e. sending a request) and most of the time is spent waiting for the response, the implementation based on Python's native threading performs efficiently enough to scale easily to thousands of workers.

However, the current implementation uses the Seldon Client, which does not yet apply several optimizations that would further increase processing performance, such as re-using a requests session. Even without these optimisations the workers still reach highly concurrent performance, and these optimizations will be introduced as adoption of (and feedback on) this component grows.
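
To get a feel for throughput against a particular deployment, one simple approach is to rerun the processor with the `--benchmark` flag and different worker counts and compare the elapsed times it prints; the deployment name and file paths below are again placeholders:

```bash
# Hypothetical sweep over worker counts; --benchmark prints the elapsed time of each run.
for workers in 10 50 100 200; do
  echo "workers=${workers}"
  seldon-batch-processor \
      --deployment-name sklearn \
      --namespace seldon \
      --workers "${workers}" \
      --input-data-path assets/input-data.txt \
      --output-data-path "assets/output-data-${workers}.txt" \
      --benchmark
done
```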

2 changes: 2 additions & 0 deletions doc/source/streaming/knative_eventing.md
@@ -4,6 +4,8 @@ Seldon has an integration with KNative eventing that allows for real time proces

This allows Seldon Core users to connect SeldonDeployments through triggers that will receive any relevant Cloudevents.

![](../images/stream-processing-knative.jpg)

## Triggers

The way that KNative Eventing works is by creating triggers that send any relevant Cloudevents that match a specific setup into the relevant addressable location.
2 changes: 2 additions & 0 deletions examples/batch/argo-workflows-batch/.gitignore
@@ -0,0 +1,2 @@
assets/input-data.txt
assets/output-data.txt