Use production server by default #2047

Merged: 9 commits, Jul 7, 2020
1 change: 1 addition & 0 deletions doc/source/python/index.rst
@@ -17,6 +17,7 @@ You can use the following links to navigate the Python seldon-core module:
Wrap using S2I <python_wrapping_s2i.md>
Wrap using Docker <python_wrapping_docker.md>
Seldon Python Client <seldon_client.md>
Seldon Python Server <python_server.md>
Python API reference <api/modules>


98 changes: 18 additions & 80 deletions doc/source/python/python_component.md
@@ -317,45 +317,33 @@ class UserCustomException(Exception):

```

### Gunicorn (Alpha Feature)

To run your class under Gunicorn, set the environment variable `GUNICORN_WORKERS` to an integer value greater than 1.

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: gunicorn
spec:
  name: worker
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: classifier
          env:
          - name: GUNICORN_WORKERS
            value: '4'
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    labels:
      version: v1
    name: example
    replicas: 1
```

## Multi-value numpy arrays

By default, the `ndarray` field of the `data` payload is converted to a numpy array whose inner values are all coerced to a single type. If your model needs to accept arrays that mix value types, override the `predict_raw` method, which gives you access to the raw request, and build the numpy array yourself:

```python
import numpy as np

class Model:
    def predict_raw(self, request):
        data = request.get("data", {}).get("ndarray")
        if data:
            mult_types_array = np.array(data, dtype=object)

        # Handle other data types as required + your logic
```
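To see why `dtype=object` matters, compare the default coercion with the object array. This is a standalone sketch with made-up payload values:

```python
import numpy as np

# A payload mixing strings, ints, and floats, as it might arrive in
# request["data"]["ndarray"] (the values here are invented for illustration).
ndarray = [["a", 1, 2.5], ["b", 3, 4.5]]

# Default conversion coerces everything to a single dtype (here: strings).
coerced = np.array(ndarray)
print(coerced.dtype.kind)  # 'U' — all values became unicode strings

# With dtype=object, each element keeps its original Python type.
mixed = np.array(ndarray, dtype=object)
print(type(mixed[0][1]))  # <class 'int'>
```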

## Gunicorn and load

If the wrapped Python class is [served under Gunicorn](./python_server) then,
as part of the initialization of each Gunicorn worker, a `load` method will be
called on your class if one is defined.
You should use this method to load and initialise your model.
This is important for Tensorflow models, which need their session created in
each worker process.
The [Tensorflow MNIST example](../examples/deep_mnist.html) does this.

```python
import tensorflow as tf
import numpy as np
import os
Expand Down Expand Up @@ -383,56 +371,6 @@ class DeepMnist(object):
return predictions.astype(np.float64)
```

### Single-threaded Flask for REST (experimental)

To run your class single-threaded with Flask, set the environment variable `FLASK_SINGLE_THREADED` to 1. This sets the `threaded` parameter of the Flask app to `False`. It is not the optimal setup for most models, but it can be useful when your model cannot be made thread-safe, as with many GPU-based models that deadlock when accessed from multiple threads.

```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: flaskexample
spec:
  name: worker
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: classifier
          env:
          - name: FLASK_SINGLE_THREADED
            value: '1'
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    labels:
      version: v1
    name: example
    replicas: 1
```


## Integer numbers

The `json` package in Python, parses numbers with no decimal part as integers.
168 changes: 168 additions & 0 deletions doc/source/python/python_server.md
@@ -0,0 +1,168 @@
# Seldon Python Server

To serve your component, Seldon's Python wrapper will use
[Gunicorn](https://gunicorn.org/) under the hood by default.
Gunicorn is a high-performance HTTP server for UNIX that lets you easily
scale your model across multiple worker processes and threads.

.. Note::
   Gunicorn will only handle the horizontal scaling of your model **within the
   same pod and container**.
   To learn more about how to scale your model across multiple pod replicas, see
   the :doc:`../graph/scaling` section of the docs.

## Workers

By default, Seldon will only use a **single worker process**.
However, it's possible to increase this number through the `GUNICORN_WORKERS`
environment variable.
This variable can be controlled directly through the `SeldonDeployment` CRD.

For example, to run your model under 4 workers, you could do:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: gunicorn
spec:
  name: worker
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: classifier
          env:
          - name: GUNICORN_WORKERS
            value: '4'
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    labels:
      version: v1
    name: example
    replicas: 1
```

## Threads

By default, Seldon will process your model's incoming requests using a pool of
**10 threads per worker process**.
You can increase this number through the `GUNICORN_THREADS` environment
variable.
This variable can be controlled directly through the `SeldonDeployment` CRD.

For example, to run your model with 5 threads per worker, you could do:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: gunicorn
spec:
  name: worker
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: classifier
          env:
          - name: GUNICORN_THREADS
            value: '5'
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    labels:
      version: v1
    name: example
    replicas: 1
```
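The two variables above can be sketched in plain Python. The helper name and its behaviour here are assumptions for illustration (the actual wiring lives inside seldon-core's wrapper); the defaults of 1 worker and 10 threads match the values stated in this document:

```python
import os

# Hypothetical helper: derive Gunicorn settings from the environment
# variables documented above.
def gunicorn_options() -> dict:
    return {
        "workers": int(os.environ.get("GUNICORN_WORKERS", "1")),
        "threads": int(os.environ.get("GUNICORN_THREADS", "10")),
    }

os.environ["GUNICORN_WORKERS"] = "4"
os.environ["GUNICORN_THREADS"] = "5"
print(gunicorn_options())  # {'workers': 4, 'threads': 5}
```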

### Disable multithreading

In some cases, you may want to disable multithreading completely.
To serve your model within a single thread, set the environment variable
`FLASK_SINGLE_THREADED` to `1`.
This is not the optimal setup for most models, but it can be useful when your
model cannot be made thread-safe, as with many GPU-based models that deadlock
when accessed from multiple threads.


```yaml
apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: flaskexample
spec:
  name: worker
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: classifier
          env:
          - name: FLASK_SINGLE_THREADED
            value: '1'
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    labels:
      version: v1
    name: example
    replicas: 1
```
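Conceptually, the flag maps onto the boolean passed to Flask's `threaded` parameter via `app.run(threaded=...)`. The helper below is a hypothetical sketch, not seldon-core's actual wiring:

```python
import os

# Hypothetical helper: True means Flask may handle requests in multiple
# threads; False serves them one at a time in a single thread.
def flask_threaded() -> bool:
    return os.environ.get("FLASK_SINGLE_THREADED", "0") != "1"

os.environ["FLASK_SINGLE_THREADED"] = "1"
print(flask_threaded())  # False — single-threaded serving
```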

## Development server

While Gunicorn is recommended for production workloads, it's also possible to
use Flask's built-in development server.
To enable the development server, set the `SELDON_DEBUG` environment variable
to `1`.

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: flask-development-server
spec:
  name: worker
  predictors:
  - componentSpecs:
    - spec:
        containers:
        - image: seldonio/mock_classifier:1.0
          name: classifier
          env:
          - name: SELDON_DEBUG
            value: '1'
        terminationGracePeriodSeconds: 1
    graph:
      children: []
      endpoint:
        type: REST
      name: classifier
      type: MODEL
    labels:
      version: v1
    name: example
    replicas: 1
```
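The server choice reduces to a single environment check. The helper name below is hypothetical, for illustration only:

```python
import os

# Hypothetical helper: decide between the Flask development server and
# Gunicorn based on the SELDON_DEBUG variable documented above.
def use_dev_server() -> bool:
    return os.environ.get("SELDON_DEBUG", "0") == "1"

os.environ["SELDON_DEBUG"] = "1"
print(use_dev_server())  # True — run the Flask development server
```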
74 changes: 74 additions & 0 deletions python/seldon_core/app.py
@@ -0,0 +1,74 @@
import os
import logging

from typing import Dict, Union
from gunicorn.app.base import BaseApplication

logger = logging.getLogger(__name__)


def accesslog(log_level: str) -> Union[str, None]:
    """
    Enable / disable access log in Gunicorn depending on the log level.
    """

    if log_level in ["WARNING", "ERROR", "CRITICAL"]:
        return None

    return "-"


def threads(threads: int, single_threaded: bool) -> int:
    """
    Number of threads to run in each Gunicorn worker.
    """

    if single_threaded:
        return 1

    return threads
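# For example (illustrative values):
#
#   accesslog("ERROR")  -> None  (access log disabled for high log levels)
#   accesslog("INFO")   -> "-"   (Gunicorn logs access to stdout)
#   threads(10, False)  -> 10
#   threads(10, True)   -> 1     (single-threaded mode)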


class StandaloneApplication(BaseApplication):
    """
    Standalone Application to run a Flask app in Gunicorn.
    """

    def __init__(self, app, options: Dict = None):
        self.application = app
        # Default to an empty dict so load_config() can iterate safely
        self.options = options or {}
        super().__init__()

    def load_config(self):
        config = dict(
            [
                (key, value)
                for key, value in self.options.items()
                if key in self.cfg.settings and value is not None
            ]
        )
        for key, value in config.items():
            self.cfg.set(key.lower(), value)

    def load(self):
        return self.application


class UserModelApplication(StandaloneApplication):
    """
    Gunicorn application to run a Flask app in Gunicorn, loading the
    user's model first.
    """

    def __init__(self, app, user_object, options: Dict = None):
        self.user_object = user_object
        super().__init__(app, options)

    def load(self):
        logger.debug("LOADING APP %d", os.getpid())
        try:
            logger.debug("Calling user load method")
            self.user_object.load()
        except (NotImplementedError, AttributeError):
            logger.debug("No load method in user model")
        return self.application
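# --- Illustrative usage (not part of this module) ---
# A sketch of how these pieces could compose; the option keys below are
# standard Gunicorn settings, while `flask_app` and `user_model` are
# placeholders:
#
#   options = {
#       "bind": "0.0.0.0:9000",
#       "accesslog": accesslog("INFO"),   # "-" => log access to stdout
#       "threads": threads(10, False),    # thread pool size per worker
#       "workers": 4,
#   }
#   UserModelApplication(flask_app, user_model, options=options).run()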